\chapter[\pxe] {\pxe: A Powerful Simulator} 
\label{chap:polyxedit}

After completing this chapter, you will be able to perform
sophisticated polymer chemistry simulations on polymer sequences
that can be edited in place, with automatic mass
recalculation.

\renewcommand{\sectitle}{\pxe\ Invocation}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

The \pxe module is called by selecting the ``\pxe'' menu
item from the \pxm program's menu. The user may then start working in
one of two ways:

\begin{itemize}
\item Ask that a polymer sequence be loaded from disk;
\item Ask that a new polymer sequence be started \textit{ex nihilo}.
\end{itemize}


\renewcommand{\sectitle}{\pxe\ Operation: \textit {In Medias Res}}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-polchemdef-open-def-init-seq-wnd.png}
  \end{center}
  \caption[Initializing a new polymer sequence in
  \pxe]{\textbf{Initializing a new polymer sequence in \pxe} When
    starting a new sequence from scratch, it is necessary to seed the
    program with a number of data that the user is invited to give in
    this window.}
  \label{fig:polyxedit-polchemdef-open-def-init-seq-wnd}
\end{figure}


When starting a new polymer sequence from scratch, the first thing the
program does is to provide the user with a window
(Figure~\vref{fig:polyxedit-polchemdef-open-def-init-seq-wnd}) where the
user is invited to:

\begin{itemize}
\item Select the polymer chemistry definition (\guilabel{Def. Type})
  to be used to interpret the polymer sequence (compulsory datum);
\item Enter a \guilabel{Sequence Name} for the polymer sequence
  (non-compulsory datum);
\item Enter a \guilabel{Sequence Code} for the polymer sequence
  (non-compulsory datum);
\item Choose a file \guilabel{Name} for the polymer sequence file.
\end{itemize}

\noindent Once all the data have been selected or entered, the user
clicks the \guilabel{Validate} button and the program opens an
empty sequence window, as shown in
Figure~\vref{fig:polyxedit-seqeditor-empty}.

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-seqeditor-empty.png}
  \end{center}
  \caption[An empty \pxe window]{\textbf{An empty \pxe window} This
    figure shows an empty \pxe window, waiting for the user to either
    paste a sequence from the clipboard or edit one from the
    keyboard.}
  \label{fig:polyxedit-seqeditor-empty}
\end{figure}

At this point, when the user starts editing a sequence, the characters
entered at the keyboard, or pasted from the clipboard, will be
interpreted using the polymer chemistry definition that was selected
in the initialization window described above.

Now, of course, editing a polymer sequence is not enough for a mass
spectrometric-oriented software suite; what we want is to compute
masses! When the \pxm software program is started, the window
displaying the masses of the sequence being edited is not displayed.
Go to the main menu of the program and select the item
\guimenu{\pxe}\guimenuitem{View} and activate the checkbutton menu
\guilabel{Display Masses Window}.


\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-masses-display-wnd.png}
  \end{center}
  \caption[The window displaying the masses]{\textbf{The window
      displaying the masses} This figure shows the window that
    displays masses for the currently edited polymer sequence. As can
    be seen the identity of the polymer sequence is shown along with
    masses computed for the sequence. }
  \label{fig:polyxedit-masses-display-wnd}
\end{figure}

The window that displays the masses for the currently edited polymer
sequence is shown in Figure~\vref{fig:polyxedit-masses-display-wnd},
where the reader can see that two different types of masses are
displayed: 

\begin {itemize}
\item \guilabel{Whole Sequence} These are the monoisotopic and average
  masses computed for the whole polymer sequence;
\item \guilabel{Selection} These are the monoisotopic and average
  masses computed for the selected portion of the polymer sequence;
\end{itemize}

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-calc-engine-options-wnd.png}
  \end{center}
  \caption[Configuring the mass calculation
  engine]{\textbf{Configuring the mass calculation engine} This figure
    shows the detail in which the mass calculation engine can be
    configured. See the text for details.}
  \label{fig:polyxedit-calc-engine-options-wnd}
\end{figure}


As the user can see, the protein sequence that we initialized
earlier is empty (the only visible item is the cursor), and the masses
displayed correspond to an empty protein. But if there is no polymer
sequence, how come \textit{nihil} weighs some 19~mass~units? The
answer lies in how polymer sequence masses are computed: the masses
of all the monomers in the sequence are added, of course, but,
depending on the configuration set by the user, other contributions
(listed below) are taken into account as well.
Figure~\vref{fig:polyxedit-calc-engine-options-wnd} shows to what
extent the way masses are computed can be configured. The window
shown in this figure was displayed by right-clicking in the
polymer sequence editor and selecting, from the contextual menu that
pops up, the \guimenu{View}\guimenuitem{Calc. Options} menu.

We'll review the different items in this window:

\begin{itemize}
\item \guilabel{Sequence Name}: This entry widget holds the name of
  the polymer sequence for which the mass computations are being
  configured;
\item \guilabel{ID Number}: Unambiguous identification of the polymer
  sequence (this is useful in case the same identical polymer sequence
  file is loaded twice in \pxe since this ID number will differ);
\item \guilabel{Left Capped}: If checked, the left cap of the polymer
  definition corresponding to this polymer sequence will be taken into
  account when computing masses
\item \guilabel{Right Capped}: Same as for \guilabel{Left Capped} but
  for the right end of the polymer;
\item \guilabel{Account Left End Modif}: If checked, take into account
  the modification that might be set to the left end of the polymer
  sequence;
\item \guilabel{Account Right End Modif}: If checked, take into
  account the modification that might be set to the right end of the
  polymer sequence;
\item \guilabel{Monomer -- Account Modifs}: If checked, take into
  account the chemical modifications that might be set to monomers in
  the polymer sequence (or selection portion of it);
\item \guilabel{Ionization Rules -- Actform}: What action-formula to
  apply to the polymer sequence when ionization is computed;
\item \guilabel{Ionization Rules -- Unitary Charge}: What is the
  charge that is brought by the action-formula mentioned above;
\item \guilabel{Ionization Rules -- Level}: How many times the polymer
  sequence should be ionized according to the two data elements above.
\end{itemize}

\noindent The fact that the user can specify ionization rules should
make it clear that the masses that are displayed are actually
$\mathrm{\frac{m}{z}}$ ratios, as soon as at least one ionization occurs\dots\
Also, note that the masses displayed in the window shown in
Figure~\vref{fig:polyxedit-masses-display-wnd} are updated
automatically anytime something ``ponderable'' happens to the
polymer sequence (\guilabel{Whole Sequence} masses), or anytime the
cursor is moved in the sequence (this is equivalent to selecting from
the beginning of the sequence up to the cursor point) or a selection
is modified (\guilabel{Selection} masses).
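
\noindent To make the role of these options concrete, here is a
minimal Python sketch of such a calculation. It is not the \pxm code:
the function names, the reduced monoisotopic mass table and the
two-residue monomer set are assumptions chosen for illustration, as is
the protein-like setup (left cap H, right cap OH, ionization by one H
that brings one charge). Under these assumptions an empty sequence
yields about 19.018, which is consistent with the ``some
19~mass~units'' noted above: the two caps plus one ionizing hydrogen.

```python
# Illustrative sketch only; names and values are assumptions, not polyxmass code.
# Monoisotopic masses of the few elements needed here.
MONO = {"H": 1.007825, "C": 12.0, "N": 14.003074, "O": 15.994915}


def formula_mass(formula):
    """Mass of an elemental composition given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


# Assumed protein-like definition: residue compositions plus two caps.
MONOMERS = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},  # glycyl residue
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},  # alanyl residue
}
LEFT_CAP = {"H": 1}
RIGHT_CAP = {"O": 1, "H": 1}


def mz(sequence, left_capped=True, right_capped=True,
       ion_formula=None, unit_charge=1, level=1):
    """Return the m/z ratio, or the plain mass when level is 0."""
    if ion_formula is None:
        ion_formula = {"H": 1}
    mass = sum(formula_mass(MONOMERS[code]) for code in sequence)
    if left_capped:
        mass += formula_mass(LEFT_CAP)
    if right_capped:
        mass += formula_mass(RIGHT_CAP)
    if level == 0:
        return mass  # not ionized: nothing to divide by
    mass += level * formula_mass(ion_formula)
    return mass / (level * unit_charge)


print(round(mz(""), 4))           # 19.0184: caps + one ionizing H
print(round(mz("", level=0), 4))  # 18.0106: caps only
```

Unchecking both caps and setting the ionization level to zero in this
sketch brings the mass of the empty sequence back to zero.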

\bigskip For the moment that should be enough. Let's delve more into
the capabilities of the \pxe module of the \pxm mass spectrometric
software suite.


\renewcommand{\sectitle}{\pxe: The Polymer Sequence Menu}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

There are two menus available to the user in the polymer sequence
editor window. The first menu is a conventional menu sitting on top of
the sequence editor window. The second menu pops up when the user
right-clicks on a monomer icon in the sequence-displaying area.
The general rule of thumb is rather simple: whenever a menu item
allows the user to perform an action on a specific graphical
rendering of a sequence (I mean a specific sequence as displayed in a
specific canvas), the menu to explore first is the popup menu.
Conversely, if the action to be triggered is more about the sequence
itself, and less about its actual graphical rendering, then the menu
to explore first is the main window menu.

The sequence editor window main menu comprises the items described
below:

\begin{itemize}
  %%%%%%%%%%%%%%%%%%%%%%%% 
\item \guimenu{File}
  \begin{itemize}
  \item \guimenuitem{Save}\dots\ Save the polymer sequence;
  \item \guimenuitem{Save As}\dots\ Save the polymer sequence with a new
    name;
  \item \guimenuitem{Close}\dots\ Close the polymer sequence;
  \end{itemize}
  %%%%%%%%%%%%%%%%%%%%%%%% 
\item \guimenu{Edit}
  \begin{itemize}
  \item \guimenuitem{Polymer Sequence Properties}\dots\ Edit the
    polymer sequence properties, such as sequence name, sequence code,
    for example. Note that the annotation process will let you enter
    as many notes as required to the polymer sequence;
  \end{itemize}
  %%%%%%%%%%%%%%%%%%%%%%%% 
\item \guimenu{View}
  \begin{itemize}
  \item \guimenuitem{Calc. Options}\dots\ View/Modify the way
    calculations are performed, be they mass calculations or elemental
    composition calculations;
  \end{itemize}
  %%%%%%%%%%%%%%%%%%%%%%%% 
\item \guimenu{Chemistry}
  \begin{itemize}
  \item \guimenuitem{Cleave}\dots\ Open a window so that a polymer
    sequence can be cleaved;
  \item \guimenuitem{Fragment}\dots\ Open a window so that a polymer
    sequence can be fragmented;
  \item \guimenuitem{Compositions}
    \begin{itemize}
    \item \guimenuitem{Elemental}\dots\ Open a window so that options
      can be set for the program to compute the elemental composition of
      the polymer sequence or a region of it;
    \item \guimenuitem{Monomeric}\dots\ Open a window so that options
      can be set for the program to compute the monomeric composition of
      the polymer sequence or a region of it;
    \end{itemize}
  \item \guimenuitem{pKa-pH-pI}
    \begin{itemize}
    \item \guimenuitem{(Re)Load The Data}\dots\ Ask that the
      \filename{acidobasic.xml} file be read (or re-read) from disk;
    \item \guimenuitem{Calculations}\dots\ Open a window so that options
      can be set for the program to compute the charges of the polymer
      sequence (or its isoelectric point);
    \end{itemize}
  \item \guimenuitem{m/z Ratio Calculations}\dots\ Open a window to
    perform m/z ratio calculations;
  \item \guimenuitem{Search Mass(es)}\dots\ Open a window so that
    options can be set for the program to search arbitrary oligomers
    in the polymer sequence that have the same mass as the one(s)
    searched for.
  \end{itemize} 
  %%%%%%%%%%%%%%%%%%%%%%%% 
\item \guimenu{Reports}
  \begin{itemize}
  \item \guimenuitem{Make Reports}\dots\ Open the window management
    facility to let the user choose windows to make reports of their
    contents;
  \item \guimenuitem{Report Options}\dots\ Configure the way reports
    are prepared.
  \end{itemize}
\end{itemize}


\vspace{\baselineskip}

\noindent Note that each action undertaken as the response to choosing
one menu item is performed onto the polymer sequence being edited in
the polymer sequence editor from which the menu was selected.

\vspace{\baselineskip}

\noindent When the user right-clicks on a monomer icon, a
contextual menu pops up that has the following
structure:

\begin{itemize}
  %%%%%%%%%%%%%%%%%%%%%%%% 
\item \guimenu{Edit}
  \begin{itemize}
  \item \guimenuitem{Copy}\dots\ Copy to the clipboard the currently
    selected sequence;
  \item \guimenuitem{Cut}\dots\ Copy to the clipboard the currently
    selected sequence and remove it from the sequence;
  \item \guimenuitem{Paste}\dots\ Paste the sequence from the
    clipboard to the current location of the cursor in the polymer
    sequence editor;
  \item \guimenuitem{Find Replace}\dots\ Extremely flexible
    Find/Replace functionality;
  \item \guimenu{Annotation}
    \begin{itemize}
    \item \guimenuitem{Monomer}\dots\ Edit (add/remove/modify) the notes
      for the monomer lying below the cursor when the menu was elicited;
    \item \guimenuitem{Polymer}\dots\ Edit (add/remove/modify) the
      notes for the polymer being edited in the polymer sequence
      editor;
    \end{itemize}
  \item \guimenuitem{List Completions}\dots\ Show the list of available
    monomer code completions according to what is already typed in the
    sequence editor and the monomer codes defined in the polymer
    chemistry definition;
  \item \guimenuitem{Select All}\dots\ Select the whole sequence in the
    polymer sequence editor;
  \end{itemize}
  %%%%%%%%%%%%%%%%%%%%%%%% 
\item \guimenu{Chemistry}
  \begin{itemize}
  \item \guimenuitem{Monomer Modifications}\dots\ Open a window so
    that a monomer (or any combination of monomers) can be modified or
    unmodified;
  \item \guimenuitem{Polymer Modifications}\dots\ Open a window so
    that the polymer sequence can be modified or unmodified either
    on its left end or on its right end (or both);
  \end{itemize}
  %%%%%%%%%%%%%%%%%%%%%%%% 
\item \guimenuitem{Self Read Sequence To File}\dots\ Write to file a
  configurable list of sound files to play the sequence aloud;
\end{itemize}


\renewcommand{\sectitle}{Editing Polymer Sequences}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

As we have seen in the \pxd module, the user may stipulate that a
polymer chemistry definition allows more than one character in order
to define the codes of the different monomers of this same polymer
chemistry (see section~\vref{sect:monomers}). Remember that allowing,
for example, \cfgval{3} characters does not mean that all your
monomer codes must be defined using three characters:
\cfgval{3} is the \emph{maximum} number of characters that you may use.
This means that you are perfectly entitled, in this case, to have
one-character or two-character monomer codes in this polymer
chemistry definition. Let's start by looking at how the polymer
sequence editor window behaves when the user tries to enter
multi-character monomer codes. Next, we'll see that whatever the
length of a monomer code, if its very first character is unambiguous,
the behaviour of the polymer sequence editor is flexible and powerful. 

\subsection*{Multi-Character Monomer Codes}

\begin{figure}
  \begin{center}
    \includegraphics[scale=1.75]
    {figures/raster/polyxedit-multi-char-code-editing.png}
  \end{center}
  \caption[Multi-character code sequence editing in
  \pxe]{\textbf{Multi-character code sequence editing in \pxe.} This
    figure shows the process by which it is made possible to edit
    polymer sequences with a code set that allows more than one
    character per code.} 
  \label{fig:polyxedit-multi-char-code-editing}
\end{figure}

In this section we will describe the editing of a polymer sequence for
which monomers can be described using more than one character. 

Figure~\vref{fig:polyxedit-multi-char-code-editing} shows the case
of a polymer sequence whose polymer chemistry definition
allows up to three characters to define monomer codes. Let's now
assume that the user wants to edit the sequence by inserting, at the
cursor point, a new ``Aspartate'' monomer, of which the user knows only
that its code starts with an `A'. The cursor is located between the
two ``Ala'' monomers at positions 15 and 16 (panel~1).

The user keys-in \kbdKey{A} (panel~2). To her dismay, nothing happens
in the polymer sequence, but she sees an `A' character now displayed
in the left text widget under the label \guilabel{Editing Feedback}.
This behaviour is related to the fact that up to 3 characters may be
used to describe a monomer code. If no
monomer icon is displayed in the polymer sequence, that simply
means that more than one monomer code starts with an `A' character: \pxe
cannot figure out which monomer code the user actually means when
keying-in \kbdKey{A}.

There is a way, called \emph{code completion}, to know which monomer
code(s) ---in the current polymer chemistry definition--- do start
with the keyed-in character(s) (`A' for us, now). The user can always
enter the \emph{code completion mode} by hitting the tabulation
\kbdKey{TAB}~key. This is what is shown in panel~A. We see that,
in the current polymer chemistry definition, four monomer codes start
with an `A' character, and these are ``Ala'', ``Arg'', ``Asp'' and
``Asn''. We could select the monomer of choice by
double-clicking the proper list item, which would insert the
corresponding monomer icon (``monicon'') in the polymer sequence at
the cursor location. But, since this is a manual, we are going
through another step.

Let's continue editing the polymer sequence and key-in an \kbdKey{s}
(we did not forget that we wanted to enter an ``Asp'' monomer code in
the first place, did we?). The result is shown in panel~3. What we see
here is that, this time also, nothing changed in the polymer sequence.
What changed is that there is now an ``As'' character string in the
left text widget under the label \guilabel{Editing Feedback}. Let's
key-in once more the \kbdKey{TAB}~key, and we get the small window
shown in panel~B. This time, only two items are listed: ``Asp'' and
``Asn''. This is easy to understand: there are only two monomer codes
that start with the two letters `A' and `s' (``As'') that we have
keyed-in so far.  At this time, we either select one of the items (we
wanted to enter the ``Aspartate'' monomer, so we'll double-click onto
the first item of the list), or we just key-in a last character:
\kbdKey{p}. At this point, the monomer is effectively inserted in the
polymer sequence, as the ``Asp'' monomer left of the cursor, as shown
in panel~4. 
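
The completion lists of panels~A and~B behave like a simple prefix
filter over the monomer codes of the polymer chemistry definition. The
Python sketch below illustrates that idea only; the function name
\texttt{completions} and the short code list are assumptions made for
the example and are not taken from the \pxe source code.

```python
# Hypothetical helper illustrating code completion as prefix filtering.
def completions(prefix, codes):
    """Return the monomer codes that start with prefix, in definition order."""
    return [code for code in codes if code.startswith(prefix)]


# A small, assumed subset of a three-character protein code set.
CODES = ["Ala", "Arg", "Asp", "Asn", "Gly", "Glu"]

print(completions("A", CODES))    # ['Ala', 'Arg', 'Asp', 'Asn']  (panel A)
print(completions("As", CODES))   # ['Asp', 'Asn']                (panel B)
print(completions("Asp", CODES))  # ['Asp']: no ambiguity is left
```

An empty prefix matches every code, so the same filter would also list
all the codes of the definition.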

\subsection*{Unambiguous Single-/Multi-Character Monomer Codes}

Let's imagine that we have a polymer chemistry definition that allows
up to 3 characters for the definition of monomer codes, but that we
have one monomer code (let's say the one for the ``Glutamate''
monomer) that is `E'. This monomer code `E' is the only one of the
polymer chemistry definition that starts (and ends, since it is
mono-character) with an `E'. In this case, when we key-in \kbdKey{E},
we'll observe that the monomer code is immediately validated and that
its corresponding monomer icon is also immediately inserted in the
polymer sequence.  This is because, \emph{if there is no ambiguity,
  \pxe will immediately validate the code being edited}.  This means
that you are absolutely free to define \emph{only single-character
  monomer codes} in your polymer chemistry definition, so that you are
not even conscious that the powerful multi-character feature exists! 
Indeed, in this 1-character monomer code configuration, each time
you'll key-in an uppercase character, you'll be inserting its
corresponding monomer into the polymer sequence immediately. 

\subsection*{Displaying All The Available Monomer Codes}

Equally interesting is the fact that if you key-in the
\kbdKey{TAB}~key while no monomer code is being edited (that is: the
left text widget under the label \guilabel{Editing Feedback} is
empty), all the monomer codes available in the polymer chemistry
definition currently in use are displayed, as shown in
panel~C of Figure~\vref{fig:polyxedit-multi-char-code-editing}.


\subsection*{Erroneous Monomer Codes}

Let's see now what happens when the user keys-in bad characters in the
polymer sequence editor window. This is illustrated in
Figure~\vref{fig:polyxedit-bad-lowercase-char-code}. If the user enters
a lowercase character as the first character of a monomer code, the
program immediately complains in the right text widget under the label
\guilabel{Editing Feedback}. In this case, the character is not put
into the left text widget, which means it is simply ignored.

\begin{figure}
  \begin{center}
    \includegraphics[scale=2.5]
    {figures/raster/polyxedit-bad-lowercase-char-code.png}
  \end{center}
  \caption[Bad code character in \pxe\ sequence editor]{\textbf{Bad
      code character in \pxe\ sequence editor.} This figure shows the
    feedback that the user is provided by the code editing engine,
    when a bad character code is keyed-in.} 
  \label{fig:polyxedit-bad-lowercase-char-code}
\end{figure}

If the user starts keying-in valid monomer code characters, as
we did earlier with ``As'', and then wants to erase these
characters because she changed her mind, she \emph{must not} use the
\kbdKey{BACKSPACE} key, because this key will erase the monomer left
of the cursor point in the polymer sequence! To remove the characters
currently displayed in the left text widget
under the label \guilabel{Editing Feedback}, the user keys-in the
\kbdKey{Esc} key once for each character. For example, let's say I've
already keyed-in \kbdKey{A} and \kbdKey{s}. In this case the left text
widget, under label \guilabel{Editing Feedback}, displays these two
characters: ``As''. Now, \emph{I change my mind} and do not want to
enter the ``Asp'' monomer code anymore. I want to enter the ``Gly''
code. All I have to do is key-in the \kbdKey{Esc} key once for the `s'
character (which disappears) and once more to remove the remaining `A'
character which disappears also. At this point I can start fresh with
the ``Gly'' monomer code by keying-in sequentially \kbdKey{G},
\kbdKey{l} and finally \kbdKey{y}. 
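
The keystroke handling described in the subsections above can be
summarized as a small state machine. The Python sketch below is one
possible reading of it, not the \pxe implementation: the class
\texttt{CodeEditor}, its attribute names and the key names
\texttt{"Esc"} and \texttt{"BackSpace"} are hypothetical. It assumes
that a code is validated as soon as the pending characters spell a
complete code that no longer code starts with, and that the cursor sits
at the end of the sequence.

```python
class CodeEditor:
    """Toy model of the editing buffer (all names are assumptions)."""

    def __init__(self, codes):
        self.codes = list(codes)
        self.buffer = ""    # left "Editing Feedback" widget
        self.error = ""     # right "Editing Feedback" widget
        self.sequence = []  # monomers already inserted (cursor at the end)

    def key(self, k):
        self.error = ""
        if k == "Esc":
            # Remove the last pending character, never a monomer.
            self.buffer = self.buffer[:-1]
            return
        if k == "BackSpace":
            # Erases the monomer left of the cursor, not the buffer.
            if self.sequence:
                self.sequence.pop()
            return
        if not self.buffer and not k.isupper():
            self.error = f"bad first character: {k!r}"
            return
        candidate = self.buffer + k
        matches = [c for c in self.codes if c.startswith(candidate)]
        if not matches:
            self.error = f"no monomer code starts with {candidate!r}"
        elif matches == [candidate]:
            # Unambiguous and complete: validate immediately.
            self.sequence.append(candidate)
            self.buffer = ""
        else:
            self.buffer = candidate


editor = CodeEditor(["Ala", "Arg", "Asp", "Asn", "Gly", "Glu", "E"])
for k in ["A", "s", "Esc", "Esc", "G", "l", "y", "E"]:
    editor.key(k)
print(editor.sequence)  # ['Gly', 'E']
```

In this sketch the single-character code `E' is inserted at once
because nothing else starts with `E', while ``Gly'' needs its three
characters because ``Glu'' shares the first two.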



\renewcommand{\sectitle}{Clipboard-Importing Of Sequences}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

Very often, the user will make a sequence search on the web and be
provided with a polymer sequence that is cluttered with non-code
characters. The user typically selects all the text provided by the
remote site, pastes that sequence in the \pxm\ polymer sequence editor
window and finally encounters invalid codes in it. It would be
inconvenient to have to launch a text editor, prior to pasting a
correct sequence in \pxe, only to ``purify'' that
sequence\dots\ 

\pxm provides a convenient way to spot invalid characters in a
polymer sequence and to let the user ``purify'' the imported sequence.
A clipboard-imported sequence is systematically parsed. When invalid
characters are found, the window depicted in
Figure~\vref{fig:polyxedit-check-import-sequence} is presented to the
user so that she can make appropriate adjustments. The sequence is
presented in a textview widget (\guilabel{Imported
  Sequence}) with the improper characters tagged in red. Characters
are tagged by comparing the
imported sequence with the monomer codes available in the current
polymer chemistry definition: as soon as a character does not
correspond to any valid monomer code, it is tagged in red. At that
point, if the user clicks the \guilabel{Remove All Tagged}
button, all the red-tagged characters are automatically removed.

Also, the user is provided with an automatic ``purification''
procedure whereby it is possible to remove one or more classes of
characters from the imported sequence (\guilabel{Remove Characters}
frame widget). Checking one or more of the \guilabel{Digits} or
\guilabel{Punctuation} or \guilabel{Space} checkbuttons, or even
entering other user-specified characters in the \guilabel{Other} text
entry widget, will elicit their removal from the imported sequence
after the user clicks the \guilabel{Purify Sequence} button.

\begin{figure}
  \begin{center}
    \includegraphics[scale=2.5]
    {figures/raster/polyxedit-check-import-sequence.png}
  \end{center}
  \caption[Clipboard-imported sequence
  error-checking]{\textbf{Clipboard-imported sequence error-checking.}
    If a sequence that is imported through the clipboard to the \pxe\
    sequence editor contains invalid characters, the user is provided
    with a facility to ``purify'' the sequence. This facility is
    provided to the user through the window depicted in this figure.}
  \label{fig:polyxedit-check-import-sequence}
\end{figure}

When the user is confident that almost all the erroneous characters
have been removed, she can click the \guilabel{Check Sequence} button,
which will trigger a ``re-reading'' of the sequence in the
\guilabel{Imported Sequence} textview widget. If erroneous characters
are still found, they are presented to the user in red color.

Note that, for maximum flexibility, the user is allowed an immediate
and direct editing of the imported sequence in the textview widget
(that is, the textview widget is \emph{not} read-only).

Once the sequence is finally purged of all the invalid characters,
the user can select it in the textview on the left of the window and
paste it in the \pxe\ sequence editor. This time, the paste
operation will be error-free.
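
The two purification steps described in this section (removing classes
of characters, then tagging and removing whatever still matches no
monomer code) can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not the \pxm parser: the function
names \texttt{purify}, \texttt{tag\_invalid} and
\texttt{remove\_tagged} are hypothetical, and the greedy
longest-code-first matching is a guess at how multi-character codes
could be recognized.

```python
import string


def purify(text, digits=False, punctuation=False, space=False, other=""):
    """Drop whole classes of characters, like the 'Remove Characters' frame."""
    drop = set(other)
    if digits:
        drop |= set(string.digits)
    if punctuation:
        drop |= set(string.punctuation)
    if space:
        drop |= set(string.whitespace)
    return "".join(ch for ch in text if ch not in drop)


def tag_invalid(text, codes):
    """Return the indices of characters that start no valid monomer code."""
    max_len = max(len(code) for code in codes)
    bad, i = [], 0
    while i < len(text):
        for n in range(max_len, 0, -1):  # try the longest code first
            if text[i:i + n] in codes:
                i += n
                break
        else:
            bad.append(i)  # this character would be shown in red
            i += 1
    return bad


def remove_tagged(text, codes):
    """Equivalent of clicking 'Remove All Tagged'."""
    bad = set(tag_invalid(text, codes))
    return "".join(ch for i, ch in enumerate(text) if i not in bad)


codes = {"A", "G", "S", "E"}           # assumed single-character code set
raw = "1 AGS, 11 EXA"
clean = purify(raw, digits=True, punctuation=True, space=True)
print(clean)                           # AGSEXA
print(tag_invalid(clean, codes))       # [4]
print(remove_tagged(clean, codes))     # AGSEA
```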


\renewcommand{\sectitle}{Importing Of Sequences As Raw Text Files}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

It might be of interest to be able to import a sequence from a raw
file. To this end, the user is provided a menu item
\guimenu{Edit}\guimenuitem{Import Raw Text} from the contextual menu
of the sequence editor widget (available by right-clicking on the
polymer sequence editor region). Using that menu, the user will be
provided a file selection window from which to choose the file to
import. The program then iterates over the lines of that file and checks
their content for validity. If errors are found, then the same process
as described earlier for clipboard-imported sequences is started. The
user can then purify the sequence imported from the file and finally
integrate that sequence in the polymer sequence currently edited. Note
that if any sequence portion is currently selected, it will be
replaced by the one that is being imported.


\renewcommand{\sectitle}{Sequence Selections: The Various X Mechanisms}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

Like any text editor, the \pxe\ polymer sequence editor can perform the
usual clipboard operations. In the \software{X window} world, there is
another process to copy text and paste it into another place: the
\software{X window} primary selection mechanism. That process is easy:
text is first selected (either using the keyboard or the mouse; that
makes the \emph{selection}), and when that selected text needs to be
pasted, the user just clicks the mouse's middle button at the
destination location. The copy/cut/paste process, familiar from the
\OSname{MS Windows} system, is implemented also. Thus, the users of
\pxe\ get the best of both selection and pasting mechanisms.

When the user tries to paste a sequence element from the clipboard
(say, after copying it from a web browser), the program checks that
sequence very thoroughly. If an invalid character is found, the whole
process is stopped with a message logged to the console; the sequence
is not modified in any way and the user may verify that sequence so
that she removes the invalid characters or codes. 

When the user copies/cuts a sequence from the \pxe\ sequence editor
window to the clipboard, what is actually copied to the clipboard is a
text string made of all the monomer codes of the polymer
sequence that was selected when the copying/cutting operation was
performed.


\renewcommand{\sectitle}{Visual Feedback In The Editor}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\begin{figure}
  \begin{center}
    \includegraphics[scale=2.5]
    {figures/raster/polyxedit-editor-visual-feedback.png}
  \end{center}
  \caption[Visual feedback in the \pxe\ sequence
  editor]{\textbf{Visual feedback in the \pxe\ sequence editor.}  This
    figure shows the feedback that the user is provided when moving
    the mouse cursor over monomer icons. See the text for details.} 
  \label{fig:polyxedit-editor-visual-feedback}
\end{figure}

The polymer sequence editor provides a number of widgets to inform
the user in real time about what is going on in it. These widgets are
briefly reviewed below, and the user is invited to look at
Figure~\vref{fig:polyxedit-editor-visual-feedback}:

\begin{itemize}
\item The \guilabel{Un/Modified} label informs the user if the polymer
  sequence was modified or not since it was either last written to a
  file on disk or last read from a file;
\item The monomer status flag (here it is red-green-red) is supposed
  to inform the user about the status of the monomer onto which the
  mouse cursor is positioned (in the image example, that is monomer
  `S', at position 22). The flag is interpreted in the following
  manner:
  \begin{itemize}
  \item The first flag element (red in the example) tells if the
    monomer contains properties. That is a flag about the internal
    status of the monomer. This flag is mainly interesting to the
    power user who goes in the source code and modifies it to adapt it
    to her specific needs.  Red means that the monomer has at least
    one ``prop'' object in it.  Green means that it has no such
    ``prop'' in it. If this flag element is green, then the two
    remaining flags are necessarily green.  This is because the two
    other flag elements tell the presence or the absence of monomer
    characteristics that are subsets of the ``prop'' object;
  \item The second flag element (green in our example) tells if the
    monomer has been \emph{annotated} at least once. The green color
    indicates that no note is found in the monomer. That flag would be
    red if the monomer had been annotated at least once;
  \item The third flag element (red in our example) tells if the
    monomer has undergone a \emph{chemical modification}. In our
    example that flag is red, because as the reader can see, the `S'
    monomer at position 22 is indeed modified: it is a phosphorylated
    seryl residue! If the monomer had not been modified, then that
    flag element would have been green. 
  \end{itemize}
\item The label that is located left of the monomer status flag (it
  indicates \guival{8} on the figure) tells the sequence position of
  the monomer onto which the cursor is positioned at any given
  time\footnote{The cursor is not visible because the screen dump
    function in \software{The Gimp} removes it to clean the image.}. 
\end{itemize}
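
The logic of the three-element monomer status flag can be restated as
a short Python sketch. The \texttt{Monomer} class and its fields are
hypothetical stand-ins for the ``prop'' objects mentioned above; the
only things taken from the text are the meaning of each flag element
and the rule that a green first element implies two green followers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Monomer:
    """Hypothetical monomer model; notes and modification count as props."""
    code: str
    notes: list = field(default_factory=list)
    modification: Optional[str] = None
    other_props: dict = field(default_factory=dict)


def status_flag(monomer):
    """Return the (props, annotated, modified) colours for one monomer."""
    annotated = bool(monomer.notes)
    modified = monomer.modification is not None
    # Notes and modifications are themselves props, so either one
    # forces the first element to red.
    has_props = annotated or modified or bool(monomer.other_props)

    def colour(present):
        return "red" if present else "green"

    return colour(has_props), colour(annotated), colour(modified)


phosphoserine = Monomer("S", modification="Phosphorylation")
print(status_flag(phosphoserine))  # ('red', 'green', 'red')
print(status_flag(Monomer("A")))   # ('green', 'green', 'green')
```

The first call reproduces the red-green-red flag of the phosphorylated
seryl residue used as the example in the figure.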

\renewcommand{\sectitle}{Sequence Annotation: The Various Mechanisms}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

The annotation of polymer sequences is very often required in projects
for which a number of scientist-made observations are to be
``connected'' in a lasting manner either to a polymer sequence
(as a whole object \textit{per se}) or to any monomer in a polymer
sequence.

\pxe\ allows the annotation of the whole polymer and/or of any number
of monomers in the polymer sequence. There is no
limitation on the number of notes that can be set to the polymer or
to any given monomer. Further, the user is provided with two mechanisms
by which she can set notes to monomers (annotate): \emph{single-mode}
monomer annotation and \emph{range-mode} monomer annotation. All these
polymer/monomer note-setting processes are described in detail below.

First, however, I should tell you, respected reader, that a note is
basically an envelope that contains a number of elements:
\begin{itemize}
\item A textual element that is the \textit{name of the note};
\item Any number of paired data, called \textit{noteval} objects (like
  ``note value''). A noteval is made of two data:
  \begin{itemize}
  \item A datum describing the type of the noteval: either
    \textit{string}, or \textit{integer}, or \textit{double};
  \item The contents of the noteval object. 
  \end{itemize}
\end{itemize}
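
To fix ideas, the note envelope described above can be sketched as a
small data structure. The Python sketch below is only an illustration:
the names \texttt{Note} and \texttt{NoteVal}, and the conversion of
the contents according to the declared type, are assumptions made for
this example and do not describe how \pxe\ actually stores notes.

\begin{verbatim}
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical model: the three noteval types named in the text,
# mapped onto Python types for the purpose of the example.
ALLOWED_TYPES = {"string": str, "integer": int, "double": float}


@dataclass
class NoteVal:
    kind: str                          # "string", "integer", "double"
    contents: Union[str, int, float]

    def __post_init__(self):
        expected = ALLOWED_TYPES.get(self.kind)
        if expected is None:
            raise ValueError("unknown noteval type: " + self.kind)
        # Make the contents agree with the declared type.
        self.contents = expected(self.contents)


@dataclass
class Note:
    name: str                          # the name of the note
    notevals: List[NoteVal] = field(default_factory=list)


note = Note("TRYPSIN")
note.notevals.append(NoteVal("string", "cleaved after this residue?"))
note.notevals.append(NoteVal("double", "1234.56"))
\end{verbatim}

A note is thus a name plus any number of [type+contents] pairs, which
is exactly what the annotation windows described below let the user
edit.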

\noindent The notes are stored in the polymer sequence file and are easily
managed graphically, as we'll describe now. 

\subsection*{Managing Polymer Notes}

The user may set/modify/remove polymer notes using the following
contextual menu:

\centerline{\guimenu{Edit}\guimenuitem{Annotation}\guimenuitem{Polymer}}

Figure~\vref{fig:polyxedit-note-editing-center-polymer} shows the
window that pops up to let the user perform a number of note-related
actions that are rather self-explanatory. 

\begin{figure}
  \begin{center}
    \includegraphics[scale=2.33]
    {figures/raster/polyxedit-note-editing-center-polymer.png}
  \end{center}
  \caption[Annotating polymer sequences]{\textbf{Annotating polymer
      sequences.} This figure shows the graphical interface to the
    annotation of polymer sequences.} 
  \label{fig:polyxedit-note-editing-center-polymer}
\end{figure}

A note that is set to a polymer sequence is set to that sequence as a
whole, and not to any specific monomer or monomer range. If all the
monomers in the annotated polymer sequence were removed, that (empty)
polymer sequence would still bear the annotation. In order to add
notes, the user must first fill in the note~\guilabel{Name} field. 
Once this field is filled, the user clicks the \guilabel{Add New Note}
button. The note name will be listed in the \guilabel{Name} column of
the \guilabel{Notes Already Set} treeview. 

It is only once a note (name) has been added, as described above, that
the user can add noteval objects to that note. Remember, we said
earlier that a note was made of a name and of any number of [value
type+value contents] noteval pairs. The note, or one of its noteval
objects, has to be selected in the treeview on the left-hand side of
the window, so that the user can add a noteval object, by:
\begin{itemize}
\item Choosing the type of the value (string, integer or double) by
  selecting the radiobutton of choice in the \guilabel{Type} widget;
\item Entering the data proper in the \guilabel{Contents} textview
  widget;
\item Clicking onto the \guilabel{Add New Value} button. 
\end{itemize}

Accomplishing the tasks above will create a new subitem in the
treeview: a new noteval object will be listed under the node
corresponding to the note name under which a new noteval
[type-contents] pair has been defined. 

It is possible to change the note name of a note that is selected in
the treeview or to change the type or contents of a noteval object
that is currently selected in the treeview. Most intuitively, these
changes are done by editing the data in their respective widgets, and
then clicking either \guilabel{Apply Note Changes} or \guilabel{Apply
  Value Changes}. 

It is also possible to remove any item that is currently selected in
the treeview. The menu entitled \guilabel{Notes Specific Actions}
pops up when clicked and shows the menu items displayed in
Figure~\vref{fig:polyxedit-note-editing-center-notes-menu-single}. 

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-note-editing-center-notes-menu-single.png}
  \end{center}
  \caption[The menu governing actions on note items]{\textbf{The menu
      governing actions on note items.} This figure shows the menu
    that the user may use in order to remove any item currently
    selected in the treeview. When the window is opened in
    single-mode, the range-mode actions are inactive.} 
  \label{fig:polyxedit-note-editing-center-notes-menu-single}
\end{figure}


\bigskip

Setting notes to the polymer sequence as a whole is conceptually
simpler than what we are about to visit: the annotation of any monomer
in either single-mode or range-mode. 


\subsection*{Managing Monomer Notes}



As stated earlier, monomer notes can be set in two
distinct modes: \emph{single-mode} and \emph{range-mode}. Setting
notes to a monomer is as easy as setting notes to a polymer sequence. 
However, before starting any annotation work, the user should
understand which kind of note is appropriate for the specific
annotation task.  Let's first see the simplest mode of monomer
annotation: \emph{single-mode}. 

\subsubsection*{Managing Monomer Notes In Single-Mode}

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-note-editing-center-monomer-single.png}
  \end{center}
  \caption[Annotating monomers in single-mode]{\textbf{Annotating
      monomers in single-mode.} This figure shows the graphical
    interface to the annotation of monomers in single-mode.} 
  \label{fig:polyxedit-note-editing-center-monomer-single}
\end{figure}

If the annotation pertains to a single monomer in the
sequence,\footnote{Like indicating that this specific residue is
  polymorphic, for example, or entering any kind of comment.} the user
should point at the corresponding monomer icon with the mouse and
right-click it so that the following menu item can be selected
from the contextual menu that pops up:

\centerline{\guimenu{Edit}\guimenuitem{Annotation}%
\guimenuitem{Monomer}\guimenuitem{Single}}

The precise mouse-clicking of that specific monomer icon will trigger
internal calculations that will lead to the proper initialization of
the popped up window, as shown in
Figure~\ref{fig:polyxedit-note-editing-center-monomer-single}, where
the \guilabel{Ref. Monomer Code/Pos.} label indicates \guival{F/15}. 
That example means that the user wanted to annotate a phenylalanine
residue located at position 15 of the polymer (protein) sequence. 
Note, by the way, that the \guilabel{Range} label indicates no
specific value (\guival{-$\;$-}). We'll see later that this bit of
information is useful in other cases. 

Once the window shown in that example is displayed, the managing of
monomer notes is identical to the managing of polymer notes (as was
previously described). 


\subsubsection*{Managing Monomer Notes In Range-Mode}

Sometimes it is desirable to be able to set an identical note to a
range of consecutive monomers. For example, a user might want to set,
to a range of residues in a protein, a note with the name
\textsl{``TRYPSIN''} and a number of noteval objects describing
scientific observations (either textual or numerical) and questions.
That note will be set in each monomer of the range of
monomers. Once the range-mode annotation has been performed, each note
in each monomer will behave exactly the same way as notes set using
the \emph{single-mode} annotation procedures.  See
Figure~\vref{fig:polyxedit-note-editing-center-monomer-range} for an
example of such a note. 

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-note-editing-center-monomer-range.png}
  \end{center}
  \caption[Annotating monomers in range-mode]{\textbf{Annotating
      monomers in range-mode.} This figure shows the graphical
    interface to the annotation of monomers in range-mode.} 
  \label{fig:polyxedit-note-editing-center-monomer-range}
\end{figure}

So, how are range-mode annotations actually carried out?  The very
first thing is to select, in the polymer sequence editor, the range
of monomers to be annotated. Once that range of
monomers is selected, the user right-clicks one specific monomer in
that range of selected monomers. 
To display a window like the one represented
in Figure~\vref{fig:polyxedit-note-editing-center-monomer-range}, the
user must then select the following menu item from the contextual menu:

\centerline{\guimenu{Edit}\guimenuitem{Annotation}%
\guimenuitem{Monomer}\guimenuitem{Range}}

As can be seen on that figure, this time the \guilabel{Range} label
gives an indication in the form \guival{[xx->yy]}. This means that the
user wanted to edit a note for all the monomers comprised in that
range (from position \guival{xx} to position \guival{yy}). That makes
a range-mode annotation action that is taken on three monomers. 

One interesting question is: \textsl{``Given the fact that the user
  is performing a range-mode annotation, to which monomer do the
  notes shown in the \guilabel{Notes Already Set} list on the
  left-hand side of the window belong?''} That's undoubtedly a good
question. The
answer is that the notes that are listed there belong to the
\emph{reference monomer}, that is, the monomer that was actually
pointed at when right-clicking the sequence (to elicit the popping up of
the contextual menu). This \emph{reference monomer} is very important,
as we'll see in a moment. 

Figure~\vref{fig:polyxedit-note-editing-center-monomer-range}
shows that range-mode annotations are performed much like monomer
single annotations or polymer annotations (same window, in fact, with
the same widgets). The big difference comes with the notes menu, which
lists menu items that are specific to the \emph{range-mode} actions
(Figure~\vref{fig:polyxedit-note-editing-center-monomer-range}):
\begin{itemize}
\item The menu item \guimenuitem{Remove Item (Range)} will remove the
  selected item (note item) from all the monomers in the range;
\item The menu item \guimenuitem{Propagate Item (Range)} will make a
  copy of a newly created note into all the other monomers in the
  range. Note that, if a note by the same name exists already in any
  of the monomers in the range, the note is not added to it. The user
  will be informed by a dialog window that a given monomer was
  skipped. 
\end{itemize}

Note that the single-mode menu item (\guilabel{Remove Item (Single)})
will perform the action, when in range-mode, on the reference monomer,
that is, the one that was right-clicked when the note editing
process was triggered (see above for the definition of the
\emph{reference monomer}). 

\begin{center}
\noindent\fbox{\parbox{0.9\textwidth}{It is important to grasp that in
    the range-mode annotations, when an action cannot be performed in
    one of the monomers in the selected range of monomers, then this
    does not prevent the process from trying to accomplish the task on
    the other monomers of the range.  For example, the user selects a
    stretch of twenty monomers in a polymer sequence, and then
    elicits a range-mode annotation process (namely the addition of a
    note) onto these twenty monomers. Let's say that the to-be-added
    note is identical to a note present in the fifth monomer of the
    monomer range. The note addition --for this monomer-- is going to
    fail.  That does not mean that the whole process is stopped: if
    the to-be-added note is not found identical in any other monomer,
    it is going to be successfully added into all the remaining
    monomers.  In other words, one failure does not abort the whole
    range-mode annotation process.}} 
\end{center}
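
The behaviour described in the box above can be summarized with a
short Python sketch. It is only a model written for this manual: the
function name \texttt{propagate\_note} and the representation of each
monomer as a dictionary mapping note names to noteval lists are
assumptions, not the program's internals.

\begin{verbatim}
def propagate_note(monomers, start, end, note_name, notevals):
    """Copy a note into each monomer of [start, end].

    Positions are 1-based and inclusive. Each monomer is modelled
    as a dict mapping note names to lists of notevals. The list of
    skipped positions is returned.
    """
    skipped = []
    for position in range(start, end + 1):
        notes = monomers[position - 1]
        if note_name in notes:
            # A note by the same name exists: skip this monomer,
            # but do not abort the whole range-mode process.
            skipped.append(position)
            continue
        notes[note_name] = list(notevals)
    return skipped


# Twenty monomers, the fifth of which already bears the note.
monomers = [dict() for _ in range(20)]
monomers[4]["TRYPSIN"] = [("string", "older observation")]

skipped = propagate_note(monomers, 1, 20, "TRYPSIN",
                         [("string", "new observation")])
print(skipped)
\end{verbatim}

In this model only position~5 is reported as skipped; the nineteen
other monomers receive the note, and the note already present in the
fifth monomer is left untouched.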

\bigskip

Without bothering the reader with more descriptions, I would suggest
that she experiment with the features described here. The design was
conceived to be as flexible as possible. It is noteworthy that
flexibility sometimes goes with risky program behaviour: the
user must know what she is doing when clicking onto a button! The
\guilabel{Save As} menu item is your friend \emph{before}
experimenting with that annotation feature. 


\renewcommand{\sectitle}{Chemically Modifying Polymer Sequences}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

It very often happens that the (bio)chemist uses chemical
reactions to modify the polymer sequence she is working on. Mass
spectrometry is then often used to check whether the reaction
proceeded properly. Further, in nature, chemical modifications of
biopolymer sequences are very often encountered. For example, protein
sequences often get modified as a means to regulate their function
(phosphorylations, notably). Nucleic acid sequences are very often and
extensively modified with modifications such as methylation\dots

It is thus crucial that \pxm\ be able to model with high precision and
flexibility the various chemical reactions that can be either made in
the chemistry lab or found in nature. The \pxm\ program provides two
different chemical modification processes:

\begin{itemize} 
\item A process by which monomers in the polymer sequence can be
  individually modified;
\item A process by which the whole polymer sequence can be modified,
  either on its left end or on its right end or even on both ends. 
\end{itemize}

\noindent We shall review these two processes separately in the two sections
below. 

\subsection*{Chemical Modification Of Monomers}

\subsubsection*{Modification Of Monomers}

There are a number of ways in which monomers can be modified in a
polymer sequence. Figure~\vref{fig:polyxedit-monomer-modif} shows
the simplest one: the user first selects the monomer icon to
modify, next calls the
\guimenu{Chemistry}\guimenuitem{Modifications}\guimenuitem{Monomer}
menu and --as a result-- is provided with a window where all the
modifications currently available in the polymer chemistry definition
are listed. Since a monomer icon was initially selected in the editor
window, the \guilabel{Selected Monomer} target radiobutton is on by
default. It is then simply a matter of choosing the right modification
from the \guilabel{Available Modifications} list and clicking onto the
\guilabel{Modify} button. 

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-monomer-modif.png}
  \end{center}
  \caption[Modification of a monomer in a polymer 
  sequence]{\textbf{Modification of a monomer in a polymer sequence.} 
    This figure shows the graphical rendering of a phosphorylation of
    a seryl residue in a protein polymer sequence.} 
  \label{fig:polyxedit-monomer-modif}
\end{figure}

The modified seryl residue is shown in the polymer sequence editor
window: a transparent graphics object (a red `P') was overlaid onto
the corresponding seryl monicon. 

While the \guilabel{Modification Target(s)} frame widget contains
radiobuttons the meaning of which is rather easy to understand,
we want to detail one of these: the \guilabel{Specific Monomer
  Locations} frame. If the user selects the radiobutton inside that
specific frame (labelled \guilabel{Positions Should Be Separated With
  ';'}), she also has to write the locations in the text entry widget
below it.  This text entry widget receives textual strings that
describe which locations on the polymer sequence should be modified. 
The syntax of the descriptive string allows logical positions to be
indicated.  The user is invited to experiment, maybe using variations
on the themes described below as examples:

\begin{itemize}
\item \guival{ALL} This means that the currently selected
  modification in the \guilabel{Available Modifications} list is to be
  applied to all the monomers in the polymer sequence. This is
  equivalent to selecting the radiobutton labelled \guilabel{All
    Monomers};
\item \guival{EVEN} or \guival{even} This will modify all monomers at
  even positions: 2, 4, 6\dots;
\item \guival{ODD} or \guival{odd} This will modify all monomers at
  odd positions: 1, 3, 5\dots;
\item \guival{EVEN;ODD} is identical to \guival{ALL};
\item \guival{[1-10];[20-30,odd]} This will modify all the monomers
  from position~1 to position~10 inclusive, and all the odd-positioned
  monomers between position~20 and position~30 inclusive.
\end{itemize}
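
The way such a location string may be interpreted can be illustrated
with a small Python sketch. The grammar used here is inferred from the
examples listed above only (elements separated by `;', the keywords
\guival{ALL}, \guival{EVEN} and \guival{ODD}, and bracketed ranges
with an optional parity); the function names, the case-insensitive
matching of all keywords and the clipping of ranges to the sequence
length are assumptions, so the real parser may well accept more, or
behave differently on malformed input.

\begin{verbatim}
import re

# Assumed grammar: [first-last] with an optional ,odd or ,even.
RANGE_RE = re.compile(r"^\[(\d+)-(\d+)(?:,(odd|even))?\]$",
                      re.IGNORECASE)


def parity_ok(position, parity):
    """Tell if position agrees with parity (None, odd or even)."""
    if parity is None:
        return True
    return position % 2 == (1 if parity == "odd" else 0)


def parse_positions(spec, length):
    """Return the sorted 1-based positions selected by spec."""
    selected = set()
    for element in spec.split(";"):
        element = element.strip()
        if not element:
            continue
        word = element.lower()
        if word in ("all", "odd", "even"):
            first, last = 1, length
            parity = None if word == "all" else word
        else:
            match = RANGE_RE.match(element)
            if match is None:
                raise ValueError("cannot parse: " + element)
            first = max(int(match.group(1)), 1)
            last = min(int(match.group(2)), length)
            parity = match.group(3)
            parity = parity.lower() if parity else None
        selected.update(p for p in range(first, last + 1)
                        if parity_ok(p, parity))
    return sorted(selected)


print(parse_positions("[1-10];[20-30,odd]", 40))
\end{verbatim}

For a sequence of 40 monomers, the last line prints positions 1 to 10
followed by 21, 23, 25, 27 and 29. Because the selected positions are
collected in a set, \guival{EVEN;ODD} yields the same result as
\guival{ALL}.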

\noindent The user is responsible for correctly reading the results
that are published in the paned textview lying between the upper pane
(labelled \guilabel{Monomer Modification Rules}) and the two buttons
at the bottom of the window. Further, when a modification or
un-modification is performed, the count of successful events and of
failed events is displayed in the messages' text widget at the very
bottom of the window. The messages that are displayed in this widget
are not permanent: they last a few seconds and then disappear.
Attention should thus be paid to what is displayed in this messages'
text widget. 

\medskip

\begin{center}
\noindent\fbox{\parbox{0.9\textwidth}{Attention should be paid to the
    fact that the user is responsible for applying chemical
    modifications to monomers that are listed as modifiable with the
    modification used. For example, if a phosphorylation modification
    is applied to a monomer that is not listed as phosphorylatable in
    the relevant configuration file, then the modification is applied
    to it (which means that --internally-- the monomer is modified)
    but its corresponding monicon is not graphically changed because
    no graphical rule is associated with the phosphorylation of this
    monomer (see section~\vref{subsect:monicons.dic}, the file
    of interest is \filename{monicons.dic}).}} 
\end{center}

\medskip

\noindent It is important to understand that, when a monomer is
modified, its previous modification (if any) is overwritten with the
new one. The user is invited to experiment a bit with the monomer
modification process, so as to be confident of the results that she is
going to obtain when real polymer chemistry work is to be modelled in
\pxm. 


\subsubsection*{Un-Modification Of Monomers}

If a monomer is modified, then it also should be possible to revert
the chemical reaction: to un-modify it. There is, however, a subtlety
here, that we ought to put into the limelight: an example will do. 

Let's say that all the seryl residues of our protein polymer sequence
are phosphorylated.\footnote{That's protein chemistry stuff.}  Only
seryl residues are phosphorylated in this polymer sequence. We thus
see all their corresponding monicons overlaid with a small `P' on them
(see the example above). Other monomers are acetylated, like lysyl
residues, for example. What we want to do is un-modify all the
phosphorylated seryl monomers in one go. We thus open the monomer
modification window, select the monomer code corresponding to the
seryl residue in the \guilabel{Monomers} list, select the radiobutton
labelled \guilabel{Monomers From The List}, select
``Phosphorylation'' in the \guilabel{Available Modifications} list and
finally click the \guilabel{Unmodify} button. All the seryl
residues currently phosphorylated are un-modified. This is OK. 

Now, let's assume that we had not selected ``Phosphorylation'' in the
list of available modifications, but ``Acetylation'', for example: no
phosphorylated seryl residue would have been un-modified. This is a
foolproof feature: if you select a modification name from the list of
available modifications, and next click onto the \guilabel{Unmodify}
button, that means that your un-modifying action has --as targets--
monomers that are currently modified with the modification that you
selected. 

That means that if, in our example, you had selected, as monomer
targets to the un-modification, the \guilabel{All Monomers}
radiobutton, selected the ``Phosphorylation'' modification and clicked
onto the \guilabel{Unmodify} button, \emph{only} the phosphorylated
monomers\footnote{Whatever they be, because the \guilabel{All
    Monomers} radiobutton was selected.} would have been un-modified. 

Now, if you un-select all the items in the list of available
modifications\footnote{You may need to keep the \kbdKey{Ctrl} key
  pressed while clicking onto the currently selected item to unselect
  it.}, select the \guilabel{All Monomers} radiobutton and
next click onto the \guilabel{Unmodify} button, then you'll un-modify
absolutely \emph{all} the monomers, because you are not restricting
the monomer targets either by their code or by the identity of
their potential modification. 
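
The targeting logic of the un-modification can be summarized with the
following Python sketch. It is a simplified model and not the
program's code: the function \texttt{unmodify}, the helper
\texttt{make\_sequence} and the representation of a monomer as a
dictionary holding at most one modification are assumptions made for
the example.

\begin{verbatim}
def unmodify(monomers, codes=None, modification=None):
    """Un-modify the targeted monomers; return the event count.

    monomers:     list of dicts with keys "code" and "modif"
                  ("modif" is None for an unmodified monomer).
    codes:        monomer codes to target, None for all monomers.
    modification: only monomers bearing this modification are
                  un-modified; None means any modification.
    """
    count = 0
    for monomer in monomers:
        if monomer["modif"] is None:
            continue
        if codes is not None and monomer["code"] not in codes:
            continue
        if (modification is not None
                and monomer["modif"] != modification):
            continue
        monomer["modif"] = None
        count += 1
    return count


def make_sequence():
    """Build a tiny example sequence (hypothetical data)."""
    return [{"code": "S", "modif": "Phosphorylation"},
            {"code": "K", "modif": "Acetylation"},
            {"code": "S", "modif": "Phosphorylation"},
            {"code": "A", "modif": None}]


# Seryl residues targeted, but wrong modification selected.
print(unmodify(make_sequence(), {"S"}, "Acetylation"))
# Neither code nor modification restricted.
print(unmodify(make_sequence()))
\end{verbatim}

The first call un-modifies nothing (it prints 0) because no seryl
residue is acetylated, while the second call reverts the three
modified monomers, whatever their code and their modification.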

\bigskip

The user is encouraged to play with these features\dots\ Also of great
importance is to understand that the modifications that can be set to
the monomers disappear when the monomer is removed from the polymer
sequence. These modifications are \emph{monomer modifications}: they
belong to the monomer that is modified. We say that these
modifications are \emph{intrinsic}. 


\subsection*{Chemical Modification Of The Polymer Sequence}

We have seen above that it is possible to modify any monomer in the
polymer sequence and that when the modified monomer is removed, the
modification associated to it disappears also. 

The modifications that we describe here are not of this kind. They can
be applied to either the left end of the polymer sequence or its right
end.  But these modifications do belong to the polymer sequence
\textit{per se} and are not removed from it even if the polymer
sequence is edited by removing the left end monomer or the right end
monomer.  We say that these \emph{polymer modifications} are
\emph{permanent}. 

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-polymer-modif.png}
  \end{center}
  \caption[Modification of the left end of a polymer
  sequence]{\textbf{Modification of the left end of a polymer
      sequence} This figure shows how simple it is to permanently
    modify a polymer sequence on either or both its left/right ends. 
    The permanent modifications currently set to a polymer sequence
    are conveniently listed in two text widgets located under the
    polymer sequence rendering area.} 
  \label{fig:polyxedit-polymer-modif}
\end{figure}

The way in which a polymer sequence is modified using \emph{polymer
  modifications} is much simpler than in the previous \emph{monomer
  modifications} case. The modification window is opened by choosing
the
\guimenu{Chemistry}\guimenuitem{Modifications}\guimenuitem{Polymer}
menu or the \guilabel{Edit} button below the polymer sequence
rendering area. Figure~\vref{fig:polyxedit-polymer-modif} shows
that window. 

The modification is very easy to perform, with clear feedback
provided to the user (by listing the permanent modifications in two
convenient text widgets located under the polymer sequence graphical
rendering area, under the label \guilabel{Left and Right Ends'
  Modifications}). In the example
(Figure~\vref{fig:polyxedit-polymer-modif}), the top polymer sequence
is not yet modified. By using the window on the right, the polymer
sequence is modified on its left end using the ``Acetylation''
modification. The newly modified polymer sequence is shown in the
window below, with the left text widget displaying the name of the
left end modification. 

The \guilabel{Unmodify} button is responsible for the un-modification
of the selected polymer sequence end (left/right), so that reverting a
modification is perfectly feasible. 


\renewcommand{\sectitle}{Finding and Replacing Sequence Motifs}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\label{sect:find-replace}

It is very often the case that one wants to find a given sequence
motif quickly. \pxm\ makes this easy: select the following menu item
in the contextual menu:

\centerline{\guimenu{Edit}\guimenuitem{Find Replace}}
\medskip

Using that menu item will provide an options window, as described in
Figure~\vref{fig:polyxmass-find-replace-options-wnd}.

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxmass-find-replace-options-wnd.png}
  \end{center}
  \caption[Find/Replace options window]{\textbf{Find/Replace options
      window.}  This figure shows the window with which the user is
    provided when she performs a polymer sequence find/replace
    operation. The two sequence editing regions are full blown
    sequence editor widgets in which the user edits sequence motifs
    exactly the same way she edits a sequence in the polymer sequence
    editor. This allows for flexible find/replace operations.}
  \label{fig:polyxmass-find-replace-options-wnd}
\end{figure}

What is interesting in
Figure~\vref{fig:polyxmass-find-replace-options-wnd} is that it shows
how flexible the functionality is: the user has two sequence editor
widgets at hand. The left one (\guilabel{Find Motif}) is where the motif
to find should be entered. The right one (\guilabel{Replace Motif}) is
where the motif used to replace the found motif is
edited. As visible in the right-hand widget, the monomers entered in
these two widgets may be modified (by chemical modification) or
annotated (by monomer annotation) in exactly the same way as
in the polymer sequence editor. The sequence editor
widgets in Figure~\vref{fig:polyxmass-find-replace-options-wnd} are
actually \emph{the same} as the ones that are located in the polymer
sequence editor windows.

Let us see some of the available options:

\begin{itemize} 
\item \guilabel{Start At Point} The find operation will not start from
  the very first monomer in the polymer sequence, but at the position
  where the cursor is located (\emph {the point});

\item \guilabel{Backward} Normally, the find operation is performed
  downstream of the current location; thus the next found motif will
  necessarily occur at a position in the polymer sequence greater
  than the current one. With this option, however, it is possible to
  reverse the direction of the search. \guilabel{Backward} instructs
  the search engine to look for motifs in the upstream sequence with
  respect to the current location; thus any found motif will be at
  a position lower than the current position;

\item \guilabel{Matching Strictness (M1 and M2 matching rules)}
  These matching rules will govern the way monomers in the polymer
  sequence are considered as matching the monomers in the
  \guilabel{Find Motif} motif sequence or how stringent the
  replacement using \guilabel{Replace Motif} should be:

  \begin{itemize}
  \item The \guilabel{Find} matching rules:
    \begin{itemize}
    \item \guilabel{M1 Identical To M2}: \guilabel{M1} is a given
      monomer in the polymer sequence and \guilabel{M2} is a monomer
      in the \guilabel{Find Motif} motif sequence; both monomers are
      compared, and will be considered to actually match only
      if both are absolutely \emph{identical};
    \item \guilabel{M2 Is Subset of M1}: \guilabel{M1} is a given
      monomer in the polymer sequence and \guilabel{M2} is a monomer
      in the \guilabel{Find Motif} motif sequence; both monomers are
      compared, and will be considered to actually match if
      all the modifications and/or notes present in M2 are found in
      M1, \emph{even if} M1 contains other modifications and/or
      notes;
    \end{itemize}

  \item The \guilabel{Replace} matching rules:
    \begin{itemize}
    \item \guilabel{New Identical To M2}: \guilabel{New} is the
      monomer that will be in the polymer sequence after the
      replacement is performed and \guilabel{M2} is the monomer from
      the \guilabel{Replace Motif} sequence that was used to guide
      the replacement process; the new monomer will be identical to
      M2;
    \item \guilabel{New Superset Of M2}: \guilabel{New} is the
      monomer that will be in the polymer sequence after the
      replacement is performed and \guilabel{M2} is the monomer from
      the \guilabel{Replace Motif} sequence that was used to guide
      the replacement process; upon replacement all the modifications
      and/or notes from M2 will be present in the \guilabel{New}
      monomer, but if the original monomer in the polymer sequence had
      modifications and/or notes not present in M2, then these will
      be retained; thus, \guilabel{New} will be a superset of
      \guilabel{M2};
    \end{itemize}
    
  \end{itemize}

\end{itemize}
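
The matching rules above can be restated with set operations. The
Python sketch below is an illustrative model only: the
\texttt{Monomer} class, the function names and the representation of
modifications and notes as plain sets of names are assumptions, and
the sketch further assumes that two monomers can match only if they
have the same code.

\begin{verbatim}
from dataclasses import dataclass


@dataclass(frozen=True)
class Monomer:
    code: str
    modifs: frozenset = frozenset()    # modification names
    notes: frozenset = frozenset()     # note names


def find_matches(m1, m2, rule):
    """m1: sequence monomer; m2: Find Motif monomer."""
    if m1.code != m2.code:
        return False
    if rule == "identical":            # M1 Identical To M2
        return m1 == m2
    if rule == "subset":               # M2 Is Subset of M1
        return m2.modifs <= m1.modifs and m2.notes <= m1.notes
    raise ValueError("unknown rule: " + rule)


def replace(original, m2, rule):
    """original: sequence monomer; m2: Replace Motif monomer."""
    if rule == "identical":            # New Identical To M2
        return m2
    if rule == "superset":             # New Superset Of M2
        return Monomer(m2.code,
                       m2.modifs | original.modifs,
                       m2.notes | original.notes)
    raise ValueError("unknown rule: " + rule)


in_sequence = Monomer("S", frozenset({"Phosphorylation"}),
                      frozenset({"TRYPSIN"}))
in_motif = Monomer("S", frozenset({"Phosphorylation"}))

print(find_matches(in_sequence, in_motif, "identical"))
print(find_matches(in_sequence, in_motif, "subset"))
\end{verbatim}

Here the annotated phosphorylated seryl residue of the sequence does
not match under the strict rule (the motif monomer bears no note),
but it does match under the subset rule.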

\noindent Note that the \guilabel{Replace Motif} sequence may be left
empty when performing Find or Replace operations.

The way Replace operations are performed is sequential: first the user
clicks onto the \guilabel{Find} button. If a sequence element is found
to match the \guilabel{Find Motif} sequence it is selected in the
polymer sequence editor window. At this time the user might click onto
the \guilabel{Replace} button. Once the replacement is performed, the
search engine is automatically asked to find a new occurrence of the
\guilabel{Find Motif} sequence, and so on\dots\

The user is invited to experiment with the series of options described
above as these render the operations rather flexible.

\renewcommand{\sectitle}{Cleavage Of Polymer Sequences}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\label{sect:cleave-polymer-sequences}

It happens very often that polymer sequences get cleaved in a
sequence-specific manner. These specific cleavages occur very often
in nature, and are made by enzymes that cleave biopolymer
sequences, like the glycosidases (cleaving saccharides), the proteases
(cleaving proteins) or the nucleases (cleaving nucleic acids). But the
scientist also uses purified enzymes to perform such cleavages in the
test tube.  \pxm\ must be able to perform those cleavages \textit{in
  silico}.  Let's see how a polymer sequence can be cleaved using
\pxm. 

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-cleave-options.png}
  \end{center}
  \caption[Cleavage options window]{\textbf{Cleavage options window.} 
    This figure shows the window with which the user is provided when
    she performs a polymer sequence cleavage. The user can select one
    cleavage specification and specify what level of partial cleavage
    the chemical cleavage should perform.} 
  \label{fig:polyxedit-cleave-options}
\end{figure}

It is a matter of having a polymer sequence opened in an editor window
and selecting the \guimenu{Chemistry}\guimenuitem{Cleave} menu. The
user is provided with a window where a number of cleavage
specifications are listed (Figure~\vref{fig:polyxedit-cleave-options}). 
These cleavage specifications are listed by looking into the polymer
chemistry definition corresponding to the polymer sequence to be
cleaved. The program knows, for example, that the polymer sequence to
be cleaved is of the ``protein'' chemistry type, and thus will list
all the cleavage specifications that were defined in the ``protein''
polymer chemistry definition. The cleavage specifications are
available for the user to select one of them to perform the cleavage. 

The user selects the cleavage specification of interest and also sets
the number of partial cleavages that the cleaving agent may yield. In
our example, \cfgval{2} was entered, which means that the cleavage
reaction will yield the set of oligomers corresponding to a total
cleavage (no missed cleavage, that is, partial cleavages~0) along
with the sets of oligomers corresponding to 1 missed cleavage and to
2 missed cleavages. The calculation is extremely rapid, so the user may
enter rather high values here. 

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-cleave-results-wnd-seq-tab.png}
  \end{center}
  \caption[Cleavage-generated oligomers
  window]{\textbf{Cleavage-generated oligomers window.} This figure
    shows the window that is opened so that the oligomers generated
    upon cleavage of a polymer sequence can be displayed. Other data
    are also displayed (see text for details).} 
  \label{fig:polyxedit-cleave-results-wnd-seq-tab}
\end{figure}

Upon successful termination of the cleavage reaction, the user is
provided with a new window
(Figure~\vref{fig:polyxedit-cleave-results-wnd-seq-tab}) in which all
the oligomers that were generated are listed (upper pane). The
listview widget on the upper pane sports a number of columns. Each row
of this listview widget describes the properties of a single
oligomer. The different columns are detailed below:

\begin{itemize}
\item \guilabel{Part. Cleav.} This is the missed cleavage level for
  which the oligomer was generated;
\item \guilabel{Number} This is the number of the oligomer, so that
  the user may refer to it simply. The syntax is simple:
  p\emph{x}-n\emph{y} means that this oligomer is the oligomer number
  \emph{y} from the set of oligomers obtained in the \emph{x}-missed
  cleavages series;
\item \guilabel{Coordinates} These are the coordinates of the oligomer
  as it is occurring in the polymer sequence that was cleaved in the
  first place. For example, ``[19-38]'' would mean that the oligomer
  starts at position~19 and ends at position~38 of the polymer
  sequence, both values being inclusive;
\item \guilabel{Mono Mass} This is the monoisotopic mass of the
  oligomer, computed using the options that are set in the
  \guilabel{Calculation Options} window (see above);
\item \guilabel{Avg Mass} Same as above, but for the average mass;
\item \guilabel{Modified} Indicates if the oligomer contains an
  intrinsically-modified monomer. This does not mean that the
  modification's mass was taken into account; it simply says that at
  least one monomer is modified in the oligomer (see below for
  details). 
\end{itemize}

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-cleave-results-oligodata-tab.png}
  \end{center}
  \caption[Cleavage-generated oligomers'
  data]{\textbf{Cleavage-generated oligomers' data.} This figure shows
    the notebook tab in which data pertaining to a selected oligomer
    are displayed. In particular, this tab contains a listview where
    monomer modifications of the selected oligomer (if any) are
    displayed.} 
  \label{fig:polyxedit-cleave-results-oligodata-tab}
\end{figure}

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-cleave-results-cleavedata-tab.png}
  \end{center}
  \caption[Cleavage specification data]{\textbf{Cleavage specification
      data.} This figure shows the notebook tab in which data
    pertaining to the cleavage operation are displayed.} 
  \label{fig:polyxedit-cleave-results-cleavedata-tab}
\end{figure}

\noindent The lower pane of the \guilabel{Cleavage Results} window
contains a number of additional data, displayed in a set of pages
belonging to the \guilabel{Selected Oligomer Data} notebook widget:

\begin{itemize}
\item \guilabel{Sequence}
  (Figure~\vref{fig:polyxedit-cleave-results-wnd-seq-tab}) This is the
  sequence that is displayed when an oligomer is selected in the
  listview displaying the oligomers (in the upper pane);
\item \guilabel{Oligomer Data}
  (Figure~\vref{fig:polyxedit-cleave-results-oligodata-tab}) This is
  the place where monomer modifications are listed as soon as an
  oligomer that contains modified monomers is selected in the
  listview. Note that each modified monomer in the selected oligomer
  will show up as a row in this listview. 
\item \guilabel{Cleavage Data}
  (Figure~\vref{fig:polyxedit-cleave-results-cleavedata-tab}) This is
  the place where the cleavage operation configuration is reported, so
  that each cleavage results' displaying window is self-traceable to
  both the cleavage configuration and the polymer sequence that was
  cleaved in the first place. 
\end{itemize}

The button labelled \guilabel{Find} will allow the user to find masses
in the oligomers that were generated upon the cleavage reaction
simulation (see section~\vref{sect:find-masses-in-results}). 



\renewcommand{\sectitle}{Fragmentation Of Polymer Sequences}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\label{sect:fragmentation-polymer-sequence}

Polymer sequences very often need to be fragmented in the gas phase
(in the mass spectrometer) so that structure characterizations may be
performed. In protein chemistry, this is typically done in order to
get sequence information for a given peptide ion selected in the gas
phase. \pxm\ must be able to perform those fragmentations \textit{in
  silico}.  Let us see how a polymer sequence can be fragmented using
\pxm. 

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-fragment-options.png}
  \end{center}
  \caption[Fragmentation options window]{\textbf{Fragmentation options
      window.}  This figure shows the window with which the user is
    provided when she performs a polymer sequence fragmentation. The
    user can select one or more fragmentation specifications
    (patterns).} 
  \label{fig:polyxedit-fragment-options}
\end{figure}

It is a matter of having a polymer sequence opened in an editor window
and selecting the sequence region to be fragmented. Once this is done,
the user selects the \guimenu{Chemistry}\guimenuitem{Fragment} menu. 
The user is provided with a window where a number of fragmentation
specifications are listed
(Figure~\vref{fig:polyxedit-fragment-options}). These fragmentation
specifications are listed by looking into the polymer chemistry
definition corresponding to the polymer sequence to be fragmented. The
program knows, for example, that the polymer sequence to be fragmented is
of the ``protein'' chemistry type, and thus will list all the
fragmentation specifications that were defined in the ``protein''
polymer chemistry definition. 

The user selects the fragmentation specification(s) of interest and
clicks the \guilabel{Fragment} button. 

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-fragment-results.png}
  \end{center}
  \caption[Fragmentation-generated oligomers
  window]{\textbf{Fragmentation-generated oligomers window.} This
    figure shows the window that is opened so that the oligomers
    generated upon fragmentation of a polymer sequence can be
    displayed.} 
  \label{fig:polyxedit-fragment-results}
\end{figure}

Upon successful termination of the fragmentation reaction, the user is
provided with a new window
(Figure~\vref{fig:polyxedit-fragment-results}) in which all the
oligomers that were generated are listed (upper pane). The listview
widget on the upper pane sports a number of columns. Each row of this
listview widget describes the properties of a single oligomer. The
different columns are detailed below:

\begin{itemize}
\item \guilabel{Frag. Spec.} This is the name of the fragmentation
  specification that was used to compute the corresponding fragment;
\item \guilabel{Name} This is the name of the oligomer, so that the
  user may refer to it simply. The syntax is simple: \emph{x}-\emph{y}
  means that this oligomer is the oligomer number \emph{y} from the
  fragmentation specification \emph{x};
\item \guilabel{Mono Mass} This is the monoisotopic mass of the
  oligomer, computed using the options that are set in the
  \guilabel{Calculation Options} window (see earlier explanations);
\item \guilabel{Avg Mass} Same as above, but for the average mass;
\item \guilabel{Modified} Indicates if the oligomer contains an
  intrinsically-modified monomer. This does not mean that the
  modification's mass was taken into account; it simply says that at
  least one monomer is modified in the oligomer (see below for
  details). 
\end{itemize}

The \guilabel{Sequence}, \guilabel{Oligomer Data} and
\guilabel{Fragmentation Data} pages of the notebook in the
\guilabel{Selected Oligomer Data} frame widget are conceptually
identical to the ones described in
section~\vref{sect:cleave-polymer-sequences}. 

The button labelled \guilabel{Find} will allow the user to find masses
in the oligomers that were generated upon the fragmentation reaction
simulation (see section~\vref{sect:find-masses-in-results}). 


\renewcommand{\sectitle}{Finding Masses In The Results}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\label{sect:find-masses-in-results}

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-fragres-mass-find-options.png}
  \end{center}
  \caption[Finding masses in a set of oligomers]{\textbf{Finding
      masses in a set of oligomers.} This figure shows how to ask that
    masses be found in a set of oligomers that result, for example,
    from the fragmentation of a polymer sequence.} 
  \label{fig:polyxedit-fragres-mass-find-options}
\end{figure}

It is often necessary to make sure that a mass observed in the real
mass spectrum actually corresponds to an oligomer that was generated
during a previous simulation experiment (like a cleavage of the
polymer sequence with a given cleavage agent, a fragmentation or a
simple mass searching operation, see
section~\vref{sect:search-masses-polymer-sequence}). To allow this,
and as shown in
Figures~\vrefrange{fig:polyxedit-cleave-results-wnd-seq-tab}%
{fig:polyxedit-fragment-results}, it is possible to ask that masses be
found in the oligomers resulting from any previous simulation
(cleavage or fragmentation of a polymer sequence or arbitrary mass
search operations). Indeed, the button labelled \guilabel{Find} will
open a window where the user may enter masses to be found. 

Figure~\vref{fig:polyxedit-fragres-mass-find-options} illustrates how
easy it is to define the mass(es) to be found in a set of oligomers,
either in the monoisotopic mass list or in the average mass list. 
There are two ways to actually trigger the mass finding operation:

\begin{itemize}
\item When the \guilabel{Unique Mass Find Mode} checkbutton \emph{is}
  checked: the user must enter one mass in a single-line text entry
  widget; clicking the \guilabel{Find} button or hitting the
  \kbdKey{ENTER} key then issues the ``Find Mass'' request. For this
  to work properly, only one of the two single-line text entry widgets
  may be filled with a mass (either monoisotopic or average). If
  masses were entered in both widgets, the program would not know
  which one of the monoisotopic or average masses is to be found in
  the set of oligomers. 
\item When the \guilabel{Unique Mass Find Mode} checkbutton is
  \emph{not} checked: the user may enter masses in any of the single-
  or multi-line widgets (either by keying-in one mass per line or by
  pasting a preformatted list of masses). In this case, hitting the
  \kbdKey{ENTER} key will trigger the ``multi-mass'' mass finding
  operation only if the \guilabel{Find} button has the focus. 
  Clicking the \guilabel{Find} button will do as well. 
\end{itemize}

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-fragres-mass-find-options-tolerances.png}
  \end{center}
  \caption[Tolerances available in finding masses]{\textbf{Tolerances
      available in finding masses.} This figure shows the three
    different ways that tolerances can be configured.} 
  \label{fig:polyxedit-fragres-mass-find-options-tolerances}
\end{figure}

Prior to asking that masses be found, it is required that tolerances
be entered for either monoisotopic or average masses (or both if both
kinds of masses are of interest) in their respective text entry
widget. In the example of
Figure~\vref{fig:polyxedit-fragres-mass-find-options}, the tolerance
that is given to the mass finding operation on monoisotopic masses is
of \cfgval{0.1}~amu, while the one for the average masses is greater
(\cfgval{1}~amu). These values must be understood in a ``broad''
manner (\emph{i.e.}~$\pm$~tolerance): for example, if we searched for
a mass \cfgval{1000} with a \cfgval{0.5}~amu tolerance, we would get
all the oligomers having masses ranging [$\mathrm
{1000-0.5\,\rightarrow \,1000+0.5}$] (which is [999.5--1000.5]
\emph{and not [999.75--1000.25]}). 
Figure~\vref{fig:polyxedit-fragres-mass-find-options-tolerances} shows
that there are two other means to define the tolerance with which
masses should be found. They are self-explanatory and should also be
understood in the same ``broad'' manner described above. 
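
Stated as a formula, and using symbols that are introduced here for
illustration only ($m_s$ for the mass searched, $m_o$ for the
calculated mass of an oligomer and $t$ for the tolerance in amu), an
oligomer is reported when

\[ |\,m_o - m_s\,| \leq t . \]

\noindent With $m_s = 1000$ and $t = 0.5$, an oligomer of mass 1000.4
is thus found (because $|1000.4 - 1000| = 0.4 \leq 0.5$), while an
oligomer of mass 1000.6 is not. Whether the bounds themselves are
included in the range is an assumption made in this sketch; the text
above only states the range. 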

The oligomers that were found to comply with the masses to find and
with the tolerances defined are displayed in a window similar to the
one shown in Figure~\vref{fig:polyxedit-fragres-mass-find-results}. 

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-fragres-mass-find-results.png}
  \end{center}
  \caption[Finding masses in a set of oligomers]{\textbf{Finding
      masses in a set of oligomers.} This figure shows oligomers that
    were found in a set of oligomers after a mass finding operation
    has been performed.} 
  \label{fig:polyxedit-fragres-mass-find-results}
\end{figure}

Note that here also the traceability of the data is ensured using
unambiguous identity numbers (\guilabel{Results' Set ID Number}). This
identity number is unique and describes the results window in which
the user has asked that masses be found (see
Figure~\vref{fig:polyxedit-fragres-mass-find-options}). 


\renewcommand{\sectitle}{Searching Masses In The Polymer Sequence}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\label{sect:search-masses-polymer-sequence}

It may happen that the scientist needs to know if some polymer
sequence region would have a given mass. \pxm\ allows for mass
searching operations in the polymer sequence. This is done by using
the menu \guimenu{Chemistry}\guimenuitem{Search Mass(es)}. The window
illustrated in Figure~\vref{fig:polyxedit-search-mass-options} shows up
and the user enters masses to search for (see
section~\vref{sect:find-masses-in-results} for details on the workings
of a very similar window). 

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-search-mass-options.png}
  \end{center}
  \caption[Searching masses in a polymer sequence]{\textbf{Searching
      masses in a polymer sequence.} This figure shows how to ask that
    masses be searched in a polymer sequence.} 
  \label{fig:polyxedit-search-mass-options}
\end{figure}

Once the masses have been searched, if results are found they are
displayed in the window shown in
Figure~\vref{fig:polyxedit-search-mass-results}. This window has very
similar characteristics to the ones of the previously described
results' windows (see section~\vref{sect:cleave-polymer-sequences},
for example). 


\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-search-mass-results.png}
  \end{center}
  \caption[Results window after searching masses in a
  polymer sequence]{\textbf{Results window after searching masses in a
      polymer sequence.} This figure shows the oligomers that were
    found upon a mass search operation.} 
  \label{fig:polyxedit-search-mass-results}
\end{figure}

The button labelled \guilabel{Find} will allow the user to find masses
in the oligomers that were generated upon the mass searching operation
(see section~\vref{sect:find-masses-in-results}). 



\renewcommand{\sectitle}{The acido-basic calculations: pH, pI and charges}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\label{sect:acido-basic-calculations}

When preparing biochemical experiments, very often users need to know
how many charges a given polymer sequence will bear at any given pH. 
Equally important is the ability to know at which pH value the polymer
sequence will have a net charge near to zero. The pH value for which a
given polymer sequence has a net charge near to zero (typically this
means that the number of positive charges equals the number of
negative charges) is called the isoelectric point ---the pI. 

Such computations are pretty computer-intensive and require a very
precise knowledge of the chemical structure of the different monomers
that take part in the definition of the polymer chemistry. A file,
called \filename{acidobasic.xml} is located in the polymer chemistry
definition directory. This file lists all the chemical groups that are
possibly charged; each monomer of the polymer definition is
represented by a \verb|<mnm>| element in which data are defined for
any chemical group of that monomer that might bear a charge at any
given pH. You can find the listing of the \filename{acidobasic.xml}
file in chapter~\vref{chap:appendices}.  We will discuss every aspect
of this file's contents in the next sections with enough detail that
the user will be able to write one such file for her specific polymer
chemistry. 

At the moment, two entities in the polymer chemistry definition might
have chemical groups bearing charges: monomers and modifications. 
We will first review monomers, and modifications next. 

\subsection*{Monomers might have ionized chemical group(s)}

\subsubsection*{Some theory first}



Monomers are the building blocks of polymer sequences. These blocks
must have at least two reactive groups so that they can be polymerized
into a polymer sequence thread. Reactive groups are often chargeable
groups; for example, the amino group of amino-acids gets protonated
(positively charged) at a pH lower than its pKa, which is the case at
physiological pH. Similarly, the carboxylic group, which is the other
reactive group of amino-acids, is charged at physiological pH: it is
in its carboxylate form (that is singly negatively charged;
$\rm COO^-$) instead of being in its carboxylic form (that is
non-charged; $\rm COOH$). 

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.9]
    {figures/raster/protein-monomers-acidobasic.png}
  \end{center}
  \caption[Different pKa values for a number of amino-acids' chemical
  groups]{\textbf{Different pKa values for a number of amino-acids'
      chemical groups.} All of the twenty amino-acids are represented
    here, with each amino-acid's lateral chain fully represented. 
    Above each chemical group ---for which the value makes sense from
    a biological perspective--- the pKa value is indicated.} 
  \label{fig:protein-monomers-acidobasic}
\end{figure}

For the non-biochemist reader, amino-acids involved in the formation
of proteins always have at least two chemical groups that are of
opposite electrical charge at physiological pH values (see
Figure~\ref{fig:protein-monomers-acidobasic}):

\begin{itemize}
\item The amino group (called $\rm \alpha NH_2$) has a typical pKa
  value of 9.6. This means that, at physiological pH values (between
  6.5 and 7.5), the amino group will find the environment rather
  acidic, and will thus be protonated, leading to a positively-charged
  species ($\rm \alpha NH_3^+$);
\item The carboxylic group (called $\rm \alpha COOH$) has a typical pKa
  value of 2.35. This means that, at physiological pH values, the
  carboxylic group will be in a rather basic environment, and will
  thus be deprotonated, leading to a negatively-charged species ($\rm
  \alpha COO^-$). 
\end{itemize}

\noindent It should be clear that, at physiological pH values, the two
$\rm \alpha$ chemical groups taken together have a net charge close to
zero. But proteins are charged, and this is because some of the twenty
common amino-acids have other chemical groups beyond the two already
described. 

Indeed, some amino-acids have lateral chains that bear groups that
might be charged depending on the pH: seryl residues have an alcohol
group that has a pKa of 13, for example; that means that it is almost
always uncharged (form ROH at physiological pH values). The lateral
chain of lysine has a pKa of 10.53, which means that at pH values
below this pKa value, the $\rm \epsilon NH_2$ gets protonated,
introducing a positive charge in the protein. Similarly, amino-acids
glutamate and aspartate have a lateral chain ending with a $\rm
\gamma COOH$ and a $\rm \beta COOH$, respectively.  Their pKa values
are below 4.5, and thus the groups are negatively charged at
physiological pH values. 
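
The statements above can be made quantitative with the classical
Henderson--Hasselbalch relation. The following is a textbook sketch
given for illustration only; it is not claimed to be the exact formula
coded in the program, and the symbol $f$ is introduced here for
convenience. For a chemical group of a given pKa, the fraction of
molecules in which the group is in its protonated (acidic) form at a
given pH is

\[
  f = \frac{1}{1 + 10^{\,\mathrm{pH} - \mathrm{pKa}}} .
\]

\noindent A group that is charged in its acidic form (an amine)
contributes a mean charge of $+f$, while a group that is charged in
its basic form (a carboxylic acid) contributes $-(1 - f)$. At pH~7,
with the pKa values quoted above:

\[
  \mathrm{\alpha NH_2}:\; +\frac{1}{1 + 10^{\,7 - 9.6}} \approx +0.9975
  \qquad
  \mathrm{\alpha COOH}:\; -\left(1 - \frac{1}{1 + 10^{\,7 - 2.35}}\right)
  \approx -0.99998
\]

\noindent so that the two $\rm \alpha$ groups nearly cancel out (net
charge of about $-0.0025$), while the $\rm \epsilon NH_2$ group of a
lysine (pKa 10.53) contributes about $+0.9997$. 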

When the net charge of a polymer sequence has to be computed for a
given pH condition, the program iterates in the sequence, and for each
monomer will check which one of its chemical group(s) is possibly
charged.  For this to happen, it is required that a number of data be
known for each monomer's chemical group that might play a role in the
determination of the polymer sequence's electrical charge. Thus, for
each chemical group a number of data should be listed in the
\filename{acidobasic.xml} file (see that file in
chapter~\vref{chap:appendices}):

\begin{itemize}
\item the chemical group's \verb|<name>| element is required. 
  {\footnotesize Examples: ``$\rm \alpha NH_2$'' or ``$\rm \epsilon
    NH_2$'' or ``$\alpha$COOH'';}
\item the chemical group's \verb|<pka>| element is optional, but is
  the basis for the charge calculation. {\footnotesize Examples: 9.6
    for the ``$\alpha$NH$\rm _2$'' or 2.35 for ``$\alpha$COOH'';}
\item the \verb|<acidcharged>| element is required if the
  \verb|<pka>| element is given. This element tells if the chemical
  group is charged (positively) when the pH is lower than the pKa
  (that is when the medium is acidic with respect to the pKa). 
  {\footnotesize Examples: an amine is positively charged when it is in
    its acidic form (protonated); a carboxylic acid is \emph{not}
    charged when it is in its acidic form;}
\item there can be none, one or more \verb|<polrule>| element(s) for
  each \verb|<chemgroup>| element. The \verb|<polrule>| element gives
  information about the way the chemical group at hand might be
  ``trapped'' (or not) in the formation of inter-monomer bonds (while
  the monomer is polymerized into the polymer sequence). The value
  ``left\_trapped'' means that the chemical group ceases to be
  involved in charge calculations as soon as it has a monomer at its
  left end. The value ``right\_trapped'' means the same, but when a
  monomer is polymerized at its right end. A chemical group that is
  ``left\_trapped'' is thus only evaluated if it is at the left end of
  the polymer sequence, since in this case it does not have a monomer
  at its left side. Conversely, a chemical group that has a
  \verb|<polrule>| element with value ``right\_trapped'' will be
  evaluated only if the monomer is actually the right end monomer in
  the polymer sequence. Finally, the typical lateral chains of
  amino-acids have a \verb|<polrule>| element with a value
  ``never\_trapped'', as these chemical groups do not take part in the
  formation of the inter-monomer bond;
\item there can be none, one or more \verb|<chemgrouprule>| element(s)
  for each \verb|<chemgroup>| element. A \verb|<chemgrouprule>|
  element should contain the following:
  \begin{itemize}
  \item there must be an \verb|<entity>| element that indicates what
    is the chemical entity being dealt with in the current
    \verb|<chemgroup>| element. Valid values for this element are
    ``LE\_PLM\_MODIF'', ``RE\_PLM\_MODIF'' or ``MNM\_MODIF'';
  \item there must be a \verb|<name>| element naming the chemical
    entity properly;
  \item there must be an \verb|<outcome>| element telling what action
    should be taken when encountering the \verb|<entity>| on the
    chemical group. Valid values are either ``LOST'' or
    ``PRESERVED''. 
  \end{itemize}
\end{itemize}


\subsubsection*{Understanding by example}



Let us take some examples in order to make sure we actually understand
the process of describing how an electrical net charge is calculated
for a given polymer sequence and at any given pH value. 

Let us see the example of the aspartate amino-acid, of which the
lateral chain is nothing but $\rm CH_2COOH$:

\begin{alltt}
    <mnm>
      <code>D</code>
      <chemgroup>
        <name>N-term NH2</name>
        <pka>9.6</pka>
        <acidcharged>TRUE</acidcharged>
        <polrule>left_trapped</polrule>
        <chemgrouprule>
          <entity>LE_PLM_MODIF</entity>
          <name>Acetylation</name>
          <outcome>LOST</outcome>
        </chemgrouprule>
      </chemgroup>
      <chemgroup>
        <name>C-term COOH</name>
        <pka>2.36</pka>
        <acidcharged>FALSE</acidcharged>
        <polrule>right_trapped</polrule>
      </chemgroup>
      <chemgroup>
        <name>Lateral COOH</name>
        <pka>3.65</pka>
        <acidcharged>FALSE</acidcharged>
        <polrule>never_trapped</polrule>
        <chemgrouprule>
          <entity>MNM_MODIF</entity>
          <name>AmidationAsp</name>
          <outcome>LOST</outcome>
        </chemgrouprule>
      </chemgroup>
    </mnm>
\end{alltt}

\noindent We see that the code of the monomer for which acido-basic
data are being defined is `D' and that this monomer has three chemical
groups that might bring electrical charges. These chemical groups are
described by three \verb|<chemgroup>| elements that we will review in
detail below (see Figure~\vref{fig:protein-monomers-acidobasic}). 

\medskip

The first \verb|<chemgroup>| element is related to the $\rm \alpha
NH_2$ amino group of the amino-acid:

\begin{itemize}
\item \verb|<name>N-term NH2</name>| The name of the chemical group is
  not immediately useful, but will be used when reports are to be
  prepared for the calculation;
\item \verb|<pka>9.6</pka>| This element is optional. However, of
  course, if the chemical group might be electrically charged, the pKa
  value will be essential in order to compute the charge that is
  brought by this chemical group at any given pH;
\item \verb|<acidcharged>TRUE</acidcharged>| This element is also
  optional; however, if the previous element is given, then this one
  is compulsory. Telling if the conjugate acid form is charged (that
  is protonated) is essential in order to know what sign the charge
  has to be when the chemical group is ionized. The value ``TRUE''
  indicates that when the pH is lower than the pKa, the chemical group
  is charged, thus protonated (in the form $\rm NH_3^+$). 
  Consequently, if the pH is higher than the pKa, then the chemical
  group is neutral (in the form $\rm NH_2$);
\item \verb|<polrule>left_trapped</polrule>| This element indicates
  that the chemical group should only be taken into account in the
  eventuality that the monomer bearing it (code `D') is the left end
  monomer of the polymer sequence. This can easily be understood, as
  this chemical group is responsible for the establishment of the
  inter-monomer bond towards the left end of the polymer sequence;
\item \verb|<chemgrouprule>| This element provides further details on
  the chemistry that the chemical group at hand ($\rm \alpha NH_2$)
  might be involved in:
  \begin{itemize}
  \item \verb|<entity>LE_PLM_MODIF</entity>| This element indicates
    that the supplementary data in the current \verb|<chemgrouprule>|
    element pertain to the $\rm \alpha NH_2$ chemical group
    \emph{only} in case the polymer sequence is left end-modified
    (that is with a permanent left end modification) and the monomer
    (code `D') is located at the left end of the polymer sequence
    (that is: it is the first monomer of the sequence for which the
    electrical charge or pI computation is to be performed);
  \item \verb|<name>Acetylation</name>| This element goes further in
    the detail of the potential chemistry of the $\rm \alpha NH_2$
    chemical group: if the left end permanent modification is
    ``Acetylation'', then the current chemgrouprule element can be
    further processed, otherwise it should be abandoned;
  \item \verb|<outcome>LOST</outcome>| This element actually indicates
    what should be done with the chemical group for which the
    chemgrouprule is being defined. What we see here is:
    ---\textsl{``If the $\rm \alpha NH_2$ chemical group, belonging to
      a `D' monomer located at the left end of a polymer sequence, is
      modified permanently with an ``Acetylation'' left end
      modification, it should not be taken into account when computing
      the charge that it could bring to the polymer sequence.''} 
  \end{itemize}
\end{itemize}

The second \verb|<chemgroup>| element is related to the $\rm \alpha
COOH$ carboxylic group of the amino-acid:

\begin{itemize}
\item \verb|<name>C-term COOH</name>| Same remark as above;
\item \verb|<pka>2.36</pka>| Same remark as above;
\item \verb|<acidcharged>FALSE</acidcharged>| Same remark as above. 
  However, as we can see, the value indicates that the conjugate acid
  (form $\rm COOH$) does not bring any charge. This means that when
  the conjugate base is predominant (that is when pH~$>$~pKa), it
  brings a negative charge: the form is $\rm COO^-$;
\item \verb|<polrule>right_trapped</polrule>| The chemical group
  should not be evaluated if a monomer is linked to it at its right
  side. That means that the current chemical group is only evaluated
  if the monomer bearing it is located at the right end of the polymer
  sequence. This is easily understood, as the $\rm \alpha COOH$
  chemical group is involved in the formation of the inter-monomer
  bond towards the right end of the polymer sequence. 
\end{itemize}

The third \verb|<chemgroup>| element is related to the $\rm \beta
COOH$ carboxylic group of the amino-acid:

\begin{itemize}
\item \verb|<name>Lateral COOH</name>|;
\item \verb|<pka>3.65</pka>|;
\item \verb|<acidcharged>FALSE</acidcharged>|;
\item \verb|<polrule>never_trapped</polrule>| This element indicates
  that, whatever the position of the monomer bearing the chemical
  group in the polymer sequence (left end, right end or middle), the
  chemical group is to be evaluated;
\item \verb|<chemgrouprule>| This element provides further details on
  the chemistry that the chemical group at hand ($\rm \beta COOH$)
  might be involved in:
  \begin{itemize}
  \item \verb|<entity>MNM_MODIF</entity>| This element indicates that
    the supplementary data in the current \verb|<chemgrouprule>|
    element pertain to the $\rm \beta COOH$ chemical group
    \emph{only} in case the monomer bearing the chemical group is
    chemically modified;
  \item \verb|<name>AmidationAsp</name>| This is the modification by
    which the monomer should be modified in order to have the
    \verb|<chemgrouprule>| element effectively evaluated;
  \item \verb|<outcome>LOST</outcome>| This element actually indicates
    that if the monomer bearing the chemical group is modified with an
    ``AmidationAsp'' chemical modification, then the chemical group
    should not be evaluated any more for the electrical charge ---or
    pI--- calculations, since reacting a carboxylate group with an
    amino group produces an amide group which is not easily chargeable
    at physiological pH values. 
  \end{itemize}
\end{itemize}

\noindent At this point we should have made it clear how the charge
calculations can be configured for the different monomers in the
polymer chemistry definition. As usual, the more sophisticated the
polymer chemistry definition, the more sophisticated the computations
it allows. 
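
As an exercise in applying the elements reviewed above, here is what a
\verb|<mnm>| element for lysine might look like. This is a sketch
written for this illustration and \emph{not} an excerpt from the
distributed \filename{acidobasic.xml} file: the code `K', the group
names and the pKa values (the typical $\rm \alpha$ group values quoted
earlier and 10.53 for the lateral chain) are assumptions that should
be checked against the actual file before use. 

\begin{alltt}
    <!-- hypothetical sketch, values to be verified -->
    <mnm>
      <code>K</code>
      <chemgroup>
        <name>N-term NH2</name>
        <pka>9.6</pka>
        <acidcharged>TRUE</acidcharged>
        <polrule>left_trapped</polrule>
      </chemgroup>
      <chemgroup>
        <name>C-term COOH</name>
        <pka>2.35</pka>
        <acidcharged>FALSE</acidcharged>
        <polrule>right_trapped</polrule>
      </chemgroup>
      <chemgroup>
        <name>Lateral NH2</name>
        <pka>10.53</pka>
        <acidcharged>TRUE</acidcharged>
        <polrule>never_trapped</polrule>
      </chemgroup>
    </mnm>
\end{alltt}

\noindent The lateral amine is charged in its acidic form, hence the
``TRUE'' value, and it does not take part in the inter-monomer bond,
hence the ``never\_trapped'' value. 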


\subsection*{Modifications might have ionized chemical group(s)}


In the excerpt from the \filename{acidobasic.xml} file below, we see
that chemical modifications can also bring charges. The example of the
chemical modification ``Phosphorylation'' shows that when a monomer is
phosphorylated, two chemical groups are brought in: the first has a
pKa value of 12 (that is it will always be protonated at physiological
pH values), the second has a pKa value of 7 (that is, at pH~7, half of
the groups will be in the protonated, uncharged form and the other
half in the unprotonated, negatively charged form, leading to a net
electrical charge of $\rm -0.5$). 

\begin{alltt}
  <modifs>
    <mdf>
      <name>Phosphorylation</name>
      <chemgroup>
        <name>none_set</name>
        <pka>12</pka>
        <acidcharged>FALSE</acidcharged>
      </chemgroup>
      <chemgroup>
        <name>none_set</name>
        <pka>7</pka>
        <acidcharged>FALSE</acidcharged>
      </chemgroup>
    </mdf>
  </modifs>
\end{alltt}
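
\noindent The value of $-0.5$ can be checked with the
Henderson--Hasselbalch relation, used here as a textbook illustration
and not as a statement about the program's internal code (the symbol
$q$ is introduced for this sketch only). For a chemical group that is
\emph{not} charged in its acidic form, the mean charge at a given pH
is

\[ q = -\frac{1}{1 + 10^{\,\mathrm{pKa} - \mathrm{pH}}} . \]

\noindent At pH~7, the group of pKa~7 gives $q = -1/(1 + 10^{0}) =
-0.5$, and the group of pKa~12 gives $q = -1/(1 + 10^{5}) \approx
-0.00001$, that is almost nothing. At pH~8, the first group would
already contribute $-1/(1 + 10^{-1}) \approx -0.91$. 
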

\noindent At this point we should be able to study the way
computations are actually performed in the \pxe module. 

\subsection*{Performing pH, pI and charges computations}

A user wishing to compute charges (positive, negative, net) or the
isoelectric point for the current polymer sequence uses the contextual
menu \guimenuitem{pKa-pH-pI}\guimenuitem{Computations} which triggers
the appearance of the window shown in
Figure~\vref{fig:polyxedit-acidobasic-wnd}. 

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/raster/polyxedit-acidobasic-wnd.png}
  \end{center}
  \caption[Acido-basic computations: pKa, pH, pI]{\textbf{Acido-basic
      computations: pI, pH, pKa.} This figure shows the options that
    can be set for the calculations related to the charges borne by
    the polymer sequence.} 
  \label{fig:polyxedit-acidobasic-wnd}
\end{figure}

This figure shows that the user can either compute the charges
(positive, negative and net) for the polymer sequence by setting the
\guilabel{pH} value at which the computation should take place and
clicking the \guilabel{Compute Net Charge} button, or ask that the
isoelectric point be computed by clicking the \guilabel{Compute
  Isoelectric Point} button (in which case the \guilabel{pI} text
entry widget will display the pH at which the \guilabel{Net Charge Of
  The Polymer Sequence} is close to \guival{0}).

Clicking the \guilabel{Compute Isoelectric Point} button triggers
computations that are lengthy, and the user is advised to be patient.
As an example, on my computer,\footnote{My \filename{/proc/cpuinfo}
  and \filename{/proc/meminfo} say ``Intel(R) Pentium(R) M processor
  1400MHz; cpu family: 6; model: 9; 1024 KB cache size; 774376 kB of
  memory; bogomips: 2768.89''.} the pI computation for a protein of
10201 residues took 10 seconds (no modifications taken into account).
If the user asks that the different modifications (permanent polymer
modifications and monomer modifications) be taken into account, the
computation takes roughly twice as long (23 seconds).

Note that the computations might involve the permanent left/right
modifications of the polymer sequence, as well as the monomer chemical
modifications. To configure the way net charge ---or pI---
computations are performed, please use the calculations engine
configuration window, as described in
Figure~\vref{fig:polyxedit-calc-engine-options-wnd}. 
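
\noindent The isoelectric point search can be pictured as a
root-finding problem. The following is a sketch under the usual
Henderson--Hasselbalch assumption; the symbols $Q$, $A$ and $B$ are
notation introduced here for illustration, and the bisection procedure
is one possible method, not necessarily the one implemented in \pxe. 
Let $A$ be the set of chemical groups that are neutral when protonated
(\verb|<acidcharged>FALSE</acidcharged>|) and $B$ the set of groups
that are positively charged when protonated (presumably flagged
\verb|<acidcharged>TRUE</acidcharged>|). The net charge is then

\[
  Q(\mathrm{pH}) =
  \sum_{i \in B} \frac{1}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_{a,i}}}
  \; - \;
  \sum_{j \in A} \frac{1}{1 + 10^{\,\mathrm{p}K_{a,j} - \mathrm{pH}}}
\]

\noindent Each term decreases monotonically as the pH increases, so
$Q$ crosses zero at most once, and the pI is the pH value for which
$Q(\mathrm{pI}) = 0$. A bisection over the interval $[0, 14]$ halves
the search interval at each step, so after $n$ steps its width is
$14/2^{n}$. Locating the pI to within 0.01~pH unit thus requires

\[
  n \geq \log_{2}\frac{14}{0.01} \approx 10.5,
  \quad \mbox{that is,} \quad n = 11
\]

\noindent evaluations of $Q$. In this sketch each evaluation visits
every chargeable group of the sequence, so the cost grows with the
polymer length and with the number of modifications taken into
account.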

\renewcommand{\sectitle}{The m/z Ratio Calculator}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

To perform m/z ratio calculations, the user can use the

\centerline{\guimenu{Chemistry}\guimenuitem{m/z Ratio Calculations}}

contextual menu that shows up when the user right-clicks onto the
polymer sequence.  Note that the process of using the calculator was
described in Section~\vref{sect:polyxcalc-mz-ratio-calculator}.  When
the calculator is used in \pxe, the initial ionization status data are
set from the currently defined ionization rules (see the
\guilabel{Ionization Rules} frame in the window displayed in
Figure~\vref{fig:polyxedit-calc-engine-options-wnd}) of the polymer
sequence for which the computations are to be performed.
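
\noindent As a reminder of the kind of relation involved, here is a
minimal worked example. The symbols $M$ (mass of the non-ionized
polymer), $z$ (number of charges) and $m_{\mathrm{ion}}$ (mass gained
per charge) are notation introduced here. The example assumes that
each ionization event adds one charge, and that ionization occurs by
protonation with $m_{\mathrm{ion}} \approx 1.0078$ (hydrogen); the
values actually used by the program are those of the ionization rules
in effect.

\[
  m/z = \frac{M + z \, m_{\mathrm{ion}}}{z}
\]

\noindent For $M = 1000.0000$, this gives $m/z = 1001.0078$ for
$z = 1$ and $m/z = (1000.0000 + 2.0156)/2 = 501.0078$ for $z = 2$.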



\renewcommand{\sectitle}{The Self-Read Feature Of Polymer Sequences}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\label{sect:self-read-feature-of-polymer-sequences}

It sometimes happens that the user needs somebody to read a sequence
aloud while he double-checks it. I have been confronted with that
situation a number of times (in particular when having to confirm
oligonucleotide sequences), and finally decided that I would give
polymer sequences a ``self-reading'' ability.

The basis of the self-reading framework is simply (yet) another
dictionary that maps, for each polymer chemistry definition, each
chemical entity to the sound file that should be played in order to
``read out'' a polymer sequence. Two kinds of chemical entities are
able to read themselves out:

\begin{itemize}

\item Monomers: the user may define two sound files for each monomer
  of the polymer chemistry definition: a sound file vocalizing the
  monomer name (``alanine'', for example), and another sound file
  vocalizing the monomer code (``A'' or ``Ala'', for example). 

\item Modifications: the user may define only one sound file for each
  modification of the polymer chemistry definition. 

\end{itemize}

Selecting the

\centerline{\guimenu{Edit}\guimenuitem{Export Sound Playlist}}

menu item in the contextual menu that pops up when the user
right-clicks the polymer sequence opens the window displayed in
Figure~\vref{fig:polyxedit-sequence-self-read-wnd}.

\begin{figure}
  \begin{center}
    \includegraphics[width=0.8\textwidth]
    {figures/raster/polyxedit-sequence-self-read-wnd.png}
  \end{center}
  \caption[Polymer Sequence Self-Read Options]{\textbf{Polymer
      Sequence Self-Read Options.} This figure shows the options that
    can be set for the polymer sequence to read itself out to a
    playlist file.} 
  \label{fig:polyxedit-sequence-self-read-wnd}
\end{figure}


If a polymer sequence region is selected when the menu item above is
chosen, then the positions of the monomers delimiting that region
are displayed in the \guilabel{Define Self-Reading Sequence Interval}
frame. If the user changes the selection in the sequence editor, these
values can be updated by clicking the \guilabel{Sequence Region}
button. It is, however, possible to ask that the whole polymer
sequence be read out by clicking the \guilabel{Whole
  Sequence} checkbutton.

The polymer sequence self-reading feature lets the user choose whether
monomer codes or monomer names are vocalized, and whether the monomer
modifications are vocalized as well.

Finally, the \guilabel{Temporal Segmentation} frame lets the user
define how the files corresponding to the monomers' codes or names
(and to the modifications' names, if required) are played.
Specifically, it is possible to ask that silences be interspersed
between the sounds corresponding to the chemical entities being read
out. Silent delays are played in exactly the same manner as the
chemical entities' sounds (that is: a silent delay is played as a
``silence sound'' file\dots). The user can ask that the sequence
read-out be interspersed with the following silent delays:

\begin{itemize}

\item \guilabel{Start Self-Reading After \ovalbox{x} Silent Slices}: the
  ``silent sound'' file is played the specified number of times before
  the sequence starts to read itself out. If the ``silent sound''
  file is 300~milliseconds long, and the user wants a delay of about
  1~second before the sequence actually begins to read itself out,
  the number to enter would be 3;
\item \guilabel{Inter-Monomer Delay Of \ovalbox{x} Silent Slices}: a
  silent delay will be inserted between each monomer sound;
\item \guilabel{Extra Delay Of \ovalbox{x} Silent Slices Every Other
    Monomer \ovalbox{y}}: it is sometimes useful to insert a silent
  delay each time a given number of monomers have been spelled out.
  This is particularly interesting when nucleic acid sequences read
  themselves out, so that a ``reading frame'' is maintained
  throughout. One would thus set the silent delay to be inserted
  every three monomers\dots

\end{itemize}
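
\noindent The number of slices to request for a desired delay is a
matter of simple arithmetic. With $t_{\mathrm{s}}$ the duration of the
``silent sound'' file and $T$ the desired delay (both symbols are
notation introduced here for illustration), and assuming that the
sound player plays consecutive files back to back, $n$ slices produce
a silence of $n \, t_{\mathrm{s}}$, so that

\[
  n \approx \frac{T}{t_{\mathrm{s}}},
  \qquad
  \frac{1000~\mathrm{ms}}{300~\mathrm{ms}} \approx 3.3
\]

\noindent With a 300~milliseconds slice, 3 slices thus give a silence
of 0.9~second and 4 slices a silence of 1.2~seconds.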


The user indicates the name of the file where the playlist is to be
written. It is advisable to use the \fileformat{m3u} file extension so
that the sound player will recognize that file as a playlist file. 

Indeed, \pxm does not itself play sounds through the sound card. All
it does is write a sound playlist that the user later hands over to a
sound player, like \software{xmms} or \software{winamp}.

The \fileformat{m3u} file format is simple: it is a list of files
to be played in succession. Note that for the sequence to be read out
in the proper order, the ``shuffle'' feature of the player must be
disabled.

The following is the contents of the \filename{sequence.m3u} playlist
file that was obtained by having a protein sequence read itself out:

\begin{verbatim}

/usr/share/polyxmass/polchem-defs/protein/sounds/glutamate.ogg
/usr/share/polyxmass/polchem-defs/protein/sounds/glutamate.ogg
/usr/share/polyxmass/polchem-defs/protein/sounds/aspartate.ogg
/usr/share/polyxmass/polchem-defs/protein/sounds/silence.ogg
/usr/share/polyxmass/polchem-defs/protein/sounds/phenylalanine.ogg
/usr/share/polyxmass/polchem-defs/protein/sounds/serine.ogg
/usr/share/polyxmass/polchem-defs/protein/sounds/phospho.ogg

\end{verbatim}

Note that the last monomer, a serine, is phosphorylated and that the
user asked that a silent delay be inserted every three monomers.

The correspondence between a given monomer (or modification) and the
sound it should use to read itself out is defined in a text file
(\filename{sounds.dic}) located in the \filename{sounds} directory,
which is itself located in the polymer chemistry definition data
directory. See the chapter about \pxmcommon for details.



\renewcommand{\sectitle}{Results Reporting}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\pxe allows the user to perform a great number of different
simulations on any number of polymer sequences opened at any given
time. Running simulations simultaneously (for example, having at one
given time different enzymatic cleavages on a set of different
proteins) is necessary for flexibility and power, but it also makes
well-organized results reporting necessary.

A report can be requested for any window that displays results. For
example, a window that displays a polymer sequence (the polymer
sequence editor, in fact) is a results window as it displays a
sequence. A window displaying the oligomers obtained upon cleavage of
a polymer sequence with a chemical cleavage agent is also a results
window. As we have seen earlier, each results window is registered to
the program and its specifics are stored in items visible in the
\guilabel{Available Windows} treeview of the window management window
shown in Figure~\vref{fig:polyxmass-window-management}.

The configuration of the way reports are prepared takes place in the
polymer sequence context. The polymer sequence editor window menu

\centerline{\guimenu{Reporting}\guimenuitem{Reporting Options}\\}

will open a window as depicted in
Figure~\vref{fig:polyxedit-reporting-opt-wnd}.

\begin{figure}
  \begin{center}
    \includegraphics[scale=1.75]
    {figures/raster/polyxedit-reporting-opt-wnd.png}
  \end{center}
  \caption[The reporting options configuration]{\textbf{The reporting
      options configuration.} The way window contents are reported is
    highly configurable. The configuration affects how the polymer
    sequence's data are reported, and also how the oligomers' and
    monomers' data are reported. Each tab of the depicted window deals
    with one of these sets of options.}
  \label{fig:polyxedit-reporting-opt-wnd}
\end{figure}

Once the reporting options are configured in a polymer sequence
editing context, they automatically apply to all the results windows
in the same polymer sequence editing context.  The reporting options
can always be modified using the same menu as above. Once the
reporting options are configured, the user can use the

\centerline{\guimenu{Reporting}\guimenuitem{Make Reports}\\}

menu to open the window management window, where the reporting
actions described below are made available through button widgets.

After selecting a particular window item from the \guilabel{Available
  Windows} treeview, it becomes possible to ask that the selected
window export a report about its contents.  The report can be sent to
the clipboard or to a file (in append or overwrite mode) by using the
corresponding button widgets in the window management window:

\begin{itemize}

\item \guilabel{Report To Clipboard:} {\footnotesize ask that the
    window contents be exported to the clipboard;}

\item \guilabel{Overwrite To File:} {\footnotesize ask that the window
    contents be exported to a file. Overwrite the file if it exists
    already;}

\item \guilabel{Append To File:} {\footnotesize ask that the window
    contents be exported to a file. The new report data are appended
    to the file if it already exists.}

\end{itemize}
  










\cleardoublepage


%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "polyxmass"
%%% End: