\chapter[\pxe] {\pxe: A Powerful Simulator}
\label{chap:polyxedit}

After having completed this chapter you will be able to perform sophisticated polymer chemistry simulations on polymer sequences ---that can be edited in place--- along with automatic mass recalculations.

\renewcommand{\sectitle}{\pxe\ Invocation}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

The \pxe module is called by pulling down the ``\pxe'' menu item from the \pxm program's menu. The user may then start working in the \pxe module in one of two ways:
\begin{itemize}
\item By asking that a polymer sequence be loaded from disk;
\item By asking that a new polymer sequence be started \textit{ex nihilo}.
\end{itemize}

\renewcommand{\sectitle}{\pxe\ Operation: \textit {In Medias Res}}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\begin{figure}
\begin{center}
\includegraphics[scale=2] {figures/raster/polyxedit-polchemdef-open-def-init-seq-wnd.png}
\end{center}
\caption[Initializing a new polymer sequence in \pxe]{\textbf{Initializing a new polymer sequence in \pxe} When starting a new sequence from scratch, it is necessary to seed the program with a number of data that the user is invited to give in this window.}
\label{fig:polyxedit-polchemdef-open-def-init-seq-wnd}
\end{figure}

When starting a new polymer sequence from scratch, the first thing the program does is to provide the user with a window (Figure~\vref{fig:polyxedit-polchemdef-open-def-init-seq-wnd}) where the user is invited to:
\begin{itemize}
\item Select the polymer chemistry definition (\guilabel{Def. Type}) to be used to interpret the polymer sequence (compulsory datum);
\item Enter a \guilabel{Sequence Name} for the polymer sequence (non-compulsory datum);
\item Enter a \guilabel{Sequence Code} for the polymer sequence (non-compulsory datum);
\item Choose a file \guilabel{Name} for the polymer sequence file.
\end{itemize}

\noindent Once all the data have been selected or entered, the user clicks the \guilabel{Validate} button and the program opens an empty sequence window, as shown in Figure~\vref{fig:polyxedit-seqeditor-empty}.

\begin{figure}
\begin{center}
\includegraphics[scale=2] {figures/raster/polyxedit-seqeditor-empty.png}
\end{center}
\caption[An empty \pxe window]{\textbf{An empty \pxe window} This figure shows an empty \pxe window, waiting for the user to either paste a sequence from the clipboard or edit one from the keyboard.}
\label{fig:polyxedit-seqeditor-empty}
\end{figure}

At this point, when the user starts editing a sequence, the characters entered at the keyboard, or pasted from the clipboard, will be interpreted using the polymer chemistry definition that was selected in the initialization window described above.

Now, of course, editing a polymer sequence is not enough for a mass spectrometry-oriented software suite; what we want is to compute masses! When the \pxm software program is started, the window displaying the masses of the sequence being edited is not shown. To display it, go to the main menu of the program, select the item \guimenu{\pxe}\guimenuitem{View} and activate the checkbutton menu \guilabel{Display Masses Window}.

\begin{figure}
\begin{center}
\includegraphics[scale=2] {figures/raster/polyxedit-masses-display-wnd.png}
\end{center}
\caption[The window displaying the masses]{\textbf{The window displaying the masses} This figure shows the window that displays masses for the currently edited polymer sequence. As can be seen, the identity of the polymer sequence is shown along with the masses computed for the sequence.
}
\label{fig:polyxedit-masses-display-wnd}
\end{figure}

The window that displays the masses for the currently edited polymer sequence is shown in Figure~\vref{fig:polyxedit-masses-display-wnd}, where the reader can see that two different types of masses are displayed:
\begin{itemize}
\item \guilabel{Whole Sequence} These are the monoisotopic and average masses computed for the whole polymer sequence;
\item \guilabel{Selection} These are the monoisotopic and average masses computed for the selected portion of the polymer sequence.
\end{itemize}

\begin{figure}
\begin{center}
\includegraphics[scale=2] {figures/raster/polyxedit-calc-engine-options-wnd.png}
\end{center}
\caption[Configuring the mass calculation engine]{\textbf{Configuring the mass calculation engine} This figure shows the detail in which the mass calculation engine can be configured. See the text for details.}
\label{fig:polyxedit-calc-engine-options-wnd}
\end{figure}

As the user can see, the protein sequence that we initialized earlier is empty (the only visible item is the cursor), and the masses displayed correspond to an empty protein. But if there is no polymer sequence, then how come \textit{nihil} weighs some 19~mass~units? Well, that's because we still have to show how polymer sequence masses are computed: by adding the masses of each monomer in the sequence, that's for sure, but also ---depending on the configuration set by the user--- by taking other parameters into account. Figure~\vref{fig:polyxedit-calc-engine-options-wnd} shows to what extent the way masses are computed can be configured. The window shown in this figure is opened by right-clicking in the polymer sequence editor and selecting ---from the contextual menu that pops up--- the \guimenu{View}\guimenuitem{Calc. Options} menu.
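
As a worked example, the 19~mass~units of the empty sequence can be recovered under a set of assumptions that are plausible for a protein definition but are not stated in this manual: the left cap adds H, the right cap adds OH, and the ionization adds one H with a unitary charge of~1, applied once. The general form of the formula below is likewise only a sketch of how such a computation is usually organized, where $n$ is the ionization level, $z$ the unitary charge and $m_{\mathrm{ion}}$ the mass brought by the ionization. With monoisotopic masses of 1.007825 for H and 15.994915 for O, the caps weigh $2 \times 1.007825 + 15.994915 = 18.010565$, so that:

\[
\frac{m}{z} = \frac{M_{\mathrm{monomers}} + M_{\mathrm{caps}} + n \times m_{\mathrm{ion}}}{n \times z}
= \frac{0 + 18.010565 + 1 \times 1.007825}{1 \times 1} \approx 19.018
\]
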
We'll review the different items in this window:
\begin{itemize}
\item \guilabel{Sequence Name}: This entry widget holds the name of the polymer sequence for which the mass computations are being configured;
\item \guilabel{ID Number}: Unambiguous identification of the polymer sequence (this is useful in case the same polymer sequence file is loaded twice in \pxe, since the two ID numbers will differ);
\item \guilabel{Left Capped}: If checked, the left cap of the polymer definition corresponding to this polymer sequence will be taken into account when computing masses;
\item \guilabel{Right Capped}: Same as for \guilabel{Left Capped} but for the right end of the polymer;
\item \guilabel{Account Left End Modif}: If checked, take into account the modification that might be set to the left end of the polymer sequence;
\item \guilabel{Account Right End Modif}: If checked, take into account the modification that might be set to the right end of the polymer sequence;
\item \guilabel{Monomer -- Account Modifs}: If checked, take into account the chemical modifications that might be set to monomers in the polymer sequence (or in the selected portion of it);
\item \guilabel{Ionization Rules -- Actform}: What action-formula to apply to the polymer sequence when ionization is computed;
\item \guilabel{Ionization Rules -- Unitary Charge}: What is the charge that is brought by the action-formula mentioned above;
\item \guilabel{Ionization Rules -- Level}: How many times the polymer sequence should be ionized according to the two data elements above.
\end{itemize}

\noindent The fact that the user can specify ionization rules should make it clear that the masses that are displayed are actually $\mathrm{\frac{m}{z}}$ ratios, as soon as at least one ionization occurs. Also, note that the masses displayed in the window shown in Figure~\vref{fig:polyxedit-masses-display-wnd} are updated automatically anytime something ``ponderable'' happens with the polymer sequence (\guilabel{Whole Sequence} masses), or anytime the cursor is moved in the sequence (this is equivalent to selecting from the beginning of the sequence up to the cursor point) or a selection is modified (\guilabel{Selection} masses).

\bigskip

For the moment that should be enough. Let's delve deeper into the capabilities of the \pxe module of the \pxm mass spectrometric software suite.

\renewcommand{\sectitle}{\pxe: The Polymer Sequence Menu}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

There are two menus available to the user in the polymer sequence editor window. The first menu is a conventional menu sitting on top of the sequence editor window. The second menu pops up when the user right-clicks a monomer icon in the sequence-displaying area. The general rule of thumb is rather simple: whenever a menu item performs an action on a specific graphical rendering of the sequence (that is, a specific sequence as displayed in a specific canvas), the menu to explore first is the popup menu. Conversely, if the action to be triggered is more about the sequence itself, and less about its actual graphical rendering, then the menu to explore first is the main window menu.
The sequence editor window main menu comprises the items described below:
\begin{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%
\item \guimenu{File}
\begin{itemize}
\item \guimenuitem{Save}\dots\ Save the polymer sequence;
\item \guimenuitem{Save As}\dots\ Save the polymer sequence with a new name;
\item \guimenuitem{Close}\dots\ Close the polymer sequence;
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%
\item \guimenu{Edit}
\begin{itemize}
\item \guimenuitem{Polymer Sequence Properties}\dots\ Edit the polymer sequence properties, such as the sequence name or the sequence code. Note that the annotation process will let you add as many notes as required to the polymer sequence;
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%
\item \guimenu{View}
\begin{itemize}
\item \guimenuitem{Calc. Options}\dots\ View/Modify the way calculations are performed, be they mass calculations or elemental composition calculations;
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%
\item \guimenu{Chemistry}
\begin{itemize}
\item \guimenuitem{Cleave}\dots\ Open a window so that a polymer sequence can be cleaved;
\item \guimenuitem{Fragment}\dots\ Open a window so that a polymer sequence can be fragmented;
\item \guimenuitem{Compositions}
\begin{itemize}
\item \guimenuitem{Elemental}\dots\ Open a window so that options can be set for the program to compute the elemental composition of the polymer sequence or a region of it;
\item \guimenuitem{Monomeric}\dots\ Open a window so that options can be set for the program to compute the monomeric composition of the polymer sequence or a region of it;
\end{itemize}
\item \guimenuitem{pKa-pH-pI}
\begin{itemize}
\item \guimenuitem{(Re)Load The Data}\dots\ Ask that the \filename{acidobasic.xml} file be read (or re-read) from disk;
\item \guimenuitem{Calculations}\dots\ Open a window so that options can be set for the program to compute the charges of the polymer sequence (or its isoelectric point);
\end{itemize}
\item \guimenuitem{m/z Ratio Calculations}\dots\ Open a window to perform
m/z ratio calculations;
\item \guimenuitem{Search Mass(es)}\dots\ Open a window so that options can be set for the program to search arbitrary oligomers in the polymer sequence that have the same mass as the one(s) searched for.
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%
\item \guimenu{Reports}
\begin{itemize}
\item \guimenuitem{Make Reports}\dots\ Open the window management facility to let the user choose windows to make reports of their contents;
\item \guimenuitem{Report Options}\dots\ Configure the way reports are prepared.
\end{itemize}
\end{itemize}

\vspace{\baselineskip}

\noindent Note that each action undertaken in response to choosing one menu item is performed on the polymer sequence being edited in the polymer sequence editor from which the menu was selected.

\vspace{\baselineskip}

\noindent When the user right-clicks a monomer icon, an \guilabel{Edit} contextual menu pops up that has the following menu structure:
\begin{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%
\item \guimenu{Edit}
\begin{itemize}
\item \guimenuitem{Copy}\dots\ Copy to the clipboard the currently selected sequence;
\item \guimenuitem{Cut}\dots\ Copy to the clipboard the currently selected sequence and remove it from the sequence;
\item \guimenuitem{Paste}\dots\ Paste the sequence from the clipboard to the current location of the cursor in the polymer sequence editor;
\item \guimenuitem{Find Replace}\dots\ Extremely flexible Find/Replace functionality;
\item \guimenu{Annotation}
\begin{itemize}
\item \guimenuitem{Monomer}\dots\ Edit (add/remove/modify) the notes for the monomer lying below the cursor when the menu was elicited;
\item \guimenuitem{Polymer}\dots\ Edit (add/remove/modify) the notes for the polymer being edited in the polymer sequence editor;
\end{itemize}
\item \guimenuitem{List Completions}\dots\ Show the list of available monomer code completions according to what is already typed in the sequence editor and the monomer codes defined in the polymer chemistry definition;
\item \guimenu{Select All}\dots\ Select the whole sequence in the polymer sequence editor;
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%
\item \guimenu{Chemistry}
\begin{itemize}
\item \guimenuitem{Monomer Modifications}\dots\ Open a window so that a monomer (or any combination of monomers) can be modified or unmodified;
\item \guimenuitem{Polymer Modifications}\dots\ Open a window so that the polymer sequence can be modified or unmodified either on its left end or on its right end (or both);
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%
\item \guimenuitem{Self Read Sequence To File}\dots\ Write to file a configurable list of sound files to play the sequence aloud;
\end{itemize}

\renewcommand{\sectitle}{Editing Polymer Sequences}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

As we have seen in the \pxd module, the user may stipulate that a polymer chemistry definition allows more than one character in order to define the codes of the different monomers of this same polymer chemistry (see section~\vref{sect:monomers}). Remember that setting the number of allowed characters to \cfgval{3}, for example, does not mean that all your monomer codes must be defined using three characters: \cfgval{3} is the \emph{maximum} number of characters that you may use. This means that you are perfectly entitled, in this case, to have single-character or two-character monomer codes in this polymer chemistry definition.

Let's start by looking at how the polymer sequence editor window behaves when the user tries to enter multi-character monomer codes. Next, we'll see that whatever the length of a monomer code, if its very first character is unambiguous, the behaviour of the polymer sequence editor is flexible and powerful.
\subsection*{Multi-Character Monomer Codes}

\begin{figure}
\begin{center}
\includegraphics[scale=1.75] {figures/raster/polyxedit-multi-char-code-editing.png}
\end{center}
\caption[Multi-character code sequence editing in \pxe]{\textbf{Multi-character code sequence editing in \pxe.} This figure shows the process by which it is made possible to edit polymer sequences with a code set that allows more than one character per code.}
\label{fig:polyxedit-multi-char-code-editing}
\end{figure}

In this section we will describe the editing of a polymer sequence for which monomers can be described using more than one character. Figure~\vref{fig:polyxedit-multi-char-code-editing} shows the case of a polymer sequence whose polymer chemistry definition allows up to three characters to define monomer codes.

Let's now assume that the user wants to edit the sequence by insertion ---at the cursor point--- of a new monomer ``Aspartate'', of which the user knows only that its code starts with an `A'. The cursor is located between the two ``Ala'' monomers at positions 15 and 16 (panel~1). The user keys-in \kbdKey{A} (panel~2). To her dismay, nothing happens in the polymer sequence, but she sees an `A' character now displayed in the left text widget under the label \guilabel{Editing Feedback}. This behaviour stems from the fact that up to 3 characters are allowed to describe a monomer code. If no monomer icon is displayed in the polymer sequence, that may simply mean that more than one monomer code starts with an `A' character: \pxe cannot figure out which monomer code the user actually means when keying-in \kbdKey{A}.

There is a way, called \emph{code completion}, to know which monomer code(s) ---in the current polymer chemistry definition--- start with the keyed-in character(s) (`A' for us, now). The user can always enter the \emph{code completion mode} by hitting the tabulation \kbdKey{TAB}~key. This is what is shown in panel~A.
We see that, in the current polymer chemistry definition, four monomer codes start with an `A' character, and these are ``Ala'', ``Arg'', ``Asp'' and ``Asn''. We could select the monomer of choice by double-clicking onto the proper list item, which would insert the corresponding monomer icon (``monicon'') in the polymer sequence at the cursor location. But, since this is a manual, we are going through another step. Let's continue editing the polymer sequence and key-in an \kbdKey{s} (we did not forget that we wanted to enter an ``Asp'' monomer code in the first place, did we?). The result is shown in panel~3. What we see here is that, this time also, nothing changed in the polymer sequence. What changed is that there is now an ``As'' character string in the left text widget under the label \guilabel{Editing Feedback}. Let's key-in once more the \kbdKey{TAB}~key, and we get the small window shown in panel~B. This time, only two items are listed: ``Asp'' and ``Asn''. This is easy to understand: there are only two monomer codes that start with the two letters `A' and `s' (``As'') that we have keyed-in so far. At this time, we either select one of the items (we wanted to enter the ``Aspartate'' monomer, so we'll double-click onto the first item of the list), or we just key-in a last character: \kbdKey{p}. At this point, the monomer is effectively inserted in the polymer sequence, as the ``Asp'' monomer left of the cursor, as shown in panel~4.

\subsection*{Unambiguous Single-/Multi-Character Monomer Codes}

Let's imagine that we have a polymer chemistry definition that allows up to 3 characters for the definition of monomer codes, but that we have one monomer code (let's say the one for the ``Glutamate'' monomer) that is `E'. This monomer code `E' is the only one of the polymer chemistry definition that starts (and ends, since it is mono-character) with an `E'.
In this case, when we key-in \kbdKey{E}, we'll observe that the monomer code is immediately validated and that its corresponding monomer icon is also immediately inserted in the polymer sequence. This is because, \emph{if there is no ambiguity, \pxe will immediately validate the code being edited}. This means that you are absolutely free to define \emph{only single-character monomer codes} in your polymer chemistry definition, so that you are not even conscious that the powerful multi-character feature exists! Indeed, in this 1-character monomer code configuration, each time you'll key-in an uppercase character, you'll be inserting its corresponding monomer into the polymer sequence immediately. \subsection*{Displaying All The Available Monomer Codes} Equally interesting is the fact that if you key-in the \kbdKey{TAB}~key while no monomer code is being edited (that is: the left text widget under the label \guilabel{Editing Feedback} is empty), all the monomer codes available in the polymer chemistry definition currently in use are displayed, exactly as shown in the panel~C, Figure~\vref{fig:polyxedit-multi-char-code-editing}. \subsection*{Erroneous Monomer Codes} Let's see now what happens when the user keys-in bad characters in the polymer sequence editor window. This is described in the Figure~\vref{fig:polyxedit-bad-lowercase-char-code}. If the user enters a lowercase character as the first character of a monomer code, the program immediately complains in the right text widget under the label \guilabel{Editing Feedback}. In this case, the monomer code is not put into the left text widget, which means it is simply ignored. 
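
The editing behaviour described in the subsections above (code completion, immediate validation of unambiguous codes, rejection of a lowercase first character) can be summarized in a short Python sketch. It is only an illustration and is not taken from the \pxe\ source code: the class name \texttt{CodeEditor}, its attributes and the monomer code list are hypothetical, and the sketch assumes that a code is validated only when it is complete and no other code starts with the same characters.

```python
from dataclasses import dataclass, field


@dataclass
class CodeEditor:
    """Toy model of the code-editing logic; all names are hypothetical."""

    codes: list[str]                 # monomer codes of the definition in use
    buffer: str = ""                 # left "Editing Feedback" widget
    message: str = ""                # right "Editing Feedback" widget
    sequence: list[str] = field(default_factory=list)

    def completions(self) -> list[str]:
        """What TAB would list: codes starting with the current buffer."""
        return [code for code in self.codes if code.startswith(self.buffer)]

    def key(self, char: str) -> None:
        """Process one keyed-in character."""
        self.message = ""
        if not self.buffer and not char.isupper():
            # A code cannot start with a lowercase character: ignore it.
            self.message = f"bad first character {char!r}"
            return
        candidate = self.buffer + char
        matches = [code for code in self.codes if code.startswith(candidate)]
        if not matches:
            # Assumption: a character leading to no known code is ignored.
            self.message = f"no monomer code starts with {candidate!r}"
            return
        if matches == [candidate]:
            # Unambiguous complete code: insert the monomer right away.
            self.sequence.append(candidate)
            self.buffer = ""
        else:
            self.buffer = candidate


editor = CodeEditor(codes=["Ala", "Arg", "Asp", "Asn", "E", "Gly"])
editor.key("A")
print(editor.buffer, editor.completions())  # A ['Ala', 'Arg', 'Asp', 'Asn']
editor.key("s")
print(editor.buffer, editor.completions())  # As ['Asp', 'Asn']
editor.key("p")                             # "Asp" is now unambiguous
editor.key("E")                             # single-character code
print(editor.sequence)                      # ['Asp', 'E']
```

With an empty buffer, \texttt{completions} returns every code of the definition, which mirrors what the \kbdKey{TAB}~key displays when no monomer code is being edited.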
\begin{figure}
\begin{center}
\includegraphics[scale=2.5] {figures/raster/polyxedit-bad-lowercase-char-code.png}
\end{center}
\caption[Bad code character in \pxe\ sequence editor]{\textbf{Bad code character in \pxe\ sequence editor.} This figure shows the feedback that the user is provided by the code editing engine, when a bad character code is keyed-in.}
\label{fig:polyxedit-bad-lowercase-char-code}
\end{figure}

If the user starts keying-in valid monomer code characters, as we did earlier with ``As'', and then wants to erase these characters because she changed her mind, she \emph{must not} use the \kbdKey{BACKSPACE} key, because this key will erase the monomer left of the cursor point in the polymer sequence! To remove the characters currently displayed in the left text widget under the label \guilabel{Editing Feedback}, the user keys-in the \kbdKey{Esc} key once for each character.

For example, let's say I've already keyed-in \kbdKey{A} and \kbdKey{s}. In this case the left text widget, under label \guilabel{Editing Feedback}, displays these two characters: ``As''. Now, \emph{I change my mind} and do not want to enter the ``Asp'' monomer code anymore. I want to enter the ``Gly'' code. All I have to do is key-in the \kbdKey{Esc} key once for the `s' character (which disappears) and once more to remove the remaining `A' character, which disappears also. At this point I can start fresh with the ``Gly'' monomer code by keying-in sequentially \kbdKey{G}, \kbdKey{l} and finally \kbdKey{y}.

\renewcommand{\sectitle}{Clipboard-Importing Of Sequences}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

Very often, the user will make a sequence search on the web and be provided with a polymer sequence that is riddled with non-code characters.
The user typically selects all the text provided by the remote site, pastes that sequence in the \pxm\ polymer sequence editor window and finally encounters invalid codes in it. It would be inconvenient to have to launch a text editor only to ``purify'' that sequence before pasting it in \pxe. \pxm provides a convenient way to spot invalid characters in a polymer sequence and to let the user ``purify'' the imported sequence.

A clipboard-imported sequence is systematically parsed. When invalid characters are found, the window depicted in Figure~\vref{fig:polyxedit-check-import-sequence} is presented to the user for her to make appropriate adjustments. The sequence is presented to the user in a textview widget (\guilabel{Imported Sequence}) with the improper characters tagged in red color. Characters are tagged by comparing the imported sequence with the monomer codes available in the current polymer chemistry definition: as soon as a character does not correspond to any valid monomer code, it is tagged in red. At that point, if the user clicks onto the \guilabel{Remove All Tagged} button, all the red-tagged characters will be automatically removed. Also, the user is provided with an automatic ``purification'' procedure whereby it is possible to remove one or more classes of characters from the imported sequence (\guilabel{Remove Characters} frame widget). Checking one or more of the \guilabel{Digits}, \guilabel{Punctuation} or \guilabel{Space} checkbuttons, or even entering other user-specified characters in the \guilabel{Other} text entry widget, will elicit their removal from the imported sequence after the user clicks the \guilabel{Purify Sequence} button.
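
The tagging and purification steps described above can be sketched in a few lines of Python. This is a hedged illustration, not the actual \pxm\ implementation: the three function names are hypothetical, and the sketch assumes a polymer chemistry definition with single-character monomer codes only (the one-letter amino acid codes here), so that each character can be checked on its own.

```python
import string


def tag_invalid(sequence: str, valid_codes: set[str]) -> list[int]:
    """Return the indices of the characters that match no monomer code."""
    return [i for i, char in enumerate(sequence) if char not in valid_codes]


def remove_all_tagged(sequence: str, valid_codes: set[str]) -> str:
    """Drop every tagged character, like the 'Remove All Tagged' button."""
    tagged = set(tag_invalid(sequence, valid_codes))
    return "".join(c for i, c in enumerate(sequence) if i not in tagged)


def purify(sequence: str, digits: bool = False, punctuation: bool = False,
           space: bool = False, other: str = "") -> str:
    """Remove the selected classes of characters, plus any 'other' ones."""
    unwanted = set(other)
    if digits:
        unwanted |= set(string.digits)
    if punctuation:
        unwanted |= set(string.punctuation)
    if space:
        unwanted |= set(string.whitespace)
    return "".join(char for char in sequence if char not in unwanted)


# Hypothetical single-character code set (one-letter amino acid codes).
protein_codes = set("ACDEFGHIKLMNPQRSTVWY")

raw = "1 MAMIS GMSGR 10"
print(purify(raw, digits=True, space=True))       # MAMISGMSGR
print(tag_invalid("MAX1S", protein_codes))        # [2, 3]
print(remove_all_tagged("MAX1S", protein_codes))  # MAS
```

A definition with multi-character codes would need a tokenizing step before the comparison, which this sketch deliberately leaves out.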
\begin{figure}
\begin{center}
\includegraphics[scale=2.5] {figures/raster/polyxedit-check-import-sequence.png}
\end{center}
\caption[Clipboard-imported sequence error-checking]{\textbf{Clipboard-imported sequence error-checking.} If a sequence that is imported through the clipboard to the \pxe\ sequence editor contains invalid characters, the user is provided with a facility to ``purify'' the sequence. This facility is provided to the user through the window depicted in this figure.}
\label{fig:polyxedit-check-import-sequence}
\end{figure}

When the user is confident that almost all the erroneous characters have been removed, she can click the \guilabel{Check Sequence} button, which will trigger a ``re-reading'' of the sequence in the \guilabel{Imported Sequence} textview widget. If erroneous characters are still found, they are presented to the user in red color. Note that, for maximum flexibility, the user is allowed an immediate and direct editing of the imported sequence in the textview widget (that is, the textview widget is \emph{not} read-only).

Once the sequence is finally purged of all the invalid characters, the user can select it in the textview on the left of the window and paste it in the \pxe\ sequence editor. This time, the paste operation will be error-free.

\renewcommand{\sectitle}{Importing Of Sequences As Raw Text Files}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

It might be of interest to be able to import a sequence from a raw file. To this end, the user is provided with a menu item \guimenu{Edit}\guimenuitem{Import Raw Text} in the contextual menu of the sequence editor widget (available by right-clicking on the polymer sequence editor region). Using that menu, the user will be provided with a file selection window from which to choose the file to import. The program then iterates over the lines of that file and checks their content for validity.
If errors are found, then the same process as described earlier for clipboard-imported sequences is started. The user can then purify the sequence imported from the file and finally integrate that sequence in the polymer sequence currently edited. Note that if any sequence portion is currently selected, it will be replaced by the one that is being imported.

\renewcommand{\sectitle}{Sequence Selections: The Various X Mechanisms}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

Like any text editor, the \pxe\ polymer sequence editor can perform the usual clipboard operations. In the \software{X window} world, there is another process to copy text and paste it into another place: the \software{X window} primary selection mechanism. That process is easy: text is first selected (either using the keyboard or the mouse; that makes the \emph{selection}), and when that selected text needs to be pasted, the user just clicks the mouse's middle button at the destination location. The copy/cut/paste process, commonplace in the \OSname{MS Windows} system, is implemented also. Thus, the users of \pxe\ get the best features of selection and pasting.

When the user tries to paste a sequence element from the clipboard (say, after copying it from a web browser), the program checks that sequence very thoroughly. If an invalid character is found, the whole process is stopped with a message logged to the console; the sequence is not modified in any way and the user may check that sequence in order to remove the invalid characters or codes.

When the user copies or cuts a sequence from the \pxe\ sequence editor window to the clipboard, what is actually copied in the clipboard is a text string made of all the monomer codes of the polymer sequence that was selected when the copying/cutting operation was performed.
\renewcommand{\sectitle}{Visual Feedback In The Editor} \section*{\sectitle} \addcontentsline{toc}{section}{\numberline{}\sectitle} \begin{figure} \begin{center} \includegraphics[scale=2.5] {figures/raster/polyxedit-editor-visual-feedback.png} \end{center} \caption[Visual feedback in the \pxe\ sequence editor]{\textbf{Visual feedback in the \pxe\ sequence editor.} This figure shows the feedback that the user is provided when moving the mouse cursor over monomer icons. See the text for details.} \label{fig:polyxedit-editor-visual-feedback} \end{figure} The polymer sequence editor provides a number of widgets to inform in real time the user about what is going on in it. These widgets are briefly reviewed below, and the user is invited to look at Figure~\vref{fig:polyxedit-editor-visual-feedback}: \begin{itemize} \item The \guilabel{Un/Modified} label informs the user if the polymer sequence was modified or not since it was either last written to a file on disk or last read from a file; \item The monomer status flag (here it is red-green-red) is supposed to inform the user about the status of the monomer onto which the mouse cursor is positioned (in the image example, that is monomer `S', at position 22). The flag is interpreted in the following manner: \begin{itemize} \item The first flag element (red in the example) tells if the monomer contains properties. That is a flag about the internal status of the monomer. This flag is mainly interesting to the power user who goes in the source code and modifies it to adapt it to her specific needs. Red means that the monomer has at least one ``prop'' object in it. Green means that it has no such ``prop'' in it. If this flag element is green, then the two remaining flags are necessarily green. 
This is because the two other flag elements tell the presence or the absence of monomer characteristics that are subsets of the ``prop'' object; \item The second flag element (green in our example) tells if the monomer has been \emph{annotated} at least once. The green color indicates that no note is found in the monomer. That flag would be red if the monomer had been annotated at least once; \item The third flag element (red in our example) tells if the monomer has undergone a \emph{chemical modification}. In our example that flag is red, because as the reader can see, the `S' monomer at position 22 is indeed modified: it is a phosphorylated seryl residue! If the monomer had not been modified, then that flag element would have been green. \end{itemize} \item The label that is located left of the monomer status flag (it indicates \guival{8} on the figure) tells the sequence position of the monomer onto which the cursor is positioned at any given time\footnote{The cursor is not visible because the screen dump function in \software{The Gimp} removes it to clean the image.}. \end{itemize} \renewcommand{\sectitle}{Sequence Annotation: The Various Mechanisms} \section*{\sectitle} \addcontentsline{toc}{section}{\numberline{}\sectitle} The annotation of polymer sequences is very often required in projects for which a number of scientist-made observations are to be ``connected'' in a time-lasting manner either to a polymer sequence (as a whole object \textit{per se}) or to any monomer in a polymer sequence. \pxe\ allows the annotation of the whole polymer and/or of any (and any number) of monomers in the polymer sequence. There is no limitation on the number of notes that can be set to the polymer or any given monomer. Further, the user is provided with two mechanisms by which she can set notes to monomers (annotate): \emph{single-mode} monomer annotation and \emph{range-mode} monomer annotation. 
All these polymer/monomer note-setting processes are described in detail below. First, however, I should tell you, respected reader, that a note is basically an envelope that contains a number of elements: \begin{itemize} \item A textual element that is the \textit{name of the note}; \item Any number of paired data, called \textit{noteval} objects (like ``note value''). A noteval is made of two data: \begin{itemize} \item A datum describing the type of the noteval: either \textit{string}, or \textit{integer}, or \textit{double}; \item The contents of the noteval object. \end{itemize} \end{itemize} \noindent The notes are stored in the polymer sequence file and are easily managed graphically, as we'll describe now. \subsection*{Managing Polymer Notes} The user may set/modify/remove polymer notes using the following contextual menu: \centerline{\guimenu{Edit}\guimenuitem{Annotation}\guimenuitem{Polymer}} The Figure~\vref{fig:polyxedit-note-editing-center-polymer} shows the window that pops up to let the user perform a number of note-related actions that are rather self-explanatory. \begin{figure} \begin{center} \includegraphics[scale=2.33] {figures/raster/polyxedit-note-editing-center-polymer.png} \end{center} \caption[Annotating polymer sequences]{\textbf{Annotating polymer sequences.} This figure shows the graphical interface to the annotation of polymer sequences.} \label{fig:polyxedit-note-editing-center-polymer} \end{figure} A note that is set to a polymer sequence is set to that sequence as a whole, and not to any specific monomer or monomer range. If all the monomers in the annotated polymer sequence were removed, that (empty) polymer sequence would still bear the annotation. In order to add notes, the user must first fill-in the note~\guilabel{Name} field. Once this field is filled, the user clicks the \guilabel{Add New Note} button. The note name will be listed in the \guilabel{Name} column of the \guilabel{Notes Already Set} treeview. 
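
The structure of a note described in this section (a name plus any number of typed noteval objects) can be modelled with two small Python classes. This is only a sketch for illustration: the class and attribute names are hypothetical and do not reflect the way \pxe\ stores notes internally or in the polymer sequence file.

```python
from dataclasses import dataclass, field

# The three noteval types named in the text, mapped to Python types.
NOTEVAL_TYPES = {"string": str, "integer": int, "double": float}


@dataclass
class NoteVal:
    """One [type + contents] pair."""

    kind: str          # "string", "integer" or "double"
    contents: object   # converted to the matching Python type

    def __post_init__(self) -> None:
        if self.kind not in NOTEVAL_TYPES:
            raise ValueError(f"unknown noteval type: {self.kind!r}")
        self.contents = NOTEVAL_TYPES[self.kind](self.contents)


@dataclass
class Note:
    """A named envelope holding any number of noteval objects."""

    name: str
    notevals: list[NoteVal] = field(default_factory=list)

    def add_value(self, kind: str, contents: object) -> NoteVal:
        """Append a new noteval, as the 'Add New Value' button would."""
        noteval = NoteVal(kind, contents)
        self.notevals.append(noteval)
        return noteval


# A note must exist (have a name) before values can be added to it.
note = Note(name="polymorphism")
note.add_value("string", "F or L observed at this position")
note.add_value("integer", "15")
print(note.name, [(v.kind, v.contents) for v in note.notevals])
```

Converting the contents when the noteval is created is a design choice of this sketch; it makes an invalid entry, such as text entered for an \emph{integer} noteval, fail immediately with a \texttt{ValueError}.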
It is only once a note (name) has been added, as described above, that the user can add noteval objects to that note. Remember, we said earlier that a note is made of a name and of any number of [value type+value contents] noteval pairs. The note, or one of its noteval objects, has to be selected in the treeview on the left hand side of the window, so that the user can add a noteval object, by: \begin{itemize} \item Choosing the type of the value (string, integer or double) by selecting the radiobutton of choice in the \guilabel{Type} widget; \item Entering the data proper in the \guilabel{Contents} textview widget; \item Clicking onto the \guilabel{Add New Value} button. \end{itemize} Accomplishing the tasks above will create a new subitem in the treeview: a new noteval object will be listed under the node corresponding to the note name under which the new noteval [type-contents] pair has been defined. It is possible to change the name of a note that is selected in the treeview or to change the type or contents of a noteval object that is currently selected in the treeview. These changes are done by editing the data in their respective widgets, and then clicking either \guilabel{Apply Note Changes} or \guilabel{Apply Value Changes}. It is also possible to remove any item that is currently selected in the treeview. The menu entitled \guilabel{Notes Specific Actions} will pop up when clicked, to show the menu items shown in Figure~\vref{fig:polyxedit-note-editing-center-notes-menu-single}. \begin{figure} \begin{center} \includegraphics[scale=2] {figures/raster/polyxedit-note-editing-center-notes-menu-single.png} \end{center} \caption[The menu governing actions on note items]{\textbf{The menu governing actions on note items.} This figure shows the menu that the user may use in order to remove any item currently selected in the treeview.
When the window is opened in single-mode, the range-mode actions are inactive.} \label{fig:polyxedit-note-editing-center-notes-menu-single} \end{figure} \bigskip Setting notes to the polymer sequence as a whole is conceptually simpler than what we are about to visit: the annotation of any monomer in either single-mode or range-mode. \subsection*{Managing Monomer Notes} As stated earlier, monomer notes can be set in two distinct modes: \emph{single-mode} and \emph{range-mode}. Setting notes to a monomer is as easy as setting notes to a polymer sequence. However, before starting any annotation work, the user should understand which kind of note is appropriate for the specific annotation task. Let's first see the simplest mode of monomer annotation: \emph{single-mode}. \subsubsection*{Managing Monomer Notes In Single-Mode} \begin{figure} \begin{center} \includegraphics[scale=2] {figures/raster/polyxedit-note-editing-center-monomer-single.png} \end{center} \caption[Annotating monomers in single-mode]{\textbf{Annotating monomers in single-mode.} This figure shows the graphical interface to the annotation of monomers in single-mode.} \label{fig:polyxedit-note-editing-center-monomer-single} \end{figure} If the annotation pertains to a single monomer in the sequence,\footnote{Like indicating that this specific residue is polymorphic, for example, or entering any kind of comment.} the user should point the mouse at the corresponding monomer icon and right-click onto it so that the following menu item can be selected out of the contextual menu that pops up: \centerline{\guimenu{Edit}\guimenuitem{Annotation}\guimenuitem{Monomer}\guimenuitem{Single}} Right-clicking that specific monomer icon will trigger internal calculations that will lead to the proper initialization of the popped up window, as shown in Figure~\ref{fig:polyxedit-note-editing-center-monomer-single}, where the \guilabel{Ref. Monomer Code/Pos.} label indicates \guival{F/15}.
That example means that the user wanted to annotate a phenylalanine residue located at position 15 of the polymer (protein) sequence. Note, by the way, that the \guilabel{Range} label indicates no specific value (\guival{-$\;$-}). We'll see later that this bit of information is useful in other cases. Once the window shown in that example is displayed, the managing of monomer notes is identical to the managing of polymer notes (as was previously described). \subsubsection*{Managing Monomer Notes In Range-Mode} Sometimes it is desirable to be able to set an identical note to a range of consecutive monomers. For example, one user might want to set to a range of residues in a protein a note (with a name \textsl{``TRYPSIN''} and a number of notevalue objects describing scientific observations (either text or numerical) and interrogations, for example). That note will be set in each monomer of the range of monomers. Once the range-mode annotation has been performed, each note in each monomer will behave exactly the same way as notes set using the \emph{single-mode} annotation procedures. See Figure~\vref{fig:polyxedit-note-editing-center-monomer-range} for a good example of such note. \begin{figure} \begin{center} \includegraphics[scale=2] {figures/raster/polyxedit-note-editing-center-monomer-range.png} \end{center} \caption[Annotating monomers in range-mode]{\textbf{Annotating monomers in range-mode.} This figure shows the graphical interface to the annotation of monomers in range-mode.} \label{fig:polyxedit-note-editing-center-monomer-range} \end{figure} So, how are range-mode annotations actually carried out by the program? The very first thing is to select --in the polymer sequence editor-- the range of monomers to be annotated. Once that range of monomers is effectively selected, the user can mouse-click with the right button one specific monomer, in that range of selected monomers. 
To display a window like the one shown in Figure~\vref{fig:polyxedit-note-editing-center-monomer-range}, the user must select the following menu item from the contextual menu: \centerline{\guimenu{Edit}\guimenuitem{Annotation}\guimenuitem{Monomer}\guimenuitem{Range}} As can be seen on that figure, this time the \guilabel{Range} label gives an indication in the form \guival{[xx->yy]}. This means that the user wanted to edit a note for all the monomers comprised in that range (from position \guival{xx} to position \guival{yy}). In the example of the figure, that makes a range-mode annotation action that is taken on three monomers. One interesting question is: \textsl{``Given the fact that the user is performing a range-mode annotation, to which monomer do the notes shown in the \guilabel{Notes Already Set} list on the left hand side of the window belong?''} The answer is that the notes that are listed there belong to the \emph{reference monomer}, that is, the monomer that was actually pointed at while right-clicking the sequence (to elicit the popping up of the contextual menu). This \emph{reference monomer} is very important, as we'll see in a moment. Figure~\vref{fig:polyxedit-note-editing-center-monomer-range} shows that range-mode annotations are performed much like single-mode monomer annotations or polymer annotations (same window, in fact, with the same widgets). The big difference comes with the notes menu, which lists menu items that are specific to the \emph{range-mode} actions (Figure~\vref{fig:polyxedit-note-editing-center-monomer-range}): \begin{itemize} \item The menu item \guimenuitem{Remove Item (Range)} will remove the selected item (note item) from all the monomers in the range; \item The menu item \guimenuitem{Propagate Item (Range)} will make a copy of a newly created note into all the other monomers in the range.
Note that, if a note by the same name already exists in any of the monomers in the range, the note is not added to that monomer. The user will be informed by a dialog window that a given monomer was skipped. \end{itemize} Note that the single-mode menu item (\guilabel{Remove Item (Single)}) will perform the action, when in range-mode, on the reference monomer, that is, the one that was right-clicked upon when the note editing process was triggered (see above for the definition of the \emph{reference monomer}). \begin{center} \noindent\fbox{\parbox{0.9\textwidth}{It is important to grasp that in range-mode annotations, when an action cannot be performed in one of the monomers in the selected range of monomers, this does not prevent the process from trying to accomplish the task on the other monomers of the range. For example, the user selects a stretch of twenty monomers in a polymer sequence, and then elicits a range-mode annotation process (namely the addition of a note) onto these twenty monomers. Let's say that the to-be-added note is identical to a note present in the fifth monomer of the monomer range. The note addition --for this monomer-- is going to fail. That does not mean that the whole process is stopped: if no identical note is found in any other monomer, the note is going to be successfully added into all the remaining monomers. In other words, one failure does not abort the whole range-mode annotation process.}} \end{center} \bigskip Without bothering the reader with more descriptions, I would suggest that she experiment with the features described here. The design was conceived to be as flexible as possible. Noteworthy is that flexibility sometimes goes with risky programmatic behaviours: the user must know what she does when clicking onto a button! The \guilabel{Save As} menu item is your friend \emph{before} experimenting with that annotation feature.
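\medskip \noindent The note structure and the range-mode propagation rule described in this section can be summarized with a short sketch. The Python code below is not taken from the \pxm\ sources: the class names \texttt{NoteVal}, \texttt{Note} and \texttt{Monomer} and the function \texttt{propagate\_note} are hypothetical, chosen only for illustration. The sketch assumes that a monomer is skipped when it already bears a note of the same name, and that a skip never aborts the processing of the remaining monomers.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class NoteVal:
    """One typed value of a note; type is 'string', 'integer' or 'double'."""
    type: str
    contents: Union[str, int, float]


@dataclass
class Note:
    """A note: a name plus any number of noteval objects."""
    name: str
    values: List[NoteVal] = field(default_factory=list)


@dataclass
class Monomer:
    """Hypothetical stand-in for a monomer of the edited sequence."""
    code: str
    notes: List[Note] = field(default_factory=list)


def propagate_note(sequence: List[Monomer], start: int, end: int,
                   note: Note) -> Tuple[List[int], List[int]]:
    """Copy `note` into the monomers at positions start..end (1-based, inclusive).

    A monomer that already bears a note of the same name is skipped (assumption);
    the skip is recorded and the other monomers are still processed.
    Returns (added_positions, skipped_positions).
    """
    added, skipped = [], []
    for pos in range(start, end + 1):
        monomer = sequence[pos - 1]
        if any(existing.name == note.name for existing in monomer.notes):
            skipped.append(pos)
            continue
        # Each monomer receives its own copy, so later edits stay independent.
        copy = Note(note.name, [NoteVal(v.type, v.contents) for v in note.values])
        monomer.notes.append(copy)
        added.append(pos)
    return added, skipped


# Usage: the third monomer already bears a note named TRYPSIN, so it is skipped.
sequence = [Monomer(code) for code in "MAFSK"]
sequence[2].notes.append(Note("TRYPSIN"))
trypsin = Note("TRYPSIN", [NoteVal("string", "observed peptide")])
added, skipped = propagate_note(sequence, 1, 5, trypsin)
print("added:", added, "skipped:", skipped)
```

\noindent Because each monomer receives its own copy of the note in this sketch, the propagated notes can afterwards be edited or removed independently, which matches the statement that they behave like notes set in single-mode.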
\renewcommand{\sectitle}{Chemically Modifying Polymer Sequences} \section*{\sectitle} \addcontentsline{toc}{section}{\numberline{}\sectitle} It very often happens that the (bio)chemist uses chemical reactions to modify the polymer sequence she is working on. Mass spectrometry is then often used to check whether the reaction proceeded properly. Further, in nature, chemical modifications of biopolymer sequences are very often encountered. For example, protein sequences often get modified as a means to regulate their function (phosphorylations, notably). Nucleic acid sequences are very often and extensively modified with modifications such as methylation\dots\ It is thus crucial that \pxm\ be able to model with high precision and flexibility the various chemical reactions that can be either made in the chemistry lab or found in nature. The \pxm\ program provides two different chemical modification processes: \begin{itemize} \item A process by which monomers in the polymer sequence can be individually modified; \item A process by which the whole polymer sequence can be modified, either on its left end or on its right end or even on both ends. \end{itemize} \noindent We shall review these two processes separately in the two sections below. \subsection*{Chemical Modification Of Monomers} \subsubsection*{Modification Of Monomers} There are a number of ways in which monomers can be modified in a polymer sequence. Figure~\vref{fig:polyxedit-monomer-modif} shows the simplest one: the user first selects the monomer icon to modify, next calls the \guimenu{Chemistry}\guimenuitem{Modifications}\guimenuitem{Monomer} menu and --as a result-- is provided with a window where all the modifications currently available in the polymer chemistry definition are listed. Since a monomer icon was initially selected in the editor window, the \guilabel{Selected Monomer} target radiobutton is on by default.
It is then simply a matter of choosing the right modification from the \guilabel{Available Modifications} list and clicking onto the \guilabel{Modify} button. \begin{figure} \begin{center} \includegraphics[scale=2] {figures/raster/polyxedit-monomer-modif.png} \end{center} \caption[Modification of a monomer in a polymer sequence]{\textbf{Modification of a monomer in a polymer sequence.} This figure shows the graphical rendering of a phosphorylation of a seryl residue in a protein polymer sequence.} \label{fig:polyxedit-monomer-modif} \end{figure} The modified seryl residue is shown in the polymer sequence editor window: a transparent graphics object (a red `P') was overlaid onto the corresponding seryl monicon. While the \guilabel{Modification Target(s)} frame widget contains radiobuttons the signification of which is rather easy to understand, we want to detail one of these: the \guilabel{Specific Monomer Locations} frame. If the user selects the radiobutton inside that specific frame (labelled \guilabel{Positions Should Be Separated With ';'}), she also has to write the locations in the text entry widget below it. This text entry widget receives textual strings that should describe what locations on the polymer sequence should be modified. The syntax of the descriptive string allows logical positions to be indicated. The user is invited to experiment, maybe using variations on the themes described below as examples: \begin{itemize} \item \guival{ALL} That would mean that the currently selected modification in the \guilabel{Available modifications} list is to be applied to all the monomers in the polymer sequence. 
This is equivalent to selecting the radiobutton labelled \guilabel{All Monomers}; \item \guival{EVEN} or \guival{even} This will modify all monomers at even positions: 2, 4, 6\dots \item \guival{ODD} or \guival{odd} This will modify all monomers at odd positions: 1, 3, 5\dots \item \guival{EVEN;ODD} is identical to \guival{ALL}; \item \guival{[1-10];[20-30,odd]} This will modify all the monomers from position~1 to position~10 inclusive, and all the odd-positioned monomers between position~20 and position~30 inclusive. \end{itemize} \noindent The user is responsible for correctly reading the results that are reported in the paned textview lying between the upper pane (labelled \guilabel{Monomer Modification Rules}) and the two buttons at the bottom of the window. Further, when a modification or un-modification is performed, the count of successful events and of failed events is displayed in the messages' text widget at the very bottom of the window. The messages that are displayed in this widget are not permanent: they last a few seconds and then disappear. Careful attention should thus be paid to what is displayed in this messages' text widget. \medskip \begin{center} \noindent\fbox{\parbox{0.9\textwidth}{Attention should be paid to the fact that the user is responsible for applying chemical modifications only to monomers that are listed as modifiable with the modification used.
For example, if a phosphorylation modification is applied to a monomer that is not listed as phosphorylatable in the relevant configuration file, then the modification is applied to it (which means that --internally-- the monomer is modified) but its corresponding monicon is not graphically changed because no graphical rule is associated with the phosphorylation of this monomer (see section~\vref{subsect:monicons.dic}; the file of interest is \filename{monicons.dic}).}} \end{center} \medskip \noindent It is important to understand that, when a monomer is modified, its previous modification (if any) is overwritten with the new one. The user is invited to experiment a bit with the monomer modification process, so as to be confident of the results that she is going to obtain when real polymer chemistry work is to be modelled in \pxm. \subsubsection*{Un-Modification Of Monomers} If a monomer is modified, then it should also be possible to revert the chemical reaction: to un-modify it. There is, however, a subtlety here that is best explained with an example. Let's say that all the seryl residues of our protein polymer sequence are phosphorylated.\footnote{That's protein chemistry stuff.} Only seryl residues are phosphorylated in this polymer sequence. We thus see all their corresponding monicons overlaid with a small `P' (see the example above). Other monomers are acetylated, like lysyl residues, for example. What we want to do is un-modify all the phosphorylated seryl monomers in one go. We thus open the monomer modification window, select the monomer code corresponding to the seryl residue in the \guilabel{Monomers} list, select the radiobutton labelled \guilabel{Monomers From The List}, select ``Phosphorylation'' in the \guilabel{Available Modifications} list and finally click the \guilabel{Unmodify} button. All the seryl residues currently phosphorylated are un-modified. This is the expected outcome.
Now, let's assume that we had not selected ``Phosphorylation'' in the list of available modifications, but ``Acetylation'', for example: no phosphorylated seryl residue would have been un-modified. This is a safety feature: if you select a modification name from the list of available modifications, and next click onto the \guilabel{Unmodify} button, that means that your un-modifying action has --as targets-- only monomers that are currently modified with the modification that you selected. That means that if, in our example, you had selected, as monomer targets to the un-modification, the \guilabel{All Monomers} radiobutton, selected the ``Phosphorylation'' modification and clicked onto the \guilabel{Unmodify} button, \emph{only} the phosphorylated monomers\footnote{Whatever they be, because the \guilabel{All Monomers} radiobutton was selected.} would have been un-modified. Now, if you un-select all the items in the list of available modifications\footnote{You may need to keep the \kbdKey{Ctrl} key pressed while clicking onto the currently selected item to unselect it.}, select the \guilabel{All Monomers} radiobutton and next click onto the \guilabel{Unmodify} button, then you'll un-modify absolutely \emph{all} the monomers, because you are restricting the monomer targets neither by their code nor by the identity of their potential modification. \bigskip The user is encouraged to play with these features\dots\ Also of great importance is to understand that the modifications that can be set to the monomers do disappear when the monomer is removed from the polymer sequence. These modifications are \emph{monomer modifications}: they belong to the monomer that is modified. We say that these modifications are \emph{intrinsic}.
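\medskip \noindent The un-modification targeting rules described above amount to two independent filters: one on the monomer code and one on the identity of the modification. The Python sketch below illustrates that logic only; it is not the \pxm\ implementation, and the \texttt{Monomer} class and the \texttt{unmodify} function are hypothetical names. It assumes that a monomer bears at most one modification, since a new modification overwrites the previous one.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Monomer:
    """Hypothetical stand-in: a code and at most one modification name."""
    code: str
    modification: Optional[str] = None


def unmodify(sequence: List[Monomer],
             target_codes: Optional[Set[str]] = None,
             modification: Optional[str] = None) -> List[int]:
    """Un-modify monomers and return their 1-based positions.

    target_codes: monomer codes to act on; None means all monomers.
    modification: only monomers bearing this modification are un-modified;
                  None (nothing selected in the list) means any modification.
    """
    changed = []
    for pos, monomer in enumerate(sequence, start=1):
        if monomer.modification is None:
            continue  # nothing to revert
        if target_codes is not None and monomer.code not in target_codes:
            continue  # filtered out by monomer code
        if modification is not None and monomer.modification != modification:
            continue  # filtered out by modification identity
        monomer.modification = None
        changed.append(pos)
    return changed


# Usage: two phosphorylated seryl residues and one acetylated lysyl residue.
sequence = [Monomer("S", "Phosphorylation"), Monomer("K", "Acetylation"),
            Monomer("S"), Monomer("S", "Phosphorylation")]
print(unmodify(sequence, {"S"}, "Acetylation"))    # no seryl residue is acetylated
print(unmodify(sequence, None, "Phosphorylation")) # all monomers, one modification
print(unmodify(sequence))                          # no restriction at all
```

\noindent In this sketch, selecting ``Acetylation'' while targeting seryl residues changes nothing, whereas leaving both filters unset un-modifies every modified monomer that remains.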
\subsection*{Chemical Modification Of The Polymer Sequence} We have seen above that it is possible to modify any monomer in the polymer sequence and that when the modified monomer is removed, the modification associated to it disappears also. The modifications that we describe here are not of this kind. They can be applied to either the left end of the polymer sequence or its right end. But these modifications do belong to the polymer sequence \textit{per se} and are not removed from it even if the polymer sequence is edited by removing the left end monomer or the right end monomer. We say that these \emph{polymer modifications} are \emph{permanent}. \begin{figure} \begin{center} \includegraphics[scale=2] {figures/raster/polyxedit-polymer-modif.png} \end{center} \caption[Modification of the left end of a polymer sequence]{\textbf{Modification of the left end of a polymer sequence} This figure shows how simple it is to permanently modify a polymer sequence on either or both its left/right ends. The permanent modifications currently set to a polymer sequence are conveniently listed in two text widgets located under the polymer sequence rendering area.} \label{fig:polyxedit-polymer-modif} \end{figure} The way in which a polymer sequence is modified using \emph{polymer modifications} is much easier than the previous \emph{monomer modifications} case. The modification window is opened by choosing the \guimenu{Chemistry}\guimenuitem{Modifications}\guimenuitem{Polymer} menu or the \guilabel{Edit} button below the polymer sequence rendering area. The Figure~\vref{fig:polyxedit-polymer-modif} shows that window. The modification is absolutely easy to perform, with a clear feedback provided to the user (by listing the permanent modifications in two convenient text widgets located under the polymer sequence graphical rendering area, under label \guilabel{Left and Right Ends' Modifications}). 
In the example (Figure~\vref{fig:polyxedit-polymer-modif}), the top polymer sequence is not yet modified. By using the window on the right, the polymer sequence is modified on its left end using the ``Acetylation'' modification. The newly modified polymer sequence is shown in the window below, with the left text widget displaying the name of the left end modification. The \guilabel{Unmodify} button is responsible for the un-modification of the selected polymer sequence end (left/right), so that reverting a modification is perfectly feasible. \renewcommand{\sectitle}{Finding and Replacing Sequence Motifs} \section*{\sectitle} \addcontentsline{toc}{section}{\numberline{}\sectitle} \label{sect:find-replace} It is very often the case that one wants to find a given sequence motif quickly. \pxm allows this easily by selecting in the contextual menu the following menu item:\\ \centerline{\guimenu{Edit}\guimenuitem{Find Replace}} \medskip Using that menu item will provide an options window, as described in Figure~\vref{fig:polyxmass-find-replace-options-wnd}. \begin{figure} \begin{center} \includegraphics[scale=2] {figures/raster/polyxmass-find-replace-options-wnd.png} \end{center} \caption[Find/Replace options window]{\textbf{Find/Replace options window.} This figure shows the window with which the user is provided when she performs a polymer sequence find/replace operation. The two sequence editing regions are full blown sequence editor widgets in which the user edits sequence motifs exactly the same way she edits a sequence in the polymer sequence editor. This allows for flexible find/replace operations.} \label{fig:polyxmass-find-replace-options-wnd} \end{figure} What is interesting in Figure~\vref{fig:polyxmass-find-replace-options-wnd} is that it shows how flexible the functionality is: the user has two sequence editor widgets at hand. The left one, \guilabel{Find Motif}, is where the motif to find should be entered.
The right one \guilabel{Replace Motif} is where the motif to be used in order to replace the found motif is edited. As visible on the right hand widget, the monomers entered in these two widgets might be modified (by chemical modification) or annotated (by monomer annotation) exactly in the same way as the user is used to do in the polymer sequence editor. The sequence editor widgets in Figure~\vref{fig:polyxmass-find-replace-options-wnd} are actually \emph{the same} as the ones that are located in the polymer sequence editor windows. Let us see some of the available options: \\ \begin{itemize} \item \guilabel{Start At Point} The find operation will not start from the very first monomer in the polymer sequence, but at the position where the cursor is located (\emph {the point}); \item \guilabel{Backward} Normally, the find operation is performed downstream of the current location; thus the next found motif will necessarily occur at positions in the polymer sequence greater than the current. With this option, however, it is possible to reverse the direction of the search. 
\guilabel{Backward} instructs the search engine to look for motifs in the upstream sequence with respect to the current location; thus any found motif will be at a position lower than the current position; \item \guilabel{Matching Strictness (M1 and M2 matching rules)} These matching rules govern the way monomers in the polymer sequence are considered as matching the monomers in the \guilabel{Find Motif} motif sequence or how stringent the replacement using \guilabel{Replace Motif} should be: \begin{itemize} \item The \guilabel{Find} matching rules: \begin{itemize} \item \guilabel{M1 Identical To M2}: \guilabel{M1} is a given monomer in the polymer sequence and \guilabel{M2} is a monomer in the \guilabel{Find Motif} motif sequence; both monomers are compared, and are considered to match only if they are absolutely \emph{identical}; \item \guilabel{M2 Is Subset of M1}: \guilabel{M1} is a given monomer in the polymer sequence and \guilabel{M2} is a monomer in the \guilabel{Find Motif} motif sequence; both monomers are compared, and are considered to match if all the modification(s) and/or note(s) present in M2 are found in M1, \emph{even if} M1 contains other modification(s) and/or note(s); \end{itemize} \item The \guilabel{Replace} matching rules: \begin{itemize} \item \guilabel{New Identical To M2}: \guilabel{New} is the monomer that will be in the polymer sequence after the replacement is performed and \guilabel{M2} is the monomer from the \guilabel{Replace Motif} sequence that was used to guide the replacement process; the new monomer will be identical to M2; \item \guilabel{New Superset Of M2}: \guilabel{New} is the monomer that will be in the polymer sequence after the replacement is performed and \guilabel{M2} is the monomer from the \guilabel{Replace Motif} sequence that was used to guide the replacement process; upon replacement all the modification(s) and/or note(s) from M2 will be present in the \guilabel{New} monomer,
but if the original monomer in the polymer sequence had modification(s) and/or note(s) not present in M2, then these will be retained; thus, \guilabel{New} will be a superset of \guilabel{M2}. \end{itemize} \end{itemize} \end{itemize} \noindent Note that the \guilabel{Replace Motif} sequence may be empty when performing Find or Replace operations. The way Replace operations are performed is sequential: first the user clicks onto the \guilabel{Find} button. If a sequence element is found to match the \guilabel{Find Motif} sequence, it is selected in the polymer sequence editor window. At this time the user might click onto the \guilabel{Replace} button. Once the replacement is performed, the search engine is automatically asked to find a new occurrence of the \guilabel{Find Motif} sequence, and so on\dots\ The user is invited to experiment with the series of options described above as these render the operations rather flexible. \renewcommand{\sectitle}{Cleavage Of Polymer Sequences} \section*{\sectitle} \addcontentsline{toc}{section}{\numberline{}\sectitle} \label{sect:cleave-polymer-sequences} It happens very often that polymer sequences get cleaved in a sequence-specific manner. These specific cleavages occur very often in nature, and are made by enzymes that cleave biopolymer sequences, like the glycosidases (cleaving saccharides), the proteases (cleaving proteins) or the nucleases (cleaving nucleic acids). But the scientist also uses purified enzymes to perform such cleavages in the test tube. \pxm\ must be able to perform those cleavages \textit{in silico}. Let's see how a polymer sequence can be cleaved using \pxm. \begin{figure} \begin{center} \includegraphics[scale=2] {figures/raster/polyxedit-cleave-options.png} \end{center} \caption[Cleavage options window]{\textbf{Cleavage options window.} This figure shows the window with which the user is provided when she performs a polymer sequence cleavage.
The user can select one cleavage specification and specify what level of partial cleavage the chemical cleavage should perform.} \label{fig:polyxedit-cleave-options} \end{figure} It is a matter of having a polymer sequence opened in an editor window and selecting the \guimenu{Chemistry}\guimenuitem{Cleave} menu. The user is provided with a window where a number of cleavage specifications are listed (Figure~\vref{fig:polyxedit-cleave-options}). These cleavage specifications are listed by looking into the polymer chemistry definition corresponding to the polymer sequence to be cleaved. The program knows, for example, that the polymer sequence to be cleaved is of the ``protein'' chemistry type, and thus will list all the cleavage specifications that were defined in the ``protein'' polymer chemistry definition. The cleavage specifications are available for the user to select one of them to perform the cleavage. The user selects the cleavage specification of interest and also sets the number of partial cleavages that the cleaving agent may yield. In our example, \cfgval{2} was entered, which means that the cleavage reaction will yield the set of oligomers corresponding to a total cleavage (no missed cleavages=partial cleavages 0) along with the set of oligomers corresponding to 1 missed cleavage and to 2 missed cleavages. The calculating process is extremely rapid, so the user may enter rather high values here. \begin{figure} \begin{center} \includegraphics[scale=2] {figures/raster/polyxedit-cleave-results-wnd-seq-tab.png} \end{center} \caption[Cleavage-generated oligomers window]{\textbf{Cleavage-generated oligomers window.} This figure shows the window that is opened so that the oligomers generated upon cleavage of a polymer sequence can be displayed. 
Other data are also displayed (see text for details).} \label{fig:polyxedit-cleave-results-wnd-seq-tab} \end{figure} Upon successful termination of the cleavage reaction, the user is provided with a new window (Figure~\vref{fig:polyxedit-cleave-results-wnd-seq-tab}) in which all the oligomers that were generated are listed (upper pane). The listview widget on the upper pane sports a number of columns. Each row of this listview widget describes the properties of a single oligomer. The different columns are detailed below: \begin{itemize} \item \guilabel{Part. Cleav.} This is the missed cleavage level for which the oligomer was generated; \item \guilabel{Number} This is the number of the oligomer, so that the user may refer to it simply. The syntax is simple: p\emph{x}-n\emph{y} means that this oligomer is the oligomer number \emph{y} from the set of oligomers obtained in the \emph{x}-missed cleavages series; \item \guilabel{Coordinates} These are the coordinates of the oligomer as it is occurring in the polymer sequence that was cleaved in the first place. For example, ``[19-38]'' would mean that the oligomer starts at position~19 and ends at position~38 of the polymer sequence, both values being inclusive; \item \guilabel{Mono Mass} This is the monoisotopic mass of the oligomer, computed using the options that are set in the \guilabel{Calculation Options} window (see above); \item \guilabel{Avg Mass} Same as above, but for the average mass; \item \guilabel{Modified} Indicates if the oligomer contains an intrinsically-modified monomer (it does not mean that the modification's mass was taken into account, it simply says that at least one monomer is modified in the oligomer. See below for details). 
\end{itemize} \begin{figure} \begin{center} \includegraphics[scale=2] {figures/raster/polyxedit-cleave-results-oligodata-tab.png} \end{center} \caption[Cleavage-generated oligomers' data]{\textbf{Cleavage-generated oligomers' data.} This figure shows the notebook tab in which data pertaining to a selected oligomer are displayed. In particular, this tab contains a listview where monomer modifications of the selected oligomer (if any) are displayed.} \label{fig:polyxedit-cleave-results-oligodata-tab} \end{figure} \begin{figure} \begin{center} \includegraphics[scale=2] {figures/raster/polyxedit-cleave-results-cleavedata-tab.png} \end{center} \caption[Cleavage specification data]{\textbf{Cleavage specification data.} This figure shows the notebook tab in which data pertaining to the cleavage operation are displayed.} \label{fig:polyxedit-cleave-results-cleavedata-tab} \end{figure} \noindent The lower pane of the \guilabel{Cleavage Results} window contains a number of additional data, displayed in a set of pages belonging to the \guilabel{Selected Oligomer Data} notebook widget: \begin{itemize} \item \guilabel{Sequence} (Figure~\vref{fig:polyxedit-cleave-results-wnd-seq-tab}) This is the sequence that is displayed when an oligomer is selected in the listview displaying the oligomers (in the upper pane); \item \guilabel{Oligomer Data} (Figure~\vref{fig:polyxedit-cleave-results-oligodata-tab}) This is the place where monomer modifications are listed as soon as an oligomer that contains modified monomers is selected in the listview. Note that each modified monomer in the selected oligomer will show up as a row in this listview. \item \guilabel{Cleavage Data} (Figure~\vref{fig:polyxedit-cleave-results-cleavedata-tab}) This is the place where the cleavage operation configuration is reported, so that each cleavage results' displaying window is self-traceable to both the cleavage configuration and the polymer sequence that was cleaved in the first place. 
\end{itemize}

The button labelled \guilabel{Find} will allow the user to find masses in the oligomers that were generated upon the cleavage reaction simulation (see section~\vref{sect:find-masses-in-results}).

\renewcommand{\sectitle}{Fragmentation Of Polymer Sequences}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:fragmentation-polymer-sequence}

Polymer sequences very often need to be fragmented in the gas phase (in the mass spectrometer) so that structure characterizations may be performed. In protein chemistry, this is routinely done in order to get sequence information for a given peptide ion selected in the gas phase. \pxm\ must be able to perform those fragmentations \textit{in silico}. Let's see how a polymer sequence can be fragmented using \pxm.

\begin{figure}
\begin{center}
\includegraphics[scale=2] {figures/raster/polyxedit-fragment-options.png}
\end{center}
\caption[Fragmentation options window]{\textbf{Fragmentation options window.} This figure shows the window with which the user is provided when she performs a polymer sequence fragmentation. The user can select one or more fragmentation specifications (patterns).}
\label{fig:polyxedit-fragment-options}
\end{figure}

It is a matter of having a polymer sequence opened in an editor window and selecting the sequence region to be fragmented. Once this is done, the user selects the \guimenu{Chemistry}\guimenuitem{Fragment} menu. The user is provided with a window where a number of fragmentation specifications are listed (Figure~\vref{fig:polyxedit-fragment-options}). These fragmentation specifications are listed by looking into the polymer chemistry definition corresponding to the polymer sequence to be fragmented. The program knows, for example, that the polymer sequence to be fragmented is of the ``protein'' chemistry type, and thus will list all the fragmentation specifications that were defined in the ``protein'' polymer chemistry definition.
The user selects the fragmentation specification(s) of interest and clicks the \guilabel{Fragment} button.

\begin{figure}
\begin{center}
\includegraphics[scale=2] {figures/raster/polyxedit-fragment-results.png}
\end{center}
\caption[Fragmentation-generated oligomers window]{\textbf{Fragmentation-generated oligomers window.} This figure shows the window that is opened so that the oligomers generated upon fragmentation of a polymer sequence can be displayed.}
\label{fig:polyxedit-fragment-results}
\end{figure}

Upon successful termination of the fragmentation reaction, the user is provided with a new window (Figure~\vref{fig:polyxedit-fragment-results}) in which all the oligomers that were generated are listed (upper pane). The listview widget on the upper pane sports a number of columns. Each row of this listview widget describes the properties of a single oligomer. The different columns are detailed below:
\begin{itemize}
\item \guilabel{Frag. Spec.} This is the name of the fragmentation specification that was used to compute the corresponding fragment;
\item \guilabel{Name} This is the name of the oligomer, so that the user may refer to it simply. The syntax is simple: \emph{x}-\emph{y} means that this oligomer is the oligomer number \emph{y} from the fragmentation specification \emph{x};
\item \guilabel{Mono Mass} This is the monoisotopic mass of the oligomer, computed using the options that are set in the \guilabel{Calculation Options} window (see earlier explanations);
\item \guilabel{Avg Mass} Same as above, but for the average mass;
\item \guilabel{Modified} Indicates if the oligomer contains an intrinsically-modified monomer (it does not mean that the modification's mass was taken into account; it simply says that at least one monomer is modified in the oligomer. See below for details).
\end{itemize}

The \guilabel{Sequence}, \guilabel{Oligomer Data} and \guilabel{Fragmentation Data} pages of the notebook in the \guilabel{Selected Oligomer Data} frame widget are conceptually identical to the ones described in section~\vref{sect:cleave-polymer-sequences}. The button labelled \guilabel{Find} will allow the user to find masses in the oligomers that were generated upon the fragmentation reaction simulation (see section~\vref{sect:find-masses-in-results}).

\renewcommand{\sectitle}{Finding Masses In The Results}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:find-masses-in-results}

\begin{figure}
\begin{center}
\includegraphics[scale=2] {figures/raster/polyxedit-fragres-mass-find-options.png}
\end{center}
\caption[Finding masses in a set of oligomers]{\textbf{Finding masses in a set of oligomers.} This figure shows how to ask that masses be found in a set of oligomers that result, for example, from the fragmentation of a polymer sequence.}
\label{fig:polyxedit-fragres-mass-find-options}
\end{figure}

It is often necessary to make sure that a mass --observed in the real mass spectrum-- actually corresponds to an oligomer that was generated during a previous simulation experiment (like a cleavage of the polymer sequence with a given cleavage agent, a fragmentation, or a simple mass searching operation; see section~\vref{sect:search-masses-polymer-sequence}). To allow this, and as shown in Figures~\vrefrange{fig:polyxedit-cleave-results-wnd-seq-tab}{fig:polyxedit-fragment-results}, it is possible to ask that masses be found in the oligomers resulting from any previous simulation (cleavage or fragmentation of a polymer sequence or arbitrary mass search operations). Indeed, the button labelled \guilabel{Find} will open a window where the user may enter masses to be found.
Figure~\vref{fig:polyxedit-fragres-mass-find-options} illustrates how easy it is to define the mass(es) to be found in a set of oligomers, either in the monoisotopic mass list or in the average mass list. There are two ways to actually trigger the mass finding operation:
\begin{itemize}
\item When the \guilabel{Unique Mass Find Mode} checkbutton \emph{is} checked: the user enters one mass in a single-line text entry widget; hitting the \guilabel{Find} button or the \kbdKey{ENTER} key then issues the ``Find Mass'' request. For this to happen properly, it is necessary that only one of the two single-line text entry widgets be filled with a mass (either monoisotopic or average). This is because if there are two masses entered in the widgets, the program would not know which one of the monoisotopic or average masses is to be found in the set of oligomers.
\item When the \guilabel{Unique Mass Find Mode} checkbutton is \emph{not} checked: the user may enter masses in any of the single- or multi-line widgets (either by keying-in one mass per line or by pasting a preformatted list of masses). In the present case, hitting the \kbdKey{ENTER} key will trigger the ``multi-mass'' mass finding operation only if the \guilabel{Find} button has the focus. Otherwise, a click on the \guilabel{Find} button will do.
\end{itemize}

\begin{figure}
\begin{center}
\includegraphics[scale=2] {figures/raster/polyxedit-fragres-mass-find-options-tolerances.png}
\end{center}
\caption[Tolerances available in finding masses]{\textbf{Tolerances available in finding masses.} This figure shows the three different ways that tolerances can be configured.}
\label{fig:polyxedit-fragres-mass-find-options-tolerances}
\end{figure}

Prior to asking that masses be found, it is required that tolerances be entered for either monoisotopic or average masses (or both if both kinds of masses are of interest) in their respective text entry widget.
In the example of Figure~\vref{fig:polyxedit-fragres-mass-find-options}, the tolerance that is given to the mass finding operation on monoisotopic masses is \cfgval{0.1}~amu, while the one for the average masses is greater (\cfgval{1}~amu). These values must be understood in a ``broad'' manner (\emph{i.e.}~$\pm$~tolerance): for example, if we searched for a mass \cfgval{1000} with a \cfgval{0.5}~amu tolerance, we would get all the oligomers having masses in the range [$\mathrm {1000-0.5\,\rightarrow \,1000+0.5}$] (which is [999.5--1000.5] \emph{and not [999.75--1000.25]}). Figure~\vref{fig:polyxedit-fragres-mass-find-options-tolerances} shows that there are two other means to define the tolerance with which masses should be found. They are all self-explanatory and should also be understood in the same ``broad'' manner described above. The oligomers that were found to comply with the masses to find and with the tolerances defined are displayed in a window similar to the one shown in Figure~\vref{fig:polyxedit-fragres-mass-find-results}.

\begin{figure}
\begin{center}
\includegraphics[scale=2] {figures/raster/polyxedit-fragres-mass-find-results.png}
\end{center}
\caption[Finding masses in a set of oligomers]{\textbf{Finding masses in a set of oligomers.} This figure shows oligomers that were found in a set of oligomers after a mass finding operation has been performed.}
\label{fig:polyxedit-fragres-mass-find-results}
\end{figure}

Note that here also the traceability of the data is ensured using unambiguous identity numbers (\guilabel{Results' Set ID Number}). This identity number is unique and describes the results window in which the user has asked that masses be found (see Figure~\vref{fig:polyxedit-fragres-mass-find-options}).
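
The $\pm$~tolerance window described above amounts to a simple interval test. The following Python sketch illustrates it under stated assumptions: the function name \verb|find_masses|, the oligomer names and the masses are hypothetical, only the absolute (amu) tolerance is covered, and the interval bounds are assumed to be inclusive. It is not the program's actual implementation.

\begin{verbatim}
def find_masses(oligomers, targets, tolerance):
    """Return (target, name, mass) for every oligomer whose mass lies
    within [target - tolerance, target + tolerance] (bounds assumed
    inclusive)."""
    hits = []
    for target in targets:
        low, high = target - tolerance, target + tolerance
        for name, mass in oligomers:
            if low <= mass <= high:
                hits.append((target, name, mass))
    return hits


# Hypothetical oligomers: (name, monoisotopic mass).
oligomers = [
    ("p0-n1", 999.40),
    ("p0-n2", 999.75),
    ("p0-n3", 1000.50),
    ("p1-n1", 1000.62),
]

# Search for mass 1000 with a 0.5 amu tolerance.
hits = find_masses(oligomers, [1000.0], 0.5)
for target, name, mass in hits:
    print(target, name, mass)
\end{verbatim}

With a target of 1000 and a tolerance of 0.5, only the oligomers whose masses lie in [999.5--1000.5] are retained; the ones at 999.40 and 1000.62 are rejected.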
\renewcommand{\sectitle}{Searching Masses In The Polymer Sequence}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:search-masses-polymer-sequence}

It may happen that the scientist needs to know if some polymer sequence region would have a given mass. \pxm\ allows for mass searching operations in the polymer sequence. This is done by using the menu \guimenu{Chemistry}\guimenuitem{Search Mass(es)}. The window illustrated in Figure~\vref{fig:polyxedit-search-mass-options} shows up and the user enters masses to search for (see section~\vref{sect:find-masses-in-results} for details on the workings of a very similar window).

\begin{figure}
\begin{center}
\includegraphics[scale=2] {figures/raster/polyxedit-search-mass-options.png}
\end{center}
\caption[Searching masses in a polymer sequence]{\textbf{Searching masses in a polymer sequence.} This figure shows how to ask that masses be searched in a polymer sequence.}
\label{fig:polyxedit-search-mass-options}
\end{figure}

Once the masses have been searched, if results are found they are displayed in the window shown in Figure~\vref{fig:polyxedit-search-mass-results}. This window has very similar characteristics to the ones of the previously described results' windows (see section~\vref{sect:cleave-polymer-sequences}, for example).

\begin{figure}
\begin{center}
\includegraphics[scale=2] {figures/raster/polyxedit-search-mass-results.png}
\end{center}
\caption[Results window after searching masses in a polymer sequence]{\textbf{Results window after searching masses in a polymer sequence.} This figure shows the oligomers that were found upon a mass search operation.}
\label{fig:polyxedit-search-mass-results}
\end{figure}

The button labelled \guilabel{Find} will allow the user to find masses in the oligomers that were generated upon the mass searching operation (see section~\vref{sect:find-masses-in-results}).
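
To make the idea of a mass search in a sequence concrete, here is a minimal Python sketch that scans every contiguous region of a short peptide and reports the regions whose mass matches a target within a tolerance. Everything in it is an assumption made for illustration: the function name \verb|search_mass|, the restriction to five amino-acid residues, unmodified and uncharged oligomers capped with one water molecule, and the exhaustive scan itself. The program's real computation depends on the polymer chemistry definition and on the calculation options, and may proceed differently.

\begin{verbatim}
# Monoisotopic residue masses (amu) for a few amino-acids, and water.
RESIDUE_MONO = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "V": 99.06841,
    "L": 113.08406,
}
WATER_MONO = 18.01056


def search_mass(sequence, target, tolerance):
    """Return (start, end, region, mass) for each region of `sequence`
    whose mass is within +/- tolerance of `target`. Coordinates are
    1-based and inclusive, as in the results windows."""
    hits = []
    for start in range(len(sequence)):
        mass = WATER_MONO  # H on the left end, OH on the right end
        for end in range(start, len(sequence)):
            mass += RESIDUE_MONO[sequence[end]]
            if abs(mass - target) <= tolerance:
                region = sequence[start:end + 1]
                hits.append((start + 1, end + 1, region, round(mass, 5)))
    return hits


hits = search_mass("GASVL", 275.188, 0.01)
for start, end, region, mass in hits:
    print(f"[{start}-{end}] {region} {mass}")
\end{verbatim}

In this toy sequence the only matching region is ``ASV'', at coordinates [2-4].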
\renewcommand{\sectitle}{The acido-basic calculations: pH, pI and charges}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:acido-basic-calculations}

When preparing biochemical experiments, very often users need to know how many charges a given polymer sequence will bear at any given pH. Equally important is the ability to know at which pH value the polymer sequence will have a net charge near to zero. The pH value for which a given polymer sequence has a net charge near to zero (typically this means that the number of positive charges equals the number of negative charges) is called the isoelectric point ---the pI. Such computations are pretty computer-intensive and require a very precise knowledge of the chemical structure of the different monomers that take part in the definition of the polymer chemistry.

A file, called \filename{acidobasic.xml}, is located in the polymer chemistry definition directory. This file lists all the chemical groups that are possibly charged; each monomer of the polymer definition is represented by a \verb|<mnm>| element in which data are defined for any chemical group of that monomer that might bear a charge at any given pH. You can find the listing of the \filename{acidobasic.xml} file in chapter~\vref{chap:appendices}. We'll discuss every aspect of this file's contents in the next sections with enough detail that the user will be able to write one such file for her specific polymer chemistry. At the moment, two entities in the polymer chemistry definition might have chemical groups bearing charges: monomers and modifications. We will first review monomers, and modifications next.

\subsection*{Monomers might have ionized chemical group(s)}

\subsubsection*{Some theory first}

Monomers are the building blocks of polymer sequences. These blocks must have at least two reactive groups so that they can be polymerized into a polymer sequence thread.
Reactive groups are often chargeable groups; for example, the amino group of amino-acids is such that it gets protonated (positively charged) at a pH lower than its pKa (which is the case at physiological pH). Similarly, the carboxylic group ---that is the other reactive group of amino-acids--- is charged at physiological pH: it is in its carboxylate form (that is singly negatively charged; $\rm COO^-$) instead of being in its carboxylic form (that is non-charged; $\rm COOH$).

\begin{figure}
\begin{center}
\includegraphics[scale=0.9] {figures/raster/protein-monomers-acidobasic.png}
\end{center}
\caption[Different pKa values for a number of amino-acids' chemical groups]{\textbf{Different pKa values for a number of amino-acids' chemical groups.} All of the twenty amino-acids are represented here, with each amino-acid's lateral chain fully represented. Above each chemical group ---for which the value makes sense from a biological perspective--- the pKa value is indicated.}
\label{fig:protein-monomers-acidobasic}
\end{figure}

For the non-biochemist reader, amino-acids involved in the formation of proteins always have at least two chemical groups that are of opposite electrical charge, at physiological pH values (see Figure~\ref{fig:protein-monomers-acidobasic}):
\begin{itemize}
\item The amino group (called $\rm \alpha NH_2$) has a typical pKa value of 9.6. This means that, at physiological pH values (between 6.5 and 7.5), the amino group will find the environment rather acidic, and will thus be protonated, leading to a positively-charged species ($\rm \alpha NH_3^+$);
\item The carboxylic group (called $\rm \alpha COOH$) has a typical pKa value of 2.35. This means that, at physiological pH values, the carboxylic group will be in a rather basic environment, and will thus be deprotonated, leading to a negatively-charged species ($\rm \alpha COO^-$).
\end{itemize}

\noindent It should be clear that, at physiological pH values, the two $\rm \alpha$ chemical groups together have a net charge of 0. But proteins are charged, and this is because some of the twenty common amino-acids have other chemical groups beyond the two already described. Indeed, some amino-acids have lateral chains that bear groups that might be charged depending on the pH: seryl residues have an alcohol group that has a pKa of 13, for example; that means that it is almost always uncharged (form ROH at physiological pH values). The lateral chain of lysine has a pKa of 10.53, which means that at pH values below this pKa value, the $\rm \epsilon NH_2$ gets protonated, introducing a positive charge in the protein. Similarly, amino-acids glutamate and aspartate do have a lateral chain ended with a $\rm \gamma COOH$ and a $\rm \beta COOH$, respectively. Their pKa values are below 4.5, and thus the groups are negatively charged at physiological pH values.

When the net charge of a polymer sequence has to be computed for a given pH condition, the program iterates in the sequence, and for each monomer will check which one of its chemical group(s) is possibly charged. For this to happen, it is required that a number of data be known for each monomer's chemical group that might play a role in the determination of the polymer sequence's electrical charge. Thus, for each chemical group a number of data should be listed in the \filename{acidobasic.xml} file (please, see that file in chapter~\vref{chap:appendices}):
\begin{itemize}
\item the chemical group's \verb|<name>| element is required. {\footnotesize Examples: ``$\rm \alpha NH_2$'' or ``$\rm \epsilon NH_2$'' or ``$\alpha$COOH'';}
\item the chemical group's \verb|<pka>| element is optional, but is the basis for the charge calculation. {\footnotesize Examples: 9.6 for the ``$\alpha$NH$\rm _2$'' or 2.35 for ``$\alpha$COOH'';}
\item the \verb|<acidcharged>| element is required if the \verb|<pka>| element is given.
This element is responsible for telling if the chemical group is charged (positively) when the pH is lower than pKa (that is when the medium is acidic with respect to the pKa). {\footnotesize Examples: an amine is positively charged when it is in its acidic form (protonated); a carboxylic acid is \emph{not} charged when it is in its acidic form;}
\item there can be none, one or more \verb|<polrule>| element(s) for each chemgroup. The \verb|<polrule>| element gives information about the way the chemical group at hand might be ``trapped'' (or not) in the formation of inter-monomer bonds (while the monomer is polymerized into the polymer sequence). The value ``left\_trapped'' means that the chemical group ceases to be involved in charge calculations as soon as it has a monomer at its left end. The value ``right\_trapped'' means the same as above, but when a monomer is polymerized at its right end. For a chemical group that is ``left\_trapped'', we understand that it is only effectively evaluated if it is at the left end of the polymer sequence, since in this case it does not have a monomer at its left side. Conversely, a chemical group that has a \verb|<polrule>| element with value ``right\_trapped'', will be evaluated only if the monomer is actually the right end monomer in the polymer sequence. Finally, the typical lateral chains of amino-acids have a \verb|<polrule>| element with a value ``never\_trapped'', as these chemical groups do not take part in the formation of the inter-monomer bond;
\item there can be none, one or more \verb|<chemgrouprule>| element(s) for each chemgroup. A chemgrouprule element should contain the following:
\begin{itemize}
\item there must be an \verb|<entity>| element that indicates what is the chemical entity being dealt with in the current chemgroup element.
Valid values for this element are ``LE\_PLM\_MODIF'', ``RE\_PLM\_MODIF'' or ``MNM\_MODIF'';
\item there must be a \verb|<name>| element naming the chemical entity properly;
\item there must be an \verb|<outcome>| element telling what action should be taken when encountering the \verb|<entity>| on the chemgroup. Valid values are either ``LOST'' or ``PRESERVED''.
\end{itemize}
\end{itemize}

\subsubsection*{Understanding by example}

Let us take some examples in order to make sure we actually understand the process of describing how an electrical net charge is calculated for a given polymer sequence and at any given pH value. Let us see the example of the aspartate amino-acid, of which the lateral chain is nothing but $\rm CH_2COOH$:

\begin{alltt}
<mnm>
  <code>D</code>
  <chemgroup>
    <name>N-term NH2</name>
    <pka>9.6</pka>
    <acidcharged>TRUE</acidcharged>
    <polrule>left_trapped</polrule>
    <chemgrouprule>
      <entity>LE_PLM_MODIF</entity>
      <name>Acetylation</name>
      <outcome>LOST</outcome>
    </chemgrouprule>
  </chemgroup>
  <chemgroup>
    <name>C-term COOH</name>
    <pka>2.36</pka>
    <acidcharged>FALSE</acidcharged>
    <polrule>right_trapped</polrule>
  </chemgroup>
  <chemgroup>
    <name>Lateral COOH</name>
    <pka>3.65</pka>
    <acidcharged>FALSE</acidcharged>
    <polrule>never_trapped</polrule>
    <chemgrouprule>
      <entity>MNM_MODIF</entity>
      <name>AmidationAsp</name>
      <outcome>LOST</outcome>
    </chemgrouprule>
  </chemgroup>
</mnm>
\end{alltt}

\noindent We see that the code of the monomer for which acid-basic data are being defined is `D' and that this monomer has three chemical groups that might bring electrical charges. These chemical groups are described by three \verb|<chemgroup>| elements that we will review in detail below (see Figure~\vref{fig:protein-monomers-acidobasic}).
\medskip The first \verb|<chemgroup>| element is related to the $\rm \alpha NH_2$ amino group of the amino-acid: \begin{itemize} \item \verb|<name>N-term NH2</name>| The name of the chemical group is not immediately useful, but will be used when reports are to be prepared for the calculation; \item \verb|<pka>9.6</pka>| This element is optional. However, of course, if the chemical group might be electrically charged, the pKa value will be essential in order to compute the charge that is brought by this chemical group at any given pH; \item \verb|<acidcharged>TRUE</acidcharged>| This element is also optional, however, if the previous element is given, then this one is compulsory. Telling if the conjugated acid form is charged (that is protonated) is essential in order to know what sign the charge has to be when the chemical group is ionized. The value ``TRUE'' indicates that when the pH is lower than the pKa, the chemical group is charged, thus protonated (in the form $\rm NH_3^+$). Consequently, if the pH is higher than the pKa, then the chemical group is neutral (in the form $\rm NH_2$); \item \verb|<polrule>left_trapped</polrule>| This element indicates that the chemical group should only be taken into account in the eventuality that the monomer bearing it (code `D') is the left end monomer of the polymer sequence. 
This can easily be understood, as this chemical group is responsible for the establishment of the inter-monomer bond towards the left end of the polymer sequence; \item \verb|<chemgrouprule>| This element provides further details on the chemistry that the chemical group at hand ($\rm \alpha NH_2$) might be involved in: \begin{itemize} \item \verb|<entity>LE_PLM_MODIF</entity>| This element indicates that the supplementary data in the current \verb|<chemgrouprule>| element are pertaining to the $\rm \alpha NH_2$ chemical group \emph{only} in case the polymer sequence is left end-modified (that is with a permanent left end modification) and the monomer (code `D') is located at the left end of the polymer sequence (that is: it is the first monomer of the sequence for which the electrical charge ---or pI--- computation is to be performed). \item \verb|<name>Acetylation</name>| This element goes further in the detail of the potential chemistry of the $\rm \alpha NH_2$ chemical group: if the left end permanent modification is ``Acetylation'', then the current chemgrouprule element can be further processed, otherwise it should be abandoned; \item \verb|<outcome>LOST</outcome>| This element actually indicates what should be done with the chemical group for which the chemgrouprule is being defined. What we see here is: ---\textsl{``If the $\rm \alpha NH_2$ chemical group, belonging to a `D' monomer located at the left end of a polymer sequence, is modified permanently with an ``Acetylation'' left end modification, it should not be taken into account when computing the charge that it could bring to the polymer sequence.''} \end{itemize} \end{itemize} The second \verb|<chemgroup>| element is related to the $\rm \alpha COOH$ carboxylic group of the amino-acid: \begin{itemize} \item \verb|<name>C-term COOH</name>| Same remark as above; \item \verb|<pka>2.36</pka>| Same remark as above; \item \verb|<acidcharged>FALSE</acidcharged>| Same remark as above. 
However, as we can see, the value indicates that the acid conjugate (form $\rm COOH$) does not bring any charge. This means that when the basic conjugate is predominant (that is when pH > pKa), it brings a negative charge: the form is $\rm COO^-$; \item \verb|<polrule>right_trapped</polrule>| The chemical group should not be evaluated if a monomer is linked to it at its right side. That means that the current chemical group is only evaluated if the monomer bearing it is located at the right end of the polymer sequence. This is easily understood, as the $\rm \alpha COOH$ chemical group is involved in the formation of the inter-monomer bond towards the right end of the polymer sequence. \end{itemize} The third \verb|<chemgroup>| element is related to the $\rm \beta COOH$ carboxylic group of the amino-acid: \begin{itemize} \item \verb|<name>Lateral COOH</name>|; \item \verb|<pka>3.65</pka>|; \item \verb|<acidcharged>FALSE</acidcharged>|; \item \verb|<polrule>never_trapped</polrule>| This element indicates that, whatever the position of the monomer bearing the chemical group in the polymer sequence (left end, right end or middle), the chemical group is to be evaluated; \item \verb|<chemgrouprule>| This element provides further details on the chemistry that the chemical group at hand ($\rm \beta COOH$) might be involved in: \begin{itemize} \item \verb|<entity>MNM_MODIF</entity>| This element indicates that the supplementary data in the current \verb|<chemgrouprule>| element are pertaining to the $\rm \beta COOH$ chemical group \emph{only} in case the monomer bearing the chemical group is chemically modified; \item \verb|<name>AmidationAsp</name>| This is the modification by which the monomer should be modified in order to have the \verb|<chemgrouprule>| element effectively evaluated; \item \verb|<outcome>LOST</outcome>| This element actually indicates that if the monomer bearing the chemical group is modified with an ``AmidationAsp'' chemical modification, then the 
chemical group should not be evaluated any more for the electrical charge ---or pI--- calculations, since reacting a carboxylate group with an amino group produces an amide group which is not easily chargeable at physiological pH values.
\end{itemize}
\end{itemize}

\noindent At this point we should have made it clear how the charge calculations can be configured for the different monomers in the polymer chemistry definition. As usual, the more sophisticated the polymer chemistry definition, the more sophisticated the computations it allows.

\subsection*{Modifications might have ionized chemical group(s)}

In the excerpt from the \filename{acidobasic.xml} file below, we see that chemical modifications can also bring charges. The example of the chemical modification ``Phosphorylation'' shows that when a monomer is phosphorylated, two chemical groups are brought in: the first has a pKa value of 12 (that is, it will always be protonated at physiological pH values); the second has a pKa value of 7 (that is, at pH~7 it will be split evenly between a protonated (uncharged) form and a deprotonated (negatively charged) form, leading to a net electrical charge of $\rm -0.5$).

\begin{alltt}
<modifs>
  <mdf>
    <name>Phosphorylation</name>
    <chemgroup>
      <name>none_set</name>
      <pka>12</pka>
      <acidcharged>FALSE</acidcharged>
    </chemgroup>
    <chemgroup>
      <name>none_set</name>
      <pka>7</pka>
      <acidcharged>FALSE</acidcharged>
    </chemgroup>
  </mdf>
</modifs>
\end{alltt}

\noindent At this point we should be able to study the way computations are actually performed in the \pxe module.

\subsection*{Performing pH, pI and charges computations}

The user willing to compute charges (positive, negative, net) or the isoelectric point for the current polymer sequence uses the contextual menu \guimenuitem{pKa-pH-pI}\guimenuitem{Computations} which triggers the appearance of the window shown in Figure~\vref{fig:polyxedit-acidobasic-wnd}.
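
The fractional charges quoted in this section (for example the $-0.5$ charge of a group with a pKa of 7 at pH~7) are consistent with the usual Henderson--Hasselbalch relation. The manual does not state which formula the program uses, so the expressions below are given as an assumption, with $q$ denoting the average charge brought by a single chemical group. For a group whose acidic form is charged (\verb|<acidcharged>TRUE</acidcharged>|, such as an amine):
\[
q = \frac{+1}{1 + 10^{\,\mathrm{pH} - \mathrm{pKa}}}
\]
For a group whose acidic form is neutral (\verb|<acidcharged>FALSE</acidcharged>|, such as a carboxylic acid):
\[
q = \frac{-1}{1 + 10^{\,\mathrm{pKa} - \mathrm{pH}}}
\]
With $\mathrm{pKa} = 7$ and $\mathrm{pH} = 7$ the second expression gives $q = -1/(1+1) = -0.5$, while with $\mathrm{pKa} = 12$ at the same pH it gives $q = -1/(1+10^{5}) \approx -10^{-5}$, that is, an essentially uncharged (protonated) group. Under this assumption the net charge of the polymer sequence is the sum of $q$ over all the chemical groups that are actually evaluated.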
\begin{figure}
\begin{center}
\includegraphics[scale=2] {figures/raster/polyxedit-acidobasic-wnd.png}
\end{center}
\caption[Acido-basic computations: pKa, pH, pI]{\textbf{Acido-basic computations: pKa, pH, pI.} This figure shows the options that can be set for the calculations related to the charges borne by the polymer sequence.}
\label{fig:polyxedit-acidobasic-wnd}
\end{figure}

This figure shows that the user might either compute the charges (positive, negative and net) for the polymer sequence by setting the \guilabel{pH} value at which the computation should take place and clicking onto the \guilabel{Compute Net Charge} button, or ask that the isoelectric point be computed \textit{ex nihilo} by clicking onto the \guilabel{Compute Isoelectric Point} button (in which case the \guilabel{pI} text entry widget will display the pH at which the \guilabel{Net Charge Of The Polymer Sequence} will be near to \guival{0}). Clicking onto the \guilabel{Compute Isoelectric Point} button will trigger computations that are lengthy, and the user is advised to be patient. As an example, on my computer,\footnote{My \filename{/proc/cpuinfo} and \filename{/proc/meminfo} say ``Intel(R) Pentium(R) M processor 1400MHz; cpu family: 6; model: 9; 1024 KB cache size; 774376 kB of memory; bogomips: 2768.89''.} the pI computation for a protein of 10201 residues took 10 seconds (no modifications taken into account). If the user asks that the different modifications (permanent polymer modifications and monomer modifications) be taken into account, the computation takes more than twice as long (23 seconds).

Note that the computations might involve the permanent left/right modifications of the polymer sequence, as well as the monomer chemical modifications. To configure the way net charge ---or pI--- computations are performed, please use the calculations engine configuration window, as described in Figure~\vref{fig:polyxedit-calc-engine-options-wnd}.
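
The two computations offered by this window can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-group charge follows the Henderson--Hasselbalch relation, the isoelectric point is located by bisection on the net charge (a technique chosen here for brevity, not necessarily the one used by the program), and the function names are hypothetical. The filtering performed by the \verb|<polrule>| and \verb|<chemgrouprule>| elements is left out: the caller is expected to pass only the chemical groups that are actually evaluated.

\begin{verbatim}
def group_charge(pka, acid_charged, ph):
    """Average charge of one chemical group at the given pH
    (Henderson-Hasselbalch relation, assumed here)."""
    if acid_charged:
        # Acidic form is charged (+1), e.g. an amine.
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    # Acidic form is neutral, basic form is charged (-1).
    return -1.0 / (1.0 + 10.0 ** (pka - ph))


def net_charge(groups, ph):
    """Sum of the charges of all (pka, acid_charged) groups."""
    return sum(group_charge(pka, acid, ph) for pka, acid in groups)


def isoelectric_point(groups, low=0.0, high=14.0, precision=1e-4):
    """Bisection: the net charge decreases as the pH increases, so
    the pI is bracketed as long as the charge is positive at `low`
    and negative at `high`."""
    while high - low > precision:
        mid = (low + high) / 2.0
        if net_charge(groups, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# A lone, unmodified 'D' monomer: its three chemical groups are all
# evaluated (pKa values taken from the aspartate example).
aspartate = [(9.6, True), (2.36, False), (3.65, False)]

charge_at_ph7 = net_charge(aspartate, 7.0)
pi = isoelectric_point(aspartate)
print(round(charge_at_ph7, 3), round(pi, 3))
\end{verbatim}

For this lone monomer the sketch yields a pI close to 3.0, halfway between the two carboxylic pKa values, and a net charge close to $-1$ at pH~7.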
\renewcommand{\sectitle}{The m/z Ratio Calculator}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

When requiring m/z ratio calculations the user might use the \centerline{\guimenu{Chemistry}\guimenuitem{m/z Ratio Calculations}} contextual menu that shows up when the user right-clicks onto the polymer sequence. Note that the process of using the calculator was described in Section~\vref{sect:polyxcalc-mz-ratio-calculator}. When the calculator is used in \pxe, the initial ionization status data are set from the currently defined ionization rules (see the \guilabel{Ionization Rules} frame in the window displayed in Figure~\vref{fig:polyxedit-calc-engine-options-wnd}) of the polymer sequence for which the computations are to be performed.

\renewcommand{\sectitle}{The Self-Read Feature Of Polymer Sequences}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:self-read-feature-of-polymer-sequences}

It sometimes happens that the user needs somebody to read a sequence while he double-checks the sequence being read. I have been confronted with that situation a number of times (in particular when having to confirm oligonucleotidic sequences), and finally decided that I would give polymer sequences a ``self-reading'' ability.

The basis of the self-reading framework is as simple as the writing of (yet) another dictionary that makes ---for each polymer chemistry definition--- the correspondence between a chemical entity and a sound file that should be played in order to ``read out'' a polymer sequence. Two chemical entities are able to read themselves out:
\begin{itemize}
\item Monomers: the user may define two sound files for each monomer of the polymer chemistry definition: a sound file vocalizing the monomer name (``alanine'', for example), and another sound file vocalizing the monomer code (``A'' or ``Ala'', for example).
\item Modifications: the user may define only one sound file for each modification of the polymer chemistry definition.
\end{itemize}

Selecting the \centerline{\guimenu{Edit}\guimenuitem{Export Sound Playlist}} menu item in the contextual menu that pops up when the user right-clicks onto the polymer sequence will trigger the window displayed in Figure~\vref{fig:polyxedit-sequence-self-read-wnd} to show up.

\begin{figure}
\begin{center}
\includegraphics[width=0.8\textwidth] {figures/raster/polyxedit-sequence-self-read-wnd.png}
\end{center}
\caption[Polymer Sequence Self-Read Options]{\textbf{Polymer Sequence Self-Read Options.} This figure shows the options that can be set for the polymer sequence to read itself out to a playlist file.}
\label{fig:polyxedit-sequence-self-read-wnd}
\end{figure}

If a polymer sequence region is selected when the menu above is selected, then the positions of the monomers delimiting that region are displayed in the \guilabel{Define Self-Reading Sequence Interval} frame. If the user changes the selection in the sequence editor, these values can be updated by clicking onto the \guilabel{Sequence Region} button. It is, however, possible to ask that the whole polymer sequence be self-spoken out by clicking onto the \guilabel{Whole Sequence} checkbutton.

The polymer sequence self-reading feature lets the user select whether monomer codes or monomer names should be vocalized in the sequence, and whether the monomer modifications should be vocalized also.

Finally, the \guilabel{Temporal Segmentation} frame lets the user define how the files corresponding to the monomers' code/name (and modifications' name, if so required) are played. Specifically, it is possible to ask that silences be interspersed between the sounds corresponding to the chemical entities being self-spoken out. Silent delays are played exactly in the same manner as the other chemical entities' sounds (that is: a silent delay is played as a ``silence sound'' file\dots).
The user may ask that the sequence read-out be interspersed with the following silent delays:
\begin{itemize}
\item \guilabel{Start Self-Reading After \ovalbox{x} Silent Slices}: the ``silence sound'' file is played the specified number of times before the sequence starts to read itself out. If the ``silence sound'' file is 300~milliseconds long and the user wants a delay of roughly 1~second before the sequence actually begins to read itself out, the number to enter would be 3;
\item \guilabel{Inter-Monomer Delay Of \ovalbox{x} Silent Slices}: a silent delay will be inserted between each monomer sound;
\item \guilabel{Extra Delay Of \ovalbox{x} Silent Slices Every Other Monomer \ovalbox{y}}: it is sometimes useful to insert a silent delay each time a given number of monomers have been spelled out. This is particularly interesting when nucleic acid sequences read themselves out, so that a ``reading frame'' is preserved throughout. One would thus set the silent delay to be inserted every three monomers\dots
\end{itemize}

The user indicates the name of the file where the playlist is to be written. It is advisable to use the \fileformat{m3u} file extension so that the sound player will recognize that file as a playlist file. \pxm does not itself generate sounds on the sound card: all it does is write a sound playlist that the user later hands over to a sound player, like \software{xmms} or \software{winamp}. The \fileformat{m3u} file format is simple: it is a list of files to be played in succession. Note that for the sequence to be read out in the proper order, the ``shuffle'' feature of the player should be disabled.
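
As a quick check of the slice arithmetic behind the silent delays described above, assume (as the description of the \guilabel{Start Self-Reading After \ovalbox{x} Silent Slices} option implies) that a delay is simply the silence file played $n$ times in a row. The symbols $d$ (duration of one slice), $T$ (desired delay) and $n$ (number of slices) are introduced here for illustration only. The delay actually obtained is $n \times d$, so the smallest slice count that reaches at least $T$ is
\[
  n = \left\lceil \frac{T}{d} \right\rceil .
\]
With $d = 300$~ms and $T = 1000$~ms, $n = 3$ gives $3 \times 300 = 900$~ms, slightly short of one second, while $n = \lceil 1000/300 \rceil = 4$ gives $1200$~ms. Since a delay can only be a multiple of the slice duration, a shorter silence file allows finer control over the delays.
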
The following is the contents of the \filename{sequence.m3u} playlist file that was obtained by having a protein sequence read itself out:
\begin{verbatim}
/usr/share/polyxmass/polchem-defs/protein/sounds/glutamate.ogg
/usr/share/polyxmass/polchem-defs/protein/sounds/glutamate.ogg
/usr/share/polyxmass/polchem-defs/protein/sounds/aspartate.ogg
/usr/share/polyxmass/polchem-defs/protein/sounds/silence.ogg
/usr/share/polyxmass/polchem-defs/protein/sounds/phenylalanine.ogg
/usr/share/polyxmass/polchem-defs/protein/sounds/serine.ogg
/usr/share/polyxmass/polchem-defs/protein/sounds/phospho.ogg
\end{verbatim}
Note that the last serine monomer is phosphorylated and that the user asked that a silent delay be played every three monomers.

The correspondence between a given monomer (or modification) and the sound it should use to read itself out is defined in a text file (\filename{sounds.dic}) located in the \filename{sounds} directory, itself located in the polymer chemistry definition data directory. See the chapter about \pxmcommon for details.

\renewcommand{\sectitle}{Results Reporting}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\pxe allows the user to perform a great number of different simulations on any number of polymer sequences opened at any given time. Running simulations simultaneously (for example, having at one given time different enzymatic cleavages on a set of different proteins) is necessary as a simple matter of flexibility and power, but it makes well-organized results reporting essential.

A report may be requested for any window that displays results. For example, a window that displays a polymer sequence (the polymer sequence editor, in fact) is a results window, as it displays a sequence. A window displaying the oligomers obtained upon cleavage of a polymer sequence with a chemical cleavage agent is also a results window.
As we have seen earlier, each results window is registered with the program and its specifics are stored in items visible in the \guilabel{Available Windows} treeview of the window management window shown in Figure~\vref{fig:polyxmass-window-management}.

The way reports are prepared is configured in the polymer sequence context. The polymer sequence editor window menu
\centerline{\guimenu{Reporting}\guimenuitem{Reporting Options}\\}
will open a window as depicted in Figure~\vref{fig:polyxedit-reporting-opt-wnd}.

\begin{figure}
\begin{center}
\includegraphics[scale=1.75] {figures/raster/polyxedit-reporting-opt-wnd.png}
\end{center}
\caption[The reporting options configuration]{\textbf{The reporting options configuration.} The way window contents are reported is highly configurable. The configuration affects the way the polymer sequence's data are reported, as well as the way the oligomers' and monomers' data are reported. Each tab of the depicted window deals with one of these configuration options.}
\label{fig:polyxedit-reporting-opt-wnd}
\end{figure}

Once the reporting options are configured in a polymer sequence editing context, they automatically apply to all the results windows in the same polymer sequence editing context. The reporting options can always be modified using the same menu as above.

Once the reporting options are configured, the user may use the
\centerline{\guimenu{Reporting}\guimenuitem{Make Reports}\\}
menu to open the window management window, where the reporting actions are made available through button widgets. After selecting a particular window item from the \guilabel{Available Windows} treeview, it becomes possible to ask that the selected window export a report about its contents.
The report can be sent to the clipboard or to a file (in append or overwrite mode) by using the corresponding button widgets in the window management window:
\begin{itemize}
\item \guilabel{Report To Clipboard:} {\footnotesize ask that the window contents be exported to the clipboard;}
\item \guilabel{Overwrite To File:} {\footnotesize ask that the window contents be exported to a file. The file is overwritten if it already exists;}
\item \guilabel{Append To File:} {\footnotesize ask that the window contents be exported to a file. The new report data are appended to a preexisting file.}
\end{itemize}

\cleardoublepage

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "polyxmass"
%%% End: