<?xml version="1.0" encoding="ascii"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <title>Bio.AlignIO</title> <link rel="stylesheet" href="epydoc.css" type="text/css" /> <script type="text/javascript" src="epydoc.js"></script> </head> <body bgcolor="white" text="black" link="blue" vlink="#204080" alink="#204080"> <!-- ==================== NAVIGATION BAR ==================== --> <table class="navbar" border="0" width="100%" cellpadding="0" bgcolor="#a0c0ff" cellspacing="0"> <tr valign="middle"> <!-- Tree link --> <th> <a href="module-tree.html">Trees</a> </th> <!-- Index link --> <th> <a href="identifier-index.html">Indices</a> </th> <!-- Help link --> <th> <a href="help.html">Help</a> </th> <th class="navbar" width="100%"></th> </tr> </table> <table width="100%" cellpadding="0" cellspacing="0"> <tr valign="top"> <td width="100%"> <span class="breadcrumbs"> <a href="Bio-module.html">Package Bio</a> :: Package AlignIO </span> </td> <td> <table cellpadding="0" cellspacing="0"> <!-- hide/show private --> <tr><td align="right"><span class="options">[<a href="javascript:void(0);" class="privatelink" onclick="toggle_private();">hide private</a>]</span></td></tr> <tr><td align="right"><span class="options" >[<a href="frames.html" target="_top">frames</a >] | <a href="Bio.AlignIO-module.html" target="_top">no frames</a>]</span></td></tr> </table> </td> </tr> </table> <!-- ==================== PACKAGE DESCRIPTION ==================== --> <h1 class="epydoc">Package AlignIO</h1><p class="nomargin-top"><span class="codelink"><a href="Bio.AlignIO-pysrc.html">source code</a></span></p> <pre class="literalblock"> Alignment input/output designed to look similar to Bio.SeqIO. Input ===== For the typical special case when your file or handle contains one and only one alignment, use the function Bio.AlignIO.read(). This takes an input file handle, format string and optional number of sequences per alignment. It will return a single Alignment object (or raise an exception if there isn't just one alignment): from Bio import AlignIO handle = open("example.aln", "rU") align = AlignIO.read(handle, "clustal") handle.close() print align For the general case, when the handle could contain any number of alignments, use the function Bio.AlignIO.parse(...) which takes the same arguments, but returns an iterator giving Alignment objects. For example, using the output from the EMBOSS water or needle pairwise alignment prorams: from Bio import AlignIO handle = open("example.txt", "rU") for alignment in AlignIO.parse(handle, "emboss") : print alignment If you want random access to the alignments by number, turn this into a list: from Bio import AlignIO handle = open("example.aln", "rU") alignments = list(AlignIO.parse(handle, "clustal")) print alignments[0] Most alignment file formats can be concatenated so as to hold as many different multiple sequence alignments as possible. One common example is the output of the tool seqboot in the PHLYIP suite. Sometimes there can be a file header and footer, as seen in the EMBOSS alignment output. There is an optional argument for the number of sequences per alignment which is usually only needed with the alignments stored in the FASTA format. Without this information, there is no clear way to tell if you have say a single alignment of 20 sequences, or four alignments of 5 sequences. e.g. from Bio import AlignIO handle = open("example.faa", "rU") for alignment in AlignIO.parse(handle, "fasta", seq_count=5) : print alignment The above code would split up the FASTA files, and try and batch every five sequences into an alignment. Output ====== Use the function Bio.AlignIO.write(...), which takes a complete set of Alignment objects (either as a list, or an iterator), an output file handle and of course the file format. from Bio import AlignIO alignments = ... handle = open("example.faa", "w") alignment = SeqIO.write(alignments, handle, "fasta") handle.close() In general, you are expected to call this function once (with all your alignments) and then close the file handle. However, for file formats like PHYLIP where multiple alignments are stored sequentially (with no file header and footer), then multiple calls to the write function should work as expected. File Formats ============ When specifying the file format, use lowercase strings. The same format names are also used in Bio.SeqIO and include the following: clustal - Ouput from Clustal W or X, see also the module Bio.Clustalw which can be used to run the command line tool from Biopython. emboss - The "pairs" and "simple" alignment format from the EMBOSS tools. fasta - The generic sequence file format where each record starts with a identifer line starting with a ">" character, followed by lines of sequence. fasta-m10 - For the pairswise alignments output by Bill Pearson's FASTA tools when used with the -m 10 command line option for machine readable output. nexus - Output from NEXUS, see also the module Bio.Nexus which can also read any phylogenetic trees in these files. phylip - Used by the PHLIP tools. stockholm - A richly annotated alignment file format used by PFAM. Further Information =================== See the wiki page biopython.org/wiki/AlignIO and also the Bio.AlignIO chapter in the Biopython Tutorial and Cookbook which is also available online: http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf </pre> <!-- ==================== SUBMODULES ==================== --> <a name="section-Submodules"></a> <table class="summary" border="1" cellpadding="3" cellspacing="0" width="100%" bgcolor="white"> <tr bgcolor="#70b0f0" class="table-header"> <td colspan="2" class="table-header"> <table border="0" cellpadding="0" cellspacing="0" width="100%"> <tr valign="top"> <td align="left"><span class="table-header">Submodules</span></td> <td align="right" valign="top" ><span class="options">[<a href="#section-Submodules" class="privatelink" onclick="toggle_private();" >hide private</a>]</span></td> </tr> </table> </td> </tr> <tr><td class="summary"> <ul class="nomargin"> <li> <strong class="uidlink"><a href="Bio.AlignIO.ClustalIO-module.html">Bio.AlignIO.ClustalIO</a></strong> </li> <li> <strong class="uidlink"><a href="Bio.AlignIO.EmbossIO-module.html">Bio.AlignIO.EmbossIO</a></strong> </li> <li> <strong class="uidlink"><a href="Bio.AlignIO.FastaIO-module.html">Bio.AlignIO.FastaIO</a></strong>: <em class="summary">This module contains a parser for the pairwise alignments produced by Bill Pearson's FASTA tool, for use from the Bio.AlignIO interface where it is refered to as the "fasta-m10" file format (as we only support the machine readable output format selected with the -m 10 command line option).</em> </li> <li> <strong class="uidlink"><a href="Bio.AlignIO.Interfaces-module.html">Bio.AlignIO.Interfaces</a></strong> </li> <li> <strong class="uidlink"><a href="Bio.AlignIO.NexusIO-module.html">Bio.AlignIO.NexusIO</a></strong>: <em class="summary">Bio.AlignIO support for the "nexus" file format.</em> </li> <li> <strong class="uidlink"><a href="Bio.AlignIO.PhylipIO-module.html">Bio.AlignIO.PhylipIO</a></strong>: <em class="summary">Note ==== In TREE_PUZZLE (Schmidt et al.</em> </li> <li> <strong class="uidlink"><a href="Bio.AlignIO.StockholmIO-module.html">Bio.AlignIO.StockholmIO</a></strong> </li> </ul></td></tr> </table> <br /> <!-- ==================== FUNCTIONS ==================== --> <a name="section-Functions"></a> <table class="summary" border="1" cellpadding="3" cellspacing="0" width="100%" bgcolor="white"> <tr bgcolor="#70b0f0" class="table-header"> <td colspan="2" class="table-header"> <table border="0" cellpadding="0" cellspacing="0" width="100%"> <tr valign="top"> <td align="left"><span class="table-header">Functions</span></td> <td align="right" valign="top" ><span class="options">[<a href="#section-Functions" class="privatelink" onclick="toggle_private();" >hide private</a>]</span></td> </tr> </table> </td> </tr> <tr> <td width="15%" align="right" valign="top" class="summary"> <span class="summary-type"> </span> </td><td class="summary"> <table width="100%" cellpadding="0" cellspacing="0" border="0"> <tr> <td><span class="summary-sig"><a href="Bio.AlignIO-module.html#write" class="summary-sig-name">write</a>(<span class="summary-sig-arg">alignments</span>, <span class="summary-sig-arg">handle</span>, <span class="summary-sig-arg">format</span>)</span><br /> Write complete set of alignments to a file.</td> <td align="right" valign="top"> <span class="codelink"><a href="Bio.AlignIO-pysrc.html#write">source code</a></span> </td> </tr> </table> </td> </tr> <tr class="private"> <td width="15%" align="right" valign="top" class="summary"> <span class="summary-type"> </span> </td><td class="summary"> <table width="100%" cellpadding="0" cellspacing="0" border="0"> <tr> <td><span class="summary-sig"><a href="Bio.AlignIO-module.html#_SeqIO_to_alignment_iterator" class="summary-sig-name" onclick="show_private();">_SeqIO_to_alignment_iterator</a>(<span class="summary-sig-arg">handle</span>, <span class="summary-sig-arg">format</span>, <span class="summary-sig-arg">seq_count</span>=<span class="summary-sig-default">None</span>)</span><br /> Private function, uses Bio.SeqIO to create an Alignment iterator.</td> <td align="right" valign="top"> <span class="codelink"><a href="Bio.AlignIO-pysrc.html#_SeqIO_to_alignment_iterator">source code</a></span> </td> </tr> </table> </td> </tr> <tr> <td width="15%" align="right" valign="top" class="summary"> <span class="summary-type"> </span> </td><td class="summary"> <table width="100%" cellpadding="0" cellspacing="0" border="0"> <tr> <td><span class="summary-sig"><a href="Bio.AlignIO-module.html#parse" class="summary-sig-name">parse</a>(<span class="summary-sig-arg">handle</span>, <span class="summary-sig-arg">format</span>, <span class="summary-sig-arg">seq_count</span>=<span class="summary-sig-default">None</span>)</span><br /> Turns a sequence file into an iterator returning Alignment objects.</td> <td align="right" valign="top"> <span class="codelink"><a href="Bio.AlignIO-pysrc.html#parse">source code</a></span> </td> </tr> </table> </td> </tr> <tr> <td width="15%" align="right" valign="top" class="summary"> <span class="summary-type"> </span> </td><td class="summary"> <table width="100%" cellpadding="0" cellspacing="0" border="0"> <tr> <td><span class="summary-sig"><a href="Bio.AlignIO-module.html#read" class="summary-sig-name">read</a>(<span class="summary-sig-arg">handle</span>, <span class="summary-sig-arg">format</span>, <span class="summary-sig-arg">seq_count</span>=<span class="summary-sig-default">None</span>)</span><br /> Turns an alignment file into a single Alignment object.</td> <td align="right" valign="top"> <span class="codelink"><a href="Bio.AlignIO-pysrc.html#read">source code</a></span> </td> </tr> </table> </td> </tr> </table> <!-- ==================== VARIABLES ==================== --> <a name="section-Variables"></a> <table class="summary" border="1" cellpadding="3" cellspacing="0" width="100%" bgcolor="white"> <tr bgcolor="#70b0f0" class="table-header"> <td colspan="2" class="table-header"> <table border="0" cellpadding="0" cellspacing="0" width="100%"> <tr valign="top"> <td align="left"><span class="table-header">Variables</span></td> <td align="right" valign="top" ><span class="options">[<a href="#section-Variables" class="privatelink" onclick="toggle_private();" >hide private</a>]</span></td> </tr> </table> </td> </tr> <tr class="private"> <td width="15%" align="right" valign="top" class="summary"> <span class="summary-type"> </span> </td><td class="summary"> <a href="Bio.AlignIO-module.html#_FormatToIterator" class="summary-name" onclick="show_private();">_FormatToIterator</a> = <code title="{"clustal": ClustalIO.ClustalIterator, "emboss": EmbossIO.EmbossIterat\ or, "fasta-m10": FastaIO.FastaM10Iterator, "nexus": NexusIO.NexusItera\ tor, "phylip": PhylipIO.PhylipIterator, "stockholm": StockholmIO.Stock\ holmIterator,}">{"clustal": ClustalIO.ClustalIterator, "em<code class="variable-ellipsis">...</code></code> </td> </tr> <tr class="private"> <td width="15%" align="right" valign="top" class="summary"> <span class="summary-type"> </span> </td><td class="summary"> <a href="Bio.AlignIO-module.html#_FormatToWriter" class="summary-name" onclick="show_private();">_FormatToWriter</a> = <code title="{"phylip": PhylipIO.PhylipWriter, "stockholm": StockholmIO.StockholmWr\ iter, "clustal": ClustalIO.ClustalWriter,}">{"phylip": PhylipIO.PhylipWriter, "stockholm<code class="variable-ellipsis">...</code></code> </td> </tr> </table> <!-- ==================== FUNCTION DETAILS ==================== --> <a name="section-FunctionDetails"></a> <table class="details" border="1" cellpadding="3" cellspacing="0" width="100%" bgcolor="white"> <tr bgcolor="#70b0f0" class="table-header"> <td colspan="2" class="table-header"> <table border="0" cellpadding="0" cellspacing="0" width="100%"> <tr valign="top"> <td align="left"><span class="table-header">Function Details</span></td> <td align="right" valign="top" ><span class="options">[<a href="#section-FunctionDetails" class="privatelink" onclick="toggle_private();" >hide private</a>]</span></td> </tr> </table> </td> </tr> </table> <a name="write"></a> <div> <table class="details" border="1" cellpadding="3" cellspacing="0" width="100%" bgcolor="white"> <tr><td> <table width="100%" cellpadding="0" cellspacing="0" border="0"> <tr valign="top"><td> <h3 class="epydoc"><span class="sig"><span class="sig-name">write</span>(<span class="sig-arg">alignments</span>, <span class="sig-arg">handle</span>, <span class="sig-arg">format</span>)</span> </h3> </td><td align="right" valign="top" ><span class="codelink"><a href="Bio.AlignIO-pysrc.html#write">source code</a></span> </td> </tr></table> <p>Write complete set of alignments to a file.</p> <p>sequences - A list (or iterator) of Alignment objects handle - File handle object to write to format - What format to use.</p> <p>You should close the handle after calling this function.</p> <p>There is no return value.</p> <dl class="fields"> </dl> </td></tr></table> </div> <a name="_SeqIO_to_alignment_iterator"></a> <div class="private"> <table class="details" border="1" cellpadding="3" cellspacing="0" width="100%" bgcolor="white"> <tr><td> <table width="100%" cellpadding="0" cellspacing="0" border="0"> <tr valign="top"><td> <h3 class="epydoc"><span class="sig"><span class="sig-name">_SeqIO_to_alignment_iterator</span>(<span class="sig-arg">handle</span>, <span class="sig-arg">format</span>, <span class="sig-arg">seq_count</span>=<span class="sig-default">None</span>)</span> </h3> </td><td align="right" valign="top" ><span class="codelink"><a href="Bio.AlignIO-pysrc.html#_SeqIO_to_alignment_iterator">source code</a></span> </td> </tr></table> <pre class="literalblock"> Private function, uses Bio.SeqIO to create an Alignment iterator. handle - handle to the file. format - string describing the file format. seq_count- Optional integer, number of sequences expected in each alignment. Recommended for fasta format files. If count is omitted (default) then all the sequences in the file are combined into a single Alignment. </pre> <dl class="fields"> </dl> </td></tr></table> </div> <a name="parse"></a> <div> <table class="details" border="1" cellpadding="3" cellspacing="0" width="100%" bgcolor="white"> <tr><td> <table width="100%" cellpadding="0" cellspacing="0" border="0"> <tr valign="top"><td> <h3 class="epydoc"><span class="sig"><span class="sig-name">parse</span>(<span class="sig-arg">handle</span>, <span class="sig-arg">format</span>, <span class="sig-arg">seq_count</span>=<span class="sig-default">None</span>)</span> </h3> </td><td align="right" valign="top" ><span class="codelink"><a href="Bio.AlignIO-pysrc.html#parse">source code</a></span> </td> </tr></table> <pre class="literalblock"> Turns a sequence file into an iterator returning Alignment objects. handle - handle to the file. format - string describing the file format. seq_count- Optional integer, number of sequences expected in each alignment. Recommended for fasta format files. If you have the file name in a string 'filename', use: from Bio import AlignIO my_iterator = AlignIO.parse(open(filename,"rU"), format) If you have a string 'data' containing the file contents, use: from Bio import AlignIO from StringIO import StringIO my_iterator = AlignIO.parse(StringIO(data), format) Use the Bio.AlignIO.read(handle, format[, seq_count]) function when you expect a single record only. </pre> <dl class="fields"> </dl> </td></tr></table> </div> <a name="read"></a> <div> <table class="details" border="1" cellpadding="3" cellspacing="0" width="100%" bgcolor="white"> <tr><td> <table width="100%" cellpadding="0" cellspacing="0" border="0"> <tr valign="top"><td> <h3 class="epydoc"><span class="sig"><span class="sig-name">read</span>(<span class="sig-arg">handle</span>, <span class="sig-arg">format</span>, <span class="sig-arg">seq_count</span>=<span class="sig-default">None</span>)</span> </h3> </td><td align="right" valign="top" ><span class="codelink"><a href="Bio.AlignIO-pysrc.html#read">source code</a></span> </td> </tr></table> <pre class="literalblock"> Turns an alignment file into a single Alignment object. handle - handle to the file. format - string describing the file format. seq_count- Optional interger, number of sequences expected in the alignment to check you got what you expected. If the handle contains no alignments, or more than one alignment, an exception is raised. For example, using a PFAM/Stockholm file containing one alignment: from Bio import AlignIO align = AlignIO.read(open("example.sth"), "stockholm") If however you want the first alignment from a file containing multiple alignments this function would raise an exception. Instead use: from Bio import AlignIO align = AlignIO.parse(open("example.sth"), "stockholm").next() Use the Bio.AlignIO.parse() function if you want to read multiple records from the handle. </pre> <dl class="fields"> </dl> </td></tr></table> </div> <br /> <!-- ==================== VARIABLES DETAILS ==================== --> <a name="section-VariablesDetails"></a> <table class="details" border="1" cellpadding="3" cellspacing="0" width="100%" bgcolor="white"> <tr bgcolor="#70b0f0" class="table-header"> <td colspan="2" class="table-header"> <table border="0" cellpadding="0" cellspacing="0" width="100%"> <tr valign="top"> <td align="left"><span class="table-header">Variables Details</span></td> <td align="right" valign="top" ><span class="options">[<a href="#section-VariablesDetails" class="privatelink" onclick="toggle_private();" >hide private</a>]</span></td> </tr> </table> </td> </tr> </table> <a name="_FormatToIterator"></a> <div class="private"> <table class="details" border="1" cellpadding="3" cellspacing="0" width="100%" bgcolor="white"> <tr><td> <h3 class="epydoc">_FormatToIterator</h3> <dl class="fields"> </dl> <dl class="fields"> <dt>Value:</dt> <dd><table><tr><td><pre class="variable"> {"clustal": ClustalIO.ClustalIterator, "emboss": EmbossIO.EmbossIterat<span class="variable-linewrap"><img src="crarr.png" alt="\" /></span> or, "fasta-m10": FastaIO.FastaM10Iterator, "nexus": NexusIO.NexusItera<span class="variable-linewrap"><img src="crarr.png" alt="\" /></span> tor, "phylip": PhylipIO.PhylipIterator, "stockholm": StockholmIO.Stock<span class="variable-linewrap"><img src="crarr.png" alt="\" /></span> holmIterator,} </pre></td></tr></table> </dd> </dl> </td></tr></table> </div> <a name="_FormatToWriter"></a> <div class="private"> <table class="details" border="1" cellpadding="3" cellspacing="0" width="100%" bgcolor="white"> <tr><td> <h3 class="epydoc">_FormatToWriter</h3> <dl class="fields"> </dl> <dl class="fields"> <dt>Value:</dt> <dd><table><tr><td><pre class="variable"> {"phylip": PhylipIO.PhylipWriter, "stockholm": StockholmIO.StockholmWr<span class="variable-linewrap"><img src="crarr.png" alt="\" /></span> iter, "clustal": ClustalIO.ClustalWriter,} </pre></td></tr></table> </dd> </dl> </td></tr></table> </div> <br /> <!-- ==================== NAVIGATION BAR ==================== --> <table class="navbar" border="0" width="100%" cellpadding="0" bgcolor="#a0c0ff" cellspacing="0"> <tr valign="middle"> <!-- Tree link --> <th> <a href="module-tree.html">Trees</a> </th> <!-- Index link --> <th> <a href="identifier-index.html">Indices</a> </th> <!-- Help link --> <th> <a href="help.html">Help</a> </th> <th class="navbar" width="100%"></th> </tr> </table> <table border="0" cellpadding="0" cellspacing="0" width="100%%"> <tr> <td align="left" class="footer"> Generated by Epydoc 3.0.1 on Mon Sep 15 09:26:22 2008 </td> <td align="right" class="footer"> <a target="mainFrame" href="http://epydoc.sourceforge.net" >http://epydoc.sourceforge.net</a> </td> </tr> </table> <script type="text/javascript"> <!-- // Private objects are initially displayed (because if // javascript is turned off then we want them to be // visible); but by default, we want to hide them. So hide // them unless we have a cookie that says to show them. checkCookie(); // --> </script> </body> </html>