Revision history for GO-TermFinder group of modules.

0.82  Wed May 14 13:34:01 2008

    No functional changes. Regenerated swig wrappers using 1.3.35, in
    the hope that this might fix some of the failures I am seeing from
    CPAN testers. I'm not sure if those errors are specific to a new
    version of gcc, or to Perl 5.10.0. If the last version worked for
    you without any errors, you don't need this one.

0.81  Tue May 13 16:06:55 2008

    - GO::TermFinder

      Made totalNumGenes a public method, which returns the total
      number of genes that are in the background set of genes from
      which the genes of interest were drawn. Unannotated genes are
      included in this count. This allowed me to fix a bug in
      analyze.pl, where it didn't always report the correct number of
      genes in the background.

    - GO::AnnotationProvider::AnnotationParser

      Fixed bug in handling a name that was ambiguous when considered
      in a case-insensitive fashion, but unambiguous when considered in
      a case-sensitive fashion. Thanks to Robert Flight for the bug
      report.

    - GO::OntologyProvider::OboParser

      Added additional logic to detect an error in the obo file if a
      node and a parent are in different namespaces, as this previously
      produced a fatal error with a cryptic message.

    - GO::TermFinder

      Prevent crash that could occur when there were no GO nodes
      tested - the assertion that the correction factor had to be >=1
      was incorrect in this case, which could cause the script to die.
      Now correctly recognizes this. Thanks to Mark Schroeder at
      Princeton for pointing this out.

    - GO::View

      I had previously commented out a check, which resulted in
      postscript images always being generated. Now they are only
      generated when requested.

    - GO-TermFinder-Obo.t

      Tweaked test to be regex instead of equality, as different Perls
      might add different things to exceptions. This fixes a failure I
      was seeing on Solaris.

0.80  Thu Mar 22 18:45:29 2007

    - GO::OntologyProvider::OboParser

      New module, contributed by Shuai Weng of SGD.
      This allows you to use the .obo files instead of the .ontology
      files. The .ontology files have been deprecated for a while, and
      are much bigger than the .obo version, as they contain redundant
      information. The ontology parsing module still ships, but will be
      deprecated in future versions. The test suite for the ontology
      parser no longer ships with the distribution, though I still test
      it locally. Note, the API for the OboParser is slightly
      different, in that you have to provide the aspect of the ontology
      that you want (P, C or F), which was not the case when parsing
      the .ontology files. All the example code in the examples/
      directory has been updated to work with the obo parser.

    - GO::TermFinder

      Updated to deal with the loss of the 'unknown' nodes in the
      ontology. With the removal of those nodes, direct annotation to
      the aspect node, such as biological_process, is meaningful. Thus,
      it now tests for significance for genes annotated directly to the
      aspect nodes, ignoring indirect annotations to those nodes.

      Added an additional check, such that if you provide a background
      population, but none of the genes in your list of interest are in
      that population, it dies with a useful error message (as opposed
      to the previously useless one).

      Corrected a bug in the Bonferroni correction, where it was too
      conservative - not sure what I was thinking.

    - GO::TermFinderReport::Text

      Can now print out the results in tabular form, instead of the
      usual text version, to help with automated analyses. Thanks to
      Noah Zimmerman for the code changes.

    - Code Quality

      I have started to use Perl::Critic locally, and all the code at
      least passes level 5. Future revisions will pass harsher review
      (with some rules commented out!).

    - Compilation

      The native/Makefile.PL has been modified to hopefully make it
      compile correctly under Cygwin - thanks to Noah Zimmerman for
      help with testing.
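
    As an illustration of the Bonferroni correction mentioned in the
    0.80 notes, here is a minimal sketch in Python. It is not the
    module's Perl code, and the function name and the choice of
    correction factor (the number of hypotheses tested) are assumptions
    made for the example; see the pod for GO::TermFinder for how the
    module actually determines the factor.

```python
def bonferroni_correct(p_values, num_hypotheses=None):
    """Return Bonferroni-corrected p-values, capped at 1.0.

    num_hypotheses is the correction factor; by default it is assumed
    to be the number of p-values supplied (hypothetical choice).
    """
    m = len(p_values) if num_hypotheses is None else num_hypotheses
    if m < 0:
        raise ValueError("number of hypotheses cannot be negative")
    # An empty input (no nodes tested) simply yields an empty result.
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    for raw, corrected in zip([0.001, 0.02, 0.5],
                              bonferroni_correct([0.001, 0.02, 0.5])):
        print(f"{raw:.3f} -> {corrected:.3f}")
```

    Note that an empty list of p-values returns an empty list rather
    than raising an error, which is the situation the 0.81 crash fix
    describes (no GO nodes tested).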

0.72  Thu Jul 27 16:44:14 2006

    - GO::TermFinder

      Forgot to have the __saveVariables method save the discarded
      genes when simulations were being run, or the FDR was being
      calculated. Consequently, this information was being lost if
      either FDR or simulations were requested. This is now fixed.

      Found a minor buglet, concerning how genes were selected for
      simulations if a population had been used. This didn't affect
      the actual results, but now is *slightly* faster.

    - General

      Started using the Test::Spelling module locally, and it found a
      few typos that have been corrected.

0.71  Sun Jul 23 16:15:20 2006

    - native/Makefile.PL

      Fix for the compilation of the C++ code that carries out the
      math. It is now hard-coded to use g++ as the LD and CC variables,
      and will complain if it can't find g++. If it fails, it will tell
      you what needs editing to hopefully fix it.

    - GO::TermFinder

      Small amount of refactoring on the findTerms() method, to make it
      easier to follow. This doesn't change the functionality.

      Added logic such that if a background population is provided, but
      the list of genes for which enriched terms are to be found
      contains genes not in that background population, then those
      genes are discarded from the calculation, and a new method,
      discardedGenes, allows the identity of those genes to be
      determined.

    - tests

      Added some new tests, and made sure all code uses strict and
      warnings.

0.70  Thu Nov 18 11:55:10 2004

    - GO::TermFinder

      Now uses an entirely new set of code for calculating the
      probability based on the hypergeometric distribution (written by
      Ihab Awad). The new code is written in C++, and interfaced to
      Perl using SWIG. For long running batch jobs, with > 100 lists of
      genes (using for instance analyze.pl in the examples directory),
      the new code is up to 3 times faster. Note that installation now
      requires an ANSI C++ compiler - see the README for more details.
      Note that the ability to use the binomial distribution for the
      probability calculation no longer exists - it will always use the
      hypergeometric distribution now.

    - GO::Utils::File

      Fixed small bug in that only one leading space would be stripped
      from a gene name in GenesFromFile - thanks to Linda McMahan at
      Princeton for spotting this one.

    - GO::TermFinder

      Made error message a little more explicit when a goid used to
      annotate a gene (as indicated by the annotation provider) does
      not appear in the provided ontology - thanks to John Matese for
      the suggestion.

    - GO::View

      Small addition that hopefully deals with a problem sometimes seen
      when running on Windows, that I think is due to line endings
      produced by the dot program.

0.64  Sun Aug 15 23:30:32 2004

    - GO::View

      Added the ability to create postscript output from GO::View -
      simply change the makePs attribute in the GoView.conf file to a
      '1', and if you run batchGoView.pl, you should get a postscript
      file as well as png or gif images. The Postscript is not perfect,
      because the GraphViz interface doesn't seem to allow me to modify
      the page size (even though it says I should be able to), so I
      have restricted the size of the image to try and get it to fit on
      one page. Suggestions as to how to make the postscript feature
      more robust would be appreciated.

      Added in copious comments to GO::View, so that most of it can
      actually be understood by regular programmers - performed some
      cleanups of unnecessary or obfuscated code in the process.
      Significant refactoring of this module is next on the agenda.

0.63  Wed Aug 11 11:16:23 2004

    - GO::View

      Fix to bug that was causing some significant nodes to not have
      the correct color, and thus making the image somewhat
      misleading - thanks to the folks at Princeton for spotting it,
      and to Shuai Weng at SGD for providing a fix in record time.

      Fixed bug in reading of configuration file, that resulted from
      minMapWidth being a substring of minMapWidth4OneLineKey.
      This now stops a warning being printed.

      Added some comments to GO::View, as a start to begin to fully
      comment all the code - there's a long way to go until I
      understand fully how GO::View works. Then I can add postscript
      output for publication quality figures...

0.62  Wed Jul 28 11:46:08 2004

    No major new functionality - small bug fix release

    - AnnotationParser.pm

      Fixed capitalization bugs in nameIsAnnotated(), which called
      goidsByDatabaseId() instead of goIdsByDatabaseId() on lines 1592
      and 1617 - thanks to lfriedl@cs.umass.edu for spotting this.

    - GoView.conf

      Fixed typo in GoView.conf : totalNumGene should be
      totalNumGenes - thanks to John Matese for spotting this one.

    - GO::TermFinder

      Made __databaseIds() method of GO::TermFinder public, and renamed
      it genesDatabaseIds, as I realized it is the only way that a
      client can determine how many genes were actually used when
      calculating p-values. Before, I was using the number of genes
      that were passed in, but they will get collapsed if more than one
      maps to the same databaseId.

    - batchGOView.pl

      Modified batchGOView.pl to use the genesDatabaseIds method.
      Removed its use of the CategorizeGenes function in
      GO::Utils::General, as I don't think there was any reasonable
      logic for using it (that I can remember).

    - GO::TermFinder

      Fixed lots of spurious warnings that were due to checking the
      state of a variable before it had been set, when using a user
      defined background population - thanks to Jeremy Gollub for
      spotting this one.

    - GO::TermFinder

      If a gene is passed in multiple times in either the list of genes
      of interest, or the background population, you should only be
      warned about that gene once for each list, rather than every time
      the gene is encountered.

0.61  Fri May 7 11:23:36 2004

    - GO::TermFinder

      Made one line optimization to calculation of p-value using the
      hypergeometric distribution, such that it is on average about
      twice as fast.
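
    For readers unfamiliar with the p-value calculation referred to in
    the 0.61 and 0.70 notes, here is a minimal Python sketch of an
    upper-tail hypergeometric probability. It is not the module's Perl
    or C++ code and does not reproduce the optimization described
    there; the function and parameter names are assumptions made for
    the example.

```python
from math import comb


def hypergeometric_p_value(k, n, M, N):
    """Probability of seeing k or more annotated genes in a list.

    N: total genes in the background population
    M: genes in the background annotated to the GO node
    n: genes in the list of interest
    k: genes in the list annotated to the GO node
    (all names are hypothetical, chosen for this sketch)
    """
    if not (0 <= M <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    upper = min(n, M)
    # Count the ways to draw i annotated and n - i unannotated genes,
    # for every i from k up to the largest possible value.
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # 10 genes in the background, 5 annotated, list of 2, both annotated.
    print(hypergeometric_p_value(2, 2, 5, 10))
```

    Exact integer arithmetic via math.comb keeps the sketch simple; it
    is not meant to be fast for large populations.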

0.60  Wed May 5 15:04:06 2004

    - GO::TermFinder

      The multiple hypothesis correction is no longer done using the
      custom method, but instead uses a Bonferroni correction.
      Correcting p-values by running simulations is now available, to
      allow you to control the Family Wise Error rate. Added the
      ability to calculate the False Discovery Rate, as a potential
      means of avoiding the whole p-value problem. See the pod for
      GO::TermFinder, as well as docs/GO-TermFinder.doc for more
      information on these.

    - tools

      Updated the batchGOView.pl tool, and the analyze.pl tool, in the
      examples directory, to support printing out of the False
      Discovery Rate if calculated. They also both use the newly
      created GO::TermFinderReport::Text object, to consolidate code,
      and to keep their reports consistent with one another.

    - README

      Tried to make it a little clearer as to how to install the
      libraries.

0.50  Tue Dec 16 17:34:50 2003

    Big news is that a version of Shuai Weng's (from SGD) GO::View
    module has been added in, which can create a graphic representation
    of the results of GO::TermFinder. This gives a much better way to
    intuitively look at the results. A bunch of other stuff was added
    in to support this, including a batch processor that will generate
    an html page with a graphic for any number of input lists of genes.
    See batchGOView.pl and examples.html in the examples directory.

    - GO::TermFinder

      Fixed a nasty bug that was actually due to a bug in Perl (I
      swear!) that meant that clients of the GO::TermFinder would have
      a runtime error if they did not recognize one or more of the
      genes that it was provided with. This should no longer happen,
      and I have written additional tests to make sure it never happens
      again!

0.40  Tue Dec 2 18:41:10 2003

    - GO::TermFinder

      Added in the ability to define a subpopulation of genes as the
      background from which the interesting genes were drawn. See pod
      for constructor of GO::TermFinder for more details.

0.30  Wed Nov 26 11:48:57 2003

    - GO::AnnotationProvider::AnnotationParser

      Extensive reworking of the code, such that it is now case
      insensitive, with certain caveats. See the pod for that module
      for more details.

    - GO::TermFinder

      No longer considers the root node, or its child (the aspect) as
      hypotheses, as they are known a priori to have a p-value of 1.

    - GO::Node

      Added lengthOfShortestPathToRoot and meanLengthOfPathsToRoot
      methods.

    - added in new tests for the AnnotationParser that make sure that
      the behaviour with respect to case insensitivity is correct.

0.23  Sat Nov 1 11:56:43 2003

    - GO::OntologyProvider::OntologyParser

      Added in check that a given non-comment line can actually have a
      GOID extracted, which makes the error more informative when such
      a line is encountered, due to an error in an ontology file.

0.22  Sun Oct 19 16:54:38 2003

    - GO::TermFinder

      Fix for test that could occasionally fail that relied on the sort
      order of the pValue array when two items had the same pvalue. It
      now sorts such cases explicitly by goid, so the result should
      always be the same.

0.21  Thu Oct 16 19:00:14 2003

    - GO::TermFinder

      Fix for situation when a gene identifier wasn't recognized, but
      wasn't handled properly - thanks to Shuai Weng for bringing it to
      my attention.

      Cleaned up the code that calculates the p-values to make it
      easier to write a test-suite, which will allow me to make other
      desired changes with more confidence.

    - tests

      Created tests for the GO::TermFinder module itself, that pass
      under both OSX and Solaris in my testing - this should make it
      much easier to modify in future with less concern for failing to
      notice new bugs.

    - examples

      Added a new example, that makes it easier to analyze multiple
      files of gene names.

    - suppressed some warnings that occurred due to some undefs being
      used.
    - fixed some documentation typos

0.2  Mon Apr 14 17:55:28 2003

    - added in code such that the findTerms() method will return enough
      data for you to be able to work out which genes in the list you
      provided were annotated to which GO nodes.

    - started some work on Annotation, AnnotatedGene, and Reference
      objects. Nothing is using them yet though.

    - added in over 100 tests(!). Still lots more to do, but this
      should help keep the code honest...

0.1  Fri Mar 7 13:51:32 2003

    - original version; created by h2xs 1.21 with options
      -n GO-TermFinder -X