Difference between revisions of "SPANNER"

From Bioinformatics Software
(New page: __NOTOC__ SPANNER (Simularity Profile ANNotatER) extends LCA to perform supervised homology-based taxonomic classification of metagenomic fragments. == License == This software is releas...)
 
 
(5 intermediate revisions by the same user not shown)
Line 12: Line 12:
 
* [[Media:SPANNER.tar.gz|SPANNER v1.0.0 (gzipped tarball)]]
 
  
After downloading, uncompress the file and follow the installation instructions in the INSTALL.txt file. Additional instructions are located in the README.txt and MANUAL.txt files. To install the FCP under OS X, gcc must be installed which is available as part of the [http://developer.apple.com/technologies/tools/ Mac Developer Tools].
+
After downloading, uncompress the file and follow the installation instructions in the INSTALL.txt file. Additional instructions are located in the README.txt and MANUAL.txt files. To install SPANNER under OS X, gcc must be installed which is available as part of the [http://developer.apple.com/technologies/tools/ Mac Developer Tools].
 +
 
 +
Reference LCA Profiles are also available as a gzipped tarball. These LCA Profiles can be used as a reference dataset by SPANNER and were created from an all-versus-all BLASTp of nr at an e-value threshold of 0.01. These profiles will be updated as new versions of nr are released; the version used here was obtained from ncbi in March 2013. Note: this file is 5 GB in size, uncompressed it is 20 GB.
 +
 
 +
* [http://kiwi.cs.dal.ca/~mikep/ReferencenrLCAProfiles.tar.gz Reference nr LCA Profiles v03.2013]
 +
 
 +
Full instructions on using these reference LCA Profiles can be found in the README.txt file inside the tarball. Brief instructions for using these Profiles:
 +
 
 +
* These Profiles have not been trimmed to any ''p'' value, but instead include all BLASTp matches. In the SPANNER paper the reference Profiles were trimmed to the same ''p'' value as the query Profiles. The SPANNER tarball contains the Python script "trim_profiles_to_p.py" in the SCRIPTS directory to remove matches below a given ''p'' value. Running this script on the reference Profiles will greatly reduce the size of the reference dataset.
 +
 
 +
* In the SPANNER paper only the highest BLASTp match to any taxon was kept in a Profile, subsequent matches were removed. This was not done for these Profiles. The Python script "make_profiles_taxa_singletons.py" (also in the SCRIPTS directory) removes subsequent matches to already matched taxa, this script will reduce the size of the reference Profile dataset but is not necessary.
 +
 
 +
* The SPANNER paper also removed LCA Profiles with only one match, this was not done for these Profiles. The tarball contains the Python script "remove_singletons_from_profiles.py" in the SCRIPTS directory. This script should be run after "make_profiles_taxa_singletons.py" but again is not necessary for SPANNER to function. Reference Profiles with only one match should be removed if query profiles with only one match are removed (those query profiles should be classified using Best BLAST instead).
  
 
== Citing SPANNER ==
 
Line 18: Line 30:
 
If you find this software helpful in your research, please cite:
 
  
* '''Porter, M.S., Beiko, R.G. SPANNER: Taxonomic assignment of sequences using pyramid matching of similarity profiles. ''Submitted to BMC Bioinformatics''.'''
+
* '''Porter, M.S., Beiko, R.G. SPANNER: Taxonomic assignment of sequences using pyramid matching of similarity profiles. ''Submitted to Bioinformatics''.'''
  
 
== Contact Information ==
 
Line 33: Line 45:
 
The development of this software has been supported by several organizations:
 
  
 +
* [http://www.beem.utoronto.ca BEEM (Bioproducts + Enzymes from Environmental Metagenomes)]
 
* [http://www.genomeatlantic.ca Genome Atlantic]
 
 
* The Canadian Foundation for Innovation
 
 
* Genome Canada
 
 
* Ontario Genomics Institute
 

Latest revision as of 09:49, 3 April 2013

SPANNER (Similarity Profile ANNotatER) extends LCA to perform supervised homology-based taxonomic classification of metagenomic fragments.

License

This software is released under the GNU General Public License v3.0.

Obtaining the Software

SPANNER is available as a gzipped tarball and is supported on *nix systems (Linux, Unix, Mac).

After downloading, uncompress the file and follow the installation instructions in the INSTALL.txt file. Additional instructions are located in the README.txt and MANUAL.txt files. To install SPANNER under OS X, gcc must be installed; it is available as part of the Mac Developer Tools.

Reference LCA Profiles are also available as a gzipped tarball. These LCA Profiles can be used as a reference dataset by SPANNER and were created from an all-versus-all BLASTp of nr at an e-value threshold of 0.01. These Profiles will be updated as new versions of nr are released; the version used here was obtained from NCBI in March 2013. Note: the tarball is 5 GB in size; uncompressed, it is 20 GB.
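Because the reference tarball expands to roughly 20 GB, it can be worth checking free disk space before extracting. The following is a minimal sketch using only the Python standard library; the function name extract_tarball and the required_bytes parameter are illustrative assumptions and are not part of SPANNER.

```python
import shutil
import tarfile
from pathlib import Path


def extract_tarball(archive, dest, required_bytes=0):
    """Extract a gzipped tarball into dest after checking free disk space.

    Hypothetical helper for illustration; not shipped with SPANNER.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)

    # Refuse to start if the target filesystem is too small for the output.
    free = shutil.disk_usage(dest).free
    if free < required_bytes:
        raise OSError(
            f"need {required_bytes} bytes free in {dest}, found {free}"
        )

    # "r:gz" opens the archive for reading with gzip decompression.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)

    # Return the top-level names now present in dest.
    return sorted(p.name for p in dest.iterdir())
```

For the reference Profiles, a call might look like extract_tarball("ReferencenrLCAProfiles.tar.gz", "reference_profiles", required_bytes=20 * 1024**3), where the output directory name is arbitrary.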

Full instructions on using these reference LCA Profiles can be found in the README.txt file inside the tarball. Brief instructions for using these Profiles:

  • These Profiles have not been trimmed to any p value; they include all BLASTp matches. In the SPANNER paper, the reference Profiles were trimmed to the same p value as the query Profiles. The SPANNER tarball contains the Python script "trim_profiles_to_p.py" in the SCRIPTS directory, which removes matches below a given p value. Running this script on the reference Profiles will greatly reduce the size of the reference dataset.
  • In the SPANNER paper, only the highest BLASTp match to any taxon was kept in a Profile; subsequent matches to that taxon were removed. This was not done for these Profiles. The Python script "make_profiles_taxa_singletons.py" (also in the SCRIPTS directory) removes subsequent matches to already matched taxa. Running it will reduce the size of the reference Profile dataset but is not required.
  • The SPANNER paper also removed LCA Profiles with only one match; this was not done for these Profiles. The tarball contains the Python script "remove_singletons_from_profiles.py" in the SCRIPTS directory for this purpose. This script should be run after "make_profiles_taxa_singletons.py", but again it is not required for SPANNER to function. If query Profiles with only one match are removed, reference Profiles with only one match should be removed as well (those query Profiles should be classified using Best BLAST instead).
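The order of these three steps matters, and a toy example may make the reason clearer. The sketch below does not use the SPANNER Profile file format or call the actual scripts. It assumes, purely for illustration, that a Profile is a list of (taxon, value) pairs in which a larger value is a better match, and that trimming drops pairs whose value is below a cutoff. All function names and data are hypothetical.

```python
def trim_matches(profile, cutoff):
    """Drop matches whose value falls below the cutoff (toy stand-in for trimming)."""
    return [(taxon, value) for taxon, value in profile if value >= cutoff]


def keep_best_match_per_taxon(profile):
    """Keep only the highest-valued match for each taxon."""
    best = {}
    for taxon, value in profile:
        if taxon not in best or value > best[taxon]:
            best[taxon] = value
    # dict preserves insertion order, so taxa stay in first-seen order
    return list(best.items())


def preprocess_reference(profiles, cutoff):
    """Apply the three steps in the recommended order.

    Profiles left with fewer than two matches are removed last, because
    the first two steps can shrink a Profile down to a single match.
    """
    cleaned = {}
    for name, matches in profiles.items():
        matches = trim_matches(matches, cutoff)
        matches = keep_best_match_per_taxon(matches)
        if len(matches) > 1:
            cleaned[name] = matches
    return cleaned


# Toy data: names, taxa, and values are invented for illustration only.
toy_profiles = {
    "seqA": [("Escherichia", 0.9), ("Escherichia", 0.8), ("Salmonella", 0.7)],
    "seqB": [("Bacillus", 0.95), ("Bacillus", 0.6)],
    "seqC": [("Vibrio", 0.9), ("Shewanella", 0.2)],
}
result = preprocess_reference(toy_profiles, cutoff=0.5)
print(result)
```

In this toy data, seqB has two matches but both hit the same taxon, so it only becomes a one-match Profile after the per-taxon step. Removing singletons before that step would have left it in the dataset.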

Citing SPANNER

If you find this software helpful in your research, please cite:

  • Porter, M.S., Beiko, R.G. SPANNER: Taxonomic assignment of sequences using pyramid matching of similarity profiles. Submitted to Bioinformatics.

Contact Information

SPANNER is in active development and we are interested in discussing all potential applications of this software. We encourage you to send us suggestions for new features. Suggestions, comments, and bug reports can be sent to Rob Beiko (beiko [at] cs.dal.ca). If reporting a bug, please provide as much information as possible and a simplified version of the data set which causes the bug. This will allow us to quickly resolve the issue.

Version History

v1.0.0 (December 17, 2012)

  • initial software release.

Funding

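The development of this software has been supported by several organizations:

  • BEEM (Bioproducts + Enzymes from Environmental Metagenomes)
  • Genome Atlantic
  • The Canadian Foundation for Innovation
  • Genome Canada
  • Ontario Genomics Institute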