SPANNER
SPANNER (Similarity Profile ANNotatER) extends the lowest common ancestor (LCA) approach to perform supervised, homology-based taxonomic classification of metagenomic fragments.
License
This software is released under the GNU General Public License v3.0.
Obtaining the Software
SPANNER is available as a gzipped tarball, supported on *nix systems (Linux, Unix, Mac).
After downloading, uncompress the file and follow the installation instructions in the INSTALL.txt file. Additional instructions are located in the README.txt and MANUAL.txt files. To install SPANNER under OS X, gcc must be installed; it is available as part of the Mac Developer Tools.
Reference LCA Profiles are also available as a gzipped tarball. These LCA Profiles can be used as a reference dataset by SPANNER and were created from an all-versus-all BLASTp of nr at an e-value threshold of 0.01. These profiles will be updated as new versions of nr are released; the version used here was obtained from NCBI in March 2013. Note: this file is 5 GB compressed and 20 GB uncompressed.
Full instructions on using these reference LCA Profiles can be found in the README.txt file inside the tarball. Brief instructions for using these Profiles:
- These Profiles have not been trimmed to any p value; they include all BLASTp matches. In the SPANNER paper, the reference Profiles were trimmed to the same p value as the query Profiles. The SPANNER tarball contains the Python script "trim_profiles_to_p.py" in the SCRIPTS directory to remove matches below a given p value. Running this script on the reference Profiles will greatly reduce the size of the reference dataset.
- In the SPANNER paper, only the highest BLASTp match to any taxon was kept in a Profile; subsequent matches to that taxon were removed. This was not done for these Profiles. The Python script "make_profiles_taxa_singletons.py" (also in the SCRIPTS directory) removes subsequent matches to already matched taxa. Running this script will reduce the size of the reference Profile dataset, but it is not required.
- The SPANNER paper also removed LCA Profiles with only one match. This was not done for these Profiles. The tarball contains the Python script "remove_singletons_from_profiles.py" in the SCRIPTS directory. This script should be run after "make_profiles_taxa_singletons.py", but again it is not required for SPANNER to function. Reference Profiles with only one match should be removed if query Profiles with only one match are removed (those query Profiles should be classified using Best BLAST instead).
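The three optional reduction steps above can be illustrated with a short Python sketch. This is not the code of the bundled scripts, and it does not read the on-disk Profile format, which is documented in the README.txt inside the tarball. It assumes a hypothetical in-memory layout in which each Profile is a list of (taxon, value) matches and the value is whatever quantity is compared against p. All function names here are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory layout (an assumption, not SPANNER's file format):
# a Profile is a list of (taxon, value) matches.
Match = Tuple[str, float]
Profile = List[Match]


def trim_profile(profile: Profile, p: float) -> Profile:
    """Drop matches whose value is below p."""
    return [(taxon, value) for taxon, value in profile if value >= p]


def best_match_per_taxon(profile: Profile) -> Profile:
    """Keep only the highest-valued match for each taxon."""
    best: Dict[str, float] = {}
    for taxon, value in profile:
        if taxon not in best or value > best[taxon]:
            best[taxon] = value
    # dicts preserve insertion order, so taxa stay in first-seen order
    return list(best.items())


def remove_singleton_profiles(profiles: Dict[str, Profile]) -> Dict[str, Profile]:
    """Drop Profiles that contain only one match."""
    return {name: prof for name, prof in profiles.items() if len(prof) > 1}


def preprocess(profiles: Dict[str, Profile], p: float) -> Dict[str, Profile]:
    """Apply the three steps in the order described above."""
    reduced = {
        name: best_match_per_taxon(trim_profile(prof, p))
        for name, prof in profiles.items()
    }
    return remove_singleton_profiles(reduced)


if __name__ == "__main__":
    # Toy data, invented for this example
    profiles = {
        "seqA": [("Escherichia", 0.9), ("Escherichia", 0.7),
                 ("Salmonella", 0.6), ("Bacillus", 0.1)],
        "seqB": [("Bacillus", 0.8), ("Bacillus", 0.5)],
    }
    print(preprocess(profiles, 0.5))
    # {'seqA': [('Escherichia', 0.9), ('Salmonella', 0.6)]}
```

In this toy run, "seqB" collapses to a single Bacillus match after the per-taxon step and is then dropped as a singleton, which is why singleton removal is applied last. For real data, use the scripts in the SCRIPTS directory.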
Citing SPANNER
If you find this software helpful in your research, please cite:
- Porter, M.S., Beiko, R.G. SPANNER: Taxonomic assignment of sequences using pyramid matching of similarity profiles. Submitted to Bioinformatics.
Contact Information
SPANNER is in active development and we are interested in discussing all potential applications of this software. We encourage you to send us suggestions for new features. Suggestions, comments, and bug reports can be sent to Rob Beiko (beiko [at] cs.dal.ca). If reporting a bug, please provide as much information as possible, along with a simplified version of the data set that causes the bug. This will allow us to resolve the issue quickly.
Version History
v1.0.0 (December 17, 2012)
- initial software release.
Funding
The development of this software has been supported by several organizations:
- BEEM (Bioproducts + Enzymes from Environmental Metagenomes)
- Genome Atlantic
- The Canada Foundation for Innovation
- Genome Canada
- Ontario Genomics Institute
