https://beikolab.cs.dal.ca/software/api.php?action=feedcontributions&user=Beiko&feedformat=atomBioinformatics Software - User contributions [en]2024-03-28T13:09:08ZUser contributionsMediaWiki 1.34.0https://beikolab.cs.dal.ca/software/index.php?title=SimDEF&diff=1414SimDEF2015-09-04T23:53:09Z<p>Beiko: Created page with "simDEF is currently hosted [http://iwera.ir/~ahmad/dal/ here]."</p>
<hr />
<div>simDEF is currently hosted [http://iwera.ir/~ahmad/dal/ here].</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=1413Main Page2015-09-04T23:51:14Z<p>Beiko: /* Software (Currently Supported) */</p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software (Currently Supported) ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
* [[simDEF | simDEF]]: Definition-based Semantic Similarity Measure of GO Terms for Functional Similarity Analysis of Genes.<br />
<br />
* [[ExpressBetaDiversity | Express Beta Diversity (EBD)]]: taxon- and phylogenetic-based beta diversity measures. (P)<br />
<br />
* [[FCP | Fragment classification package (FCP)]]: Homology- and composition-based classifiers for assigning a taxonomic attribution to metagenomic fragments. (GML)<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [[NetworkDiversity| Network Diversity]]: calculation of beta diversity over phylogenetic networks. (P)<br />
<br />
* [http://kiwi.cs.dal.ca/Software/rSPR rSPR]: software to calculate rooted subtree-prune-and-regraft distances and rooted agreement forests. (PL)<br />
<br />
* [[SPANNER]]: Homology-based taxonomic classification of protein sequences. (GML)<br />
<br />
* [[SPRSupertrees]]: software to calculate rooted supertrees that minimize the SPR distance. (PL)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
* PICA: genotype-phenotype data mining software (G). A [https://github.com/univieCUBE/PICA new version] of the software has been developed by the [http://cube.univie.ac.at/people Rattei group] at Universität Wien and is hosted on Github . The older version from our lab, which is no longer supported, can be found [[PICA | here]].<br />
<br />
== Older Software ==<br />
<br />
The following software packages have largely been superseded by others in the above list, or by software written by others. The software should still work, but we can no longer offer significant support for it.<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [[RITA]]: Rapid Identification of Taxonomic Assignments for metagenomic fragments (M).<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
== Mailing list ==<br />
<br />
* Join our [http://ratite.cs.dal.ca/software_email_list/subscribe mailing list] to keep informed about new developments.<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]<br />
* [http://www.cs.dal.ca/~whidden Chris Whidden]<br />
* Norman MacDonald<br />
<br />
* [[TBD]]<br />
* [[MEGASAT]]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=1412Main Page2015-07-13T13:28:47Z<p>Beiko: /* Software (Currently Supported) */</p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software (Currently Supported) ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
* [[ExpressBetaDiversity | Express Beta Diversity (EBD)]]: taxon- and phylogenetic-based beta diversity measures. (P)<br />
<br />
* [[FCP | Fragment classification package (FCP)]]: Homology- and composition-based classifiers for assigning a taxonomic attribution to metagenomic fragments. (GML)<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [[NetworkDiversity| Network Diversity]]: calculation of beta diversity over phylogenetic networks. (P)<br />
<br />
* [http://kiwi.cs.dal.ca/Software/rSPR rSPR]: software to calculate rooted subtree-prune-and-regraft distances and rooted agreement forests. (PL)<br />
<br />
* [[SPANNER]]: Homology-based taxonomic classification of protein sequences. (GML)<br />
<br />
* [[SPRSupertrees]]: software to calculate rooted supertrees that minimize the SPR distance. (PL)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
* PICA: genotype-phenotype data mining software (G). A [https://github.com/univieCUBE/PICA new version] of the software has been developed by the [http://cube.univie.ac.at/people Rattei group] at Universität Wien and is hosted on Github . The older version from our lab, which is no longer supported, can be found [[PICA | here]].<br />
<br />
== Older Software ==<br />
<br />
The following software packages have largely been superseded by others in the above list, or by software written by others. The software should still work, but we can no longer offer significant support for it.<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [[RITA]]: Rapid Identification of Taxonomic Assignments for metagenomic fragments (M).<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
== Mailing list ==<br />
<br />
* Join our [http://ratite.cs.dal.ca/software_email_list/subscribe mailing list] to keep informed about new developments.<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]<br />
* [http://www.cs.dal.ca/~whidden Chris Whidden]<br />
* Norman MacDonald<br />
<br />
* [[TBD]]<br />
* [[MEGASAT]]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=1410Main Page2015-07-10T12:19:08Z<p>Beiko: /* Software (Currently Supported) */</p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software (Currently Supported) ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
* [[ExpressBetaDiversity | Express Beta Diversity (EBD)]]: taxon- and phylogenetic-based beta diversity measures. (P)<br />
<br />
* [[FCP | Fragment classification package (FCP)]]: Homology- and composition-based classifiers for assigning a taxonomic attribution to metagenomic fragments. (GML)<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [[NetworkDiversity| Network Diversity]]: calculation of beta diversity over phylogenetic networks. (P)<br />
<br />
* [http://kiwi.cs.dal.ca/Software/rSPR rSPR]: software to calculate rooted subtree-prune-and-regraft distances and rooted agreement forests. (PL)<br />
<br />
* [[SPANNER]]: Homology-based taxonomic classification of protein sequences. (GML)<br />
<br />
* [[SPRSupertrees]]: software to calculate rooted supertrees that minimize the SPR distance. (PL)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
* PICA: genotype-phenotype data mining software (G). A [https://github.com/univieCUBE/PICA new version] of the software has been developed by the [http://cube.univie.ac.at/people Rattei group] at Universität Wien in Kinderhand and is hosted on Github . The older version from our lab, which is no longer supported, can be found [[PICA | here]].<br />
<br />
== Older Software ==<br />
<br />
The following software packages have largely been superseded by others in the above list, or by software written by others. The software should still work, but we can no longer offer significant support for it.<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [[RITA]]: Rapid Identification of Taxonomic Assignments for metagenomic fragments (M).<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
== Mailing list ==<br />
<br />
* Join our [http://ratite.cs.dal.ca/software_email_list/subscribe mailing list] to keep informed about new developments.<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]<br />
* [http://www.cs.dal.ca/~whidden Chris Whidden]<br />
* Norman MacDonald<br />
<br />
* [[TBD]]<br />
* [[MEGASAT]]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=1409Main Page2015-07-10T12:18:13Z<p>Beiko: </p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software (Currently Supported) ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
* [[ExpressBetaDiversity | Express Beta Diversity (EBD)]]: taxon- and phylogenetic-based beta diversity measures. (P)<br />
<br />
* [[FCP | Fragment classification package (FCP)]]: Homology- and composition-based classifiers for assigning a taxonomic attribution to metagenomic fragments. (GML)<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [[NetworkDiversity| Network Diversity]]: calculation of beta diversity over phylogenetic networks. (P)<br />
<br />
* [http://kiwi.cs.dal.ca/Software/rSPR rSPR]: software to calculate rooted subtree-prune-and-regraft distances and rooted agreement forests. (PL)<br />
<br />
* [[SPANNER]]: Homology-based taxonomic classification of protein sequences. (GML)<br />
<br />
* [[SPRSupertrees]]: software to calculate rooted supertrees that minimize the SPR distance. (PL)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
* PICA: genotype-phenotype data mining software (G). A [https://github.com/univieCUBE/PICA new version] of the software has been developed by the [http://cube.univie.ac.at/people Rattei group] at Universität Wien in Kinderhand and is hosted on Github . The older version, which is no longer supported, can be found [[PICA]] here.<br />
<br />
== Older Software ==<br />
<br />
The following software packages have largely been superseded by others in the above list, or by software written by others. The software should still work, but we can no longer offer significant support for it.<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [[RITA]]: Rapid Identification of Taxonomic Assignments for metagenomic fragments (M).<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
== Mailing list ==<br />
<br />
* Join our [http://ratite.cs.dal.ca/software_email_list/subscribe mailing list] to keep informed about new developments.<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]<br />
* [http://www.cs.dal.ca/~whidden Chris Whidden]<br />
* Norman MacDonald<br />
<br />
* [[TBD]]<br />
* [[MEGASAT]]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=1408Main Page2015-07-10T12:16:31Z<p>Beiko: /* Software (Currently Supported) */</p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software (Currently Supported) ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
* [[ExpressBetaDiversity | Express Beta Diversity (EBD)]]: taxon- and phylogenetic-based beta diversity measures. (P)<br />
<br />
* [[FCP | Fragment classification package (FCP)]]: Homology- and composition-based classifiers for assigning a taxonomic attribution to metagenomic fragments. (GML)<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [[NetworkDiversity| Network Diversity]]: calculation of beta diversity over phylogenetic networks. (P)<br />
<br />
* [http://kiwi.cs.dal.ca/Software/rSPR rSPR]: software to calculate rooted subtree-prune-and-regraft distances and rooted agreement forests. (PL)<br />
<br />
* [[SPANNER]]: Homology-based taxonomic classification of protein sequences. (GML)<br />
<br />
* [[SPRSupertrees]]: software to calculate rooted supertrees that minimize the SPR distance. (PL)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
* PICA: genotype-phenotype data mining software (G). A [https://github.com/univieCUBE/PICA new version] of the software has been developed by the Rattei group at Universität Wien in Kinderhand and is hosted on Github . The older version, which is no longer supported, can be found [[PICA]] here.<br />
<br />
== Older Software ==<br />
<br />
The following software packages have largely been superseded by others in the above list, or by software written by others. The software should still work, but we can no longer offer significant support for it.<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [[RITA]]: Rapid Identification of Taxonomic Assignments for metagenomic fragments (M).<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
== Mailing list ==<br />
<br />
* Join our [http://ratite.cs.dal.ca/software_email_list/subscribe mailing list] to keep informed about new developments.<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]<br />
* [http://www.cs.dal.ca/~whidden Chris Whidden]<br />
* Norman MacDonald<br />
<br />
* [[TBD]]<br />
* [[MEGASAT]]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=1407Main Page2015-07-10T12:14:06Z<p>Beiko: </p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software (Currently Supported) ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
* [[ExpressBetaDiversity | Express Beta Diversity (EBD)]]: taxon- and phylogenetic-based beta diversity measures. (P)<br />
<br />
* [[FCP | Fragment classification package (FCP)]]: Homology- and composition-based classifiers for assigning a taxonomic attribution to metagenomic fragments. (GML)<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [[NetworkDiversity| Network Diversity]]: calculation of beta diversity over phylogenetic networks. (P)<br />
<br />
* [http://kiwi.cs.dal.ca/Software/rSPR rSPR]: software to calculate rooted subtree-prune-and-regraft distances and rooted agreement forests. (PL)<br />
<br />
* [[SPANNER]]: Homology-based taxonomic classification of protein sequences. (GML)<br />
<br />
* [[SPRSupertrees]]: software to calculate rooted supertrees that minimize the SPR distance. (PL)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
* PICA: genotype-phenotype data mining software (G). A [https://github.com/univieCUBE/PICA new version] of the software has been developed by the *** group at *** and is hosted on Github . The older version, which is no longer supported, can be found [[PICA]] here.<br />
<br />
<br />
== Older Software ==<br />
<br />
The following software packages have largely been superseded by others in the above list, or by software written by others. The software should still work, but we can no longer offer significant support for it.<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [[RITA]]: Rapid Identification of Taxonomic Assignments for metagenomic fragments (M).<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
== Mailing list ==<br />
<br />
* Join our [http://ratite.cs.dal.ca/software_email_list/subscribe mailing list] to keep informed about new developments.<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]<br />
* [http://www.cs.dal.ca/~whidden Chris Whidden]<br />
* Norman MacDonald<br />
<br />
* [[TBD]]<br />
* [[MEGASAT]]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=MEGASAT&diff=1344MEGASAT2015-05-01T18:54:08Z<p>Beiko: /* Running “SatGenotype.pl” */</p>
<hr />
<div><br />
== This page is currently under development. Please check back for release information about MEGASAT and updated documentation. ==<br />
<br />
== Overview ==<br />
<br />
<br />
The current version of MEGASAT is 1.0. The MEGASAT scripts should work with any relatively recent version of Perl and have been tested with versions 5.18.2 and 5.16.3.<br />
<br />
== Installing MEGASAT ==<br />
<br />
=== Windows ===<br />
In this “MEGASAT_1.0 for Windows” folder, there are two Perl executable files and two Executable Jar Files. MEGASAT_GUI is the graphical user interface for running the main Perl executable file called “MS”. “update_GUI” is the graphical user interface for running the Perl executable file called “updateMatrix”.<br />
<br />
If Perl is already installed in your computer, you can just click the Start button and go to your Perl interpreter to run the Perl scripts. If you don’t have Perl installed but still want to run the Perl scripts, here are the two main distributions for Windows: ActivePerl (http://www.activestate.com/activePerl) and Strawberry Perl (http://strawberryPerl.com/). We have used the latter in development and testing of MEGASAT, and recommend its use. The script “SatGenotype.pl” uses no complicated library functions, while the script “updateMatrix.pl” uses two packages called “Spreadsheet::ParseExcel” and “Spreadsheet::WriteExcel”. Here is the link of the installation instructions for “Spreadsheet::WriteExcel”: http://www.j-tsurugashima.com/cgi/lib/Spreadsheet/WriteExcel/doc/install.html. <br />
<br />
=== Macintosh ===<br />
If you are using a Macintosh system, Perl should already be installed; type “Perl –v” at the command line to ensure this is the case. So Perl scripts can be easily invoked from the terminal on Mac system. But if you don’t want to run scripts in terminal, two simple GUIs are also offered to invoke those two executable Perl scripts.<br />
In order to run “updateMatrix.pl”, two packages “Spreadsheet::ParseExcel” and “Spreadsheet::WriteExcel” should be installed. The link of installation instructions is: http://www.j-tsurugashima.com/cgi/lib/Spreadsheet/WriteExcel/doc/install.html.<br />
<br />
=== Linux ===<br />
Perl should already be installed on Linux system. So it’s easy to go to terminal to invoke Perl scripts. For running “updateMatrix.pl”, follow the installation instructions as above.<br />
<br />
== Input file formats ==<br />
“SatGenotype.pl” requires an input file with information about PCR primers, and a set of .fastq files representing reads from each sampled locus.<br />
<br />
=== Primer file ===<br />
The primer file must be in a tab-separated format, with the following headers:<br />
- Column1: locus name<br />
- Column2: forward primers<br />
- Column3: reverse primers<br />
- Column4: 3’ flank<br />
- Column5: 5’ flank<br />
- Column6: the repeat array<br />
<br />
In this txt file, a header line is required to specify the column name. If one locus doesn’t have 3’ flank, a character “A” needs to be written in the 3’ flank column in that txt file. But if it doesn’t have 5’ flank, nothing needs to be written in the 5’ flank column.<br />
<br />
Here is an [[Media:Guppy primers.txt|example primer file]].<br />
<br />
=== Input sequence file ===<br />
Input sequence read files must be in standard FASTQ format.<br />
<br />
Here is an [[Media:MEGASAT Example.fastq|example of a short FASTQ file that will work with the primer file above.]]<br />
<br />
“updateMatrix.pl” requires an input file with original genotyping and new genotyping information you want to update to. This scores txt file must be in comma-separated format, with the following headers:<br />
- Column1: locus name<br />
- Column2 & Column3: original genotype<br />
- Column4 & Column5: new genotype<br />
<br />
Another input file is your original genotyping excel file that contains genotyping information for different individuals and loci.<br />
<br />
== Running MEGASAT ==<br />
<br />
=== Running “SatGenotype.pl” ===<br />
If you don’t want to use command line to invoke scripts, a simple GUI is provided for Windows, Mac and Linux users. Double click the “MEGASAT_GUI” will display a pop-up page. In this page, you can click the first “Open” button to open your input primers file. The text field under the “Open” button will display the directory of your primers file. The second small text field is for typing the number of mismatches. The second “Open” button is to open the data set folder that contains the input sequence read files. The “Choose” button is to choose the directory to save your output folder. Two radio buttons in this page offer two options- compress the output folder or not compress the output folder. After all these parameters are filled, click the “Run the program” to run the Perl scripts.<br />
<br />
If you want to run the scripts from command line, for Windows users, make sure you already have Perl installed in your system. We assume that “SatGenotype.pl” and your primers txt file “primers.txt” are saved in the directory “C:\Users\Andy\Downloads”. And the data set folder “dataset” is also saved in the directory “C:\Users\Andy\Downloads”. In order to run the Perl script, first step is go back to the command prompt and type “perl C:\Users\Andy\Downloads\ SatGenotype.pl C:\Users\Andy\Downloads\primers.txt 2 C:\Users\Andy\Downloads\dataset C:\Users\Andy\Desktop”. <br />
<br />
The first command-line argument is the directory of primers txt file. The second command-line argument is the number of mismatches (2 is a good choice to set). Next argument is the directory of data set folder that contains input sequence read files. The last command-line argument specifies the directory where you want to save your output. After this script is completed, an output folder called “Output_dataset” will be in the saving directory you type in the command line.<br />
=== Running “updateMatrix.pl” ===<br />
A simple GUI “update_GUI” is also provided for Windows, Mac and Linux users. Double click the “update_GUI” will display a pop-up page. In this page, the first “Open” button is to open your scores txt file that contains the original genotyping and new genotyping information you want to update to. The second “Open” button is to open the original genotyping excel file. The “Choose” button is to choose the directory to save the new excel file. After all these parameters are filled, click the “Run the program” to run “updateMatrix.pl”.<br />
<br />
If you want to run the scripts from command line, for Windows users, make sure you already have Perl installed in your system. We assume that “updateMatrix.pl” and your scores txt file “Scores.txt” are saved in the directory “C:\Users\Andy\Downloads”. And the original genotyping excel file is also saved in the directory “C:\Users\Andy\Downloads”. In order to run the Perl script, first step is go back to the command prompt and type “perl C:\Users\Andy\Downloads\ updateMatrix.pl C:\Users\Andy\Downloads\Scores.txt C:\Users\Andy\Downloads\output.xls C:\Users\Andy\Desktop”. <br />
<br />
The first command-line argument is the directory of scores txt file. The second command-line argument is the directory of original genotyping excel file (This excel file comes from the output.txt in the output folder generated by “SatGenotype.pl”, you can save the output.txt into excel file). The last command-line argument specifies the directory where you want to save your new excel file. After this script is completed, a new excel file called “Newoutput.xls” will be in the saving directory you type in the command line.<br />
<br />
=== Output folder ===<br />
<br />
The output folder generated by “SatGenotype.pl” has three types of files. In this folder, “output.txt” is a comma-separated txt file that gives all the genotype information for all the individuals and loci. In this output.txt file, “X X” means that this locus doesn’t occur in this individual. “0 0” means that the depth of alleles is too small to score. “Unscorable Unscorable” means that there are three possible real alleles, which makes the genotype difficult to be determined. You can use Microsoft Excel to open this csv file, which make this txt file more easily to read.<br />
<br />
Those txt files whose names start with “output” and follow by the individual name show the length distribution of each microsatellite locus. In those txt files, the first row illustrates the different length for all the loci in one individual. Each row under the first row shows the number of the occurrences of different lengths for each locus. The last column is the genotype information for all loci in one individual.<br />
<br />
And those split files whose names start with “Sorted” and follow by the individual name and locus name contain all the non-trimmed sequences for one individual & one locus. Obviously, those split files whose names start with “Trimmed” have all the trimmed sequences.<br />
<br />
For another Perl script “updateMatrix.pl” that helps to update the output.xls very fast, the output is an excel file called “Newoutput.xls”. This “Newoutput.xls” has all the updated genotyping information.</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=MEGASAT&diff=1343MEGASAT2015-05-01T16:35:18Z<p>Beiko: </p>
<hr />
<div><br />
== This page is currently under development. Please check back for release information about MEGASAT and updated documentation. ==<br />
<br />
== Overview ==<br />
<br />
<br />
The current version of MEGASAT is 1.0. The MEGASAT scripts should work with any relatively recent version of Perl and have been tested with versions 5.18.2 and 5.16.3.<br />
<br />
== Installing MEGASAT ==<br />
<br />
=== Windows ===<br />
In this “MEGASAT_1.0 for Windows” folder, there are two Perl executable files and two Executable Jar Files. MEGASAT_GUI is the graphical user interface for running the main Perl executable file called “MS”. “update_GUI” is the graphical user interface for running the Perl executable file called “updateMatrix”.<br />
<br />
If Perl is already installed in your computer, you can just click the Start button and go to your Perl interpreter to run the Perl scripts. If you don’t have Perl installed but still want to run the Perl scripts, here are the two main distributions for Windows: ActivePerl (http://www.activestate.com/activePerl) and Strawberry Perl (http://strawberryPerl.com/). We have used the latter in development and testing of MEGASAT, and recommend its use. The script “SatGenotype.pl” uses no complicated library functions, while the script “updateMatrix.pl” uses two packages called “Spreadsheet::ParseExcel” and “Spreadsheet::WriteExcel”. Here is the link of the installation instructions for “Spreadsheet::WriteExcel”: http://www.j-tsurugashima.com/cgi/lib/Spreadsheet/WriteExcel/doc/install.html. <br />
<br />
=== Macintosh ===<br />
If you are using a Macintosh system, Perl should already be installed; type “Perl –v” at the command line to ensure this is the case. So Perl scripts can be easily invoked from the terminal on Mac system. But if you don’t want to run scripts in terminal, two simple GUIs are also offered to invoke those two executable Perl scripts.<br />
In order to run “updateMatrix.pl”, two packages “Spreadsheet::ParseExcel” and “Spreadsheet::WriteExcel” should be installed. The link of installation instructions is: http://www.j-tsurugashima.com/cgi/lib/Spreadsheet/WriteExcel/doc/install.html.<br />
<br />
=== Linux ===<br />
Perl should already be installed on Linux system. So it’s easy to go to terminal to invoke Perl scripts. For running “updateMatrix.pl”, follow the installation instructions as above.<br />
<br />
== Input file formats ==<br />
“SatGenotype.pl” requires an input file with information about PCR primers, and a set of .fastq files representing reads from each sampled locus.<br />
<br />
=== Primer file ===<br />
The primer file must be in a tab-separated format, with the following headers:<br />
- Column1: locus name<br />
- Column2: forward primers<br />
- Column3: reverse primers<br />
- Column4: 3’ flank<br />
- Column5: 5’ flank<br />
- Column6: the repeat array<br />
<br />
In this txt file, a header line is required to specify the column name. If one locus doesn’t have 3’ flank, a character “A” needs to be written in the 3’ flank column in that txt file. But if it doesn’t have 5’ flank, nothing needs to be written in the 5’ flank column.<br />
<br />
Here is an [[Media:Guppy primers.txt|example primer file]].<br />
<br />
=== Input sequence file ===<br />
Input sequence read files must be in standard FASTQ format.<br />
<br />
Here is an [[Media:MEGASAT Example.fastq|example of a short FASTQ file that will work with the primer file above.]]<br />
<br />
“updateMatrix.pl” requires an input file with original genotyping and new genotyping information you want to update to. This scores txt file must be in comma-separated format, with the following headers:<br />
- Column1: locus name<br />
- Column2 & Column3: original genotype<br />
- Column4 & Column5: new genotype<br />
<br />
Another input file is your original genotyping excel file that contains genotyping information for different individuals and loci.<br />
<br />
== Running MEGASAT ==<br />
<br />
=== Running “SatGenotype.pl” ===<br />
If you don’t want to use command line to invoke scripts, a simple GUI is provided for Windows, Mac and Linux users. Double click the “MEGASAT_GUI” will display a pop-up page. In this page, you can click the first “Open” button to open your input primers file. The text field under the “Open” button will display the directory of your primers file. The second small text field is for typing the number of mismatches. The second “Open” button is to open the data set folder that contains the input sequence read files. The “Choose” button is to choose the directory to save your output folder. Two radio buttons in this page offer two options- compress the output folder or not compress the output folder. After all these parameters are filled, click the “Run the program” to run the Perl scripts.<br />
<br />
If you want to run the scripts from command line, for Windows users, make sure you already have Perl installed in your system. We assume that “SatGenotype.pl” and your primers txt file “primers.txt” are saved in the directory “C:\Users\Andy\Downloads”. And the data set folder “dataset” is also saved in the directory “C:\Users\Andy\Downloads”. In order to run the Perl script, first step is go back to the command prompt and type “perl C:\Users\Andy\Downloads\ SatGenotype.pl C:\Users\Andy\Downloads\primers.txt 2 C:\Users\Andy\Downloads\dataset C:\Users\Andy\Desktop”. <br />
<br />
The first command-line argument is the directory of primers txt file. The second command-line argument is the number of mismatches (2 is a good choice to set). Next argument is the directory of data set folder that contains input sequence read files. The last command-line argument specifies the directory where you want to save your output. After this script is completed, an output folder called “Output_dataset” will be in the saving directory you type in the command line.<br />
Running “updateMatrix.pl”<br />
A simple GUI “update_GUI” is also provided for Windows, Mac and Linux users. Double click the “update_GUI” will display a pop-up page. In this page, the first “Open” button is to open your scores txt file that contains the original genotyping and new genotyping information you want to update to. The second “Open” button is to open the original genotyping excel file. The “Choose” button is to choose the directory to save the new excel file. After all these parameters are filled, click the “Run the program” to run “updateMatrix.pl”.<br />
<br />
If you want to run the scripts from command line, for Windows users, make sure you already have Perl installed in your system. We assume that “updateMatrix.pl” and your scores txt file “Scores.txt” are saved in the directory “C:\Users\Andy\Downloads”. And the original genotyping excel file is also saved in the directory “C:\Users\Andy\Downloads”. In order to run the Perl script, first step is go back to the command prompt and type “perl C:\Users\Andy\Downloads\ updateMatrix.pl C:\Users\Andy\Downloads\Scores.txt C:\Users\Andy\Downloads\output.xls C:\Users\Andy\Desktop”. <br />
<br />
The first command-line argument is the directory of scores txt file. The second command-line argument is the directory of original genotyping excel file (This excel file comes from the output.txt in the output folder generated by “SatGenotype.pl”, you can save the output.txt into excel file). The last command-line argument specifies the directory where you want to save your new excel file. After this script is completed, a new excel file called “Newoutput.xls” will be in the saving directory you type in the command line.<br />
<br />
=== Output folder ===<br />
<br />
The output folder generated by “SatGenotype.pl” has three types of files. In this folder, “output.txt” is a comma-separated txt file that gives all the genotype information for all the individuals and loci. In this output.txt file, “X X” means that this locus doesn’t occur in this individual. “0 0” means that the depth of alleles is too small to score. “Unscorable Unscorable” means that there are three possible real alleles, which makes the genotype difficult to be determined. You can use Microsoft Excel to open this csv file, which make this txt file more easily to read.<br />
<br />
Those txt files whose names start with “output” and follow by the individual name show the length distribution of each microsatellite locus. In those txt files, the first row illustrates the different length for all the loci in one individual. Each row under the first row shows the number of the occurrences of different lengths for each locus. The last column is the genotype information for all loci in one individual.<br />
<br />
And those split files whose names start with “Sorted” and follow by the individual name and locus name contain all the non-trimmed sequences for one individual & one locus. Obviously, those split files whose names start with “Trimmed” have all the trimmed sequences.<br />
<br />
For another Perl script “updateMatrix.pl” that helps to update the output.xls very fast, the output is an excel file called “Newoutput.xls”. This “Newoutput.xls” has all the updated genotyping information.</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:MEGASAT_Example.fastq&diff=1342File:MEGASAT Example.fastq2015-05-01T16:34:17Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=MEGASAT&diff=1341MEGASAT2015-05-01T16:27:39Z<p>Beiko: </p>
<hr />
<div><br />
== This page is currently under development. Please check back for release information about MEGASAT and updated documentation. ==<br />
<br />
== Overview ==<br />
<br />
<br />
The current version of MEGASAT is 1.0. The MEGASAT scripts should work with any relatively recent version of Perl and have been tested with versions 5.18.2 and 5.16.3.<br />
<br />
== Installing MEGASAT ==<br />
<br />
=== Windows ===<br />
In this “MEGASAT_1.0 for Windows” folder, there are two Perl executable files and two Executable Jar Files. MEGASAT_GUI is the graphical user interface for running the main Perl executable file called “MS”. “update_GUI” is the graphical user interface for running the Perl executable file called “updateMatrix”.<br />
<br />
If Perl is already installed in your computer, you can just click the Start button and go to your Perl interpreter to run the Perl scripts. If you don’t have Perl installed but still want to run the Perl scripts, here are the two main distributions for Windows: ActivePerl (http://www.activestate.com/activePerl) and Strawberry Perl (http://strawberryPerl.com/). We have used the latter in development and testing of MEGASAT, and recommend its use. The script “SatGenotype.pl” uses no complicated library functions, while the script “updateMatrix.pl” uses two packages called “Spreadsheet::ParseExcel” and “Spreadsheet::WriteExcel”. Here is the link of the installation instructions for “Spreadsheet::WriteExcel”: http://www.j-tsurugashima.com/cgi/lib/Spreadsheet/WriteExcel/doc/install.html. <br />
<br />
=== Macintosh ===<br />
If you are using a Macintosh system, Perl should already be installed; type “Perl –v” at the command line to ensure this is the case. So Perl scripts can be easily invoked from the terminal on Mac system. But if you don’t want to run scripts in terminal, two simple GUIs are also offered to invoke those two executable Perl scripts.<br />
In order to run “updateMatrix.pl”, two packages “Spreadsheet::ParseExcel” and “Spreadsheet::WriteExcel” should be installed. The link of installation instructions is: http://www.j-tsurugashima.com/cgi/lib/Spreadsheet/WriteExcel/doc/install.html.<br />
<br />
=== Linux ===<br />
Perl should already be installed on Linux system. So it’s easy to go to terminal to invoke Perl scripts. For running “updateMatrix.pl”, follow the installation instructions as above.<br />
<br />
== Input file formats ==<br />
“SatGenotype.pl” requires an input file with information about PCR primers, and a set of .fastq files representing reads from each sampled locus.<br />
<br />
=== Primer file ===<br />
The primer file must be in a tab-separated format, with the following headers:<br />
- Column1: locus name<br />
- Column2: forward primers<br />
- Column3: reverse primers<br />
- Column4: 3’ flank<br />
- Column5: 5’ flank<br />
- Column6: the repeat array<br />
<br />
In this txt file, a header line is required to specify the column name. If one locus doesn’t have 3’ flank, a character “A” needs to be written in the 3’ flank column in that txt file. But if it doesn’t have 5’ flank, nothing needs to be written in the 5’ flank column.<br />
<br />
Here is an [[Media:Guppy primers.txt|example primer file]].<br />
<br />
=== Input sequence file ===<br />
Input sequence read files must be in standard FASTQ format<br />
<br />
'''Put example sequence file here'''<br />
<br />
“updateMatrix.pl” requires an input file with original genotyping and new genotyping information you want to update to. This scores txt file must be in comma-separated format, with the following headers:<br />
- Column1: locus name<br />
- Column2 & Column3: original genotype<br />
- Column4 & Column5: new genotype<br />
<br />
Another input file is your original genotyping excel file that contains genotyping information for different individuals and loci.<br />
<br />
== Running MEGASAT ==<br />
<br />
=== Running “SatGenotype.pl” ===<br />
If you don’t want to use command line to invoke scripts, a simple GUI is provided for Windows, Mac and Linux users. Double click the “MEGASAT_GUI” will display a pop-up page. In this page, you can click the first “Open” button to open your input primers file. The text field under the “Open” button will display the directory of your primers file. The second small text field is for typing the number of mismatches. The second “Open” button is to open the data set folder that contains the input sequence read files. The “Choose” button is to choose the directory to save your output folder. Two radio buttons in this page offer two options- compress the output folder or not compress the output folder. After all these parameters are filled, click the “Run the program” to run the Perl scripts.<br />
<br />
If you want to run the scripts from command line, for Windows users, make sure you already have Perl installed in your system. We assume that “SatGenotype.pl” and your primers txt file “primers.txt” are saved in the directory “C:\Users\Andy\Downloads”. And the data set folder “dataset” is also saved in the directory “C:\Users\Andy\Downloads”. In order to run the Perl script, first step is go back to the command prompt and type “perl C:\Users\Andy\Downloads\ SatGenotype.pl C:\Users\Andy\Downloads\primers.txt 2 C:\Users\Andy\Downloads\dataset C:\Users\Andy\Desktop”. <br />
<br />
The first command-line argument is the directory of primers txt file. The second command-line argument is the number of mismatches (2 is a good choice to set). Next argument is the directory of data set folder that contains input sequence read files. The last command-line argument specifies the directory where you want to save your output. After this script is completed, an output folder called “Output_dataset” will be in the saving directory you type in the command line.<br />
Running “updateMatrix.pl”<br />
A simple GUI “update_GUI” is also provided for Windows, Mac and Linux users. Double click the “update_GUI” will display a pop-up page. In this page, the first “Open” button is to open your scores txt file that contains the original genotyping and new genotyping information you want to update to. The second “Open” button is to open the original genotyping excel file. The “Choose” button is to choose the directory to save the new excel file. After all these parameters are filled, click the “Run the program” to run “updateMatrix.pl”.<br />
<br />
If you want to run the scripts from command line, for Windows users, make sure you already have Perl installed in your system. We assume that “updateMatrix.pl” and your scores txt file “Scores.txt” are saved in the directory “C:\Users\Andy\Downloads”. And the original genotyping excel file is also saved in the directory “C:\Users\Andy\Downloads”. In order to run the Perl script, first step is go back to the command prompt and type “perl C:\Users\Andy\Downloads\ updateMatrix.pl C:\Users\Andy\Downloads\Scores.txt C:\Users\Andy\Downloads\output.xls C:\Users\Andy\Desktop”. <br />
<br />
The first command-line argument is the directory of scores txt file. The second command-line argument is the directory of original genotyping excel file (This excel file comes from the output.txt in the output folder generated by “SatGenotype.pl”, you can save the output.txt into excel file). The last command-line argument specifies the directory where you want to save your new excel file. After this script is completed, a new excel file called “Newoutput.xls” will be in the saving directory you type in the command line.<br />
<br />
=== Output folder ===<br />
<br />
The output folder generated by “SatGenotype.pl” has three types of files. In this folder, “output.txt” is a comma-separated txt file that gives all the genotype information for all the individuals and loci. In this output.txt file, “X X” means that this locus doesn’t occur in this individual. “0 0” means that the depth of alleles is too small to score. “Unscorable Unscorable” means that there are three possible real alleles, which makes the genotype difficult to be determined. You can use Microsoft Excel to open this csv file, which make this txt file more easily to read.<br />
<br />
Those txt files whose names start with “output” and follow by the individual name show the length distribution of each microsatellite locus. In those txt files, the first row illustrates the different length for all the loci in one individual. Each row under the first row shows the number of the occurrences of different lengths for each locus. The last column is the genotype information for all loci in one individual.<br />
<br />
And those split files whose names start with “Sorted” and follow by the individual name and locus name contain all the non-trimmed sequences for one individual & one locus. Obviously, those split files whose names start with “Trimmed” have all the trimmed sequences.<br />
<br />
For another Perl script “updateMatrix.pl” that helps to update the output.xls very fast, the output is an excel file called “Newoutput.xls”. This “Newoutput.xls” has all the updated genotyping information.</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:Guppy_primers.txt&diff=1340File:Guppy primers.txt2015-05-01T16:24:27Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=MEGASAT&diff=1339MEGASAT2015-05-01T16:12:02Z<p>Beiko: </p>
<hr />
<div><br />
== This page is currently under development. Please check back for release information about MEGASAT and updated documentation. ==<br />
<br />
== Overview ==<br />
<br />
<br />
The current version of MEGASAT is 1.0. The MEGASAT scripts should work with any relatively recent version of Perl and have been tested with versions 5.18.2 and 5.16.3.<br />
<br />
== Installing MEGASAT ==<br />
<br />
=== Windows ===<br />
In this “MEGASAT_1.0 for Windows” folder, there are two Perl executable files and two Executable Jar Files. MEGASAT_GUI is the graphical user interface for running the main Perl executable file called “MS”. “update_GUI” is the graphical user interface for running the Perl executable file called “updateMatrix”.<br />
<br />
If Perl is already installed in your computer, you can just click the Start button and go to your Perl interpreter to run the Perl scripts. If you don’t have Perl installed but still want to run the Perl scripts, here are the two main distributions for Windows: ActivePerl (http://www.activestate.com/activePerl) and Strawberry Perl (http://strawberryPerl.com/). We have used the latter in development and testing of MEGASAT, and recommend its use. The script “SatGenotype.pl” uses no complicated library functions, while the script “updateMatrix.pl” uses two packages called “Spreadsheet::ParseExcel” and “Spreadsheet::WriteExcel”. Here is the link of the installation instructions for “Spreadsheet::WriteExcel”: http://www.j-tsurugashima.com/cgi/lib/Spreadsheet/WriteExcel/doc/install.html. <br />
<br />
=== Macintosh ===<br />
If you are using a Macintosh system, Perl should already be installed; type “Perl –v” at the command line to ensure this is the case. So Perl scripts can be easily invoked from the terminal on Mac system. But if you don’t want to run scripts in terminal, two simple GUIs are also offered to invoke those two executable Perl scripts.<br />
In order to run “updateMatrix.pl”, two packages “Spreadsheet::ParseExcel” and “Spreadsheet::WriteExcel” should be installed. The link of installation instructions is: http://www.j-tsurugashima.com/cgi/lib/Spreadsheet/WriteExcel/doc/install.html.<br />
<br />
=== Linux ===<br />
Perl should already be installed on Linux system. So it’s easy to go to terminal to invoke Perl scripts. For running “updateMatrix.pl”, follow the installation instructions as above.<br />
<br />
== Input file formats ==<br />
“SatGenotype.pl” requires an input file with information about PCR primers, and a set of .fastq files representing reads from each sampled locus.<br />
<br />
=== Primer file ===<br />
The primer file must be in a tab-separated format, with the following headers:<br />
- Column1: locus name<br />
- Column2: forward primers<br />
- Column3: reverse primers<br />
- Column4: 3’ flank<br />
- Column5: 5’ flank<br />
- Column6: the repeat array<br />
<br />
In this txt file, a header line is required to specify the column name. If one locus doesn’t have 3’ flank, a character “A” needs to be written in the 3’ flank column in that txt file. But if it doesn’t have 5’ flank, nothing needs to be written in the 5’ flank column.<br />
<br />
'''Put example primer file here'''<br />
<br />
=== Input sequence file ===<br />
Input sequence read files must be in standard FASTQ format<br />
<br />
'''Put example sequence file here'''<br />
<br />
“updateMatrix.pl” requires an input file with original genotyping and new genotyping information you want to update to. This scores txt file must be in comma-separated format, with the following headers:<br />
- Column1: locus name<br />
- Column2 & Column3: original genotype<br />
- Column4 & Column5: new genotype<br />
<br />
Another input file is your original genotyping excel file that contains genotyping information for different individuals and loci.<br />
<br />
== Running MEGASAT ==<br />
<br />
=== Running “SatGenotype.pl” ===<br />
If you don’t want to use command line to invoke scripts, a simple GUI is provided for Windows, Mac and Linux users. Double click the “MEGASAT_GUI” will display a pop-up page. In this page, you can click the first “Open” button to open your input primers file. The text field under the “Open” button will display the directory of your primers file. The second small text field is for typing the number of mismatches. The second “Open” button is to open the data set folder that contains the input sequence read files. The “Choose” button is to choose the directory to save your output folder. Two radio buttons in this page offer two options- compress the output folder or not compress the output folder. After all these parameters are filled, click the “Run the program” to run the Perl scripts.<br />
<br />
If you want to run the scripts from command line, for Windows users, make sure you already have Perl installed in your system. We assume that “SatGenotype.pl” and your primers txt file “primers.txt” are saved in the directory “C:\Users\Andy\Downloads”. And the data set folder “dataset” is also saved in the directory “C:\Users\Andy\Downloads”. In order to run the Perl script, first step is go back to the command prompt and type “perl C:\Users\Andy\Downloads\ SatGenotype.pl C:\Users\Andy\Downloads\primers.txt 2 C:\Users\Andy\Downloads\dataset C:\Users\Andy\Desktop”. <br />
<br />
The first command-line argument is the directory of primers txt file. The second command-line argument is the number of mismatches (2 is a good choice to set). Next argument is the directory of data set folder that contains input sequence read files. The last command-line argument specifies the directory where you want to save your output. After this script is completed, an output folder called “Output_dataset” will be in the saving directory you type in the command line.<br />
Running “updateMatrix.pl”<br />
A simple GUI “update_GUI” is also provided for Windows, Mac and Linux users. Double click the “update_GUI” will display a pop-up page. In this page, the first “Open” button is to open your scores txt file that contains the original genotyping and new genotyping information you want to update to. The second “Open” button is to open the original genotyping excel file. The “Choose” button is to choose the directory to save the new excel file. After all these parameters are filled, click the “Run the program” to run “updateMatrix.pl”.<br />
<br />
If you want to run the scripts from command line, for Windows users, make sure you already have Perl installed in your system. We assume that “updateMatrix.pl” and your scores txt file “Scores.txt” are saved in the directory “C:\Users\Andy\Downloads”. And the original genotyping excel file is also saved in the directory “C:\Users\Andy\Downloads”. In order to run the Perl script, first step is go back to the command prompt and type “perl C:\Users\Andy\Downloads\ updateMatrix.pl C:\Users\Andy\Downloads\Scores.txt C:\Users\Andy\Downloads\output.xls C:\Users\Andy\Desktop”. <br />
<br />
The first command-line argument is the directory of scores txt file. The second command-line argument is the directory of original genotyping excel file (This excel file comes from the output.txt in the output folder generated by “SatGenotype.pl”, you can save the output.txt into excel file). The last command-line argument specifies the directory where you want to save your new excel file. After this script is completed, a new excel file called “Newoutput.xls” will be in the saving directory you type in the command line.<br />
<br />
=== Output folder ===<br />
<br />
The output folder generated by “SatGenotype.pl” has three types of files. In this folder, “output.txt” is a comma-separated txt file that gives all the genotype information for all the individuals and loci. In this output.txt file, “X X” means that this locus doesn’t occur in this individual. “0 0” means that the depth of alleles is too small to score. “Unscorable Unscorable” means that there are three possible real alleles, which makes the genotype difficult to be determined. You can use Microsoft Excel to open this csv file, which make this txt file more easily to read.<br />
<br />
Those txt files whose names start with “output” and follow by the individual name show the length distribution of each microsatellite locus. In those txt files, the first row illustrates the different length for all the loci in one individual. Each row under the first row shows the number of the occurrences of different lengths for each locus. The last column is the genotype information for all loci in one individual.<br />
<br />
And those split files whose names start with “Sorted” and follow by the individual name and locus name contain all the non-trimmed sequences for one individual & one locus. Obviously, those split files whose names start with “Trimmed” have all the trimmed sequences.<br />
<br />
For another Perl script “updateMatrix.pl” that helps to update the output.xls very fast, the output is an excel file called “Newoutput.xls”. This “Newoutput.xls” has all the updated genotyping information.</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=MEGASAT&diff=1338MEGASAT2015-05-01T16:09:38Z<p>Beiko: </p>
<hr />
<div><br />
== Overview ==<br />
<br />
<br />
The current version of MEGASAT is 1.0. The MEGASAT scripts should work with any relatively recent version of Perl and have been tested with versions 5.18.2 and 5.16.3.<br />
<br />
== Installing MEGASAT ==<br />
<br />
=== Windows ===<br />
In this “MEGASAT_1.0 for Windows” folder, there are two Perl executable files and two Executable Jar Files. MEGASAT_GUI is the graphical user interface for running the main Perl executable file called “MS”. “update_GUI” is the graphical user interface for running the Perl executable file called “updateMatrix”.<br />
<br />
If Perl is already installed in your computer, you can just click the Start button and go to your Perl interpreter to run the Perl scripts. If you don’t have Perl installed but still want to run the Perl scripts, here are the two main distributions for Windows: ActivePerl (http://www.activestate.com/activePerl) and Strawberry Perl (http://strawberryPerl.com/). We have used the latter in development and testing of MEGASAT, and recommend its use. The script “SatGenotype.pl” uses no complicated library functions, while the script “updateMatrix.pl” uses two packages called “Spreadsheet::ParseExcel” and “Spreadsheet::WriteExcel”. Here is the link of the installation instructions for “Spreadsheet::WriteExcel”: http://www.j-tsurugashima.com/cgi/lib/Spreadsheet/WriteExcel/doc/install.html. <br />
<br />
=== Macintosh ===<br />
If you are using a Macintosh system, Perl should already be installed; type “Perl –v” at the command line to ensure this is the case. So Perl scripts can be easily invoked from the terminal on Mac system. But if you don’t want to run scripts in terminal, two simple GUIs are also offered to invoke those two executable Perl scripts.<br />
In order to run “updateMatrix.pl”, two packages “Spreadsheet::ParseExcel” and “Spreadsheet::WriteExcel” should be installed. The link of installation instructions is: http://www.j-tsurugashima.com/cgi/lib/Spreadsheet/WriteExcel/doc/install.html.<br />
<br />
=== Linux ===<br />
Perl should already be installed on Linux system. So it’s easy to go to terminal to invoke Perl scripts. For running “updateMatrix.pl”, follow the installation instructions as above.<br />
<br />
== Input file formats ==<br />
“SatGenotype.pl” requires an input file with information about PCR primers, and a set of .fastq files representing reads from each sampled locus.<br />
<br />
=== Primer file ===<br />
The primer file must be in a tab-separated format, with the following headers:<br />
- Column1: locus name<br />
- Column2: forward primers<br />
- Column3: reverse primers<br />
- Column4: 3’ flank<br />
- Column5: 5’ flank<br />
- Column6: the repeat array<br />
<br />
In this txt file, a header line is required to specify the column name. If one locus doesn’t have 3’ flank, a character “A” needs to be written in the 3’ flank column in that txt file. But if it doesn’t have 5’ flank, nothing needs to be written in the 5’ flank column.<br />
<br />
'''Put example primer file here'''<br />
<br />
=== Input sequence file ===<br />
Input sequence read files must be in standard FASTQ format<br />
<br />
'''Put example sequence file here'''<br />
<br />
“updateMatrix.pl” requires an input file with original genotyping and new genotyping information you want to update to. This scores txt file must be in comma-separated format, with the following headers:<br />
- Column1: locus name<br />
- Column2 & Column3: original genotype<br />
- Column4 & Column5: new genotype<br />
<br />
Another input file is your original genotyping excel file that contains genotyping information for different individuals and loci.<br />
<br />
== Running MEGASAT ==<br />
<br />
=== Running “SatGenotype.pl” ===<br />
If you don’t want to use command line to invoke scripts, a simple GUI is provided for Windows, Mac and Linux users. Double click the “MEGASAT_GUI” will display a pop-up page. In this page, you can click the first “Open” button to open your input primers file. The text field under the “Open” button will display the directory of your primers file. The second small text field is for typing the number of mismatches. The second “Open” button is to open the data set folder that contains the input sequence read files. The “Choose” button is to choose the directory to save your output folder. Two radio buttons in this page offer two options- compress the output folder or not compress the output folder. After all these parameters are filled, click the “Run the program” to run the Perl scripts.<br />
<br />
If you want to run the scripts from command line, for Windows users, make sure you already have Perl installed in your system. We assume that “SatGenotype.pl” and your primers txt file “primers.txt” are saved in the directory “C:\Users\Andy\Downloads”. And the data set folder “dataset” is also saved in the directory “C:\Users\Andy\Downloads”. In order to run the Perl script, first step is go back to the command prompt and type “perl C:\Users\Andy\Downloads\ SatGenotype.pl C:\Users\Andy\Downloads\primers.txt 2 C:\Users\Andy\Downloads\dataset C:\Users\Andy\Desktop”. <br />
<br />
The first command-line argument is the directory of primers txt file. The second command-line argument is the number of mismatches (2 is a good choice to set). Next argument is the directory of data set folder that contains input sequence read files. The last command-line argument specifies the directory where you want to save your output. After this script is completed, an output folder called “Output_dataset” will be in the saving directory you type in the command line.<br />
Running “updateMatrix.pl”<br />
A simple GUI “update_GUI” is also provided for Windows, Mac and Linux users. Double click the “update_GUI” will display a pop-up page. In this page, the first “Open” button is to open your scores txt file that contains the original genotyping and new genotyping information you want to update to. The second “Open” button is to open the original genotyping excel file. The “Choose” button is to choose the directory to save the new excel file. After all these parameters are filled, click the “Run the program” to run “updateMatrix.pl”.<br />
<br />
If you want to run the scripts from command line, for Windows users, make sure you already have Perl installed in your system. We assume that “updateMatrix.pl” and your scores txt file “Scores.txt” are saved in the directory “C:\Users\Andy\Downloads”. And the original genotyping excel file is also saved in the directory “C:\Users\Andy\Downloads”. In order to run the Perl script, first step is go back to the command prompt and type “perl C:\Users\Andy\Downloads\ updateMatrix.pl C:\Users\Andy\Downloads\Scores.txt C:\Users\Andy\Downloads\output.xls C:\Users\Andy\Desktop”. <br />
<br />
The first command-line argument is the directory of scores txt file. The second command-line argument is the directory of original genotyping excel file (This excel file comes from the output.txt in the output folder generated by “SatGenotype.pl”, you can save the output.txt into excel file). The last command-line argument specifies the directory where you want to save your new excel file. After this script is completed, a new excel file called “Newoutput.xls” will be in the saving directory you type in the command line.<br />
<br />
=== Output folder ===<br />
<br />
The output folder generated by “SatGenotype.pl” has three types of files. In this folder, “output.txt” is a comma-separated txt file that gives all the genotype information for all the individuals and loci. In this output.txt file, “X X” means that this locus doesn’t occur in this individual. “0 0” means that the depth of alleles is too small to score. “Unscorable Unscorable” means that there are three possible real alleles, which makes the genotype difficult to be determined. You can use Microsoft Excel to open this csv file, which make this txt file more easily to read.<br />
<br />
Those txt files whose names start with “output” and follow by the individual name show the length distribution of each microsatellite locus. In those txt files, the first row illustrates the different length for all the loci in one individual. Each row under the first row shows the number of the occurrences of different lengths for each locus. The last column is the genotype information for all loci in one individual.<br />
<br />
And those split files whose names start with “Sorted” and follow by the individual name and locus name contain all the non-trimmed sequences for one individual & one locus. Obviously, those split files whose names start with “Trimmed” have all the trimmed sequences.<br />
<br />
For another Perl script “updateMatrix.pl” that helps to update the output.xls very fast, the output is an excel file called “Newoutput.xls”. This “Newoutput.xls” has all the updated genotyping information.</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=MEGASAT&diff=1337MEGASAT2015-05-01T16:00:22Z<p>Beiko: Created page with " == Overview == The current version of MEGASAT is 1.0. == Installing MEGASAT == === Windows === In this “MEGASAT_1.0 for Windows” folder, there are two Perl executable..."</p>
<hr />
<div><br />
== Overview ==<br />
<br />
<br />
The current version of MEGASAT is 1.0.<br />
<br />
== Installing MEGASAT ==<br />
<br />
=== Windows ===<br />
In this “MEGASAT_1.0 for Windows” folder, there are two Perl executable files and two Executable Jar Files. MEGASAT_GUI is the graphical user interface for running the main Perl executable file called “MS”. “update_GUI” is the graphical user interface for running the Perl executable file called “updateMatrix”.<br />
<br />
If Perl is already installed in your computer, you can just click the Start button and go to your Perl interpreter to run the Perl scripts. If you don’t have Perl installed but still want to run the Perl scripts, here are the two main distributions for Windows: ActivePerl (http://www.activestate.com/activePerl) and Strawberry Perl (http://strawberryPerl.com/). We have used the latter in development and testing of MEGASAT, and recommend its use. The script “SatGenotype.pl” uses no complicated library functions, while the script “updateMatrix.pl” uses two packages called “Spreadsheet::ParseExcel” and “Spreadsheet::WriteExcel”. Here is the link of the installation instructions for “Spreadsheet::WriteExcel”: http://www.j-tsurugashima.com/cgi/lib/Spreadsheet/WriteExcel/doc/install.html. <br />
<br />
=== Macintosh ===<br />
If you are using a Macintosh system, Perl should already be installed; type “Perl –v” at the command line to ensure this is the case. So Perl scripts can be easily invoked from the terminal on Mac system. But if you don’t want to run scripts in terminal, two simple GUIs are also offered to invoke those two executable Perl scripts.<br />
In order to run “updateMatrix.pl”, two packages “Spreadsheet::ParseExcel” and “Spreadsheet::WriteExcel” should be installed. The link of installation instructions is: http://www.j-tsurugashima.com/cgi/lib/Spreadsheet/WriteExcel/doc/install.html.<br />
<br />
=== Linux ===<br />
Perl should already be installed on Linux system. So it’s easy to go to terminal to invoke Perl scripts. For running “updateMatrix.pl”, follow the installation instructions as above.<br />
<br />
Those two scripts should work with any relatively recent version of Perl and have been tested with versions 5.18.2 and 5.16.3.<br />
<br />
Input file formats<br />
“SatGenotype.pl” requires an input file with information about PCR primers, and a set of .fastq files representing reads from each sampled locus.<br />
<br />
The primer file must be in a tab-separated format, with the following headers:<br />
- Column1: locus name<br />
- Column2: forward primers<br />
- Column3: reverse primers<br />
- Column4: 3’ flank<br />
- Column5: 5’ flank<br />
- Column6: the repeat array<br />
In this txt file, a header line is required to specify the column name. If one locus doesn’t have 3’ flank, a character “A” needs to be written in the 3’ flank column in that txt file. But if it doesn’t have 5’ flank, nothing needs to be written in the 5’ flank column.<br />
<br />
Input sequence read files must be in standard FASTQ format (see example data file )<br />
<br />
“updateMatrix.pl” requires an input file with original genotyping and new genotyping information you want to update to. This scores txt file must be in comma-separated format, with the following headers:<br />
- Column1: locus name<br />
- Column2 & Column3: original genotype<br />
- Column4 & Column5: new genotype<br />
<br />
Another input file is your original genotyping excel file that contains genotyping information for different individuals and loci.<br />
<br />
Running MEGASAT<br />
<br />
Running “SatGenotype.pl”<br />
If you don’t want to use command line to invoke scripts, a simple GUI is provided for Windows, Mac and Linux users. Double click the “MEGASAT_GUI” will display a pop-up page. In this page, you can click the first “Open” button to open your input primers file. The text field under the “Open” button will display the directory of your primers file. The second small text field is for typing the number of mismatches. The second “Open” button is to open the data set folder that contains the input sequence read files. The “Choose” button is to choose the directory to save your output folder. Two radio buttons in this page offer two options- compress the output folder or not compress the output folder. After all these parameters are filled, click the “Run the program” to run the Perl scripts.<br />
<br />
If you want to run the scripts from command line, for Windows users, make sure you already have Perl installed in your system. We assume that “SatGenotype.pl” and your primers txt file “primers.txt” are saved in the directory “C:\Users\Andy\Downloads”. And the data set folder “dataset” is also saved in the directory “C:\Users\Andy\Downloads”. In order to run the Perl script, first step is go back to the command prompt and type “perl C:\Users\Andy\Downloads\ SatGenotype.pl C:\Users\Andy\Downloads\primers.txt 2 C:\Users\Andy\Downloads\dataset C:\Users\Andy\Desktop”. <br />
<br />
The first command-line argument is the directory of primers txt file. The second command-line argument is the number of mismatches (2 is a good choice to set). Next argument is the directory of data set folder that contains input sequence read files. The last command-line argument specifies the directory where you want to save your output. After this script is completed, an output folder called “Output_dataset” will be in the saving directory you type in the command line.<br />
Running “updateMatrix.pl”<br />
A simple GUI “update_GUI” is also provided for Windows, Mac and Linux users. Double click the “update_GUI” will display a pop-up page. In this page, the first “Open” button is to open your scores txt file that contains the original genotyping and new genotyping information you want to update to. The second “Open” button is to open the original genotyping excel file. The “Choose” button is to choose the directory to save the new excel file. After all these parameters are filled, click the “Run the program” to run “updateMatrix.pl”.<br />
<br />
If you want to run the scripts from command line, for Windows users, make sure you already have Perl installed in your system. We assume that “updateMatrix.pl” and your scores txt file “Scores.txt” are saved in the directory “C:\Users\Andy\Downloads”. And the original genotyping excel file is also saved in the directory “C:\Users\Andy\Downloads”. In order to run the Perl script, first step is go back to the command prompt and type “perl C:\Users\Andy\Downloads\ updateMatrix.pl C:\Users\Andy\Downloads\Scores.txt C:\Users\Andy\Downloads\output.xls C:\Users\Andy\Desktop”. <br />
<br />
The first command-line argument is the directory of scores txt file. The second command-line argument is the directory of original genotyping excel file (This excel file comes from the output.txt in the output folder generated by “SatGenotype.pl”, you can save the output.txt into excel file). The last command-line argument specifies the directory where you want to save your new excel file. After this script is completed, a new excel file called “Newoutput.xls” will be in the saving directory you type in the command line.<br />
<br />
Output folder<br />
<br />
The output folder generated by “SatGenotype.pl” has three types of files. In this folder, “output.txt” is a comma-separated txt file that gives all the genotype information for all the individuals and loci. In this output.txt file, “X X” means that this locus doesn’t occur in this individual. “0 0” means that the depth of alleles is too small to score. “Unscorable Unscorable” means that there are three possible real alleles, which makes the genotype difficult to be determined. You can use Microsoft Excel to open this csv file, which make this txt file more easily to read.<br />
<br />
Those txt files whose names start with “output” and follow by the individual name show the length distribution of each microsatellite locus. In those txt files, the first row illustrates the different length for all the loci in one individual. Each row under the first row shows the number of the occurrences of different lengths for each locus. The last column is the genotype information for all loci in one individual.<br />
<br />
And those split files whose names start with “Sorted” and follow by the individual name and locus name contain all the non-trimmed sequences for one individual & one locus. Obviously, those split files whose names start with “Trimmed” have all the trimmed sequences.<br />
<br />
For another Perl script “updateMatrix.pl” that helps to update the output.xls very fast, the output is an excel file called “Newoutput.xls”. This “Newoutput.xls” has all the updated genotyping information.</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=1336Main Page2015-05-01T15:58:13Z<p>Beiko: </p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software (Currently Supported) ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
* [[ExpressBetaDiversity | Express Beta Diversity (EBD)]]: taxon- and phylogenetic-based beta diversity measures. (P)<br />
<br />
* [[FCP | Fragment classification package (FCP)]]: Homology- and composition-based classifiers for assigning a taxonomic attribution to metagenomic fragments. (GML)<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [[NetworkDiversity| Network Diversity]]: calculation of beta diversity over phylogenetic networks. (P)<br />
<br />
* [http://kiwi.cs.dal.ca/Software/rSPR rSPR]: software to calculate rooted subtree-prune-and-regraft distances and rooted agreement forests. (PL)<br />
<br />
* [[SPANNER]]: Homology-based taxonomic classification of protein sequences. (GML)<br />
<br />
* [[SPRSupertrees]]: software to calculate rooted supertrees that minimize the SPR distance. (PL)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
<br />
<br />
== Older Software ==<br />
<br />
The following software packages have largely been superseded by others in the above list, or by software written by others. The software should still work, but we can no longer offer significant support for it.<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [[RITA]]: Rapid Identification of Taxonomic Assignments for metagenomic fragments (M).<br />
<br />
* [[PICA]]: genotype-phenotype data mining software (G).<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
== Mailing list ==<br />
<br />
* Join our [http://ratite.cs.dal.ca/software_email_list/subscribe mailing list] to keep informed about new developments.<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]<br />
* [http://www.cs.dal.ca/~whidden Chris Whidden]<br />
* Norman MacDonald<br />
<br />
* [[TBD]]<br />
* [[MEGASAT]]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=1335Main Page2015-05-01T15:54:47Z<p>Beiko: </p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software (Currently Supported) ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
* [[ExpressBetaDiversity | Express Beta Diversity (EBD)]]: taxon- and phylogenetic-based beta diversity measures. (P)<br />
<br />
* [[FCP | Fragment classification package (FCP)]]: Homology- and composition-based classifiers for assigning a taxonomic attribution to metagenomic fragments. (GML)<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [[NetworkDiversity| Network Diversity]]: calculation of beta diversity over phylogenetic networks. (P)<br />
<br />
* [http://kiwi.cs.dal.ca/Software/rSPR rSPR]: software to calculate rooted subtree-prune-and-regraft distances and rooted agreement forests. (PL)<br />
<br />
* [[SPANNER]]: Homology-based taxonomic classification of protein sequences. (GML)<br />
<br />
* [[SPRSupertrees]]: software to calculate rooted supertrees that minimize the SPR distance. (PL)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
<br />
<br />
== Older Software ==<br />
<br />
The following software packages have largely been superseded by others in the above list, or by software written by others. The software should still work, but we can no longer offer significant support for it.<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [[RITA]]: Rapid Identification of Taxonomic Assignments for metagenomic fragments (M).<br />
<br />
* [[PICA]]: genotype-phenotype data mining software (G).<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
== Mailing list ==<br />
<br />
* Join our [http://ratite.cs.dal.ca/software_email_list/subscribe mailing list] to keep informed about new developments.<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]<br />
* [http://www.cs.dal.ca/~whidden Chris Whidden]<br />
* Norman MacDonald<br />
<br />
* [[TBD]]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=1334Main Page2015-05-01T15:53:16Z<p>Beiko: </p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software (Currently Supported) ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [[ExpressBetaDiversity | Express Beta Diversity (EBD)]]: taxon- and phylogenetic-based beta diversity measures. (P)<br />
<br />
* [[FCP | Fragment classification package (FCP)]]: Homology- and composition-based classifiers for assigning a taxonomic attribution to metagenomic fragments. (GML)<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [[NetworkDiversity| Network Diversity]]: calculation of beta diversity over phylogenetic networks. (P)<br />
<br />
* [[RITA]]: Rapid Identification of Taxonomic Assignments for metagenomic fragments (M).<br />
<br />
* [http://kiwi.cs.dal.ca/Software/rSPR rSPR]: software to calculate rooted subtree-prune-and-regraft distances and rooted agreement forests. (PL)<br />
<br />
* [[SPANNER]]: Homology-based taxonomic classification of protein sequences. (GML)<br />
<br />
* [[SPRSupertrees]]: software to calculate rooted supertrees that minimize the SPR distance. (PL)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
<br />
<br />
== Older Software ==<br />
<br />
The following software packages have largely been superseded by others in the above list, or by software written by others. The software should still work, but we can no longer offer significant support for it.<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [[PICA]]: genotype-phenotype data mining software (G).<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
== Mailing list ==<br />
<br />
* Join our [http://ratite.cs.dal.ca/software_email_list/subscribe mailing list] to keep informed about new developments.<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]<br />
* [http://www.cs.dal.ca/~whidden Chris Whidden]<br />
* Norman MacDonald<br />
<br />
* [[TBD]]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=1333Main Page2015-02-20T15:41:49Z<p>Beiko: </p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software (Currently Supported) ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [[ExpressBetaDiversity | Express Beta Diversity (EBD)]]: taxon- and phylogenetic-based beta diversity measures. (P)<br />
<br />
* [[FCP | Fragment classification package (FCP)]]: Homology- and composition-based classifiers for assigning a taxonomic attribution to metagenomic fragments. (GML)<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [[NetworkDiversity| Network Diversity]]: calculation of beta diversity over phylogenetic networks. (P)<br />
<br />
* [[RITA]]: Rapid Identification of Taxonomic Assignments for metagenomic fragments (M).<br />
<br />
* [http://kiwi.cs.dal.ca/Software/rSPR rSPR]: software to calculate rooted subtree-prune-and-regraft distances and rooted agreement forests. (PL)<br />
<br />
* [[SPANNER]]: Homology-based taxonomic classification of protein sequences. (GML)<br />
<br />
* [[SPRSupertrees]]: software to calculate rooted supertrees that minimize the SPR distance. (PL)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
<br />
<br />
== Older Software ==<br />
<br />
The following software packages have largely been superseded by others in the above list, or by software written by others. The software should still work, but we can no longer offer significant support for it.<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [[PICA]]: genotype-phenotype data mining software (G).<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
== Web Services ==<br />
* [http://ratite.cs.dal.ca/SeqMonitor SeqMonitor]<br />
* [http://bio.mquter.qut.edu.au/Moa/ MOAMap]<br />
* [http://fester.cs.dal.ca/manuel MANUEL]<br />
<br />
== Mailing list ==<br />
<br />
* Join our [http://ratite.cs.dal.ca/software_email_list/subscribe mailing list] to keep informed about new developments.<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]<br />
* [http://www.cs.dal.ca/~whidden Chris Whidden]<br />
* Norman MacDonald<br />
<br />
* [[TBD]]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=1332Main Page2015-02-20T15:41:17Z<p>Beiko: </p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software (Currently Supported) ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [[ExpressBetaDiversity | Express Beta Diversity (EBD)]]: taxon- and phylogenetic-based beta diversity measures. (P)<br />
<br />
* [[FCP | Fragment classification package (FCP)]]: Homology- and composition-based classifiers for assigning a taxonomic attribution to metagenomic fragments. (GML)<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [[NetworkDiversity| Network Diversity]]: calculation of beta diversity over phylogenetic networks. (P)<br />
<br />
* [[RITA]]: Rapid Identification of Taxonomic Assignments for metagenomic fragments (M).<br />
<br />
* [http://kiwi.cs.dal.ca/Software/rSPR rSPR]: software to calculate rooted subtree-prune-and-regraft distances and rooted agreement forests. (PL)<br />
<br />
* [[SPANNER]]: Homology-based taxonomic classification of protein sequences. (GML)<br />
<br />
* [[SPRSupertrees]]: software to calculate rooted supertrees that minimize the SPR distance. (PL)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
* [[TBD]]<br />
<br />
== Older Software ==<br />
<br />
The following software packages have largely been superseded by others in the above list, or by software written by others. The software should still work, but we can no longer offer significant support for it.<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [[PICA]]: genotype-phenotype data mining software (G).<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
== Web Services ==<br />
* [http://ratite.cs.dal.ca/SeqMonitor SeqMonitor]<br />
* [http://bio.mquter.qut.edu.au/Moa/ MOAMap]<br />
* [http://fester.cs.dal.ca/manuel MANUEL]<br />
<br />
== Mailing list ==<br />
<br />
* Join our [http://ratite.cs.dal.ca/software_email_list/subscribe mailing list] to keep informed about new developments.<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]<br />
* [http://www.cs.dal.ca/~whidden Chris Whidden]<br />
* Norman MacDonald</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=1045Main Page2012-06-18T00:57:31Z<p>Beiko: </p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software (Currently Supported) ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [[ExpressBetaDiversity | Express Beta Diversity (EBD)]]: taxon- and phylogenetic-based beta diversity measures. (P)<br />
<br />
* [[FCP | Fragment classification package (FCP)]]: Homology- and composition-based classifiers for assigning a taxonomic attribution to metagenomic fragments. (GML)<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [[NetworkDiversity| Network Diversity]]: calculation of beta diversity over phylogenetic networks. (P)<br />
<br />
* [[RITA]]: Rapid Identification of Taxonomic Assignments for metagenomic fragments (M).<br />
<br />
* [http://kiwi.cs.dal.ca/Software/rSPR rSPR]: software to calculate rooted subtree-prune-and-regraft distances and rooted agreement forests. (PL)<br />
<br />
* [[SPRSupertrees]]: software to calculate rooted supertrees that minimize the SPR distance. (PL)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
== Older Software ==<br />
<br />
The following software packages have largely been superseded by others in the above list, or by software written by others. The software should still work, but we can no longer offer significant support for it.<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [[PICA]]: genotype-phenotype data mining software (G).<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
== Web Services ==<br />
* [http://ratite.cs.dal.ca/SeqMonitor SeqMonitor]<br />
* [http://bio.mquter.qut.edu.au/Moa/ MOAMap]<br />
* [http://fester.cs.dal.ca/manuel MANUEL]<br />
<br />
== Mailing list ==<br />
<br />
* Join our [http://ratite.cs.dal.ca/software_email_list/subscribe mailing list] to keep informed about new developments.<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]<br />
* [http://www.cs.dal.ca/~whidden Chris Whidden]<br />
* Norman MacDonald</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=1044Main Page2012-06-18T00:54:49Z<p>Beiko: /* Software */</p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software (Currently Supported) ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [[ExpressBetaDiversity | Express Beta Diversity (EBD)]]: taxon- and phylogenetic-based beta diversity measures. (P)<br />
<br />
* [[FCP | Fragment classification package (FCP)]]: Homology- and composition-based classifiers for assigning a taxonomic attribution to metagenomic fragments. (GML)<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [[NetworkDiversity| Network Diversity]]: calculation of beta diversity over phylogenetic networks. (P)<br />
<br />
* [[PICA]]: genotype-phenotype data mining software (G).<br />
<br />
* [[RITA]]: Rapid Identification of Taxonomic Assignments for metagenomic fragments (M).<br />
<br />
* [http://kiwi.cs.dal.ca/Software/rSPR rSPR]: software to calculate rooted subtree-prune-and-regraft distances and rooted agreement forests. (PL)<br />
<br />
* [[SPRSupertrees]]: software to calculate rooted supertrees that minimize the SPR distance. (PL)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
* VAREB [to be added]<br />
<br />
== Older Software ==<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation <br />
in mind.<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
== Web Services ==<br />
* [http://ratite.cs.dal.ca/SeqMonitor SeqMonitor]<br />
* [http://bio.mquter.qut.edu.au/Moa/ MOAMap]<br />
* MOA (to be added)<br />
* Visual MOA (to be added)<br />
* [http://fester.cs.dal.ca/manuel MANUEL]<br />
<br />
== Mailing list ==<br />
<br />
* Join our [http://ratite.cs.dal.ca/software_email_list/subscribe mailing list] to keep informed about new developments.<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]<br />
* [http://www.cs.dal.ca/~whidden Chris Whidden]<br />
* Norman MacDonald</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=RITA&diff=1008RITA2012-04-04T11:18:32Z<p>Beiko: /* Rapid identification of taxonomic assignments (RITA) */</p>
<hr />
<div>== Overview of RITA ==<br />
<br />
RITA is a standalone software package and Web server for taxonomic assignment of metagenomic sequence reads. By combining homology predictions from BLAST or UBLAST with compositional classifications from a Naive Bayes classifier, RITA is able to achieve very high accuracy on short reads. Unlike other hybrid approaches which combine these predictions for all sequences to be classified, RITA uses a pipeline to first identify cases where both types of classifier are in agreement, which constitute the highest-confidence set. Sequences not classified in this manner are subjected to a series of downstream classification steps. <br />
<br />
This work has been accepted for publication:<br />
<br />
MacDonald NJ, Parks DH, and Beiko RG. Rapid identification of taxonomic assignments. Accepted to ''Nucleic Acids Research'' April 4, 2012.<br />
<br />
If you have any questions or bug reports, please let us know at <beiko@cs.dal.ca>.<br />
<br />
== Web Server ==<br />
<br />
For smaller datasets, taxonomic attributions can be obtained with the [http://ratite.cs.dal.ca/rita RITA web server].<br />
<br />
== License ==<br />
RITA is released under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Share-Alike Attribution 3.0 License].<br />
<br />
== Downloads ==<br />
* [[Media:RITA_v1_0_1.zip|RITA v1.0.1]] RITA source code<br />
<br />
== Rank-specific Setup ==<br />
<br />
''Prerequisites'': You must have [ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ BLAST+ 2.2.21] or higher installed.<br />
<br />
* These instruction assume the following directory structure:<br />
<pre><br />
/home/<user>/RITA/<br />
FastTree/<br />
FCP/<br />
mothur/<br />
rita/<br />
</pre><br />
<br />
which will contain installations of [http://www.microbesonline.org/fasttree/ FastTree], [http://kiwi.cs.dal.ca/Software/FCP FCP], [http://www.mothur.org/ mothur], and RITA, respectively.<br />
<br />
* Unzip RITA:<br />
> unzip RITA_v1_0_1.zip<br />
<br />
* Download, unzip, and run [http://kiwi.cs.dal.ca/Software/FCP FCP] install (FCP_install.py) with '--protein ncbi_genomes.faa' flag:<br />
> unzip FCP_1_0_3.zip -d ./FCP<br />
> cd FCP<br />
> python FCP_install.py --protein ncbi_genomes.faa<br />
<br />
* Concatenate the files in the training/sequences folder of FCP into a single input nucleotide file (e.g. ncbi_genomes.fna):<br />
> cat ./training/sequences/*.fasta > ncbi_genomes.fna<br />
<br />
* RITA is designed to run over multiple BLAST databases in order to reduce memory consumption. Therefore, you should split both the nucleotide and protein files into X pieces using the splitfasta.py script in the <br />
scripts directory (X can be 1 if you do not wish to split the files, we recommend X=10):<br />
> cd ../rita<br />
> mv ../FCP/ncbi_genomes.faa .<br />
> mv ../FCP/ncbi_genomes.fna . <br />
> cd scripts<br />
> python splitfasta.py ../ncbi_genomes.fna 10<br />
> python splitfasta.py ../ncbi_genomes.faa 10<br />
<br />
* Create a nucleotide database with makeblastdb BLAST+ for each input ncbi_genomes.p*.fna:<br />
> makeblastdb -in "ncbi_genomes.p1.fna" -dbtype nucl<br />
> makeblastdb -in "ncbi_genomes.p2.fna" -dbtype nucl<br />
> ...<br />
> makeblastdb -in "ncbi_genomes.pX.fna" -dbtype nucl<br />
<br />
* Create a protein database with makeblastdb BLAST+ for each input ncbi_genomes.p*.faa.<br />
> makeblastdb -in "ncbi_genomes.p1.faa" -dbtype prot<br />
> makeblastdb -in "ncbi_genomes.p2.faa" -dbtype prot<br />
> ...<br />
> makeblastdb -in "ncbi_genomes.pX.faa" -dbtype prot<br />
<br />
* Set BLASTDB_PARTS to X in globalconfig.cfg and the name of the database to<br />
<database_name>.p%%d (e.g. ncbi_genomes.p%%d.fna if this was your output BLAST database name)<br />
%%d is the placeholder for the database number identifier, 1..X.<br />
<br />
* Configure ''globalsettings.cfg'' appropriately for the above installation directories.<br />
<br />
* To use the UBLASTX classifier, you must also obtain a licensed copy of usearch and set up the configuration file appropriately.<br />
<br />
== Rank-specific example usage ==<br />
python rita.py --rank PHYLUM --pipeline NB_DCMEGABLAST,DCMEGABLAST_RATIO,NB_RATIO,NB_ML --query fragments.fasta --out results.txt<br />
<br />
Note: Use the --jobid flag with a unique job identifier if running rita in parallel to ensure intermediate temporary files <br />
are not overwritten.<br />
<br />
The above command will classify the fragments in 'fragments.fasta' at the rank of PHYLUM, using a pipeline that starts with the<br />
consensus of Naive Bayes and DCMEGABLAST, then the most confident DCMEGABLAST results, then the most confident Naive Bayes results<br />
and finally with the maximum likelihood NB prediction. Note that fragments are not attempted to be classified at <br />
a given step in the pipeline if they have already been classified at an earlier step (the order matters). See the Pipeline Components<br />
section below.<br />
<br />
== Rank-flexible setup ==<br />
''Prerequisites'':<br />
* Install [http://biopython.org/wiki/Biopython BioPython] (needed for tree manipulation).<br />
* Install [http://www.mothur.org/ MOTHUR] for 16S DNA alignments.<br />
* Install [http://www.microbesonline.org/fasttree/ FastTree] for building 16S trees.<br />
<br />
To configure RITA for rank-flexible classifications follow these steps:<br />
<br />
* Follow the instructions for a rank-specific RITA installation given above.<br />
* Update the MOTHUR and FastTree installation paths in ''globalsettings.cfg''.<br />
* Build a trusted BLAST database of 16S sequences. We recommend using the hand-curated sequences from [http://rdp.cme.msu.edu/ RDP].<br />
** Download and extract the unaligned Bacteria and Archaea sequences from RDP ([http://rdp.cme.msu.edu/download/release10_28_unaligned.fa.gz link]) into a directory called ''RDP'':<br />
> gunzip release10_28_unaligned.fa.gz<br />
** Create the BLAST database:<br />
> makeblastdb -in "release10_28_unaligned.fa" -dbtype nucl<br />
* From the ''rita'' directory, BLAST the complete genomes against the 16S database:<br />
> blastn -query ncbi_genomes.fna -db ../RDP/release10_28_unaligned.fa -out ncbi_genomes_16S.blast.txt -evalue 1e-10 -outfmt 6<br />
* Use the ''get16s.py'' script to extract a single 16S sequence from each genome based on the best BLAST match:<br />
> cd ./scripts<br />
> python get16s.py ../ncbi_genomes_16S.blast.txt ../../FCP/training/sequences ../../FCP/taxonomy.txt<br />
> mv sequences_of_16s.fasta ../<br />
* The above script will produce the file ''sequences_of_16s.fasta'' which must be align. This can be done with MOTHUR using the following command:<br />
mothur > set.dir(input=../rita)<br />
mothur > set.dir(output=../rita)<br />
mothur > align.seqs(candidate=sequences_of_16s.fasta, template=core_set_aligned.imputed.fasta, flip=t)<br />
mothur > quit()<br />
* Place a copy of the [http://www.mothur.org/wiki/Lane_mask 1349 character Lane Mask] in your ''mothur'' directory.<br />
** Note: core_set_aligned.imputed.fasta can be obtained from the MOTHUR [http://www.mothur.org/wiki/Greengenes-formatted_databases here].<br />
* Update the MOTHUR_16S_ALIGNMENT setting in ''globalsettings.cfg'' to point to the file ''sequences_of_16s.align'' which will be in the ''rita'' directory.<br />
<br />
You are now ready to use rank-flexible RITA.<br />
<br />
== Rank-flexible example usage ==<br />
To run rank-flexible RITA, you must first generate a proxy file for the 16S sequences contained in your sample:<br />
python rita.py --buildproxy <sample_16s_fragments.fasta> --out proxy.txt<br />
<br />
Then run rank-flexible RITA in the same way as rank-specific RITA, but specify the proxy file and the rank as FLEXIBLE<br />
python rita.py --proxy proxy.txt --rank FLEXIBLE --pipeline NB_DCMEGABLAST,DCMEGABLAST_RATIO,NB_RATIO,NB_ML --query fragments.fasta --out results.txt<br />
<br />
For more information on how rank-flexible RITA works, please see the publication.<br />
<br />
== Pipeline Components ==<br />
<br />
Included pipeline components (labellers) (specify with --pipeline A,B,C,...)<br />
<br />
<pre><br />
NB_DCMEGABLAST - labels fragments that agree at rank X for NB and DCMEGABLAST<br />
NB_BLASTN - labels fragments that agree at rank X for NB and BLASTN<br />
NB_BLASTX - labels fragments that agree at rank X for NB and BLASTX<br />
NB_UBLASTX - labels fragments that agree at rank X for NB and UBLASTX<br />
<br />
DCMEGABLAST_RATIO - labels fragments that where the best DCMEGABLAST match evalue is at least Y times greater than the next best<br />
BLASTN_RATIO - labels fragments that where the best BLASTN match evalue is at least Y times greater than the next best<br />
BLASTX_RATIO - labels fragments that where the best BLASTX match evalue is at least Y times greater than the next best<br />
UBLASTX_RATIO - labels fragments that where the best BLASTX match evalue is at least Y times greater than the next best<br />
NB_RATIO - labels fragments that where the best NB match likelihood is at least Y times greater than the next best<br />
<br />
NB_ML - labels fragments that with the best NB match (if there are no ties)<br />
NULL_LABELLER - labels all remaining fragments with NONE (this should only ever be the last step in the pipeline).<br />
</pre><br />
<br />
== RITA Parameters ==<br />
<br />
''rita.py'' accepts the following command-line parameters:<br />
<br />
<pre><br />
--help Provides a description of accepted command-line parameters.<br />
<br />
--pipeline Specify the components of the pipeline.<br />
--rank Taxonomic rank to classify at.<br />
<br />
--blastne BLASTN E-value threshold.<br />
--dblastne Discontiguous MegaBLASTN E-value threshold.<br />
--blastxe BLASTX E-value threshold.<br />
--ublastxe UBLASTX (usearch) E-value threshold.<br />
<br />
--blastnratio BLASTN E-value ratio.<br />
--dblastnratio Discontiguous MegaBLASTN E-value ratio.<br />
--blastxratio BLASTX E-value ratio.<br />
--ublastxratio UBLASTX (usearch) E-value ratio.<br />
--nb_ratio NB Likelihood ratio.<br />
<br />
--query FASTA file with query sequences.<br />
--out Output filename.<br />
<br />
--jobid Specify a job number. Default is a random 4 digit identifier.<br />
--buildproxy Build a proxy for rank-flexible classifications with the provided 16S sequences.<br />
--proxy Proxy for rank-flexible classifications created with --buildproxy.<br />
</pre><br />
<br />
== Contact Information ==<br />
<br />
Suggestions, comments, and bug reports can be sent to Rob Beiko (beiko [at] cs.dal.ca). If reporting a bug, please provide as much information as possible and a simplified version of the data set which causes the bug. This will allow us to quickly resolve the issue. <br />
<br />
== Funding ==<br />
<br />
The development and deployment of RITA has been supported by several organizations:<br />
<br />
* [http://www.genomeatlantic.ca Genome Atlantic]<br />
* The Dalhousie Centre for Comparative Genomics and Evolutionary Bioinformatics, and the [http://www.tula.org/ Tula Foundation]<br />
* [http://www.nserc.ca The Natural Sciences and Engineering Research Council of Canada]<br />
* The Dalhousie [http://cs.dal.ca Faculty of Computer Science]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=RITA&diff=1007RITA2012-04-04T11:17:54Z<p>Beiko: </p>
<hr />
<div>== Rapid identification of taxonomic assignments (RITA) ==<br />
<br />
RITA is a standalone software package and Web server for taxonomic assignment of metagenomic sequence reads. By combining homology predictions from BLAST or UBLAST with compositional classifications from a Naive Bayes classifier, RITA is able to achieve very high accuracy on short reads. Unlike other hybrid approaches which combine these predictions for all sequences to be classified, RITA uses a pipeline to first identify cases where both types of classifier are in agreement, which constitute the highest-confidence set. Sequences not classified in this manner are subjected to a series of downstream classification steps. <br />
<br />
This work has been accepted for publication:<br />
<br />
MacDonald NJ, Parks DH, and Beiko RG. Rapid identification of taxonomic assignments. Accepted to 'Nucleic Acids Research' April 4, 2012.<br />
<br />
If you have any questions or bug reports, please let us know at <beiko@cs.dal.ca>.<br />
<br />
== Web Server ==<br />
<br />
For smaller datasets, taxonomic attributions can be obtained with the [http://ratite.cs.dal.ca/rita RITA web server].<br />
<br />
== License ==<br />
RITA is released under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Share-Alike Attribution 3.0 License].<br />
<br />
== Downloads ==<br />
* [[Media:RITA_v1_0_1.zip|RITA v1.0.1]] RITA source code<br />
<br />
== Rank-specific Setup ==<br />
<br />
''Prerequisites'': You must have [ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ BLAST+ 2.2.21] or higher installed.<br />
<br />
* These instruction assume the following directory structure:<br />
<pre><br />
/home/<user>/RITA/<br />
FastTree/<br />
FCP/<br />
mothur/<br />
rita/<br />
</pre><br />
<br />
which will contain installations of [http://www.microbesonline.org/fasttree/ FastTree], [http://kiwi.cs.dal.ca/Software/FCP FCP], [http://www.mothur.org/ mothur], and RITA, respectively.<br />
<br />
* Unzip RITA:<br />
> unzip RITA_v1_0_1.zip<br />
<br />
* Download, unzip, and run [http://kiwi.cs.dal.ca/Software/FCP FCP] install (FCP_install.py) with '--protein ncbi_genomes.faa' flag:<br />
> unzip FCP_1_0_3.zip -d ./FCP<br />
> cd FCP<br />
> python FCP_install.py --protein ncbi_genomes.faa<br />
<br />
* Concatenate the files in the training/sequences folder of FCP into a single input nucleotide file (e.g. ncbi_genomes.fna):<br />
> cat ./training/sequences/*.fasta > ncbi_genomes.fna<br />
<br />
* RITA is designed to run over multiple BLAST databases in order to reduce memory consumption. Therefore, you should split both the nucleotide and protein files into X pieces using the splitfasta.py script in the <br />
scripts directory (X can be 1 if you do not wish to split the files, we recommend X=10):<br />
> cd ../rita<br />
> mv ../FCP/ncbi_genomes.faa .<br />
> mv ../FCP/ncbi_genomes.fna . <br />
> cd scripts<br />
> python splitfasta.py ../ncbi_genomes.fna 10<br />
> python splitfasta.py ../ncbi_genomes.faa 10<br />
<br />
* Create a nucleotide database with makeblastdb BLAST+ for each input ncbi_genomes.p*.fna:<br />
> makeblastdb -in "ncbi_genomes.p1.fna" -dbtype nucl<br />
> makeblastdb -in "ncbi_genomes.p2.fna" -dbtype nucl<br />
> ...<br />
> makeblastdb -in "ncbi_genomes.pX.fna" -dbtype nucl<br />
<br />
* Create a protein database with makeblastdb BLAST+ for each input ncbi_genomes.p*.faa.<br />
> makeblastdb -in "ncbi_genomes.p1.faa" -dbtype prot<br />
> makeblastdb -in "ncbi_genomes.p2.faa" -dbtype prot<br />
> ...<br />
> makeblastdb -in "ncbi_genomes.pX.faa" -dbtype prot<br />
<br />
* Set BLASTDB_PARTS to X in globalconfig.cfg and the name of the database to<br />
<database_name>.p%%d (e.g. ncbi_genomes.p%%d.fna if this was your output BLAST database name)<br />
%%d is the placeholder for the database number identifier, 1..X.<br />
<br />
* Configure ''globalsettings.cfg'' appropriately for the above installation directories.<br />
<br />
* To use the UBLASTX classifier, you must also obtain a licensed copy of usearch and set up the configuration file appropriately.<br />
<br />
== Rank-specific example usage ==<br />
python rita.py --rank PHYLUM --pipeline NB_DCMEGABLAST,DCMEGABLAST_RATIO,NB_RATIO,NB_ML --query fragments.fasta --out results.txt<br />
<br />
Note: Use the --jobid flag with a unique job identifier if running rita in parallel to ensure intermediate temporary files <br />
are not overwritten.<br />
<br />
The above command will classify the fragments in 'fragments.fasta' at the rank of PHYLUM, using a pipeline that starts with the<br />
consensus of Naive Bayes and DCMEGABLAST, then the most confident DCMEGABLAST results, then the most confident Naive Bayes results<br />
and finally with the maximum likelihood NB prediction. Note that fragments are not attempted to be classified at <br />
a given step in the pipeline if they have already been classified at an earlier step (the order matters). See the Pipeline Components<br />
section below.<br />
<br />
== Rank-flexible setup ==<br />
''Prerequisites'':<br />
* Install [http://biopython.org/wiki/Biopython BioPython] (needed for tree manipulation).<br />
* Install [http://www.mothur.org/ MOTHUR] for 16S DNA alignments.<br />
* Install [http://www.microbesonline.org/fasttree/ FastTree] for building 16S trees.<br />
<br />
To configure RITA for rank-flexible classifications follow these steps:<br />
<br />
* Follow the instructions for a rank-specific RITA installation given above.<br />
* Update the MOTHUR and FastTree installation paths in ''globalsettings.cfg''.<br />
* Build a trusted BLAST database of 16S sequences. We recommend using the hand-curated sequences from [http://rdp.cme.msu.edu/ RDP].<br />
** Download and extract the unaligned Bacteria and Archaea sequences from RDP ([http://rdp.cme.msu.edu/download/release10_28_unaligned.fa.gz link]) into a directory called ''RDP'':<br />
> gunzip release10_28_unaligned.fa.gz<br />
** Create the BLAST database:<br />
> makeblastdb -in "release10_28_unaligned.fa" -dbtype nucl<br />
* From the ''rita'' directory, BLAST the complete genomes against the 16S database:<br />
> blastn -query ncbi_genomes.fna -db ../RDP/release10_28_unaligned.fa -out ncbi_genomes_16S.blast.txt -evalue 1e-10 -outfmt 6<br />
* Use the ''get16s.py'' script to extract a single 16S sequence from each genome based on the best BLAST match:<br />
> cd ./scripts<br />
> python get16s.py ../ncbi_genomes_16S.blast.txt ../../FCP/training/sequences ../../FCP/taxonomy.txt<br />
> mv sequences_of_16s.fasta ../<br />
* The above script will produce the file ''sequences_of_16s.fasta'' which must be align. This can be done with MOTHUR using the following command:<br />
mothur > set.dir(input=../rita)<br />
mothur > set.dir(output=../rita)<br />
mothur > align.seqs(candidate=sequences_of_16s.fasta, template=core_set_aligned.imputed.fasta, flip=t)<br />
mothur > quit()<br />
* Place a copy of the [http://www.mothur.org/wiki/Lane_mask 1349 character Lane Mask] in your ''mothur'' directory.<br />
** Note: core_set_aligned.imputed.fasta can be obtained from the MOTHUR [http://www.mothur.org/wiki/Greengenes-formatted_databases here].<br />
* Update the MOTHUR_16S_ALIGNMENT setting in ''globalsettings.cfg'' to point to the file ''sequences_of_16s.align'' which will be in the ''rita'' directory.<br />
<br />
You are now ready to use rank-flexible RITA.<br />
<br />
== Rank-flexible example usage ==<br />
To run rank-flexible RITA, you must first generate a proxy file for the 16S sequences contained in your sample:<br />
python rita.py --buildproxy <sample_16s_fragments.fasta> --out proxy.txt<br />
<br />
Then run rank-flexible RITA in the same way as rank-specific RITA, but specify the proxy file and the rank as FLEXIBLE<br />
python rita.py --proxy proxy.txt --rank FLEXIBLE --pipeline NB_DCMEGABLAST,DCMEGABLAST_RATIO,NB_RATIO,NB_ML --query fragments.fasta --out results.txt<br />
<br />
For more information on how rank-flexible RITA works, please see the publication.<br />
<br />
== Pipeline Components ==<br />
<br />
Included pipeline components (labellers) (specify with --pipeline A,B,C,...)<br />
<br />
<pre><br />
NB_DCMEGABLAST - labels fragments that agree at rank X for NB and DCMEGABLAST<br />
NB_BLASTN - labels fragments that agree at rank X for NB and BLASTN<br />
NB_BLASTX - labels fragments that agree at rank X for NB and BLASTX<br />
NB_UBLASTX - labels fragments that agree at rank X for NB and UBLASTX<br />
<br />
DCMEGABLAST_RATIO - labels fragments that where the best DCMEGABLAST match evalue is at least Y times greater than the next best<br />
BLASTN_RATIO - labels fragments that where the best BLASTN match evalue is at least Y times greater than the next best<br />
BLASTX_RATIO - labels fragments that where the best BLASTX match evalue is at least Y times greater than the next best<br />
UBLASTX_RATIO - labels fragments that where the best BLASTX match evalue is at least Y times greater than the next best<br />
NB_RATIO - labels fragments that where the best NB match likelihood is at least Y times greater than the next best<br />
<br />
NB_ML - labels fragments that with the best NB match (if there are no ties)<br />
NULL_LABELLER - labels all remaining fragments with NONE (this should only ever be the last step in the pipeline).<br />
</pre><br />
<br />
== RITA Parameters ==<br />
<br />
''rita.py'' accepts the following command-line parameters:<br />
<br />
<pre><br />
--help Provides a description of accepted command-line parameters.<br />
<br />
--pipeline Specify the components of the pipeline.<br />
--rank Taxonomic rank to classify at.<br />
<br />
--blastne BLASTN E-value threshold.<br />
--dblastne Discontiguous MegaBLASTN E-value threshold.<br />
--blastxe BLASTX E-value threshold.<br />
--ublastxe UBLASTX (usearch) E-value threshold.<br />
<br />
--blastnratio BLASTN E-value ratio.<br />
--dblastnratio Discontiguous MegaBLASTN E-value ratio.<br />
--blastxratio BLASTX E-value ratio.<br />
--ublastxratio UBLASTX (usearch) E-value ratio.<br />
--nb_ratio NB Likelihood ratio.<br />
<br />
--query FASTA file with query sequences.<br />
--out Output filename.<br />
<br />
--jobid Specify a job number. Default is a random 4 digit identifier.<br />
--buildproxy Build a proxy for rank-flexible classifications with the provided 16S sequences.<br />
--proxy Proxy for rank-flexible classifications created with --buildproxy.<br />
</pre><br />
<br />
== Contact Information ==<br />
<br />
Suggestions, comments, and bug reports can be sent to Rob Beiko (beiko [at] cs.dal.ca). If reporting a bug, please provide as much information as possible and a simplified version of the data set which causes the bug. This will allow us to quickly resolve the issue. <br />
<br />
== Funding ==<br />
<br />
The development and deployment of RITA has been supported by several organizations:<br />
<br />
* [http://www.genomeatlantic.ca Genome Atlantic]<br />
* The Dalhousie Centre for Comparative Genomics and Evolutionary Bioinformatics, and the [http://www.tula.org/ Tula Foundation]<br />
* [http://www.nserc.ca The Natural Sciences and Engineering Research Council of Canada]<br />
* The Dalhousie [http://cs.dal.ca Faculty of Computer Science]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=522Main Page2010-05-14T12:09:34Z<p>Beiko: /* Software */</p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [[PICA]]: genotype-phenotype data mining software (G).<br />
<br />
* [http://kiwi.cs.dal.ca/Software/rSPR rSPR]: software to calculate rooted subtree-prune-and-regraft distances and rooted agreement forests. (PL)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* VAREB [to be added]<br />
<br />
== Web Services ==<br />
* [http://ratite.cs.dal.ca/SeqMonitor SeqMonitor]<br />
* [http://bio.mquter.qut.edu.au/Moa/ MOAMap]<br />
* MOA (to be added)<br />
* Visual MOA (to be added)<br />
* [http://fester.cs.dal.ca/manuel MANUEL]<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]<br />
* [http://www.cs.dal.ca/~whidden Chris Whidden]<br />
* Norman MacDonald</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=470Main Page2010-03-19T19:07:03Z<p>Beiko: /* Software */</p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* VAREB [to be added]<br />
<br />
* [http://kiwi.cs.dal.ca/Software/rSPR rSPR]: software to calculate rooted subtree-prune-and-regraft distances and rooted agreement forests. (PL)<br />
<br />
== Web Services ==<br />
* [http://ratite.cs.dal.ca/SeqMonitor SeqMonitor]<br />
* [http://bio.mquter.qut.edu.au/Moa/ MOAMap]<br />
* MOA (to be added)<br />
* Visual MOA (to be added)<br />
* [http://fester.cs.dal.ca/manuel MANUEL]<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=433Main Page2010-01-27T19:22:28Z<p>Beiko: /* Web Services */</p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* VAREB [to be added]<br />
<br />
== Web Services ==<br />
* [http://ratite.cs.dal.ca/SeqMonitor SeqMonitor]<br />
* [http://bio.mquter.qut.edu.au/Moa/ MOAMap]<br />
* MOA (to be added)<br />
* Visual MOA (to be added)<br />
* [http://fester.cs.dal.ca/manuel MANUEL]<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/Software/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=389Main Page2009-12-09T16:44:29Z<p>Beiko: /* Datasets */</p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* VAREB [to be added]<br />
<br />
== Web Services ==<br />
* [http://ratite.cs.dal.ca/SeqMonitor SeqMonitor]<br />
* MOA (to be added)<br />
* Visual MOA (to be added)<br />
* [http://fester.cs.dal.ca/manuel MANUEL]<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/wow/index.php/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008) [to be added]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=388Main Page2009-12-09T16:44:13Z<p>Beiko: /* Software */</p>
<hr />
<div>__NOTOC__<br />
[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of Adrian Sharma, Jeremy Koenig, and Olga Zhaxybayeva).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
== Software ==<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer, A = sequence alignment<br />
<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* [[STAMP]]: a software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. (SMV)<br />
<br />
* [http://bioinformatics.org.au/evolsim/ EvolSimulator]: a simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* [http://bioinformatics.org.au/eeep/ EEEP (Efficient Evaluation of Edit Paths)]: software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* [http://bioinformatics.org.au/woof/ WOOF]: a tool designed to rigourously apply the principle of visual alignment validation. (SA)<br />
<br />
* [http://bioinformatics.org.au/gann/ GANN]: a machine learning method designed with the complexities of transcriptional regulation in mind.<br />
<br />
* VAREB [to be added]<br />
<br />
== Web Services ==<br />
* [http://ratite.cs.dal.ca/SeqMonitor SeqMonitor]<br />
* MOA (to be added)<br />
* Visual MOA (to be added)<br />
* [http://fester.cs.dal.ca/manuel MANUEL]<br />
<br />
== Datasets ==<br />
* [http://kiwi.cs.dal.ca/GenGIS/Datasets Datasets used in GenGIS] (Parks et al., Genome Research 2009)<br />
* [http://kiwi.cs.dal.ca/wow/index.php/STAMP_example_datasets STAMP datasets]<br />
* [http://bioinformatics.org.au/lgt144/ Lateral genetic transfer in 144 genomes dataset] (Beiko et al., Proc. Natl. Acad. Sci. 2005)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008)<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=165Main Page2009-11-18T01:31:42Z<p>Beiko: </p>
<hr />
<div>[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of <Rob... who did we get this data from - Adrian?>).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
<-- '''Can someone come up with a logo?'''<br />
<br />
<-- '''Cool molecule icon added by Mr. Parks'''<br />
<br />
http://commons.wikimedia.org/wiki/File:HILLGIALLO_molecola.png<br />
<br />
== Software ==<br />
<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer<br />
<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* STAMP: (SMV)<br />
<br />
* EvolSimulator: A simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* EEEP (Efficient Evaluation of Edit Paths): software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* VAREB?<br />
<br />
* WOOF?<br />
<br />
* GANN?<br />
<br />
== Web Services ==<br />
* [http://ratite.cs.dal.ca/SeqMonitor SeqMontior]<br />
* MOA (to be added)<br />
* Visual MOA (to be added)<br />
* [http://fester.cs.dal.ca/manuel MANUEL]<br />
<br />
== Datasets ==<br />
* Links to STAMP datasets?<br />
* Datasets used in GenGIS (Parks et al., Genome Research 2009)<br />
* Simulated data for 'The impact of reticulate evolution on genome phylogeny' (Beiko et al., Systematic Biology 2008)<br />
* LGT in 144 genomes<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=164Main Page2009-11-18T01:29:11Z<p>Beiko: /* Software */</p>
<hr />
<div>[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of <Rob... who did we get this data from - Adrian?>).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
<-- '''Can someone come up with a logo?'''<br />
<br />
<-- '''Cool molecule icon added by Mr. Parks'''<br />
<br />
http://commons.wikimedia.org/wiki/File:HILLGIALLO_molecola.png<br />
<br />
== Software ==<br />
<br />
Key: P = phylogenetics, S = statistics, B = biogeography, V = visualization, G = genomics, M = metagenomics, L = lateral genetic transfer<br />
<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. (PSBVM)<br />
<br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree. (PV)<br />
<br />
* STAMP: (SMV)<br />
<br />
* EvolSimulator: A simulation test bed for hypotheses of genome evolution. (PL)<br />
<br />
* EEEP (Efficient Evaluation of Edit Paths): software to infer putative pathways of lateral genetic transfer by comparing gene trees against a rooted reference tree (PL)<br />
<br />
* VAREB?<br />
<br />
* WOOF?<br />
<br />
* GANN?<br />
<br />
== Web Services ==<br />
* [http://ratite.cs.dal.ca/SeqMonitor SeqMontior]<br />
* MOA (to be added)<br />
* Visual MOA (to be added)<br />
* [http://fester.cs.dal.ca/manuel MANUEL]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=163Main Page2009-11-18T01:21:01Z<p>Beiko: /* Web Services */</p>
<hr />
<div>[[Image:GenGIS_galapagos.jpg|frame|right|Using GenGIS to investigate the distribution of photorhodopsin genes around the Galapagos Islands (data courtesy of <Rob... who did we get this data from - Adrian?>).]]<br />
<br />
Welcome to the Bioinformatics Software and Resources page.<br />
<br />
<-- '''Can someone come up with a logo?'''<br />
<br />
<-- '''Cool molecule icon added by Mr. Parks'''<br />
<br />
http://commons.wikimedia.org/wiki/File:HILLGIALLO_molecola.png<br />
<br />
== Software ==<br />
<br />
<br />
* [http://kiwi.cs.dal.ca/GenGIS GenGIS]: an application that allows users to combine digital map data with information about biological sequences collected from the environment. GenGIS provides a 3D graphical interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries. <br />
* [http://kiwi.cs.dal.ca/~beiko/software-and-data/radie Radié]: a tool that allows characters to be visualized against the background of a phylogenetic tree. The software includes several different visual and numeric representations of the ‘convexity’ of a given character, in other words the extent to which different character traits form distinct groups within the tree.<br />
* STAMP:<br />
<br />
== Web Services ==<br />
* [http://ratite.cs.dal.ca/SeqMonitor SeqMontior]<br />
* MOA (to be added)<br />
* Visual MOA (to be added)<br />
* [http://fester.cs.dal.ca/manuel MANUEL]<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]<br />
* [http://dparks.wikidot.com/ Donovan Parks]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=149Main Page2009-11-17T01:39:16Z<p>Beiko: /* Software */</p>
<hr />
<div>Welcome to the Bioinformatics Software and Resources page.<br />
<br />
<-- '''Can someone come up with a logo?'''<br />
<br />
== Software ==<br />
* GenGIS (to be linked)<br />
* MetaStatsWhatnot (to be added by Donovan)<br />
* Radie (to be added)<br />
<br />
== Web Services ==<br />
* MOA (to be added)<br />
* Visual MOA (to be added)<br />
* SeqMonitor (to be added)<br />
* MANUEL (to be added)<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=148Main Page2009-11-17T00:53:23Z<p>Beiko: </p>
<hr />
<div>Welcome to the Bioinformatics Software and Resources page.<br />
<br />
<-- '''Can someone come up with a logo?'''<br />
<br />
== Software ==<br />
* GenGIS (to be linked)<br />
<br />
== Web Services ==<br />
* MOA (to be added)<br />
* Visual MOA (to be added)<br />
* SeqMonitor (to be added)<br />
* MANUEL (to be added)<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=Main_Page&diff=147Main Page2009-11-16T19:45:30Z<p>Beiko: </p>
<hr />
<div>Welcome to the Bioinformatics Software and Resources page.<br />
<br />
<-- '''Can someone come up with a logo?'''<br />
<br />
== Software Titles ==<br />
* GenGIS<br />
<br />
== Web Services ==<br />
* MOA<br />
* Visual MOA<br />
* SeqMonitor<br />
* MANUEL<br />
<br />
== Publications ==<br />
<br />
== Contributors ==<br />
* [http://kiwi.cs.dal.ca/~beiko/ Robert Beiko]<br />
* [http://www.cs.dal.ca/~cblouin Christian Blouin]</div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:ESconfig.tar.gz&diff=143File:ESconfig.tar.gz2008-06-24T00:17:12Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:NBLASTP.tar.gz&diff=142File:NBLASTP.tar.gz2008-06-24T00:17:02Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:BeikoCharleboisDoolittleSupFig1.pdf&diff=141File:BeikoCharleboisDoolittleSupFig1.pdf2008-06-24T00:16:51Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:BeikoCharleboisDoolittleFig8.pdf&diff=140File:BeikoCharleboisDoolittleFig8.pdf2008-06-24T00:16:39Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:BeikoCharleboisDoolittleFig7.pdf&diff=139File:BeikoCharleboisDoolittleFig7.pdf2008-06-24T00:16:28Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:BeikoCharleboisDoolittleFig6.pdf&diff=138File:BeikoCharleboisDoolittleFig6.pdf2008-06-24T00:16:15Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:BeikoCharleboisDoolittleFig5.pdf&diff=137File:BeikoCharleboisDoolittleFig5.pdf2008-06-24T00:16:02Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:BeikoCharleboisDoolittleFig4.pdf&diff=136File:BeikoCharleboisDoolittleFig4.pdf2008-06-24T00:15:51Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:BeikoCharleboisDoolittleFig3.pdf&diff=135File:BeikoCharleboisDoolittleFig3.pdf2008-06-24T00:15:41Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:BeikoCharleboisDoolittleFig2.pdf&diff=134File:BeikoCharleboisDoolittleFig2.pdf2008-06-24T00:15:29Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:BeikoCharleboisDoolittleFig1.pdf&diff=132File:BeikoCharleboisDoolittleFig1.pdf2008-06-24T00:14:56Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:HGTchapter-Figure2.pdf&diff=130File:HGTchapter-Figure2.pdf2008-06-24T00:12:47Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:HGTchapter-Figure1.pdf&diff=129File:HGTchapter-Figure1.pdf2008-06-24T00:12:35Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:HGTchapter-maintext.pdf&diff=128File:HGTchapter-maintext.pdf2008-06-24T00:12:20Z<p>Beiko: </p>
<hr />
<div></div>Beikohttps://beikolab.cs.dal.ca/software/index.php?title=File:FOSS4G_2008_Abstract.pdf&diff=125File:FOSS4G 2008 Abstract.pdf2008-06-24T00:10:17Z<p>Beiko: Abstract for FOSS4G conference (Genomic GIS)</p>
<hr />
<div>Abstract for FOSS4G conference (Genomic GIS)</div>Beiko