Express Beta Diversity (EBD) by Donovan Parks and Rob Beiko
-------------------------------------------------------------------------------

EBD is free software: you can redistribute it and/or modify it under the terms
of the GNU General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.

EBD is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
EBD. If not, see <http://www.gnu.org/licenses/>.

Installation:
-------------------------------------------------------------------------------

EBD is a command-line program written in C++. To compile EBD on OSX or Linux,
type 'make' from the EBD directory. The resulting executable will be in the bin
directory. A precompiled executable for Windows is provided in the bin
directory. Please note that even under Windows, EBD must be run from the
command line (i.e., the DOS prompt).

Program usage:
-------------------------------------------------------------------------------

Usage: EBD [OPTIONS]

Calculates taxon- and phylogenetic-based beta diversity measures.

Options:
 -h, --help            Produce help message.
 -l, --list-calc       List all supported calculators.
 -u, --unit-tests      Execute unit tests.

 -t, --tree-file       Tree in Newick format (if phylogenetic beta-diversity
                       is desired).
 -s, --seq-count-file  Sequence count file.
 -d, --diss-file       File to write dissimilarity matrix to.

 -c, --calculator      Desired calculator (e.g., Bray-Curtis, Canberra).
 -w, --weighted        Indicates whether sequence abundance data should be
                       used.
 -m, --mrca            Apply 'MRCA weightings' to each branch (experimental).
 -r, --strict-mrca     Restrict calculator to MRCA subtree.
 -y, --count           Use count data as opposed to relative proportions.
 -x, --max-data-vecs   Maximum number of profiles (data vectors) to have in
                       memory at once (default = 1000).
 -v, --verbose         Provide additional information on program execution.

Examples of Use:
 ./ExpressBetaDiversity -t input.tre -s seq.txt -d output.txt -c Bray-Curtis -w
 ./ExpressBetaDiversity --unit-tests

Input file formats:
-------------------------------------------------------------------------------

EBD uses Newick formatted trees as input. Information on this tree format can
be found at: http://evolution.genetics.washington.edu/phylip/newicktree.html.
Here is a simple Newick tree with three leaf nodes labelled A, B, and C:

(A:1,(B:1,C:1):1);

Taxon-based beta-diversity is calculated if an input tree is not specified.

Sequence count information must be specified as a tab-delimited table where
each row is a sample and each column corresponds to a leaf node in the provided
tree. Data must be provided for all leaf nodes in the tree. Consider the
following example:

	A	B	C
Sample1	1	2	3
Sample2	10	1	0
Sample3	0	0	1

The first row lists the name of each leaf node in the tree, separated by tabs.
Please note that this line MUST start with a tab. The number of sequences
associated with each leaf node is then indicated for each sample on a separate
row. In this example, the first sample is labelled 'Sample1' and contains 1
instance of sequence/OTU A, 2 instances of B, and 3 instances of C. Sample3
contains only instances of C, but note that zeros must be specified for the
other sequence/OTU types.

Example input files are available in the unit-tests directory.

Output file format:
-------------------------------------------------------------------------------

The resulting dissimilarity between samples is written as a tab-delimited,
lower-triangular dissimilarity matrix with the first line indicating the number
of samples. Consider the following output:

3
A
B	1
C	2	3

The first line indicates that there are 3 samples.
The dissimilarity between samples A and B is 1, A and C is 2, and B and C is 3.

Citing EBD:
-------------------------------------------------------------------------------

If you use EBD in your research, please cite:

Parks, D.H. and Beiko, R.G. Phylogenetic resemblance methods provides robust
and complementary insights into microbial communities. 2011. (submitted to
Nature Methods, December, 2011).

Contact Information:
-------------------------------------------------------------------------------

Donovan Parks
parks@cs.dal.ca

Robert Beiko
beiko@cs.dal.ca

Program website: http://kiwi.cs.dal.ca/Software/EBD
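
Reading the output file (illustrative sketch):
-------------------------------------------------------------------------------

The lower-triangular output format described under "Output file format" can be
read back with a few lines of Python. This is a minimal sketch and is not part
of EBD: the function name read_dissimilarity_matrix is hypothetical, and the
code assumes that each row holds a sample label followed by tab-separated
dissimilarities to the samples on the preceding rows, as in the 3-sample
example.

```python
import io


def read_dissimilarity_matrix(handle):
    """Parse a tab-delimited, lower-triangular dissimilarity matrix.

    Assumed layout (taken from the README example, not from the EBD source):
    the first line is the number of samples; each following line is a sample
    label and then its dissimilarity to every sample listed before it.
    Returns the labels in file order and a dict keyed by (label, label) pairs.
    """
    num_samples = int(handle.readline().strip())
    labels = []
    dissimilarity = {}
    for _ in range(num_samples):
        fields = handle.readline().rstrip("\n").split("\t")
        label = fields[0]
        # Ignore empty fields so a trailing tab does not break parsing.
        values = [float(v) for v in fields[1:] if v != ""]
        if len(values) != len(labels):
            raise ValueError(
                "row %r has %d values, expected %d"
                % (label, len(values), len(labels))
            )
        # Store both orderings so lookups do not depend on argument order.
        for other, value in zip(labels, values):
            dissimilarity[(label, other)] = value
            dissimilarity[(other, label)] = value
        labels.append(label)
    return labels, dissimilarity


# The 3-sample example from the "Output file format" section.
example = "3\nA\nB\t1\nC\t2\t3\n"
labels, dissimilarity = read_dissimilarity_matrix(io.StringIO(example))
print(labels)
print(dissimilarity[("A", "C")])
```

To read a file written with the -d option, pass an open file object instead of
the io.StringIO wrapper, e.g. read_dissimilarity_matrix(open("output.txt")).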