Phenotype Investigation with Classification Algorithms (PICA) is a Python framework for testing genotype-phenotype association algorithms.
PICA was developed by Norman MacDonald (firstname.lastname@example.org) and is released under the Creative Commons Attribution-ShareAlike 3.0 License.
- PICA source code and example datasets
Unzip the archive to the desired location and run the scripts from the command line. To access the API, make sure the pica folder is on your Python path.
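One way to put a folder on the Python path from inside a script is to prepend it to sys.path, as in the minimal sketch below. The folder name "pica-release" and the helper add_to_python_path are hypothetical and not part of PICA; substitute the location where you unzipped the download. Setting the PYTHONPATH environment variable before starting Python has the same effect.

```python
import sys
from pathlib import Path


def add_to_python_path(folder):
    """Prepend folder to sys.path if it is not already there.

    Returns the absolute path string that was added (or already present).
    """
    resolved = str(Path(folder).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Hypothetical location: replace with the folder you unzipped PICA into.
pica_location = add_to_python_path("pica-release")
```

Calling the helper twice with the same folder leaves a single entry in sys.path, so it is safe to run at the top of every script.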
Use the option -h for help with any command.
- train: Train a given data mining algorithm and write the resulting model to a file.
train.py --algorithm cpar.CPARTrainer --samples examples/genotype_prokaryote.profile --classes examples/phenotype.profile --targetclass THERM --output output.rules
- test: Test a previously trained model with a given classification algorithm.
test.py --algorithm cpar.CPARClassifier --samples examples/genotype_prokaryote.profile --classes examples/phenotype.profile --targetclass THERM --model_filename output.rules --model_accuracy mi
- crossvalidate: Replicated cross-validation with the given training and testing algorithms.
crossvalidate.py --training_algorithm cpar.CPARTrainer --classification_algorithm cpar.CPARClassifier --accuracy_measure laplace --replicates 10 --folds 5 --samples examples/genotype_prokaryote.profile --classes examples/phenotype.profile --targetclass THERM --output_filename results.txt --metadata examples/taxonomic_confounders_propagated.txt
Another example of crossvalidate is to use the CPAR2SVMTrainer, which trains with CPAR, then breaks down the rules found into individual features and subsequently trains and tests with LIBSVM. This could also be performed more generally (with feature selection as a separate step) using the Python API below.
crossvalidate.py --training_algorithm cpar.CPAR2SVMTrainer --classification_algorithm libsvm.libSVMClassifier --replicates 10 --folds 5 --samples examples/genotype_prokaryote.profile --classes examples/phenotype.profile --targetclass THERM --output_filename results.txt --metadata examples/taxonomic_confounders_propagated.txt
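The --replicates 10 --folds 5 options in the crossvalidate commands above request replicated cross-validation: the samples are reshuffled for each replicate and then split into folds, giving 50 train/test splits in total. The sketch below illustrates that general idea only. The function replicated_folds and its seed parameter are hypothetical, and PICA's own splitting (for example, whether it stratifies by class) may differ.

```python
import random


def replicated_folds(sample_ids, replicates, folds, seed=0):
    """Yield (replicate, fold, train_ids, test_ids) for replicated k-fold CV.

    Each replicate reshuffles the samples, then every sample is used for
    testing in exactly one fold of that replicate.
    """
    rng = random.Random(seed)
    ids = list(sample_ids)
    for replicate in range(replicates):
        shuffled = ids[:]
        rng.shuffle(shuffled)
        for fold in range(folds):
            # Every folds-th sample, starting at offset fold, is held out.
            test_ids = shuffled[fold::folds]
            held_out = set(test_ids)
            train_ids = [s for s in shuffled if s not in held_out]
            yield replicate, fold, train_ids, test_ids


# 20 toy sample identifiers, 10 replicates of 5-fold cross-validation.
splits = list(replicated_folds(range(20), replicates=10, folds=5))
```

Averaging an accuracy measure over all replicates and folds reduces the dependence of the estimate on any single random partition of the samples.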
The following is a shortened example of a paired test between mutual information and conditionally weighted mutual information. It uses the CWMIRankFeatureSelector class to select features and the LIBSVM interface to test each set of features.
See util/batch_validate.py for more details on an example of setting up a programmatic comparison.
# Assumes samples, confounders_filename, and root_output are already defined.
# Create a list to hold paired comparison configurations.
test_configurations = []

# Create the basic LIBSVM trainer and classifier
# that we will use to validate our feature selection.
trainer = libSVMTrainer()
classifier = libSVMClassifier()

# Create two test configurations: one for feature selection with mutual
# information, the other with conditionally weighted mutual information.
for score in ("mi", "cwmi"):
    feature_selector = CWMIRankFeatureSelector(
        confounders_filename=confounders_filename,
        scores=(score,),
        features_per_class=10,
        confounder="order")
    tc = TestConfiguration(score, feature_selector, trainer, classifier)
    test_configurations.append(tc)

# Set up the crossvalidation class for 10 replicates of 5-fold cross-validation
# and output the model from each replicate/fold to file name patterns starting
# with 'root_output'.
crossvalidator = CrossValidation(
    samples=samples,
    parameters=None,
    replicates=10,
    folds=5,
    test_configurations=test_configurations,
    root_output=root_output)

# After cross-validation, the crossvalidator object holds the results of
# the paired comparisons that can be accessed through its methods.
crossvalidator.crossvalidate()
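For readers unfamiliar with the "mi" score used above, the sketch below computes the standard mutual information (in bits) between one discrete feature and the class labels. It is a textbook illustration and not PICA's implementation: the function mutual_information and the toy data are hypothetical, and the conditionally weighted variant ("cwmi") is not attempted here.

```python
from collections import Counter
from math import log2


def mutual_information(feature_values, class_labels):
    """Mutual information in bits between a discrete feature and class labels."""
    n = len(feature_values)
    joint_counts = Counter(zip(feature_values, class_labels))
    feature_counts = Counter(feature_values)
    class_counts = Counter(class_labels)
    mi = 0.0
    for (x, y), count in joint_counts.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (count_x * count_y).
        ratio = count * n / (feature_counts[x] * class_counts[y])
        mi += p_xy * log2(ratio)
    return mi


# Toy data: gene presence (1) or absence (0) in four genomes, and their phenotype.
labels = ["THERM", "THERM", "other", "other"]
informative = mutual_information([1, 1, 0, 0], labels)    # feature tracks the class
uninformative = mutual_information([1, 0, 1, 0], labels)  # feature independent of class
```

A feature that perfectly separates two equally sized classes scores one bit, while a feature that is independent of the class scores zero, which is why ranking by this score favours predictive features.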