Difference between revisions of "PICA"

From Bioinformatics Software
Revision as of 17:30, 12 April 2010

Phenotype Investigation with Classification Algorithms (PICA) is a Python framework for testing genotype-phenotype association algorithms.


PICA was developed by Norman MacDonald (norman@cs.dal.ca) and is released under the Creative Commons Share-Alike Attribution 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/).

The example genotype dataset was compiled from the STRING database, version 8.0. The example phenotype data are from the DOE JGI IMG and NCBI lproks. The example taxonomic data are from the NCBI Taxonomy database.

Downloads

PICA source code and datasets

Setup

Numpy:

Unzip the folder to the desired location and run from the command line. To access the API, make sure the pica folder is on your Python path.
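
One way to put the folder on the Python path from inside a script is sketched below. The location ~/pica and the helper name add_to_python_path are placeholders for illustration only, and whether the unzipped folder itself or its parent needs to be listed depends on how the archive is laid out, so treat this as an assumption to check against your copy.

```python
import os
import sys


def add_to_python_path(folder):
    """Prepend folder to sys.path unless it is already there; return its absolute path."""
    folder = os.path.abspath(folder)
    if folder not in sys.path:
        sys.path.insert(0, folder)
    return folder


# Hypothetical location: replace with the folder you unzipped PICA into.
pica_root = add_to_python_path(os.path.join(os.path.expanduser("~"), "pica"))
```

Setting the PYTHONPATH environment variable to the same folder before starting Python has the same effect without editing any script.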

Command-line Interface

Use the option -h for help with any command.

  • train: Train a given data mining algorithm and write the resulting model to a file.

Example usage:

 train.py --algorithm cpar.CPARTrainer \
          --samples examples/genotype_prokaryote.profile \
          --classes examples/phenotype.profile \
          --targetclass THERM \
          --output output.rules

  • test: Evaluate a previously trained model with the given classification algorithm.

Example usage:

 test.py --algorithm cpar.CPARClassifier \
         --samples examples/genotype_prokaryote.profile \
         --classes examples/phenotype.profile \
         --targetclass THERM \
         --model_filename output.rules \
         --model_accuracy mi

  • crossvalidate: Replicated cross-validation with the given training and testing algorithms.

Example usage:

 crossvalidate.py --training_algorithm cpar.CPARTrainer \
                  --classification_algorithm cpar.CPARClassifier \
                  --accuracy_measure mi \
                  --replicates 10 \
                  --folds 5 \
                  --samples examples/genotype_prokaryote.profile \
                  --classes examples/phenotype.profile \
                  --targetclass THERM \
                  --output_filename results.txt \
                  --metadata examples/taxonomic_confounders_propagated.txt
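
To make the --replicates and --folds options concrete, here is a minimal sketch of how replicated k-fold cross-validation partitions a set of samples. It is a generic illustration and not PICA's own code: the function name replicated_folds, the seed parameter, and the sample names are made up for the example, sample identifiers are assumed to be unique, and the sketch ignores the confounder metadata that crossvalidate.py accepts.

```python
import random


def replicated_folds(sample_ids, replicates=10, folds=5, seed=0):
    """Yield (replicate, fold, train_ids, test_ids) for replicated k-fold cross-validation."""
    rng = random.Random(seed)
    ids = list(sample_ids)
    for replicate in range(replicates):
        # Each replicate reshuffles the samples, so the folds differ between replicates.
        shuffled = ids[:]
        rng.shuffle(shuffled)
        for fold in range(folds):
            # Every folds-th sample, starting at position `fold`, is held out for testing.
            test_ids = shuffled[fold::folds]
            held_out = set(test_ids)
            train_ids = [s for s in shuffled if s not in held_out]
            yield replicate, fold, train_ids, test_ids


# Hypothetical sample names, for illustration only.
splits = list(replicated_folds(["org%d" % i for i in range(20)], replicates=10, folds=5))
```

With 10 replicates and 5 folds, a model is trained and tested 50 times, and within each replicate every sample is held out exactly once.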

Python API

The following shortened example sets up a paired test between mutual information ("mi") and conditionally weighted mutual information ("cwmi"). It uses the CWMIRankFeatureSelector class to select features with each score and the LIBSVM interface to test each set of features.

See util/batch_validate.py for a fuller example of setting up a programmatic comparison.

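 # Imports are omitted in this shortened example; samples, confounders_filename,
 # and root_output must be defined beforehand (see util/batch_validate.py).
 test_configurations = []
 # One LIBSVM trainer/classifier pair is shared by both configurations.
 trainer = libSVMTrainer(kernel_type="LINEAR", C=5)
 classifier = libSVMClassifier()
 
 # Build one test configuration per feature-scoring method.
 for score in ("mi", "cwmi"):
     feature_selector = CWMIRankFeatureSelector(confounders_filename=confounders_filename,
                                                scores=(score,),
                                                features_per_class=10,
                                                confounder="order")
     
     tc = TestConfiguration(score, feature_selector, trainer, classifier)
     test_configurations.append(tc)
 
 # Run 10 replicates of 5-fold cross-validation over every configuration.
 crossvalidator = CrossValidation(samples=samples,
                                  parameters=None,
                                  replicates=10,
                                  folds=5,
                                  test_configurations=test_configurations,
                                  root_output=root_output)
 crossvalidator.crossvalidate()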
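
For readers unfamiliar with the "mi" score, the sketch below computes plain mutual information between a discrete feature and a class label. It is a textbook formulation written for illustration and is not taken from PICA; the function name mutual_information and the toy data are assumptions, and the conditionally weighted variant ("cwmi") is not shown because its definition is not given here.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Return the mutual information, in bits, between two equal-length discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = float(len(xs))
    joint = Counter(zip(xs, ys))  # counts of each (feature value, class label) pair
    px = Counter(xs)              # counts of each feature value
    py = Counter(ys)              # counts of each class label
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (count_x * count_y).
        mi += (count / n) * log(count * n / (px[x] * py[y]), 2)
    return mi


# Toy data: gene presence/absence in four genomes and a hypothetical phenotype label.
gene_present = [1, 1, 0, 0]
phenotype = ["YES", "YES", "NO", "NO"]
score = mutual_information(gene_present, phenotype)
```

A feature that perfectly predicts a balanced binary phenotype, as in the toy data, scores 1 bit, while a feature that is independent of the phenotype scores 0.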