Difference between revisions of "PICA"

Revision as of 20:15, 12 April 2010

Phenotype Investigation with Classification Algorithms (PICA) is a Python framework for testing genotype-phenotype association algorithms.


Command-line Interface

Use the option -h for help with any command.

  • train: Train a given data mining algorithm and write the resulting model to a file.

Example usage:

 train.py --algorithm cpar.CPARTrainer \
          --samples examples/genotype_prokaryote.profile \
          --classes examples/phenotype.profile \
          --targetclass THERM \
          --output output.rules
  • test: Test a previously trained model with the given classification algorithm.

Example usage:

 test.py --algorithm cpar.CPARClassifier \
         --samples examples/genotype_prokaryote.profile \
         --classes examples/phenotype.profile \
         --targetclass THERM \
         --model_filename output.rules \
         --model_accuracy mi
  • crossvalidate: Train and test multiple replicates with the given training and classification algorithms.

Example usage:

 crossvalidate.py --training_algorithm cpar.CPARTrainer \
                  --classification_algorithm cpar.CPARClassifier \
                  --accuracy_measure mi \
                  --replicates 10 \
                  --folds 5 \
                  --samples examples/genotype_prokaryote.profile \
                  --classes examples/phenotype.profile \
                  --targetclass THERM \
                  --output_filename results.txt \
                  --metadata examples/taxonomic_confounders_propagated.txt
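
The --replicates and --folds options above suggest repeated k-fold cross-validation. The following is a minimal sketch of that partitioning in plain Python, assuming each replicate reshuffles the samples and then holds out each of the folds in turn. The function name, the seeding, and the splitting strategy are illustrative assumptions and are not taken from PICA, whose own splitting may differ (for example, it may balance classes across folds).

```python
import random

def repeated_kfold(sample_ids, replicates=10, folds=5, seed=0):
    """Yield (replicate, fold, train_ids, test_ids) for repeated k-fold CV.

    Hypothetical helper for illustration only; not part of PICA.
    """
    rng = random.Random(seed)
    ids = list(sample_ids)
    for r in range(replicates):
        # Each replicate uses a fresh random ordering of the samples.
        shuffled = ids[:]
        rng.shuffle(shuffled)
        for f in range(folds):
            # Every folds-th sample, starting at offset f, is held out.
            test_ids = shuffled[f::folds]
            held_out = set(test_ids)
            train_ids = [s for s in shuffled if s not in held_out]
            yield r, f, train_ids, test_ids

splits = list(repeated_kfold(range(23), replicates=10, folds=5))
print(len(splits))  # 10 replicates x 5 folds = 50 train/test splits
```

Within one replicate every sample is tested exactly once, so accuracy can be averaged over folds and then compared across replicates.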

Python API

The shortened example below sets up a paired test between mutual information and conditionally weighted mutual information. It uses the CWMIRankFeatureSelector class to select each set of features and the LIBSVM interface to test them.

See util/batch_validate.py for a more complete example of setting up a programmatic comparison.

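 # The class names below come from PICA; see util/batch_validate.py for the full setup.
 # confounders_filename, samples and root_output are assumed to be defined beforehand.
 test_configurations = []
 trainer = libSVMTrainer(kernel_type="LINEAR", C=5)
 classifier = libSVMClassifier()
 
 # Build one test configuration per feature-ranking score.
 for score in ("mi", "cwmi"):
     feature_selector = CWMIRankFeatureSelector(confounders_filename=confounders_filename,
                                                scores=(score,),
                                                features_per_class=10,
                                                confounder="order")
 
     tc = TestConfiguration(score, feature_selector, trainer, classifier)
     test_configurations.append(tc)
 
 # Run 10 replicates of 5-fold cross-validation over every configuration.
 crossvalidator = CrossValidation(samples=samples,
                                  parameters=None,
                                  replicates=10,
                                  folds=5,
                                  test_configurations=test_configurations,
                                  root_output=root_output)
 crossvalidator.crossvalidate()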
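
The "mi" score in the example above refers to mutual information. As a rough illustration of how such a score can rank features, the sketch below computes the mutual information between a binary feature and a class label from paired observations. The function name, gene names, and toy data are hypothetical. PICA's own scoring, and the confounder weighting used by "cwmi", are not reproduced here.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Mutual information (in bits) between two equal-length sequences."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (px[x] * py[y]).
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi

labels = ["YES", "YES", "NO", "NO"]
features = {
    "gene_a": [1, 1, 0, 0],  # perfectly tracks the label
    "gene_b": [1, 0, 1, 0],  # independent of the label
}
ranked = sorted(features,
                key=lambda g: mutual_information(features[g], labels),
                reverse=True)
print(ranked)  # ['gene_a', 'gene_b']
```

A feature that perfectly tracks a balanced two-class label scores 1 bit, and an independent feature scores 0, so sorting by this value puts the most informative features first.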