Datasets

From The GenGIS wiki
Revision as of 18:27, 26 August 2009 by Dparks (talk | contribs)

The following datasets were analyzed in "GenGIS: A geospatial information system for genomic data" (Parks et al., Genome Res., 2009):

  • GOS dataset: taxonomic diversity of Atlantic seaboard sites from the Global Ocean Sampling expedition.
  • HIV-1 dataset: geographic distribution of non-recombinant HIV-1 subtypes in Africa.
  • ISEA mtDNA dataset: phylogenetic distribution of mtDNA haplogroup E, using hypervariable segment I (HVS-I) sequences from Southeast Asia.


The following datasets were analyzed in "Tracking the evolution and geographic spread of Influenza A" (Parks et al., PLoS Currents: Influenza, submitted Aug. 25, 2009):

  • Location file containing the latitude and longitude of each geographic location, each of which could contain one or more isolates.
  • Sequence file with one row per isolate, containing metadata including the collection date and polymorphic amino acid sites.
  • Phylogenetic tree of 203 complete S-OIV sequences as determined by RAxML.
  • Polymorphic subtree of 136 complete S-OIV sequences containing polymorphisms at neuraminidase sites 106 and 248.
  • Panmixia subtree of 16 complete S-OIV sequences demonstrating the rapid, global spread of S-OIV.
  • Stream video showing the temporal and geographic spread of neuraminidase site N248D.
    • Python script used to create the above video (requires the location and sequence files above along with a world map).
  • Stream video showing the geophylogeny of a subtree exhibiting polymorphism at neuraminidase site 248.
    • Python script used to create the above video (requires the location, sequence, and polymorphic subtree files given above plus the world map).
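
The location file and the sequence file listed above are related through the geographic location each isolate was collected at: one location may hold one or more isolates. The following is a minimal sketch of how the two files could be joined, not the script used to produce the videos. It assumes both files are comma-separated with a header row, and that they share a site identifier column; the column names (Site ID, Latitude, Longitude, Sequence ID, Collection Date) and all sample values are illustrative assumptions, not taken from the actual datasets.

```python
import csv
import io
from collections import defaultdict

# Hypothetical file contents. Column names and values are placeholders
# chosen for illustration; check the real files for their actual headers.
LOCATIONS_CSV = """Site ID,Latitude,Longitude
site_a,44.65,-63.57
site_b,19.43,-99.13
"""

SEQUENCES_CSV = """Sequence ID,Site ID,Collection Date
iso_1,site_a,2009-04-20
iso_2,site_b,2009-04-02
iso_3,site_b,2009-04-15
"""


def load_locations(text):
    """Map each site ID to its (latitude, longitude) pair."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["Site ID"]: (float(row["Latitude"]), float(row["Longitude"]))
        for row in reader
    }


def group_isolates_by_site(text):
    """Group sequence rows (one per isolate) by the site they came from."""
    by_site = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        by_site[row["Site ID"]].append(row)
    return dict(by_site)


locations = load_locations(LOCATIONS_CSV)
isolates = group_isolates_by_site(SEQUENCES_CSV)

# Report, for each location, its coordinates, the number of isolates,
# and the earliest collection date (ISO dates sort correctly as strings).
for site_id, (lat, lon) in locations.items():
    dates = sorted(row["Collection Date"] for row in isolates.get(site_id, []))
    earliest = dates[0] if dates else None
    print(site_id, lat, lon, len(dates), earliest)
```

To use this with files on disk, the text passed to each function could be read with open(path).read(), adjusting the column names to match the real headers.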