Difference between revisions of "GOS Tutorial"

From The GenGIS wiki
(New page: =Introduction= GenGIS is a free and open-source bioinformatics application that allows geographic data to be merged with information about biological sequences collected from the environm...)
 
 
(17 intermediate revisions by one other user not shown)
Line 3: Line 3:
 
GenGIS is a free and open-source bioinformatics application that allows geographic data to be merged with information about biological sequences collected from the environment. It consists of a 3D graphical user interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries.
 
GenGIS is a free and open-source bioinformatics application that allows geographic data to be merged with information about biological sequences collected from the environment. It consists of a 3D graphical user interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries.
  
In this tutorial, we examine samples collected as part of the Global Ocean Sampling expedition (Rusch et al., 2007) in order to investigate the influence of environmental factors on the composition of microbial communities from marine ecosystems. A more thorough analysis of this data is available in Parks ''et al.'', 2009. This tutorial demonstrates how to load data, illustrates how to change visual properties, presents a detail analysis of the geographic structure of these microbial communities, and examines the distribution of taxa within these samples. A video walkthrough of this tutorial is also available from the GenGIS website [[GenGIS Tutorials]]. To follow along with this tutorial download the data:
+
In this tutorial, we examine samples collected as part of the Global Ocean Sampling expedition (Rusch et al., 2007) in order to investigate the influence of environmental factors on the composition of microbial communities from marine ecosystems. A more thorough analysis of this data is available in Parks ''et al.'', 2009. This tutorial demonstrates how to load data, illustrates how to change visual properties, presents a detailed analysis of the geographic structure of these microbial communities, and examines the distribution of taxa within these samples. A video walkthrough of this tutorial is also available on the [[GenGIS Tutorials | GenGIS website]]. To follow along with this tutorial download the data:
  
 
* [[Media:GenGIS_GOS_Data.zip|GOS data set]]
 
* [[Media:GenGIS_GOS_Data.zip|GOS data set]]
Line 9: Line 9:
 
=Loading Data=
 
=Loading Data=
  
The GOS data  consists of a 2D map (GOS_GS002_GS020.tif), projection information for the map (GOS_GS002_GS020.tfw), location data (atlantic-seaboard-sample-sites.csv), sequence data (atlantic-seaboard-sequence-data.csv), and a UniFrac tree indicating the relative similarity of the microbial communities (asb_unifrac_community_tree.gtm). Load this data into GenGIS. For basic information of using the GenGIS interface and loading data please see the [[Katydid Tutorial | ''Banza'' Katydid Tutorial]].
+
The GOS data  consists of a 2D map (GOS_GS002_GS020.tif), projection information for the map (GOS_GS002_GS020.tfw), location data (atlantic-seaboard-sample-sites.csv), sequence data (atlantic-seaboard-sequence-data.csv), and a UniFrac tree indicating the relative similarity of the microbial communities (asb_unifrac_community_tree.gtm). Load this data into GenGIS. For basic information on using the GenGIS interface and loading data please see the [[Katydid Tutorial | ''Banza'' Katydid Tutorial]].
 
 
Data can be loaded through the ''Layer'' menu or the toolbar (Fig. 2). The ordering of menu items and toolbar buttons correspond to the order in which data should typically be loaded.
 
 
[[Image:LoadData.png|thumb|center|531px|Figure 2. Load data into GenGIS through the Layer menu or the toolbar.]]
 
 
 
==Loading and Navigating the Map==
 
Load the map ''hawaii.ascii'' using the ''Add map'' toolbar button. This will bring up a progress dialog indicating that the map is being loaded. Once the map is finished loading it appears in the ''Layer Tree'' panel.
 
 
 
*''Mouse Navigation'': You can move the map by holding down the left mouse button while moving the mouse. To change the pitch of the camera, hold the right mouse button down and move the mouse up and down. Similarly, to rotate the map move the mouse left or right while holding down the right mouse button. The camera can be zoomed using the scroll wheel on your mouse.
 
 
 
*''Navigation Widget'': The navigation widget can also be used to navigate around the map (Fig. 3). The arrows at the top allow the map to be moved and the plus and minus button let one zoom in and out of the map. Clicking on the compass and dragging the mouse around the compass face allows one to rotate the map. User can jump to a specific point in the map by clicking within the overview map.
 
 
 
[[Image:NavigationWidget.png|thumb|center|108px|Figure 3. Navigation widget.]]
 
 
 
*''Predefined Views'': GenGIS also provides two convenient default views. To quickly switch to a perspective view or a top down view use either the ''View→Camera Position'' menu items or the corresponding toolbar buttons (Fig. 4).
 
 
 
[[Image:ViewToolbarItems.png|thumb|center|161px|Figure 4. Predefined views provided by GenGIS.]]
 
 
==Loading a Location Set==
 
A map can have any number of location sets associated with it. Load the location set (sample sites) in ''katydids-sample-sites.csv'' by first selecting the map layer in the ''Layer Tree'' and then clicking the ''Add location set'' toolbar button. Once the location set is finished loading it appears in the ''Layer Tree'' panel. By default, the locations appear as orange circles within the ''Viewport''. Expanding the location set layer shows all locations contained within the set (Fig. 5). Individual locations or an entire location set can be hidden by checking or unchecking the layer. All elements below a given layer can be hidden from view by unchecking the layer.
 
 
 
[[Image:LayerTree.png|thumb|center|277px|Figure 5. Layer tree showing all location sites.]]
 
 
 
==Loading a Tree==
 
Multiple geographic tree models can be associated with a single map. Load the Katydid phylogeny in ''katydids-ML-tree.gtm'' by first selecting the map layer in the ''Layer Tree'' and then clicking the ''Add tree'' toolbar button. By default, the tree will appear as a 3D geophylogeny. By navigating around this geophylogeny one can qualitatively investigate whether or not the geography of the Hawaiian Islands appears to have played an important role in the phylogenetic relationships between Katydid species.
 
  
 
=Changing Visual Properties=
 
=Changing Visual Properties=
  
 
GenGIS gives users considerable control over the visual properties of the data visualizations in the ''Viewport''. This allows different aspects of the data to be emphasized which facilitates the exploration and communication of different hypotheses.
 
GenGIS gives users considerable control over the visual properties of the data visualizations in the ''Viewport''. This allows different aspects of the data to be emphasized which facilitates the exploration and communication of different hypotheses.
 
==Map Properties==
 
To modify the visual properties of the map, right-click on the map layer in the ''Layer Tree'' and select ''Properties'' from the pop-up menu. This will bring up the ''Map Properties Dialog Box'' (Fig. 6). This dialog box consists of three sections or tabs. On the ''General'' tab, the name of the map can be changed and general information about the layer is provided. The ''Metadata'' tab indicates specific properties of the map such as its dimensions and geographical extents. Visual properties of the map can be set in the ''Symbology'' tab. The ''Colour Map'' sub-tab allows users to specify how colours are mapped to elevation. Modify the properties on this page so they reflect those given in Figure 6 and then hit the ''Apply'' button. This will update the map with the new colour map. Under the ''Advanced'' sub-tab, set the ''Vertical Exaggeration'' to 10 and hit ''Apply''. This scales the elevation of each point in the map which can be useful for visualizing differences in elevation. Click ''OK'' to exit the property dialog.
 
 
[[Image:MapProperties.png|thumb|center|427px|Figure 6. The Map Properties Dialog Box provides information about a map and allows visual properties of the map to be modified.]]
 
  
 
==Location Set Properties==
 
==Location Set Properties==
Properties common to all locations can be set through the ''Location Set Properties Dialog Box'' (Fig. 7). To open this dialog box, right-clicking the ''Location Set'' layer in the ''Layer Tree'' and select ''Properties'' from the pop-up menu. The ''General'' and ''Metadata'' tab contain useful information about this layer. On the ''Chart'' tab, pie charts can be configured to indicate different properties of any sequence data associated with the location sites. Here we will change the appearance of the location sites in order to emphasize which major geographic areas (e.g., Hawaii, East Maui, Lanai) each sample site belongs to. Later we will modify the appearance of our geophylogeny such that this colouring allows us to better understand how geography has influenced the evolutionary history of ''Banza'' katydids. To set the colour of locations based on their geographic area first uncheck the ''Uniform colour'' checkbox and then change the ''Field to chart'' to ''Geographic Region''. Now change the colour map to ''Discrete: Qualitative (12 colours, Medium Contrast)'' as shown in Figure 7 and hit ''OK''. Note that any field specified in the ''katydids-sample-sites.csv'' can be used to set the colour, shape, and size of the location set markers. This allows different aspects of the data to be simultaneously visualized.
+
Properties common to all locations can be set through the ''Location Set Properties Dialog Box'' (Fig. 1). To open this dialog box, right-click the ''Location Set'' layer in the ''Layer Tree'' and select ''Properties'' from the pop-up menu. Here we will change the appearance of the location sites in order to emphasize the habitat from which each sample was taken. To set the colour of locations based on their habitat, first uncheck the ''Uniform colour'' checkbox and then change the ''Field to chart'' to ''Environment Type''. Now change the colour map to ''Discrete: Qualitative (12 colours, Medium Contrast)'' as shown in Figure 1 and hit ''OK''. To further emphasize the habitat of each sample, click on the ''Shape'' tab and set the ''Field to chart'' to ''Environment Type''. The colour and shape of each sample will now reflect its habitat. Clicking on the ''Locations'' tab within the main window (above the ''Layer Tree'') brings up a set of legends describing the colour, shape, and size of each location.
  
[[Image:LocationSetProperties.png|thumb|center|489px|Figure 7. The Location Set Properties Dialog Box provides information about a location set and allows visual properties of all locations within the set to be modified.]]
+
[[Image:GOS_Location_Set_Prop.png|thumb|center|489px|Figure 1. The Location Set Properties Dialog Box allows visual properties of all locations within the set to be modified.]]
  
 
==Tree Properties==
 
==Tree Properties==
To modify the visual properties of the tree, right-click on the tree layer in the ''Layer Tree'' and select ''Properties'' from the pop-up menu. This will bring up the ''Tree Properties Dialog Box'' (Fig. 8). The visual properties of labels for the leaf nodes of a tree can be modified in the ''Labels'' tab. Visual properties of the tree are set in the ''Symbology'' tab. We will discuss many of these properties in the next section, but for now change the ''Line thickness'' to 5, the ''Height'' of the tree to 0.3, the tree ''Style'' to ''Propogate discrete colours, and the ''Internal node radius'' to zero as shown in Figure 8. Click ''Apply'' when done and observe how the colours of branches in the tree now correlate with the location colours (Fig. 9). Colours are propagate up from the leaf nodes of the tree until the children of a node have different colours, at which point the default colour will be used for all branches above this node.
+
To modify the visual properties of the tree, right-click on the tree and select ''Properties'' from the pop-up menu. This will bring up the ''Tree Properties Dialog Box'' (Fig. 2). In the ''Symbology->Tree'' tab set the ''Line thickness'' to 5, the ''Relative height'' of the tree to 0.5, the tree ''Style'' to ''Propogate discrete colours'', and the ''Default Colour'' to grey as shown in Figure 2. Click ''OK'' when done. Observe how the colours of branches in the tree now correlate with the location colours. Colours are propagated up from the leaf nodes of the tree until the children of a node have different colours, at which point the default colour will be used for all branches above this node.
 
   
 
   
[[Image:TreeProperties.png|thumb|center|395px|Figure 8. The Tree Properties Dialog Box provides information about a geographic tree model and allows visual properties of the tree to be modified.]]
+
[[Image:GOS_Tree_Prop.png|thumb|center|395px|Figure 2. The Tree Properties Dialog Box allows visual properties of the tree to be modified.]]
  
Your Viewport should now be similar to Figure 9. Assigning colours to locations based on their geographic region emphasizes the role of geography on the evolution of ''Banza'' katydids. As an exercise, try colouring the locations based on Species. This produces a similar tree, but places more emphases on the position of different species in the tree.
+
=Quantitative Analysis of Geographic Structure=
 
 
[[Image:Geophylogeny.png|thumb|center|559px|Figure 9. Colour-coded geophylogeny which emphases the role of geography on the evolution of Banza katydids.]]
 
  
=Quantitative Analysis of Geographic Structure=
+
The 3D geophylogeny suggests that habitat type has a large influence on the relative similarity of these microbial communities. The [[Media:GenGIS_GOS_Tutorial.mp4|video tutorial]], [[Katydid Tutorial | ''Banza'' Katydid tutorial]], and the Parks ''et al.'', 2009 manuscript describe how 2D geophylogenies can be analyzed within GenGIS. The remainder of this section assumes you are familiar with the basics of how these analyses are conducted.
  
The 3D geophylogeny illustrated in Figure 9 provides strong qualitative evidence that geography has had a substantial influence on the evolution of ''Banza'' katydids. GenGIS also allows a quantitative analysis of the role of geography to be performed. This is done by drawing a 2D geophylogeny where the leaf nodes are ordered such that they maximize the goodness-of-fit between the tree and geography [2]. To perform this analysis follow these steps:
+
GenGIS allows the number of crossings that occur for ''all'' linear gradients to be explored. Right-click on the subtree of the geophylogeny you wish to analyze and select ''Perform linear axis analysis on subtree''. This will produce a graph showing the number of crossings that occur for all possible linear gradients (Fig. 3). Clicking on the graph causes the linear layout line to rotate to the selected orientation. Running a permutation test causes a red line to be drawn on the plot, which indicates the number of crossings at which the specified critical value (''i.e.'', ''p''-value = 0.05) is obtained. That is, linear gradients with orientations resulting in fewer crossings are significant at the selected ''p''-value.
# Switch to a top view by clicking on the ''Top view'' toolbar button (Fig. 4).
 
# Click on the ''Layout line'' toolbar button (Fig. 10).
 
# Draw a layout line as shown in Figure 11A. The tree will be drawn on the right-hand side of this line as you ‘walk’ from the starting point of the line to the end point of the line. Since we want the tree to appear below the island chain, the line should be drawn from left to right. Don’t worry if you draw it backwards as it can easily be moved.
 
# Right-click on the tree layer in the ''Layer Tree'' and select ''2D cladogram'' from the pop-up menu. This causes a 2D geophylogeny to be drawn as shown in Figure 11B. Any crossings that occur between the two dashed lines in this figure indicate discordance between the phylogenetic tree and geography. The presence of relatively few crossings provides strong evidence that geography played an important role in shaping the relationships expressed by the phylogenetic tree.
 
# Try clicking on different nodes within the geophylogeny and the geographic locations. These elements can be selected (highlighted) in order to explore and emphasize different aspects of the data.
 
# At this point, you may wish to try changing some of the properties in the ''Tree Properties Dialog Box'' to see what effect they have on the 2D geophylogeny.
 
# Click on the root node of the geophylogeny. Notice that the second panel of the ''Statusbar'' indicates that below this node there are eight crossings. To test if this is statistically significant, a random permutation test can be performed (Parks and Beiko, 2009). Right-click on the root node, and select ''Perform significance test on subtree'' from the pop-up menu. A dialog box will appear indicating the test is being performed. The results will be reported in the ''Console'' window. You should get a ''p''-value near 0.001 indicating that the number of observed crossings is significantly smaller than would be expected by chance alone.
 
 
[[Image:LayoutToolbarItems.png|thumb|center|275px|Figure 10. Layout graphical elements or define non-linear geographic axes using the layout toolbar buttons.]]
 
  
 +
[[Image:GOS_Linear_Axes_Plot.png|thumb|center|600px|Figure 3. Graph showing the number of crossings for all possible linear gradients.]]
  
[[Image:Geophylogeny2D_Mashup.png|thumb|center|600px|Figure 11. (A) The layout line specifies a linear geographic axis. (B) A 2D geophylogeny is drawn along this geographic axis. GenGIS optimizes the ordering of leaf nodes in order to minimize the number of crossings which occur between the two dashed lines. The crossings which remain indicate discordance between the phylogenetic tree and geography.]]
+
=Distribution of Sequences=
 +
The distribution of sequences from the sampled microbial communities can be investigated by generating a pie chart for each sample. Pie charts are configured in the ''Charts'' tab of the ''Location Set Properties Dialog Box'' (Fig. 4). To visualize the distribution of common taxa, set the ''Field to chart'' to ''Common_type'', the ''Colour map'' to ''Discrete: Qualitative (12 colours, Medium Contrast)'', and check the ''Show charts'' checkbox as shown in Figure 4. It is also helpful to manually modify the colour map so the ''Other_bacteria'' and ''Other'' categories are the same colour (e.g., black). You can also modify many properties of the pie charts in the ''Symbology'' tab. For this example, it is helpful to scale the size of the pie charts to reflect the number of sequences collected for each sample. Check the ''Set chart size proportional to number of sequences'' checkbox and set the minimum and maximum size to 20 and 40, respectively. Click ''OK'' when you are done.  
  
==Pie Charts==
+
[[Image:GOS_Pie_Chart_Prop.png|thumb|center|458px|Figure 4. Pie chart properties are set in the Location Set Properties Dialog Box.]]
  
 +
You can drag the pie charts in order to lay them out in a pleasing manner as shown in Figure 5. A legend indicating the colour of each taxon is available in the ''Sequences'' tab within the main window.
  
==Making Movies==
+
[[Image:GOS_Pie_Charts.png|thumb|center|600px|Figure 5. Pie charts showing the distribution of key taxa within microbial communities sampled off the Atlantic seaboard.]]
With minimal effort, fly-through videos can be constructed to help illustrate important aspects of the data. See the GenGIS manual for details on how to modify the camera position through the Python console. A collection of useful functions for creating movies is available in the ''movieHelper.py'' file in the scripts directory. Example movies made with GenGIS can be found on our [[GenGIS|website]].
 
  
 
=Contact Information=
 
=Contact Information=

Latest revision as of 16:49, 13 March 2012

Introduction

GenGIS is a free and open-source bioinformatics application that allows geographic data to be merged with information about biological sequences collected from the environment. It consists of a 3D graphical user interface in which the user can navigate and explore the data, as well as a Python interface that allows easy scripting of statistical analyses using the Rpy libraries.

In this tutorial, we examine samples collected as part of the Global Ocean Sampling expedition (Rusch et al., 2007) in order to investigate the influence of environmental factors on the composition of microbial communities from marine ecosystems. A more thorough analysis of this data is available in Parks et al., 2009. This tutorial demonstrates how to load data, illustrates how to change visual properties, presents a detailed analysis of the geographic structure of these microbial communities, and examines the distribution of taxa within these samples. A video walkthrough of this tutorial is also available on the GenGIS website. To follow along with this tutorial, download the GOS data set (GenGIS_GOS_Data.zip).

Loading Data

The GOS data consists of a 2D map (GOS_GS002_GS020.tif), projection information for the map (GOS_GS002_GS020.tfw), location data (atlantic-seaboard-sample-sites.csv), sequence data (atlantic-seaboard-sequence-data.csv), and a UniFrac tree indicating the relative similarity of the microbial communities (asb_unifrac_community_tree.gtm). Load this data into GenGIS. For basic information on using the GenGIS interface and loading data, please see the Banza Katydid Tutorial.
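To make the role of the .tfw file more concrete, here is a minimal Python sketch of how a world file ties pixel positions in the .tif to map coordinates. It assumes that GOS_GS002_GS020.tfw follows the usual six-line world-file layout; the coefficient values and the names `WorldFile`, `parse_world_file`, and `pixel_to_map` are made up for illustration and are not part of GenGIS.

```python
from typing import NamedTuple, Tuple


class WorldFile(NamedTuple):
    """The six world-file coefficients, in the order they appear in the file."""
    a: float  # pixel width in map units
    d: float  # rotation term for rows (usually 0)
    b: float  # rotation term for columns (usually 0)
    e: float  # pixel height in map units (usually negative)
    c: float  # x coordinate of the centre of the upper-left pixel
    f: float  # y coordinate of the centre of the upper-left pixel


def parse_world_file(text: str) -> WorldFile:
    """Parse the text of a world file (one number per line)."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError(f"expected 6 coefficients, found {len(values)}")
    return WorldFile(*values)


def pixel_to_map(wf: WorldFile, col: float, row: float) -> Tuple[float, float]:
    """Apply the affine transform from pixel (col, row) to map (x, y)."""
    x = wf.a * col + wf.b * row + wf.c
    y = wf.d * col + wf.e * row + wf.f
    return x, y


# Hypothetical coefficients, NOT the contents of the real GOS .tfw file.
EXAMPLE_TFW = "0.25\n0.0\n0.0\n-0.25\n-80.0\n45.0\n"

if __name__ == "__main__":
    wf = parse_world_file(EXAMPLE_TFW)
    print(pixel_to_map(wf, 8, 4))
```

Because the row coefficient `e` is normally negative, row numbers increase downwards in the image while the y coordinate decreases.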

Changing Visual Properties

GenGIS gives users considerable control over the visual properties of the data visualizations in the Viewport. This allows different aspects of the data to be emphasized which facilitates the exploration and communication of different hypotheses.

Location Set Properties

Properties common to all locations can be set through the Location Set Properties Dialog Box (Fig. 1). To open this dialog box, right-click the Location Set layer in the Layer Tree and select Properties from the pop-up menu. Here we will change the appearance of the location sites in order to emphasize the habitat from which each sample was taken. To set the colour of locations based on their habitat, first uncheck the Uniform colour checkbox and then change the Field to chart to Environment Type. Now change the colour map to Discrete: Qualitative (12 colours, Medium Contrast) as shown in Figure 1 and hit OK. To further emphasize the habitat of each sample, click on the Shape tab and set the Field to chart to Environment Type. The colour and shape of each sample will now reflect its habitat. Clicking on the Locations tab within the main window (above the Layer Tree) brings up a set of legends describing the colour, shape, and size of each location.

Figure 1. The Location Set Properties Dialog Box allows visual properties of all locations within the set to be modified.
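The idea behind colouring and shaping locations by a field can be sketched in a few lines of Python. This is only an illustration of a discrete qualitative mapping, not GenGIS code: the column names other than Environment Type, the sample rows, the palette, and the shape names are all assumptions.

```python
import csv
import io
from typing import Dict, List, Sequence

# Hypothetical rows; the real atlantic-seaboard-sample-sites.csv may differ.
SAMPLE_CSV = """Site ID,Environment Type
S1,Coastal
S2,Estuary
S3,Coastal
S4,Open Ocean
"""

# Illustrative palette and marker shapes (assumptions, not GenGIS defaults).
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a"]
SHAPES = ["circle", "square", "triangle", "diamond"]


def discrete_map(rows: List[Dict[str, str]], field: str,
                 styles: Sequence[str]) -> Dict[str, str]:
    """Give each distinct value of `field` its own style, in sorted order."""
    categories = sorted({row[field] for row in rows})
    if len(categories) > len(styles):
        raise ValueError("more categories than available styles")
    return dict(zip(categories, styles))


def style_locations(rows: List[Dict[str, str]], field: str) -> Dict[str, tuple]:
    """Return {site id: (colour, shape)} driven by a single field."""
    colours = discrete_map(rows, field, PALETTE)
    shapes = discrete_map(rows, field, SHAPES)
    return {row["Site ID"]: (colours[row[field]], shapes[row[field]])
            for row in rows}


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    for site, style in style_locations(rows, "Environment Type").items():
        print(site, style)
```

Driving both colour and shape from the same field is redundant by design: samples from the same habitat remain distinguishable even when the colours are hard to tell apart.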

Tree Properties

To modify the visual properties of the tree, right-click on the tree and select Properties from the pop-up menu. This will bring up the Tree Properties Dialog Box (Fig. 2). In the Symbology->Tree tab set the Line thickness to 5, the Relative height of the tree to 0.5, the tree Style to Propogate discrete colours, and the Default Colour to grey as shown in Figure 2. Click OK when done. Observe how the colours of branches in the tree now correlate with the location colours. Colours are propagated up from the leaf nodes of the tree until the children of a node have different colours, at which point the default colour will be used for all branches above this node.

Figure 2. The Tree Properties Dialog Box allows visual properties of the tree to be modified.
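The colour propagation rule described above can be written as a short recursive function. The following Python sketch is one reading of that rule, not the GenGIS implementation; the nested-tuple tree, the leaf names, and the colours are invented for the example.

```python
from typing import Dict, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees

DEFAULT_COLOUR = "grey"


def propagate(node: Tree, leaf_colour: Dict[str, str],
              out: Dict[str, str], path: str = "root") -> str:
    """Colour `node` and everything below it; record results in `out`.

    A leaf takes the colour of its location. An internal node takes the
    colour shared by all of its children, or the default colour as soon
    as its children disagree. Since a default-coloured child never
    matches a coloured sibling, every node above it stays default too.
    """
    if isinstance(node, str):
        colour = leaf_colour[node]
    else:
        child_colours = [propagate(child, leaf_colour, out, f"{path}.{i}")
                         for i, child in enumerate(node)]
        all_same = len(set(child_colours)) == 1
        colour = child_colours[0] if all_same else DEFAULT_COLOUR
    out[path] = colour
    return colour


if __name__ == "__main__":
    # Hypothetical five-sample tree and habitat colours.
    tree = (("A", "B"), ("C", ("D", "E")))
    colours = {"A": "blue", "B": "blue", "C": "red", "D": "red", "E": "red"}
    result: Dict[str, str] = {}
    propagate(tree, colours, result)
    for path in sorted(result):
        print(path, result[path])
```

In this example the two clades below the root keep their habitat colours, while the root itself falls back to grey because its children differ.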

Quantitative Analysis of Geographic Structure

The 3D geophylogeny suggests that habitat type has a large influence on the relative similarity of these microbial communities. The video tutorial, Banza Katydid tutorial, and the Parks et al., 2009 manuscript describe how 2D geophylogenies can be analyzed within GenGIS. The remainder of this section assumes you are familiar with the basics of how these analyses are conducted.

GenGIS allows the number of crossings that occur for all linear gradients to be explored. Right-click on the subtree of the geophylogeny you wish to analyze and select Perform linear axis analysis on subtree. This will produce a graph showing the number of crossings that occur for all possible linear gradients (Fig. 3). Clicking on the graph causes the linear layout line to rotate to the selected orientation. Running a permutation test causes a red line to be drawn on the plot, which indicates the number of crossings at which the specified critical value (i.e., p-value = 0.05) is obtained. That is, linear gradients with orientations resulting in fewer crossings are significant at the selected p-value.

Figure 3. Graph showing the number of crossings for all possible linear gradients.
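For readers who want to see the logic behind this plot, the sketch below counts crossings for a sweep of axis orientations and runs a simple permutation test. It is a simplified stand-in under stated assumptions, not the GenGIS algorithm: the tree is assumed to be strictly binary and written as nested tuples, site coordinates are treated as planar, ties along the axis are broken by input order, and all names and data are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple, Union

Tree = Union[str, tuple]  # a leaf name, or a pair (left, right)
Coords = Dict[str, Tuple[float, float]]


def min_crossings(node: Tree, rank: Dict[str, int]) -> Tuple[int, List[str]]:
    """Fewest crossings over all child orderings of a binary tree.

    A crossing is a pair of leaves whose tree order disagrees with their
    order along the axis. Swapping the children of a node only changes
    the pairs that straddle that node, so each node is decided separately.
    """
    if isinstance(node, str):
        return 0, [node]
    left_cost, left = min_crossings(node[0], rank)
    right_cost, right = min_crossings(node[1], rank)
    left_first = sum(rank[a] > rank[b] for a in left for b in right)
    right_first = sum(rank[b] > rank[a] for a in left for b in right)
    return left_cost + right_cost + min(left_first, right_first), left + right


def ranks_along_axis(coords: Coords, angle_deg: float) -> Dict[str, int]:
    """Order sites by their projection onto an axis at `angle_deg`."""
    theta = math.radians(angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    ordered = sorted(coords, key=lambda s: coords[s][0] * ux + coords[s][1] * uy)
    return {site: i for i, site in enumerate(ordered)}


def crossings_by_angle(tree: Tree, coords: Coords, step: int = 10) -> Dict[int, int]:
    """Crossings for each orientation; 0-180 suffices as the tree can flip."""
    return {angle: min_crossings(tree, ranks_along_axis(coords, angle))[0]
            for angle in range(0, 180, step)}


def permutation_p_value(tree: Tree, coords: Coords, angle: float,
                        trials: int = 999, seed: int = 0) -> float:
    """Share of random leaf-to-site assignments with as few crossings."""
    rng = random.Random(seed)
    names = list(coords)
    observed = min_crossings(tree, ranks_along_axis(coords, angle))[0]
    hits = 0
    for _ in range(trials):
        shuffled = names[:]
        rng.shuffle(shuffled)
        permuted = {n: coords[o] for n, o in zip(names, shuffled)}
        if min_crossings(tree, ranks_along_axis(permuted, angle))[0] <= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    tree = (("A", "B"), ("C", "D"))  # hypothetical community tree
    sites = {"A": (0.0, 0.0), "C": (1.0, 1.0), "B": (2.0, 0.0), "D": (3.0, 1.0)}
    print(crossings_by_angle(tree, sites, step=30))
    print(permutation_p_value(tree, sites, angle=0))
```

Plotting the values returned by `crossings_by_angle` against the angle gives a curve of the same kind as Figure 3; the red line would then be the crossing count at which the permutation p-value reaches the chosen critical value.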

Distribution of Sequences

The distribution of sequences from the sampled microbial communities can be investigated by generating a pie chart for each sample. Pie charts are configured in the Charts tab of the Location Set Properties Dialog Box (Fig. 4). To visualize the distribution of common taxa, set the Field to chart to Common_type, the Colour map to Discrete: Qualitative (12 colours, Medium Contrast), and check the Show charts checkbox as shown in Figure 4. It is also helpful to manually modify the colour map so the Other_bacteria and Other categories are the same colour (e.g., black). You can also modify many properties of the pie charts in the Symbology tab. For this example, it is helpful to scale the size of the pie charts to reflect the number of sequences collected for each sample. Check the Set chart size proportional to number of sequences checkbox and set the minimum and maximum size to 20 and 40, respectively. Click OK when you are done.

Figure 4. Pie chart properties are set in the Location Set Properties Dialog Box.
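The two quantities behind these settings, the slice fractions of each pie and the chart size, are easy to compute by hand. The Python sketch below assumes that chart size is interpolated linearly between the minimum and maximum size according to the number of sequences; the actual scaling used by GenGIS is not stated here, and the taxon labels other than Other and Other_bacteria are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def pie_fractions(taxa: Iterable[str]) -> Dict[str, float]:
    """Fraction of a sample's sequences assigned to each taxon."""
    counts = Counter(taxa)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def chart_size(n_sequences: int, n_min: int, n_max: int,
               size_min: float = 20.0, size_max: float = 40.0) -> float:
    """Linearly scale chart size with sequence count (an assumed rule)."""
    if n_max == n_min:
        return size_min
    t = (n_sequences - n_min) / (n_max - n_min)
    return size_min + t * (size_max - size_min)


if __name__ == "__main__":
    # Hypothetical per-sample taxon labels (the Common_type field).
    samples = {
        "S1": ["TaxonA"] * 6 + ["Other_bacteria"] * 3 + ["Other"],
        "S2": ["TaxonA"] * 10 + ["TaxonB"] * 30,
    }
    sizes = [len(seqs) for seqs in samples.values()]
    for site, seqs in samples.items():
        print(site, pie_fractions(seqs),
              chart_size(len(seqs), min(sizes), max(sizes)))
```

With the minimum and maximum set to 20 and 40, the sample with the fewest sequences is drawn at size 20 and the sample with the most at size 40.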

You can drag the pie charts in order to lay them out in a pleasing manner as shown in Figure 5. A legend indicating the colour of each taxon is available in the Sequences tab within the main window.

Figure 5. Pie charts showing the distribution of key taxa within microbial communities sampled off the Atlantic seaboard.

Contact Information

We encourage you to send us suggestions for new features. GenGIS is in active development and we are interested in discussing all potential applications of this software. Suggestions, comments, and bug reports can be sent to Rob Beiko (beiko@cs.dal.ca). If reporting a bug, please provide as much information as possible and, if possible, a simplified version of the data set which causes the bug. This will allow us to quickly resolve the issue.

References

Rusch DB, Halpern AL, Sutton G, et al. 2007. The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol. 5:e77.

Parks DH, Porter M, Churcher S, Wang S, Blouin C, Whalley J, Brooks S and Beiko RG. 2009. GenGIS: A geospatial information system for genomic data. Genome Research, 19: 1896-1904.