Cartogram and GCPD Tutorial

From The GenGIS wiki
Latest revision as of 20:44, 19 May 2017


Introduction

California is a global hotspot for salamander diversity owing to its climate and geography. Among its species is the arboreal salamander Aneides lugubris, which lives in western California. A 2015 study by Reilly et al. used BEAST (Drummond et al., 2012) to construct a phylogenetic tree relating mitochondrial DNA from 35 salamanders at 26 locations in California, combined with 43 additional samples from GenBank associated with 27 locations. Using phylogeography, the authors classified individuals into Northern, SF Bay/Sierra Nevada, Santa Cruz, Pinnacles, Central Coast and Southern clades.

In this short tutorial, we illustrate the application of point-based cartograms in GenGIS to the phylogeographic visualization of this dataset.

Loading Data and Changing Visual Properties

The salamander data consists of a map (Cali.tiff), location data (Aneides_Locations_Filtered.csv) and a phylogenetic tree (Aneides_Filtered.tre). These files are available as a zip archive (Aneides.zip).

For basic information on using the GenGIS interface and loading data please see the Banza Katydid Tutorial.

To recreate the image below, carry out the following steps in order. Note that the aesthetic changes (colours, vertical exaggeration) are optional: the resulting cartograms are the same whether or not you apply them.

(1) Load the "Cali.tif" map

(2) Change the map colours according to the following scheme:

Slide1.PNG

(3) Change the vertical exaggeration to 5.

(4) Load the location file, and edit the colours as follows:

Slide2.PNG

This is the "Continuous: Scientific" colour scale, with the Northern clade set to yellow to increase the contrast.

(5) Load the tree, choose "propagate discrete colours", set the default colour to black, and change the line thickness to 5 in the Properties menu.

You should end up with an image that looks like this:

Slide3.PNG

Creating Cartograms

In this section we will create cartograms based on the location density and phylogenetic information in our data.

Location density

By default, the map is transformed using the count of samples at each location; if no sequence file is loaded, this count is one per location. To create a cartogram, right-click the map in the menu under the "Layers" tab and select the Cartogram tab from the Properties menu.
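The default weighting described above can be sketched in a few lines of Python. This is only an illustration, not GenGIS code: the function name, the argument names, and the choice to give zero to a location that has no sequences are all assumptions made for the example.

```python
from collections import Counter

def samples_per_location(location_ids, sequence_site_ids=None):
    """Return {location_id: count}, falling back to one per location.

    Hypothetical helper: `sequence_site_ids` lists the site of every
    sequence, which stands in for a loaded sequence file.
    """
    if not sequence_site_ids:
        # No sequence file loaded: every location counts once.
        return {loc: 1 for loc in location_ids}
    counts = Counter(sequence_site_ids)
    # Assumption: a location with no sequences gets a count of zero.
    return {loc: counts.get(loc, 0) for loc in location_ids}

sites = ["A", "B", "C"]
print(samples_per_location(sites))
print(samples_per_location(sites, ["A", "A", "C"]))
```

Without sequence data every site has the same weight, so the distortion then depends only on where the locations fall on the map.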

For now, set both the location radius and the variable multiplier to 10. Also select "Resize Density Map" at 50% to speed up the procedure. The percentage controls how much the map is reduced when it is resized; the maximum is 60%, since in practice larger values give unusable cartograms. In general, the larger the location radius and variable multiplier, the more pronounced the effect of resizing, and the less well the points track the map.
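To get a feel for why resizing speeds things up, here is a rough arithmetic sketch. It assumes that the percentage is applied to each map dimension separately, which this tutorial does not confirm; the function name and the example map size are hypothetical.

```python
def resized_dimensions(width, height, percent):
    """Shrink each map dimension by `percent` (0 to 60 in this sketch)."""
    if not 0 <= percent <= 60:
        raise ValueError("percent must be between 0 and 60")
    # Assumption: the reduction applies to width and height independently.
    scale = 1.0 - percent / 100.0
    return max(1, round(width * scale)), max(1, round(height * scale))

# A hypothetical 1000 x 800 raster reduced by 50%.
print(resized_dimensions(1000, 800, 50))
```

Under that assumption a 50% reduction leaves a quarter of the original cells, which would explain both the faster run and the loss of detail at larger percentages.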

Slide4.PNG

Click "Go!" to create the cartogram. Note that "Apply" has no effect in this menu.

If we hide the tree, then the cartogram should look like this:

Slide5.PNG

To fine-tune the cartogram, two parameters can be adjusted: Location Radius and Variable Multiplier. The Location Radius sets the size of the grid cells into which the map is divided when the cartogram is created, while the Variable Multiplier multiplies the value stored in each populated grid cell by the chosen factor.
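The roles of the two parameters can be illustrated with a small density grid. The sketch below is an assumption-laden toy, not the GenGIS implementation: it treats the location radius as the side length of square cells in map units and applies the multiplier to the summed value of each populated cell. All names are hypothetical.

```python
from collections import defaultdict

def density_grid(points, cell_size, multiplier):
    """Bin (x, y, value) points into square cells and amplify the totals."""
    grid = defaultdict(float)
    for x, y, value in points:
        # Integer cell index in each direction.
        cell = (int(x // cell_size), int(y // cell_size))
        grid[cell] += value
    # Only populated cells appear in the result; empty cells stay implicit.
    return {cell: total * multiplier for cell, total in grid.items()}

points = [(3, 4, 1), (8, 2, 1), (25, 40, 1)]
print(density_grid(points, cell_size=10, multiplier=10))
print(density_grid(points, cell_size=100, multiplier=10))
```

With larger cells, more locations fall into the same cell, so the density is concentrated in fewer, coarser blocks. The multiplier only rescales the values of the populated cells.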

You can click "Restore Map" in the cartogram tab to reset the map to its original state. Now try setting the location radius to 100 and re-running.

Slide7.PNG

Phylogenetic transformations using the GCPD

GenGIS can use data besides location count to adjust the map. Any quantitative value in the location file can be used for this purpose, but we have developed the Geographically Coupled Phylogenetic Distance (GCPD) to adjust the distortion effects based on phylogenetic diversity. Briefly, the GCPD for a location is computed by considering its diversity relative to the closest other locations in the map.
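The exact GCPD formula is not given here, so the following is only a loose sketch of the general idea of coupling phylogenetic distance to geographic proximity. It assumes one tip per site, a precomputed table of pairwise phylogenetic distances, and a plain average over the k geographically nearest sites. The function name, the parameter k, and the example values are all hypothetical.

```python
import math

def neighbour_coupled_distance(coords, phylo_dist, k=2):
    """For each site, average the phylogenetic distance to its k nearest sites."""
    scores = {}
    for site, (x, y) in coords.items():
        others = [s for s in coords if s != site]
        # Sort the other sites by straight-line geographic distance.
        others.sort(key=lambda s: math.hypot(coords[s][0] - x, coords[s][1] - y))
        nearest = others[:k]
        total = sum(phylo_dist[frozenset((site, s))] for s in nearest)
        scores[site] = total / len(nearest)
    return scores

coords = {"A": (0, 0), "B": (1, 0), "C": (10, 0)}
phylo_dist = {
    frozenset(("A", "B")): 0.1,
    frozenset(("A", "C")): 0.9,
    frozenset(("B", "C")): 0.8,
}
print(neighbour_coupled_distance(coords, phylo_dist, k=1))
```

In this toy example, site C receives the largest value because its nearest neighbour is phylogenetically distant, which is the kind of site a diversity-based cartogram would be expected to expand.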

To compute the GCPD, go to the Plugins menu and select the GCPD option.

GCPDSlide1.PNG

In this window, you can choose the method for aggregating diversity across pairs of sites, the location and tree layers (which are fixed in our case, since we have only one of each), and whether to use phylogenetic diversity (PD) or 1 - PD to define the distance. Choosing PD will tend to assign large values to sites whose neighbours are phylogenetically distinct, whereas 1 - PD will do the opposite.
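The effect of choosing PD or 1 - PD can be seen in a tiny example. This sketch assumes the distances have been normalised to lie between 0 and 1, and the aggregation names "mean" and "max" are placeholders rather than the options actually offered by the plugin.

```python
def site_value(neighbour_distances, aggregate="mean", use_complement=False):
    """Aggregate normalised distances (0 to 1) for one site."""
    if use_complement:
        # 1 - PD: similar neighbours now produce large values.
        values = [1.0 - d for d in neighbour_distances]
    else:
        values = list(neighbour_distances)
    if aggregate == "mean":
        return sum(values) / len(values)
    if aggregate == "max":
        return max(values)
    raise ValueError("unknown aggregate: " + aggregate)

distinct = [0.9, 0.8]  # neighbours from different clades
similar = [0.1, 0.2]   # neighbours from the same clade
print(site_value(distinct), site_value(similar))
print(site_value(distinct, use_complement=True), site_value(similar, use_complement=True))
```

The ranking of the two sites flips between the two settings, so the same data can be used to expand either phylogenetically diverse or phylogenetically uniform regions of the map.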

Click "Calculate" to compute the GCPD for each point. Once the values have been calculated, select "Add to GenGIS" to add them as an additional location attribute. You can use the Location table viewer plugin to verify that the attribute has been added:

GCPDSlide2.PNG

Now we're ready to generate the cartogram as before. This time, select "PhylogeneticDistance" from the cartogram menu, and hit "Go!". The following example was generated with a location radius of 5 and a variable multiplier of 30:

GCPDSlide3.PNG

The arrow shows an area of high phylogenetic diversity (three different clades in one small region of the map) that has been disproportionately expanded because of the GCPD.

Contact Information

We encourage you to send us suggestions for new features. GenGIS is in active development and we are interested in discussing all potential applications of this software. Suggestions, comments, and bug reports can be sent to Rob Beiko (beiko@cs.dal.ca). If reporting a bug, please provide as much information as possible and, if possible, a simplified version of the data set which causes the bug. This will allow us to quickly resolve the issue.

References