The GenGIS Manual

From The GenGIS wiki
Revision as of 18:01, 7 June 2011 by Timothy


We continue to develop the manual.

A recent version of the manual will soon be available in PDF format.


Introduction / Overview of GenGIS

Purpose

Geography has always been an important component of evolutionary and ecological theory. But the advent of sequence typing approaches such as 16S ribotyping, DNA barcoding using the COX1 gene, and multi-locus sequence typing gives us the opportunity to understand how communities of organisms interact, move around, and evolve. This sequencing revolution is tightly coupled to the development of new algorithms for assessing and comparing populations based on their genes.

Coupled with these developments is the ready availability of high quality, public domain digital map data. By integrating molecular data with cartography and habitat parameters, we can visualize the geographic and ecological factors that influence community composition and function.

GenGIS is designed to bring these components together into a single software package that satisfies the following criteria:

Free and open source. GenGIS is released under a Creative Commons Attribution - Share Alike 3.0 license, and we have made extensive use of other free packages such as wxWidgets, R, and Python. Making GenGIS freely available allows it to be downloaded and used anywhere in the world, and allows users to inspect and modify the source code.

User-friendly. Although GenGIS is built to deal with challenging scientific questions, our goal is to make the software as easy to use as possible. This is particularly important as many users will have little experience with digital map data, apart from applications such as Google Earth.

Adaptable and extensible. The principal strength of many open-source projects lies in the ability of a loosely organized community of users to develop and enhance the software: R and BioPerl are two examples of successful open-source projects with many contributors. Since the potential applications of GenGIS are much broader than those we have in mind, we aim to make it as easy as possible to extend its capabilities by exposing the internal data structures and offering a plugin architecture.

Platforms and System Requirements

GenGIS has been developed and tested on the following operating systems:

  • Windows (XP, Vista & 7): 32-bit binaries, which are compatible with 64-bit Windows releases
  • Mac OS X (v10.5 'Leopard' & v10.6 'Snow Leopard')

Support for Linux is not a development priority at this time. Since GenGIS has been developed using cross-platform libraries, porting to Linux should be fairly straightforward. We will happily support any efforts to develop such a version. If you do choose to port to other operating systems, please let us know whether you experience any difficulties. Your feedback on this project is always greatly appreciated.

GenGIS can have fairly demanding memory requirements, depending on the size of map that is loaded. We typically load digital maps of 10 megabytes or less. Please note that we have not tested GenGIS on Windows operating systems with anything less than a Pentium 4-class CPU, nor on anything older than a 2008-vintage iMac.

Citing GenGIS

The best citation for GenGIS will always be indicated in boldface on the Main GenGIS page.

Where to Go for Help

  • The latest version of the GenGIS manual is available here.
  • Text and video tutorials are available on the Tutorials page.
  • Visit our FAQ page for a list of GenGIS-related questions.
  • We may also at some point initiate a discussion forum.
  • Please email beiko [at] cs.dal.ca with any further questions about the software.

Installation

Getting the Release Version

To get started visualizing and analyzing data in GenGIS, download the latest Release Version from the Download page.

Developer Version – building from source code

The source code for GenGIS is available on the Download page.

Building on Windows

GenGIS can be compiled using Microsoft Visual C++ 2008 Express Edition. The solution file (GenGIS.sln) is located in the 'win32/build/msvc' directory. Please note that the built-in Python console is only available in Release builds.

Building on Mac

GenGIS can be compiled using the Makefile located in the 'mac/build/gcc' directory. To compile, simply run 'make' within the terminal. Development has been performed using the gcc 4.3.3 compiler.

Input data

File dependencies

Currently there are four types of files that can be imported directly by the GenGIS application (i.e., not via the Python console or Rpy):

  • The digital map file
  • The location file
  • The sequence file
  • The phylogenetic tree file

Details of the requirements for these files are contained later in this section.

We currently require that you load the map file first, followed by the location file. After that, any properly formatted set of one or more sequence and/or tree files can be loaded. To be considered in a visual or statistical analysis, every sequence in a sequence file, and every leaf in a phylogenetic tree, must be associated with one of the locations in the location file.

Maps

Thanks to GDAL, GenGIS can import a wide array of digital map file formats and projections. We cannot provide direct support for GDAL, but there is a considerable amount of support available at the project website, and we have found the script 'gdal_merge.py' and the executables 'gdalwarp.exe' and 'gdal_translate.exe' to be very useful in preparing maps that can be handled by GenGIS.

Supported formats

Because GenGIS uses GDAL as its data reader, it should be able to support any of the formats that GDAL supports, but many of these remain untested. We have had success with the following widely used formats:

  • GeoTIFF
  • Arc/Info ASCIIGRID
  • USGS DEM (and variations thereof)

Projections

If you wish to use a specific projection, you must specify it before loading your map, since GenGIS is unable to reproject maps on the fly. This is particularly important if you are loading the default world map (from GTOPO30) that ships with GenGIS. The default Mercator projection stretches the polar regions to an absurd degree, whereas Plate Carrée or Robinson will provide a much less distorted world view.

GenGIS currently does not support projections in which a single point is displayed in multiple locations. The best example of this is the default world map, which is actually 'fudged' to stretch only from 89.9 degrees North to 89.9 degrees South latitude. In a projection such as Plate Carrée, each pole is stretched across the entire upper or lower edge of the map, so GenGIS is unable to display the poles properly.
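
To put a rough number on the polar stretching described above: on a spherical Mercator projection, lengths at latitude φ are stretched by a factor of sec φ relative to the equator. The short sketch below is plain Python and is not part of GenGIS. It tabulates this factor. At 60 degrees the factor is 2. At the 89.9 degree limit used by the default world map it is roughly 573, which is why Plate Carrée or Robinson give a far more useful whole-world view.

```python
import math

def mercator_scale(lat_deg):
    """Linear scale factor of a spherical Mercator projection at a latitude.

    Lengths on the map are stretched by sec(latitude) relative to the
    equator, so the factor grows without bound towards the poles.
    """
    return 1.0 / math.cos(math.radians(lat_deg))

for lat in (0, 45, 60, 80, 89.9):
    print(f"{lat:>5} degrees: stretched x{mercator_scale(lat):.1f}")
```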

To specify the projection before loading your map, right click "New Study : Study" in the Layers tab, and select Properties. Selecting the Projection tab will allow you to choose your projection.


Typical limits on map size

The size of map you can load and usefully work with in GenGIS depends on the speed of your processor and the amount of RAM you have. With 1 GB of RAM you should be able to work with maps of 10 MB or slightly larger.

If the resolution of your map is too high to load efficiently into GenGIS, you can use one of the GDAL executables (gdalwarp or gdal_translate) to reduce the density of points in your map. This will of course decrease the level of detail you can see in the application, but is an acceptable tradeoff in many cases.
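
As a rough guide to how much to reduce the resolution, the sketch below finds the smallest integer factor that brings a raster under a target size. It assumes that size is approximately width x height x bytes per cell. This ignores compression and GenGIS's own memory overhead, so treat the result as a starting point only. The resulting dimensions can be passed to the -outsize option of gdal_translate. The file names in the printed command are placeholders.

```python
import math

def downsample_factor(width, height, bytes_per_cell, target_bytes):
    """Smallest integer factor f such that the raster, resampled to
    ceil(width / f) x ceil(height / f) cells, fits within target_bytes."""
    if target_bytes < bytes_per_cell:
        raise ValueError("target is smaller than a single cell")
    f = 1
    while math.ceil(width / f) * math.ceil(height / f) * bytes_per_cell > target_bytes:
        f += 1
    return f

# Example: a 10000 x 10000 grid of 16-bit elevations (about 200 MB uncompressed),
# reduced to fit in roughly 10 MB.
f = downsample_factor(10000, 10000, 2, 10 * 1024 * 1024)
new_w, new_h = math.ceil(10000 / f), math.ceil(10000 / f)
print(f"factor {f}: gdal_translate -outsize {new_w} {new_h} input.tif output.tif")
```

Because the factor applies to both dimensions, halving the resolution reduces the number of cells, and so the approximate memory footprint, by a factor of four.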

Location File

The location file must be provided in a comma-separated format (e.g., the .csv files that can be exported from Microsoft Excel). The first line of the file must be a comma-separated series of headers. Each subsequent line will contain a set of attributes for a single location.

The first three entries on each line must be:

  • A unique location identifier
  • A vertical coordinate, either decimal degrees of latitude or Universal Transverse Mercator (UTM) northing. Positive values indicate north and negative values indicate south.
  • A horizontal coordinate, either decimal degrees of longitude or UTM easting. Positive values indicate east and negative values indicate west.

The first line of the file must therefore begin with the following three column headings:

Site ID,Latitude,Longitude

or

Site ID,Northing,Easting

depending on the coordinate system.
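
Source data often record coordinates as degrees, minutes and seconds with a hemisphere letter rather than as signed decimal degrees. The sketch below is a helper of our own for illustration and is not part of GenGIS. It converts such values into the sign convention described above and prints them as a location-file row.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0):
    """Convert non-negative degrees, minutes and seconds to decimal degrees."""
    return degrees + minutes / 60.0 + seconds / 3600.0

def to_signed_degrees(value, hemisphere):
    """Attach the sign expected in a location file: north and east are
    positive, south and west are negative."""
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = abs(float(value))
    return -value if hemisphere in ("S", "W") else value

# London at 51 deg 30' 36" N, 0.128 deg W, written as a location-file row
lat = to_signed_degrees(dms_to_decimal(51, 30, 36), "N")
lon = to_signed_degrees(0.128, "W")
print("Site ID,Latitude,Longitude")
print(f"London,{lat:.4f},{lon:.4f}")
```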

After these three columns, you can specify anything you like in the Location file, including longer descriptive site names, environmental parameters, and a time stamp. So, for instance, a location file header might look like this:

Site ID,Latitude,Longitude,File Size,Environment Type,Geographic Location,Site Name,Country

A value must then be given for each of these columns in every entity (that is, every row of the file), even if that value is NULL or some other placeholder.
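
Before loading a location file, it can help to check it against the rules above. The sketch below is an independent checker written with Python's standard csv module. It is not part of GenGIS, and GenGIS's own parser may differ in details such as whitespace or quoting. The checker verifies that the header begins with one of the two required prefixes, that Site IDs are unique, that the coordinates are numeric, and that every column has a value.

```python
import csv
import io

VALID_PREFIXES = (["Site ID", "Latitude", "Longitude"],
                  ["Site ID", "Northing", "Easting"])

def check_location_file(f):
    """Return a list of problems in a location file (empty if none found)."""
    reader = csv.reader(f)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    header = [h.strip() for h in header]
    problems = []
    if header[:3] not in VALID_PREFIXES:
        problems.append("header must begin with Site ID,Latitude,Longitude "
                        "or Site ID,Northing,Easting")
    seen = set()
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # tolerate blank lines
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} values, found {len(row)}")
            continue
        site = row[0].strip()
        if site in seen:
            problems.append(f"line {line_no}: duplicate Site ID {site!r}")
        seen.add(site)
        for name, value in zip(header, row):
            if not value.strip():
                problems.append(f"line {line_no}: no value for {name!r}")
        for value in row[1:3]:  # the two coordinate columns
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: coordinate {value!r} is not numeric")
    return problems

sample = io.StringIO("Site ID,Latitude,Longitude,Country\n"
                     "GBR,51.51,-0.128,UK\n"
                     "ITA,41.9,12.5,\n")
for problem in check_location_file(sample):
    print(problem)
```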

Sequence File

The basic specification of the sequence file is even simpler, with only two required fields:

  • A unique location identifier that is also found in the location file
  • A unique sequence identifier

The first line of the file must begin with the following column headings:

Site ID, Sequence ID

As with the location file, after these essential columns any type of information can be provided. Note that the 'sequence file' need not contain any molecular sequence data, nor do the entities necessarily need to have a one-to-one correspondence to actual sampled sequences.

A simple sequence file might summarize the taxonomic classification of each sampled sequence:

Site ID,Sequence ID,Best_match,Species,Genus,Family,Order,Class,Phylum,Superkingdom

As with the location file, each row of the sequence file must define a value for each of the columns identified in the header line.
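
A sequence row is only useful if its Site ID also appears in the location file. The sketch below is an independent check, not part of GenGIS. It takes the set of Site IDs from the location file and reports unknown sites, duplicate Sequence IDs and rows with the wrong number of values. It strips whitespace around header names, so "Site ID, Sequence ID" is accepted. That leniency is an assumption of the sketch.

```python
import csv
import io

def check_sequence_file(f, known_site_ids):
    """Return a list of problems in a sequence file (empty if none found).

    known_site_ids: the Site IDs from the first column of the location file.
    """
    reader = csv.reader(f)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    header = [h.strip() for h in header]
    problems = []
    if header[:2] != ["Site ID", "Sequence ID"]:
        problems.append("header must begin with Site ID,Sequence ID")
    seen = set()
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # tolerate blank lines
        if len(row) != len(header) or len(row) < 2:
            problems.append(f"line {line_no}: expected {max(len(header), 2)} values, found {len(row)}")
            continue
        site, seq_id = row[0].strip(), row[1].strip()
        if site not in known_site_ids:
            problems.append(f"line {line_no}: Site ID {site!r} is not in the location file")
        if seq_id in seen:
            problems.append(f"line {line_no}: duplicate Sequence ID {seq_id!r}")
        seen.add(seq_id)
    return problems

sequences = io.StringIO("Site ID, Sequence ID,Phylum\n"
                        "GBR,seq1,Actinobacteria\n"
                        "FRA,seq2,Firmicutes\n")
print(check_sequence_file(sequences, {"GBR", "ITA"}))
```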

Phylogenetic trees

Input phylogenetic trees should adhere to the Newick file format, with the additional constraint that leaf labels must match up exactly with either a Site ID from the location file or a Sequence ID from the sequence file.
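
A quick way to catch mismatched leaf labels before loading a tree is to extract the labels and compare them with your Site IDs and Sequence IDs. The sketch below uses a deliberately simple regular expression rather than a full Newick parser. It treats any label directly after "(" or "," as a leaf. It does not handle quoted labels, bracketed comments, or the Newick convention of writing spaces as underscores. Use a proper Newick library for anything beyond a sanity check.

```python
import re

# A leaf label is the text directly after "(" or "," up to the next
# ":", ",", ")", ";", bracket, quote or whitespace.
LEAF_PATTERN = re.compile(r"[(,]\s*([^\s(),:;\[\]']+)")

def newick_leaf_labels(newick):
    """Return the leaf labels of a simple Newick string, in order of appearance."""
    return LEAF_PATTERN.findall(newick)

def unmatched_leaves(newick, site_ids, sequence_ids):
    """Leaf labels that match neither a Site ID nor a Sequence ID."""
    valid = set(site_ids) | set(sequence_ids)
    return [label for label in newick_leaf_labels(newick) if label not in valid]

tree = "((GBR:0.1,ITA:0.2)n1:0.05,FRA:0.3);"
print(newick_leaf_labels(tree))
print(unmatched_leaves(tree, {"GBR", "ITA"}, {"seq1"}))
```

Internal node labels such as n1 follow a closing parenthesis, so they are not reported as leaves.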

The Environment

The graphical user interface for GenGIS consists of several interface elements. Many features of GenGIS can be accessed through the Menu, and the most commonly used features are exposed on the Toolbar. Data loaded into GenGIS is organized into a Layer Tree, which is displayed in the panel on the left. This hierarchical structure provides a natural organization of data and allows the properties of related data items to be set easily.

Data visualizations are displayed in the 3D Viewport. Mouse navigation within this 3D environment follows a standard world-in-hand navigation model. Alternatively, the camera position and angle can be modified using the Navigation Widget, which also provides an overview map.

The Console provides feedback to the user, such as the results of statistical tests and warnings about potential problems with loaded data. Almost all features available through the graphical user interface can also be accessed from the Python Console window, including loading data layers, modifying camera parameters, and accessing location or sequence data. Information about interface elements or graphical features within the Viewport is displayed on the Statusbar.

Overview of GenGIS graphical user interface.

Menu Items

File Menu

  • New study: (In development) Creates a new study. A study can contain any number of maps, location sets, sequence data files, and geographic tree models.
  • Open study: (In development) Open a previously saved study.
  • Save study: (In development) Save the current study.
  • Save study as: (In development) Save the current study under a new filename.
  • Save image as: Save Viewport contents to a PNG file.
  • Exit: Exits the program.

View Menu

  • Panes->Side Panel: Hide/unhide the left side panel containing the Layer tree.
  • Panes->Console: Hide/unhide the bottom panel containing the Console and Python Console.
  • Camera Position->Default: Move camera to its default perspective position.
  • Camera Position->Top: Move camera to give a top-down view of the map.
  • Detail->Fine: Increase the level-of-detail of the map.
  • Detail->Coarse: Decrease the level-of-detail of the map.

Layer Menu

  • Add map: Add a map to the currently selected study. Adding multiple maps is experimental.
  • Add location set: Add a location set to the currently selected map. Adding multiple location sets is experimental.
  • Add sequence data: Add sequence data to the currently selected location set. Adding multiple sequence sets is experimental.
  • Add tree: Add a geographic tree model to the currently selected map. Adding multiple trees is experimental.
  • Remove layer: Remove the currently selected layer.
  • Hide all layers: Hide all layers so they are no longer displayed in the Viewport.
  • Show all layers: Show all layers in the Viewport.

Settings Menu

  • Lighting: Brings up the Lighting Properties dialog box which allows properties of the light source used to render the Viewport to be modified.
  • Layout objects: Brings up the Layout Objects Properties dialog box, which allows all visual properties of layout primitives to be modified.

Help Menu

  • About: Brings up an About GenGIS dialog box which contains a link to this website in addition to other information.

Toolbar Buttons

The Toolbar provides easy access to frequently used features.

  • Add map: Add a map to the currently selected study. Adding multiple maps is experimental.
  • Add location set: Add a location set to the currently selected map. Adding multiple location sets is experimental.
  • Add sequence data: Add sequence data to the currently selected location set. Adding multiple sequence sets is experimental.
  • Add tree: Add a geographic tree model to the currently selected map. Adding multiple trees is experimental.
  • Default perspective view: Move camera to its default perspective position.
  • Top view: Move camera to give a top-down view of the map.
  • Draw layout line: Draw a straight line which can be used to layout graphic elements such as a 2D tree or pie charts.
  • Draw layout ellipse: Draw an ellipse which can be used to layout graphic elements such as a 2D tree or pie charts.
  • Draw geographic axis: Draw a polyline which can be used to test the goodness-of-fit between a tree and a non-linear geographic axis (see 2D Phylogenetic Trees).

Navigating the Viewport

Mouse

You can move the map by holding down the left mouse button while moving the mouse. To change the pitch of the camera, hold the right mouse button down and move the mouse up and down. Similarly, to rotate the map, move the mouse left or right while holding down the right mouse button. The camera can be zoomed using the scroll wheel on your mouse.

Navigation Widget

The navigation widget can also be used to navigate around the map. The arrows at the top of the widget move the map, while the plus and minus buttons zoom in and out. Clicking on the compass and dragging the mouse around the compass face rotates the map. You can jump to a specific point on the map by clicking within the overview map.


Predefined Views

GenGIS also provides two convenient default views. To quickly switch to a perspective view or a top down view use either the View→Camera Position menu items or the corresponding toolbar buttons.


Layer Tree

Hide / Unhide

Popup Menus

Hotkeys

Console Panels

Console

Python Console

Interacting with sample sites

Property Dialogs

Graphical analysis tools in GenGIS

Basic data visualizations

Pie charts

2D phylogenetic trees

Coming soon!

The algorithm used to lay out a 2D tree has a running time that is exponential in the degree of a node. We therefore do not recommend laying out trees that contain nodes of degree greater than 10. Details of this algorithm are given in:

Parks, D.H. and Beiko, R.G. (2009). Quantitative visualizations of hierarchically organized data in a geographic context. Accepted to Geoinformatics 2009, Fairfax, VA.

Defining axes

Manipulation

3D phylogenetic trees

The Python console and API functions

What you can do with the console

The Python Console provides access to a standard Python interpreter. Python is a general-purpose high-level programming language with many packages available for phylogenetics, population genetics, and statistics. Data loaded into GenGIS is exposed to the Python Console, allowing quantitative hypothesis testing to be performed directly within GenGIS. Results of analyses can be visualized within the Viewport to aid in interpretation of results and generation of new hypotheses.

Here is a list of all API functions. Below we give several short examples of using this API. You can also find information about using the API on our tutorials page.

Accessing location site and sequence data

Location site and sequence data can be accessed directly from the Python Console. The easiest way to do this is to utilize functions within the dataHelper.py file located in your scripts directory. To make use of these functions they must first be imported:

import dataHelper

You can access location data using:

 locData = dataHelper.getLocationSetData()

This will read in the location data from the first location set under the first map in the Layer Tree. Location data is returned as a Python list of location objects. Location objects contain a number of functions along with a Python dictionary holding all metadata associated with the location:

 temp = locData[0].data['Temperature']

To access location set data from the 2nd location set under the 3rd map use:

 locData = dataHelper.getLocationSetData(2, 3)

Sequence data is accessed in an analogous manner:

 seqData = dataHelper.getSequenceData()

Sequence data is returned as a Python list of sequence objects. Sequence objects contain a Python dictionary of all metadata associated with a sequence:

 phylum = seqData[0].data['Phylum']

To get sequence data from the ith location of the jth location set of the kth map use:

 seqData = dataHelper.getSequenceData(i, j, k)

If you wish to understand the low-level details on how to access data please consult the functions in dataHelper.py. We plan to extend the dataHelper interface in the future to allow data to be queried by name in addition to being queried by index position.
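
The objects returned by dataHelper can be processed with ordinary Python. Because dataHelper only works inside GenGIS, the sketch below uses a hypothetical stand-in class, FakeLocation. It models only the .id attribute and the .data dictionary described above. The sketch averages a numeric metadata field and skips placeholders such as NULL. It assumes metadata values may arrive as strings, as they would from the CSV location file. In the Python Console you would pass the list returned by dataHelper.getLocationSetData() instead of the stand-in list.

```python
from statistics import mean

class FakeLocation:
    """Hypothetical stand-in for a GenGIS location object (only .id and .data)."""
    def __init__(self, site_id, data):
        self.id = site_id
        self.data = data

def numeric_field_mean(locations, field):
    """Mean of a metadata field across locations, ignoring non-numeric values."""
    values = []
    for loc in locations:
        raw = loc.data.get(field)
        try:
            values.append(float(raw))
        except (TypeError, ValueError):
            pass  # skip missing values and placeholders such as 'NULL'
    return mean(values) if values else None

locData = [FakeLocation("A", {"Temperature": "18.5"}),
           FakeLocation("B", {"Temperature": "22.0"}),
           FakeLocation("C", {"Temperature": "NULL"})]
print(numeric_field_mean(locData, "Temperature"))
```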

Filtering data

We have provided a simple function, filterData, for filtering data. This function is contained in dataHelper.py. As an example, all locations with a temperature greater than 20 can be obtained as follows:

import dataHelper
locData = dataHelper.getLocationSetData()
filteredData = dataHelper.filterData(locData, 'Temperature', 20, dataHelper.filterFunc.greater)

The function filterData takes 4 parameters:

  • the data to be filtered
  • the field to filter on
  • the value to filter on
  • a filtering function which returns true for all items passing the filter

Filtering can be done on either strings or numeric values:

import dataHelper
seqData = dataHelper.getSequenceData()
filteredData = dataHelper.filterData(seqData, 'Phylum', 'Actinobacteria', dataHelper.filterFunc.equal)

Basic filtering functions are provided in filterFunc.py, but it is easy to write your own filtering functions. For example, the equal filter used above is simply:

def equal(val1, val2):
  return str(val1) == str(val2)
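
Custom filters follow the same two-argument pattern. The sketch below pairs two illustrative filters, between and equal_ignore_case, with a stand-in called filter_data. This stand-in is our own approximation and not GenGIS's dataHelper.filterData. It assumes each item is kept when func(item.data[field], value) returns True, with the item's value passed first, which is what a "greater than 20" filter implies. Check dataHelper.py to confirm the argument order before relying on it. The Item class is a hypothetical placeholder for location or sequence objects.

```python
class Item:
    """Hypothetical stand-in for a location or sequence object with a .data dict."""
    def __init__(self, data):
        self.data = data

def filter_data(items, field, value, func):
    """Keep items for which func(item.data[field], value) is True (assumed semantics)."""
    return [item for item in items if func(item.data[field], value)]

def between(val, bounds):
    """Custom filter: True if val lies within the inclusive range (low, high)."""
    low, high = bounds
    return low <= float(val) <= high

def equal_ignore_case(val1, val2):
    """Custom filter: case-insensitive version of the equal filter."""
    return str(val1).lower() == str(val2).lower()

sites = [Item({"Temperature": 15}), Item({"Temperature": 21}), Item({"Temperature": 30})]
warm = filter_data(sites, "Temperature", (20, 25), between)
print([s.data["Temperature"] for s in warm])
```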

Selecting sequences

GenGIS supports the notion of an active set of sequences. Calculations and visualizations reflect only those sequences in the active set. You can add or remove sequences from the active set using the SetActive member function. As an example, one could select only those sequences classified as Gammaproteobacteria, Betaproteobacteria, or Epsilonproteobacteria in order to produce pie charts showing the relative proportion of sequences from these three classes:

import dataHelper
seqData = dataHelper.getSequenceData()
for seq in seqData:
  if seq.data['Class'] in ['Gammaproteobacteria', 'Betaproteobacteria', 'Epsilonproteobacteria']:
    seq.SetActive(True)
  else:
    seq.SetActive(False)
# rebuild the pie charts from the new active set and redraw the Viewport
dataHelper.getChartSetView().UpdateCharts()
refresh()

You can also use the helper function selectData to place sequences into the active set:

import dataHelper
seqData = dataHelper.getSequenceData()
dataHelper.selectData(seqData, 'Phylum', 'Actinobacteria', dataHelper.filterFunc.equal)

The helper function selectAllSeqs can be used to place all sequences into the active set:

import dataHelper
seqData = dataHelper.getSequenceData()
dataHelper.selectAllSeqs(seqData)
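
To see what a pie chart of the active set represents, the sketch below counts active sequences per site and category and converts the counts to proportions. It runs outside GenGIS using a hypothetical FakeSequence class. The manual documents SetActive and the .data dictionary, but the site_id and active attributes are invented here. We assume, without having checked GenGIS's charting code, that chart slices correspond to these proportions.

```python
from collections import Counter, defaultdict

class FakeSequence:
    """Hypothetical stand-in for a GenGIS sequence object; site_id and active
    are invented attributes so the sketch can run outside GenGIS."""
    def __init__(self, site_id, data):
        self.site_id = site_id
        self.data = data
        self.active = True

    def SetActive(self, state):
        self.active = state

def active_proportions(sequences, field):
    """Per site, the fraction of active sequences falling in each category of field."""
    counts = defaultdict(Counter)
    for seq in sequences:
        if seq.active:
            counts[seq.site_id][seq.data[field]] += 1
    proportions = {}
    for site, counter in counts.items():
        total = sum(counter.values())
        proportions[site] = {category: n / total for category, n in counter.items()}
    return proportions

seqData = [FakeSequence("A", {"Class": "Gammaproteobacteria"}),
           FakeSequence("A", {"Class": "Betaproteobacteria"}),
           FakeSequence("A", {"Class": "Bacilli"}),
           FakeSequence("B", {"Class": "Gammaproteobacteria"})]
wanted = {"Gammaproteobacteria", "Betaproteobacteria", "Epsilonproteobacteria"}
for seq in seqData:
    seq.SetActive(seq.data["Class"] in wanted)
print(active_proportions(seqData, "Class"))
```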

Creating custom data visualizations

Using the VisualLine, VisualMarker, and VisualLabel classes one can create custom data visualizations with GenGIS. The VisualLine class allows user defined lines to be drawn in the Viewport. Suppose we have two locations within our location set with ids of 'GBR' and 'ITA'. We can draw a line between these locations as follows:

# get location data
import dataHelper as dh
locData = dh.getLocationSetData()
# create a dictionary indicating the geographic coordinates of each location 
locDict = {}
for loc in locData:
  locDict[loc.id] = [loc.northing, loc.easting]

# get the 3D position of GBR and ITA
gbrPt = Point3D()
convertGeoCoord(locDict['GBR'][0], locDict['GBR'][1], gbrPt)
itaPt = Point3D()
convertGeoCoord(locDict['ITA'][0], locDict['ITA'][1], itaPt)
# draw a solid red line with a width of 2 between these countries
line = VisualLine(Colour(1,0,0), 2, LINE_STYLE.SOLID, Line3D(gbrPt, itaPt))
lineId = addLine(line)			
refresh()

The visual properties of this line can easily be changed at any time to reflect different aspects of your data:

line.SetColour(Colour(1,0,1))
line.SetSize(5)
line.SetLineStyle(LINE_STYLE.SHORT_DASH)
refresh() 

We can later remove this line using:

removeLine(lineId)

The VisualMarker class allows user defined markers to be drawn in the Viewport. It is similar to the VisualLine class. For example, we can draw a blue circle over London, which is situated at a latitude of 51.51N and a longitude of 0.128W, as follows:

# get 3D position of geographic coordinate in Viewport
pt = Point3D()
convertLatLong(51.51, -0.128, pt)
marker = VisualMarker(Colour(0,0,1), 6, MARKER_SHAPE.CIRCLE, pt)
markerId = addMarker(marker)
refresh()

VisualLabels can be used to create orthographic (e.g., to indicate a legend or figure caption) or perspective text (e.g., to label points on a map). We can create a "Hello World!" label for a map as follows:

label = VisualLabel("Hello World!", Colour(0,0,0), 12)
label.SetScreenPosition(Point3D(20,20,1))
label.SetRenderingStyle(LABEL_RENDERING_STYLE.ORTHO)
labelId = addLabel(label)
refresh()

Alternatively, we can use a VisualLabel to label our marker at London:

label = VisualLabel("London", Colour(0,0,0), 12)
pt = Point3D()
convertLatLong(51.51, -0.128, pt)
label.SetGridPosition(pt)
label.SetRenderingStyle(LABEL_RENDERING_STYLE.PERSPECTIVE)
labelId = addLabel(label)
refresh()

By combining these graphical primitives and encoding key aspects of your data as different visual properties (e.g., colour, size, shape), you can use GenGIS to identify interesting patterns within a wide range of datasets. An example is available here. It uses these classes to visualize a distance matrix of the rates of import and export of HIV-1 subtype B among European countries, as reported by Paraskevis et al. (2009).
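
Encoding a data value as a visual property usually comes down to a linear mapping from the data range onto a width or colour range. The sketch below is plain Python with illustrative helper names of our own. It maps values onto line widths and onto a blue-to-red colour ramp. The colour is returned as an (r, g, b) triple in [0, 1], which appears to be the range used by Colour(...) in the examples above. The commented VisualLine call shows where the results could be used. It has not been run here, and you may need to round the width if GenGIS expects an integer.

```python
def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] onto [out_lo, out_hi], clamping outside values."""
    if hi == lo:
        return out_lo
    t = (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)
    return out_lo + t * (out_hi - out_lo)

def value_to_rgb(value, lo, hi):
    """Blue (low) to red (high) ramp as an (r, g, b) triple in [0, 1]."""
    t = scale(value, lo, hi, 0.0, 1.0)
    return (t, 0.0, 1.0 - t)

# Hypothetical values, e.g. a rate attached to each pair of locations.
for value in (0, 5, 10):
    width = scale(value, 0, 10, 1, 8)
    rgb = value_to_rgb(value, 0, 10)
    print(value, width, rgb)
    # Inside GenGIS these could feed the constructor used earlier, for example:
    # VisualLine(Colour(*rgb), width, LINE_STYLE.SOLID, Line3D(startPt, endPt))
```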

Creating fly-through movies

A collection of functions for creating fly-through movies is available in movieHelper.py, found in the scripts directory. One useful movie rotates the map about its origin. Such a movie can be made using the rotateAboutOrigin function, which takes the number of degrees to rotate and the duration of the movie as parameters:

import movieHelper
movieHelper.rotateAboutOrigin(360, 10)

More general movies can be created by capturing the camera parameters at key frames using the function getCameraParam and then interpolating between these key frames using the function linearInterpolateParams:

import movieHelper 
# move camera to first key frame (for example, use the toolbar to set a top down view)
keyFrame1 = movieHelper.getCameraParam()
# move camera to next key frame (for example, use the toolbar to set the default perspective view)
keyFrame2 = movieHelper.getCameraParam()
# set camera back to first key frame
movieHelper.setCameraParam(keyFrame1)
# smoothly move between these key frames in 5 seconds
movieHelper.linearInterpolateParams(keyFrame1, keyFrame2, 5)
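
The idea behind key-frame interpolation can be sketched in a few lines. The function below is our own illustration and is not the movieHelper implementation. It treats each key frame as an equal-length sequence of numbers, a hypothetical stand-in for whatever getCameraParam() returns. It produces evenly spaced intermediate frames for a given duration and frame rate. The frame rate of 25 is an arbitrary default.

```python
def interpolate_params(start, end, duration, fps=25):
    """Linearly interpolate between two key frames.

    start, end: equal-length sequences of numbers (stand-ins for camera parameters).
    duration: length of the transition in seconds; fps: frames per second.
    Returns duration * fps + 1 frames, including both key frames.
    """
    if len(start) != len(end):
        raise ValueError("key frames must have the same number of parameters")
    n_frames = max(1, round(duration * fps))
    frames = []
    for i in range(n_frames + 1):
        t = i / n_frames  # fraction of the way from start to end
        frames.append(tuple(a + t * (b - a) for a, b in zip(start, end)))
    return frames

# e.g. pan, pitch and zoom moving between two hypothetical key frames in 2 seconds
for frame in interpolate_params((0.0, 0.0, 100.0), (30.0, 90.0, 50.0), duration=2, fps=5):
    print(frame)
    # Inside GenGIS, each frame would be applied to the camera and followed by
    # refresh(); periodic wxSafeYield calls keep the interface responsive.
```

The same arithmetic applies to a single rotation angle: a full 360-degree turn spread over 10 seconds at 25 frames per second advances 360 / 250 = 1.44 degrees per frame.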

For examples of creating custom movies which do not use the movieHelper API, have a look at the series of H1N1 movies we have developed. In particular, note that wxSafeYield must be called occasionally for time series to run correctly.

By stitching together multiple key frames, complex fly-through movies can be created. Commercial software such as Camtasia or open source software such as CamStudio can be used to record these movies.

RPy and analyzing data

Accessing sample data as tables in R

Capturing output from R