The GenGIS Manual

From The GenGIS wiki
Revision as of 03:01, 4 May 2009 by Dparks (Talk | contribs)

We continue to develop the manual.

A recent version of the manual will soon be available in PDF format.

Introduction / Overview of GenGIS

Purpose

Geography has always been an important component of evolutionary and ecological theory. The advent of sequence typing approaches such as 16S ribotyping, DNA barcoding using the COX1 gene, and multi-locus sequence typing now gives us the opportunity to understand how communities of organisms interact, move around, and evolve. This sequencing revolution is tightly coupled to the development of new algorithms for assessing and comparing populations based on their genes.

Coupled with these developments is the ready availability of high quality, public domain digital map data. By integrating molecular data with cartography and habitat parameters, we can visualize the geographic and ecological factors that influence community composition and function.

GenGIS is designed to bring these components together into a single software package that satisfies the following criteria:

Free and open source. GenGIS is released under a Creative Commons Attribution - Share Alike 3.0 license, and we have made extensive use of other free packages such as wxWidgets, R, and Python. Making GenGIS freely available allows it to be downloaded and used anywhere in the world, and allows users to inspect and modify the source code.

User-friendly. Although GenGIS is built to deal with challenging scientific questions, our goal is to make the software as easy to use as possible. This is particularly important as many users will have little experience with digital map data, apart from applications such as Google Earth.

Adaptable and extensible. The principal strength of many open-source projects lies in the ability of a loosely organized community of users to develop and enhance the software: R and BioPerl are two examples of successful open-source projects with many contributors. Since the potential applications of GenGIS are much broader than those we have in mind, we aim to make it as easy as possible to extend its capabilities by exposing the internal data structures and offering a plugin architecture.

Platforms and System Requirements

GenGIS has been developed and tested on the following operating systems:

  • Win32 (XP and Vista)
  • Mac OS X 'Leopard'

Support for Linux is not a development priority at the moment. However, it should be fairly straightforward to port GenGIS to Linux and we will happily support any efforts to develop such a version. Similarly, let us know if you experience problems (or successes!) in trying to run GenGIS on an operating system not listed above.

GenGIS can have fairly demanding memory requirements, depending on the size of map that is loaded. We typically try to keep digital maps to a size of 10 MB or less. Please note that we have not tested GenGIS on Windows with anything less than a Pentium 4-class CPU, nor on anything older than a 2008-vintage iMac.

Citing GenGIS

The best citation for GenGIS will always be indicated in boldface on the [Main GenGIS page].

Where to go for help

The latest version of the GenGIS manual will always be available [here].

Tutorials are available [here].

We maintain a [FAQ] for GenGIS as well.

We may also at some point initiate a discussion forum.

Email beiko [at] cs.dal.ca if you have further questions about the software.

Installation

Getting the Release Version

If all you want to do is load your data and start analyzing and visualizing them, your best bet is the latest Release version, which is available [here].

Developer Version – building from source code

To build from source code, we currently use Microsoft Visual C++ 2008 Express Edition on Win32 platforms and gcc 4.3.3 on Mac. Source code is available [here].

Building on Windows

GenGIS can be compiled using the solution file contained in win32/build/msvc. Please note that the Python console is only available for Release builds.

Building on Mac

TBD

Input data

File dependencies

Currently there are four types of files that can be imported directly by the GenGIS application (i.e., not via the Python console or RPy):

  • The digital map file
  • The location file
  • The sequence file
  • The phylogenetic tree file

Details of the requirements for these files are contained later in this section.

We currently require that you load the map file first, followed by the location file, after which any properly formatted set of one or more sequence and/or tree files can be loaded. To be considered in a visual or statistical analysis, every sequence in a sequence file, and every leaf in a phylogenetic tree, must be associated with one of the locations in the location file.

Maps

Thanks to GDAL, GenGIS can import a wide array of digital map file formats and projections. We cannot provide direct support for GDAL, but there is a considerable amount of support available at the project website, and we have found the script 'gdal_merge.py' and the executables 'gdalwarp.exe' and 'gdal_translate.exe' to be very useful in preparing maps that can be handled by GenGIS.

Supported formats

GenGIS should be able to support any of the formats listed on this page due to the use of GDAL as our data reader, but many of these remain untested. We have had success with the following widely used formats:

  • GeoTIFF
  • Arc/Info ASCIIGRID
  • USGS DEM (and variations thereof)

Projections

If you wish to use a specific projection, you must specify it before loading your map, because GenGIS is unable to reproject on the fly. This is particularly important if you are loading the default world map (from GTOPO30) that ships with GenGIS: the default Mercator projection stretches the polar regions to an absurd degree, whereas Plate Carrée or Robinson will provide a much less distorted world view.

GenGIS currently does not support projections in which a single point is displayed in multiple locations. The best example of this is the default world map, which is actually 'fudged' to stretch only from 89.9 degrees North to 89.9 degrees South latitude. Since the poles stretch across the entire upper and lower edges of a map in a projection such as Plate Carrée, GenGIS is unable to display these properly.
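To illustrate the polar stretching mentioned above, the following Python sketch compares the vertical map coordinate under the standard spherical Mercator formula, y = ln(tan(π/4 + φ/2)), with the plate carrée coordinate, y = φ (latitude in radians). It is independent of GenGIS, and the function names are made up for this example.

```python
import math

def mercator_y(lat_deg):
    """Vertical coordinate on a unit sphere for the spherical Mercator projection."""
    phi = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + phi / 2))

def plate_carree_y(lat_deg):
    """Vertical coordinate on a unit sphere for plate carree: the latitude in radians."""
    return math.radians(lat_deg)

# Print both coordinates for a few latitudes, ending at the 89.9 degree
# limit used by the default world map.
for lat in (0.0, 45.0, 60.0, 80.0, 89.9):
    print(f"{lat:5.1f}  mercator={mercator_y(lat):7.3f}  plate_carree={plate_carree_y(lat):6.3f}")
```

The plate carrée coordinate grows linearly with latitude, whereas the Mercator coordinate grows without bound as the latitude approaches 90 degrees, which is why the polar regions dominate a Mercator world map.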

To specify the projection before loading your map, right click "New Study : Study" in the Layers tab, and select Properties. Selecting the Projection tab will allow you to choose your projection.

IMAGE SS1-MapProj.jpg (to upload)

Typical limits on map size

The size of map you can load and usefully work with in GenGIS depends on the speed of your processor and the amount of RAM you have. With 1 GB of RAM you should be able to work with maps that are 10 MB or slightly greater in size.

If the resolution of your map is too high to load efficiently into GenGIS, you can use one of the GDAL executables (gdalwarp or gdal_translate) to reduce the density of points in your map. This will of course decrease the level of detail you can see in the application, but is an acceptable tradeoff in many cases.
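As a rough sketch of the downsampling step, the Python snippet below builds a gdal_translate command line that shrinks a raster to a given percentage of its original width and height using the tool's -outsize option. The helper name and the file names are hypothetical, and the snippet only constructs and prints the command; it does not run GDAL.

```python
def build_downsample_command(src, dst, percent=50):
    """Return a gdal_translate argument list that resizes `src` to `percent`
    of its original width and height and writes the result to `dst`."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    size = f"{percent}%"
    return ["gdal_translate", "-outsize", size, size, src, dst]

# Hypothetical file names, for illustration only.
command = build_downsample_command("big_map.tif", "small_map.tif", percent=25)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) on a machine where GDAL is installed and on the PATH. Reducing each dimension to 25% cuts the number of grid points to roughly one sixteenth of the original.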

Location File

The location file must be provided in a comma-separated format (e.g., the .csv files that can be exported from Microsoft Excel). The first line of the file must be a comma-separated series of headers. Each subsequent line will contain a set of attributes for a single location.

The first three entries on each line must be:

  • A unique location identifier
  • A vertical coordinate, either decimal degrees of latitude or Universal Transverse Mercator (UTM) northing. Note that positive values = north and negative values = south.
  • A horizontal coordinate, either decimal degrees of longitude or UTM easting. Positive values = east and negative values = west.

The first line of the file must therefore begin with the following three column headings:

Site ID,Latitude,Longitude

or

Site ID,Northing,Easting

depending on the coordinate system.

After these three columns, you can specify anything you like in the Location file, including longer descriptive site names, environmental parameters, and a time stamp. So, for instance, a location file header might look like this:

Site ID,Latitude,Longitude,File Size,Environment Type,Geographic Location,Site Name,Country

Each of these values must then be specified for every entity (= row in the file), even if they are called NULL or some other placeholder value.
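The rules above can be checked before loading a file into GenGIS. The following Python sketch is one possible pre-flight check and not part of GenGIS itself: it verifies the three leading headers, that every row has a value for each column, that site identifiers are unique, and that the two coordinates are numeric. The function name and the sample data are invented for illustration, and the exact-match test on header names is an assumption, since this manual does not say whether GenGIS is case-sensitive.

```python
import csv
import io

# The two header triples described in this section.
REQUIRED_HEADERS = (
    ["Site ID", "Latitude", "Longitude"],
    ["Site ID", "Northing", "Easting"],
)

def check_location_file(handle):
    """Return a list of problems found in a location file (empty if none)."""
    rows = list(csv.reader(handle))
    if not rows:
        return ["file is empty"]
    header = [h.strip() for h in rows[0]]
    problems = []
    if header[:3] not in REQUIRED_HEADERS:
        problems.append(f"unexpected leading headers: {header[:3]}")
    seen = set()
    for line_no, row in enumerate(rows[1:], start=2):
        # Every row must supply one value per header column.
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} values, found {len(row)}")
            continue
        site_id = row[0].strip()
        if site_id in seen:
            problems.append(f"line {line_no}: duplicate Site ID {site_id!r}")
        seen.add(site_id)
        # The second and third columns must be numeric coordinates.
        for name, value in zip(header[1:3], row[1:3]):
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: {name} {value!r} is not a number")
    return problems

# Made-up sample data containing three deliberate mistakes.
example = io.StringIO(
    "Site ID,Latitude,Longitude,Environment Type\n"
    "S1,44.64,-63.57,Marine\n"
    "S2,45.96,-66.64\n"
    "S1,46.24,north,Soil\n"
)
for problem in check_location_file(example):
    print(problem)
```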

Sequence File

The basic specification of the sequence file is even simpler, with only two required fields:

  • A unique location identifier that is also found in the location file
  • A unique sequence identifier

The first line of the file must begin with the following column headings:

Site ID,Sequence ID

As with the location file, after these essential columns any type of information can be provided. Note that the 'sequence file' need not contain any molecular sequence data, nor do the entities necessarily need to have a one-to-one correspondence to actual sampled sequences.

A simple sequence file might summarize the taxonomic classification of each sampled sequence:

Site ID,Sequence ID,Best_match,Species,Genus,Family,Order,Class,Phylum,Superkingdom

As with the location file, each row of the sequence file must define a value for each of the columns identified in the header line.
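Because every sequence must be associated with a location, it can be useful to cross-check the two files before loading them. The Python sketch below lists sequences whose Site ID does not appear in the location file. It is not a GenGIS function; the function name and sample data are hypothetical.

```python
import csv
import io

def unmatched_sequences(location_handle, sequence_handle):
    """Return (sequence ID, site ID) pairs whose site ID is missing
    from the location file."""
    # skipinitialspace tolerates a space after each comma in the files.
    locations = csv.DictReader(location_handle, skipinitialspace=True)
    site_ids = {row["Site ID"] for row in locations}
    sequences = csv.DictReader(sequence_handle, skipinitialspace=True)
    return [
        (row["Sequence ID"], row["Site ID"])
        for row in sequences
        if row["Site ID"] not in site_ids
    ]

# Made-up sample data: sequence seq3 refers to a site that is not defined.
location_file = io.StringIO(
    "Site ID,Latitude,Longitude\n"
    "S1,44.64,-63.57\n"
    "S2,45.96,-66.64\n"
)
sequence_file = io.StringIO(
    "Site ID,Sequence ID,Genus\n"
    "S1,seq1,Vibrio\n"
    "S2,seq2,Bacillus\n"
    "S9,seq3,Vibrio\n"
)
print(unmatched_sequences(location_file, sequence_file))
```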

Phylogenetic trees

Input phylogenetic trees should adhere to the Newick file format, with the additional constraint that leaf labels must match up exactly with either a Site ID from the location file or a Sequence ID from the sequence file.
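The label constraint can be checked with a few lines of Python. The sketch below extracts leaf labels from a Newick string and reports any that are not in a set of known identifiers. It is a simplified illustration and not the parser used by GenGIS: it assumes unquoted labels, no bracketed comments, and a tree with at least one pair of parentheses. The function names are invented for this example.

```python
import re

# In unquoted Newick, a leaf label directly follows "(" or ",";
# labels that follow ")" belong to internal nodes and are skipped.
LEAF_PATTERN = re.compile(r"[(,]\s*([^\s(),:;][^(),:;]*)")

def newick_leaf_labels(newick):
    """Return the leaf labels of a simple Newick string, in order."""
    return [match.group(1).strip() for match in LEAF_PATTERN.finditer(newick)]

def unknown_leaves(newick, known_ids):
    """Return leaf labels that are absent from `known_ids`."""
    return [label for label in newick_leaf_labels(newick) if label not in known_ids]

# Made-up tree and identifiers: leaf S3 has no matching Site ID.
tree = "((S1:0.1,S2:0.2):0.3,S3:0.4);"
print(newick_leaf_labels(tree))
print(unknown_leaves(tree, {"S1", "S2"}))
```

The set passed as known_ids would hold either the Site IDs from the location file or the Sequence IDs from the sequence file, depending on which the tree is built from.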

The Environment

The map environment

Map pane

Minimap

Getting around in GenGIS

Control elements

Hotkeys

Console input

Mouse

Menu options

Shortcut buttons

Interacting with sample sites

Working with Layers

Adding and removing

Active / inactive

wx Elements

Graphical analysis tools in GenGIS

Basic data visualizations

Pie charts

2D phylogenetic trees

Defining axes

Manipulation

3D phylogenetic trees

The Python console and API functions

What you can do with the console

Accessing sample and sequence data

Capturing results

RPy and analyzing data

Installing R modules

Accessing sample data as tables in R

Capturing output from R

Implemented API functions

Writing your own API functions