Difference between revisions of "Phylogenetics: APE Lab"

From EEBedia
Jump to: navigation, search
Line 134: Line 134:
 
LTT plots usually have a log scale for the number of lineages (y-axis), and this can be easily accomplished:
 
LTT plots usually have a log scale for the number of lineages (y-axis), and this can be easily accomplished:
 
  > ltt.plot(t, log = "y")
 
  > ltt.plot(t, log = "y")
 
+
Now add a line extending from the point (t = -0.265, N = 2) to the point (t = 0, N = 20) using the command <tt>segments</tt> (this is an R command, not one provided by APE):
 +
> segments(-0.265, 2, 0, 20, lty="dotted")
 +
The slope of this line should (ideally) be equal to the birth rate of the yule process used to generate the tree, which was <math>\lambda=10</math>.
 +
<div style="background-color: $eeeeff">
 +
* Calculate the slope of this line. Is it close to the birth rate 10?
 +
</div>
  
 
== Literature Cited ==
 
== Literature Cited ==

Revision as of 02:47, 29 April 2009

Adiantum.png EEB 349: Phylogenetics
This lab is an introduction to some of the capabilities of APE, a phylogenetic analysis package written for the R language. You may want to review the R Primer lab if you've already forgotten everything you learned about R.

Installing APE and apTreeshape

APE is a package largely written and maintained by Emmanuel Paradis, who has written a very nice book[1] explaining in detail how to use APE. APE is designed to be used inside the R programming language, to which you were introduced earlier in the semester (see Phylogenetics: R Primer). APE can do an impressive array of analyses. For example, it is possible to estimate trees using neighbor-joining or maximum likelihood, estimate ancestral states (for either discrete or continuous data), perform Sanderson's penalized likelihood relaxed clock method to estimate divergence times, evaluate Felsenstein's independent contrasts, estimate birth/death rates, perform bootstrapping, and even automatically pull sequences from GenBank given a vector of accession numbers! APE also has impressive tree plotting capabilities, of which we will only scratch the surface today (flip through Chapter 4 of the Paradis book to see what more APE can do).

apTreeshape is a different R package (written by Nicolas Bortolussi et al.) that we will use today.

To install APE and apTreeshape, start R and type the following at the R command prompt:

> install.packages("ape")
> install.packages("apTreeshape")

Assuming you are connected to the internet, R should locate these packages and install them for you. After they are installed, you will need to load them into R in order to use them (note that no quotes are used this time):

> library(ape)
> library(apTreeshape)

You should never again need to issue the install.packages command for APE and apTreeshape, but you will need to use the library command to load them whenever you want to use them.

Reading in trees from a file

Download this tree file and save it as a file named yule.tre in a new folder somewhere on your computer. Tell R where this folder is using the setwd (set working directory) command. For example, I created a folder named apelab on my desktop, so I typed this to make that folder my working directory:

> setwd("/Users/plewis/Desktop/apelab")

Now you should be able to read in the tree using this ape command (the t is an arbitrary name I chose for the variable used to hold the tree; you could use tree if you want):

> t <- read.nexus("yule.tre")

APE stores trees as an object of type "phylo".

Getting a tree summary

Some basic information about the tree can be obtained by simply typing the name of the variable you used to store the tree:

> t

Phylogenetic tree with 20 tips and 19 internal nodes.

Tip labels:
	B, C, D, E, F, G, ...

Rooted; includes branch lengths.

Obtaining vectors of tip and internal node labels

The variable t has several attributes that can be queried by following the variable name with a dollar sign and then the name of the attribute. For example, the vector of tip labels can be obtained as follows:

> t$tip.label
[1] "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U"

The internal node labels, if they exist, can be obtained this way:

> t$node.label
NULL

The result above means that labels for the internal nodes were not stored with this tree.

Obtaining the nodes attached to each edge

The nodes at the ends of all the edges in the tree can be had by asking for the edge attribute:

> t$edge
      [,1] [,2]
 [1,]   21   22
 [2,]   22   23
 [3,]   23    1
   .     .    .
   .     .    .
   .     .    .
[38,]   38   12 

Obtaining a vector of edge lengths

The edge lengths can be printed thusly:

> t$edge.length
 [1] 0.07193600 0.01755700 0.17661500 0.02632500 0.01009100 0.06893900 0.07126000 0.03970200 0.01912900
[10] 0.01243000 0.01243000 0.03155800 0.05901300 0.08118600 0.08118600 0.00476400 0.14552600 0.07604800
[19] 0.00070400 0.06877400 0.06877400 0.02423800 0.02848800 0.01675100 0.01675100 0.04524000 0.19417200
[28] 0.07015000 0.12596600 0.06999200 0.06797400 0.00201900 0.00201900 0.12462600 0.07128300 0.00004969
[37] 0.00004969 0.07133200

About this tree

This tree in the file yule.tre was obtained using PAUP from 10,000 nucleotide sites simulated in Phycas from a Yule tree (which was also generated by Phycas). The model used to generated the simulated data (HKY model, kappa = 4, base frequencies = 0.3 A, 0.2 C, 0.2 G, and 0.3 T, no rate heterogeneity) was also used in the analysis by PAUP (the final ML tree was made ultrametric by enforcing the clock constraint). I analyzed this data in BEAST for part of a lecture. See this PDF file for details.

Fun with plotting trees in APE

You can plot the tree using all defaults with this ape command:

> plot(t)

Let's try changing a few defaults and plot the tree in a variety of ways. All of the following change just one default option, but feel free to combine these to create the plot you want.

Left-facing, up-facing, or down-facing trees

> plot(t, direction="l")
> plot(t, direction="u")
> plot(t, direction="d")

The default is to plot the tree right-facing (direction="r").

Hide the taxon names

> plot(t, show.tip.label=FALSE)

The default behavior is to show the taxon names.

Make the edges thicker

> plot(t, edge.width=4)

An edge width of 1 is the default. If you specify several edge widths, APE will alternate them as it draws the tree:

> plot(t, edge.width=c(1,2,3,4))

Color the edges

> plot(t, edge.color="red")

Black edges are the default. If you specify several edge colors, APE will alternate them as it draws the tree:

> plot(t, edge.color=c("black","red","green","blue"))

Make taxon labels smaller or larger

> plot(t, cex=0.5)

The cex parameter governs relative scaling of the taxon labels, with 1.0 being the default. Thus, the command above makes the taxon labels half the default size. To double the size, use

> plot(t, cex=2.0)

Plot tree as an unrooted or radial tree

> plot(t, type="u")

The default type is "p" (phylogram), but "c" (cladogram), "u" (unrooted), "r" (radial) are other options. Some of these options (e.g. "r") create very funky looking trees, leading me to think there is something about the tree description in the file yule.tre that APE is not expecting.

Labeling internal nodes

> plot(t)
> nodelabels()

This is primarily useful if you want to annotate one of the nodes:

> plot(t)
> nodelabels("Clade A", 22)
> nodelabels("Clade B", 35)

To put the labels inside a circle rather than a rectangle, use frame="c" rather than the default (frame="r"). To use a background color of white rather than the default "lightblue", use bg="white":

> plot(t)
> nodelabels("Clade A", 22, frame="c", bg="white")
> nodelabels("Clade B", 35, frame="c", bg="yellow")

Adding a scale bar

> plot(t)
> add.scale.bar(length=0.05)

The above commands add a scale bar to the bottom left of the plot. To add a scale going all the way across the bottom of the plot, try this:

> plot(t)
> axisPhylo()

Diversification analyses

APE can perform some lineage-through-time type analyses. The tree read in from the file yule.tre that you already have in memory is perfect for testing APE's diversification analyses because we know (since it is based on simulated data) that this tree was generated under a pure-birth (Yule) model.

Lineage through time plot

This is a rather small tree, so a lineage through time (LTT) plot will be rather crude, but let's go through the motions anyway.

> ltt.plot(t)

LTT plots usually have a log scale for the number of lineages (y-axis), and this can be easily accomplished:

> ltt.plot(t, log = "y")

Now add a line extending from the point (t = -0.265, N = 2) to the point (t = 0, N = 20) using the command segments (this is an R command, not one provided by APE):

> segments(-0.265, 2, 0, 20, lty="dotted")

The slope of this line should (ideally) be equal to the birth rate of the yule process used to generate the tree, which was \lambda=10.

  • Calculate the slope of this line. Is it close to the birth rate 10?

Literature Cited

  1. Paradis, E. 2006. Analysis of phylogenetics and evolution with R. Springer. ISBN: 0-387-32914-5