Phylogenetics: Compositional Heterogeneity Lab

From EEBedia

Revision as of 22:18, 30 March 2014

EEB 5349: Phylogenetics
The goal of this lab is to introduce you to the influence of compositional heterogeneity on phylogenetic inference.

Compositional heterogeneity means that the equilibrium nucleotide frequencies (or amino acid frequencies, for protein data) change across the tree. Standard nucleotide and amino acid models do not account for this, because they assume stationarity: the same substitution process operates throughout the tree, and one set of equilibrium frequencies applies to every point along every edge. Non-stationarity can lead to compositional attraction artifacts, in which tips with similar nucleotide composition group together even though they may be completely unrelated.
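To make the stationarity assumption concrete, here is one conventional way to write it. The notation is standard and is not taken from this lab. If a rate matrix $Q$ has equilibrium frequencies $\pi$, then

$$\pi^{\top} Q = \mathbf{0}^{\top} \;\Longrightarrow\; \pi^{\top} P(t) = \pi^{\top} \quad \text{for all } t \ge 0, \qquad P(t) = e^{Qt} = I + Qt + \frac{(Qt)^2}{2!} + \cdots$$

This holds because $\pi^{\top}$ annihilates every term of the series beyond $I$. So if the root has composition $\pi$, every point on every edge does too.

A non-stationary model instead lets each edge $e$ have its own rate matrix $Q_e$ with its own equilibrium frequencies $\pi_e$. The expected composition then changes along the edge as

$$\mathbf{p}_{\text{child}}^{\top} = \mathbf{p}_{\text{parent}}^{\top}\, e^{Q_e t_e},$$

which (for an irreducible $Q_e$) approaches $\pi_e$ as the edge length $t_e$ grows, rather than staying fixed.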

Under Construction (should be finished later today, March 30, 2014)

Simulated data and the true tree

[Figure: Nt-comp-het-tree.png, the true tree used to simulate the data]
The program p4, written by Peter Foster, specializes in simulating and analyzing data in which nucleotide composition varies across the tree. I used p4 to simulate data on the tree shown in the figure. In the simulation, the black lineages had an AT-biased nucleotide composition very different from that of the red lineages, which were strongly GC-biased. My goal was to generate a data set that would be very susceptible to compositional attraction under ordinary substitution models, and in that I succeeded (as you will see). Taxa C and H share many similarities because each has independently acquired a large number of G and C bases since diverging from its AT-rich ancestors. An ordinary nucleotide model such as GTR will therefore be strongly tempted to place C and H together.
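Once sim500.nex has been downloaded, you can see the compositional signal directly by computing the GC fraction of each taxon. The helper below is a minimal sketch and is not part of p4 or nhPhyloBayes. The function name gc_content is hypothetical. It assumes the NEXUS matrix is non-interleaved, with each row holding a taxon name followed by its sequence; a semicolon ends the matrix.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of p4 or nhPhyloBayes): print the GC fraction
# of each taxon in a NEXUS file. ASSUMES a non-interleaved matrix in which each
# row is a taxon name followed by its sequence (possibly split by spaces).
gc_content() {
  awk '
    # The MATRIX keyword (any case) starts the data rows.
    tolower($1) == "matrix" { inmatrix = 1; next }
    inmatrix {
      done = ($0 ~ /;/)   # a semicolon ends the matrix
      sub(/;.*/, "")      # drop the semicolon (fields are re-split)
      if (NF >= 2) {
        seq = ""
        for (i = 2; i <= NF; i++) seq = seq toupper($i)
        gc = gsub(/[GC]/, "&", seq)       # count G and C; seq is unchanged
        acgt = gsub(/[ACGT]/, "&", seq)   # count unambiguous bases
        if (acgt > 0) printf "%s %.3f\n", $1, gc / acgt
      }
      if (done) inmatrix = 0
    }
  ' "$1"
}
```

For example, `gc_content sim500.nex` prints one line per taxon. If the file matches the layout assumed above, you would expect C and H to look GC-rich compared with their AT-biased relatives. Ambiguity codes and gaps are excluded from the denominator, so the fraction is computed over A, C, G, and T only.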

First, log in to the cluster and use qlogin to acquire a free slot. Then download the 500-site simulated data set to a directory named nhlab in your home directory on the cluster:

mkdir nhlab
cd nhlab
curl http://hydrodictyon.eeb.uconn.edu/people/plewis/courses/phylogenetics/labs/sim500.nex > sim500.nex
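A failed download can leave an empty file or an HTML error page in sim500.nex without any obvious warning. The sketch below is a quick sanity check that the file really starts with a NEXUS header. The function name check_nexus is hypothetical and is not part of the lab materials.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the lab materials): succeed only if the file
# exists, is non-empty, and starts with a "#NEXUS" header (any case).
check_nexus() {
  file="$1"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  # First line, minus any Windows carriage return, upper-cased for comparison.
  first=$(head -n 1 "$file" | tr -d '\r' | tr '[:lower:]' '[:upper:]')
  if [ "$first" != "#NEXUS" ]; then
    echo "not a NEXUS file: $file" >&2
    return 1
  fi
  echo "ok: $file"
}
```

Run `check_nexus sim500.nex` after the curl command. Adding curl's `-f` (`--fail`) option also makes curl exit with an error on HTTP failures instead of saving the server's error page.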

At the end of this lab, I will show you how I simulated this data set in p4, but please don't spend time on that now: the analysis will take quite a bit of lab time, so I want you to get it started right away.

Obtaining nhPhyloBayes

You will perform analyses using the program nhPhyloBayes, written by Samuel Blanquart. nhPhyloBayes is available as a tar archive:

nh_PhyloBayes_0.2.3.tar
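Once the archive has been downloaded into nhlab, it can be unpacked with standard tar. The following is a minimal sketch, assuming the file keeps the name nh_PhyloBayes_0.2.3.tar. The function name unpack_archive is hypothetical. Building nhPhyloBayes after unpacking depends on the archive's contents and is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the contents of a tar archive, then extract it
# into the current directory (or into a target directory given as $2).
unpack_archive() {
  archive="$1"
  dest="${2:-.}"
  [ -f "$archive" ] || { echo "no such archive: $archive" >&2; return 1; }
  mkdir -p "$dest"
  tar -tf "$archive"              # show what will be extracted
  tar -xf "$archive" -C "$dest"   # extract into the destination directory
}
```

For example, running `unpack_archive nh_PhyloBayes_0.2.3.tar` inside nhlab lists the files and unpacks them there. Plain `tar -xvf nh_PhyloBayes_0.2.3.tar` does the same in one step.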

Ordinary models yield the compositional attraction tree

Non-homogeneous models yield the true tree

