Phylogenetics: Compositional Heterogeneity Lab
EEB 5349: Phylogenetics
The goal of this lab is to introduce you to the influence of compositional heterogeneity on phylogenetic inference.
Compositional heterogeneity means that the equilibrium nucleotide frequencies (or amino acid frequencies, for protein data) change across the tree. Standard nucleotide and amino acid models do not account for this because they assume stationarity: the substitution process is the same on every edge, and one set of equilibrium frequencies applies to every point along any edge of the tree. Non-stationarity can lead to compositional attraction artifacts, in which tips with similar nucleotide composition group together even though they are not closely related.
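One way to state the stationarity assumption formally is sketched below. The notation is introduced here and is not used elsewhere in this lab: Q is an instantaneous rate matrix, P(t) = e^{Qt} its transition probability matrix, π its equilibrium frequencies, and t_e the length of edge e. Under a stationary model the composition stays at π everywhere in the tree; under a non-stationary model each edge can carry its own Q_e (and π_e), so composition drifts from parent to child. For t_e > 0 and an irreducible Q_e, the child composition equals the parent composition only if the parent is already at π_e.

```latex
% Stationary, homogeneous model: one rate matrix Q on every edge
P(t) = e^{Qt}, \qquad \pi^{\top} Q = \mathbf{0}^{\top}
\;\Longrightarrow\;
\pi^{\top} P(t) = \pi^{\top} \sum_{k \ge 0} \frac{(Qt)^k}{k!} = \pi^{\top}
\quad \text{for all } t \ge 0

% Non-stationary model: edge e has its own rate matrix Q_e with equilibrium \pi_e
p_{\text{child}}^{\top} = p_{\text{parent}}^{\top} \, e^{Q_e t_e},
\qquad
p_{\text{child}} = p_{\text{parent}} \iff p_{\text{parent}} = \pi_e
```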
Under Construction (should be finished later today, March 30, 2014)
Simulated data and the true tree
p4, written by Peter Foster, specializes in simulating and analyzing data in which nucleotide composition varies across the tree. I used p4 to simulate data on the tree shown on the right. The black lineages have an AT-biased nucleotide composition very different from that of the red lineages, which are strongly GC-biased. My goal was to generate a data set that would be highly susceptible to compositional attraction under ordinary substitution models, and in that I succeeded (as you will see). Taxa C and H share many similarities because each independently acquired a large number of G and C bases after diverging from its AT-rich ancestors, so an ordinary nucleotide model such as GTR will be strongly tempted to place C and H together.
First, log in to the cluster and use qlogin to acquire a free slot. Then download the 500-site simulated data set to a directory named nhlab in your home directory on the cluster:
mkdir nhlab
cd nhlab
curl http://hydrodictyon.eeb.uconn.edu/people/plewis/courses/phylogenetics/labs/sim500.nex > sim500.nex
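If you want a quick, optional look at the compositional differences before running any analyses, the shell function below prints the A, C, G, T and GC fractions of each taxon, plus a simple chi-square statistic for compositional homogeneity across taxa. It is a minimal sketch: base_composition is a hypothetical helper written for illustration (it is not part of p4 or nhPhyloBayes), and it assumes each line of the NEXUS matrix begins with a taxon name followed by sequence characters (sequential or interleaved layout), counting only A, C, G and T. Given how the data were simulated, you would expect C and H to show much higher GC fractions than the other taxa.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of p4 or nhPhyloBayes): summarize the base
# composition of each taxon in a NEXUS DNA matrix and compute a simple
# chi-square statistic for compositional homogeneity across taxa.
# Assumes every matrix line starts with a taxon name followed by sequence
# characters (sequential or interleaved); only A, C, G, T are counted.
base_composition() {
  awk '
    BEGIN { split("A C G T", base, " ") }

    # The matrix starts after a line whose first word is "matrix".
    !in_matrix && tolower($1) == "matrix" { in_matrix = 1; next }

    in_matrix {
      done = ($0 ~ /;/)          # a semicolon terminates the matrix
      gsub(/;/, "")              # drop it; awk re-splits the fields
      if (NF >= 2) {
        name = $1
        if (!(name in seen)) { seen[name] = 1; order[++ntax] = name }
        for (i = 2; i <= NF; i++) {
          s = toupper($i)
          for (j = 1; j <= length(s); j++) {
            c = substr(s, j, 1)
            if (c ~ /[ACGT]/) { count[name, c]++; total[name]++; col[c]++; grand++ }
          }
        }
      }
      if (done) in_matrix = 0
    }

    END {
      printf "%-10s %6s %6s %6s %6s %7s\n", "taxon", "A", "C", "G", "T", "GC"
      chi2 = 0; used = 0
      for (k = 1; k <= ntax; k++) {
        name = order[k]
        if (total[name] == 0) continue        # skip lines with no bases
        used++
        printf "%-10s", name
        for (b = 1; b <= 4; b++) {
          o = count[name, base[b]]
          printf " %6.3f", o / total[name]
          e = total[name] * col[base[b]] / grand   # expected count
          if (e > 0) chi2 += (o - e) ^ 2 / e
        }
        printf " %7.3f\n", (count[name, "C"] + count[name, "G"]) / total[name]
      }
      nb = 0
      for (b = 1; b <= 4; b++) if (col[base[b]] > 0) nb++
      printf "chi-square = %.3f, df = %d\n", chi2, (used - 1) * (nb - 1)
    }
  ' "$1"
}

# Run on the lab data set if it is in the current directory.
if [ -f sim500.nex ]; then
  base_composition sim500.nex
fi
```

The chi-square statistic treats the taxa as independent samples and ignores their shared evolutionary history, so treat it as a rough screen for heterogeneity rather than a formal test.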
At the end of this lab, I will show you how I simulated this data set in p4, but I don't want you to spend time on that now: the analyses will take quite a bit of lab time, so I want you to get started on them right away.
You will perform analyses using the program nhPhyloBayes, written by Samuel Blanquart. nhPhyloBayes is available as a tar archive,
Ordinary models yield the compositional attraction tree
Non-homogeneous models yield the true tree