Phylogenetics: Distances Lab

From EEBedia

Revision as of 19:45, 30 January 2007

This article is still under construction.
Expect it to change frequently until this notice is removed.
This article is part of an EEB course.
Please do not edit the content of this page without the approval of the course instructor.
EEB 349: Phylogenetics
The goal of this lab exercise is to show you how to conduct various distance-based analyses in PAUP* and SplitsTree.

Part A: Using PAUP* to check your answers for homework #3

Part B: Analysis of algae.nex

Create the data file algae.nex

Click here to get the data, then use Ctrl-a to copy all of it from your browser window and save it to a file named algae.nex in a folder on your local hard drive. These data were originally used in a 1994 study by Lockhart et al.[1] and comprise eight 16S ribosomal RNA sequences:

Anacystis: a cyanobacterium (has chlorophyll a but not b or c)
Olithodiscus: a chloroplast from a chromophyte alga (chlorophylls a and c)
Euglena: a chloroplast from a photosynthetic euglenophyte protist
Chlorella: a chloroplast from a chlorophyte green alga
Chlamydomonas: a chloroplast from a chlorophyte green alga
Marchantia: a chloroplast from a thallose liverwort (non-vascular bryophyte land plant)
Oryza: a chloroplast from a monocot (the flowering plant rice)
Nicotiana: a chloroplast from a dicot (the flowering plant tobacco)

All of these organisms except Anacystis and Olithodiscus have chlorophylls a and b. It is probable (based on independent evidence) that all chlorophyll a/b-containing chloroplasts have a common endosymbiotic origin, so we would expect trees constructed from these data to show a branch separating Anacystis and Olithodiscus from everything else. The cyanobacterium Anacystis uses phycobilins rather than chlorophyll b or c as accessory pigments for photosynthesis, and the chromophyte alga Olithodiscus has chlorophylls a and c (but not b).

Create a PAUP* command file

Instead of executing the file algae.nex directly in PAUP*, start PAUP* (cancel the dialog box that appears) and choose File > New from the main menu to create a blank text file. Copy the following text and paste it into the new text file window, then save this file as runalgae.nex, placing it in the same directory as algae.nex:
#nexus

begin paup;
  log file=lab3.txt start replace;
  set torder=right autoclose;
  execute algae.nex;
  outgroup Anacystis_nidulans;
  set criterion=distance;
  
  [!
  ******************************
  ** JC69 Model, ME Criterion **
  ******************************
  ]
  dset distance=jc objective=me;
  alltrees;
  describe 1;
  
  [other commands here]
  log stop;
end;

During the course of this lab, you can add commands to this paup block rather than creating a paup block in the data file itself. The set torder=right command causes trees to be displayed so that the outgroup is at the top and the tree appears to flow to the right (this is often called ladderizing right). Try changing this to set torder=left to see what ladderizing left looks like. The set autoclose command saves you a step: PAUP* closes the progress dialog box automatically after a search is done rather than waiting for you to press the "Close" button.

Perform an exhaustive search under the minimum evolution and least squares criteria

Use both the JC and logdet models combined with both Minimum Evolution and Least Squares (using the default power of 2) optimality criteria. Use dset dist=? to find out how to specify the model, and use dset objective=? to find out how to specify which of the two distance-based optimality criteria to use. Set up all 4 analyses (2 models times 2 optimality criteria) in the paup block (the first one has already been done for you), then run them all at once by executing the file. Note the printable comment (it starts with an exclamation point). Making comments like this in your paup block will allow you to easily find where the results from each analysis start in the output.
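To make the model and criterion choices above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not PAUP*'s implementation, and all function names are hypothetical.

- `jc69_distance` applies the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3) to the uncorrected proportion p of differing sites. Gaps and ambiguity codes receive no special handling.
- `ls_objective` scores how well a tree's path-length distances match the observed distances. It weights each pair by 1/d^power. We assume this is the form PAUP*'s power setting refers to (power=2 is usually described as Fitch-Margoliash weighting), so check the PAUP* documentation for the exact definition.

Under minimum evolution, a tree is instead scored by the sum of its branch lengths, with those lengths themselves fitted by least squares. That fitting step is not shown, and LogDet is not sketched here either.

```python
import math
from typing import Dict, Tuple

Pair = Tuple[str, str]  # a pair of taxon names


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ.

    Assumption: no special treatment of gaps or ambiguity codes.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return diffs / len(seq1)


def jc69_distance(seq1: str, seq2: str) -> float:
    """Jukes-Cantor (JC69) corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        return math.inf  # saturated: the JC correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ls_objective(observed: Dict[Pair, float],
                 tree: Dict[Pair, float],
                 power: float = 2.0) -> float:
    """Weighted least-squares score: sum over pairs of w * (d - t)^2.

    d is the observed distance, t the path length on the tree, and the
    weight is assumed to be w = 1 / d**power (power=0 gives unweighted LS).
    Assumes every observed distance is > 0 when power > 0.
    """
    total = 0.0
    for pair, d in observed.items():
        t = tree[pair]
        w = 1.0 / d ** power
        total += w * (d - t) ** 2
    return total
```

Lower `ls_objective` values mean a closer fit. An exhaustive search (alltrees) simply evaluates this kind of score for every possible tree topology and keeps the best one.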

  • Which of the 4 analyses produced an estimated tree placing all the chlorophyll a/b organisms together in one clade?

The main point here is that the model used can make a difference in whether you get the correct answer. We will explore this dataset further using maximum likelihood in a later lab.

Perform a bootstrap analysis

Bootstrapping (introduced to phylogenetics by Joe Felsenstein in 1985[2]) is one of the most common methods used to assess which parts (i.e. edges) of an estimated tree are well supported by the data and which are poorly supported. In bootstrapping, many new datasets are created by sampling characters with replacement from the original dataset, so that each new dataset has as many characters as the original. A tree is estimated from each of these bootstrap replicate datasets. At the end, a majority-rule consensus tree is constructed containing every split that appears in more than half of the bootstrap trees. The idea is that your original dataset can be thought of as one sample of characters from a vast pool of characters. Resampling from it is the closest you can come to collecting data from other genes similar to the one for which you did collect data. The majority-rule consensus tree omits splits present in fewer than half of the bootstrap trees, but the bipartition table produced by PAUP* also reports the relative frequency of these less common splits.
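The resampling and counting steps described above can be sketched in Python as follows. This is a simplified illustration, not what PAUP* runs internally, and the function names are hypothetical.

- `bootstrap_replicate` draws alignment columns with replacement, keeping each column intact across all taxa.
- `canonical_split` gives each bipartition a stable name: the side that does not contain the outgroup.
- `split_frequencies` reports the fraction of replicate trees containing each split.

The tree-estimation step for each replicate is deliberately left out. We assume the splits of each replicate tree have already been obtained elsewhere, for example from PAUP*. Seeding Python's `random.Random` mimics the role of bseed (a reproducible run), but it will not reproduce PAUP*'s replicates.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Sequence, Set

Split = FrozenSet[str]  # one side of a bipartition, as a set of taxon names


def bootstrap_replicate(alignment: Sequence[str], rng: random.Random) -> List[str]:
    """Build one pseudo-replicate by sampling columns with replacement.

    Every sequence is resampled at the same column indices, so each
    character (column) stays intact and the number of sites is unchanged.
    """
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def canonical_split(members: Iterable[str], all_taxa: FrozenSet[str],
                    outgroup: str) -> Split:
    """Name a bipartition by the side that does not contain the outgroup,
    so the same split always gets the same key."""
    side = frozenset(members)
    return all_taxa - side if outgroup in side else side


def split_frequencies(replicate_splits: List[Set[Split]]) -> Dict[Split, float]:
    """Fraction of replicate trees in which each split appears."""
    counts: Counter = Counter()
    for splits in replicate_splits:
        counts.update(set(splits))  # a split counts at most once per tree
    n = len(replicate_splits)
    return {split: c / n for split, c in counts.items()}
```

With 500 replicates, `split_frequencies` would be called on 500 sets of splits. The resulting fractions correspond to the percentages listed in a bipartition table.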

Perform a bootstrap analysis (500 replicates, heuristic search using least squares) under the F84 model. The commands for doing this are shown below. I suggest you copy these into a new paup block in your runalgae.nex file (be sure to disable your previous paup block by renaming it):
dset distance=f84 objective=lsfit power=2;
bootstrap nrep=500 search=heuristic bseed=5555;
  • Examine the bipartition table PAUP* produces and locate the bootstrap proportion for the bipartition separating the chlorophyll a/b organisms from Anacystis and Olithodiscus.
  • What is the bootstrap proportion for the bipartition separating Olithodiscus and Euglena from all the other taxa?

Sometimes a split will just barely fail to make the cut (e.g. it appeared in 49% of the bootstrap trees), so it is a good idea to check the bipartition table so that such near misses are not overlooked.
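The check described above can be sketched in Python. We assume, hypothetically, that the bipartition table has been transcribed into a dictionary mapping each split to its bootstrap frequency. The sketch separates the splits a majority-rule consensus keeps (present in more than half of the trees) from near misses just below the cutoff. The function name and the 0.40 lower bound are arbitrary choices for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

Split = FrozenSet[str]  # one side of a bipartition, as a set of taxon names


def majority_and_near_misses(
    freqs: Dict[Split, float], lower: float = 0.40
) -> Tuple[List[Split], List[Split]]:
    """Return (kept, near), both sorted by decreasing frequency.

    kept: splits in more than half of the bootstrap trees (majority rule).
    near: splits with frequency in [lower, 0.5], which the consensus omits.
    """
    by_freq = sorted(freqs, key=lambda s: freqs[s], reverse=True)
    kept = [s for s in by_freq if freqs[s] > 0.5]
    near = [s for s in by_freq if lower <= freqs[s] <= 0.5]
    return kept, near
```

A split at exactly 50% lands in the near-miss list, because majority rule requires more than half of the trees.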


References cited

  1. Lockhart, P. J., M. A. Steel, M. D. Hendy, and D. Penny. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Molecular Biology and Evolution 11:605-612.
  2. Felsenstein, J. 1985. Confidence intervals on phylogenies: an approach using the bootstrap. Evolution 39:783-791.