Phylogenetics: Likelihood Lab

From EEBedia
Revision as of 02:49, 7 February 2007 by PaulLewis (Talk | contribs)

EEB 349: Phylogenetics
The goal of this lab exercise is to show you how to conduct maximum likelihood analyses in PAUP* using several models.

Part A: Using PAUP* to check your answers for homework #4

Create a data file

Create a new file in PAUP* and enter the following text:


#nexus

begin paup;
  set storebrlens;
end;

begin data;
  dimensions ntax=4 nchar=2;
  format datatype=dna;
  matrix
    taxon1 AA
    taxon2 AC
    taxon3 CG
    taxon4 TT
  ;
end;

begin trees;
  utree hw4 = (taxon1:0.1, taxon2:0.1, (taxon3:0.1, taxon4:0.1):0.1);
end;

begin paup;
  lset nst=1 basefreq=equal;
  lscores 1 / userbrlen sitelike;
end;

Understanding the data file

The NEXUS file you just created has four blocks.

First paup block

The first block is a paup block that sets the storebrlens flag, which tells PAUP* to keep any branch lengths it finds in tree descriptions. By default, PAUP* discards branch lengths stored with a tree and estimates them itself. Here we want PAUP* to compute the likelihood of a tree in which all five branch lengths are fixed at 0.1, so PAUP* must not discard them.

Data block

The second block is the data block. Data for two sites are provided, the first site being the one you used for homework #4. The second site is necessary because PAUP* will refuse to calculate the likelihood of a tree with data from only one site. We will simply ignore results for the second (dummy) site.

Trees block

The third block is a trees block that defines the tree and branch lengths.

  • Can you find where in the tree description the length of the central branch is defined?

The keyword utree can be used in PAUP* (but not necessarily other programs) to explicitly define an unrooted tree. The hw4 part is just an arbitrary name for this tree: you could use any name here.
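If you want to confirm that the tree description really carries five branch lengths, a short script can pull them out. The following is a minimal Python sketch, not part of PAUP*; it assumes the simple description used in this lab, where every branch length is a plain decimal number following a colon, and the variable names are only illustrative.

```python
import re

# Tree description copied from the trees block (without the "utree hw4 =" prefix).
newick = "(taxon1:0.1, taxon2:0.1, (taxon3:0.1, taxon4:0.1):0.1)"

# Each branch length follows a colon; this pattern assumes plain decimals only.
branch_lengths = [float(x) for x in re.findall(r":([0-9.]+)", newick)]

print(len(branch_lengths), "branch lengths found:", branch_lengths)
```

The script only lists the lengths in the order they appear; working out which one belongs to the central branch is still left to you.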

Final paup block

The fourth (paup) block comprises an lset command that specifies the likelihood settings. The nst option specifies the number of substitution parameters, which is 1 for the JC model, and basefreq=equal specifies that base frequencies are assumed to be equal. Together, nst=1 and basefreq=equal specify the JC model because the only other model with one substitution parameter is the F81 model (which has unequal base frequencies).
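To make the JC model concrete, here is a minimal Python sketch of its transition probabilities for a branch of length v (expected substitutions per site): 1/4 + (3/4)exp(-4v/3) for no net change and 1/4 - (1/4)exp(-4v/3) for a change to one particular other base. The function name jc_prob is hypothetical and is not something PAUP* provides.

```python
import math

def jc_prob(i, j, v):
    """Probability of ending in base j given starting base i, after a branch
    of length v (expected substitutions per site), under the JC model."""
    e = math.exp(-4.0 * v / 3.0)
    if i == j:
        return 0.25 + 0.75 * e   # same base at both ends of the branch
    return 0.25 - 0.25 * e       # one specific different base

# Branch length used throughout this lab.
p_same = jc_prob("A", "A", 0.1)
p_diff = jc_prob("A", "C", 0.1)
print("P(same) =", p_same)
print("P(diff) =", p_diff)
```

Because there are three possible different bases, p_same + 3 * p_diff should equal 1, which is a quick sanity check on any hand calculation.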

The command lscores 1 tells PAUP* to compute likelihood scores for the first tree in memory (which is the one we entered in this file). The keyword userbrlen tells PAUP* to use the branch lengths in the tree description (i.e. don't estimate branch lengths), and the sitelike keyword tells PAUP* to output the individual site likelihoods (the default behavior is to just output the overall likelihood).
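For an independent check on both your hand calculation and the PAUP* output, the site likelihoods can also be computed by brute force. The sketch below is an illustration in Python, not PAUP*'s own algorithm: it sums over all 16 combinations of states at the two internal nodes of the hw4 tree, assuming the JC model and a length of 0.1 on every branch. The names jc_prob and site_likelihood are hypothetical.

```python
import math
from itertools import product

STATES = "ACGT"

def jc_prob(i, j, v):
    """JC probability of base j at the end of a branch of length v, given base i."""
    e = math.exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def site_likelihood(tips, v=0.1):
    """Likelihood of one site on the tree (taxon1, taxon2, (taxon3, taxon4)).

    tips holds the bases of taxon1..taxon4 in order. Internal node x joins
    taxon1 and taxon2; internal node y joins taxon3 and taxon4; the central
    branch connects x and y. All branches are assumed to have length v.
    """
    t1, t2, t3, t4 = tips
    total = 0.0
    for x, y in product(STATES, repeat=2):
        total += (0.25                      # equal base frequency at node x
                  * jc_prob(x, t1, v) * jc_prob(x, t2, v)
                  * jc_prob(x, y, v)        # central branch
                  * jc_prob(y, t3, v) * jc_prob(y, t4, v))
    return total

# Columns of the data matrix: site 1 is A, A, C, T and site 2 is A, C, G, T.
columns = ["AACT", "ACGT"]
site_likes = [site_likelihood(c) for c in columns]
for n, like in enumerate(site_likes, start=1):
    print(f"site {n}: likelihood = {like:.8f}, -ln L = {-math.log(like):.5f}")
print("overall -ln L =", -sum(math.log(like) for like in site_likes))
```

PAUP* reports likelihood scores as negative log-likelihoods, so compare the -ln L column with its output rather than the raw likelihoods. Only the value for site 1 is relevant to homework #4.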

Ok, go ahead and execute the file in PAUP* and see if your hand calculation in homework #4 was correct.

Part B:
