Difference between revisions of "Phylogenetics: Likelihood Lab"

From EEBedia

Revision as of 02:46, 7 February 2007

EEB 349: Phylogenetics
The goal of this lab exercise is to show you how to conduct maximum likelihood analyses in PAUP* using several models.

Part A: Using PAUP* to check your answers for homework #4

Create a data file

Create a new file in PAUP* and enter the following text:

#nexus

begin paup;
  set storebrlens;
end; 

begin data;
  dimensions ntax=4 nchar=2;
  format datatype=dna;
  matrix
    taxon1 AA
    taxon2 AC
    taxon3 CG
    taxon4 TT
    ;
end;

begin trees;
  utree hw4 = (taxon1:0.1, taxon2:0.1, (taxon3:0.1, taxon4:0.1):0.1);
end;

begin paup;
  lset nst=1 basefreq=equal;
  lscores 1 / userbrlen sitelike;
end;

Understanding the data file

The NEXUS file you just created has four blocks. The first block is a paup block that sets the storebrlens flag, which tells PAUP* to save any branch lengths it finds in tree descriptions. By default, PAUP* discards the branch lengths it reads and estimates them itself. Here we want PAUP* to compute likelihoods for a tree in which all five branch lengths are fixed at 0.1, so the supplied branch lengths must be kept.

The second block is the data block. Data for two sites are provided, the first site being the one you used for homework #4. The second site is necessary because PAUP* will refuse to calculate the likelihood of a tree with data from only one site. We will simply ignore results for the second (dummy) site.

The third block is a trees block that defines the tree and branch lengths.

  • Can you find where in the tree description the length of the central branch is defined?

The keyword utree can be used in PAUP* (but not necessarily other programs) to explicitly define an unrooted tree. The hw4 part is just an arbitrary name for this tree: you could use any name here.
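As a quick sanity check on the tree description, the branch lengths can be pulled out of the parenthetical string with a few lines of Python. This is only an illustrative sketch, not part of the PAUP* workflow: the regular expression is a simplifying assumption that handles plain decimal lengths like the ones used here, not every form a tree description can take.

```python
import re

# Tree description copied from the trees block (without the "utree hw4 =" prefix).
newick = "(taxon1:0.1, taxon2:0.1, (taxon3:0.1, taxon4:0.1):0.1);"

# Every branch length follows a colon; collect them in the order they appear.
lengths = [float(x) for x in re.findall(r":\s*([0-9]*\.?[0-9]+)", newick)]

ntax = 4
print("branch lengths found:", lengths)
# An unrooted, fully resolved tree with n tips has 2n - 3 branches.
print("expected number of branches:", 2 * ntax - 3)
```

The count should match the five branch lengths mentioned above; one of them belongs to the central branch rather than to a tip.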

The fourth (paup) block comprises an lset command that specifies the likelihood settings. The nst option specifies the number of substitution parameters, which is 1 for the JC model, and basefreq=equal specifies that base frequencies are assumed to be equal. Together, nst=1 and basefreq=equal specify the JC model because the only other model with one substitution parameter is the F81 model (which has unequal base frequencies).
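To see why equal base frequencies turn a one-parameter model into JC, the following Python sketch builds an F81 rate matrix and evaluates it with equal and with unequal frequencies. It assumes the common convention of scaling the matrix so that the mean substitution rate is 1; the function name and the unequal frequencies (0.1, 0.2, 0.3, 0.4) are made up for illustration and say nothing about how PAUP* stores the model internally.

```python
def f81_rate_matrix(freqs):
    """Normalized F81 rate matrix: Q[i][j] = beta * freqs[j] for i != j.

    beta is chosen so that the mean rate, -sum(freqs[i] * Q[i][i]), equals 1.
    """
    beta = 1.0 / (1.0 - sum(f * f for f in freqs))
    n = len(freqs)
    q = [[beta * freqs[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        # Diagonal entries make each row sum to zero.
        q[i][i] = -sum(q[i])
    return q

# Equal frequencies: every off-diagonal rate is identical, which is the JC model.
jc = f81_rate_matrix([0.25, 0.25, 0.25, 0.25])
# Hypothetical unequal frequencies: rates now depend on the target base (F81).
f81 = f81_rate_matrix([0.1, 0.2, 0.3, 0.4])

for name, q in (("equal frequencies (JC)", jc), ("unequal frequencies (F81)", f81)):
    print(name)
    for row in q:
        print("  " + "  ".join(f"{v:8.4f}" for v in row))
```

With equal frequencies all twelve off-diagonal entries coincide, so the single substitution parameter is the only thing left in the model.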

The command lscores 1 tells PAUP* to compute likelihood scores for the first tree in memory (which is the one we entered in this file). The keyword userbrlen tells PAUP* to use the branch lengths in the tree description (i.e. don't estimate branch lengths), and the sitelike keyword tells PAUP* to output the individual site likelihoods (the default behavior is to just output the overall likelihood).

OK, go ahead and execute the file in PAUP* and check whether your hand calculation in homework #4 was correct.
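If you would like a third, independent check, the same site likelihoods can be computed by brute force in Python. The sketch below is not how PAUP* does it; it simply sums over all 16 combinations of states at the two internal nodes. It assumes the standard JC transition probabilities with branch lengths measured in expected substitutions per site, and the function names are made up for this example. PAUP* usually reports likelihoods as negative natural logs, so both forms are printed.

```python
import math
from itertools import product

BASES = "ACGT"

def jc_prob(i, j, t):
    """JC probability of observing base j at the end of a branch of length t that starts in base i."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def site_likelihood(tips, t=0.1):
    """Likelihood of one site on the tree (1, 2, (3, 4)) with all five branches of length t."""
    s1, s2, s3, s4 = tips
    total = 0.0
    # x is the internal node joining taxon1 and taxon2; y joins taxon3 and taxon4.
    # The model is reversible, so x can serve as the root with prior 1/4 per base.
    for x, y in product(BASES, repeat=2):
        total += (0.25 * jc_prob(x, s1, t) * jc_prob(x, s2, t)
                  * jc_prob(x, y, t)
                  * jc_prob(y, s3, t) * jc_prob(y, s4, t))
    return total

# Data matrix from the data block; each column is one site.
matrix = {"taxon1": "AA", "taxon2": "AC", "taxon3": "CG", "taxon4": "TT"}
site_patterns = ["".join(column) for column in zip(*matrix.values())]

for number, pattern in enumerate(site_patterns, start=1):
    like = site_likelihood(pattern)
    print(f"site {number} ({pattern}): L = {like:.6g}, -ln L = {-math.log(like):.5f}")
```

Only the value for site 1 matters for homework #4; site 2 is the dummy site added to keep PAUP* happy.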

Part B:
