Phylogenetics: Likelihood Lab
EEB 349: Phylogenetics
The goal of this lab exercise is to show you how to conduct maximum likelihood analyses in PAUP* using several models.
Part A: Using PAUP* to check your answers for homework #4
Create a data file
Create a new file in PAUP* and enter the following text:
    #nexus

    begin paup;
      set storebrlens;
    end;

    begin data;
      dimensions ntax=4 nchar=2;
      format datatype=dna;
      matrix
        taxon1 AA
        taxon2 AC
        taxon3 CG
        taxon4 TT
      ;
    end;

    begin trees;
      utree hw4 = (taxon1:0.1, taxon2:0.1, (taxon3:0.1, taxon4:0.1):0.1);
    end;

    begin paup;
      lset nst=1 basefreq=equal;
      lscores 1 / userbrlen sitelike;
    end;
Understanding the data file
The NEXUS file you just created has four blocks.
First paup block
The first block is a paup block that sets the storebrlens flag. This tells PAUP* to save the branch lengths found in any trees it reads. By default, PAUP* discards any branch lengths it finds in a tree description and estimates them itself. In this case, we want PAUP* to compute likelihoods for a tree in which all five branch lengths are set to the specific value 0.1, so it is important that PAUP* not discard them.
The second block is the data block. Data for two sites are provided, the first site being the one you used for homework #4. The second site is necessary because PAUP* will refuse to calculate the likelihood of a tree with data from only one site. We will simply ignore results for the second (dummy) site.
The third block is a trees block that defines the tree and branch lengths.
- Can you find where in the tree description the length of the central branch is defined?
The keyword utree can be used in PAUP* (but not necessarily other programs) to explicitly define an unrooted tree. The hw4 part is just an arbitrary name for this tree: you could use any name here.
Final paup block
The fourth block is another paup block. Its lset command specifies the likelihood settings. The nst option specifies the number of substitution rate parameters, which is 1 for the JC model, and basefreq=equal specifies that base frequencies are assumed to be equal. Both settings are needed to pin down the JC model: the F81 model also has a single substitution rate parameter, but it allows unequal base frequencies.
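To make the JC settings concrete, here is a minimal Python sketch of the JC transition probabilities that these settings imply. It assumes branch lengths are measured in expected substitutions per site; the function name jc_transition_prob is hypothetical and is not part of PAUP*.

```python
import math

def jc_transition_prob(from_base: str, to_base: str, v: float) -> float:
    """JC probability of ending at to_base after a branch of length v,
    given from_base at the start (v in expected substitutions per site)."""
    decay = math.exp(-4.0 * v / 3.0)
    if from_base == to_base:
        return 0.25 + 0.75 * decay   # no net change along the branch
    return 0.25 - 0.25 * decay       # change to one specific other base

# Probabilities along a single branch of length 0.1, as in the hw4 tree.
p_same = jc_transition_prob("A", "A", 0.1)
p_diff = jc_transition_prob("A", "C", 0.1)
print(p_same, p_diff)
```

Because JC has equal base frequencies and a single rate, only two distinct probabilities arise for any branch: one for "same base" and one for "different base".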
The command lscores 1 tells PAUP* to compute likelihood scores for the first tree in memory (which is the one we entered in this file). The keyword userbrlen tells PAUP* to use the branch lengths in the tree description (i.e. not to estimate them), and the sitelike keyword tells PAUP* to output the individual site likelihoods (the default behavior is to output only the overall likelihood).
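If you would like an independent check on the hand calculation, the following Python sketch computes the site likelihoods for this four-taxon tree by brute force, summing over all 16 combinations of states at the two internal nodes. It is an illustration under stated assumptions (JC model, every branch length 0.1, branch lengths in expected substitutions per site), not a description of how PAUP* performs the computation internally. The names jc_prob and site_likelihood are hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"

def jc_prob(a: str, b: str, v: float) -> float:
    """JC transition probability from base a to base b over branch length v."""
    decay = math.exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay

def site_likelihood(t1: str, t2: str, t3: str, t4: str, v: float = 0.1) -> float:
    """Likelihood of one site on the tree (t1, t2, (t3, t4)) with all
    five branch lengths equal to v.

    x is the state at the internal node joining taxon1 and taxon2;
    y is the state at the internal node joining taxon3 and taxon4.
    """
    total = 0.0
    for x, y in product(BASES, repeat=2):
        total += (0.25                    # equal base frequency for state x
                  * jc_prob(x, t1, v)     # branch to taxon1
                  * jc_prob(x, t2, v)     # branch to taxon2
                  * jc_prob(x, y, v)      # central branch
                  * jc_prob(y, t3, v)     # branch to taxon3
                  * jc_prob(y, t4, v))    # branch to taxon4
    return total

# Columns of the data matrix, read top to bottom (taxon1..taxon4).
columns = ["AACT", "ACGT"]
site_likes = [site_likelihood(*col) for col in columns]
ln_l = sum(math.log(s) for s in site_likes)

for i, s in enumerate(site_likes, start=1):
    print(f"site {i}: likelihood = {s:.8g}, -ln L = {-math.log(s):.6f}")
print(f"overall -ln L = {-ln_l:.6f}")
```

The overall log-likelihood is the sum of the per-site log-likelihoods, so the value for site 1 alone is the one to compare with homework #4. The second (dummy) site can be ignored, just as in the PAUP* output.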
Ok, go ahead and execute the file in PAUP* and see if your hand calculation in homework #4 was correct.