Lab 1: Introduction to PAUP* and the NEXUS data file format
Introduction to PAUP* 4.0
PAUP* 4.0 is the successor to PAUP 3.1, which was published in 1993 by David L. Swofford,
currently at the
School of Computational Science & Information Technology,
Florida State University.
The name PAUP stands for Phylogenetic Analysis Using Parsimony,
because parsimony was the only optimality criterion employed at the time. The
asterisk in the name PAUP* stands for "and other methods." PAUP* is one
of the most comprehensive phylogenetic analysis computer programs available, and we
will spend much of the first half of the semester learning how to use this program.
PAUP* Home Page
The
PAUP* Home Page is the
best place to go for up-to-date information about program availability, known
problems/workarounds, and help in the form of a FAQ and electronic forum. As of
this writing, PAUP* is being sold by
Sinauer Associates
(price varies according to platform). While it is not a free program, you really
do get a lot for your money compared to most other commercial software, as the
next section is designed to illustrate.
What can PAUP* do?
PAUP* is capable of performing most of the types of phylogenetic analyses you have already performed using other programs
(e.g., Puzzle), as well as many more. The following listing is not exhaustive, but
is designed to give you an idea of what PAUP* can currently do:
- Algorithmic searching: Exhaustive, Branch-and-bound, Stepwise Addition, Neighbor-joining, Puzzling, UPGMA, Star Decomposition
- Heuristic searching: Nearest-neighbor Interchange (NNI), Subtree Pruning/Regrafting (SPR), Tree Bisection/Reconnection (TBR)
- Optimality criteria: parsimony, likelihood, minimum-evolution, least-squares
- Parsimony variants: Camin-Sokal, Wagner, Fitch, transversion, generalized (=weighted)
- Substitution models: JC69, F81, K80, F84, HKY85, GTR, logdet/paralinear
- Descriptive statistics: base frequencies, pairwise sequence comparisons
- Manipulating data scope: include/exclude characters, delete/restore taxa, partitions (characters and taxa)
- Statistical tests: KH test,
homogeneity partition test, permutation tests, base frequency
homogeneity test, likelihood ratio test of molecular clock
- Nodal support measures: jackknife, bootstrap
- Consensus methods: strict/semistrict/majority-rule/Adams consensus trees, agreement subtrees
- Trees: generation of random trees, tree-to-tree distances
- Other: Lake's invariants, plots of gamma distribution, likelihood surface check, ancestral state reconstruction, printing of trees
What can PAUP* not do?
Despite its completeness, there are a few things that PAUP* cannot do for you at the present time:
- PAUP* does not allow tree editing (like
MacClade or
TreeView)
- PAUP* is not able to do maximum likelihood analyses on amino acid sequences
- PAUP* does not provide codon models that allow you to take into account the codon structure of
protein coding genes when analyzing nucleotide sequences (use PAML or HyPhy for this)
- PAUP* does not perform Bayesian analyses (we will use MrBayes later on for this)
- PAUP* (like almost all other phylogenetic analysis programs) assumes your sequences are already aligned;
it will not align them for you, nor will it help you find sequences in GenBank or other databases.
Typographical conventions
In this and subsequent web pages, I will try to stick to the following typographical conventions:
- New terms will look like this
- Text that I want to emphasize will look like this
- Command names or portions of commands that you might type into a program such
as PAUP* will look like this
- Keywords used in Nexus files will look like this
- Names of files will look like this
PAUP* tips
PAUP* is not finished at this point. For the most part, this is not a problem since you can
purchase and use it just like a finished product. The primary drawback of PAUP*'s unfinished status
is that there is currently not a complete manual for the program.
On the
PAUP* Download Page you can find a
PDF command summary and "Quick Start" tutorial;
however, much of the explanatory portion of the manual is not present in any form. There are easy
ways to obtain information from the program
itself, however. Some of the tips listed below are concerned with getting the program to tell you
what commands and command options are available.
Here are some tips to keep in mind while you use PAUP*. This list is not comprehensive; these are
just some things that are not immediately apparent but which make your life easier once you know
about them.
- A command line can be made visible on the Mac version, and is always apparent on all
other versions of PAUP*. This may not sound like a tip, but having a command line allows you to
explore many of the other tips described below.
- The help command provides a list of available commands. Often you can spot the command
you need by looking at this list. Once you see a command name that looks promising, you can get a description of how to
invoke the command like this:
help hsearch;
- The ? option works for all commands and provides a list of the options for that command
as well as the current default settings for those options. This is extremely useful! For example, this command would list
all the current likelihood settings:
lset ?;
- All PAUP* menu commands have command line equivalents. While the command line is not as fun or
easy to use as the menu system, there are benefits to using the command line interface. For example,
you can put all the commands for an analysis in the data file itself (see section on
PAUP blocks, below), allowing you to have a
complete record of what you did (often very useful when a reviewer asks
you to be more specific about how you performed your analysis!). PAUP
blocks are also useful for making
sure certain settings are always invoked when you execute the data
file.
- PAUP* uses the Nexus data file format. This is a fairly complex file format used by
several programs that perform phylogenetic analyses (PAUP*, MacClade, TreeView, and Component, for example).
It is described in more detail below, so here I will only point out that PAUP* can put your
data in Nexus format automatically if your data are in one of several recognized formats
already. This is done with the tonexus command as follows:
tonexus fromfile=mydata.txt format=text tofile=mydata.nex;
tonexus fromfile=mydata.msf format=gcg tofile=mydata.nex;
tonexus fromfile=mydata.dat format=phylip tofile=mydata.nex;
The first of these commands converts a data file (mydata.txt) in plain text format (each sequence on a separate
line, with the name first followed by the sequence after one or more blank spaces) to Nexus format, storing
it in a file named mydata.nex. The second line converts mydata.msf (GCG MSF format)
into Nexus format, again storing the resulting file as mydata.nex. The third line converts
a PHYLIP formatted data file (mydata.dat) to Nexus format.
Use the command tonexus ? to list other options, including other formats that can be converted.
- PAUP* allows you to easily include and exclude sites, making it possible to leave primer sites, introns,
and dubiously aligned regions in the data file even though you do not wish to include them in analyses. You can also
include or exclude entire classes of sites using the keywords all,
gapped, missambig, constant,
and uninf. For example,
exclude gapped;
would exclude all sites containing a gap for at least one taxon (sequence).
If you need to exclude only 3rd codon position sites, even this is easy:
assuming that the first nucleotide site in each sequence corresponds to a 1st
codon position, this command excludes every third site starting with site 3 (the dot
stands for the last nucleotide site in the sequence, and the number after the backslash is the step size):
exclude 3-.\3;
This is how you include again all the sites you have excluded:
include all;
- There are parallel commands for deleting and restoring taxa.
The term taxon (plural taxa) refers to whatever forms the terminal nodes
of your phylogeny (the taxa will be individual sequences in most cases for purposes of this course).
Don't be confused by the command names delete and
restore: these act just like exclude and
include except they act on taxa and not characters (=sites). If the first,
second and fourth taxa were named Thermus,
Sulfolobus and Pyrococcus,
you could tell PAUP* to ignore them in subsequent analyses using either of the two commands below:
delete 1 2 4;
delete Thermus Sulfolobus Pyrococcus;
This is how you would reinstate the taxa deleted above:
restore all;
- For long runs, PAUP* reports progress once per minute in the form of a line written to the output
buffer. To have PAUP* report once every two minutes, specify 120 seconds instead of the default of 60 seconds:
set dstatus=120;
- By default, PAUP* does not save the output that is generated to a file. The output is stored
in what is known as an output buffer. When this buffer becomes full, the
first part will begin to be overwritten by newer output. Thus, one of the first things you should do
when starting any serious analysis is to start a log file, using either
the menu command or a command similar to this:
log file=myoutput.txt start replace;
The keyword start means to start logging output, whereas stop means to stop logging output,
and the replace keyword causes the file to be overwritten without warning if it exists. Using append instead
of replace is safer: in this case, new output is added at the end of the file and none of the data already
in the file is lost. If you do not use either replace or append, then PAUP* will ask you what you want to
do (ok if you are sitting there watching it, but not so good if you have started a run and walked away
from the computer for a long time!).
- PAUP* almost always produces unrooted trees, even though the trees
look rooted when PAUP* draws them! You can reroot the tree by specifying an outgroup taxon (or taxa);
however, this does not change the fact that the analysis did not estimate or determine the root
(you did this, either implicitly or explicitly). Here's how to tell PAUP* to always draw trees
with Giardia as the outgroup:
outgroup Giardia;
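The range notation accepted by include and exclude can be illustrated with a short Python sketch. This is not how PAUP* itself is implemented; the function name expand_charset is hypothetical, and the sketch assumes only the three forms described above: single site numbers, ranges such as 47-. (where the dot means the last site), and a step size after a backslash.

```python
import re

def expand_charset(spec, nchar):
    """Expand a NEXUS-style site list into a sorted list of 1-based site numbers.

    nchar is the total number of sites; a dot in the list stands for nchar.
    """
    # Allow optional spaces around '-' and the backslash
    spec = re.sub(r"\s*([\\-])\s*", r"\1", spec)
    sites = set()
    for token in spec.split():
        if "\\" in token:
            # A step size follows the backslash, e.g. 3-.\3
            rng, step = token.split("\\")
            step = int(step)
        else:
            rng, step = token, 1
        if "-" in rng:
            first, last = rng.split("-")
        else:
            first = last = rng
        start = nchar if first == "." else int(first)
        stop = nchar if last == "." else int(last)
        sites.update(range(start, stop + 1, step))
    return sorted(sites)

# Third codon positions in a 12-site alignment
print(expand_charset(r"3-.\3", 12))   # [3, 6, 9, 12]
# First codon positions in the same alignment
print(expand_charset(r"1-.\3", 12))   # [1, 4, 7, 10]
```

Expanding a list by hand like this is a quick way to check that a site list selects the sites you intended before you rely on it in an analysis.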
The Nexus Data File Format
Nexus blocks
PAUP* uses a data file format known as
Nexus. This file format is now shared among several
programs. Nexus data files always begin with the characters
#nexus but are otherwise
organized into major units known as
blocks. Some blocks are recognized by most of the programs
using the Nexus file format, whereas other blocks are
private blocks (recognized by only
one program). A Nexus block has the following basic structure:
#nexus
...
begin characters;
...
end;
Note that the ellipsis (...) is never used in a Nexus data file; it is used here simply to indicate that some text has been omitted. The name of the Nexus block used as an example above is
characters. Because Nexus data
files are organized in named blocks, PAUP* and other programs are able to read blocks whose names they
recognize and ignore blocks that are not recognized. This allows many different programs to use the same
overall format without crashing when they encounter data they cannot interpret.
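To make the "read what you recognize, skip the rest" idea concrete, here is a minimal Python sketch. It is only an illustration under simplifying assumptions: the function names are hypothetical, block names are matched with a simple regular expression, and text inside comments is not treated specially.

```python
import re

def list_blocks(nexus_text):
    """Return the names of the blocks in a Nexus file, in order, lowercased."""
    if not nexus_text.lstrip().lower().startswith("#nexus"):
        raise ValueError("a Nexus file must begin with #nexus")
    # Each block opens with 'begin <name>;'
    pattern = r"\bbegin\s+(\w+)\s*;"
    return [m.group(1).lower()
            for m in re.finditer(pattern, nexus_text, flags=re.IGNORECASE)]

def blocks_to_read(nexus_text, recognized):
    """Split block names into those a program would read and those it would skip."""
    names = list_blocks(nexus_text)
    read = [n for n in names if n in recognized]
    skipped = [n for n in names if n not in recognized]
    return read, skipped

example = """#nexus
begin taxa;
end;
begin characters;
end;
begin mrbayes;
end;
"""
# A program that knows only these three block names skips the mrbayes block
print(blocks_to_read(example, {"taxa", "characters", "trees"}))
# (['taxa', 'characters'], ['mrbayes'])
```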
Nexus commands
Blocks are in turn organized into semicolon-terminated
commands.
It is very important that you remember to terminate all commands
with a semicolon. This is especially hard to
remember for very long commands. PAUP* is pretty good about pointing out forgotten semicolons,
but sometimes it doesn't realize you've left something out until some distance downstream,
which can make the problem point difficult to find. Some common commands will be provided
below in the description of the common blocks.
Nexus comments
Comments can be placed in a Nexus file using square brackets. Comments can be placed anywhere,
and they are used for many purposes. For example,
you can effectively remove some of your
data by commenting it out. You can also annotate your sequences using comments. For example,
a comment like that below is useful for locating specific sites in your alignment:
[----+--10|----+--20|----+--30|----+--40|----+--50|----]
Ephedra TTAAGCCATGCATGTCTAAGTATGAACTAATT-CAAACGGTGAAACTGCGGATG
Gnetum TTAAGCCATGCATGTCTATGTACGAACTAATC-AGAACGGTGAAACTGCGGATG
Welwitschia TTAAGCCATGCACGTGTAAGTATGAACTAGTC-GAAACGGTGAAACTGCGGATG
If you would like your comment printed out in the output when PAUP* executes the data file,
just insert an exclamation point (!) as the first character inside the opening left square
bracket:
[!This is the data file used for my dissertation]
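The effect of comments can be sketched in Python as follows. This is a simplified illustration, not PAUP*'s own parser: the function name strip_comments is hypothetical, nested brackets are assumed to form a single comment, and special comments (and brackets inside quoted names) get no special treatment.

```python
def strip_comments(text):
    """Remove [bracketed] comments.

    Returns (clean_text, printed), where printed lists the comments that
    began with an exclamation point, without the exclamation point.
    """
    out = []        # characters outside any comment
    printed = []    # comments flagged with '!'
    current = []    # characters of the comment being read
    depth = 0       # current bracket nesting level
    for ch in text:
        if ch == "[":
            depth += 1
            if depth == 1:
                current = []
                continue
        elif ch == "]" and depth > 0:
            depth -= 1
            if depth == 0:
                comment = "".join(current)
                if comment.startswith("!"):
                    printed.append(comment[1:])
                continue
        if depth > 0:
            current.append(ch)
        else:
            out.append(ch)
    return "".join(out), printed

clean, printed = strip_comments("[!my data file]Ephedra [site ruler] TTAAG")
print(repr(clean))   # 'Ephedra  TTAAG'
print(printed)       # ['my data file']
```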
Commonly-used Nexus blocks
Here is a list of common Nexus blocks and the most-common commands within
these blocks. For a complete description of the Nexus file format, take a look at this
paper:
Taxa block
The purpose of a Taxa block is to provide names for your taxa (i.e., sequences).
You may not use a Taxa block very often, since you can also supply names for your
taxa directly in the Data block (see below). Here is an example of a Taxa block.
#nexus
...
begin taxa;
dimensions ntax=5;
taxlabels
Giardia
Thermus
Deinococcus
Sulfolobus
Halobacterium
;
end;
Note that there are four
commands in this example of a Taxa block.
Can you find the terminating semicolon for each of them?
- the begin command giving the block's name
- the dimensions command giving the number of taxa
- the taxlabels command providing the actual taxon labels
- the end command, telling PAUP* that there are no more commands
to process for this block
Data block
The
Data block is the workhorse of Nexus blocks. This is where you place the
actual sequence data, and, as mentioned above, this can also be where you define
the names of your sequences. Here is an example of a
Data block:
#nexus
...
begin data;
dimensions ntax=5 nchar=54;
format datatype=dna missing=? gap=-;
matrix
Ephedra TTAAGCCATGCATGTCTAAGTATGAACTAATTCCAAACGGTGAAACTGCGGATG
Gnetum TTAAGCCATGCATGTCTATGTACGAACTAATC-AGAACGGTGAAACTGCGGATG
Welwitschia TTAAGCCATGCACGTGTAAGTATGAACTAGTC-GAAACGGTGAAACTGCGGATG
Ginkgo TTAAGCCATGCATGTGTAAGTATGAACTCTTTACAGACTGTGAAACTGCGAATG
Pinus TTAAGCCATGCATGTCTAAGTATGAACTAATTGCAGACTGTGAAACTGCGGATG
[----+--10|----+--20|----+--30|----+--40|----+--50|----]
;
end;
Some things to note in this example are:
- The dimensions command comes first in a Data block, and specifies
the number of sequences (taxa; ntax) and number of sites (characters;
nchar).
- The format command tells PAUP* what kind of data follow
(dna, rna, protein,
or standard), and provides the
symbols used for missing data (?) and gaps (-).
- The matrix command dominates the Data block,
providing the sequences themselves (as well as the taxon names). Note the semicolon terminating the
matrix command!!!
- You can use upper or lower case symbols for nucleotides
- You can place whitespace anywhere except inside a taxon name or keyword (e.g.,
data type = dna would cause problems because datatype
should not have embedded whitespace).
- If you simply must have a space in one of your taxon names, either use an underscore
character in place of the space (e.g., Ginkgo_biloba) or surround the taxon
name in single quotes (e.g., 'Ginkgo biloba'). In either case, PAUP* will
output the space in its output.
- One item missing from the format command in the example above but which is quite useful
is something known as an equate list. The following format statement will cause
all occurrences of T to be changed to C and all occurrences
of G to be changed to A as the data are being read into PAUP*:
format datatype=dna missing=? gap=- equate="T=C G=A";
This is like telling PAUP* to do a search-and-replace operation on the sequences
before reading them in, except that your original file remains intact. Be careful when
using equate, because the replacement is case sensitive
(i.e., equate="t=c g=a"
would have had no effect if all the nucleotides are represented by upper case letters!).
- PAUP* recognizes all the standard ambiguity codes (e.g., R for purine,
Y for pyrimidine, N for undetermined, etc.).
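As a rough illustration of how the pieces of a Data block fit together, the following Python sketch builds one from simple "name sequence" lines. It is not a substitute for the tonexus command: the function name text_to_nexus and its parameters are hypothetical, and it assumes that taxon names contain no spaces and that all sequences are already aligned to the same length.

```python
def text_to_nexus(lines, datatype="dna", missing="?", gap="-"):
    """Build the text of a Nexus file with one Data block from 'name sequence' lines."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split(None, 1)  # taxon name, then everything else
        if len(parts) != 2:
            raise ValueError(f"no sequence found on line: {line!r}")
        name, seq = parts
        rows.append((name, "".join(seq.split())))  # drop whitespace in the sequence
    if not rows:
        raise ValueError("no sequences found")
    nchar = len(rows[0][1])
    for name, seq in rows:
        if len(seq) != nchar:
            raise ValueError(f"{name} has {len(seq)} sites, expected {nchar}")
    width = max(len(name) for name, _ in rows)
    out = ["#nexus", "", "begin data;",
           f"  dimensions ntax={len(rows)} nchar={nchar};",
           f"  format datatype={datatype} missing={missing} gap={gap};",
           "  matrix"]
    out += [f"    {name.ljust(width)}  {seq}" for name, seq in rows]
    out += ["  ;", "end;"]  # the semicolon terminates the matrix command
    return "\n".join(out)

print(text_to_nexus(["Ephedra TTAAGCCATG", "Gnetum  TTAAGCC-TG"]))
```

Computing ntax and nchar from the data, rather than typing them by hand, avoids the mismatch errors that arise when the dimensions command disagrees with the matrix.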
Trees block
A Trees block has the following structure:
#nexus
...
begin trees;
translate
1 Ephedra,
2 Gnetum,
3 Welwitschia,
4 Ginkgo,
5 Pinus
;
tree one = [&U] (1,2,(3,(4,5)));
tree two = [&U] (1,3,(5,(2,4)));
end;
Some things to note in this example are:
- The translate command provides short alternatives to the taxon names,
making the tree descriptions shorter (takes up fewer bytes of disk space).
- the translate command is not necessary however; it is ok to use
the taxon names directly in the tree descriptions
- the tree command denotes the start of a tree description, which
consists of a tree name (e.g., one and two are used here),
followed by an equals sign and then the tree topology in the standard, parenthetical
notation (often referred to as the Newick or
New Hampshire format).
- The special comments consisting of an ampersand symbol followed by the letter U
tell PAUP* to interpret the tree as being an unrooted tree.
- Files containing only the #nexus plus a trees
block are called tree files
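Two quick checks on a tree description can be sketched in Python: that its parentheses balance, and what it looks like once the translate table has been applied. The function names here are hypothetical, and the sketch assumes simple taxon numbers as tokens; it is only meant to illustrate the notation.

```python
import re

def parens_balanced(newick):
    """Return True if every '(' in the tree description has a matching ')'."""
    depth = 0
    for ch in newick:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ')' appeared before its '('
    return depth == 0

def apply_translate(newick, table):
    """Replace the numeric tokens of a tree description with taxon names."""
    # A token follows '(' or ',' and is followed by ',', ')' or ':'
    pattern = r"(?<=[(,])(\d+)(?=[,):])"
    return re.sub(pattern, lambda m: table.get(m.group(1), m.group(1)), newick)

table = {"1": "Ephedra", "2": "Gnetum", "3": "Welwitschia",
         "4": "Ginkgo", "5": "Pinus"}
tree = "(1,2,(3,(4,5)));"
print(parens_balanced(tree))          # True
print(apply_translate(tree, table))   # (Ephedra,Gnetum,(Welwitschia,(Ginkgo,Pinus)));
```

A description with a missing closing parenthesis, such as (1,2,(3,(4,5)); is reported as unbalanced by the first function.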
Sets block
The only commands you need to know at this point from a
sets block are the
charset and the
taxset commands.
#nexus
...
begin sets;
charset trnL_intron = 562-4226;
taxset gnetales = Ephedra Gnetum Welwitschia;
end;
This sets block defines both a set of characters (in this
case the sites composing the trnL intron) and a set of taxa (consisting of the three
genera in the seed plant order Gnetales: Ephedra, Gnetum and Welwitschia).
We could have used the taxon numbers for the taxset definition
(e.g., taxset gnetales = 1-3;) but using the actual names is
clearer and less prone to error (just think of what might happen if you
decided to reorder your sequences!). These definitions may be used in other blocks. A
common use is in commands placed inside a paup block (see below) or
typed directly at the PAUP* command prompt.
Assumptions block
There is only one command I will introduce from the
assumptions
block (although there are a number of others that exist). The
exset
command (the word
exset stands for
exclusion set) is useful for
creating a set of characters that are automatically excluded whenever the data file is executed. Given the following block:
#nexus
...
begin assumptions;
exset* badsites = 1 5 47-.;
end;
PAUP* would automatically exclude characters (i.e., sites) 1, 5, and 47 through the end of the sequence.
It is the asterisk after the exset keyword that denotes this as the default
exclusion set. If you left out the asterisk, PAUP* would define the exclusion set but would not
automatically exclude these sites as the data file was being executed.
Paup block
Paup blocks provide a way to give PAUP* commands from within a data file itself. Any command you can
type at the command prompt or perform using menu commands you can place in the data file. This allows
you to specify an entire analysis right in the data file. For any serious analysis, I always
run PAUP* using a paup block. That way I know exactly what I did for a given analysis
several days or weeks in the future. Paup blocks are also a handy way to perform certain
commands every time the data file is executed. For example, you can set up your favorite
likelihood substitution model, delete certain taxa or exclude certain sites from a paup block
located just after your data block. Here is an example of a typical paup block:
#nexus
...
begin paup;
log file=myoutput.txt start replace;
outgroup Ephedra;
set criterion=likelihood;
lset nst=2 basefreq=empir rates=equal tratio=estim variant=hky;
hsearch swap=tbr addseq=random nreps=100 start=stepwise;
describe 1 / plot=phylogram;
savetrees file=mytrees.tre brlens;
log stop;
quit;
end;
Here is what each line does (but don't worry too much about this since we will be talking
much more about individual commands later in lab):
- The log command starts a log file (the file will be called
myoutput.txt and will be overwritten if it already exists)
- The outgroup command specifies that the resulting trees should be
rooted between Ephedra and everything else (this just affects the appearance of the tree when drawn)
- The set command changes the optimality criterion from the default (parsimony)
to maximum likelihood
- The lset command sets up PAUP* so that the HKY85 model will be used
(number of substitution rates is 2, empirical base frequencies, rates are homogeneous across
sites, estimate the transition/transversion ratio, and use the HKY model rather than the
other, similar F84 model)
- The hsearch command causes PAUP* to conduct 100 heuristic search replicates; each
replicate builds its starting tree by stepwise addition, adding taxa in a different random
order, and this starting tree is then rearranged using
the tree bisection/reconnection branch swapping method
- The describe command produces a depiction of the tree (rooted at the
specified outgroup) on the output (and in the log file, since we opened a log file earlier);
the tree will be shown as a phylogram, which means branch lengths will appear proportional
to the average number of nucleotide substitutions per site that were inferred for that branch.
- The savetrees command saves the best tree found during the search (this is
quite important and easy to forget to do!). The brlens keyword tells PAUP* to save branch length information along with the tree topology.
- The log command stops the logging of output to the file
myoutput.txt
- The quit command causes PAUP* to quit running; if you left out this command,
PAUP* would remain running at this point, allowing you to issue other commands
Note that because PAUP* ignores blocks whose names it does not recognize, you can easily "comment out"
a paup block by simply adding a character to its name. For example, adding an underscore
#nexus
...
begin _paup;
.
.
.
end;
is enough to cause PAUP* to completely ignore this paup block. This is handy because it allows
you to create multiple paup blocks for different purposes and turn them off and on whenever
you need them.
You can also "comment out" a portion of a paup block using the
leave command. For example, in this paup block, PAUP*
will be set up for doing a likelihood analysis but will not
actually conduct the search; the leave command causes PAUP* to exit the block
early:
#nexus
...
begin paup;
log file=myoutput.txt start replace;
outgroup Ephedra;
set criterion=likelihood;
lset nst=2 basefreq=empir rates=equal tratio=estim variant=hky;
leave;
hsearch swap=tbr addseq=random nreps=100 start=stepwise;
describe 1 / plot=phylogram;
savetrees file=mytrees.tre brlens;
log stop;
quit;
end;
Today's lab exercise
First, a note about characters blocks versus
data blocks: the characters
block is essentially a new and improved version of the data
block. Feel free to use either one, but be aware
that programs such as PAUP* may eventually stop using the data
block since the characters block accomplishes the same thing
and has features missing in the data block. To convert
a data block to a characters block,
just change the block name and add the keyword newtaxa
to the dimensions command just before the keyword
ntax. This tells PAUP* that you will be
defining the names of your taxa in the characters
block itself (rather than in a preceding taxa block).
Questions that should be answered (or exercises that you should do on your own)
appear in this style.
There is no need to turn in your answers to these
exercises. It is up to you to make sure you are comfortable with this material. Please ask questions if anything is unclear. While it is possible to do these exercises outside of the scheduled lab time, working through them in lab is better because we are here to help with questions that arise.
- First create a folder with a unique name (e.g., base it on your own name). Everyone is using the same account, so it is important to do everything within your own folder so that you do not interfere with others!
- Copy the angio35.txt file from the data folder into your own newly-created folder. (If you are not in the computer lab, you can download the file by right-clicking here and using your browser's menu option)
- Start PAUP* but be careful to not execute the angio35.txt file
(it is not yet in Nexus format). Do open the file in edit mode (using
the file-open menu command and clicking on the edit
radio button before selecting the name of the file to open)
and note that it comprises 35 DNA sequences. These are rbcL gene sequences from various green plants. The important thing to notice is that the format is quite simple: each line consists of a taxon name followed by at least one blank space, which is followed by the sequence for that taxon. Note that the blank space is important: taxon names cannot contain embedded spaces, because spaces are used to separate taxon names from the corresponding sequences.
- Now type in the following command:
tonexus from=angio35.txt to=angio35.nex datatype=nucleotide format=text;
After the conversion, the file angio35.nex should be present.
Open this Nexus file in edit mode to see what PAUP* did to convert the original file to Nexus format. Do not execute the file just yet because there are some additions we need to make before it is ready for analyzing.
- Create an assumptions block containing a default exclusion set that excludes the following sites automatically whenever the data file is executed.
This should be added to the bottom of the newly-created Nexus file (i.e., after the data). This may be most easily done using PAUP*'s built-in editor, although you may use any editor you choose (just remember to save the file as plain text).
begin assumptions;
exset * unused = 1-41 234-241 246 506-511 555 681-689 1393-1399 1797-1855 1856-1884 4754-4811;
end;
These numbers represent nucleotide sites that either are missing a lot of data or are difficult to align. The name I gave to this exclusion set is unused, but you could name it anything you like. The asterisk tells PAUP* that you want this exset applied (i.e. you want these sites excluded) every time the file is executed.
- Create a sets block comprising the following three charset commands:
- The first charset should be named 18S and include sites 1 through 1855
- The second charset should be named rbcL and include sites 1856 through 3283
- The third charset should be named atpB and include sites 3284 through 4811
This block should be placed after the assumptions block. Look at the description above of the sets block and try to do this part on your own.
- Now execute the data file, using the execute command from the main menu to run your new angio35.nex file. If your assumptions block is correct, the output should include a statement saying that 219 characters have been excluded. If you set up your sets block correctly you should be able to enter these commands:
exclude all;
include rbcL;
and get no errors. In addition, PAUP* should tell you that 4592 characters were excluded (as a result of the exclude all command) and 1428 were re-included (as a result of the include rbcL command). For the rest of the exercise, we will be working with the data from all 3 genes, so re-include the 18S and atpB data:
include 18S atpB;
PAUP* should now say that there are a total of 4811 included characters.
- The first item of business in starting an analysis in PAUP* is to begin logging the
output to a file. The following command will begin saving all output to the file
output.txt. Note that we have chosen to automatically
replace the file if it already exists. If you are nervous about this (and would rather
have PAUP* ask before overwriting an existing file), either leave off the replace keyword
or substitute append, which tells PAUP* to simply add new output to the end of the file if
it already exists.
log file=output.txt start replace;
- Type set ? to get a listing of the general settings. PAUP* has
four "settings" commands: set for general settings;
pset for settings specifically related to parsimony;
lset for settings specifically related to likelihood; and
dset for settings specifically related to distance methods.
From the output of the set command, can you determine which optimality
criterion PAUP* would use if we were to do a search at this point?
- To perform a parsimony search, first try the alltrees command. This command asks PAUP* to calculate the optimality criterion for every possible tree:
alltrees;
Did PAUP* allow you to perform an exhaustive search for 35 taxa?
- Now try heuristic searching. This approach does not attempt to look at all possible trees, but instead only examines trees that are in the realm of possibility (which can still be a lot of trees!):
hsearch;
The search progress will be displayed in a dialog box. When the button says Close rather than Stop, take a look at the numbers summarizing this search. What is the parsimony score of the best tree found during the search? (Write down this score somewhere for later reference.) How many trees were examined (look at # Rearrangements tried)?
- Now you probably want to take a look at the tree that PAUP* found and is now
holding in memory. First, however, choose an outgroup taxon so that the (unrooted) tree will
be drawn in a way that looks like it is rooted in a reasonable place, say between the gymnosperms (first 7 taxa) and angiosperms (remaining taxa):
outgroup 1-7;
showtree;
To make the tree appear to flow downward, which is more pleasing to the eye, tell PAUP* that you would like to use the tree order "right"
(this is also commonly known as "ladderizing right"):
set torder=right;
showtree;
Before doing anything else, we should save this tree in a file so that it will
be available later, perhaps for viewing or printing in TreeView. Let's call the
treefile pars.tre. The brlens keyword in the
command below tells PAUP* that you want to save the branch lengths as well as
just the tree topology (almost always a good option to include):
savetrees file=pars.tre brlens;
- You may have noticed that PAUP* found 5 most-parsimonious trees. These 5 trees are all indistinguishable using the parsimony criterion. Let's now use the likelihood criterion to evaluate these 5 trees:
set criterion=likelihood;
lscores all;
These commands ask PAUP* to simply evaluate the likelihood score of the trees in memory. Note that because we arrived at these trees using parsimony, it is quite possible that none of these trees represents the maximum likelihood tree. That is, we may be able to find better trees under the likelihood criterion if we performed a search using the likelihood criterion. What is the likelihood score of the best tree? (As for parsimony, write this number down for later comparison.) Is the likelihood score the same for all 5 trees? Which tree is best? Important: PAUP* reports the negative of the natural logarithm of the likelihood score. This means that smaller numbers are better, as smaller numbers represent higher likelihoods.
- Next, we will obtain a neighbor-joining tree. Neighbor-joining (NJ for short) is one of the algorithmic methods: that is, it uses an optimality criterion (the minimum evolution criterion) at each step of the algorithm, but in the end produces a tree without actually examining many trees:
nj;
- Let's see how the NJ tree compares to the tree found by parsimony. First, use the lscores command to compute the log-likelihood of the NJ tree:
lscores all;
Now compute the parsimony score of the NJ tree using the pscores command:
pscores all;
According to the parsimony criterion, is the NJ tree better than any of the trees found by parsimony? According to the likelihood criterion, is the NJ tree better than the best tree you have found thus far? Why is it not possible to say definitively whether the NJ tree is better (according to the likelihood criterion) than the maximum likelihood tree?
- You may have noticed that PAUP* does not let you copy text from the output window. It will, however, make a copy of the text currently displayed in the output window and put this in an editor window. Choose the corresponding command from the main menu. You can now cut/copy/paste text from this window to other applications.
- That's all for today. The only thing left to do is to close the
log file you opened and quit PAUP*:
log stop;
quit;
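A closing note on the alltrees step in the exercise above: the reason an exhaustive search is impractical for 35 taxa is the sheer number of possible trees. For n taxa there are 3 x 5 x ... x (2n-5) distinct unrooted, fully bifurcating trees. The Python sketch below simply evaluates that product; the function name is hypothetical.

```python
def num_unrooted_trees(ntax):
    """Number of distinct unrooted, fully bifurcating trees for ntax taxa."""
    if ntax < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2*ntax - 5
    for k in range(3, 2 * ntax - 4, 2):
        count *= k
    return count

for n in (4, 5, 10, 35):
    print(n, num_unrooted_trees(n))
```

For 4, 5, and 10 taxa this gives 3, 15, and 2027025 trees; for 35 taxa the count runs to dozens of digits, which is why heuristic searching is the practical alternative.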