Phylogenetics: Searching Lab
|EEB 5349: Phylogenetics|
|This lab explores different search strategies under the parsimony criterion|
- 1 Getting started
- 2 Create a command file
- 3 Delete all taxa except the first five
- 4 Exhaustive parsimony search
- 5 Determine NNI rearrangements
- 6 Perform an NNI search
- 7 Find the most parsimonious tree for all 35 taxa
- 8 Conduct a second search with SPR swapping
- 9 Now conduct a third search with TBR swapping
Log into your account on the cluster (ssh email@example.com). If you do not have the angio35.nex file, you can recreate it as follows:
curl http://hydrodictyon.eeb.uconn.edu/people/plewis/courses/phylogenetics/data/angio35.nex > angio35.nex
The word curl stands for "Copy URL" and the program curl fetches the contents of the specified web site (which in this case is not really a web site, it is just a data file). Without the trailing "> angio35.nex", you would see the contents of the angio35.nex file simply spill across your screen. That ">" symbol followed by a file name causes the output of curl to be captured and redirected to a file named angio35.nex.
Create a command file
Now start pico (do not specify a file name because you will be creating a new file) and enter the following text, saving the file as run.nex:
#nexus begin paup; log file=output.txt start replace; execute angio35.nex; end;
There are at least a couple of advantages to creating little NEXUS files like run.nex. For now, the only advantage is that executing run.nex automatically starts a log file so that you will have a record of what you did. Later, when you get in the habit of putting commands in paup blocks, you will appreciate the separation of the data from the commands that initiate analyses. I have many times opened a data file, forgetting about the embedded paup block that starts a long search, overwrites my previous log file, and otherwise creates havoc!
Note that because we used the replace keyword in the log command, the file output.txt will be overwritten without warning if it exists. Yes, this is called living dangerously but if you plan to run your analysis on the cluster, it is important to keep PAUP from asking you questions after you submit your run via qsub.
Delete all taxa except the first five
Using this command
will cause PAUP* to ignore all taxa except Ephedrasinica, Gnetum_gnemJS, WelwitschiaJS, Ginkgo_biloba, and Pinus_ellCH.
Exhaustive parsimony search
Use the alltrees command to conduct an exhaustive search under the parsimony criterion (parsimony is the default optimality criterion).
You should add the delete and alltrees file to the paup block in your run.nex file, which should now look like this:
#nexus begin paup; log file=output.txt start replace; execute angio35.nex; delete 6-.; alltrees; end;
Start paup, specifying run.nex as the file to execute:
This analysis should go fast because you now have only 5 taxa.
Determine NNI rearrangements
Because we performed an exhaustive enumeration, we now know which tree is the globally most parsimonious tree. We are thus guaranteed to never find a better tree were we to start an heuristic search with this tree. Let's do an experiment: perform an NNI heuristic search, starting with the best tree, and have PAUP* save all the trees it encounters in this search. In the end, PAUP* will have 5 trees in memory: the starting tree and the 4 trees corresponding to all possible NNI rearrangements of that starting tree.
Before you start the NNI search, use the showtree command to show you the tree obtained from the exhaustive enumeration. Draw this tree on a piece of paper and then draw the 4 possible NNI rearrangements (refer to the description of NNI in my lecture slides if you've forgotten).
Perform an NNI search
Add an hsearch and describe command to your run.nex file, which should afterwards look like this:
#nexus begin paup; log file=output.txt start replace; execute angio35.nex; delete 6-.; alltrees; hsearch start=1 swap=nni nbest=15; describe all; end;
The hsearch command is broken down as follows:
- start=1 starts the search from the tree currently in memory (i.e., the best tree resulting from your exhaustive search using the parsimony criterion)
- swap=nni causes the Nearest-Neighbor Interchange (NNI) method to be used for branch swapping
- nbest=15 saves the 15 best trees found during the search. Thus, were PAUP to examine every possible tree, we would end up saving all of them in memory. The reason this command is needed is that PAUP ordinarily does not save trees that are worse than the best one it has seen thus far. Here, we are interested in seeing the trees that are examined during the course of the search, even if they are not as good as the starting tree.
The describe all command plots the 5 trees currently in memory. The reason we are using the describe command rather than the showtrees command is because we want PAUP to show us the numbers it has assigned to the internal nodes, something that showtrees doesn't do.
Find the most parsimonious tree for all 35 taxa
Modify your run.nex file to conduct a heuristic search on all 35 taxa having the following characteristics:
- The starting trees are each generated by the stepwise addition method, using random addition of sequences (you will employ the addseq and start keywords for this)
- Swap using NNI branch swapping (you will employ the swap keyword for this)
- Reset the nbest option to all because we want to be saving just the best trees, not suboptimal trees (yes, this option is a little confusing).
- Set the random number seed to 5555 using the rseed option (this determines the sequence of pseudorandom numbers used for the random additions; ordinarily you would not need to set the random number seed, but we will do this here to ensure that we all get the same results)
- Do 500 replicate searches; each replicate represents an independent search starting from a different random-addition tree (you will use the nreps keyword for this).
Use the following command to get PAUP to list the options for hsearch:
Remember you can comment out portions of your Nexus file if you don't want to lose them: e.g.,
#nexus begin paup; log file=output.txt start replace; execute angio35.nex; [delete 6-.;] [alltrees;] hsearch [put your new options here]; [describe all;] end;
Conduct a second search with SPR swapping
Construct another heuristic search using SPR branch swapping. Be sure to reset the random number seed to 5555.
Now conduct a third search with TBR swapping
Wondering about this warning? Multiple hits on islands of unsaved trees may in fact represent different islands When PAUP encounters a new island, it will find all trees composing that particular island in the process of branch swapping. If, in a new search, it encounters any trees already stored in memory, it knows that it has hit an island that it found previously. Note that it would be pointless to continue on this tack, because it will only find all the trees on that island again. For trees retained in memory, PAUP can keep track of which island they belong to (remember that it is possible for trees with the same parsimony score to be in different tree islands!). But for trees that are not retained in memory, PAUP only knows that it has encountered an island of trees having score X; it has no way of finding out how many islands are actually represented amongst the trees having score X.