Phylogenetics: Simulating sequence data

From EEBedia
Revision as of 18:47, 21 February 2018 by Kevin Keegan (Talk | contribs) (Getting Started)

EEB 5349: Phylogenetics

by Kevin Keegan


The goal here is to get you acquainted with simulating DNA sequence data in PAUP*, which is useful for testing how models and algorithms perform.


The development of models and algorithms of any kind requires testing to see how they perform. All models and algorithms make assumptions: they take the infinite complexity of nature and distill it into a few components that the maker of the model/algorithm assumes are important. With models of DNA evolution and phylogenetic inference algorithms, one important way of testing the capability of a model/algorithm is to simulate DNA sequence data on a known phylogeny and see how the model/algorithm performs. If the model/algorithm recovers the known or "true" phylogeny, we gain some confidence that it is relatively accurate in its distillation of the complexity of the processes it attempts to capture.
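To make the idea of "simulate on a known tree, then see whether a method recovers it" concrete, here is a minimal Python sketch. It is not how PAUP* implements simulation; it is an illustration under stated assumptions. It evolves sites under the Jukes-Cantor model on a rooted four-taxon tree with two long and three short branches (the same shape as the tree in the NEXUS file under Getting Started), then counts the parsimony-informative site patterns that support each of the three possible groupings. For four taxa, the grouping with the most supporting sites is the parsimony choice, so the counts stand in for a parsimony search. The function names (`evolve`, `simulate_site`, `parsimony_support`, `run`) and the seed are hypothetical choices made for this example.

```python
import math
import random

BASES = "ACGT"


def evolve(state, t, rng):
    """Return the state at the end of a branch of length t under Jukes-Cantor.

    t is measured in expected substitutions per site.
    """
    # Probability that the end state differs from the start state under JC69.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    if rng.random() < p_change:
        # All three alternative bases are equally likely under JC69.
        return rng.choice([b for b in BASES if b != state])
    return state


def simulate_site(rng):
    """Simulate one site on ((A:1.0,B:0.1):0.1,(C:0.1,D:1.0):0.1)."""
    root = rng.choice(BASES)          # equal base frequencies at the root
    left = evolve(root, 0.1, rng)     # ancestor of A and B
    right = evolve(root, 0.1, rng)    # ancestor of C and D
    return (
        evolve(left, 1.0, rng),       # A: long branch
        evolve(left, 0.1, rng),       # B: short branch
        evolve(right, 0.1, rng),      # C: short branch
        evolve(right, 1.0, rng),      # D: long branch
    )


def parsimony_support(sites):
    """Count informative site patterns supporting each four-taxon grouping."""
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for a, b, c, d in sites:
        if a == b and c == d and a != c:
            counts["AB|CD"] += 1      # supports the true tree
        elif a == c and b == d and a != b:
            counts["AC|BD"] += 1
        elif a == d and b == c and a != b:
            counts["AD|BC"] += 1      # groups the two long branches
    return counts


def run(nchar, seed=1):
    """Simulate nchar sites and return the pattern counts."""
    rng = random.Random(seed)
    sites = [simulate_site(rng) for _ in range(nchar)]
    return parsimony_support(sites)


if __name__ == "__main__":
    for nchar in (10, 100, 1000, 10000):
        print(nchar, run(nchar))
```

With these branch lengths, the expected frequency of the pattern grouping the two long branches (AD|BC) is higher than that of the pattern supporting the true grouping (AB|CD). Adding more sites should therefore make the parsimony choice more confidently wrong rather than more accurate.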

Getting Started

We will be using cutting-edge features in PAUP*, so cutting edge that you will not find any information about them online or through the help command in PAUP*! So don't be surprised when you cannot look up some of the components of the NEXUS file you will be using. Create an empty text file and add the following lines to it:

#NEXUS

[This example demonstrates the dreaded Felsenstein Zone]
begin paup;
    cd *;
    set storebrlens nostatus autoclose=yes warntree=no notifybeep=no;
end;

begin taxa;
    dimensions ntax=4;
    taxlabels A B C D;
end;

begin trees;
    tree 1 = [&R] ((A:1.0,B:0.1):0.1,(C:0.1,D:1.0):0.1);
end;

begin dnasim;
    simdata nchar=(10 100 1000 10000);
    lset model=jc nst=1 basefreq=eq;
    sitemodels jc:1;
    truetree source=memory treenum=1 showtruetree=brlens;
    beginsim nreps=100 seed=0 monitor=y resultsfile=(name=sim4results.txt replace output=means);
            set criterion=parsimony;
            tally parsimony;
        [likelihood under JC]
            set criterion=likelihood;
            lset basefreq=equal nst=1;
            tally 'ML-JC';
    set monitor=y;