Phylogenetics: Simulating sequence data

Revision as of 19:01, 21 February 2018

EEB 5349: Phylogenetics

by Kevin Keegan

Goals

To get you acquainted with simulating DNA sequence data in PAUP*, which can be useful in testing how models and algorithms perform.

Introduction

Developing models and algorithms of any kind requires testing to see how they perform. All models and algorithms make assumptions: they take the infinite complexity of nature and distill it into a few components that the maker of the model or algorithm assumes are important. For models of DNA evolution and phylogenetic inference algorithms, one important test is to simulate DNA sequence data on a known phylogeny and see how the model or algorithm performs on those data. If it recovers the known, or "true", phylogeny, we can be more confident that it captures the processes it is meant to describe reasonably well, at least under the conditions simulated.

Getting Started

We will be using cutting-edge features in PAUP*, so new that you will not find any information about them online or through the help command in PAUP*. Don't be confused if you cannot look up some of the components of the NEXUS file you will be using. Other blocks and commands in the file will be familiar; feel free to look at past labs or use the help command to refresh your memory.

Create an empty text file, add the following lines to it, and save it with a .nex extension:

#nexus
[This example demonstrates the dreaded Felsenstein Zone]
begin paup;
    cd *;
    set storebrlens nostatus autoclose=yes warntree=no notifybeep=no;
end;
begin taxa;
    dimensions ntax=4;
    taxlabels A B C D;
end;
begin trees;
    tree 1 = [&R] ((A:1.0,B:0.1):0.1,(C:0.1,D:1.0):0.1);
end;
begin dnasim;
    simdata nchar=(10 100 1000 10000);
    lset model=jc nst=1 basefreq=eq;
    sitemodels jc:1;
    truetree source=memory treenum=1 showtruetree=brlens;
    beginsim nreps=100 seed=0 monitor=y resultsfile=(name=sim4results.txt replace output=means);
        [parsimony]
            set criterion=parsimony;
            alltrees;
            tally parsimony;
        [likelihood under JC]
            set criterion=likelihood;
            lset basefreq=equal nst=1;
            alltrees;
            tally 'ML-JC';
    endsim;
    set monitor=y;
end;
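To build intuition for what the simulation does, here is a minimal Python sketch of the same idea. It is not PAUP*'s implementation, and it does not use the dnasim block: it simply evolves sequences under the Jukes-Cantor model along the four-taxon tree from the trees block above, then counts the parsimony-informative site patterns that support each of the three possible unrooted trees. The function names (jc_evolve, simulate_four_taxa, informative_pattern_counts) and the seed value are hypothetical and chosen only for illustration.

```python
import math
import random

BASES = "ACGT"


def jc_evolve(seq, brlen, rng):
    """Evolve a sequence along one branch under Jukes-Cantor.

    brlen is in expected substitutions per site, so the probability that
    a site ends the branch in a different state is 3/4 * (1 - exp(-4t/3)).
    """
    p_change = 0.75 * (1.0 - math.exp(-4.0 * brlen / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            # Under JC the three other bases are equally likely.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_four_taxa(nchar, seed=1):
    """Simulate one data set on ((A:1.0,B:0.1):0.1,(C:0.1,D:1.0):0.1)."""
    rng = random.Random(seed)
    root = "".join(rng.choice(BASES) for _ in range(nchar))
    left = jc_evolve(root, 0.1, rng)   # ancestor of A and B
    right = jc_evolve(root, 0.1, rng)  # ancestor of C and D
    return {
        "A": jc_evolve(left, 1.0, rng),   # long branch
        "B": jc_evolve(left, 0.1, rng),
        "C": jc_evolve(right, 0.1, rng),
        "D": jc_evolve(right, 1.0, rng),  # long branch
    }


def informative_pattern_counts(aln):
    """Count sites supporting each unrooted four-taxon tree under parsimony.

    With four taxa, a site is parsimony-informative only if it shows
    exactly two states, each in exactly two taxa.
    """
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for a, b, c, d in zip(aln["A"], aln["B"], aln["C"], aln["D"]):
        if a == b and c == d and a != c:
            counts["AB|CD"] += 1
        elif a == c and b == d and a != b:
            counts["AC|BD"] += 1
        elif a == d and b == c and a != b:
            counts["AD|BC"] += 1
    return counts


if __name__ == "__main__":
    # Same sequence lengths as the simdata command above.
    for nchar in (10, 100, 1000, 10000):
        print(nchar, informative_pattern_counts(simulate_four_taxa(nchar)))
```

The true tree groups A with B, so AB|CD is the "correct" pattern. With these branch lengths, the two long branches (A and D) often change to the same base by chance, so AD|BC sites are expected to outnumber AB|CD sites as the sequence length grows. This is the behavior the NEXUS file above is designed to expose in parsimony.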