Phylogenetics: NEXUS Format

From EEBedia
Revision as of 18:09, 21 January 2009

EEB 349: Phylogenetics
The goal of this lab exercise is to show you how to easily create a NEXUS-formatted data file from a set of sequences. The NEXUS format is widely used in phylogenetics, and its basic features are described in the second part of this tutorial.

Using PAUP* to create a NEXUS data file

First, download the file angio35.txt to your hard drive, then upload it to the cluster (see Phylogenetics: Bioinformatics Cluster for instructions).

Now log in to the cluster (bbcxsrv1.biotech.uconn.edu) and type paup to start the PAUP* program.

Important! Ordinarily, you should not run PAUP* directly like this. Only use this method for extremely short-lived activities. To run an analysis on the cluster, you should use the Sun Grid Engine's qsub program to submit a job. Using qsub starts your run on one of the computing nodes (whichever is free at the moment), while typing paup starts PAUP* on the head node, which is the node that everyone logs into to start runs. Bogging down the head node with a long PAUP* run is the quickest way to lose your cluster privileges! That said, what we are doing today will not be computationally demanding and thus should not attract the attention of Jeff Lary (if it does, I will take the blame).

At the PAUP* prompt, type the following command:

tonexus from=angio35.txt to=angio35.nex datatype=nucleotide format=text;

After the conversion, the file angio35.nex should be present. Type quit to exit PAUP*, then open this NEXUS file in the pico editor to see what PAUP* did to convert the original file to NEXUS format. (The most important thing PAUP* did was to count the number of sites, i.e., nucleotides per sequence, and set nchar for you.)
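To make the nchar step concrete, here is a minimal Python sketch of the kind of conversion tonexus performs. It is not PAUP*'s implementation, and its output will not match PAUP*'s file exactly. It assumes, hypothetically, that the input has one taxon per line (a name without spaces, then whitespace, then the aligned sequence); the function name to_nexus is likewise made up for illustration.

```python
"""Sketch of a plain-text-to-NEXUS conversion (not PAUP*'s actual code).

Hypothetical input layout: one taxon per line, a name containing no
whitespace, then whitespace, then the aligned sequence. Blank lines
are skipped.
"""


def to_nexus(text: str) -> str:
    taxa = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, seq = line.split(None, 1)  # split off the taxon name
        taxa.append((name, "".join(seq.split())))  # drop any spaces inside the sequence

    if not taxa:
        raise ValueError("no sequences found")

    # nchar is the number of sites; every sequence must have the same length
    lengths = {len(seq) for _, seq in taxa}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    nchar = lengths.pop()

    # Pad names so the sequences line up in the matrix
    width = max(len(name) for name, _ in taxa)
    rows = "\n".join(f"    {name.ljust(width)}  {seq}" for name, seq in taxa)
    return (
        "#NEXUS\n\n"
        "begin data;\n"
        f"  dimensions ntax={len(taxa)} nchar={nchar};\n"
        "  format datatype=nucleotide missing=? gap=-;\n"
        "  matrix\n"
        f"{rows}\n"
        "  ;\n"
        "end;\n"
    )


if __name__ == "__main__":
    demo = "Taxon_A ACGT-A\nTaxon_B ACGTTA\n"
    print(to_nexus(demo))
```

The length check is the key step: a NEXUS matrix expects every taxon to have exactly nchar characters, so the sketch refuses input whose sequences are not all the same length rather than guessing a value.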

Next, create an assumptions block containing a default exclusion set that automatically excludes the following sites whenever the data file is executed. Add this block at the bottom of the newly created NEXUS file (i.e., after the data block); you can use the pico editor for this.

begin assumptions;
  exset * unused = 1-41 234-241 246 506-511 555 681-689 1393-1399 1797-1855 1856-1884 4754-4811;
end;

These numbers represent nucleotide sites that either have a lot of missing data or are difficult to align. I named this exclusion set unused, but you could give it any name you like. The asterisk tells PAUP* to apply this exset automatically every time the file is executed.
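If you want to know how many sites the exset removes before running PAUP*, a short script can expand the list. The Python sketch below is only an illustration (the name expand_sites is hypothetical, not part of PAUP*); it handles the single-site ("246") and inclusive-range ("1-41") forms used in the exset above, and nothing else from the NEXUS character-set syntax.

```python
"""Expand the exset site list into explicit site numbers (sketch only).

Handles single sites ("246") and inclusive ranges ("1-41"); other NEXUS
forms, such as "." for the last site or strided ranges, are not handled.
"""

EXSET = ("1-41 234-241 246 506-511 555 681-689 1393-1399 "
         "1797-1855 1856-1884 4754-4811")


def expand_sites(spec: str) -> set[int]:
    sites: set[int] = set()
    for token in spec.split():
        if "-" in token:
            start, end = (int(x) for x in token.split("-"))
            sites.update(range(start, end + 1))  # NEXUS ranges include both ends
        else:
            sites.add(int(token))  # a single site number
    return sites


if __name__ == "__main__":
    excluded = expand_sites(EXSET)
    print(len(excluded), "sites excluded; highest site is", max(excluded))
```

For this exset the script reports 219 excluded sites, the highest being 4811. A quick sanity check is that 4811 should not exceed the nchar value PAUP* wrote into angio35.nex.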