Difference between revisions of "Phylogenetics: r8s Lab"

 
Line 4: Line 4:
 
|<span style="font-size: x-large">[http://hydrodictyon.eeb.uconn.edu/eebedia/index.php/Phylogenetics:_Syllabus EEB 349: Phylogenetics]</span>
 
|<span style="font-size: x-large">[http://hydrodictyon.eeb.uconn.edu/eebedia/index.php/Phylogenetics:_Syllabus EEB 349: Phylogenetics]</span>
 
|-
 
|-
|The goals of this final lab are to learn how to: (1) compile a program from source code; (2) set up a NEXUS file for the r8s program; (3) interpret the most important parts of the output of r8s; (4) view a tree with divergence times produced by r8s in TreeView.
+
|The goals of this final lab are to learn how to: (1) set up a NEXUS file for the [http://loco.biosci.arizona.edu/r8s/index.html r8s] program; (2) interpret the most important parts of the output of r8s; (3) view a tree with divergence times produced by r8s in TreeView.
 
|}
 
|}
  
 
== Using r8s to estimate divergence times ==
 
== Using r8s to estimate divergence times ==
  
There is no version of [http://loco.biosci.arizona.edu/r8s/index.html r8s] for the Windows operating system, so we will use the version installed on the Bioinformatics cluster. The file (r8s1.7.dist.tar.Z) that is distributed on [http://loco.biosci.arizona.edu/ Mike Sanderson's web site] is not something you can run straight away. Because it is becoming common for programs to be distributed this way, especially now that the modern Macintosh operating system, Mac OS X, is a variant of Unix, I thought I would use this opportunity to show you how to deal with this kind of software distribution format. I will assign each of you a temporary account on alleyn, just as we did on Feb. 23 when we used PAUP* under Linux. I have placed the r8s1.7.dist.tar.Z file in your home directory. This simulates the situation where you have just downloaded the file from Sanderson's web site to your Linux machine, and you now need to get it running. The first task is to get connected to the computer in my lab (alleyn).
+
There is no version of [http://loco.biosci.arizona.edu/r8s/index.html r8s] for the Windows operating system, so we will use the version installed on the Bioinformatics cluster.  
  
Start the program PuTTY
+
=== Start the program PuTTY and connect to bbcxsrv1.biotech.uconn.edu ===
There should be a shortcut to PuTTY on your desktop.
+
There should be a shortcut to PuTTY on your desktop. Using PuTTY, connect to <tt>bbcxsrv1.biotech.uconn.edu</tt>.
Connect to alleyn
+
In the PuTTY Configuration dialog box that pops up, set the Host Name to alleyn.eeb.uconn.edu. Leave the port set to 22 and the protocol set to SSH, then press the Open button to connect.
+
Enter your username (this will be one of the 20 temporary accounts tmp1 through tmp20)
+
A terminal window with a black background should appear, with the words login as: at the top. This is where you type in the username that I gave you.
+
Enter your password
+
Assuming you typed in tmp1, PuTTY will now respond with tmp1@alleyn.eeb.uconn.edu's password: This is where you type in the usual EEB 349 lab password. Note that Linux is generally case sensitive (unlike the Windows operating system), so be sure to type in the password exactly.
+
Recognizing the Linux prompt
+
You should now see alleyn [/home/tmp1] 1% This is called the Linux prompt. This means that the Linux operating system is waiting for you to give it something to do.
+
Using the ls command to see what files are present
+
The first thing you might be tempted to do is find out what files are in your home directory, /home/tmp1. The Linux command for listing filenames is simply ls (try it). You should see only one file listed: r8s1.7.dist.tar.Z
+
Uncompressing and unpacking the "tar.Z" file (often affectionately referred to as a tarball)
+
The suffix ".Z" on the file name r8s1.7.dist.tar.Z tells you that this file is compressed. To uncompress it use the uncompress command:
+
 
+
alleyn [/home/tmp1] 1% uncompress r8s1.7.dist.tar.Z
+
 
+
Now if you use the ls command, you will find that the only file present is r8s1.7.dist.tar. The ".Z" ending has been removed (because the file is no longer compressed). The ending ".tar" means the file is a "tape archive" (an obsolete term that harks back to the days when this program was used to create big files containing entire directories that could be moved to a magnetic tape backup). A tar file consists of a number of files concatenated (placed end on end). To unpack this file, use the tar command:
+
 
+
alleyn [/home/tmp1] 2% tar xvf r8s1.7.dist.tar
+
 
+
The x means "xtract", the v means "verbose" and the f means "filename is the next thing on the command line after a space". After running this, you should have a directory named dist in addition to the r8s1.7.dist.tar file. Unlike running uncompress, running tar does not change the file name.
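The tar options described above can be tried without the real r8s tarball. The following is only a minimal sketch: the archive name demo.tar and the placeholder files are hypothetical, created just for this illustration. It also uses the t option (list the table of contents), which is not part of the lab steps but is a handy way to peek inside a tar file before unpacking it.

```shell
# Hypothetical demo: build a tiny archive so the commands can be tried
# safely in a scratch directory (none of these files come from r8s).
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p dist/src dist/sample
echo "placeholder" > dist/sample/SAMPLE_SIMPLE
tar cf demo.tar dist            # c = create an archive named demo.tar

# t = list the contents of the archive without extracting anything
listing=$(tar tf demo.tar)
echo "$listing"

# x = extract; note that demo.tar itself is still there afterwards
rm -r dist
tar xf demo.tar
ls
```

Leaving out the v option, as done here, makes tar work silently; adding it back (tar xvf demo.tar) prints each file name as it is unpacked.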
+
Exploring the dist directory
+
Use the cd command and then the ls command to descend into the dist directory and see what is there:
+
 
+
alleyn [/home/tmp1] 3% cd dist
+
alleyn [/home/tmp1/dist] 4% ls
+
bin doc sample src
+
 
+
The four words you see are all directories. The bin directory holds an executable file, but we will ignore this directory because the executable file only works on Macintosh computers running Mac OS X. We will need to create our own executable file by compiling the source code in the src directory.
+
Compiling the source code to create an executable (i.e. runnable) program
+
Descend into the src directory using the cd command once more:
+
 
+
alleyn [/home/tmp1/dist] 5% cd src
+
 
+
Now begin the compile process by typing make:
+
 
+
alleyn [/home/tmp1/dist/src] 6% make
+
 
+
Usually authors of software distributed in this way will provide a project file named Makefile. This is an ordinary text file that contains instructions for building the software from the source code. All you need to do is type make to start the build process. The command make looks by default for a file named Makefile, and if it finds it, begins the build process using the instructions therein.
+
 
+
The build process itself will produce a lot of horribly complicated-looking output, but as long as the last couple of lines don't mention errors, everything worked. r8s is actually written using a combination of Fortran and C, which are two computer programming languages. The build process uses programs called compilers, which translate (compile) the Fortran or C source code (which looks vaguely like English) into a sequence of 0s and 1s that the computer recognizes as a program that it can execute.
+
Copying the executable file to the sample directory
+
The r8s software comes with some example data files. One of these sample files is called SAMPLE_SIMPLE, so let's use that one. First, we need to copy the compiled r8s program from the src directory, where it was just built, to the sample directory where the data file lives:
+
 
+
alleyn [/home/tmp1/dist/src] 7% cp r8s ../sample
+
 
+
This command copies the file r8s from the current directory (/dist/src) to its sister directory (/dist/sample). Now use cd to navigate over to the sample directory, then use ls to see what's there:
+
  
 
  alleyn [/home/tmp1/dist/src] 8% cd ../sample
 
  alleyn [/home/tmp1/dist/src] 8% cd ../sample

Revision as of 01:15, 13 April 2007

Adiantum.png EEB 349: Phylogenetics
The goals of this final lab are to learn how to: (1) set up a NEXUS file for the r8s program; (2) interpret the most important parts of the output of r8s; (3) view a tree with divergence times produced by r8s in TreeView.

Using r8s to estimate divergence times

There is no version of r8s for the Windows operating system, so we will use the version installed on the Bioinformatics cluster.

Start the program PuTTY and connect to bbcxsrv1.biotech.uconn.edu

There should be a shortcut to PuTTY on your desktop. Using PuTTY, connect to bbcxsrv1.biotech.uconn.edu.

alleyn [/home/tmp1/dist/src] 8% cd ../sample
alleyn [/home/tmp1/dist/sample] 9% ls
r8s SAMPLE_CONSTRAINTS SAMPLE_FLU SAMPLE_SIMPLE
SAMPLE_1.7 SAMPLE_CROSSVAL SAMPLE_LOCAL_CLOCK SAMPLE_SUPERTREE
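
The relative path ../sample in the cd command above means "go up one level to the parent directory, then down into its sample subdirectory". Here is a minimal sketch of the same idea in a scratch directory; the directory layout mimics dist, but the r8s file is just an empty, hypothetical placeholder and not the real program.

```shell
# Hypothetical stand-in for the dist directory (for illustration only).
demo=$(mktemp -d)
mkdir -p "$demo/dist/src" "$demo/dist/sample"
touch "$demo/dist/src/r8s"      # empty placeholder, not the real r8s

cd "$demo/dist/src"
cp r8s ../sample                # ".." is the parent (dist), so the target is dist/sample
cd ../sample                    # up one level, then down into sample
here=$(pwd)                     # full path of the directory we ended up in
contents=$(ls)
echo "$here"
echo "$contents"
```

The pwd command ("print working directory") is useful whenever the prompt does not show where you are.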

Inspecting the SAMPLE_SIMPLE file

Use the more command to see the first few lines of the SAMPLE_SIMPLE file. Note that r8s uses the NEXUS format for its input files:

alleyn [/home/tmp1/dist/sample] 10% more SAMPLE_SIMPLE
#NEXUS

[
** A sample data set illustrating use of divergence time estimators in r8s.
]

begin trees;

[
** The following branch lengths were obtained from PAUP using maximum likelihood
]

tree PAUP_9 = [&U] 
 (Marchantia:0.033817,(Lycopodium:0.040281,((Equisetum:0.048533,(Osmunda:0.033640,Asplenium:
0.036526):0.000425):0.011806,((((Cycas:0.009460,Zamia:0.018847):0.005021,Ginkgo:0.014702):1.687e-86,((Pinus:0.
   021500,(Podocarpac:0.015649,Taxus:0.021081):0.006473):0.002448,(Ephedra:0.029965,(Welwitsch:0.011298,Gnetum:0.
   014165):0.006883):0.016663):0.006309):0.010855,((Nymphaea:0.016835,(((((Saururus:0.019902,Chloranth:0.020151):
   1.687e-86,((Araceae:0.020003,(Palmae:0.006005,Oryza:0.031555):0.002933):0.007654,Acorus:0.038488):0.007844):1.
   777e-83,(Calycanth:0.013524,Lauraceae:0.035902):0.004656):1.687e-86,((Magnolia:0.015119,Drimys:0.010172):0.005
   117,(Ranunculus:0.029027,((Nelumbo:0.006180,Platanus:0.002347):0.003958,(Buxaceae:0.013294,((Pisum:0.035675,(F
   agus:0.009848,Carya:0.008236):0.001459):0.001994,(Ericaceae:0.019136,Solanaceae:0.041396):0.002619):1.687e-86)
   :0.004803):1.687e-86):0.006457):0.002918):0.007348,Austrobail:0.019265):1.687e-86):1.687e-86,Amborella:0.01926
   3):0.003527):0.021625):0.012469):0.019372);
End;

[** Beginning of the rates block containing commands for r8s **]

begin rates;

[* The next line is REQUIRED.]

blformat nsites=952 lengths=persite;

--More-- (43%)

You can see more of the file by pressing the spacebar, and you can quit viewing the file by pressing the letter q (for quit).

Dissecting the data file

See if you can answer these questions just by looking at the SAMPLE_SIMPLE file:

Which NEXUS blocks are present? (Hint: you should find 2 NEXUS blocks in this file)

Why does r8s need to know the number of sites? (as in the "blformat nsites=952 lengths=persite" command)

Below is the tree specified in the SAMPLE_SIMPLE file: