Difference between revisions of "Phylogenetics: BayesTraits Lab"

From EEBedia
Revision as of 21:47, 15 April 2007

This article is still under construction.
Expect it to change frequently until this notice is removed.
EEB 349: Phylogenetics
In this lab you will learn how to use the program BayesTraits, written by Andrew Meade and Mark Pagel. BayesTraits can perform several analyses related to evaluating evolutionary correlation in discrete morphological traits. This program is meant to replace the older programs Discrete and Multistate. You will learn not only how to use the program on the Windows-based PCs in the computer lab, but also how to download and use it on the cluster (the cluster is better for long runs).

We will use BayesTraits interactively for a while on the PCs in the computer lab (Part 1), then we will set up a non-interactive run on the cluster (Part 2) so that you know how to do this.

Part 1: Running BayesTraits under Windows

Download BayesTraits

BayesTraits has not been installed on the machines in this room, so you will need to download it yourself. Go to Mark Pagel's web site, click on the "Software" link, then click on the "Description and Downloads" link under "BayesTraits". Finally, click on the "BayesTraits - Windows" link to download a zip file containing the program itself and some sample tree and data files. Right-click the BayesTraits-PC-V1.0.zip file and choose "Extract to BayesTraits-PC-V1.0" to unpack it on your local hard drive. Navigate to the BayesTraits-PC-V1.0\BayesTraits folder and verify that it contains the BayesTraits.exe file, as well as the PPI.txt, PPI.trees, Primates.txt and Primates.trees example files. I will hereafter refer to this folder as simply the BayesTraits folder. Go back to Mark Pagel's web site and download the manual for BayesTraits. This is a PDF file and should open in your browser window.

Download the modified example files

You will be going through the tutorial presented in the manual for the program during this lab, but there are a couple of modifications we need to make to the example data files first:

Use Primates.first.tree instead of Primates.trees

The Primates.trees file that comes with BayesTraits contains 500 trees, which makes any analysis take a very long time. We'll avoid the long waits by using a version of this file that contains only the first tree. Download Primates.first.tree and save it in your BayesTraits folder. Whenever the tutorial refers to the file Primates.trees, use Primates.first.tree instead.

Obtain the missing MatingSystem.txt file

The first part of the tutorial in the manual will not work out of the box because it assumes you have the file MatingSystem.txt, which is not included in the distribution. It turns out that the missing MatingSystem.txt is just Primates.txt with the first of the two characters deleted. I've done the modification for you, so download the MatingSystem.txt file now and save it in your BayesTraits folder.

Do the tutorial

Work through the tutorial starting on p. 10 of the BayesTraits draft manual PDF file (but only after reading the Tutorial Notes section below). The heading of the section is "Using MultiState to estimate the model of evolution and ancestral states for a binary trait". Stop when you get to the "Functional Gene Links" section (p. 18 of the manual).

Tutorial Notes

Remember throughout the tutorial to use Primates.first.tree instead of Primates.trees! Note that your output will only correspond to that of tree number 1 in the sample output from the BayesTraits manual.

BayesTraits must be run from the command line, which means you must open a command window to run the program. Simply double-clicking BayesTraits.exe will cause it to run, but not for long! When you double-click the program, you have no way to tell it which tree and data files to use, so it quits immediately.

There are two ways to get a command window (or shell) in which you can run BayesTraits. The first (not preferred) method is to click on the Start button, and choose Run..., then type cmd in the dialog box that appears. Pressing the Enter key will get you a command window, but you will need to navigate (using the unfriendly cd command) to your BayesTraits directory before starting the tutorial.

The second, and preferred, approach is to create a Windows batch file. In your BayesTraits folder, create a file named run_matingsystem.bat that contains the following text:

 BayesTraits Primates.first.tree MatingSystem.txt
 pause

Double-clicking run_matingsystem.bat will open a command window and start the BayesTraits program, saving the trouble of having to type the name of the tree file and data file each time you start the program. The pause command means that the window will stay open after BayesTraits finishes.

This batch file will work for the example involving MatingSystem.txt. Later, however, the tutorial switches to using the data file Primates.txt. At this point, you might want to create a second batch file named run_primates.bat containing the following text:

 BayesTraits Primates.first.tree Primates.txt
 pause

One final note: the default number of MCMC iterations is 5,050,000, which takes some time to run. For our purposes, it is fine to reduce this number. For example, to tell BayesTraits to run for only 550,000 iterations, type the following command before you type run:

 it 550000

There is a listing of all commands recognized by BayesTraits at the end of the manual.

Part 2: Running BayesTraits on the cluster

We will now switch to using the cluster to run BayesTraits. BayesTraits is not installed on the cluster, so you will need to download and unpack it into your home directory in order to use it. Using PuTTY, connect to bbcxsrv1.biotech.uconn.edu to get a command prompt.

Downloading and unpacking BayesTraits on the cluster

The full URL to the OS X PPC version of BayesTraits is

http://www.evolution.reading.ac.uk/Files/BayesTraits-OSX-PPC-V1.0.tar.gz

If you had a browser open and typed in this URL, your browser would save the file BayesTraits-OSX-PPC-V1.0.tar.gz on your hard drive. On the cluster, however, you are not using a browser; you are logged in through the Secure Shell client program PuTTY.

You could download the file to your PC, then upload it to the cluster using PSFTP, but let's instead use the curl command:

 curl -o BayesTraits-OSX-PPC-V1.0.tar.gz http://www.evolution.reading.ac.uk/Files/BayesTraits-OSX-PPC-V1.0.tar.gz

This tells curl to access the specified URL (curl will stand in for a web browser) and save the resulting file as BayesTraits-OSX-PPC-V1.0.tar.gz.
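A failed download can leave behind an empty file, or no file at all, so it may be worth checking the result before unpacking. The following is a minimal sketch; the function name check_download is made up for this example and is not part of curl or BayesTraits.

```shell
# Hypothetical helper (the name is an assumption, not a standard command):
# report whether a downloaded file exists and is non-empty.
check_download() {
    # -s is true only if the file exists and has a size greater than zero
    if [ -s "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or empty: $1"
        return 1
    fi
}
```

After running the curl command, typing check_download BayesTraits-OSX-PPC-V1.0.tar.gz should report whether the file arrived. Using ls -l gives the same information by eye, since it shows the file size.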

Once you have the file in your home directory (use the ls command to check), you will need to unpack it using the tar command:

 tar zxvf BayesTraits-OSX-PPC-V1.0.tar.gz

The file extension .tar.gz has a specific meaning: the .tar part means that the file is actually an archive (bundle) comprising several files saved one after the other. The .gz part means that the archive has been compressed using the gzip program. The tar command can both ungzip the archive (that's what the z in zxvf means) and separate the component files (the x in zxvf stands for extract). The v in zxvf means verbose (tar will show you what it's doing), and the f simply means that the name of the archive file to read follows (i.e. BayesTraits-OSX-PPC-V1.0.tar.gz).
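If you would like to try these flags without touching the BayesTraits archive, the sketch below builds a small throwaway archive in a temporary directory, lists its contents, and then extracts it. The file and directory names (demo, a.txt, b.txt) are invented for this example. It also uses two flags not needed above: c (create an archive) and t (list the contents without extracting).

```shell
# Work in a scratch directory so nothing in your home directory is affected.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Make a small directory with two files (names are made up for this demo).
mkdir demo
echo "hello" > demo/a.txt
echo "world" > demo/b.txt

# c = create, z = compress with gzip, f = archive file name follows
tar czf demo.tar.gz demo
rm -r demo

# t = list the archive contents without extracting anything
listing=$(tar tzf demo.tar.gz)
echo "$listing"

# x = extract; add v (as in zxvf) to see each file name as it is unpacked
tar xzf demo.tar.gz
extracted=$(cat demo/a.txt)
echo "$extracted"
```

Listing with tar tzf before extracting is a handy way to see what directory an archive will create.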

Once the tar command has completed, you should have a directory named BayesTraits. Use the cd command to move into that directory, then use the ls command to see what's there. You should see the same 5 files as before, except the executable (BayesTraits) does not have the .exe file name extension this time (that extension typically denotes Windows executables).
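One possible way to set up a non-interactive run is to put the responses you would normally type into a text file and feed that file to the program using the shell's input redirection (<). The sketch below is only an illustration under several assumptions: the file names commands.txt and run.log are made up, the two leading numbers stand in for whatever menu choices you typed interactively in Part 1, and the tree and data files must already have been copied into the BayesTraits directory on the cluster.

```shell
# Hypothetical command file: one line per response you would type interactively.
# The first two lines are ASSUMED menu choices (model, then analysis method);
# replace them with whatever you actually typed in Part 1.
cmdfile=commands.txt
printf '%s\n' 1 2 'it 550000' run > "$cmdfile"

# Start the run only if the executable and both input files are present.
# The leading ./ tells the shell to look for BayesTraits in the current directory.
if [ -x ./BayesTraits ] && [ -f Primates.first.tree ] && [ -f MatingSystem.txt ]; then
    ./BayesTraits Primates.first.tree MatingSystem.txt < "$cmdfile" > run.log
fi
```

The output that would normally scroll past on the screen ends up in run.log, which you can inspect afterwards with a command such as less run.log.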