Difference between revisions of "Phylogenetics: Bioinformatics Cluster"

From EEBedia

Revision as of 03:17, 18 February 2007

This article is still under construction.
Expect it to change frequently until this notice is removed.
EEB 349: Phylogenetics
The goal of this lab exercise is to show you how to use the Bioinformatics Facility computer cluster to run PAUP* and GARLI.

Part A: Using the UConn Bioinformatics Facility cluster

The Bioinformatics Facility is part of the UConn Biotechnology Center, which is located behind the Up-N-Atom Cafe in the lower level of the Biology/Physics building. Jeff Lary maintains a 17-node Apple Xserve G5 Cluster that can be used by UConn graduate students and faculty to conduct bioinformatics-related research (sequence analysis, biological database searches, phylogenetics, molecular evolution). You have each been given accounts on the cluster, and today you will learn how to start analyses remotely (i.e. from this computer lab), check on their status, and download the results when your analysis is finished.

Obtaining the necessary communications software

You will be using a couple of simple (and free) programs to communicate with the head node of the cluster. Visit the PuTTY web site, scroll down to the section labeled "Binaries" and save putty.exe and pscp.exe on your desktop.

PuTTY

The program PuTTY will allow you to communicate with the cluster using a protocol known as SSH (Secure Shell) that encrypts everything sent over the internet. You will use PuTTY to send commands to the cluster and see the output generated. In the old days, a protocol known as Telnet was used for this purpose, but it is no longer used because it did not encrypt anything, making it easy for someone with access to the network to snatch your password.

PSCP

The other program you will use is called PSCP. It allows you to transfer files back and forth using the SCP (Secure Copy) protocol. It replaces the old FTP protocol that, like Telnet, sends usernames and passwords unencrypted across the network.

Programs vs. protocols

SSH and SCP are protocols, not programs. PuTTY and PSCP are programs that implement the SSH and SCP protocols, respectively. A little while from now, you may be thinking "I liked FTP much better than SCP!" because you are used to user-friendly, graphical FTP programs. There are much fancier programs for using SSH and SCP than PuTTY and PSCP, but these will serve us well today. The nice thing is that these programs are so small that you can just download them whenever and wherever you happen to need them. If you find yourself wanting a fancier graphical client for secure file transfer, check out FileZilla (on Windows) or Fugu (for Macs).

Logging in for the first time

On the whiteboard you will find your login id (user name) and password.

Double-click the PuTTY icon on your desktop to start the program. In the Host Name (or IP address) box, type bbcxsrv1.biotech.uconn.edu. Now type Bioinformatics cluster into the Saved Sessions box and press the Save button. This will save having to type the computer's name each time you want to connect. Now click the Open button to start a session.

The first time you connect, you will get a PuTTY Security Alert. Just press the Yes button to close this dialog.

Now you should see the following prompt:

login as:

Type in your username and press Enter. Now you should see the password prompt:

Password:

Type in your password and press Enter. If all goes well, you should see something like this:

Welcome to Darwin!
[bbcxsrv1:~] plewis%

except that your username should appear instead of mine (plewis).

The first thing you should do is change your password. Type

passwd

and press the Enter key, then follow the directions to change your password. If you have trouble thinking up passwords that are acceptable, check out the Java Password Generator web site. It generates passwords that are not really words but sound like they are, so they are easier to remember than completely random passwords.

Learning enough UNIX to get around

I'm presuming that you do not know a lot of UNIX commands. If you are already a UNIX guru, you can skip to the next section. UNIX is the operating system upon which Mac OS X is built. You are actually communicating with a Macintosh G5 computer running Mac OS X, but you will be using the command console rather than menus today.

ls command: finding out what is in the present working directory

The ls command lists the files in the present working directory. Try typing just

ls

If you need more details about files than you see here, type

ls -la

instead. This version provides information about file permissions, ownership, size, and last modification date (the l part), and it also lists hidden files whose names begin with a period (the a part).
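Here is a minimal sketch of the difference between the two forms. It assumes a Bourne-style shell (sh or bash) and works in a scratch directory made with mktemp so that none of your own files are touched. The file names run.nex and .hidden are made up for illustration.

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
scratch=$(mktemp -d)
cd "$scratch"

# Create one ordinary file and one hidden ("dot") file. Both names are
# hypothetical and exist only for this demonstration.
touch run.nex .hidden

plain=$(ls)          # plain ls skips names that begin with a period
detailed=$(ls -la)   # -l adds permissions, owner, size, date; -a adds dot files

echo "$plain"
echo "$detailed"
```

The first listing should show only run.nex, while the second should include .hidden along with the extra columns.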

pwd command: finding out what directory you are in

Typing

pwd

shows you the full path of the present working directory. The path shown should end with your username, indicating that you are currently in your home directory.

mkdir command: creating a new directory

Typing the following command will create a new directory named pauprun in your home directory:

mkdir pauprun

Use the ls command now to make sure a directory of that name was indeed created.

cd command: leaving the nest and returning home again

The cd command lets you change the present working directory. To move into the newly-created pauprun directory, type

cd pauprun

You can always go back to your home directory (no matter how lost you get!) by typing cd by itself:

cd

Use cd now to return to your home directory.
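The pwd, mkdir and cd steps above can be strung together as a short sketch. It assumes a Bourne-style shell (sh or bash), and it temporarily points HOME at a scratch directory so that it never touches your real home directory. The variable names real_home, scratch_home, inside and back are made up for illustration.

```shell
# Assumption: HOME is temporarily pointed at a scratch directory so the sketch
# never touches your real home directory. It is restored at the end.
real_home=$HOME
HOME=$(mktemp -d)
export HOME
scratch_home=$HOME

cd                   # cd by itself always lands in $HOME
mkdir pauprun        # create the new directory
cd pauprun           # move into it
inside=$(pwd)        # the path now ends in /pauprun
cd                   # and back home again
back=$(pwd)

echo "$inside"
echo "$back"

HOME=$real_home      # put the real home directory back
```

The first path printed should end in /pauprun and the second should be the scratch "home" directory, which mirrors what you see on the cluster when you move into pauprun and then return home.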

cat command: creating and viewing files

The cat command was designed for concatenating files, but I most often use it for viewing and creating files. To create a new file named gopaup, type the following (be sure to leave a space between each item, just as you see it below) and then press the Enter key:

cat - > gopaup

Note that you no longer see the UNIX prompt, and the system appears to be hung. This is OK! The text you typed is admittedly somewhat cryptic:

  • The hyphen (-) after the word cat means "use text typed from the console"
  • The greater-than symbol (>) means "redirect the output to a file"
  • The gopaup part is the name of the file to which the output will be redirected

The cat command is now waiting for you to type something. Type the following and, when finished, press the Ctrl-d key combination to tell cat that you are done:

#$ -o junk.txt -j y
cd $HOME/pauprun
paup run.nex

You can now use the cat command again to view the contents of the file you just created:

cat gopaup

Basically, cat just spews out the contents of whatever you give it to work with. If you give it a file name, it spews out the contents of the file. If you give it a hyphen, it reads text you type until you press Ctrl-d, then it spews that text out again. In your case, when you used the hyphen, you also told it to redirect its output to the file gopaup, so that's why you did not see what it spewed.
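The three uses of cat described here can be tried without any typing at the keyboard. In this minimal sketch, which assumes a Bourne-style shell (sh or bash), printf stands in for the text you would type and the end of its output plays the role of Ctrl-d. The scratch directory and the files a.txt and b.txt are made up for illustration.

```shell
# Work in a throwaway directory so an existing gopaup file is not overwritten.
scratch=$(mktemp -d)
cd "$scratch"

# Reading from standard input: the hyphen tells cat to read what printf sends,
# and > redirects cat's output into the file gopaup. Single quotes keep the
# shell from expanding $HOME, so the text is stored exactly as written.
printf '%s\n' '#$ -o junk.txt -j y' 'cd $HOME/pauprun' 'paup run.nex' | cat - > gopaup

# Viewing a file: given a file name, cat prints that file's contents.
cat gopaup

# Concatenating: given several file names, cat prints them one after another.
printf 'first\n' > a.txt
printf 'second\n' > b.txt
joined=$(cat a.txt b.txt)
echo "$joined"
```

Because the output of the first cat is redirected, nothing appears on the screen until the second cat prints the file.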

Using PSCP to upload a file

Locate the file algae.nex that we used in the previous lab. If you have deleted it, you will need to download and save it again.

Part B: Starting a PAUP* run on the cluster

Part C: Starting a GARLI run on the cluster