Phylogenetics: Modeltest Lab

From EEBedia

Revision as of 00:52, 12 March 2007

This article is still under construction.
Expect it to change frequently until this notice is removed.
EEB 349: Phylogenetics
The goal of this lab exercise is to (a) introduce you to Modeltest, which uses PAUP* to perform likelihood ratio tests and organizes the results to make it easier to choose a substitution model, and (b) give you experience with probability distributions by plotting several gamma distributions in Excel and computing areas under portions of these distributions.

Part A: Modeltest

Part B: Gamma distributions in Excel

Plotting a gamma distribution

Create a plot of a Gamma(10, 0.5) distribution in Excel by following these instructions:

  • Open Microsoft Excel
  • In cell A5, enter the word shape, and in the next cell to the right (B5), enter the numerical value 10.0
  • In cell A6, enter the word scale, and in the next cell to the right (B6), enter the numerical value 0.5
  • In cell A7, enter the word mean, and in the next cell to the right (B7), enter the cell formula =B5*B6 (the mean of a gamma distribution is the product of the shape and scale parameters)
  • In cell A8, enter the word variance, and in the next cell to the right (B8), enter the cell formula =B5*B6^2 (the variance of a gamma distribution is the shape times the scale parameter squared)
  • In cell A10, enter 0.0
  • In cell A11, enter =A10+0.1
  • Now select cell A11, then drag downward using the black square "handle" in the lower right corner. Drag down until you get to cell A220. Cell A220 should now have the value 21.
  • Now move back to the top (Ctrl-Home) and enter the following in cell B10:
=gammadist(A10, $B$5, $B$6, false)
  • Select cell B10, hover your mouse over the black handle until the cursor changes to a solid black cross, then double-click to automatically copy that cell's formula all the way down to B220
  • Select cells B10:B220, then choose Insert > Chart... from the main menu to bring up the Chart Wizard
  • Choose Chart type Line, click on the upper-left chart sub-type (plain line), then click the Next button
  • Click the Series tab, then click inside the box labeled Category (X) axis labels. Move over to the worksheet again, and select all cells in the range A10:A220 (the fast way is to select just A10, then press Shift-End followed by Shift-Downarrow). When the range A10:A220 is surrounded by a dotted line, click the Finish button in the Chart Wizard
  • Excel will create the chart close to the bottom of the selected cells. Grab the chart and move it to the top of the worksheet to make it easier to see the effect of modifying shape and scale parameters.
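The spreadsheet steps above can also be mirrored outside Excel. The following is a minimal Python sketch that uses only the standard library. It assumes that Excel's GAMMADIST(x, shape, scale, FALSE) returns the usual shape/scale gamma density. The helper name gamma_pdf is hypothetical and chosen only for illustration. The code rebuilds columns A and B, along with the mean and variance cells.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density x^(shape-1) e^(-x/scale) / (Gamma(shape) scale^shape).
    Hypothetical helper meant to play the role of GAMMADIST(x, shape, scale, FALSE)."""
    if x < 0:
        return 0.0
    if x == 0:
        if shape < 1:
            return math.inf          # density diverges at zero
        return 1.0 / scale if shape == 1 else 0.0
    # Work on the log scale so large shape values do not overflow
    log_f = ((shape - 1) * math.log(x) - x / scale
             - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_f)

shape, scale = 10.0, 0.5                # cells B5 and B6
mean = shape * scale                    # cell B7: =B5*B6
variance = shape * scale ** 2           # cell B8: =B5*B6^2

xs = [round(0.1 * i, 1) for i in range(211)]           # A10:A220, 0.0 up to 21.0
densities = [gamma_pdf(x, shape, scale) for x in xs]   # B10:B220

print(mean, variance)                   # 5.0 2.5
peak_x = xs[densities.index(max(densities))]
print(peak_x)                           # 4.5, i.e. (shape - 1) * scale
```

Changing shape and scale and rerunning the script is the counterpart of editing cells B5 and B6.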

You have just plotted a Gamma probability density function having shape parameter 10 and scale parameter 0.5. You should try several other combinations of shape and scale parameters to see what these gamma distributions look like. Answer these questions before moving on:

  • What values of the shape parameter cause cell B10 to display #NUM! (indicating that the curve shoots off to infinity at values approaching zero)?
  • What values of the shape parameter cause the curve to peak, dropping to zero on the left and right?

Exponential distributions

An Exponential distribution is a special case of the gamma distribution that occurs when the shape parameter is exactly 1.0. Set the shape parameter to 1.0 and play with various values (e.g. 2, 3, 4, 5, etc.) of the scale parameter. Can you determine a pattern by looking at where the curve hits the y-axis when x = 0? Note that you should ignore the value in cell B10 because Excel does not calculate this value correctly for some reason; instead, just sight along the curve as it approaches x = 0.0. You should find that the value of the exponential density function at zero is a function of its mean. What is this relationship?
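One way to check the pattern numerically is sketched below in Python. It assumes that the exponential density is simply the gamma density with shape 1. The sketch evaluates the gamma density just to the right of zero, which imitates sighting along the curve. It then compares that value with the exponential density evaluated at exactly zero. The helper names are hypothetical.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density for x > 0 (shape/scale parameterization)."""
    return math.exp((shape - 1) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def exponential_pdf(x, scale):
    """Exponential density with the same scale parameter, valid for x >= 0."""
    return math.exp(-x / scale) / scale

shape = 1.0
for scale in [2.0, 3.0, 4.0, 5.0]:
    mean = shape * scale
    near_zero = gamma_pdf(1e-9, shape, scale)   # just to the right of x = 0
    at_zero = exponential_pdf(0.0, scale)
    print(f"mean = {mean}: density near 0 = {near_zero:.4f}, at 0 = {at_zero:.4f}")
```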

Gamma distributions and rate heterogeneity

We have encountered gamma distributions previously in the course. They are used to allow rate heterogeneity in substitution models. You may remember that a gamma distribution having a small shape parameter (e.g. 0.1) implies a lot of rate heterogeneity, whereas a large shape parameter (e.g. 100) implies almost no rate heterogeneity. In the rate heterogeneity application, the scale parameter is always equal to the inverse of the shape parameter (which is why no one ever mentions the gamma scale parameter in discussions of rate heterogeneity). Try plotting a gamma distribution with shape = 100 (and scale = 0.01). Now plot a gamma distribution in which shape = 0.1 (and scale = 10).

  • Why is scale = 1/shape in this application? Hint: it has to do with ensuring that the mean of the relative rates is 1.0.
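The arithmetic behind the hint can be checked with a short Python sketch. It uses the mean and variance formulas given earlier (mean = shape*scale, variance = shape*scale^2). It also draws samples with Python's random.gammavariate, whose second argument is a scale parameter. The function name relative_rate_gamma is hypothetical, and the seed is arbitrary.

```python
import random

def relative_rate_gamma(shape):
    """Return (scale, mean, variance) when scale is set to 1/shape."""
    scale = 1.0 / shape
    mean = shape * scale            # shape * (1/shape) = 1
    variance = shape * scale ** 2   # shape * (1/shape)^2 = 1/shape
    return scale, mean, variance

rng = random.Random(349)            # arbitrary fixed seed for repeatable draws
for shape in [0.1, 1.0, 100.0]:
    scale, mean, variance = relative_rate_gamma(shape)
    rates = [rng.gammavariate(shape, scale) for _ in range(100_000)]
    sample_mean = sum(rates) / len(rates)
    print(f"shape = {shape}: scale = {scale:g}, mean = {mean:g}, "
          f"variance = {variance:g}, sample mean of rates = {sample_mean:.3f}")
```

Every choice of shape keeps the mean relative rate at 1. Only the spread changes: shape 0.1 gives variance 10, while shape 100 squeezes the rates tightly around 1.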

Finding areas under a Gamma density curve

Use column C of your worksheet to find the cumulative area under the gamma density from 0.0 up to each value in the A column:

  • In cell C10, enter this formula:
=gammadist(A10, $B$5, $B$6, true)
  • Replicate the formula down to C220, then create a line plot as you did for the B column

The formula for obtaining the cumulative area under the density is nearly the same as that for obtaining the density itself. The only difference is whether the fourth argument to the gammadist function is true (cumulative distribution) or false (density function).
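Column C can also be reproduced in Python. This is a sketch, not Excel's own algorithm. It assumes that GAMMADIST(x, shape, scale, TRUE) equals the regularized lower incomplete gamma function P(shape, x/scale). It evaluates P with the standard power series when x/scale < shape + 1, and otherwise with a continued fraction for the complement (the combination popularized by Numerical Recipes). Names such as gamma_cdf are hypothetical.

```python
import math

EPS = 1e-14
FPMIN = 1e-300
MAX_ITER = 1000

def _lower_series(a, z):
    """Series for the regularized lower incomplete gamma P(a, z); used when z < a + 1."""
    term = total = 1.0 / a
    ap = a
    for _ in range(MAX_ITER):
        ap += 1.0
        term *= z / ap
        total += term
        if abs(term) < abs(total) * EPS:
            break
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def _upper_cf(a, z):
    """Continued fraction (modified Lentz) for Q(a, z) = 1 - P(a, z); used when z >= a + 1."""
    b = z + 1.0 - a
    c = 1.0 / FPMIN
    d = 1.0 / b
    h = d
    for i in range(1, MAX_ITER):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < FPMIN:
            d = FPMIN
        c = b + an / c
        if abs(c) < FPMIN:
            c = FPMIN
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < EPS:
            break
    return math.exp(-z + a * math.log(z) - math.lgamma(a)) * h

def gamma_cdf(x, shape, scale):
    """Cumulative area from 0 to x, intended to match GAMMADIST(x, shape, scale, TRUE)."""
    if x <= 0:
        return 0.0
    z = x / scale
    if z < shape + 1.0:
        return _lower_series(shape, z)
    return 1.0 - _upper_cf(shape, z)

shape, scale = 10.0, 0.5
xs = [round(0.1 * i, 1) for i in range(211)]      # column A
cdf = [gamma_cdf(x, shape, scale) for x in xs]    # column C
print(cdf[-1])                                    # counterpart of cell C220
```

Setting shape to 1.0 gives a quick sanity check. The result should match the closed-form exponential CDF, 1 - exp(-x/scale).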

Set the shape and scale of your gamma distribution back to 10 and 0.5, respectively, then answer these questions (write down the numbers you find for later comparison):

  • What is the value now in cell C220? Does this value make sense?
  • As the x value increases, the area under the density accumulates. At what x value does the cumulative area first exceed 0.95?
  • What is the median of this distribution? Hint: find the x value at which the cumulative area is as close as possible to 0.5.
  • What interval captures the middle 95% of this distribution? To answer this, find the x values at which the cumulative area reaches 2.5% and 97.5%. You are unlikely to find the interval precisely, but see how close you can get with the given spacing of x values.
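The questions above amount to reading quantiles off column C. The Python sketch below answers them two ways for the Gamma(10, 0.5) case. First, it scans the 0.1-spaced grid, as you would in the spreadsheet. Second, it refines each answer by bisection. To stay short, the sketch assumes an integer shape. For an integer shape, the gamma CDF has the closed (Erlang) form 1 - e^(-z) * sum over k < shape of z^k/k!, with z = x/scale. All function names are hypothetical. Bisection is only one way to invert a CDF, and it is not necessarily the method Excel's GAMMAINV uses.

```python
import math

def erlang_cdf(x, shape, scale):
    """Gamma CDF for a whole-number shape: 1 - e^(-z) * sum_{k<shape} z^k / k!."""
    if x <= 0:
        return 0.0
    z = x / scale
    term, total = 1.0, 1.0          # k = 0 term of the sum
    for k in range(1, int(shape)):
        term *= z / k
        total += term
    return 1.0 - math.exp(-z) * total

shape, scale = 10, 0.5
xs = [round(0.1 * i, 1) for i in range(211)]      # column A
cdf = [erlang_cdf(x, shape, scale) for x in xs]   # column C

def first_x_exceeding(p):
    """Spreadsheet-style answer: smallest tabulated x whose cumulative area exceeds p."""
    return next(x for x, c in zip(xs, cdf) if c > p)

def quantile(p, lo=0.0, hi=100.0, tol=1e-10):
    """Finer answer: bisect on the CDF until the bracket is narrower than tol."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, shape, scale) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for p in [0.025, 0.5, 0.95, 0.975]:
    print(p, first_x_exceeding(p), round(quantile(p), 3))
```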

Drawing samples from a Gamma distribution

In a separate worksheet, try implementing these instructions:

  • In cell A1, type
=rand()
  • Replicate this cell down to A1000. You now have a column containing 1000 uniform(0,1) deviates. A uniform(0,1) deviate is a sample chosen at random from a Uniform distribution between 0.0 and 1.0.
  • In cell B1, type
=gammainv(A1, 10, 0.5)
  • Replicate this cell down to B1000. You now have (in column B) 1000 random gamma(10, 0.5) deviates.
  • Find the sample mean of the 1000 deviates in column B by typing the following in cell D1
=average(B1:B1000)
  • Find the sample standard deviation of the 1000 deviates by typing the following into cell D2
=stdev(B1:B1000)
  • Find the sample variance by squaring the sample standard deviation. That is, in cell D3, type
=D2^2

Press F9 a few times to recalculate, and watch the mean and variance fluctuate. Recall that for a gamma distribution with shape = 10 and scale = 0.5, the mean is 5 (shape*scale) and the variance is 2.5 (shape*scale^2).

  • Do they fluctuate around the theoretical mean and variance?
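The worksheet above performs inverse-transform sampling: pushing uniform deviates through the inverse CDF (GAMMAINV) yields gamma deviates. A Python sketch of the same experiment follows, where each call of one_recalculation stands in for pressing F9. It makes the same integer-shape (Erlang) CDF assumption and inverts the CDF by bisection, which may differ from Excel's method. It uses statistics.stdev, which divides by n - 1, as Excel's STDEV does. All names are hypothetical, and the seed is arbitrary.

```python
import math
import random
import statistics

SHAPE, SCALE = 10, 0.5     # integer shape assumed by erlang_cdf

def erlang_cdf(x, shape, scale):
    """Gamma CDF for a whole-number shape: 1 - e^(-z) * sum_{k<shape} z^k / k!."""
    if x <= 0:
        return 0.0
    z = x / scale
    term, total = 1.0, 1.0
    for k in range(1, int(shape)):
        term *= z / k
        total += term
    return 1.0 - math.exp(-z) * total

def gamma_inv(p, shape, scale, tol=1e-9):
    """Find x with erlang_cdf(x) = p by bisection (standing in for GAMMAINV)."""
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, shape, scale) < p:   # widen the bracket until it contains p
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, shape, scale) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def one_recalculation(rng, n=1000):
    """One 'F9 press': n uniforms (column A) mapped to n gamma deviates (column B)."""
    uniforms = [rng.random() for _ in range(n)]                  # =rand()
    deviates = [gamma_inv(u, SHAPE, SCALE) for u in uniforms]    # =gammainv(A1, 10, 0.5)
    mean = statistics.mean(deviates)                             # =average(...)
    sd = statistics.stdev(deviates)                              # =stdev(...)
    return mean, sd ** 2                                         # =D2^2

rng = random.Random(2007)
for press in range(1, 6):
    mean, var = one_recalculation(rng)
    print(f"recalculation {press}: mean = {mean:.3f}, variance = {var:.3f}")
print(f"theory: mean = {SHAPE * SCALE}, variance = {SHAPE * SCALE ** 2}")
```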