Difference between revisions of "Phylogenetics: Modeltest Lab"

From EEBedia

Revision as of 20:42, 11 March 2007

This article is still under construction.
Expect it to change frequently until this notice is removed.
EEB 349: Phylogenetics
The goal of this lab exercise is to (a) introduce you to Modeltest, which uses PAUP* to perform likelihood ratio tests and organizes the results to make it easier to choose a substitution model, and (b) give you experience with probability distributions by plotting several gamma distributions in Excel and computing areas under portions of these distributions.

Part A: Modeltest

Part B: Gamma distributions in Excel

Plotting a gamma distribution

Create a plot of a Gamma(10, 0.5) distribution in Excel by following these instructions:

  • Open Microsoft Excel
  • In cell A5, enter the word shape, and in the next cell to the right (B5), enter the numerical value 10.0
  • In cell A6, enter the word scale, and in the next cell to the right (B6), enter the numerical value 0.5
  • In cell A7, enter the word mean, and in the next cell to the right (B7), enter the cell formula =B5*B6 (the mean of a gamma distribution is the product of the shape and scale parameters)
  • In cell A8, enter the word variance, and in the next cell to the right (B8), enter the cell formula =B5*B6^2 (the variance of a gamma distribution is the shape times the scale parameter squared)
  • In cell A10, enter 0.0
  • In cell A11, enter =A10+0.1
  • Now select cell A11, then drag downward using the black square "handle" in the lower right corner. Drag down until you get to cell A220. Cell A220 should now have the value 21.
  • Now move back to the top (Ctrl-Home) and enter the following in cell B10:
=gammadist(A10, $B$5, $B$6, false)
  • Select cell B10, hover your mouse over the black handle until the cursor changes to a solid black cross, then double-click to automatically copy that cell's formula all the way down to B220
  • Select cells B10:B220, then choose Insert > Chart... from the main menu to bring up the Chart Wizard
  • Choose Chart type Line, click on the upper-left chart sub-type (plain line), then click the Next button
  • Click the Series tab, then click inside the box labeled Category (X) axis labels. Move over to the worksheet again, and select all cells in the range A10:A220 (the fast way is to select just A10, then press Shift-End followed by Shift-Downarrow). When the range A10:A220 is surrounded by a dotted line, click the Finish button in the Chart Wizard
  • Excel will create the chart close to the bottom of the selected cells. Grab the chart and move it to the top of the worksheet to make it easier to see the effect of modifying shape and scale parameters.
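If you would like to cross-check the worksheet outside Excel, the following is a minimal Python sketch of the same calculation. It assumes the standard gamma density in the shape/scale parameterization used above. The function name gamma_pdf and the variable names are illustrative and not part of the lab materials. The point x = 0 is deliberately left out of the grid because its value depends on the shape parameter.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density at x > 0, parameterized by shape and scale."""
    if x <= 0:
        raise ValueError("this sketch only evaluates the density for x > 0")
    # Work on the log scale to avoid overflow for large shape values.
    log_density = ((shape - 1.0) * math.log(x) - x / scale
                   - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_density)

shape, scale = 10.0, 0.5
mean = shape * scale            # same as worksheet cell B7
variance = shape * scale ** 2   # same as worksheet cell B8

# x values 0.1, 0.2, ..., 21.0 (worksheet cells A11:A220)
xs = [i / 10 for i in range(1, 211)]
ys = [gamma_pdf(x, shape, scale) for x in xs]

# Location of the highest point on the plotted curve
peak_x = xs[ys.index(max(ys))]
print(f"mean = {mean}, variance = {variance}, curve peaks near x = {peak_x}")
```

Changing shape and scale at the top of the script plays the same role as editing cells B5 and B6.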

You have just plotted a Gamma probability density function having shape parameter 10 and scale parameter 0.5. You should try several other combinations of shape and scale parameters to see what these gamma distributions look like. Answer these questions before moving on:

  • What values of the shape parameter cause cell B10 to display #NUM! (indicating that the curve shoots off to infinity at values approaching zero)?
  • What values of the shape parameter cause the curve to peak, dropping to zero on the left and right?

Exponential distributions

An Exponential distribution is a special case of the gamma distribution that occurs when the shape parameter is exactly 1.0. Set the shape parameter to 1.0 and play with various values (e.g. 2, 3, 4, 5, etc.) of the scale parameter. Can you determine a pattern by looking at where the curve hits the y-axis when x = 0? Note that you should ignore the value in cell B10 because Excel does not calculate this value correctly for some reason; instead, just sight along the curve as it approaches x = 0.0. You should find that the value of the exponential density function at zero is a function of its mean. What is this relationship?
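As an alternative to sighting along the curve, the short Python sketch below evaluates the density very close to x = 0 for the scale values suggested above and prints it next to the mean. It assumes the standard gamma density with shape fixed at 1.0. The function name gamma_pdf and the choice of 1e-9 as "close to zero" are illustrative assumptions. Spotting the pattern in the printed table is left to you.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density at x > 0 (shape/scale parameterization, as in the worksheet)."""
    log_density = ((shape - 1.0) * math.log(x) - x / scale
                   - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_density)

# With shape = 1 the gamma density is the exponential density.
shape = 1.0
rows = []
for scale in (2.0, 3.0, 4.0, 5.0):
    mean = shape * scale
    # Evaluate very close to zero instead of at x = 0 itself.
    near_zero = gamma_pdf(1e-9, shape, scale)
    rows.append((scale, mean, near_zero))
    print(f"scale = {scale}: mean = {mean}, density near x = 0 is {near_zero:.4f}")
```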

Gamma distributions and rate heterogeneity

We have encountered gamma distributions in the course previously. They are used to allow rate heterogeneity in substitution models. You may remember that a gamma distribution having a small shape parameter (e.g. 0.1) implies a lot of rate heterogeneity, whereas a large shape parameter (e.g. 100) implies almost no rate heterogeneity. In the rate heterogeneity application, the scale parameter is always equal to the inverse of the shape parameter (which is why no one ever mentions the gamma scale parameter in discussions of rate heterogeneity). Try plotting a gamma distribution with shape = 100 (and scale = 0.01). Now plot a gamma distribution in which shape = 0.1 (and scale = 10).

  • Why is scale = 1/shape in this application? Hint: it has to do with ensuring that the mean of the relative rates is 1.0.
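One way to check the hint numerically is to apply the mean and variance formulas from the worksheet (mean = shape × scale, variance = shape × scale²) with scale set to 1/shape. The Python sketch below does this for a few shape values. The function name relative_rate_moments is illustrative and not part of the lab materials.

```python
def relative_rate_moments(shape):
    """Mean and variance of a gamma distribution whose scale is 1/shape."""
    scale = 1.0 / shape
    mean = shape * scale           # worksheet formula for cell B7
    variance = shape * scale ** 2  # worksheet formula for cell B8
    return mean, variance

for shape in (0.1, 1.0, 100.0):
    mean, variance = relative_rate_moments(shape)
    print(f"shape = {shape}: scale = {1.0 / shape}, "
          f"mean = {mean:.3f}, variance = {variance:.3f}")
```

Comparing the variance column across shape values also shows why a small shape parameter corresponds to a lot of rate heterogeneity and a large one to very little.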