Codon models

This file compares the way codon models are implemented in four programs: MrBayes 3.1, HyPhy 0.99, CodeML (PAML 3.15), and Phycas (not yet released). In each case, we attempted to create a codon model with the following properties:

  • uses Universal Genetic Code
  • codon frequencies equal
  • a parameter kappa equal to the transition/transversion rate ratio
  • a parameter omega equal to the nonsynonymous/synonymous rate ratio
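A model with the properties listed above can be sketched as a 61 x 61 instantaneous rate matrix. The block below is a minimal illustration, not the code of any of the four programs: it assumes a Goldman-Yang-style parameterization in which only single-nucleotide changes have nonzero rate, and it assumes the matrix is scaled to one expected substitution per codon site. The names `codon_rate_matrix`, `SENSE`, and `CODE` are made up for this example.

```python
from itertools import product

BASES = "TCAG"
# Universal genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODE = dict(zip(CODONS, AMINO))
SENSE = [c for c in CODONS if CODE[c] != "*"]  # the 61 sense codons

PURINES = {"A", "G"}


def is_transition(a, b):
    """True if a -> b is purine<->purine or pyrimidine<->pyrimidine."""
    return (a in PURINES) == (b in PURINES)


def codon_rate_matrix(kappa, omega):
    """Rate matrix with equal codon frequencies.

    kappa: transition/transversion rate ratio
    omega: nonsynonymous/synonymous rate ratio
    Assumption: rows are scaled so the mean rate is 1 per codon site.
    """
    n = len(SENSE)
    pi = 1.0 / n  # equal codon frequencies
    q = [[0.0] * n for _ in range(n)]
    for i, ci in enumerate(SENSE):
        for j, cj in enumerate(SENSE):
            diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
            if len(diffs) != 1:
                continue  # identical codons or multi-nucleotide changes: rate 0
            a, b = diffs[0]
            rate = pi
            if is_transition(a, b):
                rate *= kappa
            if CODE[ci] != CODE[cj]:
                rate *= omega  # nonsynonymous change
            q[i][j] = rate
        q[i][i] = -sum(q[i])  # diagonal makes each row sum to zero
    mean_rate = -sum(pi * q[i][i] for i in range(n))
    return [[x / mean_rate for x in row] for row in q]


Q = codon_rate_matrix(kappa=1.84707, omega=0.03662)
```

Dividing the scaled matrix by 3 would give rates per nucleotide site instead, which is the other edge-length convention that appears in the comparison.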

All programs were tested on the dataset codons.nex.

Being Bayesian programs, neither MrBayes nor Phycas obtains maximum likelihood estimates (MLEs), so we plugged in the MLEs from CodeML and simply calculated the likelihood with each of these two programs. For HyPhy and CodeML, edge lengths and substitution model parameters were estimated by each program.

Quantity               HyPhy         CodeML        MrBayes       Phycas
log-likelihood         -3446.044865  -3446.044858  -3446.044932  -3446.04486
Chara edge length      0.260434      0.781392      0.260464      0.781392
Asplenium edge length  0.197596      0.592819      0.197606      0.592819
Iris edge length       0.0530524     0.159117      0.053039      0.159117
Nicotiana edge length  0.0746053     0.223831      0.0746103     0.223831
central edge length    0.0688381     0.206502      0.068834      0.206502
kappa                  0.541274      1.84707       1.84707       1.84707
omega                  0.0366313     0.03662       0.03662       0.03662


HyPhy assumes edge lengths are expected substitutions per nucleotide site. Note that HyPhy defines kappa as the transversion/transition rate ratio, not the transition/transversion rate ratio used by the other programs. Also, we found that we needed to export the likelihood function (from the GUI) and then correct the codon frequencies in order to get the same likelihood that the other programs reported. Before this bug was fixed, the exported likelihood function showed each of the 61 codon frequencies to be 0.00000440567896412 rather than 1/61 (= 0.016393).
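The two kappa conventions are reciprocals of each other, which can be checked against the table with a little arithmetic. The sketch below uses the kappa values from the table (0.541274 in the HyPhy column, 1.84707 in the other columns); the variable names are illustrative only. The two estimates are not expected to agree exactly, because HyPhy estimated its own parameters.

```python
# kappa as reported in the HyPhy column (transversion/transition ratio)
hyphy_kappa = 0.541274
# kappa as reported in the other columns (transition/transversion ratio)
codeml_kappa = 1.84707

# Convert the transversion/transition value to the transition/transversion convention.
hyphy_kappa_as_ti_tv = 1.0 / hyphy_kappa
print(round(hyphy_kappa_as_ti_tv, 4))  # close to, but not identical to, codeml_kappa

# Expected frequency of each of the 61 sense codons when frequencies are equal.
equal_codon_freq = 1.0 / 61
print(round(equal_codon_freq, 6))  # 0.016393
```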


CodeML assumes edge lengths are expected substitutions per codon site.


MrBayes assumes edge lengths are expected substitutions per nucleotide site. The MRBAYES block in the codons.nex data file was used, and the value recorded in the table is the likelihood of the user tree.


Phycas assumes edge lengths are expected substitutions per codon site.
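Because a codon spans three nucleotide sites, an edge length in substitutions per codon site is three times the same edge length in substitutions per nucleotide site. The sketch below checks this against the CodeML and MrBayes columns of the table; the dictionary and function names are illustrative, and the tolerance is an assumption chosen to allow for rounding in the tabulated values.

```python
# Edge lengths copied from the table above.
codeml_per_codon = {
    "Chara": 0.781392,
    "Asplenium": 0.592819,
    "Iris": 0.159117,
    "Nicotiana": 0.223831,
    "central": 0.206502,
}
mrbayes_per_nucleotide = {
    "Chara": 0.260464,
    "Asplenium": 0.197606,
    "Iris": 0.053039,
    "Nicotiana": 0.0746103,
    "central": 0.068834,
}


def per_codon_to_per_nucleotide(edge_length):
    """Convert substitutions per codon site to substitutions per nucleotide site."""
    return edge_length / 3.0


for name, per_codon in codeml_per_codon.items():
    converted = per_codon_to_per_nucleotide(per_codon)
    difference = abs(converted - mrbayes_per_nucleotide[name])
    print(f"{name}: {converted:.6f} (difference from table: {difference:.1e})")
```

The same factor of three relates the HyPhy and Phycas columns approximately, with small residual differences because HyPhy estimated its own edge lengths.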