Syst. Biol. 44(1):17--48, 1995

Performance of Phylogenetic Methods in Simulation

John P. Huelsenbeck1
Department of Zoology
University of Texas
Austin, Texas 78712, USA
1E-mail: johnh@phylo.zo.utexas.edu.

Abstract.---Computer simulations are useful because they can characterize the expected performance of phylogenetic methods under idealized conditions. However, simulation studies are also subject to several sources of bias that make the results of different simulation studies difficult to interpret and often contradictory. In this study, I examined the performance of 26 commonly used methods of phylogenetic inference for three statistical criteria: consistency, efficiency, and robustness. Methods examined included parsimony (general, weighted, and transversion), maximum likelihood (assuming Jukes--Cantor and Kimura models of DNA substitution), and UPGMA, minimum evolution, and weighted and unweighted least squares (with uncorrected, Jukes--Cantor, Kimura, modified Kimura, and gamma distances). The performance of methods was examined under three models of DNA substitution for four taxa. The branch lengths of the four-taxon trees were varied extensively in this simulation. The results indicate that most methods perform well (i.e., estimate the correct tree ≥95% of the time) over a large portion of the four-taxon parameter space. In general, maximum likelihood performed best, followed by the additive distance methods and the parsimony methods. Lake's method of invariants and UPGMA are, respectively, inefficient and extremely sensitive to branch-length inequalities. In general, differential weighting of character-state transformations increases the performance of methods when the weighting can be applied appropriately. Although methods differ in their consistency, efficiency, and robustness, additional criteria---mainly falsifiability---are extremely important considerations when choosing a method of phylogenetic inference. [Lake's invariants; maximum likelihood; minimum evolution; neighbor joining; parsimony; phylogeny estimation; simulation; weighted least squares; unweighted least squares; UPGMA; tree space.]
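
Illustrative note (not part of the original abstract): the distance methods listed above differ partly in how pairwise distances are corrected for multiple substitutions at the same site. As a minimal sketch, the uncorrected distance and the standard Jukes--Cantor correction, d = -(3/4) ln(1 - 4p/3), could be computed as follows. The function names are hypothetical, the sequences are assumed to be aligned and free of gaps, and nothing here reproduces the simulation code used in the study.

```python
import math


def p_distance(seq_a, seq_b):
    """Uncorrected distance: the proportion of aligned sites that differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal length, and non-empty")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance, d = -(3/4) * ln(1 - 4p/3).

    The correction is undefined once p reaches 3/4 (saturation), so this
    sketch returns infinity in that case; that choice is an assumption.
    """
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, `jukes_cantor_distance("ACGT", "ACGA")` corrects an observed p of 0.25 upward to about 0.30 expected substitutions per site.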


Syst. Biol. 44(1):49--63, 1995

Statistical Tests of DNA Phylogenies

Wen-Hsiung Li1 and Andrey Zharkikh2
Center for Demographic and Population Genetics,
University of Texas, P.O. Box 20334,
Houston, Texas 77225, USA
1E-mail: gsbs005@utsph.sph.uth.tmc.edu.
2E-mail: zharkikh@gsbs18.gs.uth.tmc.edu.

Abstract.---In this article, we review (1) statistical tests of DNA phylogenies inferred by the maximum-parsimony method, including tests that take into account the effect of different base compositions among lineages, (2) statistical tests based on the minimum-evolution criterion, i.e., the best tree is the tree with the smallest sum of branch-length estimates, and (3) the bootstrap technique for estimating the confidence level of a phylogenetic hypothesis based on either the maximum-parsimony or the neighbor-joining method. We explain why the bootstrap technique usually gives biased estimates and how to correct the bias. [DNA phylogenies; parsimony; neighbor joining; minimum evolution; statistical tests; bootstrap; bias correction.]
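
Illustrative note (not part of the original abstract): the bootstrap technique mentioned above resamples alignment columns with replacement and records how often each tree is recovered. The sketch below shows only that resampling step and the raw, uncorrected proportions; it does not implement the bias correction the article discusses. All function names are hypothetical, and `toy_four_taxon_tree` is a deliberately simple stand-in for a real inference method such as maximum parsimony or neighbor joining.

```python
import random
from collections import Counter


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement, keeping the original length."""
    n_sites = len(alignment[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def bootstrap_proportions(alignment, infer_tree, n_reps=100, seed=0):
    """Fraction of bootstrap replicates in which each tree is inferred."""
    rng = random.Random(seed)
    counts = Counter(
        infer_tree(resample_columns(alignment, rng)) for _ in range(n_reps)
    )
    return {tree: n / n_reps for tree, n in counts.items()}


def hamming(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def toy_four_taxon_tree(alignment):
    """Toy rule (an assumption, not a published method): pick the pairing of
    four taxa A, B, C, D whose within-pair differences sum to the minimum."""
    a, b, c, d = alignment
    scores = {
        "(A,B),(C,D)": hamming(a, b) + hamming(c, d),
        "(A,C),(B,D)": hamming(a, c) + hamming(b, d),
        "(A,D),(B,C)": hamming(a, d) + hamming(b, c),
    }
    return min(scores, key=scores.get)
```

Calling `bootstrap_proportions(alignment, toy_four_taxon_tree)` on a list of four aligned sequences returns a dictionary mapping each recovered topology to its bootstrap proportion.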


Syst. Biol. 44(1):64--76, 1995

Testing Species Phylogenies and Phylogenetic Methods with Congruence

Michael M. Miyamoto1,3 and Walter M. Fitch2
1Department of Zoology, University of Florida,
Gainesville, Florida 32611, USA
2Department of Ecology and Evolutionary Biology,
321 Steinhaus Hall,
University of California, Irvine, California 92717-2515, USA
3E-mail: zoodept@nervm.nerdc.ufl.edu.

Abstract.---We assessed the utility of congruence and multiple data sets to test species relationships and the accuracy of phylogenetic methods. The ongoing controversy about whether to combine data sets for phylogenetic analysis was evaluated against the naturalness of different types of data (as commonly recognized by systematists) and character independence. We defend the recommendation that independent data sets (defined in terms of process partitions; sensu Bull et al., 1993, Syst. Biol. 42:384--397) should rarely be combined but should be kept separate for phylogenetic analysis because their independence increases the significance of corroboration. Trees of natural taxa, well supported by many independent lines of evidence, should be used in the same way as the known phylogenies of simulations and of certain laboratory and domesticated groups, i.e., as standards for evaluating the accuracy of different phylogenetic methods. Although compromised by their imperfect reliabilities, such tests using well-supported trees of wild taxa provide important reality checks on the conclusions of the other two approaches by encompassing more of the complexity and diversity of natural systems and their evolutionary processes. In this way, a combination of testing with the well-supported trees of natural groups, with simulations, and with those laboratory and domesticated taxa with known phylogenies is most likely to prove effective in establishing the strengths, weaknesses, and assumptions of different phylogenetic methods. [Accuracy; taxonomic congruence; character congruence; process partitions; character independence; well-supported trees; phylogenetic methods.]