Syst. Biol. 51(5) 2002

**Huelsenbeck et al.**

*Abstract.*—Only recently has Bayesian inference of phylogeny been proposed. The method is now a practical alternative to the other methods; indeed, the method appears to possess advantages over the other methods in terms of ability to use complex models of evolution, ease of interpretation of the results, and computational efficiency. However, the method should be used cautiously. The results of a Bayesian analysis should be examined with respect to the sensitivity of the results to the priors used and the reliability of the Markov chain Monte Carlo approximation of the probabilities of trees.

**Thorne and Kishino**

*Abstract.*—Bayesian methods for estimating evolutionary divergence times are extended to multigene data sets, and a technique is described for detecting correlated changes in evolutionary rates among genes. Simulations are employed to explore the effect of multigene data on divergence time estimation, and the methodology is illustrated with a previously published data set representing diverse plant taxa. The fact that evolutionary rates and times are confounded when sequence data are compared is emphasized and the importance of fossil information for disentangling rates and times is stressed.

**Aris-Brosou and Yang**

*Abstract.*—The molecular clock, i.e., constancy of the rate of evolution over time, is commonly assumed in estimating divergence dates. However, this assumption is often violated and has drastic effects on date estimation. Recently, a number of attempts have been made to relax the clock assumption. One approach is to use maximum likelihood, which assigns rates to branches and allows the estimation of both rates and times. An alternative is the Bayes approach, which models the change of the rate over time. A number of models of rate change have been proposed. We have extended and evaluated models of rate evolution, that is, the lognormal and its recent variant, along with the gamma, the exponential and the Ornstein-Uhlenbeck processes. These models were first applied to a small hominoid data set, where an empirical Bayes approach was used to estimate the hyperparameters that measure the amount of rate variation. We found that estimation of divergence times was sensitive to these hyperparameters, especially when the assumed model is close to the clock assumption. The rate and date estimates varied little from model to model, although the posterior Bayes factor indicated the Ornstein-Uhlenbeck process outperformed the other models. To demonstrate the importance of allowing for rate change across lineages, this general approach was used to analyze a larger data set consisting of the 18S rRNA gene of 40 metazoan species. We obtained date estimates consistent with paleontological records, the deepest split within the group being about 560 million years ago. Estimates of the rates were in accordance with the Cambrian explosion hypothesis, and suggested some more recent lineage-specific bursts of evolution.

**Suchard et al.**

*Abstract.*—Current methods to identify recombination between subtypes of human immunodeficiency virus-1 (HIV-1) fall into a sequential testing trap, in which significance is assessed conditional on parental representative sequences and crossover points (COPs) that maximize the same test statistic. We overcome this shortfall by testing for recombination while simultaneously inferring parental heritage and COPs using an extended Bayesian multiple change-point model. The model assumes that aligned molecular sequence data consist of an unknown number of contiguous segments that may support alternative topologies or varying evolutionary pressures. We allow for heterogeneity in the substitution process and specifically test for inter-subtype recombination using Bayes factors. We also develop a new class of priors to assess significance across a wide range of support for recombination in the data. We apply our method to three putative *gag* gene recombinants. HIV-1 isolate RW024 decisively supports recombination with an inferred parental heritage of AD and a COP 95% Bayesian credible interval of (1152; 1178) using the HXB2 numbering scheme. HIV-1 isolate VI557 barely supports recombination. HIV-1 isolate RF decisively rejects recombination, as expected given the sequence is commonly used as a reference sequence for subtype B. We employ scaled regeneration quantile plots to assess convergence and find this approach convenient to use even for our variable dimensional model parameter space.

**Nielsen**

*Abstract.*—Mapping of mutations on a phylogeny has been a commonly used analytical tool in phylogenetics and molecular evolution. However, the common approaches for mapping mutations based on parsimony have lacked a solid statistical foundation. Here, I present a Bayesian method for mapping mutations on a phylogeny. I illustrate some of the common problems associated with using parsimony and suggest instead that inferences in molecular evolution can be made on the basis of the posterior distribution of the mappings of mutations. A method for simulating a mapping from the posterior distribution of mappings is also presented and the utility of the method is illustrated on two previously published data sets. Applications include a method for testing for variation in the substitution rate along the sequence and a method for testing if the $d_N/d_S$ ratio varies among lineages in the phylogeny.

**Miller et al.**

*Abstract.*—The objective of this study was to obtain a quantitative assessment of the monophyly of morning glory taxa, specifically the genus *Ipomoea* and the tribe Argyreieae. Previous systematic studies of morning glories intimated the paraphyly of *Ipomoea* by suggesting that the genera within the tribe Argyreieae are derived from within *Ipomoea*; however, no quantitative estimates of statistical support were developed to address these questions. We applied a Bayesian analysis to provide quantitative estimates of monophyly in an investigation of morning glory relationships using DNA sequence data. We also explored various approaches for examining convergence of the Markov chain Monte Carlo (MCMC) simulation of the Bayesian analysis by running 18 separate analyses varying in length. We found convergence of the important components of the phylogenetic model (the tree with the maximum posterior probability, branch lengths, the parameter values from the DNA substitution model, and the posterior probabilities for clade support) for these data after one million generations of the MCMC simulations. In the process, we identified a run where the parameter values obtained were often outside the range of values obtained from the other runs, suggesting an aberrant result. In addition, we compared the Bayesian method of phylogenetic analysis to maximum likelihood and maximum parsimony. The results from the Bayesian analysis and the maximum likelihood analysis were similar for topology, branch lengths, and parameters of the DNA substitution model. Topologies also were similar in the comparison between the Bayesian analysis and maximum parsimony, although the posterior probabilities and the bootstrap proportions exhibited some striking differences. 
In a Bayesian analysis of three data sets (ITS sequences, *waxy* sequences, and ITS + *waxy* sequences) no support for the monophyly of the genus *Ipomoea*, or for the tribe Argyreieae, was observed, with the estimate of the probability of the monophyly of these taxa being less than $3.4 \times 10^{-7}$.
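As a rough illustration of how a posterior probability of monophyly can be read off an MCMC sample, the sketch below counts the fraction of sampled trees that contain a given clade. The representation of a tree as a set of clades, the function name `clade_posterior`, and the toy taxa A to D are assumptions made for this example; they are not taken from the study or from any particular software.

```python
def clade_posterior(sampled_trees, clade):
    """Fraction of sampled trees that contain `clade` as a monophyletic group.

    Each tree is represented (an assumption of this sketch) as a set of
    frozensets, one frozenset of taxon names per clade in the tree.
    """
    clade = frozenset(clade)
    hits = sum(1 for tree in sampled_trees if clade in tree)
    return hits / len(sampled_trees)


# Toy post-burn-in sample of four trees over hypothetical taxa A, B, C, D.
samples = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AC"), frozenset("ACD")},
    {frozenset("AB"), frozenset("ABD")},
]

p_ab = clade_posterior(samples, "AB")  # clade {A, B} appears in 3 of 4 trees
p_cd = clade_posterior(samples, "CD")  # clade {C, D} never appears
```

With a frequency estimate of this kind, a clade that never appears in the sample gets an estimate of zero, so very small probabilities can only be bounded by the sample size rather than measured directly.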

**Rannala**

*Abstract.*—Methods for Bayesian inference of phylogeny using DNA sequences based on Markov chain Monte Carlo (MCMC) techniques allow the incorporation of arbitrarily complex models of the DNA substitution process, and other aspects of evolution. This has increased the realism of models, potentially improving the accuracy of the methods, and is largely responsible for their recent popularity. Another consequence of the increased complexity of models in Bayesian phylogenetics is that these models have, in several cases, become overparameterized. In such cases, some parameters of the model are not identifiable; different combinations of nonidentifiable parameters lead to the same likelihood, making it impossible to decide among the potential parameter values based on the data. Overparameterized models can also slow the rate of convergence of MCMC algorithms due to large negative correlations among parameters in the posterior probability distribution. Functions of parameters can sometimes be found, in overparameterized models, that are identifiable, and inferences based on these functions are legitimate. Examples are presented of overparameterized models that have been proposed in the context of several Bayesian methods for inferring the relative ages of nodes in a phylogeny when the substitution rate evolves over time.
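A small numerical sketch may make the identifiability point concrete. It assumes the Jukes-Cantor (JC69) substitution model and a comparison of just two sequences, neither of which is specified in the abstract; the function name and the counts are made up for illustration. Under these assumptions the likelihood depends on rate and time only through their product, so different (rate, time) pairs with the same product cannot be told apart from the data, while the product itself is identifiable.

```python
import math


def jc69_log_likelihood(n_sites, n_diff, rate, time):
    """Log-likelihood of a pairwise alignment under JC69 (illustrative only).

    `rate` is substitutions per site per unit time and `time` is the total
    time separating the two sequences; both names are assumptions of this sketch.
    """
    d = rate * time  # expected substitutions per site: the only quantity used below
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    p_same = 1.0 - p_diff
    return (
        n_sites * math.log(0.25)                 # stationary base frequencies
        + n_diff * math.log(p_diff / 3.0)        # each specific mismatch
        + (n_sites - n_diff) * math.log(p_same)  # matching sites
    )


# Two very different (rate, time) pairs with the same product rate * time.
slow_and_old = jc69_log_likelihood(1000, 120, rate=0.01, time=10.0)
fast_and_young = jc69_log_likelihood(1000, 120, rate=0.1, time=1.0)

# A pair with a different product gives a different likelihood.
different_product = jc69_log_likelihood(1000, 120, rate=0.1, time=2.0)
```

Here `slow_and_old` and `fast_and_young` agree up to floating-point rounding, whereas `different_product` does not, which is the sense in which a function of the parameters (the product) can be identifiable even when the individual parameters are not.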

**Marvaldi et al.**

*Abstract.*—The main goals of this study were to provide a robust phylogeny for the families of Curculionoidea, to discover relationships and major natural groups within the family Curculionidae, and to clarify the evolution of larval habits and host-plant associations in weevils in order to analyze their role in weevil diversification. Phylogenetic relationships among the weevils (Curculionoidea) were inferred from analysis of nucleotide sequences of 18S ribosomal DNA (∼2,000 bases) and 115 morphological characters of larval and adult stages. A worldwide sample of 100 species was made to maximize representation of weevil morphological and ecological diversity. All families and the main subfamilies of Curculionoidea are represented. The family Curculionidae *sensu lato* is represented by about 80 species in 30 "subfamilies" of traditional classifications. Phylogenetic reconstruction was done by parsimony analysis of separate and combined molecular and morphological data matrices, and also by Bayesian analysis of the molecular data; tree topology support was evaluated. Results of the combined analysis of 18S/morphology show that monophyly of, and relationships among, each of the weevil families are well supported with the topology ((Nemonychidae Anthribidae) (Belidae (Attelabidae (Caridae (Brentidae Curculionidae))))). Within the clade Curculionidae *sensu lato* the basal positions are occupied by (mostly monocot-associated) taxa with the "primitive" type of male genitalia, followed by the Curculionidae *sensu stricto*, made up of groups with the "derived" type of male genitalia. High support values were found for the monophyly of some distinct curculionid groups like the Dryophthorinae (several tribes represented) and the Platypodinae (Tesserocerini plus Platypodini), among others. However, the subfamilial relationships in Curculionidae are unresolved or weakly supported.
The phylogeny estimate based on 18S/morphology suggests that diversification in weevils is accompanied by niche shifts in host plant associations and in larval habits. Pronounced conservatism is shown in larval feeding habits, particularly in the host tissue consumed. Multiple shifts to use of angiosperms in Curculionoidea were identified, each time associated with increases in weevil diversity, and subsequent shifts back to gymnosperms, particularly in the Curculionidae.

**Kopp and True**

*Abstract.*—The *melanogaster* species group of *Drosophila* (subgenus *Sophophora*) has long been a favored model for evolutionary studies due to its morphological and ecological diversity and wide geographic distribution. However, phylogenetic relationships among species and subgroups within this lineage are not well understood. We reconstructed the phylogeny of 17 species representing 7 "oriental" species subgroups, which are especially closely related to *D. melanogaster*. We used DNA sequences of 4 nuclear and 2 mitochondrial loci in an attempt both to obtain the best possible estimate of species phylogeny and to assess the extent and sources of remaining uncertainties. Comparison of trees derived from single-gene datasets has allowed us to identify several strongly supported clades, which were also consistently seen in combined analyses. The relationships among these clades were less certain. The combined dataset contains data partitions that are incongruent with each other. Trees reconstructed from the combined dataset and from internally homogeneous datasets consisting of 3-4 genes each differ at several deep nodes. The total dataset tree is fully resolved and strongly supported at most nodes. Statistical tests indicated that this tree is compatible with all individual and combined datasets. Therefore, we accepted this tree as the most likely model of historical relationships. We compared the new molecular phylogeny to earlier estimates based on morphology and chromosome structure and discuss its taxonomic and evolutionary implications.

**Szumik et al.**

*Abstract.*—A formal method to determine areas of endemism is presented. The method is based on dividing the study region into cells and, for a given set of cells (= area), counting the number of species that can be considered endemic to it. Areas that allow the maximum number of species to be considered endemic are therefore preferred. This is the first method to identify areas of endemism that implements an optimality criterion based directly on the aspects of species distribution that are relevant to endemism. The method is implemented in two computer programs, NDM and VNDM, available from the authors.
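The cell-counting idea can be sketched in a few lines. The abstract does not say how a species qualifies as endemic to a set of cells, so this example assumes the strictest possible rule: a species counts only if it occupies every cell of the area and no cell outside it. That rule, the function names, and the toy grid are assumptions for illustration and should not be read as the scoring used by NDM or VNDM.

```python
from itertools import combinations


def count_endemic(distributions, area):
    """Number of species whose occupied cells coincide exactly with `area`.

    Exact coincidence is an assumed, deliberately strict endemism rule.
    """
    area = frozenset(area)
    return sum(1 for cells in distributions.values() if frozenset(cells) == area)


def best_areas(distributions, cells, max_size=3):
    """Exhaustively score every set of up to `max_size` cells; keep the top scorers."""
    best_score, best = 0, []
    for size in range(1, max_size + 1):
        for area in combinations(sorted(cells), size):
            score = count_endemic(distributions, area)
            if score > best_score:
                best_score, best = score, [area]
            elif score == best_score and score > 0:
                best.append(area)
    return best_score, best


# Toy 2 x 2 grid; cells are (row, column) pairs and species names are hypothetical.
grid = {(0, 0), (0, 1), (1, 0), (1, 1)}
distributions = {
    "sp1": {(0, 0), (0, 1)},
    "sp2": {(0, 0), (0, 1)},
    "sp3": {(1, 1)},
    "sp4": {(0, 0), (0, 1), (1, 1)},
}

score, areas = best_areas(distributions, grid)
```

In this toy case the preferred area is the pair of cells (0, 0) and (0, 1), which two species fit exactly. The exhaustive search is only workable for very small grids; a realistic study region would need a heuristic search and a more tolerant endemism criterion.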