Syst. Biol. 47(4):545-567, 1998

Phylogenetic congruence and discordance among one morphological and three molecular data sets from Pontederiaceae

Sean W. Graham1, Joshua R. Kohn2, Brian R. Morton3, James E. Eckenwalder4, and Spencer C.H. Barrett4

1 Department of Botany, Box 355325, University of Washington, Seattle, WA 98195, USA.
(E-mail: sgraham@u.washington.edu)

2 Department of Biology, University of California at San Diego, California, USA.

3 Department of Biological Sciences, Barnard College, Columbia University, New York, USA.

4 Department of Botany, 25 Willcocks St., University of Toronto, Toronto, Ontario, Canada

Abstract.---A morphological data set and three sources of data from the chloroplast genome (two genes and a restriction-site survey) were used to reconstruct the phylogenetic history of the Pickerelweed family Pontederiaceae. The chloroplast data are converging to a single tree, presumably the true chloroplast phylogeny of the family. Unrooted trees estimated from the three chloroplast data sets were identical or extremely similar in shape to each other, mostly robustly supported and there was no evidence of significant heterogeneity among them. The few topological differences seen among unrooted trees from each chloroplast data set are probably artifacts of sampling error on short branches. Despite well documented differences in rates of evolution for different characters in individual data sets, equally weighted parsimony therefore permits accurate reconstructions of chloroplast relationships in Pontederiaceae. A separate morphology-based data set yielded trees that were very different from the chloroplast trees. While there was substantial support by the morphological evidence for several major clades supported by chloroplast trees, most of the conflicting phylogenetic structure on the morphology trees was not robust. Nonetheless, several statistical tests of incongruence indicate significant heterogeneity between molecules and morphology. The source of this apparent incongruence appears to be a low ratio of phylogenetic signal to noise in the morphological data.
[Pontederiaceae; ndhF; rbcL; chloroplast DNA; morphology; congruence tests; incongruence; noise]


Syst. Biol. 47(4):568-581, 1998

Combining data sets with different phylogenetic histories

John J. Wiens

Section of Amphibians and Reptiles, Carnegie Museum of Natural History, Pittsburgh, Pennsylvania 15213-4080, USA;
E-mail: wiensj@clpgh.org

Abstract.---The possibility that two data sets may have different underlying phylogenetic histories (such as gene trees that deviate from species trees) has become an important argument against combining data in phylogenetic analysis. However, two data sets sampled for a large number of taxa may differ in only part of their histories. This is a realistic scenario and one in which the relative advantages of combined, separate, and consensus analysis become much less clear. I suggest a simple methodology for dealing with this situation that involves (1) partitioning the available data to maximize detection of different histories, (2) performing separate analyses of the data sets, and (3) combining the data but considering questionable or unresolved those parts of the combined tree that are strongly contested in the separate analyses (and which therefore may have different histories), until a majority of unlinked data sets supports one resolution over another. In support of this methodology, computer simulations suggest that (1) the accuracy of combined analysis at recovering the true species phylogeny may exceed that of either of two separately analyzed data sets under some conditions, particularly when the mismatch between phylogenetic histories is small and the estimates of the underlying histories are imperfect (few characters and/or high homoplasy), and (2) combined analysis provides a poor estimate of the species tree in areas of the phylogenies with different histories but an improved estimate in regions that share the same history. Thus, when there is a localized mismatch between the histories of two data sets, separate, consensus, and combined analysis may all give unsatisfactory results in certain parts of the phylogeny. Similarly, approaches that allow data combination only after a global test of heterogeneity will suffer from the potential failings of either separate or combined analysis, depending on the outcome of the test. Excision of conflicting taxa is also problematic in that it may obfuscate the position of conflicting taxa within a larger tree, even when their placement is congruent between data sets. Application of the proposed methodology to molecular and morphological data sets for Sceloporus lizards is discussed.
[Combined analysis; separate analysis; consensus analysis; phylogenetic accuracy; computer simulation; Sceloporus.]


Syst. Biol. 47(4):582-588, 1998

Step matrices and the interpretation of homoplasy

Richard H. Ree1,2 and Michael J. Donoghue

1 Department of Organismic and Evolutionary Biology, Harvard University, 22 Divinity Avenue, Cambridge, Massachusetts 02138, USA

2 Corresponding author. E-mail: rree@oeb.harvard.edu

Abstract.---Assumptions about the costs of character change, coded in the form of a step matrix, determine most-parsimonious inferences of character evolution on phylogenies. We present a graphical approach to exploring the relationship between cost assumptions and evolutionary inferences from character data. The number of gains and losses of a binary trait on a phylogeny can be plotted over a range of cost assumptions, to reveal the inflection point at which there is a switch from more gains to more losses and the point at which all changes are inferred to be in one direction or the other. Phylogenetic structure in the data, the tree shape, and the relative frequency of states among the taxa influence the shape of such graphs and complicate the interpretation of possible permutation-based tests for directionality of change. The costs at which the most-parsimonious state of each internal node switches from one state to another can also be quantified by iterative ancestral-state reconstruction over a range of costs. This procedure helps identify the most robust inferences of change in each direction, which should be of use in designing comparative studies.
[parsimony, homoplasy, phylogenetic inference, character evolution, ancestral states]
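
The cost-scanning idea in the abstract above can be illustrated with a small sketch. This is not the authors' code: it assumes a binary trait, a rooted bifurcating tree written as nested tuples of taxon names, and a two-state step matrix in which only the gain cost (0 to 1) is varied while the loss cost is held at 1. The function names (`down_pass`, `count_changes`, `gains_and_losses`) and the four-taxon example are hypothetical. Ties are broken arbitrarily, so only one of possibly several most-parsimonious reconstructions is counted.

```python
import math

def down_pass(node, states, cost):
    """Sankoff down-pass: vec[a] is the minimum cost of the subtree given state a here."""
    if isinstance(node, str):  # leaf: only the observed state is allowed
        return [0.0 if states[node] == a else math.inf for a in (0, 1)], []
    kids = [down_pass(child, states, cost) for child in node]
    vec = [sum(min(cost[a][b] + kv[b] for b in (0, 1)) for kv, _ in kids)
           for a in (0, 1)]
    return vec, kids

def count_changes(scored, parent_state, cost):
    """Trace one optimal reconstruction downward, counting gains (0->1) and losses (1->0)."""
    _, kids = scored
    gains = losses = 0
    for kid in kids:
        kvec = kid[0]
        # Cheapest child state; on ties prefer no change (an arbitrary choice).
        b = min((0, 1), key=lambda s: (cost[parent_state][s] + kvec[s], s != parent_state))
        gains += (parent_state, b) == (0, 1)
        losses += (parent_state, b) == (1, 0)
        g, l = count_changes(kid, b, cost)
        gains += g
        losses += l
    return gains, losses

def gains_and_losses(tree, states, gain_cost, loss_cost=1.0):
    """Return (gains, losses) under the step matrix [[0, gain_cost], [loss_cost, 0]]."""
    cost = [[0.0, gain_cost], [loss_cost, 0.0]]
    scored = down_pass(tree, states, cost)
    root_state = min((0, 1), key=lambda s: scored[0][s])
    return count_changes(scored, root_state, cost)

# Hypothetical example: scan the gain cost and watch the inference flip.
tree = (("A", "B"), ("C", "D"))
states = {"A": 1, "B": 0, "C": 1, "D": 0}
for g in (0.25, 0.5, 2.0, 4.0):
    print(g, gains_and_losses(tree, states, g))
```

Plotting the two counts against the gain cost gives the kind of graph the abstract describes; a fuller treatment would enumerate all most-parsimonious reconstructions rather than one.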


Syst. Biol. 47(4):589-603, 1998

Can weighting improve bushy trees? Models of cytochrome b evolution and the molecular systematics of pipits and wagtails (Aves: Motacillidae)

Gary Voelker1 and Scott V. Edwards

Burke Museum and Department of Zoology, Box 353010, University of Washington, Seattle, WA 98195, USA

1 Present address (and address for correspondence): Barrick Museum, Box 454012, University of Nevada Las Vegas, Las Vegas, NV 89154, USA;
E-mail: voelker@hrc.nevada.edu



Abstract.---Among-site rate variation (α) and transition bias (κ) have been shown, most often as independent parameters, to be important dynamics in DNA evolution. Accounting for these dynamics should result in better estimates of phylogenetic relationships. To test this idea, we simultaneously estimated overall (averaged over all codon positions) and codon-specific values of α and κ using maximum likelihood analyses of cytochrome b data from all genera of pipits and wagtails (Aves: Motacillidae) and six outgroup species using initial trees generated with default values. Estimates of α and κ were robust to initial tree topology and suggested substantial among-site rate variation even within codon classes, with α being lowest (large among-site rate variation) at second and highest (low among-site rate variation) at third codon positions. There were shifts in tree topology and dramatic and statistically significant improvements in log-likelihood scores of trees when overall values were applied, as compared to scores from default values. Applying codon-specific values resulted in yet another highly significant increase in likelihood. However, although incorporating substitution dynamics into maximum likelihood, maximum parsimony and neighbor-joining analyses resulted in increases in congruence among trees, there were only minor improvements in phylogenetic signal, and none of the successive approximations tree topologies were statistically distinguishable from one another by the data. We suggest that the bush-like nature of many higher-level phylogenies in birds makes estimating the dynamics of DNA evolution less sensitive to tree topology, but also less susceptible to improvement via weighting.
[Character weighting; cytochrome b; maximum likelihood; Motacillidae; rate variation; systematics; transition bias.]


Syst. Biol. 47(4):604-616, 1998

Measuring the phylogenetic randomness of biological data sets

William H. E. Day1, George F. Estabrook2, and F. R. McMorris3

1 Box 17, Port Maitland, Nova Scotia B0W 2V0, Canada;
E-mail: whday@istar.ca

2 Department of Biology, University of Michigan, Ann Arbor, Michigan 48109-1048, USA;
E-mail: estabrook@umich.edu

3 Department of Mathematics, University of Louisville, Louisville, Kentucky 40292, USA;
E-mail: frmcmo01@homer.louisville.edu

Abstract.---Two qualitative taxonomic characters are potentially compatible if the states of each can be ordered into a character state tree in such a way that the two resulting character state trees are compatible. The number of potentially compatible pairs (NPCP) of qualitative characters from a data set may be considered to be a measure of its phylogenetic randomness. The value of NPCP depends on the number of evolutionary units (EUs), the number of characters, the number of states in the characters, the distributions of EUs among these states, and the amount and distribution of missing information, and so does not directly indicate degree of phylogenetic randomness. Thus, for an observed data set, we used Monte Carlo methods to estimate the probability that a data set chosen equiprobably from among those identical (with respect to all the other above determining features) to the observed data set would have as high (or low) an NPCP as the observed data set. This probability, the realized significance of the observed NPCP, is attractive as an indication of phylogenetic randomness because it does not require assumptions made by other such methods: no character state trees are assumed and, as a consequence, only potential compatibility can be determined; no particular method of phylogenetic estimation is assumed; and no phylogenetic trees are constructed. We determined the values and significances of NPCP for analyses of 57 data sets taken from 53 published sources. All data sets from 37 of those sources exhibited realized significances <0.01, indicating high levels of phylogenetic nonrandomness. From each of the remaining 16 sources, at least one data set was more phylogenetically random. Inclusion of outgroups changed significance in some cases, but not always in the same direction. Data sets with significantly low NPCP may be consistent with an ancient hybrid origin (or other ancient polyphyletic gene exchange, crossing over, viral transfer, etc.) of the study group.
[Compatibility; data set robustness; Monte Carlo methods; phylogenetic randomness; qualitative taxonomic characters.]
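
A much-simplified sketch of the NPCP idea follows. It is not the authors' procedure: it assumes binary characters with no missing data, for which two characters are compatible exactly when fewer than all four state combinations occur among the EUs, and it draws random data sets by shuffling each character's states independently among EUs, which keeps the number of states and the number of EUs per state fixed. The function names and the three-character example matrix are hypothetical.

```python
import random
from itertools import combinations

def compatible(c1, c2):
    """Binary characters are compatible unless all four state pairs co-occur."""
    return len(set(zip(c1, c2))) < 4

def npcp(chars):
    """Number of compatible pairs among a list of characters (one list of states per character)."""
    return sum(compatible(a, b) for a, b in combinations(chars, 2))

def realized_significance(chars, n_perm=999, seed=0):
    """Monte Carlo estimate of P(random NPCP >= observed NPCP)."""
    rng = random.Random(seed)
    observed = npcp(chars)
    hits = 0
    for _ in range(n_perm):
        # Shuffle each character independently; state frequencies are preserved.
        shuffled = [rng.sample(c, len(c)) for c in chars]
        hits += npcp(shuffled) >= observed
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical data: three binary characters scored for four EUs.
chars = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 1, 0, 1]]
observed, p = realized_significance(chars, n_perm=99)
print(observed, p)
```

Handling multistate characters and preserving the pattern of missing entries, both of which the abstract treats as determining features, would need a more general compatibility test and a constrained randomization.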


Syst. Biol. 47(4):617-624, 1998

Using a nonrecursive formula to determine cladogram probabilities

J. Stone1,3 and J. Repka2

1 Department of Zoology, University of Toronto, Toronto, Ontario M5S 3G5, Canada

2 Department of Mathematics, University of Toronto, Toronto, Ontario M5S 3G3, Canada;
E-mail: repka@math.toronto.edu

3 Present address (and address for correspondence): Department of Ecology & Evolution, State University of New York, Stony Brook, NY 11794-5245, USA
E-mail: stone@life.bio.sunysb.edu



Abstract.---Three properties of bifurcating branching diagrams that are used for representing a specific number of taxa are (1) the number of possible arrangements, (2) the number of possible topologies, and (3) the probabilities of formation according to particular models of cladogenesis. Of these, probabilities have received the least attention in the literature. Indeed, many biologists would be astonished by the observation that the probability of a commonly cited cladogram containing 35 phyla of the animal kingdom is < 0.0072% of the value of the average probability taken over all possible cladograms! We reviewed works on cladogram arrangements and topologies and developed a computer-generated table of enumerations that extends and corrects such tables in the literature. We also developed a nonrecursive formula for the determination of cladogram probabilities. This formula facilitates calculation and thereby should promote use of cladogram probabilities, which might provide more accurate null hypotheses for tests of cladogenic events than do considerations of cladogram arrangements or topologies.
[Bifurcating branching diagram; cladogram arrangement; cladogram topology; likelihood; null hypothesis; phylogenetic tree.]
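
To make the notion of a cladogram probability concrete, here is a minimal sketch under stated assumptions. It uses a well-known closed form for the probability of a rooted, labelled, bifurcating tree under an equal-rates (Yule) model of cladogenesis, P = 2^(n-1) / (n! × ∏(n_v − 1)), where the product runs over internal nodes and n_v is the number of taxa descended from node v. Whether this matches the formula developed in the paper is not stated in the abstract, so treat it as an illustration only. The count of labelled rooted trees, (2n − 3)!!, is standard. Trees are nested tuples of taxon names, and all function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def leaves_and_product(node):
    """Return (number of leaves, product of (n_v - 1) over internal nodes v)."""
    if isinstance(node, str):
        return 1, 1
    (n_left, p_left), (n_right, p_right) = (leaves_and_product(c) for c in node)
    n = n_left + n_right
    return n, p_left * p_right * (n - 1)

def yule_probability(tree):
    """Probability of a labelled rooted bifurcating tree under the equal-rates model."""
    n, product = leaves_and_product(tree)
    return Fraction(2 ** (n - 1), factorial(n) * product)

def num_rooted_cladograms(n):
    """Number of labelled rooted bifurcating trees on n taxa: (2n - 3)!!."""
    result = 1
    for k in range(3, 2 * n - 2, 2):
        result *= k
    return result

balanced = (("a", "b"), ("c", "d"))
pectinate = ((("a", "b"), "c"), "d")
for tree in (balanced, pectinate):
    p = yule_probability(tree)
    # Ratio of this tree's probability to the average over all labelled trees.
    print(p, p * num_rooted_cladograms(4))
```

The ratio in the last line is the kind of comparison the abstract draws when it relates one cladogram's probability to the average over all possible cladograms.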


Syst. Biol. 47(4):625-640, 1998

Does adding characters with missing data increase or decrease phylogenetic accuracy?

John J. Wiens

Section of Amphibians and Reptiles, Carnegie Museum of Natural History, Pittsburgh, Pennsylvania 15213-4080, USA;
E-mail: wiensj@clpgh.org

Abstract.---Missing data are a widely recognized nuisance factor in phylogenetic analyses, and the fear of missing data may deter systematists from including characters that are highly incomplete. In this paper, I used simulations to explore the consequences of including sets of characters that contain missing data. More specifically, I tested whether the benefits of increasing the number of characters outweigh the costs of adding missing data cells to a matrix. The results show that the addition of a set of characters with missing data is generally more likely to increase phylogenetic accuracy than decrease it but that the potential benefits of adding these characters quickly disappear as the proportion of missing data increases. Furthermore, despite the overall trend, adding characters with missing data does decrease accuracy in some cases. In these situations, the missing data entries are not themselves misleading, but their presence may mimic the effects of limited taxon sampling, which can positively mislead. Criteria are discussed for predicting whether adding characters with missing data may increase or decrease accuracy. The results of this study also suggest that accuracy can be increased to a surprising degree by (1) “filling the holes” in a data matrix as much as possible (even when relatively few taxa are missing data), and (2) adding fewer characters scored for all taxa rather than adding a larger number of characters known for fewer taxa. Missing data can also be eliminated from an analysis through the exclusion of incomplete taxa rather than incomplete characters, but this approach may reduce the usefulness of the analysis and (in some cases) the accuracy of the estimated trees.
[Accuracy; missing data; parsimony; simulations.]


Syst. Biol. 47(4):641-653, 1998

Individuality and the existence of species through time

David A. Baum

Department of Organismic and Evolutionary Biology, Harvard University Herbaria, 22 Divinity Avenue, Cambridge, Massachusetts 02138, USA;
E-mail: dbaum@oeb.harvard.edu

Abstract.---The individuality of species provides the basis for linking practical taxonomy with evolutionary and ecological theory. An individual is here defined as a collection of parts (lower-level entities) that are mutually connected. Different types of species individual exist based on different types of connection between organisms. An interbreeding species is a group of organisms connected by the potential to share common descendants, whereas a genealogical species is integrated by the sharing of common ancestors. Such species definitions serve to set the limits of species at a moment of time and these slices connect through time to form time-extended lineages. This perspective on the nature of individuality has implications that conflict with traditional views of species and lineages: (1) several types of connections among organisms may serve to individuate species in parallel (species pluralism); (2) each kind of species corresponds to a distinct kind of lineage; (3) although lineage branching is the most obvious criterion to break lineages into diachronic species, it cannot be justified simply by reference to species individuality; (4) species (like other individuals) have fuzzy boundaries; (5) if we wish to retain a species rank, we should focus on either the most or least inclusive individual in a nested series; (6) not all organisms will be in any species; and (7) named species taxa are best interpreted as hypotheses of real species. Although species individuality requires significant changes to systematic practice and challenges some preconceptions we may have about the ontology of species, it provides the only sound basis for asserting that species exist independent of human perception.
[Individuation, phylogenetic systematics, phylogenetic taxonomy, metaspecies, pluralism.]


Syst. Biol. 47(4):654-672, 1998

Effects of branch length errors on the performance of phylogenetically independent contrasts

Ramon Diaz-Uriarte and Theodore Garland, Jr.

Department of Zoology, 430 Lincoln Drive, University of Wisconsin, Madison, Wisconsin 53706-1381, USA;
E-mail: tgarland@macc.wisc.edu

Abstract.---We examined Type I error rates of Felsenstein's (1985; Am. Nat. 125:1-15) comparative method of phylogenetically independent contrasts when branch lengths are in error and the model of evolution is not Brownian motion. We used seven evolutionary models, six of which depart strongly from Brownian motion, to simulate the evolution of two continuously valued characters along two different phylogenies (15 and 49 species). First we examined the performance of independent contrasts when branch lengths are distorted systematically, for example, by taking the square root of each branch segment. These distortions often caused inflated Type I error rates, but performance was almost always restored when branch length transformations were used. Next, we investigated effects of random errors in branch lengths. After the data were simulated, we added errors to the branch lengths and then used the altered phylogenies to estimate character correlations. Errors in the branches could be of two types: fixed, where branch lengths are either shortened or lengthened by a fixed fraction; or variable, where the error is a normal variate with mean zero and the variance is scaled to the length of the branch (so that the expected error relative to branch length is constant for the whole tree). Thus the error added is unrelated to the microevolutionary model. Without branch length checks and transformations, independent contrasts tended to yield extremely inflated and highly variable Type I error rates. Type I error rates were reduced, however, when branch lengths were checked and transformed as proposed by Garland et al. (1992; Syst. Biol. 41:18-32), and almost never exceeded twice the nominal P-value at alpha=0.05. Our results also indicate that, if branch length transformations are applied, then the appropriate degrees of freedom for testing the significance of a correlation coefficient should, in general, be reduced to account for estimation of the best branch length transformation. These results extend those reported in Diaz-Uriarte and Garland (1996; Syst. Biol. 45:27-47), and show that, even with errors in branch length and evolutionary models different from Brownian motion, independent contrasts are a robust method for testing hypotheses of correlated evolution.
[Branch lengths; Brownian motion; continuous characters; independent contrasts; Ornstein-Uhlenbeck model; simulations; speciational model; Type I error.]
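
For readers unfamiliar with the method tested above, the following is a minimal sketch of standardized independent contrasts in the sense of Felsenstein (1985), with an optional branch-length transformation such as the square root mentioned in the abstract. It is an illustration, not the authors' simulation code. It assumes a fully bifurcating tree with strictly positive branch lengths below the root; the tree encoding (`(label_or_children, branch_length)`), the function names, and the three-species data are hypothetical.

```python
import math

def contrasts(node, trait, transform=lambda v: v):
    """Return (node value, adjusted branch length, standardized contrasts below node)."""
    label, length = node
    v = transform(length)
    if isinstance(label, str):  # tip: observed trait value, no contrasts
        return trait[label], v, []
    (xi, vi, ci), (xj, vj, cj) = (contrasts(child, trait, transform) for child in label)
    contrast = (xi - xj) / math.sqrt(vi + vj)          # standardized contrast
    value = (xi / vi + xj / vj) / (1 / vi + 1 / vj)    # weighted ancestral estimate
    return value, v + vi * vj / (vi + vj), ci + cj + [contrast]

def correlation_through_origin(a, b):
    """Correlation of two contrast vectors, computed through the origin."""
    sab = sum(x * y for x, y in zip(a, b))
    return sab / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

# Hypothetical three-species tree ((A:1, B:1):1, C:2) with a zero-length root branch.
tree = ((((("A", 1.0), ("B", 1.0)), 1.0), ("C", 2.0)), 0.0)
x = {"A": 1.0, "B": 3.0, "C": 5.0}
y = {name: 2.0 * value for name, value in x.items()}

cx = contrasts(tree, x)[2]
cy = contrasts(tree, y)[2]
print(cx, correlation_through_origin(cx, cy))

# Same contrasts after a square-root transformation of every branch length.
cx_sqrt = contrasts(tree, x, transform=math.sqrt)[2]
```

The sign of each contrast depends on the arbitrary ordering of the two daughters, which is why both traits must be processed on the same tree in the same order before correlating.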


Syst. Biol. 47(4):673-695, 1998

Split support and split conflict randomization tests in phylogenetic inference

Mark Wilkinson

School of Biological Sciences, University of Bristol, Bristol BS8 1UG, and Department of Zoology, The Natural History Museum, London, SW7 5BD, England;
E-mail: m.wilkinson@nhm.ac.uk

Abstract.---Randomization tests allow the formulation and statistical testing of null hypotheses about the quality of entire data sets or the quality of fit between the data and particular phylogenetic hypotheses. Randomization tests of phylogenetic hypotheses based on the concepts of split support and split conflict are described here, as are tests where splits, rather than the data, are randomly permuted. These tree-independent randomization tests are explored through their application to phylogenetic data for caecilian amphibians. Of these tests, split support randomization tests appear to be the most promising tools for phylogeneticists. These tests appear quite conservative, are applicable to nonpolar data and unordered multistate characters, and do not suffer from problems of nonindependence that affect split conflict and hierarchy tests. Unlike split conflict tests, their power does not appear to be correlated with split size. However, all tests are sensitive to taxonomic scope. Split support tests may help discern data that are likely to suffer from the problems of long-branch effects. Comparison of test results for mutually incompatible splits may help to identify the presence of strong misleading signals in phylogenetic data. Significant split support could be a prerequisite for phylogenetic hypotheses to be considered well supported by the data, and split support randomization tests might be usefully applied prior to or as part of tree construction.
[Compatibility, conflict, hierarchy, phylogeny, randomization, spectral analysis, splits, support, statistics.]