Abstract.---A morphological data set and three sources of data from the
chloroplast genome (two genes and a restriction-site survey) were used to reconstruct the
phylogenetic history of the Pickerelweed family Pontederiaceae. The chloroplast data are
converging to a single tree, presumably the true chloroplast phylogeny of the family. Unrooted
trees estimated from the three chloroplast data sets were identical or extremely similar in
shape to each other and mostly robustly supported, and there was no evidence of significant
heterogeneity among them. The few topological differences seen among unrooted trees from each
chloroplast data set are probably artifacts of sampling error on short branches. Despite
well-documented differences in rates of evolution for different characters in individual data sets,
equally weighted parsimony therefore permits accurate reconstructions of chloroplast
relationships in Pontederiaceae. A separate morphology-based data set yielded trees that were
very different from the chloroplast trees. While there was substantial support by the
morphological evidence for several major clades supported by chloroplast trees, most of the
conflicting phylogenetic structure on the morphology trees was not robust. Nonetheless,
several statistical tests of incongruence indicate significant heterogeneity between molecules
and morphology. The source of this apparent incongruence appears to be a low ratio of
phylogenetic signal to noise in the morphological data.
[Pontederiaceae; ndhF; rbcL;
chloroplast DNA; morphology; congruence tests; incongruence; noise]
Abstract.---The possibility that two data sets may have different underlying
phylogenetic histories (such as gene trees that deviate from species trees) has become an
important argument against combining data in phylogenetic analysis. However, two data sets
sampled for a large number of taxa may differ in only part of their histories. This is a
realistic scenario and one in which the relative advantages of combined, separate, and
consensus analysis become much less clear. I suggest a simple methodology for dealing with this
situation that involves (1) partitioning the available data to maximize detection of different
histories, (2) performing separate analyses of the data sets, and (3) combining the data but
considering questionable or unresolved those parts of the combined tree that are strongly
contested in the separate analyses (and which therefore may have different histories), until a
majority of unlinked data sets supports one resolution over another. In support of this
methodology, computer simulations suggest that (1) the accuracy of combined analysis at
recovering the true species phylogeny may exceed that of either of two separately analyzed data
sets under some conditions, particularly when the mismatch between phylogenetic histories is
small and the estimates of the underlying histories are imperfect (few characters and/or high
homoplasy), and (2) combined analysis provides a poor estimate of the species tree in areas of
the phylogenies with different histories but an improved estimate in regions that share the
same history. Thus, when there is a localized mismatch between the histories of two data sets,
separate, consensus, and combined analysis may all give unsatisfactory results in certain parts
of the phylogeny. Similarly, approaches that allow data combination only after a global test of
heterogeneity will suffer from the potential failings of either separate or combined analysis,
depending on the outcome of the test. Excision of conflicting taxa is also problematic in that
it may obfuscate the position of conflicting taxa within a larger tree, even when their
placement is congruent between data sets. Application of the proposed methodology to molecular
and morphological data sets for Sceloporus lizards is discussed.
[Combined analysis,
separate analysis, consensus analysis, phylogenetic accuracy; computer simulation; Sceloporus].
Abstract.---Assumptions about the costs of character change, coded in the form of
a step matrix, determine most-parsimonious inferences of character evolution on phylogenies. We
present a graphical approach to exploring the relationship between cost assumptions and
evolutionary inferences from character data. The number of gains and losses of a binary trait
on a phylogeny can be plotted over a range of cost assumptions, to reveal the inflection point
at which there is a switch from more gains to more losses and the point at which all changes
are inferred to be in one direction or the other. Phylogenetic structure in the data, the tree
shape, and the relative frequency of states among the taxa influence the shape of such graphs
and complicate the interpretation of possible permutation-based tests for directionality of
change. The costs at which the most-parsimonious state of each internal node switches from one
state to another can also be quantified by iterative ancestral-state reconstruction over a
range of costs. This procedure helps identify the most robust inferences of change in each
direction, which should be of use in designing comparative studies.
[parsimony, homoplasy,
phylogenetic inference, character evolution, ancestral states]
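The cost sweep described in this abstract can be illustrated with a minimal sketch, assuming a binary trait, a fully bifurcating tree written as nested pairs, and a two-state step matrix in which only the gain cost varies (the loss cost is fixed at 1). The tree, the function names, and the tie-breaking rule (ties resolved in favor of state 0) are assumptions made for illustration; they are not taken from the paper, and only one most-parsimonious reconstruction is followed at each cost.

```python
from math import inf

def sankoff(node, cost):
    """Return (per-state minimum costs, annotated children) for a binary trait.

    node is a tip state (0 or 1) or a pair (left, right);
    cost[i][j] is the cost of a change from state i to state j.
    """
    if isinstance(node, int):
        return ([0 if s == node else inf for s in (0, 1)], None)
    kids = [sankoff(child, cost) for child in node]
    scores = [sum(min(cost[s][t] + k[0][t] for t in (0, 1)) for k in kids)
              for s in (0, 1)]
    return (scores, kids)

def count_changes(annotated, state, cost):
    """Follow one most-parsimonious reconstruction; return (gains, losses)."""
    _, kids = annotated
    if kids is None:
        return 0, 0
    gains = losses = 0
    for kid in kids:
        # Cheapest child state given the parent's state (ties go to state 0).
        t = min((0, 1), key=lambda t: cost[state][t] + kid[0][t])
        gains += (state, t) == (0, 1)
        losses += (state, t) == (1, 0)
        g, l = count_changes(kid, t, cost)
        gains += g
        losses += l
    return gains, losses

def gains_and_losses(tree, gain_cost, loss_cost=1.0):
    """Gains and losses on one optimal reconstruction under the given costs."""
    cost = [[0.0, gain_cost], [loss_cost, 0.0]]
    annotated = sankoff(tree, cost)
    root_state = min((0, 1), key=lambda s: annotated[0][s])
    return count_changes(annotated, root_state, cost)

# Hypothetical four-taxon tree with tip states 0/1.
tree = ((0, 1), (0, 1))
for gain_cost in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(gain_cost, gains_and_losses(tree, gain_cost))
```

Plotting the two counts against the gain cost gives the kind of graph the abstract describes; a fuller treatment would report the range of gains and losses across all equally parsimonious reconstructions rather than a single one.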
Abstract.---Among-site rate variation (α) and transition bias (κ) have
been shown, most often as independent parameters, to be important dynamics in DNA evolution.
Accounting for these dynamics should result in better estimates of phylogenetic relationships.
To test this idea, we simultaneously estimated overall (averaged over all codon positions) and
codon-specific values of α and κ using maximum likelihood analyses of cytochrome b data from
all genera of pipits and wagtails (Aves: Motacillidae) and six outgroup species using initial
trees generated with default values. Estimates of α and κ were robust to initial tree topology
and suggested substantial among-site rate variation even within codon classes, with α being
lowest (large among-site rate variation) at second and highest (low among-site rate variation)
at third codon positions. There were shifts in tree topology and dramatic and statistically
significant improvements in log-likelihood scores of trees when overall values were applied, as
compared to scores from default values. Applying codon-specific values resulted in yet another
highly significant increase in likelihood. However, although incorporating substitution
dynamics into maximum likelihood, maximum parsimony and neighbor-joining analyses resulted in
increases in congruence among trees, there were only minor improvements in phylogenetic signal,
and none of the successive-approximations tree topologies were statistically distinguishable
from one another by the data. We suggest that the bush-like nature of many higher-level
phylogenies in birds makes estimating the dynamics of DNA evolution less sensitive to tree
topology, but also less susceptible to improvement via weighting.
[Character weighting;
cytochrome b; maximum likelihood; Motacillidae; rate variation; systematics; transition bias.]
Abstract.---Two qualitative taxonomic characters are potentially compatible if the
states of each can be ordered into a character state tree in such a way that the two resulting
character state trees are compatible. The number of potentially compatible pairs (NPCP) of
qualitative characters from a data set may be considered to be a measure of its phylogenetic
randomness. The value of NPCP depends on the number of evolutionary units (EUs), the number of
characters, the number of states in the characters, the distributions of EUs among these
states, and the amount and distribution of missing information, and so does not directly
indicate degree of phylogenetic randomness. Thus, for an observed data set, we used Monte Carlo
methods to estimate the probability that a data set chosen equiprobably from among those
identical (with respect to all the other above determining features) to the observed data set
would have as high (or low) an NPCP as the observed data set. This probability, the realized
significance of the observed NPCP, is attractive as an indication of phylogenetic randomness
because it does not require assumptions made by other such methods: no character state trees
are assumed and, as a consequence, only potential compatibility can be determined; no
particular method of phylogenetic estimation is assumed; and no phylogenetic trees are
constructed. We determined the values and significances of NPCP for analyses of 57 data sets
taken from 53 published sources. All data sets from 37 of those sources exhibited realized
significances <0.01, indicating high levels of phylogenetic nonrandomness. From each of the
remaining 16 sources, at least one data set was more phylogenetically random. Inclusion of
outgroups changed significance in some cases, but not always in the same direction. Data sets
with significantly low NPCP may be consistent with an ancient hybrid origin (or other ancient
polyphyletic gene exchange, crossing over, viral transfer, etc.) of the study group.
[Compatibility; data set robustness; Monte Carlo methods; phylogenetic randomness; qualitative
taxonomic characters.]
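A minimal sketch of the NPCP calculation and its Monte Carlo significance follows. It rests on several assumptions that are not stated in the abstract: potential compatibility of two unordered characters is checked by testing whether their state-intersection graph (one node per state, one edge per observed pair of states) is free of cycles; there are no missing data; and the null data sets are generated by shuffling each character's states among the EUs, which preserves the number of EUs, characters, states, and EUs per state. All function names are hypothetical.

```python
import random
from itertools import combinations

def potentially_compatible(col_a, col_b):
    """True if the state-intersection graph of two characters has no cycle."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    for a, b in set(zip(col_a, col_b)):     # one edge per observed state pair
        ra, rb = find(("A", a)), find(("B", b))
        if ra == rb:
            return False                    # this edge would close a cycle
        parent[ra] = rb
    return True

def npcp(columns):
    """Number of potentially compatible pairs of characters."""
    return sum(potentially_compatible(a, b)
               for a, b in combinations(columns, 2))

def realized_significance(columns, n_reps=999, seed=0):
    """Observed NPCP and the fraction of shuffled data sets with NPCP at least as high."""
    rng = random.Random(seed)
    observed = npcp(columns)
    hits = 0
    for _ in range(n_reps):
        shuffled = [rng.sample(col, len(col)) for col in columns]
        hits += npcp(shuffled) >= observed
    return observed, (hits + 1) / (n_reps + 1)

# Hypothetical data: each list is one character scored for six EUs.
characters = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]
print(realized_significance(characters))
```

For binary characters the acyclicity check reduces to asking whether all four state combinations occur. The lower-tail probability (NPCP as low as observed) can be obtained by reversing the comparison.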
Abstract.---Three properties of bifurcating branching diagrams that are
used for representing a specific number of taxa are (1) the number of possible arrangements,
(2) the number of possible topologies, and (3) the probabilities of formation according to
particular models of cladogenesis. Of these, probabilities have received the least attention
in the literature. Indeed, many biologists would be astonished by the observation that the
probability of a commonly cited cladogram containing 35 phyla of the animal kingdom is <
0.0072% of the value of the average probability taken over all possible cladograms! We
reviewed works on cladogram arrangements and topologies and developed a computer-generated
table of enumerations that extends and corrects such tables in the literature. We also
developed a nonrecursive formula for the determination of cladogram probabilities. This
formula facilitates calculation and thereby should promote use of cladogram probabilities,
which might provide more accurate null hypotheses for tests of cladogenic events than do
considerations of cladogram arrangements or topologies.
[Bifurcating branching diagram;
cladogram arrangement; cladogram topology; likelihood; null hypothesis; phylogenetic tree.]
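The three quantities named in this abstract can be illustrated with a short sketch. It assumes that "arrangements" means rooted bifurcating trees with labelled tips, that "topologies" means unlabelled tree shapes, and that probabilities are taken under an equal-rates Markov (Yule) model of cladogenesis. The product formula used for the probability is a standard result for that model and is not claimed to be the formula developed in the paper; the function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial

def n_arrangements(n):
    """Rooted bifurcating trees with n labelled tips: (2n-3)!! = 1*3*...*(2n-3)."""
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count

@lru_cache(maxsize=None)
def n_topologies(n):
    """Unlabelled rooted bifurcating tree shapes with n tips."""
    if n == 1:
        return 1
    total = sum(n_topologies(i) * n_topologies(n - i)
                for i in range(1, (n - 1) // 2 + 1))
    if n % 2 == 0:                      # both subtrees the same size
        half = n_topologies(n // 2)
        total += half * (half + 1) // 2
    return total

def tips_and_factor(tree):
    """Return (tip count, product of 1/(tips - 1) over internal nodes)."""
    if not isinstance(tree, tuple):
        return 1, Fraction(1)
    (nl, fl), (nr, fr) = tips_and_factor(tree[0]), tips_and_factor(tree[1])
    n = nl + nr
    return n, fl * fr / (n - 1)

def yule_probability(tree):
    """Probability of a labelled cladogram under the equal-rates Markov model."""
    n, factor = tips_and_factor(tree)
    return Fraction(2 ** (n - 1), factorial(n)) * factor

balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
average = Fraction(1, n_arrangements(4))   # mean probability over all labelled trees
for tree in (balanced, caterpillar):
    p = yule_probability(tree)
    print(tree, p, float(p / average))     # probability and ratio to the average
```

The ratio printed in the last column is the same kind of comparison the abstract makes when it relates one cladogram's probability to the average over all possible cladograms.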
Abstract.---Missing data are a widely recognized nuisance factor in phylogenetic
analyses, and the fear of missing data may deter systematists from including characters that
are highly incomplete. In this paper, I used simulations to explore the consequences of
including sets of characters that contain missing data. More specifically, I tested whether the
benefits of increasing the number of characters outweigh the costs of adding missing data cells
to a matrix. The results show that the addition of a set of characters with missing data is
generally more likely to increase phylogenetic accuracy than decrease it but that the potential
benefits of adding these characters quickly disappear as the proportion of missing data
increases. Furthermore, despite the overall trend, adding characters with missing data does
decrease accuracy in some cases. In these situations, the missing data entries are not
themselves misleading, but their presence may mimic the effects of limited taxon sampling,
which can positively mislead. Criteria are discussed for predicting whether adding characters
with missing data may increase or decrease accuracy. The results of this study also suggest
that accuracy can be increased to a surprising degree by (1) "filling the holes" in a data
matrix as much as possible (even when relatively few taxa are missing data), and (2) adding
fewer characters scored for all taxa rather than adding a larger number of characters known for
fewer taxa. Missing data can also be eliminated from an analysis through the exclusion of
incomplete taxa rather than incomplete characters, but this approach may reduce the usefulness
of the analysis and (in some cases) the accuracy of the estimated trees.
[Accuracy; missing
data; parsimony; simulations.]
Abstract.---The individuality of species provides the basis for linking practical
taxonomy with evolutionary and ecological theory. An individual is here defined as a
collection of parts (lower-level entities) that are mutually connected. Different types of
species individual exist based on different types of connection between organisms. An
interbreeding species is a group of organisms connected by the potential to share common
descendants, whereas a genealogical species is integrated by the sharing of common ancestors.
Such species definitions serve to set the limits of species at a moment of time and these
slices connect through time to form time-extended lineages. This perspective on the nature of
individuality has implications that conflict with traditional views of species and lineages:
(1) several types of connections among organisms may serve to individuate species in parallel
(species pluralism); (2) each kind of species corresponds to a distinct kind of lineage; (3)
although lineage branching is the most obvious criterion to break lineages into diachronic
species, it cannot be justified simply by reference to species individuality; (4) species
(like other individuals) have fuzzy boundaries; (5) if we wish to retain a species rank, we
should focus on either the most or least inclusive individual in a nested series; (6) not all
organisms will be in any species; and (7) named species taxa are best interpreted as
hypotheses of real species. Although species individuality requires significant changes to
systematic practice and challenges some preconceptions we may have about the ontology of
species, it provides the only sound basis for asserting that species exist independent of human
perception.
[Individuation, phylogenetic systematics, phylogenetic taxonomy, metaspecies,
pluralism.]
Abstract.---We examined Type I error rates of Felsenstein's (1985; Am. Nat.
125:1-15) comparative method of phylogenetically independent contrasts when branch lengths are
in error and the model of evolution is not Brownian motion. We used seven evolutionary
models, six of which depart strongly from Brownian motion, to simulate the evolution of two
continuously valued characters along two different phylogenies (15 and 49 species). First we
examined the performance of independent contrasts when branch lengths are distorted
systematically, for example, by taking the square root of each branch segment. These
distortions often caused inflated Type I error rates, but performance was almost always
restored when branch length transformations were used. Next, we investigated effects of random
errors in branch lengths. After the data were simulated, we added errors to the branch lengths
and then used the altered phylogenies to estimate character correlations. Errors in the
branches could be of two types: fixed, where branch lengths are either shortened or lengthened
by a fixed fraction; or variable, where the error is a normal variate with mean zero and the
variance is scaled to the length of the branch (so that the expected error relative to branch
length is constant for the whole tree). Thus the error added is unrelated to the
microevolutionary model. Without branch length checks and transformations, independent
contrasts tended to yield extremely inflated and highly variable Type I error rates. Type I
error rates were reduced, however, when branch lengths were checked and transformed as proposed
by Garland et al. (1992; Syst. Biol. 41:18-32), and almost never exceeded twice the nominal
P-value at alpha=0.05. Our results also indicate that, if branch length transformations are
applied, then the appropriate degrees of freedom for testing the significance of a correlation
coefficient should, in general, be reduced to account for estimation of the best branch length
transformation. These results extend those reported in Diaz-Uriarte and Garland (1996; Syst.
Biol. 45:27-47), and show that, even with errors in branch length and evolutionary models
different from Brownian motion, independent contrasts are a robust method for testing
hypotheses of correlated evolution.
[Branch lengths; Brownian motion; continuous characters; independent contrasts;
Ornstein-Uhlenbeck model; simulations; speciational model; Type I error.]
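The independent-contrasts calculation that this abstract evaluates can be sketched as follows, assuming a fully bifurcating tree with positive branch lengths below the root. The tree encoding (tips as `(name, branch_length)`, internal nodes as `(left, right, branch_length)`), the trait values, and the function names are all hypothetical. The square-root transformation is shown only as an example of the kind of branch length change mentioned in the abstract; the diagnostic checks of Garland et al. (1992) are not implemented here.

```python
from math import sqrt

def pic(node, trait, out):
    """Post-order pass collecting standardized contrasts in `out`.

    Returns (estimated trait value, adjusted branch length) for `node`.
    """
    if len(node) == 2:                       # tip: (name, branch length)
        name, v = node
        return trait[name], v
    left, right, v = node
    x1, v1 = pic(left, trait, out)
    x2, v2 = pic(right, trait, out)
    out.append((x1 - x2) / sqrt(v1 + v2))    # standardized contrast
    x = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)   # weighted ancestral value
    return x, v + v1 * v2 / (v1 + v2)        # lengthen the stem branch

def transform(node, f):
    """Apply f to every branch length, e.g. sqrt for a square-root transform."""
    if len(node) == 2:
        return (node[0], f(node[1]))
    left, right, v = node
    return (transform(left, f), transform(right, f), f(v))

def origin_correlation(cx, cy):
    """Correlation of two sets of contrasts, computed through the origin."""
    sxy = sum(a * b for a, b in zip(cx, cy))
    return sxy / sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))

# Hypothetical three-species tree and trait values.
tree = ((("A", 1.0), ("B", 1.0), 1.0), ("C", 2.0), 0.0)
x = {"A": 1.0, "B": 3.0, "C": 5.0}
y = {"A": 2.0, "B": 5.0, "C": 4.0}

for label, t in (("raw", tree), ("sqrt", transform(tree, sqrt))):
    cx, cy = [], []
    pic(t, x, cx)
    pic(t, y, cy)
    print(label, origin_correlation(cx, cy))
```

With n tips the procedure yields n - 1 contrasts per trait. Comparing the correlation under raw and transformed branch lengths shows how sensitive the estimate is to the branch lengths supplied.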
Abstract.---Randomization tests allow the formulation and statistical testing of
null hypotheses about the quality of entire data sets or the quality of fit between the data
and particular phylogenetic hypotheses. Randomization tests of phylogenetic hypotheses based on
the concepts of split support and split conflict are described here, as are tests where splits,
rather than the data, are randomly permuted. These tree-independent randomization tests are
explored through their application to phylogenetic data for caecilian amphibians. Of these
tests, split support randomization tests appear to be the most promising tools for
phylogeneticists. These tests appear quite conservative, are applicable to nonpolar data and
unordered multistate characters, and do not suffer from problems of nonindependence that affect
split conflict and hierarchy tests. Unlike split conflict tests, their power does not appear to
be correlated with split size. However, all tests are sensitive to taxonomic scope. Split
support tests may help discern data that are likely to suffer from the problems of long-branch
effects. Comparison of test results for mutually incompatible splits may help to
identify the presence of strong misleading signals in phylogenetic data. Significant split
support could be a prerequisite for phylogenetic hypotheses to be considered well supported by
the data, and split support randomization tests might be usefully applied prior to or as part
of tree construction.
[Compatibility, conflict, hierarchy, phylogeny, randomization,
spectral analysis, splits, support, statistics.]