Explanation of Appendix 1

            When all Trichoptera are alignable to one another, but not to the outgroup, outgroup nucleotides in the region were replaced with “?”.  This was done in the loop portion of stem 15 (Block 1) and the loop portion of V4-1 (Block 8).
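The masking step can be sketched as follows. This is only an illustration, assuming the alignment is held as a dict of taxon name to aligned string. The function name `mask_region`, the taxon names, the sequences, and the column range are all hypothetical and are not taken from the data matrix.

```python
def mask_region(alignment, taxa, start, end):
    """Return a copy of the alignment with columns [start, end) set to '?' for the given taxa."""
    masked = dict(alignment)
    for name in taxa:
        seq = alignment[name]
        masked[name] = seq[:start] + "?" * (end - start) + seq[end:]
    return masked


# Hypothetical toy alignment: columns 3-5 stand in for a loop that aligns
# within the ingroup but not to the outgroup.
toy = {
    "Outgroup_A": "GGCAUUACC",
    "Ingroup_A": "GGCGAAGCC",
    "Ingroup_B": "GGCGAUGCC",
}
masked = mask_region(toy, ["Outgroup_A"], 3, 6)
print(masked["Outgroup_A"])  # GGC???ACC
```

Only the outgroup row changes; the ingroup rows keep their nucleotides, so the region still contributes characters within Trichoptera.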

            When a region has identical length in all but 2-3 taxa, and these taxa are still alignable, the region was included, and information about the length of the region was also coded with symbols.  The gap symbol “-” was always treated as missing data.  At site 1 (Block 1), all Trichoptera were coded “0”, reflecting a length of 6 nucleotides (nts), except for the 2 Beraeidae, which had a length of 8, coded as “5”, and Ceraclea, which had a length of 5, coded as “6”.  The extra nucleotides in Beraeidae were placed in the position of the two gaps in the rest of the taxa, as shown below:

Anomalopsyche    CGGG--CG 0

Beraea           CGGAGCCG 5

Ernodes          CGGAGCCG 5

Ceraclea         CGUG--C- 6

Amphoropsyche    CGUG--CG 0
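The length coding at site 1 can be reproduced with a short sketch. The length-to-symbol table comes from the text above. The function names, and the choice to return “?” for a length outside the table, are assumptions made for illustration.

```python
# Length-to-state table as stated in the text for site 1 (Block 1).
LENGTH_CODES = {6: "0", 8: "5", 5: "6"}


def ungapped_length(region):
    """Count nucleotides in an aligned region, ignoring the gap symbol '-'."""
    return sum(1 for ch in region if ch != "-")


def length_state(region):
    """Return the state symbol for a region's length ('?' if the length is not in the table; an assumption)."""
    return LENGTH_CODES.get(ungapped_length(region), "?")


site1 = {
    "Anomalopsyche": "CGGG--CG",
    "Beraea": "CGGAGCCG",
    "Ernodes": "CGGAGCCG",
    "Ceraclea": "CGUG--C-",
    "Amphoropsyche": "CGUG--CG",
}
states = {taxon: length_state(region) for taxon, region in site1.items()}
for taxon, state in states.items():
    print(taxon, state)
```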

 

Other cases in which insertions or deletions in a few taxa were coded with symbols are sites 4 (Block 3), 7 (Block 6), and 16 (Block 8).

            In some regions of variable length, an identical motif is found in all of the trichopteran suborders.  Identical motifs found across all suborders were defined as plesiomorphic, and other taxa were aligned to the plesiomorphic sequences.  An interesting example is found at site 3 (Block 2).  In this region, at least one member of every family except Xiphocentronidae possessed the sequence “CAUUAGUCA” (or “CGUUAGUCA” in Austrotinodes).  This plesiomorphic motif is found in all spicipalpian and integripalpian taxa with the three exceptions shown below.  Yet practically all annulipalpian families have major modifications in some taxa, while other taxa in the same family possess the plesiomorphic motif.  This is puzzling because the conservation within Integripalpia and Spicipalpia implies some function, yet there have been repeated independent losses of the motif in Annulipalpia.

            Sequences from some taxa were so different that we could no longer align them, yet members of the same family retained the plesiomorphic sequence.  The taxa we deemed unalignable are shown bracketed below, but in PAUP they were coded with “?”.  This was a situation where we selectively eliminated characters from within the ingroup, and we justify this with the observation that at least some members of every family are alignable and represented by a plesiomorphic state.  If this is true, then variations of the plesiomorphic state are independent of one another.  What is interesting is that these repeated independent losses only occurred within Annulipalpia.  The coded number following the motif below refers to the position of the lost nucleotides.

 

Outgroups                           ********* *

All other Trichoptera               CAUUAGUCA 0

Culoptila_n_sp_GLOS                 C-UUAGUCA 2 

Cochliopsyche_vasquezi_HELI         CAGUAGUCA 0 

Helicopsyche_borealis               CACUAGUCA 0 

Homoplectra_doringa_HYDR            CAUUAGUCA 0 

Diplectrona_modesta_HYDR            CAU-AGUCA 7 

Hydropsyche_occidentalis_HYDR       -----GUCA 5 

Cheumatopsyche_oxa_HYDR             -----GUUA 5   

Leptonema_salvini_HYDR              C-UUAGUCA 2  

Macrostemum_zebratum_HYDR           C-UACAUAA 2

Smicridea_turrialbana_HYDR          CAAACA-CA 8 

Philopotamus_montanus_PHIL          CAUUAGUCA 0 

Chimarrhodella_ulmeri_PHIL          CACCAGUCA 0 

Xiphocentron_sp_XIPH                -ACCAGUCA 1

Polycentropus_interruptus_POLY      CAUUAGUCA 0

Pseudoneureclipsis_POLY             -AUUAGUCA 1 

Antilopsyche_demma_POLY             CGUUAGUCA 0

Phylocentropus_placidus_DIPS        CAUUAGUCA 0 

Austrotinodes_sp_ECNO               CGUUAGUCA 0   

Ecnomus_tenellus_ECNO               C-UACGUU- 2 

Smicridea_talamanca_HYDR           [CAU----UU]4

Wormaldia_gabriella_PHIL           [AUU----UA]4 

Wormaldia_triangulifera_PHIL       [AUU----UA]4 

Chimarra_feria_PHIL                [U------UU]6 

Chimarra_rossi_PHIL                [C------UU]6 

Dipseudopsis_notatus_DIPS          [CAAC---CA]3

Dipseudopsis_varius_DIPS           [CAAC---CA]3  
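The justification for masking unalignable ingroup taxa rests on each family keeping at least one member with the plesiomorphic motif. That check can be sketched as below, using a subset of rows from the table above. Reading the family from the name suffix, and the function name `families_lacking_motif`, are assumptions made for this illustration.

```python
from collections import defaultdict

# Motif variants named in the text as the plesiomorphic state.
PLESIOMORPHIC = {"CAUUAGUCA", "CGUUAGUCA"}

# A subset of rows from the table above; the family tag is the name suffix.
rows = {
    "Homoplectra_doringa_HYDR": "CAUUAGUCA",
    "Hydropsyche_occidentalis_HYDR": "-----GUCA",
    "Smicridea_talamanca_HYDR": "CAU----UU",   # bracketed (unalignable) in the table
    "Philopotamus_montanus_PHIL": "CAUUAGUCA",
    "Chimarra_feria_PHIL": "U------UU",        # bracketed (unalignable) in the table
    "Xiphocentron_sp_XIPH": "-ACCAGUCA",
    "Phylocentropus_placidus_DIPS": "CAUUAGUCA",
    "Dipseudopsis_varius_DIPS": "CAAC---CA",   # bracketed (unalignable) in the table
}


def families_lacking_motif(rows):
    """Return the sorted family tags in which no taxon carries a plesiomorphic motif."""
    has_motif = defaultdict(bool)
    for name, motif in rows.items():
        family = name.rsplit("_", 1)[1]
        has_motif[family] |= motif in PLESIOMORPHIC
    return sorted(f for f, ok in has_motif.items() if not ok)


print(families_lacking_motif(rows))  # ['XIPH']
```

For this subset the only family without the motif is Xiphocentronidae, matching the exception noted in the text.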

 

            Similarly, the nucleotides at position 5 (Block 5) are somewhat ambiguously aligned across higher taxa, but aligned within each suborder.  The plesiomorphic state was 5 nts: “RYYRA”.  This was determined by examining Spicipalpia and Annulipalpia within Trichoptera, and the outgroups.  Five nts are found in Siphonaptera and Mecoptera, while 6 are unique to Lepidoptera.  Some Trichoptera had 7 nts, with the extra nts aligned at the beginning.  We could determine this because the last five of the 7 nts had the RYYRA motif.  So position 5 refers to the state of these first 2 bracketed nts (of 7) that precede the symbol:  “0=plesiomorphically missing”, “AC=1, UU=6...”
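The motif test described above can be sketched with a regular expression over the IUPAC codes R (A or G) and Y (C or U). The prefix states “0”, “AC=1” and “UU=6” are the ones quoted in the text. The function names, the “?” fallback, and the example sequences are hypothetical and are not taken from the data matrix.

```python
import re

# IUPAC ambiguity codes used by the motif: R = A or G, Y = C or U.
RYYRA = re.compile(r"[AG][CU][CU][AG]A")

# Prefix-to-state pairs quoted in the text; other prefixes are left uncoded here.
PREFIX_STATES = {"": "0", "AC": "1", "UU": "6"}


def split_on_motif(region):
    """Split a region into (prefix, motif) if its last five nts match RYYRA, else return None."""
    prefix, tail = region[:-5], region[-5:]
    if RYYRA.fullmatch(tail) is None:
        return None
    return prefix, tail


def position5_state(region):
    """Return the position 5 state, or '?' when the region cannot be placed (an assumption)."""
    parts = split_on_motif(region)
    if parts is None:
        return "?"
    return PREFIX_STATES.get(parts[0], "?")


# Hypothetical example sequences.
print(position5_state("GCUGA"))    # 0
print(position5_state("ACGCUGA"))  # 1
print(position5_state("UUACCAA"))  # 6
```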

            Following position 5, the nts were retained if they had 5 nts that could be aligned, but coded as “?” if not.  Position 6 refers to the number of these nucleotides.  This was necessary because some taxa that could not be aligned were still identical to one another, and while the unalignable nts were coded as “?” in the region between 5 and 6, some information was retained by giving them a specific coded symbol that indicated their length.

            Stem D3-1 (Block 6) has multiple compensatory changes that guided the alignment of the majority of the taxa.  Position 7 is coded for presence or absence of a nucleotide preceding it.  Multiple states are given because a few taxa have an insertion here, and these taxa are coded with different numbers.  A few taxa have lost the entire stem.

            In the loop in the middle of D3-1, the plesiomorphic state is CUCGN, present in all higher taxa and in Lepidoptera.  All taxa possessing the CUCGN, followed by an unambiguous stem of 8 bp, were retained.  All others (particularly those with longer stems) were coded as question marks.  Position 8 codes for insertions in some taxa.  The actual nucleotides were bracketed, and a numerical code was used to indicate “nucleotide present”.  Position 9 codes for the loss of single nucleotides following the “CUCG” loop motif (a 4 nt “loop”).  Position 10 codes for several taxa with an additional nucleotide following the “CUCGN” motif.  Again, for these scattered insertions, the nucleotides were bracketed, and the state was coded as 1=nucleotide present, 0=nucleotide absent.  Positions 11, 12 and 13 also code for presence/absence of an insertion.  Position 14 codes for an insertion from 0-4 nts in length.  The nts could not be aligned across taxa, but each identical combination was given a specific numerical code.

            Stem D3-2, position 15 (Block 6), refers to a large insert present in some taxa.  The first number is coded such that each unique combination of nucleotides is given a number.  PAUP does not allow more than 32 states for a given character, so some taxa that were autapomorphic (and therefore not informative) were coded with “?”.  The second number refers to the number of nucleotides in the insert.
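One way to keep the number of states under the limit is sketched below. It is a simplification: every insert seen in only one taxon is coded “?”, whereas the text says only some autapomorphic taxa were treated this way. The symbol pool, the function name `code_inserts`, and the toy inserts are hypothetical.

```python
from collections import Counter

MAX_STATES = 32  # limit stated in the text for PAUP

# Hypothetical pool of 32 symbols; the real matrix may use different symbols.
SYMBOLS = "0123456789abcdefghijklmnopqrstuv"


def code_inserts(inserts):
    """Map each taxon's insert to a state symbol; inserts seen only once become '?'."""
    counts = Counter(inserts.values())
    shared = sorted(seq for seq, n in counts.items() if n > 1)
    if len(shared) > MAX_STATES:
        raise ValueError("more shared states than the program allows")
    symbol_of = {seq: SYMBOLS[i] for i, seq in enumerate(shared)}
    return {taxon: symbol_of.get(seq, "?") for taxon, seq in inserts.items()}


# Hypothetical inserts: taxa A and B share one, C and D share another, E is unique.
toy_inserts = {"A": "GUUCU", "B": "GUUCU", "C": "CCA", "D": "CCA", "E": "GUCAA"}
coded = code_inserts(toy_inserts)
print(coded)
```

Coding a singleton as “?” loses no grouping information under parsimony, since a state found in a single taxon cannot support a clade.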

            The sequences for outgroup taxa preceding position 15 are 32-33 nts, while Trichoptera have 2-5 nts.  Despite the fact that there was not a great deal of length heterogeneity within the ingroup, the nucleotides within Trichoptera could not be aligned, and the region was eliminated for all taxa, but coded with symbols.  The first number of position 15 refers to the unique state of each taxon.  For example:  CCA=1, CCC=5.  The second character is the number of nucleotides in the region, so that CCA=13 and CCC=53.  With identical sequences, this has the undesired effect of having the first and second state linked.  We justify this with the following example:

 

Micropterigidae_LEP           UGAGC[GGCCGGUAAAACGGCGUGCUCAACUGAAAUC] ?0

Taschorema_evansi_HYDB        UGCGU[--CCA--------------------------] 13

Brachycentrus_americanus      UGCGU[--CCA--------------------------] 13

Alloecella_grisea_HEL         UGCGU[--CCA--------------------------] 13

Agarodes_distinctus_SERI      UGCGU[--CCA--------------------------] 13

Helicopsyche_borealis         UGCGU[--CCC--------------------------] 53  

Olinga_feredayi_CONO          UGCGU[--CCAC-------------------------] •4

Triplectides_flintorum_LEPT   UGCGU[--AAA--------------------------] p3

Ceraclea_resurgens_LEPT       UGCGU[--CCA--------------------------] 13

Diplectrona_modesta_HYDR      UGCGU[--UU---------------------------] 72

Homoplectra_doringa_HYDR      UGCGU[GUUUU--------------------------] 85

Leptonema_salvini_HYDR        UGCGU[GUUUU--------------------------] 85

Wormaldia_gabriella_PHIL      UGCGU[GUUCU--------------------------] a5

Wormaldia_triangulifera_PHIL  UGCGU[GUUCU--------------------------] a5

Stenopsychodes_mjobergi_STEN  UGCGU[GUCAA--------------------------] &5

Stenopsyche_marmorata_STEN    UGCGU[GUCAU--------------------------] f5
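The two-symbol codes for the ingroup rows above can be reproduced with a short sketch. The sequence-to-symbol table is read from the table itself. The function name and the “?” returned for a sequence outside the table are assumptions, and the sketch covers only the short ingroup inserts (2-5 nts), not the outgroup row.

```python
# State symbols for unique ingroup sequences, as shown in the table above.
STATE_SYMBOLS = {
    "CCA": "1", "CCC": "5", "UU": "7", "GUUUU": "8",
    "GUUCU": "a", "GUCAA": "&", "GUCAU": "f", "AAA": "p",
}


def two_symbol_code(bracketed):
    """Return state symbol + length for a gapped region ('?' as the symbol if the sequence is not in the table)."""
    seq = bracketed.replace("-", "")
    return STATE_SYMBOLS.get(seq, "?") + str(len(seq))


print(two_symbol_code("--CCA--------------------------"))  # 13
print(two_symbol_code("--CCC--------------------------"))  # 53
print(two_symbol_code("--UU---------------------------"))  # 72
print(two_symbol_code("GUUCU--------------------------"))  # a5
```

Because the length is computed from the same sequence that determines the first symbol, identical sequences always receive identical pairs, which is the linkage between the two states noted in the text.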

 

The above region is not alignable between Trichoptera and the outgroup, nor among trichopteran suborders.  Yet there are obviously characters present.  For example, the two Wormaldia share an identical “GUUCU”, which would normally be 5 characters.  Here they are reduced to two characters, with the first linked to the second.  Perhaps it would be better to code each taxon with a single symbol for each unique state.  But then, for example, the stenopsychids, which differ only by a single nucleotide, would be as far from one another as they would be from Diplectrona.  We viewed our coding scheme here as a compromise, an attempt to specify one kind of state (the unique combination) with the first symbol, and a second kind of state (gain vs. loss of nucleotides) with the second.  In hindsight, I wish I had been aware of Lutzoni et al. (2000; Syst. Biol. 49:628-651), because I think their method would solve the problem.
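The trade-off between the two schemes can be illustrated by counting character differences, assuming unordered, equally weighted characters. The two-symbol codes are taken from the table above. The single-symbol codes “A”, “B” and “C” are hypothetical stand-ins for “one symbol per unique sequence”.

```python
def mismatches(a, b):
    """Count positions at which two equal-length codings differ."""
    return sum(x != y for x, y in zip(a, b))


# Two-symbol codes from the table above.
two_symbol = {"Stenopsychodes": "&5", "Stenopsyche": "f5", "Diplectrona": "72"}
# Hypothetical single-symbol alternative: one arbitrary symbol per unique sequence.
one_symbol = {"Stenopsychodes": "A", "Stenopsyche": "B", "Diplectrona": "C"}

for label, scheme in (("one symbol", one_symbol), ("two symbols", two_symbol)):
    within = mismatches(scheme["Stenopsychodes"], scheme["Stenopsyche"])
    between = mismatches(scheme["Stenopsychodes"], scheme["Diplectrona"])
    print(label, "| stenopsychid pair:", within, "| vs Diplectrona:", between)
```

With one symbol both comparisons differ by 1. With two symbols the stenopsychids differ by 1 and differ from Diplectrona by 2, so the shared length keeps them closer together.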

The final coded region, a loop portion within stem 29 (position 18; Block 11), was also coded in this way.

            There was a large insert in V4-3 (Block 9) that is flanked by a conserved stem.  This insert was alignable within the outgroup, and within Limnephiloidea and some Sericostomatoidea.  However, the sequences within Spicipalpia and Annulipalpia make alignment across ingroup taxa impossible.  Moreover, the insert was too complex to code; therefore we eliminated it.  We anticipate using this region in future studies designed to resolve shallower-level phylogenies of individual superfamilies.

            The final insert of Block 9, between V4-3 and V4-2’, was present in 3 taxa and was ignored because it was autapomorphic in each case.  All Trichoptera have an insert of 5 nucleotides compared to all our outgroup taxa.

            The “?”s in stem V4-4 (Block 10) refer to an insert in Lepidoptera.  All Trichoptera are invariant in length in this region, but the inserted lepidopteran sequences could not be aligned to the ingroup.  Therefore we coded the nucleotides in Lepidoptera as “?” in the region that was ambiguously aligned to the ingroup, without excluding the whole region for the rest of the taxa.

            In Block 9, at the position preceding V4-5’, all taxa except Diptera have state 1, corresponding to the presence of a nucleotide in the preceding position.  All dipterans we examined were missing a single nucleotide at position 16 and were coded as “0”.

            Position 17, Block 11, indicates the loss of a single nucleotide in Ditrysia (Lepidoptera).

            Returning to Block 2, preceding stem 18b, we show the most difficult-to-align region that we retained in the analysis;  this region is discussed last because of its complexity.  By the time you have gotten this far you have either quit in disgust, or you understand what we are attempting.  Once again, in hindsight, I think the method of Lutzoni et al. (2000; Syst. Biol. 49:628-651) would better solve this problem, but what we did here was an unbiased attempt to retain information.  The outgroups could not be aligned with Trichoptera, and outgroup sequences were bracketed and coded with “?”.  Spicipalpians and integripalpians could be unambiguously aligned together, with the motif GUUAW?????UAYA, some with inserts.  Many of the Annulipalpia also began with “GUUAN”, while others had “UGUUAN”.  We interpreted this as a “U” insertion at the beginning, so that the conserved motif could be aligned with the other Trichoptera.  At the other end of the conserved motif, many non-annulipalpian Trichoptera had “UACA” or “UAAA”, and some Annulipalpia had “UACAUU”.  We lined up the “UACA”, and then inferred another 2-nt insertion at the end in Annulipalpia.  The symbols following the region refer to positions 1, 2 and 3, marked below.  For the Integripalpia and Spicipalpia, we left the first symbol as a question mark because this region is the most ambiguously aligned region that we included, and we did not want to support an Integripalpia + Spicipalpia clade with a character from such a hypervariable region unless we were certain of our alignment.  Most of the included characters here are either autapomorphic or uninformative; however, the following motifs were aligned:

 

Motif                     ? GUUAW ????? UAYA ??

Palaeagapetus_HYDT        -|GUUAU|AUUUA|UACA|-- ?10 

Limnephilus_externus      -|GUUAU|ACA--|UACA|-- ?00

Tasmanthrus_PHRH          -|GUUAA|AAU--|UAAA|-- ?00 

Tinodes_waeneri_PSYC      -|GUUAA|-AU--|-UUA|UU 20(1,3)

Cyrnellus_POLY            U|UUUAA|-UU--|UACA|UU 000

Stenopsychodes_STEN       U|GUUAA|-UU--|UACU|UU 000 

Philopotamus_montanus     U|GUUAA|-UU--|UAUA|AU 000  

                           1         | 2| 3