Dating the Monocot–Dicot Divergence and the Origin of Core Eudicots Using Whole Chloroplast Genomes


    Shu-Miaw Chaw,1 Chien-Chang Chang,1 Hsin-Liang Chen,1 Wen-Hsiung Li2

    1 Institute of Botany, Academia Sinica, 128 Sec. 2, Academy Road, Taipei 115, Taiwan

    2 Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA

    Received: 31 July 2003 / Accepted: 23 October 2003

    Abstract. We estimated the dates of the monocot–dicot split and the origin of core eudicots using a large chloroplast (cp) genomic dataset. Sixty-one protein-coding genes common to the 12 completely sequenced cp genomes of land plants were concatenated and analyzed. Three reliable split events were used as calibration points and for cross references. Both the method based on the assumption of a constant rate and the Li–Tanimura unequal-rate method were used to estimate divergence times. The phylogenetic analyses indicated that nonsynonymous substitution rates of cp genomes are unequal among tracheophyte lineages. For this reason, the constant-rate method gave overestimates of the monocot–dicot divergence and the age of core eudicots, especially when fast-evolving monocots were included in the analysis. In contrast, the Li–Tanimura method gave estimates consistent with the known evolutionary sequence of seed plant lineages and with known fossil records. Combining estimates calibrated by two known fossil nodes and the Li–Tanimura method, we propose that monocots branched off from dicots 140–150 Myr ago (late Jurassic–early Cretaceous), at least 50 Myr younger than previous estimates based on the molecular clock hypothesis, and that the core eudicots diverged 100–115 Myr ago (Albian–Aptian of the Cretaceous). These estimates indicate that both the monocot–dicot divergence and the age of core eudicots are older than their respective fossil records.

    Key words: Chloroplast genome; Divergence of monocot and dicot; Angiosperm phylogeny; Age of core eudicots; Molecular clock; Unequal rate

    J Mol Evol (2004) 58:424–441. DOI: 10.1007/s00239-003-2564-9

    Correspondence to: Shu-Miaw Chaw; email: smchaw@sinica.edu.tw

    Introduction

    Fossil evidence suggests that flowering plants (angiosperms) first appeared 140 million years (Myr) ago in the early Cretaceous (Willis and McElwain 2002). They soon diversified and expanded globally in the mid-Cretaceous (90–100 Myr ago) (Nicholas et al. 1983). Although the angiosperm phylogeny has now been largely established (Mathews and Donoghue 1999; PS Soltis et al. 1999; Qiu et al. 1999; Parkinson et al. 1999; DE Soltis et al. 2000; Chase et al. 2000), the question of why the oldest unequivocal fossil for angiosperms is nearly 300 and 170 Myr later than the first vascular plants (ca. 440 Myr ago [Taylor and Taylor 1993]) and their extant sister group, the gymnosperms (late Carboniferous, ca. 310 Myr ago [Doyle 1998]), respectively, remains an "abominable mystery" (Darwin et al. 1903). A number of hypotheses have been proposed to explain the late arrival of angiosperms in the fossil record. These include (1) the escape of fossilization in the initial stage of angiosperm evolution (Thomas and Spicer 1987), (2) bias in the fossil record (i.e., angiosperms evolved much earlier but went undetected), and (3) the suggestion that the evolution of angiosperms was triggered by a particular set of environmental conditions and/or biotic interactions (such as co-evolution with faunal groups) (Willis and McElwain 2002).

    Is the origin of angiosperms actually much older than the known fossil record? Since Ramshaw et al.'s (1972) first application of molecular data to address this question, three decades have passed. In the interim, molecular phylogenetic studies and critical fossils of derived angiosperms from older geological deposits (Magallón et al. 1999; Wikström et al. 2001) have opened up an opportunity to readdress the age and evolution of angiosperms. Although all previous estimates of the monocot–dicot divergence (Table 1) predate the angiosperm fossil record, they are highly variable, ranging from 140–190 Myr (Goremykin et al. 1997; Sanderson 1997; Wikström et al. 2001; Sanderson and Doyle 2001) to 200 Myr (Ramshaw et al. 1972; Wolfe et al. 1989; Laroche et al. 1995; Yang et al. 1999) or even 300–320 Myr (Martin et al. 1989, 1993; Brandl et al. 1992).

    Traditionally, the angiosperms were subdivided into two classes, Liliopsida (the monocots) and Magnoliopsida (the dicots) (Cronquist 1988). However, this subdivision was first refuted by rbcL and 18S rRNA gene phylogenies (Chase et al. 1993; Chaw et al. 1997) and later by analyses of multiple genes from the three plant genomes (Mathews and Donoghue 1999; Parkinson et al. 1999; Qiu et al. 1999; PS Soltis et al. 1999; DE Soltis et al. 2000; Chaw et al. 2000). These phylogenetic analyses have led to the conclusion that the dicots were split into the basal dicots (or the magnoliids) and the eudicots and that the monocot lineage was derived from one of the basal magnoliids (Fig. 1A). Parallel to the molecular data has been the accumulation of pollen fossils of eudicots, which began in the late Barremian (of the Cretaceous, ca. 120 Myr ago) and spread globally in the Albian (ca. 110 Myr ago) (Doyle 1992; Hughes 1994). In addition, many new megafossils of basal eudicots have appeared, such as Tetracentraceae from the Barremian (110–118 Myr ago) (Magallón et al. 1999), as well as core eudicots, such as a possible Rhamnaceae/Rosaceae (rosids) from the early Cenomanian (94–97 Myr ago [Basinger and Dilcher 1984]). It has also been suggested that the date of diversification of core eudicots was underestimated. Wikström et al. (2001) have examined this issue (Table 1) with nuclear 18S rDNA and two cp (rbcL and atpB) genes. We now provide additional evidence by analyzing whole chloroplast (cp) genomic DNA sequences.

    Cp DNA sequences are useful for studying the plant phylogeny at deep levels of evolution because of their lower rates of silent nucleotide substitution (Palmer 1985a, b; Wolfe et al. 1989; Clegg et al. 1994). Moreover, concatenating sequences from many genes may overcome the problem of multiple substitutions that cause the loss of phylogenetic information between cp lineages (Lockhart et al. 1999) and can reduce sampling errors due to substitutional noise and the finite number of characters within a gene (Sanderson and Doyle 2001). In this study we analyzed 39,507 sites of cp DNA genomic sequences from 61 protein-coding genes common to the 12 complete cp genomes of land plants (Table 2). Our dataset is larger than those used in previous studies, including that of Goremykin et al. (1997; see also Table 1), who analyzed 40 proteins of cp genomes from fewer taxa (five land plants, including only one dicot and two monocots).

    Table 1. Comparison of previous estimates of divergence between monocots and dicots

    Reference | Gene used | Reference point (time^a of divergence; Myr) | Estimated time^a (Myr)
    Ramshaw et al. (1972) | Cytochrome c^b | Mammals–birds (280) | 220–240
    Martin et al. (1989) | nr^c GapC, CHS | Animal–yeast (1000) | 319 ± 35
     | | Drosophila–vertebrates (600) |
     | | Mammals–chicken (270) |
     | | Human–rat (85) |
    Wolfe et al. (1989) | 12 cp^c genes | Bryophyte–angiosperm (350–450) | 170–230
     | | Maize–wheat (50–70) | 150–260
     | nr 26S, 18S rRNA | Plant–animal (1000) | 200–250, 200–210
    Brandl et al. (1992) | cp tRNA | Maize–wheat (50–70) |
     | | Tracheophyte–bryophyte (350–450) | 230–350
     | | Plant–animal (1000) |
    Martin et al. (1993) | nr GapC, cp rbcL | Bryophyte–spermatophyte (450) | 300
     | | Conifer–angiosperm (330) |
    Laroche et al. (1995) | 12 mt^c genes | Maize–wheat (50–70) | 170–238
     | | Vicieae–Phaseoleae (45–65) | 157–226
    Goremykin et al. (1997) | 58 cp genes^b | Bryophyte–spermatophyte (450) | 160 ± 16
    Sanderson (1997) | rbcL | Marchantia (450) | [160–215]^d
    Yang et al. (1999) | mt 1st intron of nad | Maize–wheat (50–70) | 170–235
    Sanderson & Doyle (2001) | rbcL, 18S rRNA | Land plant (450) | 140–190
    Wikström et al. (2001) | cp rbcL, atpB | Fagales–Cucurbitales (84) | [158–179]
     | nr 18S rDNA | | [131–147]^e

    a The unit of time is million years ago (or before present).
    b The translated amino acid sequence was used.
    c cp, chloroplast; mt, mitochondrial; nr, nuclear.
    d The age of extant angiosperms.
    e The origin date of eudicots.

    Molecular dating often assumes rate constancy, but this assumption is frequently violated (PS Soltis et al. 2002 and references therein). For example, substitution rates of cp genes vary greatly among and within tracheophyte (or vascular plant) lineages (Bousquet et al. 1992; Gaut et al. 1992, 1993; Clegg et al. 1994; Sanderson and Doyle 2001; PS Soltis et al. 2002), between protein-coding loci (Muse and Gaut 1997; Matsuoka et al. 2002), and between nonsynonymous and synonymous sites (Gaut et al. 1997; Matsuoka et al. 2002). Sanderson and Doyle (2001) believed that much of the conflict in estimating divergence times was due to rate variation across lineages. In order to mitigate this problem we used mean branch lengths of the sampled monocots and dicots.

    The focus of this study is to estimate the dates of the monocot–dicot split and the origin of core eudicots using a large cp genomic dataset. The date of the monocot–dicot divergence can be calculated by extrapolation from the reliable dates of other speciation events by means of phylogeny based on DNA sequence distances (Wolfe et al. 1989). Three divergence events with well-supported fossil dates were used as calibration points and cross references. Both the method based on the assumption of a constant rate and Li and Tanimura's (1987) unequal-rate method (hereafter the Li–Tanimura method) were used to estimate divergence times, and the estimates were compared with known fossil dates. Although several other methods without the rate constancy assumption, such as the nonparametric rate smoothing method (NPRS), have been proposed (Sanderson 1997 and references cited therein), we chose the Li–Tanimura method for its simplicity. The method uses lineages in which the molecular clock holds better than the others to estimate the divergence time at a particular node. We also discuss possible reasons for discrepancies among estimates of divergence dates obtained in this study and previous studies.

    Fig. 1. Rooted phylogenetic tree for the 12 sampled species. A Phylogeny of angiosperms based on Qiu et al.'s (1999) and P. S. Soltis et al.'s (1999) phylogenetic trees. Solid lines lead to taxa sampled in this study. B Rooted NJ tree using the Pamilo–Bianchi–Li distances based on the Ka values concatenated from 61 cp protein-coding genes. The branches leading to nodes C2 and A are not drawn to scale. Lengths are indicated. The calibration points (nodes C1, C2, C3) were used to estimate the divergence between monocots and dicots (node A) and the origin of core eudicots (node B). Gene loss (open bar), loss but with known transfer to nucleus (hatched bar), retention (gray bar), and likely gain with no similarity to prokaryotic genes (filled bar) are plotted on the branches leading to each lineage. The upper numbers at each node denote the bootstrap percentages (where applicable, values of the interior branch test indicated after the slash). Total gene number in the cp genome is given after each species, in parentheses. Branch lengths and the scale bar are Ka values per 100 sites.

    Data and Methods

    Database Search for Cp Genome Sequences

    Individual genes of the 12 published cp genome sequences (Table 2) were downloaded from GenBank, National Center for Biotechnology Information (NCBI). Nomenclatures of the cp protein-coding genes compiled by Hallick and Bairoch (1994), Stoebe et al. (1998), Martin et al. (2002), and Swiss-Prot Protein Knowledgebase (2003) were used as guides. When synonyms were encountered, their sequence homologies with the typified names were carefully verified. Two homology criteria were considered: (1) the alignable length between two proteins is larger than 80% of the longer sequence, and (2) the sequence identity in the aligned region is at least 40% if L > 150, or otherwise at least $0.06 + 4.8L^{-0.32(1 + \exp(-L/1000))}$, where L is the alignable length (Rost 1999; Gu et al. 2002). Note that we raise the identity to 40% instead of 30% because the taxa we sampled are comparatively recent and cp genes are highly conserved (Wolfe et al. 1989).
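
    As an illustration only (this is not the code used in the study), the two homology criteria can be sketched in Python. The sketch assumes the length-dependent curve of Rost (1999) in the form 0.06 + 4.8 L^(-0.32(1 + exp(-L/1000))), with identity expressed as a fraction and L as the alignable length in residues; the function and parameter names are hypothetical.

```python
import math


def identity_threshold(aligned_length: int,
                       long_cutoff: int = 150,
                       long_identity: float = 0.40) -> float:
    """Minimum fractional identity required for an alignable length L.

    Assumption: alignments longer than `long_cutoff` need a flat 40%
    identity; shorter ones follow the length-dependent curve attributed
    to Rost (1999) and Gu et al. (2002).
    """
    if aligned_length > long_cutoff:
        return long_identity
    exponent = -0.32 * (1.0 + math.exp(-aligned_length / 1000.0))
    return 0.06 + 4.8 * aligned_length ** exponent


def is_putative_homolog(aligned_length: int, identity: float,
                        length_a: int, length_b: int) -> bool:
    """Apply both criteria: coverage of the longer protein, then identity."""
    # Criterion 1: alignable length exceeds 80% of the longer sequence.
    coverage_ok = aligned_length > 0.8 * max(length_a, length_b)
    # Criterion 2: identity in the aligned region meets the threshold.
    identity_ok = identity >= identity_threshold(aligned_length)
    return coverage_ok and identity_ok
```

    Under these assumptions the threshold rises steeply for short alignments, so a short open reading frame needs much higher identity than a long one before it is accepted as a putative homolog.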

    Since the sequence of Medicago was not annotated, its protein-coding genes were annotated using the Nucleotide query–Protein database (BLASTX) algorithm at NCBI with each known gene from Lotus as query. If a particular gene was missing from Lotus, that gene from the rest of the 10 taxa was used instead. Open reading frames annotated by us were also verified using the BLAST 2 sequences algorithm and the Nucleotide query–Translated db algorithm in NCBI against the corresponding gene and the whole genome of Arabidopsis, respectively. A query sequence with more than 40% identity to the specific known genes was then considered as a putative homologous gene. A remnant of the accD gene in rice was reported previously (Hiratsuka et al. 1989) but could not be detected by Katayama and Ogihara (1996) or Ogihara et al. (2002) using Southern hybridization. We were not able to locate it in the rice cp genome either. We used the reviews of cp genes by Millen et al. (2001) and Martin et al. (2002) as guides to confirm our BLAST searches, especially for those genes lost or with unknown functions.

    After careful comparison and annotation, a total of 98 protein-coding genes was found in the cp genomes of the 12 sampled species (Table 2). The lengths as well as the presence or absence of those genes in each taxon are presented in Appendix 1. An open reading frame homologous to a known gene was given the same name to facilitate comparison and alignment. For some unannotated genes filtered by using BLASTX search, their positions in the corresponding genomes were indicated. We excluded pseudogenes and genes duplicated in the inverted repeat regions. The cp-encoded RNA genes were previously shown to be problematic in early cp phylogeny (Martin et al. 1998; Lockhart et al. 1999) and in the present study as well (data not shown). Therefore, RNA genes were excluded from analysis.

    Alignment of All Cp Genes and Phylogenetic Analyses

    Amino acid sequences of each gene from the 12 taxa were first aligned one by one using GeneDoc (Nicholas and Nicholas 1997) with minor adjustments. The alignment was then used as a guide for aligning the corresponding nucleotide sequences. Unknown sites, start and stop codons, and regions difficult to align were removed from each gene alignment. All aligned individual gene sequences were then assembled using the Text Editor in MEGA 2.1 (Kumar et al. 2001). Gaps were completely deleted from the assembled alignment concatenated from the 61 cp protein-coding genes common to the 12 sampled taxa (see also Results). The working data file (in MEGA format) is shown in Appendix 2, available in the Supplementary Material Section at the JME Web site.
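
    The assembly step can be sketched as follows. This is a hypothetical illustration, not the MEGA 2.1 procedure itself: it assumes each gene alignment is a dictionary mapping the same taxon names to equal-length aligned sequences, and it removes every column containing a gap or unknown character (complete deletion). Because the nucleotide alignments were guided by the amino acid alignments, gaps are assumed to span whole codons, so column-wise deletion is assumed to keep the reading frame.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments (dict: taxon -> aligned sequence) end to end.

    Assumption: every alignment contains exactly the same taxa.
    """
    taxa = sorted(gene_alignments[0])
    return {t: "".join(aln[t] for aln in gene_alignments) for t in taxa}


def complete_deletion(alignment, missing="-?N"):
    """Drop every column in which any taxon has a gap or unknown site."""
    taxa = list(alignment)
    columns = zip(*(alignment[t].upper() for t in taxa))
    kept = [col for col in columns if not any(c in missing for c in col)]
    # Rebuild one sequence per taxon from the retained columns.
    return {t: "".join(col[i] for col in kept) for i, t in enumerate(taxa)}
```

    A usage sketch with two toy genes: `complete_deletion(concatenate_alignments([gene1, gene2]))` returns sequences of identical length in which only fully resolved sites remain.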

    Table 2. Scientific names, classification, and NCBI accessions of species in the dataset

    Classification^a | Scientific name | NCBI accession No. (version date)^b / Reference
    Bryophyte | |
      Marchantiaceae | Marchantia polymorpha | NC_001319 (Aug 2002) / Ohyama et al. (1986)
    Pteridophyte | |
      Psilotaceae | Psilotum nudum | AP004638 (Nov 2002) / Wakasugi et al. (2000)
    Gymnosperm | |
      Pinaceae | Pinus thunbergii | NC_001631 (Sep 2002) / Wakasugi et al. (1994)
    Angiosperms | |
      Monocots | |
        Poaceae | |
          Andropogoneae | Zea mays | NC_001666 (Sep 2002) / Maier et al. (1995)
          Oryzeae | Oryza sativa | NC_001320 (Sep 2002) / Hiratsuka et al. (1989)
          Triticeae | Triticum aestivum | NC_002762 (Sep 2002) / Ikeo and Ogihara (2000)
      Dicots | |
        Eudicots | |
          Caryophyllidae | |
            Chenopodiaceae | Spinacia oleracea | NC_002202 (Aug 2002) / Schmitz-Linneweber et al. (2001)
          Asteridae | |
            Solanaceae | Nicotiana tabacum | NC_001879 (Sep 2002) / Shinozaki et al. (1986)
          Rosidae | |
            Brassicaceae | Arabidopsis thaliana | NC_000932 (Sep 2002) / Sato et al. (1999)
            Onagraceae | Oenothera elata subsp. hookeri | NC_002693 (Sep 2002) / Hupfer et al. (2000)
            Fabaceae | |
              Papilionoideae | |
                Loteae | Lotus japonicus | NC_002694 (Sep 2002) / Kato et al. (2000)
                Trifolieae | Medicago truncatula | AC093544^c (Nov 2001) / Lin et al. (2001)

    a Ranks of species follow the NCBI's Taxonomy Guide.
    b Data modified from http://megasun.bch.umontreal.ca/ogmp/projects/other/cp_list.html (vers. 20 Dec 2002).
    c No gene annotation in this accession.

    Nucleotide sequence divergence between a pair of taxa (or groups) was calculated in terms of the numbers of substitutions per synonymous site (Ks) or per nonsynonymous site (Ka), using the Pamilo–Bianchi–Li method implemented in MEGA 2.1. The divergence value between two groups is presented as average distance ± standard error, obtained from the option "Compute Between Groups Means" in MEGA 2.1. The average distance between two groups is the arithmetic mean of all pairwise distances between taxa in the intergroup comparisons. To date the divergence between the monocot and the dicot lineages, Saitou and Nei's (1987) neighbor-joining (NJ) method and the Ka values (not Ks, because substitutions at the third codon positions are saturated across sampled land plant lineages; see Results) were used to reconstruct the phylogenetic trees, rooted at the top of the Pinus lineage. Because the six sampled dicots (Table 2) represent the two large clades (the rosids and the asterids) of core eudicots and one of the remaining four small core eudicot clades, they can be used to infer the age or diversification date of core eudicots. The NJ trees reconstructed by Ka values and Ks values were rooted at the monocot lineage (see Results). Relative support for each node was evaluated using the bootstrap test and the interior branch test implemented in MEGA 2.1 with 2000 replicates. The latter test is constructed based on the interior branch length and its standard error. If the resulting confidence value is higher than 95% for a given branch, then the inferred length for that branch is considered significantly higher than 0 (Kumar et al. 2001). To compare the evolutionary rates of sampled fern, pine, monocot, and dicot lineages, Tajima's (1993) relative rate test implemented in MEGA 2.1 was applied. Because the method does not distinguish between Ka and Ks, the first and second codon positions of the combined 61 cp protein-coding genes were used instead.
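
    For readers unfamiliar with the relative rate test, a minimal Python sketch is given here. It is an illustration under stated assumptions, not the MEGA 2.1 implementation: it uses the simplest (one-degree-of-freedom) form of Tajima's test, in which m1 and m2 count sites where only lineage 1 or only lineage 2 differs from the other two sequences, and the statistic (m1 − m2)²/(m1 + m2) is compared with a chi-square distribution. The function names are hypothetical, and the sequences are assumed to be aligned, gap-free, and in frame.

```python
import math


def first_second_positions(cds: str) -> str:
    """Keep only the first and second positions of each codon."""
    return "".join(base for i, base in enumerate(cds) if i % 3 != 2)


def tajima_relative_rate(seq1: str, seq2: str, outgroup: str):
    """Return (m1, m2, chi2, p) for the one-degree-of-freedom test.

    m1: sites where seq1 alone differs (seq2 matches the outgroup).
    m2: sites where seq2 alone differs (seq1 matches the outgroup).
    """
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if a == b:
            continue
        if b == o:
            m1 += 1
        elif a == o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper-tail probability of a chi-square variate with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return m1, m2, chi2, p_value
```

    In use, each coding sequence would first be passed through `first_second_positions`, and a small p-value would be read as evidence that the two ingroup lineages evolve at unequal rates.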

    Calibration Points

    To date the divergence between the monocots and the dicots, three split events (see Figs. 1 and 4) with reliable fossil dates were used as reference nodes: (C1) the Psilotum (fern)–seed plant split (400–420 Myr old [Pryer et al. 2001]); (C2) the Pinus (conifer)–angiosperm split (280–310 Myr old); and (C3) the maize–wheat split (50–60 Myr old). Since uncertainties about the age of the reference node were a probable reason behind the discrepancies among previous estimates of angiosperm origin (Bremer 2000; Sanderson and Doyle 2001), we have carefully examined the dates of our calibration nodes.
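
    Under the constant-rate assumption, a calibration node converts distances into dates by simple proportion: the age of a target node equals the calibration age multiplied by the ratio of the target distance to the calibration distance. The Python sketch below illustrates this arithmetic only. The calibration ranges are those listed above, whereas the function names and any distances passed in are hypothetical placeholders rather than values from this study, and the Li–Tanimura correction is not included.

```python
# Calibration age ranges in Myr (lower, upper), as listed in the text.
CALIBRATIONS = {
    "C1": (400.0, 420.0),  # Psilotum (fern)-seed plant split
    "C2": (280.0, 310.0),  # Pinus (conifer)-angiosperm split
    "C3": (50.0, 60.0),    # maize-wheat split
}


def constant_rate_date(d_target: float, d_calibration: float,
                       calibration_age: float) -> float:
    """Scale a calibration age by the ratio of two pairwise distances."""
    if d_calibration <= 0:
        raise ValueError("calibration distance must be positive")
    return calibration_age * d_target / d_calibration


def constant_rate_range(d_target: float, d_calibration: float, node: str):
    """Return the (younger, older) date implied by a calibration range."""
    low, high = CALIBRATIONS[node]
    return (constant_rate_date(d_target, d_calibration, low),
            constant_rate_date(d_target, d_calibration, high))
```

    The proportionality makes the weakness of the approach visible: if the target lineages evolve faster than the calibration lineages, the distance ratio, and hence the date, is inflated.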

    Fossil Dates

    Psilotum has been repeatedly suggested as a member of ferns by molecular data (e.g., Nickrent et al. 2000; Pryer et al. 2001), but the architecture of its sperm cell suggests that Psilotum is an early divergent fern (Renzaglia et al. 2001) with relatively remote affinities to Ophioglossaceae (a basal fern family) and Equisetaceae (sphenopsids). Kenrick and Crane (1997) considered that the basal dichotomy of Euphyllophytina occurred in the early–mid Devonian (ca. 400–420 Myr ago) and resulted in two clades: one containing the extinct Psilophyton and the other ferns, horsetails, and seed plants. We took this splitting date as the lower bound for the divergence between Psilotum and seed plants.

    Pinus is a genus of Pinaceae, which contains over 230 species and is the largest and most basal family of conifers (Hart 1987; Price et al. 1993; Chaw et al. 1997). Delevoryas and Hope (1973) and Miller (1977, 1988) proposed that the Triassic (206–248 Myr ago) period may represent a time when modern conifers were evolving. Cladistic and stratigraphic analyses of living seed plants (Doyle and Donoghue 1987; Crane 1988; Doyle 1998) suggested that diversification of modern seed plants occurred from the lower Pennsylvanian to the upper Triassic (215–310 Myr ago). The earliest fossil evidence of trees bearing the bisaccate pollen typical of conifers, which germinates distally, dates from the late Carboniferous to early Permian (ca. 250–290 Myr ago), and conifer relatives are known from ca. 310 Myr ago (Rai et al. 2003). Gymnosperms and angiosperms are the two major taxa of seed plants, distinct since the end of the Carboniferous, 300 Myr ago (Bow et al. 2003). From the above considerations, we took 280–310 Myr as an upper bound for the split between the conifer and the angiosperm lineages.

    Fossil leaves of rice (belonging to the grass family Poaceae) have been described from the upper Eocene, about 40 Myr ago (Stebbins 1981), and the earliest unequivocal evidence of grass fossils (including spikelets and inflorescence with pollen) was found in Paleocene–Eocene deposits, about 50–60 Myr ago (Crepet and Feldman 1991). Initial radiation of the grass family was suggested to be 65 Myr ago (Stebbins 1987; Thomasson 1987). Bremer (2000) regarded the 50–70 Myr ago estimate of a maize–wheat divergence used by Wolfe et al. (1989) as rather uncertain. Moreover, phylogenetic analyses of the cp rpl16 intron sequences (Zhang 2000), eight character sets (GPWG 2001), cp genome structure (Ogihara et al. 2002), and cp genomic comparison (Matsuoka et al. 2002) indicated that in the grass family, Oryzoideae (rice) and Pooideae (wheat) diverged after the subfamily Panicoideae (maize), which was preceded by four other subfamilies (Zhang 2000). Therefore, we took 50–60 Myr as a reasonable estimate of the maize–wheat split.

    Results

    Cp Genome Data

    The concatenated lengths of all known cp functional protein-coding genes (Appendix 1) in the 12 sampled species (Table 2) range from 58,095 bp in Triticum to 71,509 bp in Marchantia; the average is 63,661 ± 4,764 bp. Sixty-one cp protein-coding genes, which encode 2 envelope membrane proteins (cemA, ycf9), 1 maturase (matK), 1 protease (clpP), 34 photosynthetic light reaction proteins (atpA, atpB, atpE, atpF, atpH, atpI, petA, petB, petD, petG, petL, petN, psaA, psaB, psaC, psaI, psaJ, psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT), 18 ribosomal proteins (rpl2, rpl14, rpl16, rpl20, rpl32, rpl33, rpl36, rps2, rps3, rps4, rps7, rps8, rps11, rps12, rps14, rps15, rps18, rps19), 4 RNA polymerase subunits (rpoA, rpoB, rpoC1, rpoC2), and 1 cytochrome c biogenesis protein (ccsA), are common to all 12 taxa. After elimination of unknown sites, regions difficult to align, start and stop codons, and all gaps, 39,507 sites were used for comparison and tree reconstruction.

    As shown in Table 3 (the first row), the 12 cp genomic sequences are AT-rich. This bias is particularly strong at the third codon positions, primarily because of the high T nucleotide contents. These data are consistent with the high AT content found earlier in the plastid genome (Whitfeld and Bottomley 1983). Across the 11 tracheophytes nucleotide base compositions are homogeneous at the first and second codon positions (χ² test, p = 0.793 and 0.981) but not so at the third codon positions (p < 0.001). The G content is particularly high at the first codon positions in all taxa, and Marchantia much prefers the use of synonymous codons ending with A or T.
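
    The per-position base composition summarized in Table 3 can be computed along the following lines. This Python sketch is only an illustration (the study used PAUP for the homogeneity tests, which are not reproduced here); it assumes an in-frame coding sequence with start and stop codons already removed, and the function name is hypothetical.

```python
from collections import Counter


def composition_by_codon_position(cds: str):
    """Return {1|2|3: {base: percent}} for an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    result = {}
    for pos in range(3):
        counts = Counter(cds[pos:usable:3])
        # Only unambiguous bases contribute to the percentages.
        total = sum(counts[b] for b in "ACGT")
        result[pos + 1] = {
            b: (100.0 * counts[b] / total) if total else 0.0 for b in "ACGT"
        }
    return result
```

    Applying the function to each taxon's concatenated sequence and averaging over taxa would yield figures of the kind reported in Table 3, for example the A + T excess at the third position.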

    The mean Ka/Ks ratio for all species pairs is 0.19. The mean Ka/Ks ratio difference between the monocot (0.156) and the dicot (0.158) lineages is small. These data are suggestive of stringent selective constraints on amino acid substitutions and correlate well with the observed higher GC contents at the first and second positions (Table 3).

    The Inferred Phylogenetic Trees

    Figure 1A was simplified from the topology of the maximum parsimony (MP) trees reconstructed with multigenes by Qiu et al. (1999) and by P. S. Soltis et al. (1999). Figure 1B is an NJ tree reconstructed with Ka values using Marchantia as the outgroup. The topology of this tree strongly indicates that, to the exclusion of the fern (Psilotum) lineage, the seed plants form a monophyletic clade, within which the conifer (Pinus) lineage and the angiosperms comprise two separate subgroups. The sampled angiosperms are subdivided into two well-supported lineages, the monocots and the eudicots. Both bootstrap and interior branch tests for the above-mentioned major tracheophyte lineages are 100%, and the latter test yielded a higher percentage support for the rice + wheat clade. The phylogenetic relationships of the monocot lineage and the six core eudicots generally agree well with those in recent multigene trees (Fig. 1A [Qiu et al. 1999; PS Soltis et al. 1999; DE Soltis et al. 2000]) except that in our NJ tree the Caryophyllales (represented by Spinacia) and the asterids (represented by Nicotiana) are well resolved as sister clades. This relationship was previously revealed in the trees made by Wolfe et al. (1989) and Goremykin et al. (2003).

    The NJ tree reconstructed from the Ks values appears to be unreliable because it placed Arabidopsis as basal to the remaining dicots (data not shown), contrary to the most recent multigene phylogenies of angiosperms (Mathews and Donoghue 1999; Qiu et al. 1999; PS Soltis et al. 1999; DE Soltis et al. 2000; Chase et al. 2000). These data caused us to question whether the third codon position, where most synonymous substitution occurs, is saturated with substitutions.

    To assess levels of sequence saturation with the concatenated cp genes, pairwise uncorrected numbers of transitions and transversions (uncorrected P) were plotted against corrected (Kimura's two-parameter) sequence distance (Fig. 2). Sixty-six paired points [12 × (12 − 1)/2] are presented in Fig. 2. The curves of both uncorrected transitions and transversions against sequence divergence at the first two codon positions were nearly linear (Fig. 2A). In contrast, the curves at the third codon position revealed a significant trend toward asymptotic saturation (Fig. 2B), indicating that substitutions at the third codon position are saturated and not suitable for inferring phylogenetic relationships among the sampled taxa or for dating purposes. For this reason, we used only the NJ tree based on the Ka values.

    Table 3. Nucleotide base composition (%) of the concatenated 61 cp protein-coding genes in Marchantia and 11 sampled tracheophytes

    Codon position^a | A | C | G | T | p^b
    All | 33.2/29.2 ± 0.4 (1.4%) | 14.4/18.4 ± 0.5 (2.7%) | 17.8/21.5 ± 0.4 (1.9%) | 34.6/31.0 ± 0.5 (1.6%) | 0.000
    1st | 30.3/28.2 ± 0.4 (1.4%) | 16.9/19.6 ± 0.3 (1.5%) | 29.1/30.2 ± 0.3 (1.0%) | 23.7/22.0 ± 0.3 (1.4%) | 0.793
    2nd | 29.1/27.2 ± 0.3 (1.1%) | 20.8/21.7 ± 0.2 (0.9%) | 17.4/19.1 ± 0.3 (1.6%) | 32.6/32.0 ± 0.3 (0.9%) | 0.981
    3rd | 40.1/32.3 ± 0.8 (2.5%) | 5.3/13.8 ± 1.0 (7.2%) | 7.0/15.0 ± 0.8 (5.3%) | 47.5/39.0 ± 1.1 (2.8%) | 0.000

    a The start and stop codons were not included in analysis. Data on Marchantia are before the slash and the average of 11 sampled tracheophytes is after the slash and presented as mean ± SE (coefficient of variation).
    b Probability (p) was based on χ² tests for homogeneity across the 11 sampled tracheophytes using PAUP 4.0b1 (Swofford 1998).

    Fig. 2. Uncorrected pairwise sequence divergence (P distance) plotted against corrected distances (Kimura two-parameter) for transitions (Ts) and transversions (Tv) at first and second codon positions (A) and third codon position (B). Each plot presents 66 data points.
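
    The quantities behind a saturation plot of this kind can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' script: it assumes an alignment stored as a dictionary of equal-length sequences, computes the uncorrected proportions of transitions (P) and transversions (Q) for every pair of taxa, and applies the standard Kimura two-parameter correction d = −½ ln[(1 − 2P − Q)√(1 − 2Q)]. All function names are hypothetical.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_proportions(seq1: str, seq2: str):
    """Uncorrected proportions of transitions and transversions."""
    ts = tv = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous sites
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return ts / compared, tv / compared


def k2p_distance(p_ts: float, p_tv: float) -> float:
    """Kimura two-parameter distance; infinite when fully saturated."""
    a = 1.0 - 2.0 * p_ts - p_tv
    b = 1.0 - 2.0 * p_tv
    if a <= 0 or b <= 0:
        return math.inf
    return -0.5 * math.log(a * math.sqrt(b))


def saturation_points(alignment):
    """One (K2P distance, P_ts, P_tv) tuple per pair of taxa."""
    points = []
    for t1, t2 in combinations(sorted(alignment), 2):
        p_ts, p_tv = ts_tv_proportions(alignment[t1], alignment[t2])
        points.append((k2p_distance(p_ts, p_tv), p_ts, p_tv))
    return points
```

    For 12 taxa the function returns 66 points. Plotting the uncorrected proportions against the corrected distance, separately for first plus second and for third codon positions, shows saturation as a curve that flattens while the corrected distance keeps growing.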

    Because Fig. 1B did not resolve the relationships among sampled eudicots, we reconstructed a phylogenetic tree of eudicots using the three monocots as the outgroup. The NJ tree based on the Ks values yielded a reasonable topology (i.e., in agreement with the phylogenetic relationships of the orders of flowering plants compiled by APG [1998]) for the sampled six eudicots (Fig. 3A), whereas the NJ tree based on Ka values did not (data not shown). Based on the limited number of eudicots, Fig. 3A suggests that the six core eudicots first split (at node B) into two well-supported monophyletic clades, the rosids (represented by Oenothera, Arabidopsis, Lotus, and Medicago) and the asterids + Caryophyllales (represented by Nicotiana and Spinacia, respectively).

    Both NJ trees reconstructed from Ka (Fig. 1B) and Ks (Fig. 3A) values suggested a close relationship between the Ehrhartoideae (rice) and the Pooideae (wheat) with maize as an outgroup, but the bootstrap values for the rice–wheat clade are low to moderate. Recently, using the NJ method with the variable sites of 98 genes (including not only all protein-coding but also RNA genes) common to the cp genomes of these three cereals and rooting at the Nicotiana lineage, Matsuoka et al. (2002) also placed maize as sister to the rice–wheat clade.

    Phylogenetic Distribution of Cp Genes During Tracheophyte Evolution

    We examined a total of 98 protein-coding genes

    (Appendix 1) that are present among the 12 studied

    taxa. Figure 1B also presents the protein-coding gene

    numbers held in each sampled species and specific

    gene loss, transfer, and retention in the 11 lineages of

    tracheophytes. Although Martin et al. (2002) have done a similar evolutionary analysis for deeper groups, Fig. 1B focuses on the tracheophyte lineages, using Marchantia as the outgroup and adding 5 core eudicots and 1 fern.

    Compared with the cp genome of the bryophyte (Marchantia), those of tracheophytes have lost three genes: two transporters (cysA and cysT) and one with unknown function (cys66) (Fig. 1B). During tracheophyte evolution, the fern (Psilotum) and angiosperm lineages show parallel losses of three chlorophyll biosynthesis genes, chlB, chlL, and chlN. The seed plant lineage has commonly transferred the rpl21 gene (Martin et al. 2002 and references cited therein) to its nucleus. In the spinach and Arabidopsis lineages that gene, however, has been unusually replaced by a nuclear RPL21c gene of mitochondrial origin (Martin et al. 1990; Gallois et al. 2001).

    Within seed plants, the conifer (Pinus) lineage has uniquely lost all 11 NADH dehydrogenase subunit genes (ndhA–K; 4 are completely missing and 7 are pseudogenes [Wakasugi et al. 1994]) but has gained a new gene of unknown function, ycf68 (Martin et al. 2002). The angiosperm lineage has further lost two genes, one, psaM, involved in the photosynthetic light reaction and the other, ycf12, of uncertain function.

    Within angiosperms the three grasses have lost three genes: one metabolism related, accD (Hiratsuka et al. 1989; Maier et al. 1995; Ogihara et al. 2002), and two genes of unknown function, ycf1 and ycf2. However, we found that the grass lineage has also recruited nine novel genes; one of them, ycf68, is shared with the pine lineage, and the remaining eight, ycf69–76, are unique. Functions of these genes are not known yet and they have no detectable homology to prokaryotic genes (Martin et al. 2002). Except for spinach, all sampled eudicots have lost the translational initiation factor 1 (infA). According to an extensive survey of more than 300 diverse angiosperms by Millen et al. (2001), the infA gene of the cp genome has repeatedly become defunct in about 24 separate angiosperm lineages, including almost all rosid species.

    Fig. 3. Relative branch lengths (based on Ka values) of monocots and dicots, using Marchantia (A), Psilotum (B), and Pinus (C) as outgroup, respectively. Gray branches are with Arabidopsis, Spinacia, and Nicotiana excluded (for their slower rates). Branch lengths are Ka values per 100 sites. D Rooted NJ tree of the core eudicot lineages based on Ks values, using the three grasses as outgroup. Italicized numbers denote bootstrap percentages (bootstrap test before the slash, interior branch test after the slash). Branch lengths are Ks values per 100 sites.

    Nucleotide Substitution Rates

    Before applying molecular calibration, we assessed the assumption of rate constancy. Fig. 1B shows that the branches from the calibration point C1 leading to the Psilotum (fern) lineage and the Pinus lineage are not equal in length. The NJ trees in Figs. 3B, C, and D, using Marchantia, Pinus, and Psilotum as the outgroup, respectively, also indicate that the Ka rates in the monocot and the dicot lineages are unequal. The monocot lineage has evolved faster than the dicots, by 39.6, 37.3, and 32.3%, respectively, for the three outgroups. In Fig. 1B the branches from node A leading to Arabidopsis, Spinacia, and Nicotiana are strikingly shorter than those leading to the other three dicots and the monocots. Tajima's relative rate test using rice, Marchantia, Psilotum, and Pinus as outgroups, respectively, confirmed this observation (all p < 0.001). However, exclusion of the above three slower dicot lineages (gray lines in Figs. 3B–D) led to even higher estimates of divergence dates (data not shown). We therefore used the entire dataset.
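The relative rate test mentioned above can be illustrated with a minimal sketch. It assumes the commonly used site-counting form of Tajima's test for three aligned sequences; the function name and the toy sequences are hypothetical and are not data from this study.

```python
import math


def tajima_relative_rate_test(seq1, seq2, outgroup):
    """Tajima's relative rate test on three aligned sequences.

    m1 counts sites where only seq1 differs (seq2 == outgroup != seq1);
    m2 counts sites where only seq2 differs (seq1 == outgroup != seq2).
    Under equal rates, (m1 - m2)^2 / (m1 + m2) is approximately
    chi-square distributed with 1 degree of freedom.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):  # skip alignment gaps
            continue
        if b == o and a != o:
            m1 += 1
        elif a == o and b != o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return m1, m2, chi2, p_value


# Hypothetical toy alignment (not data from the paper)
toy_outgroup = "AAAAAAAAAA"
toy_fast = "CCCCCCAAAA"
toy_slow = "AAAAAAAAGG"
m1, m2, chi2, p_value = tajima_relative_rate_test(toy_fast, toy_slow, toy_outgroup)
print(m1, m2, chi2, round(p_value, 4))
```

A small p-value indicates that the two ingroup lineages have accumulated unequal numbers of lineage-specific changes since their split, which is the pattern reported here for the slower dicots.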

    By applying the equation r = K/(2T), where K is the distance and T is the divergence time between the two taxa compared, nonsynonymous rates were calibrated and are shown in Table 4. Based on the three divergence events, C1, C2, and C3, and the dataset with all six dicots, the Ka rates are 0.215–0.225 × 10⁻⁹, 0.232–0.257 × 10⁻⁹, and 0.164–0.197 × 10⁻⁹ substitutions per nonsynonymous site per year, respectively. Clearly, these three calibrated Ka rates are unequal, differing from one another by from 8% [(0.232 − 0.215)/0.215] to nearly 42% [(0.232 − 0.164)/0.164], and the conifer–angiosperm Ka rate is the highest.
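The rate calibration can be checked with a short sketch. It applies r = K/(2T) to the Ka distances and fossil date ranges listed in Table 4; the function and variable names are illustrative only.

```python
def calibrate_rate(k_per_100_sites, t_myr):
    """Return r = K / (2T) in substitutions per site per year.

    K is given per 100 sites (as in Table 4) and T in millions of years.
    """
    k_per_site = k_per_100_sites / 100.0
    return k_per_site / (2.0 * t_myr * 1e6)


# Ka distance at each calibration node and its fossil date range (Myr), from Table 4
calibrations = {
    "C1": (18.03, (400, 420)),
    "C2": (14.40, (280, 310)),
    "C3": (1.97, (50, 60)),
}

rate_ranges = {}
for name, (k, (t_young, t_old)) in calibrations.items():
    # The older date gives the lower rate, the younger date the higher rate
    rate_ranges[name] = (calibrate_rate(k, t_old), calibrate_rate(k, t_young))
    low, high = rate_ranges[name]
    print(f"{name}: {low * 1e9:.3f}-{high * 1e9:.3f} x 10^-9 per site per year")
```

The printed ranges can be compared directly with the rate column of Table 4.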

    Dates of the Monocot–Dicot Divergence and the Origin of Core Eudicots

    Molecular Clock or Rate Constancy Method. The date of the monocot–dicot divergence was estimated by applying the equation T = K/(2r). As indicated in Table 4, based on the entire dataset and the calibration points C1, C2, and C3, three time estimates for the monocot–dicot divergence, 206 ± 5–217 ± 6, 180 ± 7–200 ± 7, and 237 ± 5–285 ± 9 Myr, were obtained. These estimates suggest that the monocot–dicot divergence took place 220 ± 40 Myr ago.

    Using either the Ka or the Ks rates of the maize–wheat divergence and the mean Ka values (see node B in Fig. 1B and Fig. 4) of all six eudicots or the Ks values between the rosid clade and the asterid + Caryophyllales clade (see node B in Fig. 3A), the divergence for core eudicots was estimated to be 154 ± 7–185 ± 8 and 149 ± 2–181 ± 3 Myr ago (Table 4), respectively. These two estimates are close to each other, and their average is 170 Myr ago.
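As a sketch of the constant-rate calculation, the snippet below dates the monocot–dicot divergence with T = K/(2r), using the Ka values and fossil dates from Table 4. Standard errors are ignored, and all names are illustrative.

```python
def calibrate_rate(k_cal, t_cal_myr):
    """r = K / (2T), with K per 100 sites and T in Myr; returns rate per site per year."""
    return (k_cal / 100.0) / (2.0 * t_cal_myr * 1e6)


def constant_rate_date(k_target, rate):
    """T = K / (2r), returned in Myr; K is per 100 sites."""
    return (k_target / 100.0) / (2.0 * rate) / 1e6


# (Ka at calibration node, fossil date range in Myr, monocot-dicot Ka) from Table 4
rows = {
    "C1": (18.03, (400, 420), 9.30),
    "C2": (14.40, (280, 310), 9.28),
    "C3": (1.97, (50, 60), 9.35),
}

monocot_dicot_dates = {}
for name, (k_cal, (t_young, t_old), k_target) in rows.items():
    younger = constant_rate_date(k_target, calibrate_rate(k_cal, t_young))
    older = constant_rate_date(k_target, calibrate_rate(k_cal, t_old))
    monocot_dicot_dates[name] = (younger, older)
    print(f"{name}: {younger:.0f}-{older:.0f} Myr")
```

Because the rate is itself derived from K/(2T) at the calibration node, each estimate reduces to the calibration date scaled by the ratio of the two distances, so any lineage that evolves faster than the calibration lineage inflates the date.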

    Table 4. Estimates of the monocot–dicot divergence and the age of core eudicots based on the constant rate method

    Outgroup | Calibration event (fossil dates; Myr) | K^a | Rate^b (× 10⁻⁹) | K^a | Time (Myr)
    Monocot–dicot divergence
    Marchantia | C1: Fern–seed plant divergence (400–420) | Ka: 18.03 ± 0.22 | 0.215–0.225 | Ka: 9.30 ± 0.24 | 206 ± 5–217 ± 6
    Psilotum | C2: Conifer–angiosperm divergence (280–310) | Ka: 14.40 ± 0.29 | 0.232–0.257 | Ka: 9.28 ± 0.34 | 180 ± 7–200 ± 7
    Pinus | C3: Maize–wheat divergence (50–60) | Ka: 1.97 ± 0.23 | 0.164–0.197 | Ka: 9.35 ± 0.29 | 237 ± 5–285 ± 9
    Origin of core eudicots
    Pinus | C3: Maize–wheat divergence (50–60) | Ka: 1.97 ± 0.23 | 0.164–0.197 | Ka: 6.08 ± 0.26 | 154 ± 7–185 ± 8
    Monocots | C3: Maize–wheat divergence (50–60) | Ks: 12.10 ± 0.11 | 1.008–1.210 | Ks: 36.06 ± 0.51 | 149 ± 2–181 ± 3

    a K denotes the number of substitutions per 100 synonymous (Ks) or nonsynonymous (Ka) sites between pair of taxa or groups.
    b Rate (r) is defined as the number of substitutions per site per year, r = K/(2T) (Li and Graur 1991).

    Li–Tanimura Method. Figure 4 was simplified from the phylogenetic tree in Fig. 1B, with all branch lengths indicated. The branch length of core eudicots was calculated as the mean length of the branches leading from their emergence point (node B) to the six core eudicots. We then used the Li–Tanimura method, which uses lineages in which the molecular clock holds better than the others, to estimate the divergence time at points A and B. For example, we know that the branching date for Pinus (node C2) is 280–310 Myr ago and want to estimate the branching dates between the monocot and the dicot lineages. The distances from node C2 to Pinus, monocots, and dicots are 6.05, 9.02 (=3.91 + 4.22 + 0.89), and 7.66 (=3.91 + 0.90 + 2.85), respectively. Since the monocot lineage has a longer branch length than do the Pinus and dicot lineages, it is not used. Based on the branch length of the dicot lineage, the monocot–dicot divergence (node A in Fig. 1B and Fig. 4) was estimated to be 137–152 Myr ago, which is derived from (280 or 310) × (0.90 + 2.85)/7.66; the origin of core eudicots (node B in Fig. 1B and Fig. 4) was estimated as 104–115 Myr ago, which is calculated from (280 or 310) × 2.85/7.66. As shown in Fig. 4 the distance from node C1 to dicots is 10.4 (=2.74 + 3.91 + 3.75), and we assume that the molecular clock along the dicot lineage is approximately constant. Similarly, using the branching date of Psilotum, 400–420 Myr ago (node C1 in Fig. 1 and Fig. 4), the monocot–dicot divergence (node A in Fig. 1 and Fig. 4) was estimated to be (400 or 420) × (0.90 + 2.85)/10.4 = 144–151 Myr ago, and the origin of core eudicots was estimated to be (400 or 420) × 2.85/10.4 = 110–115 Myr ago. Table 5 shows that these dates are very close to those estimated from C2. Combining estimates calibrated from both C1 and C2, we estimated that monocots and dicots diverged 140–150 Myr ago and the core eudicot lineages originated 100–115 Myr ago.
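The proportional scaling used in these calculations can be written out as a brief sketch. It reproduces only the arithmetic given in the text (a node's age is the calibration age multiplied by the fraction of the dicot-lineage path that lies below that node), not the full Li–Tanimura (1987) procedure; the names are illustrative.

```python
# Ka branch lengths (per 100 sites) along the dicot lineage, as quoted in the text
BRANCH_C1_TO_C2 = 2.74
BRANCH_C2_TO_A = 3.91
BRANCH_A_TO_B = 0.90
BRANCH_B_TO_TIPS = 2.85  # mean branch length from node B to the six core eudicots


def scaled_age(calibration_age, dist_node_to_tips, dist_calibration_to_tips):
    """Age of a node, assuming a constant rate along the chosen lineage only."""
    return calibration_age * dist_node_to_tips / dist_calibration_to_tips


dist_from = {
    "C1": BRANCH_C1_TO_C2 + BRANCH_C2_TO_A + BRANCH_A_TO_B + BRANCH_B_TO_TIPS,  # 10.40
    "C2": BRANCH_C2_TO_A + BRANCH_A_TO_B + BRANCH_B_TO_TIPS,                    # 7.66
}
dist_to_tips = {"A": BRANCH_A_TO_B + BRANCH_B_TO_TIPS, "B": BRANCH_B_TO_TIPS}
fossil_dates = {"C1": (400, 420), "C2": (280, 310)}

node_ages = {}
for cal, (young, old) in fossil_dates.items():
    for node, dist in dist_to_tips.items():
        node_ages[(cal, node)] = (
            scaled_age(young, dist, dist_from[cal]),
            scaled_age(old, dist, dist_from[cal]),
        )
        lo, hi = node_ages[(cal, node)]
        print(f"node {node} calibrated by {cal}: {lo:.0f}-{hi:.0f} Myr")
```

Because only the dicot path is used, the long monocot branch never enters the ratio, which is why these dates are younger than the constant-rate estimates.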

    Discussion

    Rate Variation Among Tracheophyte Lineages

    Our phylogenetic analyses (Figs. 1B, 3, and 4) indicate that nonsynonymous substitution rates of cp genomes are unequal among the six eudicot lineages, between the two angiosperm lineages (i.e., monocots and dicots), and among the tracheophyte lineages (i.e., all sampled seed plants and a fern, Psilotum). These observations were confirmed by Tajima's relative rate test using Marchantia as the outgroup and the first two codon positions (data not shown).

    Using 40 cp proteins, Goremykin et al. (1997) found that the average substitution rates (equivalent to the Ka rate) along the branches from the common node (equivalent to node C1 in our Fig. 1B) of seed plants to Nicotiana and to Pinus were quite similar. However, our cp genomic data (Fig. 1B) suggest that the former branch is significantly longer than the latter when Psilotum is used as the outgroup (Tajima's relative rate test: p < 0.001).

    As revealed in Fig. 1B and Figs. 3B–D, the Ka rate in the grass lineage is much higher than in the dicots. In addition, Fig. 1B (Ka rate) and Fig. 3A (Ks rate) also indicate that among the six annual dicots sampled, Nicotiana and Spinacia evolved more slowly than the rest. Extensive rate variation among annual plants has also been observed in other cp genes at nonsynonymous sites (Wolfe et al. 1987, 1989). Generally, our results echo the observation of Bousquet et al. (1992) on the rbcL gene of seed plant lineages. They found that the annual form evolved more rapidly on average than the perennial form (represented in our study by Psilotum and Pinus) and that the grass family has the fastest evolution rate. Comparing the rbcL and ndhF loci in the grass family, Gaut et al. (1997) found that at Ka sites rate variation was not correlated between those two plastid loci. Most recently, examining the whole cp genomes (106 genes) of maize, rice, and wheat, Matsuoka et al. (2002) also found variation in Ka rates. The Ka rate variation seems to correlate well with the evolutionary divergence pattern depicted in the cp genomic NJ tree (Fig. 1B), which shows that the angiosperm lineage has evolved faster than the gymnosperm lineage and that the latter in turn has evolved more rapidly than the fern lineage. Since the number of completely sequenced cp genomes is quickly increasing, this trend may be retested soon.

    Table 5. Ages of nodes (Myr) inferred from the phylogenetic tree in Fig. 3 using the Li–Tanimura method (1987)

    Node | C1 (400–420 Myr) | C2 (280–310 Myr)
    C1 | –a | 380–421
    C2 | 295–309 | –
    A | 144–151 | 137–152
    B | 110–115 | 104–115

    a Nonapplicable.

    Fig. 4. A phylogenetic tree simplified from Fig. 1B. Nodes and lineages correspond to those in Fig. 1B. C1 was used to estimate the divergence dates of C2, the monocot–dicot divergence (node A), and the origin of core eudicots (node B) (refer to Table 5 and the text for detail). The number on each branch is the Ka value per 100 sites.

    Significant rate variation in the cp genomes of the tracheophyte lineages is also consistent with the finding of P. S. Soltis et al. (2002), who studied one nuclear and three plastid genes using MP analyses. In summary, the molecular clock hypothesis does not hold for the Ka rates among the cp genomes of tracheophyte plants.

    Reference Fossil Dates and the Phylogenetic Tree

    Obtained

    In the Data and Methods section we have carefully cross-examined the three fossil dates by adopting updated phylogenies and documented fossil records. Bremer (2000, pp 4709, 4710) suggested that in phylogenetic dating rate calibration rather than unequal substitution rates is the major source of error and is behind the discrepancies in earlier estimates of monocot and flowering plant evolution. Indeed, in Table 4 the three calibrated rates based on the molecular clock are discrete, and the obtained dates for monocot–dicot divergence do not agree with one another. To evaluate if the fossil dates and the cp Ka rates corresponded well with each other with respect to the two dating methods, we also used the divergence rates and the Ka distances (Table 4) from the fern–seed plant and conifer–seed plant splits to date each other's divergence. The rate constancy method led to an estimate of 350–390 Myr ago for the former event and 320–335 Myr ago for the latter. These two estimates differ widely from the fossil records. In contrast, the divergence times (Table 5) of these two events estimated from the Li–Tanimura method are highly compatible with the paleobotanical data.
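The cross-check described above can be sketched as follows. Each calibration node is dated from the other one, first with the constant-rate ratio of Ka distances (Table 4) and then with the dicot-lineage branch lengths quoted for Fig. 4. This is a simplified reconstruction of the arithmetic, with illustrative names, and it ignores standard errors.

```python
# Ka distances at the two calibration nodes (per 100 sites, Table 4)
KA_C1, KA_C2 = 18.03, 14.40
# Dicot-lineage path lengths from each node to the tips (per 100 sites, Fig. 4)
PATH_C1 = 2.74 + 3.91 + 0.90 + 2.85  # 10.40
PATH_C2 = 3.91 + 0.90 + 2.85         # 7.66
# Fossil date ranges in Myr
DATES_C1 = (400, 420)
DATES_C2 = (280, 310)


def rescale(dates, target, reference):
    """Scale a date range by the ratio target/reference of two distances."""
    return tuple(d * target / reference for d in dates)


cross_check = {
    # Constant rate: T_target = T_cal * K_target / K_cal
    "constant rate, C1 from C2": rescale(DATES_C2, KA_C1, KA_C2),
    "constant rate, C2 from C1": rescale(DATES_C1, KA_C2, KA_C1),
    # Li-Tanimura style: same scaling, but along the dicot lineage only
    "Li-Tanimura, C1 from C2": rescale(DATES_C2, PATH_C1, PATH_C2),
    "Li-Tanimura, C2 from C1": rescale(DATES_C1, PATH_C2, PATH_C1),
}

for label, (lo, hi) in cross_check.items():
    print(f"{label}: {lo:.0f}-{hi:.0f} Myr")
```

Under these assumptions the constant-rate estimates fall outside the fossil ranges (400–420 Myr for C1 and 280–310 Myr for C2), whereas the lineage-specific estimates essentially recover them.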

    Sanderson and Doyle (2001) proposed that (1) biases in the data or the statistical estimation method used, (2) variation in rate across sites which causes sequence divergences to be estimated incorrectly, and (3) incorrect phylogenies are the underlying sources of error in molecular dating. P. S. Soltis et al. (2002) added that "inadequate sampling of taxa...can compound the problem." The same concern could be raised about the results we present here. However, the effect of these problems is likely to have been considerably reduced by the sampling of 12 evolutionarily successive land plants (Table 2; including all three living subclasses of eudicots), the use of 61 genes (>39,000 bp long) with different functions from the complete cp genomes, and the highly reliable NJ tree (Fig. 1B), which is consistent with the NJ tree of Goremykin et al. (1997), inferred from concatenating 14,295 amino acids of cp genomes.

    Comparison of Estimates from the Molecular Clock and Li–Tanimura Methods

    Tables 4 and 5 show that the dates of the monocot–dicot split and the origin of core eudicots estimated by the rate constancy and Li–Tanimura methods differ greatly, with estimates from the former method predating the latter by 50 Myr. Estimates calibrated from nodes C1 and C2 using the molecular clock method vary more than those obtained from the Li–Tanimura method.

    In the rate constancy method we used the arithmetic mean of all pairwise Ka distances between monocots and dicots to estimate the divergence date (Table 4). As a result, the obtained dates for the monocot–dicot split (220 Myr) and for the origin of the core eudicots (170 Myr) appear to be severely overestimated because the three high-rate grasses were included in the distance calculation. This was also the case in most previous estimates (Wolfe et al. 1989; Martin et al. 1989, 1993; Brandl et al. 1992; Laroche et al. 1995; Yang et al. 1999), which not only used the molecular clock hypothesis but also included one (maize) or several fast-evolving grass species (or annual Liliales [such as Ramshaw 1972]).

    Together with the preceding age estimates for the monocot–dicot split and the origin of core eudicots, we conclude that the Li–Tanimura method can substantially reduce the effect of rate variation among lineages and provide an estimate more in line with known fossil data.

    Comparison of Our Estimates with Previous Estimates

    Since our estimates based on the rate constancy method seem unreliable, we shall compare only estimates obtained from the Li–Tanimura method with those from other methods. Goremykin et al. (1997) used a framework very similar to the Li–Tanimura method (1987) and claimed that their approach is independent of the rate fluctuation on the grass (high rate) and Marchantia (low rate) branches. They estimated the divergence time between the Zea–Oryza lineage and the Nicotiana lineage to be 160 Myr ago, which predates ours by about 10–20 Myr (Table 5). Based on our cp genome data (Figs. 1B and 3A), the Nicotiana lineage has the slowest Ka and Ks rates among the sampled dicots but was used as half of the denominator by Goremykin et al. (1997) in estimating the monocot–dicot divergence time. Since our estimates of the dates for the monocot–dicot divergence and the origin of core eudicots were based on the mean branch length of six dicots, our data should be more reliable than data based on single species.

    In order to reduce the effects of unequal rates, Bremer (2000) used the mean branch lengths from a group of terminal taxa to their common node (which has a known fossil age) for calculating the change rate (distance/age by Bremer's definition). Using the rbcL gene, the MP tree of monocots, and the eight reference nodes with known fossil dates, Bremer (2000) estimated the split between Acorus, presumably the basalmost extant monocot (APG 1998; Chase et al. 2000), and the remaining monocots at 134 Myr ago. According to the integrated and widely used phylogenetic tree for the orders of flowering plants (APG 1998), the separation of the monocot lineage from the other magnoliids predated the branching-off of eudicots. Therefore, Bremer's estimate is compatible with ours for the monocot–dicot split (140–150 Myr ago) and the core eudicot divergence (100–115 Myr ago).

    Using a single gene (rbcL) and the NPRS method, two genes (plus 18S rRNA) and maximum likelihood analyses, and the calibration date of Marchantia (450 Myr ago), Sanderson (1997) and Sanderson and Doyle (2001) estimated that crown angiosperms originated 160 and 140–190 Myr ago, respectively. Combining a three-gene dataset (rbcL, atpB, and 18S rRNA), the NPRS method, and the split between Fagales and Cucurbitales (84 Myr ago), Wikström et al. (2001) proposed that the extant angiosperms are 158–179 Myr old and the eudicots 131–147 Myr old. These estimates are in good agreement with ours, as the dicots we sampled are all eudicots. Recent multigene analyses of angiosperm evolution have revealed that the monocot–dicot divergence was preceded by five living basal dicot lineages, the Amborellaceae, the Nymphaeales, and a group including Illiciaceae, Trimeniaceae, and Austrobaileyaceae (i.e., the so-called ANITA group) (Qiu et al. 1999; P. S. Soltis et al. 1999; D. E. Soltis et al. 2000; see Fig. 1A), and an extinct basal angiosperm lineage, the Archaefructaceae (Sun et al. 2002). Therefore, previous estimates for the origin of angiosperms based on the monocot–dicot split have underestimated the age of angiosperms themselves. The above authors' estimates are consistent with ours if we postulate that approximately 20 (=160 − 140) to 40 (=190 − 150) Myr separates the angiosperm origin and the split between the ancestors of the monocot and eudicot lineages.

    Our estimated date for the origin of core eudicot lineages is 100–115 Myr ago (Table 5). This is earlier than the many documented fossil-based estimates for core eudicots, such as a possible Rhamnaceae/Rosaceae (both are rosids, represented here by our samples Lotus, Medicago, Arabidopsis, and Oenothera) from the early Cenomanian (94–97 Myr [Basinger and Dilcher 1984]), 89 Myr for the Capparales (represented by our sample Arabidopsis), 84 Myr for Myrtales (Magallón et al. 1999) (represented by our sample Oenothera), and 83 Myr for the Caryophyllales clade (represented by our sample Spinacia) (Magallón et al. 1999). In addition, our estimate for the age of core eudicots is reasonably younger than the fossil age of a basal eudicot, Tetracentraceae, from the Barremian (110–118 Myr ago [Magallón et al. 1999]). Collectively, our cp genomic data indicate that the core eudicots' age is also older than known fossil records indicate.

    Conclusions

    We observed significant Ka rate variation in cp genome data among major tracheophyte lineages. Therefore, the rate constancy method is not appropriate for dating the divergence between monocots and dicots or the age of eudicots, especially if fast-evolving monocots are included. Using cp genome data, we demonstrated that the Li–Tanimura method gives estimates that better reflect the known evolutionary sequence of tracheophyte lineages and correspond well with the fossil records of the calibration points we used.

    Combining our estimates calibrated by two known fossil nodes and the Li–Tanimura method, we propose that the monocot lineage branched off from dicots 140–150 Myr ago, in the late Jurassic to early Cretaceous, and that the core eudicots radiated 100–115 Myr ago, between the Albian and the Aptian of the Cretaceous. These estimates are in accordance with those of Sanderson (1997) and P. S. Soltis et al. (2002), who analyzed one to three genes and used MP and ML branch lengths with the NPRS method. In summary, methods that accommodate unequal rates give smaller estimates than the rate constancy method and appear to agree well not only with one another, but also with the recently documented fossil evidence.

    Our results confirm all previous conclusions that molecular data indicate a pre-Cretaceous origin for angiosperms, but our estimates for the monocot–dicot divergence postdate previous estimates based on the molecular clock hypothesis by at least 50 Myr (=200 − 150 Myr ago).

    Acknowledgments. We thank Robert Friedman for critical comments on an early version of the manuscript and Yoshihiro Matsuoka and Shu-Shin Wu for help with the gene group assignment for the three grasses and other taxa. We also thank the two reviewers for their critical and valuable comments and suggestions. This work was supported in part by National Science Council Grant NSC912311B001103 and Academia Sinica Grant IB91 to S.M.C., and NIH Grant GM30998 to W.H.L.

    Appendix

    Appendix Table A1

    Columns: Gene^a, followed by one entry per taxon in the order Marchantia, Psilotum, Pinus, Triticum, Oryza, Zea, Lotus, Medicago, Arabidopsis, Spinacia, Nicotiana, Oenothera; alternative names are listed in parentheses at the end of each row. Rows with fewer than 12 entries are genes that are absent from some taxa.

    accD | 951b | 933 | 966 | c | 1,506 | 1,512 | 1,467 | 1,569 | 1,539 | 1,317 | (ORF316)
    atpA | 1,524 | 1,527 | 1,485 | 1,515 | 1,524 | 1,524 | 1,533 | 1,536 | 1,524 | 1,296 | 1,524 | 1,518
    atpB | 1,479 | 1,479 | 1,479 | 1,497 | 1,497 | 1,497 | 1,497 | 1,497 | 1,497 | 1,497 | 1,497 | 1,497
    atpE | 408 | 402 | 414 | 414 | 414 | 414 | 402 | 402 | 399 | 405 | 402 | 402
    atpF | 555 | 555 | 555 | 552 | 543 | 552 | 555 | 552 | 555 | 555 | 555 | 555
    atpH | 246 | 246 | 246 | 246 | 246 | 246 | 246 | 246 | 246 | 246 | 246 | 246
    atpI | 747 | 747 | 747 | 744 | 744 | 744 | 744 | 738 | 750 | 744 | 744 | 744
    ccsA | 963 | 933 | 963 | 969 | 966 | 966 | 972 | 972 | 987 | 972 | 942 | 960 | (ORF320) (ORF320) (ycf5) (ORF321) (ORF321) (ycf5) (ycf5) (ycf5) (ycf5) (ycf5) (ycf5)
    cemA | 1,305 | 1,350 | 786 | 693 | 693 | 693 | 690 | 690 | 690 | 702 | 690 | 690 | (ORF434) (ycf10) (ORF261) (ORF230) (ycf10) (ORF229) (ycf10)
    chlB | 1,542 | 1,533 | (ORF513)
    chlL | 870 | 876 | (frxC)
    chlN | 1,398 | 1,404 | (108667–110064)
    chlP | 612 | 597 | 591 | 651 | 651 | 651 | 591 | 588 | 591 | 591 | 591 | 750 | (ORF203) (ORF216)
    cysA | 1,113 | (mbpX)
    cysT | 867 | (ORF288)
    infA | 237 | 243 | 237 | 342 | 324 | 324 | 177 | Wd
    matK | 1,113 | 1,512 | 1,548 | 1,629 | 1,629 | 1,635 | 1,527 | 1,521 | 1,581 | 1,518 | 1,530 | 1,539 | (ORF370i) (ORF542)
    ndhA | 1,107 | 1,116 | 1,089 | 1,089 | 1,089 | 1,092 | 1,077 | 1,083 | 1,098 | 1,092 | 1,092 | (ndh1)
    ndhB | 1,504 | 1,491 | Ψe | 1,533 | 1,533 | 1,533 | 1,164 | 1,164 | 1,164 | 1,533 | 1,533 | 1,533 | (ndh2)
    ndhC | 363 | 363 | Ψ | 363 | 363 | 363 | 363 | 363 | 363 | 363 | 363 | 363 | (ndh3)
    ndhD | 1,500 | 1,545 | Ψ | 1,503 | 1,503 | 1,503 | 1,494 | 1,494 | 1,503 | 1,383 | 1,503 | 1,503 | (ndh4)
    ndhE | 303 | 321 | Ψ | 306 | 306 | 306 | 306 | 306 | 306 | 306 | 306 | 306 | (ndh4L)
    ndhF | 2,079 | 2,223 | 2,220 | 2,205 | 2,217 | 2,244 | 2,235 | 2,241 | 2,229 | 2,223 | 2,211 | (ndh5)
    ndhG | 576 | 567 | 531 | 531 | 531 | 531 | 531 | 531 | 531 | 531 | 531 | (ORF191)

    Continued

    435

  • 8/8/2019 Dating the MonocotDicot Divergence and the Origin of Core Eudicots Using Whole Chloroplast Genomes

    13/18TableA1.

    Continued

    Taxon

    Genea

    Marc

    hantia

    Psilotum

    Pinus

    Triticum

    Oryza

    Z

    ea

    Lotus

    Medicago

    Ara

    bidopsis

    Spinacia

    Nicotiana

    Oenot

    hera

    ndhH

    1,179

    1,182

    Y

    1,182

    1,182

    1,182

    1,182

    1,182

    1,182

    1,182

    1,182

    1,182

    (ORF392)

    (ORF393)

    ndhI

    552

    498

    Y

    543

    537

    543

    486

    486

    519

    513

    504

    498

    (frxB)

    (ORF178)

    ndhJ

    510

    477

    480

    480

    480

    477

    477

    477

    477

    477

    477

    (ORF

    169)

    (ORF480)

    ndhK

    732

    624

    Y

    738

    741

    747

    693

    684

    678

    846

    744

    744

    (psbG)

    (psbG)

    (psbG)

    (psbG)

    petA

    963

    966

    960

    963

    963

    963

    963

    963

    963

    963

    963

    957

    petB

    648

    648

    648

    648

    648

    705

    648

    648

    648

    648

    636

    648

    petD

    483

    483

    543

    483

    483

    483

    483

    483

    483

    483

    483

    483

    petE

    114

    114

    114

    114

    114

    114

    114

    114

    114

    114

    114

    114

    (ORF

    37)

    (petG)

    (petG)

    (petG)

    (petG)

    (petG)

    (petG)

    (petG)

    (petG)

    (petG)

    petL

    96

    96

    189

    96

    96

    96

    96

    96

    96

    96

    96

    96

    (ORF

    31)

    (ORF62b)

    (ORF31)

    (ORF31)

    (ycf7)

    (ORF31)

    petN

    90

    90

    90

    90

    90

    90

    90

    90

    90

    90

    90

    90

    (5168

    5257)

    (ORF29)

    (ycf6)

    (ORF29)

    (ORF29)

    (ycf6)

    (ycf6)

    (ycf6)

    (ycf6)

    (ycf6)

    (ycf6)

    psaA

    2,253

    2,253

    2,262

    2,253

    2,253

    2,253

    2,253

    2,277

    2,253

    2,253

    2,253

    2,256

    psaB

    2,205

    2,205

    2,205

    2,205

    2,205

    2,208

    2,205

    2,205

    2,205

    2,205

    2,205

    2,205

    psaC

    246

    246

    246

    246

    246

    246

    246

    246

    246

    246

    246

    246

    (frxA)

    psaI

    111

    111

    159

    111

    111

    111

    105

    105

    114

    102

    111

    105

    (ORF

    36b)

    (ORF36)

    psaJ

    129

    129

    135

    129

    135

    129

    135

    135

    135

    135

    135

    132

    (ORF

    42b)

    (ORF44)

    psaM

    99

    99

    93

    (ORF

    32)

    psbA

    1,062

    1,062

    1,062

    1,062

    1,062

    1,062

    1,062

    1,062

    1,062

    1,062

    1,062

    1,062

    psbB

    1,527

    1,527

    1,527

    1,527

    1,527

    1,527

    1,527

    1,527

    1,527

    1,527

    1,527

    1,527

    psbC

    1,422

    1,386e

    1,422

    1,422

    1,422

    1,422

    1,422

    1,422

    1,422

    1,422

    1,386

    1,422

    (1422)

    psbD

    1,062

    1,062

    1,062

    1,062

    1,062

    1,062

    1,062

    1,062

    1,062

    1,062

    1,062

    1,062

    psbE

    252

    252

    252

    252

    252

    252

    252

    252

    252

    252

    252

    252

    psbF

    120

    120

    120

    120

    120

    120

    120

    120

    120

    120

    120

    120

    psbH

    225

    225

    228

    222

    222

    222

    222

    219

    222

    222

    222

    222

    (ORF

    74)

    psbI

    111

    111

    158

    111

    111

    111

    111

    111

    111

    111

    111

    111

    (ORF

    36a)

    (83988508)

    psbJ

    123

    123

    123

    123

    123

    123

    123

    123

    123

    123

    123

    123

    (ORF

    40)

    (ORF40)

    psbK  168  177  180  186  186  186  186  186  180  180  186  180

    (ORF55)

    (ORF98)

    psbL  117  117  117  117  117  117  117  117  117  117  117  117

    (ORF38)

    psbM  105  105  114  105  105  105  105  105  105  105  105  105

    (ORF34)

    psbN  132  132  132  132  132  132  132  132  132  132  132  132

    (ORF43)

    psbT  108  99  108  117  108  102  102  108  102  102  105  108

    (ORF35)

    (ORF35)

    (ORF33)

    rbcL  1,428  1,428  1,428  1,434  1,434  1,431  1,428  1,428  1,440  1,428  1,434  1,428

    rpl2  612  834  831  822  822  822  825  792  825  819  825  825

    (ORF203)

    rpl14  369  369  369  372  372  372  369  369  369  366  372  369

    rpl16  432  423  405  411  411  411  408  408  408  408  405  408

    rpl20  351  345  360  360  360  360  366  360  354  387  387  393

    rpl21

    351

    390

    rpl22

    360

    351

    429

    447

    450

    447

    483

    600

    468

    414

    rpl23  276  273  276  282  282  282  282  282  282  W  282  282

    (ORF42)

    rpl32  210  183  213  192  192  180  153  180  159  174  168  156

    (ORF69)

    (ORF63)

    rpl33  198  198  207  201  201  201  201  201  201  201  201  201

    rpl36  114  114  114  114  114  114  114  114  114  114  114  114

    (secX)

    rpoA  1,023  1,023  1,008  1,020  1,014  1,020  1,002  1,002  990  1,008  1,014  1,104

    rpoB  3,198  3,201  3,228  3,321  3,228  3,228  3,213  3,213  3,219  3,213  3,213  3,219

    rpoC1  2,055  2,025  2,091  2,052  2,049  2,052  2,049  2,061  2,043  2,034  2,067  2,040

    rpoC2  4,161  4,227  3,675  4,440  4,542  4,584  3,999  4,145  4,031  4,086  4,179  4,161

    rps2  708  720  705  711  711  711  711  711  711  711  711  711

    rps3  654  663  654  720  720  675  657  636  657  657  657  657

    rps4  609  600  597  606  606  606  606  606  606  606  606  612

    rps7  468  468  468  471  471  471  468  474  468  468  468  468

    rps8  399  399  399  411  411  411  405  405  405  405  405  417

    rps11  393  393  393  432  432  432  417  417  417  417  426  435

    rps12  372  372  372  369  375  375  372  372  372  372  372  372

    rps14  303  303  300  312  312  312  303  303  303  303  303  303

    rps15  267  261  267  273  273  237  273  276  267  273  264  264

    rps16

    258

    189

    258

    243

    240

    267

    258

    267

    rps18  228  228  303  513  492  513  315  297  306  306  306  306

    rps19  279  279  279  282  282  282  279  279  279  279  279  279

    ycf1^f

    3,207

    5,112

    5,271

    Y

    1,032

    5,502

    1,053

    7,305

    (ORF1068)

    (ORF1756)

    (ORF350)

    ycf2^g

    6,411

    6,942

    6,165

    6,897

    5,658

    6,885

    6,396

    6,843

    6,843

    (ORF2136)

    (ORF2054)

    ycf3  513^e  516  510  513  510  513  381  381  381  498  507  477

    (ORF167)

    (ORF169)

    (ORF170)

    (ORF170)

    (378)

    ycf4  555  555  555  558  558  558  603  576  555  555  555  558

    (ORF184)

    (ORF184)

    (ORF185)

    (ORF185)

    Table A1. Continued

    Gene^a / Taxon  Marchantia  Psilotum  Pinus  Triticum  Oryza  Zea  Lotus  Medicago  Arabidopsis  Spinacia  Nicotiana  Oenothera

    ycf9  189  189  189  189  189  189  189  189  189  189  189  189

    (ORF62)

    (ORF62)

    (ORF62)

    (ORF62)

    ycf12

    102

    102

    102

    (ORF33)

    (ORF33)

    ycf15

    300

    204

    234

    192

    264

    Y

    (ORF99)

    (140818–141021)

    (ORF77)

    (90848–91039)

    ycf66

    408

    (ORF135)

    ycf68

    228

    435

    402

    405

    (ORF75a)

    (92991–93425)

    (ORF133)

    (ORF133)

    ycf69

    177

    216

    177

    396

    (124696–124872)

    (ORF72)

    (ORF58)

    (ORF131)

    ycf70

    129

    270

    210

    (14538–14666)

    (ORF91)

    (ORF69)

    ycf71

    153

    249

    225

    (80773–80925)

    (ORF82)

    (ORF75)

    ycf72

    414

    414

    414

    (81048–81461)

    (ORF137)

    (ORF137)

    ycf73

    750

    750

    522

    (83758–84507)

    (ORF249)

    (ORF173)

    ycf74

    150

    330

    150

    (94467–94616)

    (ORF109)

    (ORF49)

    ycf75

    192

    192

    (ORF63)

    (ORF63)

    ycf76

    255

    258

    258

    (124382–124636)

    (ORF85)

    (ORF85)

    Total No. genes  87  81  73  84  85  86  77  74  79  79  80  78

    Total length  71,509  68,355  60,470  58,095  58,677  58,581  61,908  60,296  63,543  67,839  64,551  70,110

    Total No. genes 98

    Average length, 63,661 ± 4,764

    ^a Gene names follow those of Martin et al. (2002) and Swiss-Prot Protein Knowledgebase (2003) and each NCBI accession of a given taxon (refer to Table 2).

    ^b Gene length (bp) is given after each gene name under each species. Within parentheses are the position ranges (where no annotation was available but a putatively respective gene homologue was detected using the BLASTX in NCBI), or original gene names, or ORF names in a given taxon, respectively.

    ^c Absence of the gene in a given chloroplast genome.

    ^d Pseudogene.

    ^e The gene length we used was different from the NCBI annotation of a given species due to an earlier stop or longer reading frame detected.

    ^f Martin et al. (2002) considered that this gene is not related to prokaryotic genes and designated it ycf78.

    ^g A FtsH-like protein gene designated ycf77 by Martin et al. (2002).
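    The summary figures and the parenthesized position ranges in Table A1 can be cross-checked with simple arithmetic. The Python sketch below is an illustration only and is not part of the original analysis. It assumes that the twelve values in the Total length row follow the column order of the table header, that the second figure given with the average length is a sample standard deviation, and that each parenthesized position range is an inclusive pair of start and end coordinates, so that its length is end - start + 1. The helper name `range_length` is hypothetical.

```python
from statistics import mean, stdev

# Total length (bp) per taxon, copied from the "Total length" row of Table A1.
# Assumption: the values follow the column order of the table header.
TOTAL_LENGTH = {
    "Marchantia": 71509,
    "Psilotum": 68355,
    "Pinus": 60470,
    "Triticum": 58095,
    "Oryza": 58677,
    "Zea": 58581,
    "Lotus": 61908,
    "Medicago": 60296,
    "Arabidopsis": 63543,
    "Spinacia": 67839,
    "Nicotiana": 64551,
    "Oenothera": 70110,
}


def range_length(start: int, end: int) -> int:
    """Length in bp of an inclusive position range (hypothetical helper)."""
    return end - start + 1


lengths = list(TOTAL_LENGTH.values())
average = mean(lengths)  # arithmetic mean across the 12 taxa
spread = stdev(lengths)  # sample standard deviation (n - 1 denominator)

print(f"average length: {average:,.0f} bp")
print(f"sample SD:      {spread:,.0f} bp")

# Three position ranges from the table and the gene lengths they imply.
for gene, start, end in [("psbI", 8398, 8508),
                         ("ycf15", 140818, 141021),
                         ("ycf68", 92991, 93425)]:
    print(gene, range_length(start, end))
```

    Under these assumptions the sketch gives an average of 63,661 bp with a sample standard deviation of about 4,764 bp, and the three ranges give 111, 204, and 435 bp, which agree with lengths listed in the psbI, ycf15, and ycf68 rows.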


    References

    APG (Angiosperm Phylogeny Group) (1998) An ordinal classification for the families of flowering plants. Ann Mo Bot Gard 85:531–533

    Basinger JF, Dilcher DL (1984) Ancient bisexual flowers. Science 224:511–513

    Bousquet J, Strauss SH, Doerksen AH, Price RA (1992) Extensive variation in evolutionary rate of rbcL gene sequences among seed plants. Proc Natl Acad Sci USA 89:7844–7848

    Bowe LM, Coat G, dePamphilis CW (2000) Phylogeny of seed plants based on all three genomic compartments: Extant gymnosperms are monophyletic and Gnetales' closest relatives are conifers. Proc Natl Acad Sci USA 97:4092–4097

    Brandl R, Mann W, Sprintzl M (1992) Estimation of the monocot–dicot age through t-RNA sequences from the chloroplast. Proc R Soc Lond B 249:13–17

    Bremer K (2000) Early Cretaceous lineages of monocot flowering plants. Proc Natl Acad Sci USA 97:4707–4711

    Chase MW, et al. (1993) Phylogenetics of seed plants: An analysis of nucleotide sequences from the plastid gene rbcL. Ann Mo Bot Gard 80:528–580

    Chase MW, et al. (2000) Higher-level systematics of the monocotyledons: An assessment of current knowledge and a new classification. In: Wilson KL, Morrison DA (eds) Monocots: Systematics and evolution. Commonwealth Scientific and Industrial Research Organization, Collingwood, Australia, pp 3–16

    Chaw SM, Zharkikh HA, Sung HM, Lau TC, Li WH (1997) Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol Biol Evol 14:56–68

    Chaw SM, Parkinson CL, Cheng Y, Vincent TM, Palmer JD (2000) Seed plant phylogeny inferred from all three plant genomes: Monophyly of extant gymnosperms and origin of Gnetales from conifers. Proc Natl Acad Sci USA 97:4086–4091

    Clegg MT, Gaut BS, Learn GH Jr, Morton BR (1994) Rates and patterns of chloroplast DNA evolution. Proc Natl Acad Sci USA 91:6795–6801

    Crane PR (1988) Major clades and relationships in higher gymnosperms. In: Beck CB (ed) Origin and evolution of gymnosperms. Columbia University Press, New York, pp 218–272

    Crepet WL, Feldman GD (1991) The earliest remains of grasses in the fossil record. Am J Bot 78:1010–1014

    Cronquist A (1988) The evolution and classification of flowering plants, 2nd ed. New York Botanical Garden, Bronx, NY

    Darwin C, Darwin F, Seward AC (eds) (1903) More letters from Charles Darwin. D. Appleton, New York

    Delevoryas T, Hope RC (1973) Fertile coniferophyte remains from the Late Triassic Deep River Basin, North Carolina. Am J Bot 60:810–818

    Doyle JA (1992) Revised palynological correlations of the lower Potomac Group (USA) and the Cocobeach sequence of Gabon (Barremian-Aptian). Cretaceous Res 13:337–349

    Doyle JA (1998) Molecules, morphology, fossils, and the relationship of angiosperms and Gnetales. Mol Phylogenet Evol 9:448–462

    Doyle JA, Donoghue MJ (1987) The origin of angiosperms: a cladistic approach. In: Friis EM, Chaloner WG, Crane PR (eds) The origins of angiosperms and their biological consequences. Cambridge University Press, Cambridge, pp 17–49

    Gallois JL, Achard P, Green G, Mache R (2001) The Arabidopsis chloroplast ribosomal protein L21 is encoded by a nuclear gene of mitochondrial origin. Gene 274:179–185

    Gantt JS, Baldauf SL, Calie PJ, Weeden NF, Palmer JD (1991) Transfer of rpl22 to the nucleus greatly preceded its loss from the chloroplast and involved the gain of an intron. EMBO J 10:3073–4078

    Gaut BS, Muse SV, Clark WD, Clegg MT (1992) Relative rates of nucleotide substitution at the rbcL locus of monocotyledonous plants. J Mol Evol 35:292–303

    Gaut BS, Muse SV, Clegg MT (1993) Relative rates of nucleotide substitution in the chloroplast genome. Mol Phylogenet Evol 2:89–96

    Gaut BS, Clark LG, Wendel JF, Clegg MT, Muse SV (1997) Comparisons of the molecular evolutionary process at rbcL and ndhF in the grass family (Poaceae). Mol Biol Evol 14:769–777

    Goremykin VV, Hansmann S, Martin WF (1997) Evolutionary analysis of 58 proteins encoded in six completely sequenced chloroplast genomes: Revised molecular estimates of two seed plant divergence times. Pl Syst Evol 206:337–351

    Goremykin VV, Hirsch-Ernst KI, Wolfl S, Hellwig FH (2003) Analysis of the Amborella trichopoda chloroplast genome sequence suggests that Amborella is not a basal angiosperm. Mol Biol Evol 20:1499–1505

    GPWG (Grass Phylogeny Working Group) (2001) Phylogeny and subfamilial classification of the grasses (Poaceae). Ann Mo Bot Gard 88:373–373

    Gu Z, Cavalcanti ARO, Chen FC, Bouman P, Li WH (2002) Extent of gene duplication in the genomes of Drosophila, nematode, and yeast. Mol Biol Evol 19:256–262

    Hallick RB, Bairoch A (1994) Proposals for the naming of chloroplast genes. III. Nomenclature for open reading frames encoded in chloroplast genomes. Plant Mol Biol Rep 12:S29–S30

    Hart JA (1987) A cladistic analysis of conifers: Preliminary results. J Arnold Arbor 68:296–307

    Herendeen PS, Crane PR (1995) The fossil history of the monocotyledons. In: Rudall PJ, Cribb PJ, Cutler DF, Humphries CJ (eds) Monocotyledons: Systematics and evolution. Royal Botanic Gardens, Kew, pp 1–21

    Hiratsuka J, et al. (1989) The complete sequence of the rice (Oryza sativa) chloroplast genome: Intermolecular recombination between distinct tRNA genes accounts for a major plastid DNA inversion during the evolution of the cereals. Mol Gen Genet 217:185–194

    Hughes NF (1994) The enigma of angiosperm origins. Cambridge University Press, Cambridge

    Hupfer H, Swiatek M, Hornung S, Hermann RG, Maier RM, Chiu WL, Sears B (2000) Complete nucleotide sequence of the Oenothera elata plastid chromosome, representing plastome I of the five distinguishable euoenothera plastomes. Mol Gen Genet 263:581–585

    Ikeo K, Ogihara Y (2000) Triticum aestivum chloroplast, complete genome (unpublished)

    Katayama H, Ogihara Y (1996) Phylogenetic affinities of the grasses to other monocots as revealed by molecular analysis of chloroplast DNA. Curr Genet 29:572–581

    Kato T, Kaneko T, Sato S, Nakamura Y, Tabata S (2000) Complete structure of the chloroplast genome of a legume, Lotus japonicus. DNA Res 7:323–330

    Kenrick P, Crane PR (1997) The origin and early evolution of land plants. Nature 389:33–39

    Kumar S, Tamura K, Jakobsen IB, Nei M (2001) MEGA 2: Molecular evolutionary genetics analysis software. Arizona State University, Tempe

    Laroche J, Li P, Bousquet J (1995) Mitochondrial DNA and monocot–dicot divergence time. Mol Biol Evol 12:1151–1156

    Li WH, Graur D (1991) Fundamentals of molecular evolution. Sinauer Associates, Sunderland, MA

    Li WH, Tanimura M (1987) The molecular clock runs more slowly in man than in apes and monkeys. Nature 326:93–96


    Lin S, Wu H, Jia H, Zhang P, Dixon R, May G, Gonzales R, Roe BA (2000) Medicago truncatula variety Jema Long A-17 chloroplast, complete sequence (unpublished)

    Lockhart PJ, Howe CJ, Barbrook AC, Larkum AWD, Penny D (1999) Spectral analysis, systematic bias, and the evolution of chloroplasts. Mol Biol Evol 16:573–576

    Magallón S, Sanderson MJ (2001) Absolute diversification rates in angiosperm clades. Int J Org Evol 55:1762–1780

    Magallón S, Crane PR, Herendeen PS (1999) Phylogenetic pattern, diversity, and diversification of eudicots. Ann Mo Bot Gard 86:297–372

    Maier RM, Neckermann K, Igloi GL, Kossel H (1995) Complete sequence of the maize chloroplast genome: Gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J Mol Biol 251:614–628

    Martin W, Gierl A, Saedler H (1989) Molecular evidence for pre-Cretaceous angiosperm origin. Nature 339:46–48

    Martin W, Lagrange T, Li YF, Bisanz-Seyer C, Mache R (1990) Hypothesis for the evolutionary origin of the chloroplast ribosomal protein L21 of spinach. Curr Genet 18:553–556

    Martin W, Lydiate D, Brinkmann H, Forkmann G, Saedler H, Cerff R (1993) Molecular phylogenies in angiosperm evolution. Mol Biol Evol 10:140–162

    Martin W, Stoebe B, Goremykin V, Hansmann S, Hasegawa M, Kowallik KV (1998) Gene transfer to the nucleus and the evolution of chloroplasts. Nature 393:162–165

    Martin W, et al. (2002) Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc Natl Acad Sci USA 99:12246–12251

    Mathews S, Donoghue MJ (1999) The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286:947–950

    Matsuoka Y, Yamazaki Y, Ogihara Y, Tsunewaki K (2002) Whole chloroplast genome comparison of rice, maize, and wheat: implications for chloroplast gene diversification and phylogeny of cereals. Mol Biol Evol 19:2084–2091

    Millen RS, Olmstead RG, Adams KL, Palmer JD, Lao NT, Heggie L, Kavanagh TA, Hibberd JM, Gray JC, Morden CW, Calie PJ, Jermiin LS, Wolfe KH (2001) Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus. Plant Cell 13:645–658

    Miller Jr CN (1977) Mesozoic conifers. Bot Rev 43:217–280

    Miller Jr CN (1988) The origin of modern conifer families. In: Beck CB (ed) Origin and evolution of gymnosperms. Columbia University Press, New York, pp 448–486

    Muse SV, Gaut BS (1997) Interlocus comparisons of the nucleotide substitution process in the chloroplast genome. Genetics 146:393–399

    Nicholas KB, Nicholas HB Jr (1997) GeneDoc: Analysis and visualization of genetic variation. http://www.cris.com/Ketchup/genedoc.shtml

    Nicholas KJ, Tiffney BH, Knoll AH (1983) Patterns in vascular land plant diversification. Nature 303:614–616

    Nickrent DL, Parkinson CL, Palmer JD, Duff RJ (2000) Multigene phylogeny of land plants with special reference to bryophytes and the earliest land plants. Mol Biol Evol 17:1885–1895

    Ogihara Y, Isono K, Kojima T, Endo A, Hanaoka M, Shiina T, Terachi T, Utsugi S, Murata M, Mori N, Takumi S, Ikeo K, Gojobori T, Murai R, Murai K, Matsuoka Y, Ohnishi Y, Tajiri H, Tsunewaki K (2002) Structural features of a wheat plastome as revealed by complete sequencing of chloroplast DNA. Mol Gen Genomics 266:740–746

    Ohyama K, Fukuzawa H, Kohchi T, Shirai H, Sano T, Sano S, Umesono K, Shiki Y, Takeuchi M, Chang Z, Aota S, Inokuchi H, Ozeki H (1986) Chloroplast gene organization deduced from complete sequence of liverwort Marchantia polymorpha chloroplast DNA. Nature 322:572–574

    Palmer JD (1985a) Comparative organization of chloroplast genomes. Annu Rev Genet 19:325–354

    Palmer JD (1985b) Evolution of chloroplast and mitochondrial DNA in plants and algae. In: MacIntyre RJ (ed) Molecular evolutionary genetics. Plenum Press, New York, pp 131–240

    Parkinson CL, Adams KL, Palmer JD (1999) Multigene analyses identify the three earliest lineages of extant flowering plants. Curr Biol 9:1485–1488

    Price RA, Thomas J, Strauss SH, Gadek PA, Quinn CJ, Palmer JD (1993) Familial relationships of the conifers from rbcL sequence data. Am J Bot 80:172

    Pryer KM, Schneider H, Smith AR, Cranfill R, Wolf PG, Hunt JS, Sipes SD (2001) Horsetails and ferns are a monophyletic group and the closest living relatives to seed plants. Nature 409:618–622

    Qiu YL, Lee J, Bernasconi-Quadroni F, Soltis DE, Soltis PS, Zanis M, Chen Z, Savolainen V, Chase MW (1999) The earliest angiosperms: Evidence from mitochondrial, plastid and nuclear genomes. Nature 402:404–407

    Rai HS, O'Brien HE, Reeves PA, Olmstead RG, Graham SW (2003) Inference of higher-order relationships in the cycads from a large chloroplast data set. Mol Phylogenet Evol 29:350–359

    Ramshaw JAM, Richardson DL, Meatyard BT, Brown RH, Richardson M, Thompson EW, Boulter D (1972) The time of origin of the flowering plants determined by using amino acid sequence data of cytochrome C. New Phytol 71:773–779

    Renzaglia KS, Johnson TH, Gates HD, Whittier DP (2001) Architecture of the sperm cell of Psilotum. Am J Bot 88:1151–1163

    Rost B (1999) Twilight zone of protein sequence alignments. Protein Eng 12:85–94

    Saito N, Nei M (1987) The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425

    Sanderson MJ (1997) A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol Biol Evol 14:1218–1231

    Sanderson MJ, Doyle JA (2001) Sources of error and confidence intervals in estimating the age of angiosperms from rbcL and 18S rDNA data. Am J Bot 88:1499–1516

    Sato S, Nakamura Y, Kaneko T, Asamizu E, Tabata S (1999) Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Res 6:283–290

    Schmitz-Linneweber C, Maier RM, Alcaraz JP, Cottet A, Herrmann RG, Mache R (2001) The plastid chromosome of spinach (Spinacia oleracea): Complete nucleotide sequence and gene organization. Plant Mol Biol 45:307–315

    Shinozaki K, et al. (1986) The complete nucleotide sequence of tobacco chloroplast genome: Its gene organization and expression. EMBO J 5:2043–2049

    Soltis DE, et al. (2000) Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB sequences. Bot J Linn Soc 133:381–461

    Soltis PS, Soltis DE, Chase MW (1999) Angiosperm phylogeny inferred from multiple genes: A research tool for comparative biology. Nature 402:402–404

    Soltis PS, Soltis DE, Savolainen V, Crane PR, Barraclough TG (2002) Rate heterogeneity among lineages of tracheophytes: Integration of molecular and fossil data and evidence for molecular living fossils. Proc Natl Acad Sci USA 99:4430–4435

    Stebbins GL (1981) Coevolution of grasses and herbivores. Ann Mo Bot Gard 68:75–76

    Stebbins GL (1987) Grass systematics and evolution: Past, present and future. In: Sonderstrom TR, Hilu KH, Campbell CS, Varkworth ME (eds) Grass systematics and evolution. Smithsonian Institution Press, Washington, DC, pp 359–367

    Stewart WN, Rothwell GW (1993) Paleobotany and the evolution of plants, 2nd ed. Cambridge University Press, Cambridge

    Stoebe B, Martin W, Kowallik KV (1998) Distribution and nomenclature of protein-coding genes in 12 chloroplast genomes. Plant Mol Biol Rep 16:243–255

    Sun G, Ji Q, Dilcher DL, Zheng S, Nixon KC, Wang X (2002) Archaefructaceae, a new basal angiosperm family. Science 296:899–904

    Swiss-Prot Protein Knowledgebase (2003) List of chloropl