18
Breakthrough Technologies Insights into the Evolution of Hydroxyproline-Rich Glycoproteins from 1000 Plant Transcriptomes 1[OPEN] Kim L. Johnson 2 , Andrew M. Cassin 2 , Andrew Lonsdale, Gane Ka-Shu Wong, Douglas E. Soltis, Nicholas W. Miles, Michael Melkonian, Barbara Melkonian, Michael K. Deyholos, James Leebens-Mack, Carl J. Rothfels, Dennis W. Stevenson, Sean W. Graham, Xumin Wang, Shuangxiu Wu, J. Chris Pires, Patrick P. Edger, Eric J. Carpenter, Antony Bacic, Monika S. Doblin 3 , and Carolyn J. Schultz 3 * Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia (K.L.J., A.M.C., A.L., A.B., M.S.D.); Departments of Biological Sciences and Medicine, University of Alberta, Edmonton, Alberta, Canada, and BGI-Shenzhen, Bei Shan Industrial Zone, Yantian District, Shenzhen, China (G.K.-S.W., E.J.C.); Florida Museum of Natural History, Department of Biology, University of Florida, Gainsville, Florida 32611 (D.E.S., N.W.M.); Botanical Institute, Cologne Biocenter, University of Cologne, D50674 Cologne, Germany (M.M., B.M.); Department of Biology, University of British Columbia, Kelowna, British Columbia V1V 1V7, Canada (M.K.D.) Department of Plant Biology, University of Georgia, Athens, Georgia 3062 (J.L.-M.); University Herbarium and Department of Integrative Biology, University of California, Berkeley, California 94720 (C.J.R.); New York Botanical Garden, Bronx, New York 10458 (D.W.S.); Department of Botany, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada (S.W.G.); Key Laboratory of Genome Science and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China (X.W., S.W.); Division of Biological Sciences and Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211 (J.C.P.); Department of Horticulture, Michigan State University, East Lansing, Michigan 48823 (P.P.E.); and School of Agriculture, Food, and Wine, University of Adelaide, Waite Research Institute, Glen Osmond, South Australia 5064, Australia (C.J.S.) ORCID IDs: 0000-0001-6917-7742 (K.L.J.); 0000-0002-0292-2880 (A.L.); 0000-0001-6108-5560 (G.K.-S.W.); 0000-0003-4205-3891 (M.K.D.); 0000-0003-4811-2231 (J.L.-M.); 0000-0002-6605-1770 (C.J.R.); 0000-0002-2986-7076 (D.W.S.); 0000-0001-8209-5231 (S.W.G.); 0000-0003-0302-8909 (X.W.); 0000-0001-9682-2639 (J.C.P.); 0000-0001-7483-8605 (A.B.); 0000-0002-8921-2725 (M.S.D.); 0000-0003-2026-9122 (C.J.S.). The carbohydrate-rich cell walls of land plants and algae have been the focus of much interest given the value of cell wall-based products to our current and future economies. Hydroxyproline-rich glycoproteins (HRGPs), a major group of wall glycoproteins, play important roles in plant growth and development, yet little is known about how they have evolved in parallel with the polysaccharide components of walls. We investigate the origins and evolution of the HRGP superfamily, which is commonly divided into three major multigene families: the arabinogalactan proteins (AGPs), extensins (EXTs), and proline-rich proteins. Using motif and amino acid bias, a newly developed bioinformatics pipeline, we identied HRGPs in sequences from the 1000 Plants transcriptome project (www.onekp.com). Our analyses provide new insights into the evolution of HRGPs across major evolutionary milestones, including the transition to land and the early radiation of angiosperms. Signicantly, data mining reveals the origin of glycosylphosphatidylinositol (GPI)-anchored AGPs in green algae and a 3- to 4-fold increase in GPI-AGPs in liverworts and mosses. The rst detection of cross-linking (CL)-EXTs is observed in bryophytes, which suggests that CL-EXTs arose though the juxtaposition of preexisting SP n EXT glycomotifs with rened Y-based motifs. We also detected the loss of CL-EXT in a few lineages, including the grass family (Poaceae), that have a cell wall composition distinct from other monocots and eudicots. A key challenge in HRGP research is tracking individual HRGPs throughout evolution. Using the 1000 Plants output, we were able to nd putative orthologs of Arabidopsis pollen-specic GPI-AGPs in basal eudicots. Cell walls of plants and algae are widely used for food, textiles, paper, and timber, yet our understand- ing of their assembly and dynamic remodeling in response to growth, development, and environmental stresses (abiotic and biotic) remains rudimentary (Doblin et al., 2010, 2014). There are two contrasting types of walls/extracellular matrices in the green plant lineage, protein-rich walls of some green algae and the cellulose-rich walls of embryophytes (land plants). Recent studies of cell wall evolution suggest that the origins of many wall components occurred in the streptophyte green algae prior to the evolution of em- bryophytes (Sørensen et al., 2010; Popper et al., 2011; Domozych et al., 2012) with elaboration of a preexisting set of polysaccharides rather than an entirely new poly- mer framework. Although both the ultrastructure of plant walls and the ne structure of their component polymers vary widely, they all typically constitute a brillar net- work of cellulose microbrils that is embedded within a gel-like matrix of noncellulosic polysaccharides, pectins, 904 Plant Physiology Ò , June 2017, Vol. 174, pp. 904921, www.plantphysiol.org Ó 2017 American Society of Plant Biologists. All Rights Reserved. www.plantphysiol.org on July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

Embed Size (px)

Citation preview

Page 1: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

Breakthrough Technologies

Insights into the Evolution of Hydroxyproline-RichGlycoproteins from 1000 Plant Transcriptomes1[OPEN]

Kim L. Johnson2, Andrew M. Cassin2, Andrew Lonsdale, Gane Ka-Shu Wong, Douglas E. Soltis,Nicholas W. Miles, Michael Melkonian, Barbara Melkonian, Michael K. Deyholos, James Leebens-Mack,Carl J. Rothfels, Dennis W. Stevenson, Sean W. Graham, Xumin Wang, Shuangxiu Wu, J. Chris Pires,Patrick P. Edger, Eric J. Carpenter, Antony Bacic, Monika S. Doblin3, and Carolyn J. Schultz3*

Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University ofMelbourne, Parkville, Victoria 3010, Australia (K.L.J., A.M.C., A.L., A.B., M.S.D.); Departments of BiologicalSciences and Medicine, University of Alberta, Edmonton, Alberta, Canada, and BGI-Shenzhen, Bei ShanIndustrial Zone, Yantian District, Shenzhen, China (G.K.-S.W., E.J.C.); Florida Museum of Natural History,Department of Biology, University of Florida, Gainsville, Florida 32611 (D.E.S., N.W.M.); Botanical Institute,Cologne Biocenter, University of Cologne, D50674 Cologne, Germany (M.M., B.M.); Department of Biology,University of British Columbia, Kelowna, British Columbia V1V 1V7, Canada (M.K.D.) Department of PlantBiology, University of Georgia, Athens, Georgia 3062 (J.L.-M.); University Herbarium and Department ofIntegrative Biology, University of California, Berkeley, California 94720 (C.J.R.); New York Botanical Garden,Bronx, New York 10458 (D.W.S.); Department of Botany, University of British Columbia, Vancouver, BritishColumbia V6T 1Z4, Canada (S.W.G.); Key Laboratory of Genome Science and Information, Beijing Institute ofGenomics, Chinese Academy of Sciences, Beijing 100101, China (X.W., S.W.); Division of Biological Sciencesand Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211 (J.C.P.); Department ofHorticulture, Michigan State University, East Lansing, Michigan 48823 (P.P.E.); and School of Agriculture,Food, and Wine, University of Adelaide, Waite Research Institute, Glen Osmond, South Australia 5064,Australia (C.J.S.)

ORCID IDs: 0000-0001-6917-7742 (K.L.J.); 0000-0002-0292-2880 (A.L.); 0000-0001-6108-5560 (G.K.-S.W.); 0000-0003-4205-3891 (M.K.D.);0000-0003-4811-2231 (J.L.-M.); 0000-0002-6605-1770 (C.J.R.); 0000-0002-2986-7076 (D.W.S.); 0000-0001-8209-5231 (S.W.G.);0000-0003-0302-8909 (X.W.); 0000-0001-9682-2639 (J.C.P.); 0000-0001-7483-8605 (A.B.); 0000-0002-8921-2725 (M.S.D.);0000-0003-2026-9122 (C.J.S.).

The carbohydrate-rich cell walls of land plants and algae have been the focus of much interest given the value of cell wall-basedproducts to our current and future economies. Hydroxyproline-rich glycoproteins (HRGPs), a major group of wall glycoproteins,play important roles in plant growth and development, yet little is known about how they have evolved in parallel with thepolysaccharide components of walls. We investigate the origins and evolution of the HRGP superfamily, which is commonlydivided into three major multigene families: the arabinogalactan proteins (AGPs), extensins (EXTs), and proline-rich proteins.Using motif and amino acid bias, a newly developed bioinformatics pipeline, we identified HRGPs in sequences from the1000 Plants transcriptome project (www.onekp.com). Our analyses provide new insights into the evolution of HRGPs acrossmajor evolutionary milestones, including the transition to land and the early radiation of angiosperms. Significantly, data miningreveals the origin of glycosylphosphatidylinositol (GPI)-anchored AGPs in green algae and a 3- to 4-fold increase in GPI-AGPs inliverworts and mosses. The first detection of cross-linking (CL)-EXTs is observed in bryophytes, which suggests that CL-EXTsarose though the juxtaposition of preexisting SPn EXT glycomotifs with refined Y-based motifs. We also detected the loss ofCL-EXT in a few lineages, including the grass family (Poaceae), that have a cell wall composition distinct from other monocotsand eudicots. A key challenge in HRGP research is tracking individual HRGPs throughout evolution. Using the 1000 Plantsoutput, we were able to find putative orthologs of Arabidopsis pollen-specific GPI-AGPs in basal eudicots.

Cell walls of plants and algae are widely used forfood, textiles, paper, and timber, yet our understand-ing of their assembly and dynamic remodeling inresponse to growth, development, and environmentalstresses (abiotic and biotic) remains rudimentary(Doblin et al., 2010, 2014). There are two contrastingtypes of walls/extracellular matrices in the green plantlineage, protein-rich walls of some green algae andthe cellulose-rich walls of embryophytes (land plants).Recent studies of cell wall evolution suggest that the

origins of many wall components occurred in thestreptophyte green algae prior to the evolution of em-bryophytes (Sørensen et al., 2010; Popper et al., 2011;Domozych et al., 2012) with elaboration of a preexistingset of polysaccharides rather than an entirely new poly-mer framework. Although both the ultrastructure of plantwalls and the fine structure of their component polymersvary widely, they all typically constitute a fibrillar net-work of cellulose microfibrils that is embedded within agel-like matrix of noncellulosic polysaccharides, pectins,

904 Plant Physiology�, June 2017, Vol. 174, pp. 904–921, www.plantphysiol.org � 2017 American Society of Plant Biologists. All Rights Reserved. www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from

Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 2: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

and (glyco)proteins, the latter being both structural andenzymatic (Somerville et al., 2004; Doblin et al., 2010).Research to date has largely focused on the evolution

of the polysaccharides, cellulose, and the noncellulosicpolysaccharides (hemicelluloses) and pectins, pri-marily through the availability of polysaccharideepitope-specific antibodies coupled with immuno-fluorescence and/or transmission electron micros-copy and arraying techniques such as comprehensivemicroarray polymer profiling and plate-based arrays(Moller et al., 2007; Pattathil et al., 2010; Moore et al.,2014). In contrast, relatively little is known aboutthe origin and evolution of the hydroxyproline-richglycoproteins (HRGPs), a major group of wall gly-coproteins, despite their widespread occurrence(Supplemental Table S1) and importance in plantgrowth and development (for review, see Fincheret al., 1983; Kieliszewski and Lamport, 1994; Majewska-Sawka and Nothnagel, 2000; Ellis et al., 2010; Lamportet al., 2011; Draeger et al., 2015; Velasquez et al., 2015;Showalter and Basu, 2016). Examination of these gly-coproteins in a wider range of plant species shouldprovide valuable insights into how they have evolvedin parallel with the polysaccharide components inplant walls.HRGPs are commonly divided into three major

multigene families, the highly glycosylated arabinoga-lactan proteins (AGPs), the moderately glycosylatedextensins (EXTs), and the minimally glycosylated

proline-rich proteins (PRPs). The protein backbones ofthese HRGPs are distinguished by differences in aminoacid bias and the repeat motifs characteristic to eachfamily and include the posttranslationalmodification ofPro to Hyp (Johnson et al., 2017). Each of these familiesis further elaborated by chimeric and hybrid HRGPs,complicating not only the classification of members butalso the elucidation of their functions.

Classical AGPs consist of up to 99% carbohydratedue to the addition of large type II arabino-3,6-galactanpolysaccharides (degree of polymerization, 30–120) ontothe minor (1%–10%) protein backbone. The complexityof this family is immense and lies both in the diversity ofprotein backbones containing AGP glycomotifs and theincredible heterogeneity of glycan structures and sugarsdecorating these proteins. The specific binding of AGPsto the dye b-glucosyl Yariv reagent (b-Glc Yariv) and thegeneration of AGP-glycan-specific antibodies have pro-vided valuable insight into their function, structure, andevolution (Supplemental Table S1; Jermyn and Yeow,1975; van Holst and Clarke, 1985; Kitazawa et al., 2013;Paulsen et al., 2014). These studies suggest that AGPsare evolutionarily ancient. O-Glycosylation occurs inbryophytes, some chlorophytes, and brown algae, asshown by the binding of b-Glc Yariv and/or theisolation of Hyp-containing glycoproteins (Godlet al., 1997; Ligrone et al., 2002; Lee et al., 2005;Shibaya and Sugawara, 2009; Hervé et al., 2016;Johnson et al., 2017). Classical AGPs are likely toexist in the chlorophyte algae, as one of the first-cloned HRGP genes (accession no. AAB23258.2)from Chlamydomonas reinhardtii, CrZSP1 (Woessnerand Goodenough, 1989), is predicted to be a glyco-sylphosphatidylinositol (GPI)-AGP according to ourcriteria (Johnson et al., 2017). It is likely that AGPsin the chlorophyte algae have distinct functionsfrom streptophytes, as no large arabino-3,6-galactanpolysaccharides have been reported (Ferris et al., 2001).Short, branched oligosaccharides (Ferris et al., 2001)and linear, Ara- and Gal-containing oligosaccharides(Bollig et al., 2007) have been isolated from C. reinhardtiiwalls.

The highly glycosylated AGPs are soluble and havebeen proposed to act as mobile signals, potentiallyinvolved in cell-cell communication (Schultz et al.,1998; Motose et al., 2004; Pereira et al., 2014). Linkingof AGPs to the plasma membrane by a GPI-anchor issignificant, as it points toward their increased orga-nization within the cell wall-plasma membrane in-terface. The finding of GPI-AGPs in lipid rafts/nanodomains in eudicots is intriguing, as these havebeen proposed to stabilize and increase the activity ofproteins and protein complexes at the plasma mem-brane involved in cell-cell communication, signaltransduction, immune responses, and transport (Borneret al., 2005; Grennan, 2007; Tapken and Murphy,2015). It is tempting to speculate on a role for GPI-AGPs in similar processes in all species in whichthey occur, although experimental evidence is largelylacking.

1 This work was supported by the Victorian Life Sciences Compu-tation Initiative (grant no. VR0191) on its Peak Computing Facility atthe University of Melbourne, an initiative of the Victorian Govern-ment, Australia; by the 1000 Plants Initiative, which is funded by theAlberta Ministry of Enterprise and Advanced Education, Alberta In-novates Technology Futures, Innovates Centre of Research Excel-lence, and Musea Ventures; and by the Australia Research CouncilCentre of Excellence in Plant Cell Walls (grant no. CE1101007 to K.L.J., A.M.C., A.L., M.S.D., and A.B.), which is funded by the AustralianGovernment’s National Collaborative Research Infrastructure Pro-gram administered through Bioplatforms Australia.

2 These authors contributed equally to the article.3 These authors contributed equally to the article.* Address correspondence to [email protected] author responsible for distribution of materials integral to the

findings presented in this article in accordance with the policy de-scribed in the Instructions for Authors (www.plantphysiol.org) is:Carolyn J. Schultz ([email protected]).

D.E.S., N.W.M., M.M., B.M., M.K.D., J.L.-M., C.J.R., D.W.S., S.W.G.,J.C.P., P.P.E., X.W., and S.W. selected species for analysis and col-lected plant and algal samples, and M.M., B.M., M.K.D., J.C.P., C.J.R.,D.W.S., and S.W. extracted RNA for sequencing; J.L.-M. organized tran-scripts into gene families, and S.W. screened RNA-seq data for contam-ination; E.J.C. and G.K.-S.W. generated the sequence data; E.J.C. set upand maintained 1KP Web resources used to communicate data, andG.K.-S.W. oversaw the 1KP project.; A.M.C. generated the HRGP Website, and K.L.J., A.M.C., M.S.D., C.J.S., and A.B. designed and oversawthe research; A.L. provided bioinformatics support and assisted inmain-tenance of the HRGP Web site; K.J.L., A.M.C., M.S.D., and C.J.S. ana-lyzed the data and, together with A.B., wrote the article.

[OPEN] Articles can be viewed without a subscription.www.plantphysiol.org/cgi/doi/10.1104/pp.17.00295

Plant Physiol. Vol. 174, 2017 905

Evolution of the Plant HRGP Superfamily

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 3: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

AGPs have been most studied in embryophytes andhave been isolated from a wide range of tissues, in-cluding seeds, roots, stems, leaves, and inflorescences,and are particularly abundant in the stems of gymno-sperms and angiosperms (Gaspar et al., 2001). Theco-occurrence of specific glycan epitopes with devel-opmental stages, the expression pattern of AGP genes,and mutant studies have implicated AGPs in manyprocesses. These include roles in plant growth and de-velopment, cell expansion and division, hormone sig-naling, embryogenesis of somatic cells, differentiationof xylem, and responses to abiotic stress. The hetero-geneity of the AGP family and the vast number offamily members suggest that they are multifunctional,similar to what is found in mammalian proteoglycan/glycoproteins and intrinsically disordered proteins(IDPs; Filmus et al., 2008; Schaefer and Schaefer, 2010;Tan et al., 2012). Given that diverse members couldhave the same glycosylation and, conversely, that thesame backbone could have alternate glycosylatedforms, determining the function of any single AGP andthe establishment of precise AGP function is a majorchallenge. Despite the magnitude of this task, enor-mous progress has been made. For example, two well-characterized AGPs are the pollen-specific AtAGP6 andAtAGP11, which play important roles in pollen devel-opment and female tissue interactions and are sug-gested to be involved in multiple, integrated signalingpathways (Nguema-Ona et al., 2012). Investigation ofwhen these AGPs evolved would aid us in ascertainingtheir specific functions in pollen development.

The EXTs are an equally diverse family of HRGPswhose members contain repetitive SO3-5 motifs thatdirect moderate glycosylation with Ara oligosaccha-rides (arabinosides; degree of polymerization, 1–4).Classical EXTs play important structural roles throughcross-linking into the wall, facilitated by Tyr (Y)-basedmotifs such as VXY and YXY (Tierney andVarner, 1987;Lamport et al., 2011). This cross-linking into the wallcan have essential roles in plant development, such asthe requirement for Arabidopsis (Arabidopsis thaliana)AtEXT3 during embryogenesis for cell plate formation(Cannon et al., 2008) and AtEXT18 in pollen cell wallintegrity (Choudhary et al., 2015). Cross-linking ofEXTs into the wall is proposed to result in the cessationof cell expansion, and yet, some EXTs are important forcell elongation in root hairs (Velasquez et al., 2011,2015). The importance of glycosylation on EXT-likemotifs in the moss Physcomitrella patens was shown re-cently throughmutants inHypO-arabinosyltransferases(MacAlister et al., 2016). Reduced levels of cell wall-associated Hyp arabinosides (as found in EXTs) inP. patens resulted in increased elongation of protonemaltip cells. Therefore, different classes of EXTs have evolveddifferent biological roles, and these functions are gov-erned by both their protein backbone and their glycosy-lation sequences and patterns.

The first EXTs were isolated from tomato (Solanumlycopersicum; Lamport, 1977), and proteins with blocksof SO3-5motifs have been extracted fromother angiosperms

such as tobacco (Nicotiana tabacum), carrot (Daucuscarota), sugar beet (Beta vulgaris), and alfalfa (Medicagosativa; Chen and Varner, 1985; Smith et al., 1986; Liet al., 1990) as well as other green plant lineages, in-cluding gymnosperms (Fong et al., 1992), the likelysister group to angiosperms, and chlorophycean alga(Woessner and Goodenough, 1989), representing thesister to the remaining green plants. EXTs from theseother green plant lineages can have distinct featuresfrom embryophyte EXTs, and we use the term cross-linking (CL)-EXTs to distinguish EXTs with alternatingSOn and Y-based motifs, which occur predominantly inland plants, from the HRGPs that have Y residues in adifferent domain from the SOn-rich domains, as occursin volvocine algae (see Fig. 1 in Johnson et al., 2017).Exceptions may exist; for example, no classical EXTswere detected in the Hyp-poor cell walls of maize(Zea mays), which also do not have detectable iso-dityrosine, an intramolecular cross-link that occurs be-tween Y residues in EXTs (Kieliszewski et al., 1990;Kieliszewski and Lamport, 1994).

The PRPs are the most difficult family of HRGPs todefine, as very few have been studied to date. Investi-gation of PRPs has largely been restricted to legumesand Arabidopsis, where they have been shown to beminimally glycosylated, if at all (Averyhart-Fullardet al., 1988; Datta et al., 1989; Kleis-San Francisco andTierney, 1990; Lindstrom and Vodkin, 1991; Fowleret al., 1999). The majority of PRPs are chimeric, as theycontain an Ole PFAM domain, and they have beenimplicated in roles in root development (Bernhardt andTierney, 2000), stress responses (Li et al., 2016), fiberdevelopment (Xu et al., 2013), and nodule formation(Sherrier et al., 2005). Despite these important func-tions, information on the origins and evolution of PRPsremains fundamentally unexplored.

Collectively, the functional characterization of HRGPspoints toward their involvement in many processeswithin the wall, including signaling and cell wall in-tegrity pathways (Costa et al., 2013; Pereira et al., 2015,2016; Voxeur and Höfte, 2016). As plants evolvedmulticellularity and transitioned to land, it is likely thatgreater complexity in signaling networks was required,a role that could seemingly be fulfilled by HRGPs, dueto their distinct protein and glycan characteristics. It iscurrently unknown if the occurrence of specific HRGPsis associated with evolutionary milestones and thespecialization of plant cell walls. Greater insight intothe origins, evolution, and diversity of HRGPs will en-able researchers to address these key issues.

To study HRGP function throughout plant evolu-tion, we developed a robust method for their detectionthat exploits the amino acid bias of the predicted pro-tein backbone and characteristic glycomotifs (Johnsonet al., 2017). The motif and amino acid bias (MAAB)pipeline classifies HRGP sequences into one of 24 pre-defined descriptive subclasses (23 HRGP classes andone MAAB class) and is optimized for the recovery ofGPI-AGPs, non-GPI-AGPs, and CL-EXTs. Here, we usethis automated bioinformatics pipeline to identify and

906 Plant Physiol. Vol. 174, 2017

Johnson et al.

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 4: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

classify HRGPs within the extensive 1000 Plant (1KP)transcriptome data.The 1KP project (www.onekp.com) has generated

large-scale transcriptomic data for over 1000 plant andalgal species, thereby providing deeper sampling of keyplant taxa beyond those with sequenced genomes(Johnson et al., 2012; Matasci et al., 2014; Wickett et al.,2014). Species selected include mostly non-model landplants, many of which are important for medicine, ag-riculture, forestry, and biodiversity/conservation, aswell as red algae (Rhodophyta), green algae (chlorophytesand algal streptophytes), and taxa from a small groupof freshwater unicellular algae known as glaucophytes,collectively termed the Archaeplastida (Plantae). Anal-yses of the 1KP data are already helping resolve long-standing questions, such as from which streptophytegreen algal lineage the progenitors of land plants likelyarose (Wickett et al., 2014). The 1KP transcriptome dataset, therefore, offers an unprecedented opportunity toinvestigate HRGP superfamily evolution and determineif changes coincide with major evolutionary transitionsin the Archaeplastida.A tandem repeat annotation library (TRAL) is also

used to identify repeats in MAAB sequences and sup-port findings of selective loss and/or gain of CL-EXTsin bryophytes and commelinid monocots. A combina-tion of BLAST and profile hidden Markov models wasused to search for putative orthologs of pollen-specificAtAGP6/11 in angiosperms (flowering plants) withhits validated by phylogenetic analysis. These vignettesdemonstrate the broad utility of the bioinformaticspipeline on the 1KP data, providing evolutionary in-sights into the complexity of the HRGP superfamily.

RESULTS AND DISCUSSION

MAAB Pipeline for GPI-AGPs and CL-EXTs

Using the newly developed bioinformatics tool, theMAAB pipeline (Johnson et al., 2017), we evaluated the1KP data set for HRGPs. We set parameters in MAABto best detect the classical, non-chimeric GPI-AGPs,CL-EXTs, PRPs, and non-GPI-AGPs (classes 1–4, re-spectively) as well as the hybrid HRGPs (classes 5–23)using features of the well-characterized Arabidopsissequences. MAAB was validated using 15 proteomesfrom Phytozome and shown to reliably detect and ac-curately classify HRGPs into 23 descriptive subclasses(Johnson et al., 2017). MAAB represents a significantadvance of previous methods to detect HRGPs in beingable to identify all classes in one pipeline, withoutmanual curation.MAAB did not identify any false positives in Phyto-

zome data sets, and evaluation of a subset of the 1KPdata sets also suggests a very low rate of false positives(Johnson et al., 2017). In Ranunculales, a well-sampledbasal eudicot order (39 samples derived from 20 spe-cies), we analyzed 143 non-redundant GPI-AGP se-quences (five Ranunculaceae and six Papaveraceae data

sets; Supplemental Fig. S1A) and observed 98.6% truepositives, with only two sequences not being clearGPI-AGPs (class 1) and four other sequences flaggedas likely contaminants based on their appearance inmultiple unrelated data sets (see legend, SupplementalFig. S1). The rate of false positives for CL-EXTs (class 2)was estimated based on analysis of 45 non-redundantbryophyte sequences (Supplemental Fig. S2) derivedfrom the larger set of 67 identified within the 1KP data(Fig. 1). Shading of motifs reveals that 42 of 45 sequences(93.3%) have the expected CL-EXT motifs throughoutthe entire backbone. The three remaining sequencescould be considered CL-EXTs, or hybrid CL-EXTs-AGPsor AGPs, as they havemore AGPmotifs than SPn motifs,but both motif types are present (Supplemental Fig. S2).In either case, they are definitive HRGPs, suggestingthat the true positive rate is 100% (45/45), althoughwe cannot rule out that the sequences are partialchimeric HRGPs until full-length sequences becomeavailable.

Using the MAAB Pipeline to Identify and Classify HRGPsin Transcriptomic Data Sets Spanning Large EvolutionaryTime Scales

Across the entire 1KP data, 45,304 HRGPs (classes1–23) were identified (Fig. 1) using a multiple k-merapproach (Johnson et al., 2017). The majority of se-quences detected were in HRGP classes 1 to 4 (38,051)and MAAB class 24 (39,933), with a grand total of85,237 sequences across all 24 MAAB classes. Wecompared HRGPs identified for each of the 1KP tax-onomic groups (hereafter 1KP groups, which includesclades and grades; see http://www.onekp.com/samples/list.php) by calculating the percentage ofsequences in the classical (classes 1–4) and minor(classes 5–23) HRGP classes and the final MAABclass (class 24) out of the total sequences classifiedby MAAB (Fig. 2A). The majority of 1KP groupshad a similar proportion of sequences in classes 1 to4 (average ;50%), 5 to 23 (average ;10%), and24 (average ;40%). However, the algal groups, in-cluding those well represented in the 1KP data set(green algae, comprising chlorophytes and algalstreptophytes, and red algae and Chromista), exhibi-ted a trend toward lower percentages of HRGP se-quences (classes 1–4 and 5–23) and a higher percentageof sequences inMAAB class 24 (Fig. 2A). The relativelylow number of GPI-AGPs (class 1) and the highernumber of class 4 sequences most likely reflect un-derlying functional differences in algal AGPs. Thehigh number of class 24 sequences, likely either non-HRGPs or unknown HRGPs, suggests that algae havea more diverse array of IDPs than land plants. Thepossibility that HRGP transcripts have not beendetected within k-mer assemblies also is possible andcould be due to sequence quality and/or samplingissues (lower than optimal number of data sets andtissues; see below).

Plant Physiol. Vol. 174, 2017 907

Evolution of the Plant HRGP Superfamily

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 5: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

As a measure of the depth of taxon sampling, thenumber of taxonomic orders represented and thenumber of samples within each 1KP group are reportedin Figure 2B. The green alga group is the best repre-sented of the algae, with 34 orders and 152 samplesincluded in our analyses, and hence provides the mostcomprehensive data set for comparison. The averagenumber of sequences for all MAAB classes (1–24) issummarized by the 1KP group (Fig. 2C). As each grouprepresents a variable number of species, and not allspecies in a group have the same HRGPs, we have alsosummarized the detection rate as a heatmap by the 1KPgroup (Fig. 2C) and by taxonomic order (SupplementalTable S2). This allowed us to assess whether detectionof a particular HRGP class is either real or spurious. If50% or more species in a given taxonomic order haveone ormore sequence for a givenHRGP class, we reporta greater than 50% detection rate and assume that thisHRGP class is widespread throughout the taxonomicorder. The few class 2 (CL-EXTs) sequences identified ingreen algae and Chromista are likely sporadic con-taminants based on the low mean number of CL-EXTs(0.1 for green algae and 0.2 for Chromista; Fig. 2C), lowdetection rates (Fig. 2C; Supplemental Table S2), andBLASTp analysis revealing high sequence similarity tovascular plant CL-EXTs (data not shown).

Here, we present a preliminary analysis of the minorHRGP classes (5–23) and will then focus on the major

HRGP classes (1–4). Classes 5 to 23 include hybrid-typeHRGPs with motifs associated with two or moreclasses of HRGP (see Fig. 4 in Johnson et al., 2017). Forexample, class 5 HRGPs contain AGP bias withCL-EXT motifs (both SP3-5 and Y-based) and are mosthighly represented in gymnosperms (conifers, Gnetales,Ginkgo, and Cycadales; Fig. 2C). In general, the detec-tion of classes 5 to 23 is low and sporadic for mostclades, with the exception of conifers and core eudicots.In these clades, a high detection rate (greater than 50%)of sequences occurs for most HRGP classes (Fig. 2C).This suggests that more hybrid/unusual HRGPs occurin these clades, although the low mean numbers indi-cate that these sequences are found in a few, rather thanall, data sets within core eudicot and conifer orders.Data sets from the 1KP green algae group have a rela-tively high number of sequences in class 6 (1.46 4; AGPbias, EXT glycomotifs [SPn], but not Y-based cross-linking motifs; Fig. 2C), which is consistent with ourcurrent knowledge of HRGPs isolated from Chlamydo-monas and Volvox (Adair et al., 1983; Woessner andGoodenough, 1989; Woessner et al., 1994; Ferris et al.,2005). The majority of CL-EXTs identified in Arabi-dopsis are not predicted to be GPI-anchored (see Table Iand Supplemental Table S2 in Johnson et al., 2017), andconsistent with this, only a few GPI-anchored EXTs(class 9) were observed in Phytozome and 1KP data.Only one class, GPI-anchored PRPs (class 14), had no

Figure 1. Number of HRGP sequences identified byMAAB in each HRGP class by 1KP group. The number of data sets per cladeis shown in parentheses after the 1KP group name. All MAAB classes are reported for multiple k-mers (multk: 39, 49, 59, 69;unshaded columns). No sequences were detected for HRGP class 14. Selected data are reported for assemblies with k-mer =25 (k25; shaded columns). Red numbers indicate that the sequences detected within this class and 1KP group are likely to becontaminants (see “Results and Discussion”).

908 Plant Physiol. Vol. 174, 2017

Johnson et al.

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 6: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

representatives from any 1KP group, which is consis-tent with our current, albeit limited, knowledge of PRPs(Averyhart-Fullard et al., 1988; Datta et al., 1989; Fowleret al., 1999).The mean numbers of GPI-AGPs and CL-EXTs in the

1KP data were lower than expected, particularly in thecore eudicots/rosids (approximately four and sevenrespectively; Fig. 2C), compared with the numbers inthe majority of eudicot and commelinid genomic datasets (see Table I in Johnson et al., 2017). We suspected

that this might be due to the restricted number of tissuetypes used for sequencing, so we investigated a subsetof 1KP data sets in the Ranunculales (39 samples de-rived from 20 species). For each data set, we manuallyremoved likely redundant sequences (see “Materialsand Methods”), reducing the mean number of GPI-AGP sequences from 7.3 to 4.8 (Supplemental Fig.S1A) and CL-EXTs from 5.1 to 2.9 (Supplemental Fig.S1B). Most of the redundant sequences contained var-iably sized and positioned deletions (data not shown).

Figure 2. Summary of HRGP sequences identified using the MAAB pipeline. A, Percentage of total HRGP sequences (by 1KPgroup) that are found in the classical HRGP classes (1–4), hybrid HRGP classes (5–23), and non-HRGPs (class 24). B, Number oforders analyzed and number of samples per order for each 1KP group. Themonocot group excludes the commelinidmonocots. C,Overview of the mean number of HRGPs for eachMAAB class in the 1KP data set (by 1KP group). Shading of boxes represents thedetection rate, indicated as the percentage of orders with hits (Supplemental Table S2). A shaded box showing a detection ofHRGPs with a value of zero indicates an average number of sequences between 0 and 0.1. Red numbers indicate that the se-quences detected within this HRGP class and 1KP group are likely to be contaminants (see “Results and Discussion”).

Plant Physiol. Vol. 174, 2017 909

Evolution of the Plant HRGP Superfamily

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 7: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

The majority of the 1KP angiosperm data sets are fromleaf tissue, yet this tissue has the lowest average num-ber of GPI-AGPs (four), with 4.4 in roots, five in flow-ers, and six in developing fruits (Supplemental Fig.S1A). Leaf tissue data sets also had a low number ofCL-EXTs (2.6) compared with roots (3.8) and fruits (3.3;Supplemental Fig. S1B). When all unique sequences aretallied across the four Papaver spp., 12 GPI-AGPs(Supplemental Fig. S1C) and six CL-EXTs were identi-fied (Supplemental Fig. S1D). Thus, the total numbersof GPI-AGPs and CL-EXTs observed in most 1KP datasets are underestimates due to limited tissue sampling,despite the likely sequence redundancy that was notroutinely removed by MAAB pipeline processing.

Significantly, the greater species sampling within1KP mostly corroborated our Phytozome observationsand allow us to draw somemajor conclusions: (1) AGPsare widespread, both GPI-AGPs and non-GPI-AGPsbeing convincingly found in the green algae group aswell as Chromista and Glaucophyta; (2) CL-EXTsoccur in most groups of extant land plants, includingbryophytes (nonvascular plants); and (3) non-chimericPRPs (class 3) are uncommon and largely confined toeudicots (Fig. 2C).

GPI-AGPs Evolved Early in the Green Plant Lineage

Our data show that GPI-AGPs are present in all algalgroups sampled in the 1KP project with the exception ofthe Rhodophyta. The absence of GPI-AGPs in red algaeis consistent with several other independent studies.A detailed analysis of the completed genome of twounicellular red algae,Cyanidioschyzonmerolae (Hashimotoet al., 2009; Ulvskov et al., 2013) and Galdieria sulphuraria(Ulvskov et al., 2013), revealed that they lack the keyenzymes for the synthesis of GPI-anchors; therefore,GPI-AGPs would not be expected to be present inthese algae. To our knowledge, there are no reports inthe literature of red algal AGPs based on either AGPglycan epitopes or b-Glc Yariv staining (SupplementalTable S1). AGPs are not well known in the Chromista,which include brown algae, although there are reportsof low levels of Hyp (less than 0.04%) in the solublefraction of cell walls from a variety of brown algalspecies (Gotelli and Cleland, 1968). Also, in a recentstudy, O-linked glycosylation typical of AGPs wasdetected in five species of brown algae, and analysis ofthe Ectocarpus proteome shows that proteins contain-ing AGP-like glycomotifs are present (Hervé et al.,2016). A similar distribution of the amino acid motifsAP, SP, and TP is seen in all 1KP groups from Chro-mista to eudicots (see Supplemental Fig. S5 in Johnsonet al., 2017), suggesting that AGP-type glycomotifs areevolutionarily ancient.

A consistent trend in genomic and transcriptomicdata sets was the decrease in the number of non-GPI-AGPs from the green algae group to vascular plantsand the opposite trend (an increase) for GPI-AGPs, in-cluding a 3- to 4-fold increase in GPI-AGPs, in liverworts

and mosses (approximately four sequences per spe-cies), compared with the green algae group and horn-worts (Figs. 1 and 2). The green algae (chlorophytesand algal streptophytes) are well represented in the1KP data, including several species from Volvocaceae(two species, three samples) and Chlamydomonaceae(five species, five samples). The average number ofGPI-AGPs and non-GPI-AGPs in the 1KP sampleof Chlamydomonas spp. is 0.8 and 43.6, respectively,whereas in Volvox spp., it is one and 78.3, respectively(Data File 6). It is possible that some of the non-GPI-AGPs could be partial chimeric HRGPs rather thantrue non-GPI-AGPs; further work is needed to resolvethis issue.

The functional specification of selected AGPs mayhave been established early in green plant evolution.For example, AGPs have been shown to be involved inapical growth in the moss P. patens (Lee et al., 2005) andsome vascular plants (Nguema-Ona et al., 2012). b-GlcYariv staining in eight liverwort species revealedubiquitous tissue staining, with darker staining in api-ces of some species, suggesting that apical targeting ofAGPs may be a feature that arose in bryophytes(hornworts, mosses, and liverworts; Basile and Basile,1987). AGPs also have been implicated in cell plateformation in liverworts (Shibaya and Sugawara, 2009).Since apical targeting as well as AGP delivery tophragmoplasts (a scaffold for cell plate assembly) existin streptophyte algae (Charales, Coleochaetales, andZygnematales), it is possible that these two functions ofAGPs evolved earlier than the emergence of bryophytes(Leliaert et al., 2012; Bowman, 2013).

It should be noted, however, that b-Glc Yariv stain-ing and glycan epitope labeling can provide onlygeneral information about the presence/absence ofAGP-specific glycans, as these tools are not able todistinguish the different subclasses of AGPs. The in-formation revealed by MAAB in the 1KP data detail-ing the suite of AGPs present in bryophytes in additionto the increasing use of the bryophyte models P. patens(Kofuji and Hasebe, 2014) and Marchantia polymorpha(Ishizaki et al., 2016) will facilitate molecular approachesto investigate the functions of specific AGPs through, forexample, mutant studies.

Origin and Selective Loss of CL-EXTs

Previous studies of HRGPs show that EXT-like Hyp-arabinosides are evolutionarily ancient and are foundin chlorophyte algae, such asChlorella vulgaris (Lamportand Miller, 1971) and C. reinhardtii (Ferris et al., 2001;Bollig et al., 2007). Our data suggest that CL-EXTs arosein bryophytes, as no CL-EXTs were detected in the twovolvocine (chlorophyte green algal) genomes (see TableI in Johnson et al., 2017) and only contaminating se-quences were detected in algal 1KP data sets (Fig. 2).EXT epitopes have been identified in some chlorophytealgal species (Domozych et al., 2009; Sørensen et al.,2011; Supplemental Table S1), but this likely reflects the

910 Plant Physiol. Vol. 174, 2017

Johnson et al.

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 8: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

presence of chimeric and/or hybrid EXTs rather thanCL-EXTs. Similar to the Phytozome analysis, hybridHRGPs with both AGP and EXT glycomotifs werefound in the 1KP green algae group (e.g. RYJX_Locus_857[Pandorina morum] and VALZ_Locus_7817 [C. noctigama],both class 19). Where Y residues are present, these arefound in a separate domain from the P-rich domain (seeFig. 1 in Johnson et al., 2017). This result suggests thatCL-EXTs arose though the juxtaposition of preexistingSPn EXT glycomotifs with refined Y-based motifs (e.g.YXY and VXY) rather than via the simultaneous evo-lution of both protein motifs.To gain a better understanding of the occurrence of

CL-EXTs in bryophytes, the phylogenetic distributionof CL-EXTs present in the 1KP data was assessed inmore detail (Fig. 3). CL-EXTs were detected by theMAAB pipeline in all five hornwort species represent-ing two taxonomic orders (Supplemental Fig. S2),whereas they were identified in only five of 16 orders ofmosses (Hypnales, Pottiales, Diphysciales, Buxbaumiales,and Sphagnales) and two of eight orders of liverworts(Marchantiales and Sphaerocarpales) that were sam-pled by 1KP (Supplemental Table S2). This relativelylow detection rate in mosses and liverworts is unlikelyto be due to low species representation, as it is com-parable to the sampling depth in hornworts (Fig. 3). Thedata, therefore, suggest that while CL-EXTs are wide-spread in hornworts, they have a more limited distri-bution within the moss and liverwort lineages. Ifhornworts are the sister group of other land plants(Wickett et al., 2014), then this may imply that CL-EXTswere lost from some lineages of liverworts and mosses.An alternative explanation is that CL-EXT genes arepresent in mosses and liverworts but are expressed atvery low levels and, therefore, were not detected.To exclude the possibility that MAAB missed partial

CL-EXT sequences due to a strict criterion for an en-doplasmic reticulum (ER) signal sequence (Johnsonet al., 2017), a TRAL was generated to identify CL-EXTrepeat motifs (Schaper et al., 2015). The TRAL identifiedall repeats in class 2 sequences that were longer than10 amino acids, present in four or more copies, andsignificantly different from random sequence evolution(P , 0.05; see “Materials and Methods”). A library forMAAB class 24 sequences also was generated as acontrol. Each repeat was used to construct a circularsequence profile hidden Markov model, and thesemodels were used to search the MAAB input data (seeData File 2 in Johnson et al., 2017) to identify all se-quences that contain these repeats. A simple tool forbrowsing the results of TRAL is available at http://services.plantcell.unimelb.edu.au/hrgp/index.html, andthis tool was used to search all the bryophyte species forTRAL hits.There is a good correlation between the number of

species with CL-EXTs from MAAB and those withSPn-/Y-based repeats identified by TRAL (Fig. 3; DataFile 6). Most importantly, there was no increase in thenumber of bryophyte species containing TRAL repeats(in the MAAB input data; see Data File 2 in Johnson

et al., 2017) compared with MAAB class 2 sequences,suggesting that only a small number of partial CL-EXTsare excluded by the MAAB pipeline.

TRAL repeats from class 24 also were searched, asthese could identify both novel HRGP motifs and otherrepeats of interest that may have been excluded by theMAAB-imposed filters. The majority of bryophytespecies had hits to TRAL repeats generated from class24 sequences; however, none contained both SPn and Yrepeats and only a few have repeats containing Y,suggesting that class 24 contains few, if any, CL-EXTs.Combined, TRAL and MAAB outputs indicate thatthere are a number of moss and liverwort species thatlack CL-EXTs. This leads us to suggest that there hasbeen either selective loss (e.g. Jungermanniales andmost species from Hypnales) or multiple independentgains (e.g. Sphagnales and Marchantiales) of CL-EXTsin mosses and liverworts. Confirming this result usinggenome sequencing of selected species and investiga-tion of the morphology and development of mossesand liverworts with and without CL-EXTs would bean interesting avenue for future research. These non-vascular plants also may help uncover the role ofCL-EXTs with different sequence motifs and repeatlengths.

CL-EXTs Have Been Lost in Many Grass Species

An evaluation of Phytozome data shows thatCL-EXTs containing both SPn $ 3 and Y-basedmotifs arefound in almost all eudicot species but are surpris-ingly absent from Eucalyptus grandis (eudicot/rosid),Aquilegia coerulea (basal eudicot), and the four species ofgrasses (monocot/commelinid; Poaceae). No CL-EXTswere found in either P. patens or the two volvocine algalspecies (see Table I in Johnson et al., 2017). The lownumber of HRGPs in some of these species is suspectedto be due to annotation and assembly issues, based onour detection of CL-EXT-like open reading frames inthese genomes using 1KP-identified CL-EXTs as querysequences (see “Materials and Methods”). In contrast,this approach was not successful for genomes from thecommelinid monocots (APG IV, 2016), designated asmonocots/commelinids in the 1KP group nomencla-ture. This absence is convincing because there are17 different grass species (21 data sets) in the 1KP data,and one species, Eleusine coracana, is represented byfour different data sets (leaves, flowers, and unstressedand stressed roots; see Data File 3 in Johnson et al.,2017). As an additional check, we employed BUSCO(Benchmarking Universal Single-Copy Orthologs), ananalysis tool that quantitatively assesses transcriptomecompleteness based on the presence of near-universalsingle-copy orthologous genes selected from OrthoDB(Simão et al., 2015). Between 65% and 92% (excludingthree outliers) of 956 single-copy genes were identifiedin the monocots/commelinids data sets (Fig. 4B). Thevalue for the Poaceae was at the high end of this range(median, 88.4%;mean, 83.8%), suggesting that the grass

Plant Physiol. Vol. 174, 2017 911

Evolution of the Plant HRGP Superfamily

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 9: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

transcriptome assemblies are high quality and thatCL-EXTs should have been detected if they were pre-sent. Our findings also are supported by a recent bio-informatics study of the maize and rice (Oryza sativa)genomes, where no CL-EXTs were identified (Liu et al.,2016).

Primary cell walls of commelinid monocots have acomposition distinct from eudicots such as Arabi-dopsis and gymnosperms (Bacic et al., 1988; Carpitaand Gibeaut, 1993; Doblin et al., 2010). Therefore, weinvestigated whether an absence of CL-EXTs also wasobserved in other commelinid families representedby 1KP. As for Poaceae, CL-EXTs were not detected inthe sedge family Cyperaceae (total of three species)and three other families within Poales, although theywere identified in Bromeliaceae and Restionaceae(one species each; Fig. 4A). CL-EXTs also were notdetected in two families within the Zingiberales(Zingiberaceae and Marantaceae). The commelinidmonocots are a major clade within the larger monocotgroup (APG IV, 2016). As CL-EXTs are absent from atleast four grass genomes, including the Sanger-sequenced rice genome (see Table I in Johnson et al.,2017), and occur widely in the 1KP monocot group(which excludes the commelinid monocots; Fig. 2;Supplemental Table S2; Data File 6), we suggest thatthere has been selective loss of CL-EXTs within thecommelinid monocots. Further studies are requiredto confirm their absence from specific commelinidspecies.

It is likely that the cross-linking function of CL-EXTswould need to be replaced by something else in com-melinid monocots to provide the necessary scaffold forcell wall development. Obvious candidates are mono-meric phenolic acids such as the hydroxycinnamic acids(ferulic and p-coumaric) of commelinid monocot walls,absent in the noncommelinid monocot, eudicot (withthe exception of the Caryophyllales), and gymnospermwalls, which form covalent cross-links between arabi-noxylan chains (for review, see Bacic et al., 1988; Harris,2005). Additional experimentation is needed to test thishypothesis.

Orthologs of GPI-AGPs and CL-EXTs in Brassicales

The broad sampling of species in the 1KP data pro-vides an opportunity to investigate how far ortholo-gous relationships of selected sequences can be tracedin plant lineages (Wickett et al., 2014). Given that IDPs

Figure 3. Distribution and number of CL-EXTs (class 2) in the bryo-phytes.MAAB detected CL-EXTs in all hornwort species in the 1KP data,whereas the distribution in mosses and liverworts shows either loss of,

or multiple independent gains of, CL-EXTs. TRAL of repeats identified inclass 2 and class 24 (control) sequences were used to search theMAAB input sequences and identify repeats present in the bryophytelineages. There is good correlation between the species with CL-EXTsidentified by MAAB and those with SPn/Y-based repeats identified byTRAL using class 2 repeats. No CL-EXTs were identified using TRALrepeats from class 24 sequences. The phylogenetic tree is based onCole and Hilger (2013) and was selected because it includes allbryophyte orders.

912 Plant Physiol. Vol. 174, 2017

Johnson et al.

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 10: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

are usually poorly conserved throughout evolution. wewanted to test if putative HRGP orthologs could bedetected. As Arabidopsis HRGP sequences are themostwell characterized,we undertook phylogenetic analysis

of Arabidopsis and GPI-AGPs and CL-EXTs identifiedby MAAB in the Brassicales (see “Materials andMethods”; Fig. 5). The ML tree of the GPI-AGPs showsthat sequences from Brassicaceae data sets group mostclosely with the Arabidopsis sequences, generally withgood (greater than 70%) bootstrap values. Sequencesfrom other Brassicales families cluster further awayaccording to their evolutionary relatedness and withlower bootstrap support (Fig. 5A). Although showinglow bootstrap support, AtAGP59, the additional GPI-AGP identified by MAAB analysis, clusters with AGP6and AGP11 (Fig. 5A).

The phylogenetic analysis of Brassicales CL-EXTsrevealed a similar hierarchical pattern to the GPI-AGPs, with Brassicaceae sequences generally clus-tering nearer to the Arabidopsis sequences, but withvariable bootstrap support (Fig. 5B). Some CL-EXT1KP sequences did not group with an Arabidopsissequence, making the prediction of orthologous re-lationships difficult, even within the Brassicaceae.This is perhaps not surprising, since tandem repeatproteins often evolve more rapidly than other pro-teins (Moesa et al., 2012; van der Lee et al., 2014). Theclustering pattern of GPI-AGPs in the Brassicales tree(Fig. 5A) largely reflects that observed for the Ara-bidopsis sequences (see Fig. 2C in Johnson et al.,2017). This suggests that, despite the low bootstrapvalues, it may be possible to identify orthologoussequences using phylogenetic analysis over largerevolutionary distances, such as within angiosperms(flowering plants).

Detection of Orthologs in Angiosperms: AtAGP6/11 asan Example

One pair of Arabidopsis GPI-AGPs, AtAGP6 andAtAGP11 (see subclade AGP-h in Fig. 2C in Johnsonet al., 2017), has been well studied, as they are in-volved in pollen development and pollen tubegrowth and, therefore, impact reproductive capacity(Levitin et al., 2008; Coimbra et al., 2009, 2010).Therefore, orthologs of these genes could be expectedto be found in a broader subset of eudicots and po-tentially early angiosperms, depending on geneconservation. Although a recent attempt to find pu-tative orthologs of AtAGP6/11 inQuercus suber (corkoak; core eudicots/rosids, Fagales), pollen EST datawere unsuccessful (Costa et al., 2015), we predictedthat it should be possible to identify orthologs in the1KP data, particularly within data sets that includefloral tissue. A combination of BLAST (Phytozomeand National Center for Biotechnology Information[NCBI]) and profile hidden Markov (HMMER)models (see “Materials and Methods”) was used tosearch for putative AtAGP6/11 orthologs in the 1KPmultiple k-mer data (MAAB input and output; seeData Files 2–4 in Johnson et al., 2017). In some cases,additional sequences were identified by BLAST fromother sources (e.g. the Amborella trichopoda genome)

Figure 4. A, Mean number of MAAB class 2 CL-EXT sequences iden-tified using the MAAB pipeline within each family of monocots/commelinids. B, Many families lack CL-EXTs, but this is not due to poor-quality transcriptomes, as BUSCO analysis shows that the percentage of956 single-copy orthologous genes found in themonocots/commelinidsfamilies is high (average ;80%) Family data (k-mer 39 only) are sum-marized in a box-and-whisker plot after the removal of outliers (twoPoaceae spp., 59.4 and 65.7; and one Aracaeae sp., 18.1). C, The meannumber of class 1 GPI-AGPs was evaluated, and these are present in allfamilies that lack CL-EXTs. Only Cannaceae and Restionaceae haveboth GPI-AGPs and CL-EXTs, although for most families, the number ofdata sets is low (one to three). The total number of data sets per family isindicated in parentheses.

Plant Physiol. Vol. 174, 2017 913

Evolution of the Plant HRGP Superfamily

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 11: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

Figure 5. Maximum likelihood (ML) tree of Brassicales 1KP and Arabidopsis GPI-AGPs (A) and CL-EXTs (B). ML treesgenerally show strong support for subclades with Arabidopsis sequences and 1KP sequences from family Brassicaceae.Putative GPI-AGP subclades (A) are separated by a horizontal dotted line, and the Arabidopsis orthologs are indicated byboxes (right of tree). Sequences from the Brassicaceae are in green, with Arabidopsis sequences in larger, boldface font.

914 Plant Physiol. Vol. 174, 2017

Johnson et al.

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 12: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

for use as full-length evolutionary markers (see“Materials and Methods”). The HMMER modelswere optimized by varying the input sequences andthe length of the aligned region (see “Materials andMethods”). The best results were obtained using full-length AtAGP6/11-type proteins (including theN-terminal ER and C-terminal GPI anchor signal se-quences) from the Brassicaceae family (model 1;Supplemental Table S3). This was measured by ahigher true positive and a lower false positive rate, asassessed bymanual curation (Supplemental Table S4;see “Materials and Methods”).The resulting sequences were subjected to phylo-

genetic analysis, and the ML tree displayed in Figure6 shows the relatedness of a large subset of AtAGP6/11-like sequences derived from 1KP data and se-quenced genomes. Almost all the eudicot sequencesgroup together with AtAGP6/11. The monocot andcommelinid monocot sequences form a separatesubclade with AtAGP58 and AtAGP59 (Fig. 6),albeit with low bootstrap support. Notably, noAtAGP6/11-like sequences were detected within1KP samples from Fagales, the order that includesQ. suber, including those containing floral tissue(Supplemental Table S4), despite two similar se-quences being identified in the Quercus robur genome(Qurob_scaffold_362 and Qurob_scaffold_554). Onlytwo sequences identified by HMMER model 1 wereapparent false positives (OHAE_12357 andOHAE_4810;Polygala lutea, Fabales), positioned in a subclade withArabidopsis AtAGP1/2/4/5 and OsAGP1 from rice(Fig. 6). The relatively clear separation of putativeAtAGP6/11 orthologs from the other GPI-AGPssuggests that phylogenetic analysis may provide areasonable prediction of gene orthology for theseIDPs. In this case, potential AtAGP6/11 orthologscould be confidently detected in both eudicot andmonocot lineages, suggesting the existence of anancestral gene in a common ancestor prior to theirdivergence ;140 to 150 million years ago (Chawet al., 2004). However, no potential AtAGP6/11ortholog was identified in A. trichopoda, which is thesister group of all other angiosperms. Unfortu-nately, all eight 1KP data sets from basal angio-sperms (members of Amborellaceae, Nymphaeales,and Austrobaileyales) were derived from either leafor shoot tissue, not floral tissue, and A. trichopoda isthe only basal angiosperm genome that is publiclyavailable. Our demonstrated success at finding moreHRGPs using multiple k-mer assemblies (Johnson

et al., 2017) suggests that using this approach to an-alyze existing and future basal-most angiospermfloral transcriptome data sets would be advantageousbefore concluding an absence of AtAGP6/11 ortho-logs among this group.

To explore the robustness of the AtAGP6/11 sub-clade further, all GPI-AGPs from the 1KP data setsreported in Figure 6 were used to produce a largerangiosperm GPI-AGP tree (Supplemental Fig. S3).As expected, the bootstrap values were low; how-ever, most of the additional 290 GPI-AGP sequencesgrouped into the subclades observed in Figure 2C ofour companion article (Johnson et al., 2017). Thesereflect putative Arabidopsis orthologs for AtAGP17/18, AtAGP9, AtAGP4/7/10, AtAGP2/3, AtAGP58,AtAGP25/26/27, AtAGP1, and AtAGP6/11/59. Im-portantly, despite the low bootstrap support, the ma-jority of 1KP sequences in the AtAGP6/11 subclade inFigure 6 also group with AtAGP6/11 in the angio-sperm GPI-AGP tree (Supplemental Fig. S3), suggest-ing that these sequences are indeed the most similarand, therefore, likely to be AtAGP6/11 orthologswithin the analyzed transcriptomes.

Investigation of the GPI-AGP sequence alignmentand further analysis of each individual GPI-AGPsubclade revealed differences in the presence anddistribution of specific amino acids (e.g. K, M, Q, andD/E) among subclademembers (Fig. 7; SupplementalFig. S4). For example, putative orthologs of AtAGP6/11 from both eudicots and some monocots sharescattered K residues, concentrated in the first half ofthe protein followed by a region containing severalacidic residues (Fig. 7; Supplemental Fig. S4K).

The scattered K residues are noticeably absent fromthe putative orthologs fromQ. robur (Supplemental Fig.S4, A and K). Therefore, further analysis is required toconfirm these sequences and other putative AtAGP6/11genes in order to determine the true AtAGP6/11orthologs. With additional taxonomic and appropriatetissue sampling, it should be possible to fill the evolu-tionary gaps and trace the origin ofAtAGP6/11 genes aswell as other HRGPs. Complementation of angio-sperm mutant lines, such as the agp6 agp11 doublemutant of Arabidopsis that displays pollen pheno-types (Levitin et al., 2008; Coimbra et al., 2009, 2010),with progenitor AtAGP6/11 genes will be essential totest their functional orthology and may reveal infor-mation about these motifs and, possibly, the specificglycosylation required for HRGP function in particu-lar tissue types.

Figure 5. (Continued.)AtEXT3 was included with CL-EXTs (B) even though it was classified as class 20 because of its shared bias (less than 2%difference [D]) between percentage PSKY (74.2%) and percentage PVYK (73.1%; Johnson et al., 2017). Numbers on thenodes represent support with 100 bootstrap replicates (70 or greater, green; 60–69, orange; 40–59, black). 1KP data sets(1KP locus identifier and family name) are shown on the Brassicales tree (inset [Stevens, 2001]; version 12, July 2012, http://www.mobot.org/mobot/research/apweb/). Scale bars for branch length measure the number of substitutions per site. Se-quences used to generate the trees are shown in: GPI-AGPs Brassicales (Supplemental Fig. S5A) and CL-EXTs Brassicales(Supplemental Fig. S5B).

Plant Physiol. Vol. 174, 2017 915

Evolution of the Plant HRGP Superfamily

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 13: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

HRGPs: Remaining Challenges and Future Perspectives

Despite being minor components of the wall, theHRGPs make a significant contribution to wall prop-erties and to cellular identity (Knox et al., 1991). Withaccess to the 1KP data, we have made significant pro-gress toward understanding the evolutionary timeframe of when this large gene family of diverse andcomplex proteins likely evolved. We have provideda platform to guide experimental approaches to con-firm our data and answer questions regarding howHRGPs evolved as well as to investigate the lossesof major classes of HRGPs through evolution (e.g. lossof CL-EXTs in mosses, liverworts, and commelinidmonocots).

Tracking individual HRGPs throughout evolution isa major challenge. Our investigation of AtAGP6/11orthologs shows that it is possible to trace HRGPs withspecific characteristics and determine their evolution-ary origins. However, the CL-EXTs in particular, as aconsequence of their variable tandem repeat motifs anddiverse sequence length, cannot be aligned in a mean-ingful waywith the tools developed for folded proteins.Alternative approaches are needed, such as investi-gating length variation and insertions/deletions ratherthan pairwise alignment. An approach that could beuseful is to perform global multiple sequence align-ments using a graph-based method that takes intoaccount the observation that insertions/deletionscan occur anywhere in a repeat (Szalkowski andAnisimova, 2013).

The biggest challenge will be to understand the in-dividual and coordinated roles of HRGPs within thecontext of either a single cell or tissue type. Our dataprovide a valuable resource to initiate system-wideanalyses to identify spatiotermporal expression changesof HRGPs throughout development. Even in theless complex nonvascular plants, GPI-AGPs andCL-EXTs are multigene families, and understanding

Figure 6. Phylogenetic analysis of class 1 GPI-AGPs to identifyAtAGP6/11 orthologs. The ML tree (MEGA) was constructed using pu-tative AtAGP6/11 orthologs identified predominantly from 1KP tran-scriptomes and HMMER model 1 (Supplemental Table S3), with a few

sequences from other sources to increase the breadth of sampling (see“Materials and Methods”; Supplemental Table S4). The tree also in-cludes at least one sequence from each Arabidopsis GPI-AGP subclade,representative rice GPI-AGPs identified by MAAB (see Fig. 2 in Johnsonet al., 2017), and four GPI-AGPs fromA. trichopoda (Amtri_ERM96654,Amtri_ERM95342, Amtri_ERN01113, and Amtri_ERN06202). Sequencenames are colored by 1KP group: basal angiosperms (gray), non-commelinid monocots (purple), commelinid monocots (pink), basaleudicots (aqua), core eudicots (green), asterids (red), and rosids (or-ange). Numbers on the nodes represent support with 100 bootstrapreplicates (70 or greater, green; 60–69 orange; 40–59, black). 1KPsequences are identified by Group_Order_1KP identifier_sequencelocus. Genomic sequences and other sequences from NCBI follow asimilar format but include a five-letter abbreviation of genus andspecies. Symbols next to sequence names are used to indicate thesource and MAAB class of sequences and other relevant information.An asterisk after the sequence name indicates that additional infor-mation is provided in Supplemental Table S4; for example, the YNXR(Poales) data set is contaminated with Asparagales, although the MLtree suggests that these sequences are indeed Poales. The scale bar forbranch length measures the number of substitutions per site.

916 Plant Physiol. Vol. 174, 2017

Johnson et al.

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 14: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

their individual functions will provide excitingchallenges for researchers for many years to come.Defining the role of individual AGPs and EXTs isconfounded by their interactions with other cell wallpolymers; cross-linking of AGPs (Tan et al., 2013) andEXTs to other polymers such as pectins and hetero-xylans has been experimentally verified (for review,see Velasquez et al., 2012). Understanding howwidespread cross-linking of HRGPs to other cell wallpolymers is will be particularly challenging. Theemerging availability of tractable genetic systems fornon-model land plants will allow researchers to usethe information provided by MAAB to explore thesequestions in a range of species.

Glycosylation is a characteristic feature of HRGPsand defines the interactive molecular surface. Pre-dicting the posttranslational modification of HRGPs,including the sites of P hydroxylation and types ofglycosylation, will require detailed structural anal-ysis of many members of each multigene family,from key transitions throughout the plant lineage.Even within a single species, glycosylation is tissuedependent and directed by the context of aminoacids surrounding the glycomotif (Tan et al., 2003;Shimizu et al., 2005; Kurotani and Sakurai, 2015). Formany HRGPs, glycosylation often completely en-cases and, hence, masks the protein backbone, as isthe case for GPI-AGPs, arabino-3,6-galactan pep-tides, and CL-EXTs (see Fig. 1 in Johnson et al., 2017).Furthermore, it also drives the three-dimensionalconformation of the protein backbone and, there-fore, determines the presentation of epitopes on themolecule surface. Given that plant HRGPs have thesame structural complexity as animal mucins, it isexpected that the core protein sequence will influ-ence the extent and type of glycosylation, which alsocan vary in a tissue-/cell-specific manner (Gerkenet al., 2013). It is known that HRGP glycosylationdiffers between volvocine algae and embryophytes,but information on streptophyte green algae islacking. Introducing the same HRGP backbone intomultiple species and tissue types throughout evolu-tion would be an interesting approach to investigatecontext-dependent glycosylation patterns and/orfunctions. A complementary investigation of theglycoysltransferases that act to glycosylate HRGPsalso would greatly assist our understanding of thiscomplex process. An exciting future of HRGP re-search beckons to determine the biological andfunctional significance of this fascinating superfamilyof secreted glycoproteins.

Figure 7. Schematic representation of the GPI-AGP subclades showingthe presence and distribution of specific amino acids in the PAST-richprotein backbone. The order of GPI-AGPs is based on clades in theangiospermGPI-AGP tree (Supplemental Fig. S3); figure parts A to K arebased on alignments (Supplemental Fig. S4), and the AtAGP5/10 (G)subclade is included for comparison (based on the AGP-c subclade; seeFig. 2C in Johnson et al., 2017). A, Putative orthologs of Lys-richAtAGP17/18 have scattered Lys (K) residues throughout the sequenceand contain a short K-rich domain near the C terminus. The glycomotifsare predominantly XP1-2. B, The AGP9 subclade also has a short Lys-richregion with three or more K residues and scattered K residues; however,the glycomotifs are distinct from AtAGP17/18, being predominantly[S/T]P3. C, D, and G to I, AGP4/7/10 (C), AGP2/3 (D), AGP5/10 (G),AGP5 (H), and AGP1 (I) subclades have a classical PAST-rich backbonewith no obvious bias toward other amino acids. E, Most of the putativeorthologs of AtAGP58 are relatively Met (M) rich, with scattered Mresidues between AGP glycomotifs and a Q residue at the presumed N

terminus. This Q residue at the presumed mature N terminus also isfound in putative orthologs of Lys-rich AtAGP17/18 (A), AtAGP9 (B),AtAGP1/2/3/4/7/5/10 (C, D, and G–I), and AtAGP58 (E) but notAtAGP25/26/27 (F) or AGP6/11/59 (J/K). J/K, Putative orthologs ofAtAGP6/11/59 share scattered K residues, concentrated in the first halfof the protein and also a small cluster of acidic residues, either Asp (D)or Glu (E).

Plant Physiol. Vol. 174, 2017 917

Evolution of the Plant HRGP Superfamily

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 15: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

MATERIALS AND METHODS

Phylogenetic Analysis

Phylogenetic analyses were performed using MEGA 6.06 (Tamura et al.,2013). Sequences (untrimmed) were first aligned using MUSCLE (Edgar, 2004)using the default settings, except for Supplemental Figure S3, where gap = 22(default gap = 22.9). The best model was identified [find best protein model(ML) option] and used for each alignment to generate a maximum likelihoodtree (with 100 bootstrap replications) using all sites in the alignment. Sequencesused for phylogenetic analysis (fasta format), and the best model, are given inSupplemental Figure S5.

Data Sets

A summary of data sets and methods is also available at http://services.plantcell.unimelb.edu.au/hrgp/index.html.

Data analysis was based on 1,282 samples downloaded from the official 1KPmirror (onekp.westgrid.ca; as of March 2014; Johnson et al., 2017). Five ad-ditional data sets were analyzed (July 2015) to improve the coverage ofhornworts (UCRN-Megaceros_tosanus, FAJB-Paraphymatoceros_hallii,RXRQ-Phaeoceros_carolinianus-sporophyte, and WCZB-Phaeoceros_carolinianus-gametophyte) and to include the single sample from clade Ginkgoales(SGTW-Ginkgo biloba). These additional five data sets are included in theMAAB input and output files (see Data Files 2–4 in Johnson et al., 2017), andmost of the analysis with these samples is presented, with the exception ofSupplemental Table S2.

MAAB Summary Analysis

The sequences and associated annotations from MAAB classification, in-cluding the assembly in which they were identified, are provided in Data Files3 and 4 in the companion article (Johnson et al., 2017). Data from individualsamples (hereafter referred to as data sets) that were identified by the 1KPconsortium as containing contaminants were removed from analyses (unlessnoted otherwise) and can be found in Data File 5 (Johnson et al., 2017). Thenumber of sequences in each MAAB class for each individual 1KP sample isprovided in Data File 6. The mean number of HRGPs by MAAB class (for each1KP group [Fig. 2C] or commelinid monocot family [Fig. 4, A and C]) wascalculated by averaging the appropriate data in Data File 6 (which reports thenumber of sequences per HRGP class [columns] for each 1KP data set [rows]).Total numbers of HRGPs by 1KP group (Fig. 1) also were generated from DataFile 6. Graphs (Fig. 4, A and C) were generated using R/Bioconductor(Gentleman et al., 2004). DNA sequences of GPI-AGPs (class 1) and CL-EXTs(class 2) are provided in Data Files 7 and 8 respectively.

Reducing Redundancy in 1KP Data Sets

For phylogenetic analyses, redundant sequences were eliminated fromdata sets. The most probable sequence was identified based on existing knowl-edge, as follows:MAAB 1KP output (seeData Files 3 and 4 in Johnson et al., 2017)was sorted by 1KP locus. Sequences for each data set (1KP locus)were then sortedbyHRGP class, and the sequences from the desiredHRGP class were pasted intoa new Microsoft Excel sheet. Sequences were sorted alphabetically by sequence(to group sequences with similar N termini) and converted to tfasta format.Multiple sequence alignments were performed with MUSCLE (http://www.ebi.ac.uk/Tools/msa/muscle/) to identify redundant sequences.Where twoormore sequences had large regions of identity, the longer sequence was retainedunless there were clear artifacts at either the N or C terminus. Retained se-quences were subjected to further rounds of alignment to ensure that onlyunique sequences remained. When redundancy was unclear, both sequenceswere retained in small data sets but only one for larger data sets.

Removing redundancies from the 1KP sequence datawas necessary because,like other short-read sequence data, these offers a wealth of information, butredundancy can occur from multiple sources, including splice variants, se-quencing errors, recent duplications, chimeric transcripts, and polyploidy(Gruenheit et al., 2012; Yang and Smith, 2013; Xie et al., 2014). Sequence re-dundancies were obvious in our analysis (Supplemental Fig. S1, A and B), andtwo relatively common occurrences were read through of GC-rich hairpins inthe cDNA during reverse transcription and poor-quality sequence at one end ofthe paired read. cDNA synthesis for all 1KP data sets was performed at 42°C(Chen Li, personal communication).

TRAL Construction

TRAL version 0.3.5 (Schaper et al., 2015) was used to identify and evaluatetandem repeats present in 1KP sequences from HRGP class 2 (CL-EXTs), class3 (PRPs), and class 24 (less than 15% known motifs), as these MAAB classes arethe most likely to include tandem repeats. Tandem repeats were identifiedusing T-REKS (Jorda and Kajava, 2009) and HHrepID (Biegert and Söding,2008) algorithms. Tandem repeats were accepted only if they passed all thefollowing criteria: (1) tandem repeat unit length of at least 10 amino acids; (2) atleast four tandem repeat unit copies; (3) P, 0.05 using the phylo_gap01 model;and (4) repeat identified by T-REKS and/or HHrepID. A total of 5,139 repeatswere identified and accepted fromMAAB sequences in classes 2, 3, and 24. Eachrepeat was used to construct a circular sequence profile hidden Markov model(Schaper et al., 2015) and then searched with HMMER version 3.1b1hmmsearch (E value cutoff of 1e-5; http://hmmer.org/) to identify repeats inMAAB input data (Data File 2 in Johnson et al., 2017). The resulting 4,070models had two or more hits above threshold to the input data. These hits wereused to construct alignments (maximum of 300 sequences per alignment),randomly chosen after 95% per sample clustering using USEARCH (Edgar,2010), and aligned with MUSCLE version 3.81 (Edgar, 2004) for each model. Atool for browsing and searching TRAL results is available at http://services.plantcell.unimelb.edu.au/hrgp/index.html and was used to determine thenumber of species with class 2 and class 24 TRAL repeats, using keywordsearches by order and/or species (Fig. 3).

BUSCO Analysis

1KP transcriptomes were assessed for completeness using the BUSCOanalysis tool (Simão et al., 2015). The BUSCO plant data set and software weredownloaded from http://buscos.ezlab.org/.

Profile Hidden Markov Models for Finding PutativeAtAGP6/11 Orthologs

Protein sequences were aligned with MUSCLE version 3.81 (Edgar, 2004)using default parameter settings. Sequence alignments were then used as inputinto HMMER version 3.1b1 hmmbuild (http://hmmer.org/) to construct aprofile hidden Markov model. Subsequently, hmmsearch was used (with de-fault thresholds) to search against MAAB 1KP input sequences (see Data File2 in Johnson et al., 2017). Seven different models were evaluated (SupplementalTable S3), and the results of the best model (see Data File 4 in Johnson et al.,2017) are summarized in Supplemental Table S4. For each positive 1KP data set,a single best sequence was selected manually based on presence of N-terminalER signal or full or partial C-terminal GPI signal sequences and then checked byiterative multiple sequence alignment and phylogenetic analysis. To distin-guish between true class 4 non-GPI-AGPs and potential partial GPI-AGPs,proteins needed to have the following GPI anchor signal characteristics: po-tential v, v+1 and v+2 sites, and a basic residue and/or a hydrophobic domain(Eisenhaber et al., 2003), unless noted otherwise (e.g. ZHMB_16956 andXMVD_14518), as summarized in Supplemental Table S4.

Additional AtAGP6/11-like sequences were obtained from other sources(outlined below) for use as full-length controls and to increase the sequencerepresentation in various 1KP groups. TBLASTn was used to search Phyto-zome and/or the nucleotide collection (NR/NT atNCBI) usingword size 2, nofiltering, and restricting organisms to same species, genus, or family as ap-propriate, with either AtAGP6/11 or OsAGP7/10 as query sequence. If un-successful, a suitable 1KP sequence was selected (coding sequence only; DataFile 7) and BLASTn was performed (word size 7 and no filtering). Whenpositive hits were identified, these were used as query sequences to findadditional sequences (maximum of two per species used). Quercus roburAtAGP6/11 putative orthologs were identified in TBLASTn searches ofQ. robur assembly version 1 scaffolds available at https://urgi.versailles.inra.fr/blast/. For the basal angiosperm Amborella trichopoda (not present inPhytozome version 9), GPI-AGPs were identified among annotated proteinsaccessible at ftp://ftp.ensemblgenomes.org/pub/plants/release-31/fasta/amborella_trichopoda/ using 50% PAST as outlined by Schultz et al. (2002).A similar approach was used to find evidence for genomic sequences ofCL-EXTs from selected genomes using 1KP-identified CL-EXTs as query se-quences (Data File 8). For example, 1KP sequence AYMT_Locus_34756identified partial CL-EXT sequence Eucalyptus grandis scaffold_6:36791844.36792375, and VGHH_Locus_4377 identified Aquilegia coeruleascaffold_34: 1845754.1846286).

918 Plant Physiol. Vol. 174, 2017

Johnson et al.

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 16: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

Data Access and Data Sets

Additional data sets for this article are listed below. Numbering continuesfrom Johnson et al. (2017), and a full list of data files is provided there.

Data File 6

Mean number of HRGPs in eachHRGP class (columns) for each 1KP data set(rows), calculated from MAAB output files (Data Files 3 and 4). Filename:SA003_MAAB-hits-summary-by-class-and-sample.xls.

Data File 7

DNA sequences for 1KP GPI-AGPs (class 1) detected by MAAB. The locusidentifier (Data Files 3 and 4) of class 1 GPI-AGPs was used to extract the DNAsequence from the appropriate multiple k-mer assembly. Where more than onelocus identifier is reported for a single protein (e.g. with different k-mers), onlyone DNA sequence is reported. Filename: 1kp_agp_incl_dnas_9-12-2015.xls.

Data File 8

DNA sequences for 1KP CL-EXTs (class 2) detected by MAAB. The locusidentifier (Data Files 3 and 4) of class 2 CL-EXTs was used to extract the DNAsequence from the appropriate multiple k-mer assembly. Where more than onelocus identifier is reported for a single protein (e.g. with different k-mers), onlyone DNA sequence is reported. Filename: class2_incl_dnas_9-12-2015.xls.

Data File 9

Sequences identified by HMMERmodel 1 (putative AtAGP6/11 orthologs).Filename: agp6_model1_hmm_hits_20150922.xls.

Supplemental Data

The following supplemental materials are available.

Supplemental Figure S1. Number of GPI-AGPs and CL-EXTs by tissue inRanunculaceae and Papaveraceae samples represented in 1KP.

Supplemental Figure S2. Bryophyte CL-EXT sequences.

Supplemental Figure S3. Phylogenetic analysis (ML tree) of GPI-AGPs (allsubclades) from selected angiosperms.

Supplemental Figure S4. Conserved features for putative orthologs ofselected Arabidopsis GPI-AGPs.

Supplemental Figure S5. Sequences used for phylogenetic analysis (infasta format).

Supplemental Table S1. Summary of selected AGP and EXT epitopes de-scribed in the literature.

Supplemental Table S2. HRGP detection rate for each 1KP taxonomic order.

Supplemental Table S3. Summary of HMMER models used to identifyputative orthologs of AtAGP6 and AtAGP11.

Supplemental Table S4. Summary of HMMER model performance by 1KPsamples and analysis of sequences selected for phylogenetic analysis.

ACKNOWLEDGMENTS

We thankMark Chase from the KewRoyal Botanic Gardens,Markus Ruhsamand Philip Thomas at the Royal Botanic Garden Edinburgh, and Steven Joya fromthe University of British Columbia for supplying plant samples to 1KP as well asDrs. Birgit and Frank Eisenhaber (Bioinformatics Institute A*Star Singapore) forproviding us with a stand-alone version of the big-PI plant predictor.

Received March 6, 2017; accepted April 21, 2017; published April 26, 2017.

LITERATURE CITED

Adair WS, Hwang C, Goodenough UW (1983) Identification and visuali-zation of the sexual agglutinin from the mating-type plus flagellarmembrane of Chlamydomonas. Cell 33: 183–193

APG IV (2016) An update of the Angiosperm Phylogeny Group classifi-cation for the orders and families of flowering plants: APG IV. Bot J LinnSoc 181: 1–20

Averyhart-Fullard V, Datta K, Marcus A (1988) A hydroxyproline-richprotein in the soybean cell wall. Proc Natl Acad Sci USA 85: 1082–1085

Bacic A, Harris PJ, Stone BA (1988) Structure and function of plant cellwalls. In J Priess, ed, The Biochemistry of Plants, Vol 14. AcademicPress, New York, pp 297–371

Basile DV, Basile MR (1987) The occurrence of cell wall-associated arabi-nogalactan proteins in the Hepaticae. Bryologist 90: 401–404

Bernhardt C, Tierney ML (2000) Expression of AtPRP3, a proline-richstructural cell wall protein from Arabidopsis, is regulated by cell-type-specific developmental pathways involved in root hair formation. PlantPhysiol 122: 705–714

Biegert A, Söding J (2008) De novo identification of highly diverged pro-tein repeats by probabilistic consistency. Bioinformatics 24: 807–814

Bollig K, Lamshöft M, Schweimer K, Marner FJ, Budzikiewicz H, Waf-fenschmidt S (2007) Structural analysis of linear hydroxyproline-boundO-glycans of Chlamydomonas reinhardtii: conservation of the inner core inChlamydomonas and land plants. Carbohydr Res 342: 2557–2566

Borner GHH, Sherrier DJ, Weimar T, Michaelson LV, Hawkins ND,Macaskill A, Napier JA, Beale MH, Lilley KS, Dupree P (2005) Anal-ysis of detergent-resistant membranes in Arabidopsis: evidence forplasma membrane lipid rafts. Plant Physiol 137: 104–116

Bowman JL (2013) Walkabout on the long branches of plant evolution. CurrOpin Plant Biol 16: 70–77

Cannon MC, Terneus K, Hall Q, Tan L, Wang Y, Wegenhart BL, Chen L,Lamport DTA, Chen Y, Kieliszewski MJ (2008) Self-assembly of theplant cell wall requires an extensin scaffold. Proc Natl Acad Sci USA 105:2226–2231

Carpita NC, Gibeaut DM (1993) Structural models of primary cell walls inflowering plants: consistency of molecular structure with the physicalproperties of the walls during growth. Plant J 3: 1–30

Chaw SM, Chang CC, Chen HL, Li WH (2004) Dating the monocot-dicotdivergence and the origin of core eudicots using whole chloroplast ge-nomes. J Mol Evol 58: 424–441

Chen J, Varner JE (1985) Isolation and characterization of cDNA clones forcarrot extensin and a proline-rich 33-kDa protein. Proc Natl Acad SciUSA 82: 4399–4403

Choudhary P, Saha P, Ray T, Tang Y, Yang D, Cannon MC (2015)EXTENSIN18 is required for full male fertility as well as normal vegetativegrowth in Arabidopsis. Front Plant Sci 6: 553

Coimbra S, Costa M, Jones B, Mendes MA, Pereira LG (2009) Pollen graindevelopment is compromised in Arabidopsis agp6 agp11 null mutants. JExp Bot 60: 3133–3142

Coimbra S, Costa M, Mendes MA, Pereira AM, Pinto J, Pereira LG (2010)Early germination of Arabidopsis pollen in a double null mutant for thearabinogalactan protein genes AGP6 and AGP11. Sex Plant Reprod 23:199–205

Cole TCH, Hilger HH (2013) Bryophyte phylogeny. http://www2.biologie.fu-berlin.de/sysbot/poster/BPP-E.pdf

Costa M, Nobre MS, Becker JD, Masiero S, Amorim MI, Pereira LG,Coimbra S (2013) Expression-based and co-localization detection ofarabinogalactan protein 6 and arabinogalactan protein 11 interactors inArabidopsis pollen and pollen tubes. BMC Plant Biol 13: 7

Costa ML, Sobral R, Ribeiro Costa MM, Amorim MI, Coimbra S (2015)Evaluation of the presence of arabinogalactan proteins and pectinsduring Quercus suber male gametogenesis. Ann Bot (Lond) 115: 81–92

Datta K, Schmidt A, Marcus A (1989) Characterization of two soybeanrepetitive proline-rich proteins and a cognate cDNA from germinatedaxes. Plant Cell 1: 945–952

Doblin MS, Johnson KL, Humphries J, Newbigin EJ, Bacic A (2014) Aredesigner plant cell walls a realistic aspiration or will the plasticity of theplant’s metabolism win out? Curr Opin Biotechnol 26: 108–114

Doblin MS, Pettolino F, Bacic A (2010) Plant cell walls: the skeleton of theplant world. Funct Plant Biol 37: 357–381

Domozych DS, Ciancia M, Fangel JU, Mikkelsen MD, Ulvskov P, WillatsWG (2012) The cell walls of green algae: a journey through evolutionand diversity. Front Plant Sci 3: 82

Domozych DS, Wilson R, Domozych CR (2009) Photosynthetic eukaryotesof freshwater wetland biofilms: adaptations and structural characteris-tics of the extracellular matrix in the green alga, Cosmarium reniforme(Zygnematophyceae, Streptophyta). J Eukaryot Microbiol 56: 314–322

Plant Physiol. Vol. 174, 2017 919

Evolution of the Plant HRGP Superfamily

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 17: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

Draeger C, Ndinyanka Fabrice T, Gineau E, Mouille G, Kuhn BM, MollerI, Abdou MT, Frey B, Pauly M, et al (2015) Arabidopsis leucine-richrepeat extensin (LRX) proteins modify cell wall composition and influ-ence plant growth. BMC Plant Biol 15: 155

Edgar RC (2004) MUSCLE: multiple sequence alignment with high accur-acy and high throughput. Nucleic Acids Res 32: 1792–1797

Edgar RC (2010) Search and clustering orders of magnitude faster thanBLAST. Bioinformatics 26: 2460–2461

Eisenhaber B, WildpanerM, Schultz CJ, Borner GHH, Dupree P, Eisenhaber F(2003) Glycosylphosphatidylinositol lipid anchoring of plant proteins: sensi-tive prediction from sequence- and genome-wide studies for Arabidopsis andrice. Plant Physiol 133: 1691–1701

Ellis M, Egelund J, Schultz CJ, Bacic A (2010) Arabinogalactan-proteins:key regulators at the cell surface? Plant Physiol 153: 403–419

Ferris PJ, Waffenschmidt S, Umen JG, Lin H, Lee JH, Ishida K, Kubo T,Lau J, Goodenough UW (2005) Plus and minus sexual agglutinins fromChlamydomonas reinhardtii. Plant Cell 17: 597–615

Ferris PJ, Woessner JP, Waffenschmidt S, Kilz S, Drees J, GoodenoughUW (2001) Glycosylated polyproline II rods with kinks as a structuralmotif in plant hydroxyproline-rich glycoproteins. Biochemistry 40:2978–2987

Filmus J, Capurro M, Rast J (2008) Glypicans. Genome Biol 9: 224Fincher GB, Stone BA, Clarke AE (1983) Arabinogalactan-proteins:

structure, biosynthesis and function. Annu Rev Plant Physiol 34: 47–70Fong C, Kieliszewski MJ, de Zacks R, Leykam JF, Lamport DT (1992) A

gymnosperm extensin contains the serine-tetrahydroxyproline motif.Plant Physiol 99: 548–552

Fowler TJ, Bernhardt C, Tierney ML (1999) Characterization and expres-sion of four proline-rich cell wall protein genes in Arabidopsis encodingtwo distinct subsets of multiple domain proteins. Plant Physiol 121:1081–1092

Gaspar Y, Johnson KL, McKenna JA, Bacic A, Schultz CJ (2001) Thecomplex structures of arabinogalactan-proteins and the journey towardsunderstanding function. Plant Mol Biol 47: 161–176

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S,Ellis B, Gautier L, Ge Y, Gentry J, et al (2004) Bioconductor: opensoftware development for computational biology and bioinformatics.Genome Biol 5: R80

Gerken TA, Revoredo L, Thome JJ, Tabak LA, Vester-Christensen MB,Clausen H, Gahlay GK, Jarvis DL, Johnson RW, Moniz HA, et al(2013) The lectin domain of the polypeptide GalNAc transferase familyof glycosyltransferases (ppGalNAc Ts) acts as a switch directing glyco-peptide substrate glycosylation in an N- or C-terminal direction, furthercontrolling mucin type O-glycosylation. J Biol Chem 288: 19900–19914

Godl K, Hallmann A, Wenzl S, Sumper M (1997) Differential targeting ofclosely related ECM glycoproteins: the pherophorin family from Volvox.EMBO J 16: 25–34

Gotelli IB, Cleland R (1968) Differences in the occurrence and distributionof hydroxyproline-proteins among the algae. Am J Bot 55: 907–914

Grennan AK (2007) Lipid rafts in plants. Plant Physiol 143: 1083–1085Gruenheit N, Deusch O, Esser C, Becker M, Voelckel C, Lockhart P (2012)

Cutoffs and k-mers: implications from a transcriptome study in allo-polyploid plants. BMC Genomics 13: 92

Harris PJ (2005) Diversity in plant cell walls. In RJ Henry, ed, Plant Di-versity and Evolution: Genotypic and Phenotypic Variation in HigherPlants. CAB International Publishing, Wallingford, UK, pp 201–227

Hashimoto K, Tokimatsu T, Kawano S, Yoshizawa AC, Okuda S, Goto S,Kanehisa M (2009) Comprehensive analysis of glycosyltransferases ineukaryotic genomes for structural and functional characterization ofglycans. Carbohydr Res 344: 881–887

Hervé C, Siméon A, Jam M, Cassin A, Johnson KL, Salmeán AA, WillatsWG, Doblin MS, Bacic A, Kloareg B (2016) Arabinogalactan proteinshave deep roots in eukaryotes: identification of genes and epitopes inbrown algae and their role in Fucus serratus embryo development. NewPhytol 209: 1428–1441

Ishizaki K, Nishihama R, Yamato KT, Kohchi T (2016) Molecular genetictools and techniques for Marchantia polymorpha research. Plant CellPhysiol 57: 262–270

Jermyn MA, Yeow YM (1975) A class of lectins present in the tissues ofseed plants. Aust J Plant Physiol 2: 501–531

Johnson KL, Cassin AM, Lonsdale A, Bacic A, Doblin MS, Schultz CJ(2017) A motif and amino acid bias bioinformatics pipeline to identifyhydroxyproline-rich glycoproteins. Plant Physiol 174: 886–903

Johnson MTJ, Carpenter EJ, Tian Z, Bruskiewich R, Burris JN, CarriganCT, Chase MW, Clarke ND, Covshoff S, Depamphilis CW, et al (2012)Evaluating methods for isolating total RNA and predicting the successof sequencing phylogenetically diverse plant transcriptomes. PLoS ONE7: e50226

Jorda J, Kajava AV (2009) T-REKS: identification of Tandem REpeats insequences with a K-meanS based algorithm. Bioinformatics 25: 2632–2638

Kieliszewski MJ, Lamport DTA (1994) Extensin: repetitive motifs, func-tional sites, post-translational codes, and phylogeny. Plant J 5: 157–172

Kieliszewski MJ, Leykam JF, Lamport DTA (1990) Structure of thethreonine-rich extensin from Zea mays. Plant Physiol 92: 316–326

Kitazawa K, Tryfona T, Yoshimi Y, Hayashi Y, Kawauchi S, Antonov L,Tanaka H, Takahashi T, Kaneko S, Dupree P, et al (2013) b-GalactosylYariv reagent binds to the b-1,3-galactan of arabinogalactan proteins.Plant Physiol 161: 1117–1126

Kleis-San Francisco SM, Tierney ML (1990) Isolation and characterizationof a proline-rich cell wall protein from soybean seedlings. Plant Physiol94: 1897–1902

Knox JP, Linstead PJ, Peart J, Cooper C, Roberts K (1991) Developmen-tally regulated epitopes of cell surface arabinogalactan proteins andtheir relation to root tissue pattern formation. Plant J 1: 317–326

Kofuji R, Hasebe M (2014) Eight types of stem cells in the life cycle of themoss Physcomitrella patens. Curr Opin Plant Biol 17: 13–21

Kurotani A, Sakurai T (2015) In silico analysis of correlations betweenprotein disorder and post-translational modifications in algae. Int J MolSci 16: 19812–19835

Lamport DTA (1977) Structure, biosynthesis and significance of cell wallglycoproteins. Recent Advances in Phytochemistry 11: 79–115

Lamport DTA, Kieliszewski MJ, Chen Y, Cannon MC (2011) Role of theextensin superfamily in primary cell wall architecture. Plant Physiol 156:11–19

Lamport DTA, Miller DH (1971) Hydroxyproline arabinosides in the plantkingdom. Plant Physiol 48: 454–456

Lee KJD, Sakata Y, Mau SL, Pettolino F, Bacic A, Quatrano RS, KnightCD, Knox JP (2005) Arabinogalactan proteins are required for apical cellextension in the moss Physcomitrella patens. Plant Cell 17: 3051–3065

Leliaert F, Smith DR, Moreau H, Herron MD, Verbruggen H, DelwicheCF, De Clerck O (2012) Phylogeny and molecular evolution of the greenalgae. Crit Rev Plant Sci 31: 1–46

Levitin B, Richter D, Markovich I, Zik M (2008) Arabinogalactan proteins6 and 11 are required for stamen and pollen function in Arabidopsis.Plant J 56: 351–363

Li J, Ouyang B, Wang T, Luo Z, Yang C, Li H, Sima W, Zhang J, Ye Z(2016) HyPRP1 gene suppressed by multiple stresses plays a negativerole in abiotic stress tolerance in tomato. Front Plant Sci 7: 967

Li XB, Kieliszewski M, Lamport DTA (1990) A chenopod extensin lacksrepetitive tetrahydroxyproline blocks. Plant Physiol 92: 327–333

Ligrone R, Vaughn KC, Renzaglia KS, Knox JP, Duckett JG (2002) Di-versity in the distribution of polysaccharide and glycoprotein epitopesin the cell walls of bryophytes: new evidence for the multiple evolutionof water-conducting cells. New Phytol 156: 491–508

Lindstrom JT, Vodkin LO (1991) A soybean cell wall protein is affected byseed color genotype. Plant Cell 3: 561–571

Liu X, Wolfe R, Welch LR, Domozych DS, Popper ZA, Showalter AM(2016) Bioinformatic identification and analysis of extensins in the plantkingdom. PLoS ONE 11: e0150177

MacAlister CA, Ortiz-Ramírez C, Becker JD, Feijó JA, Lippman ZB (2016)Hydroxyproline O-arabinosyltransferase mutants oppositely alter tipgrowth in Arabidopsis thaliana and Physcomitrella patens. Plant J 85: 193–208

Majewska-Sawka A, Nothnagel EA (2000) The multiple roles of arabino-galactan proteins in plant development. Plant Physiol 122: 3–10

Matasci N, Hung LH, Yan Z, Carpenter EJ, Wickett NJ, Mirarab S,Nguyen N, Warnow T, Ayyampalayam S, Barker M, et al (2014) Dataaccess for the 1,000 Plants (1KP) project. Gigascience 3: 17

Moesa HA, Wakabayashi S, Nakai K, Patil A (2012) Chemical compositionis maintained in poorly conserved intrinsically disordered regions andsuggests a means for their classification. Mol Biosyst 8: 3262–3273

Moller I, Sørensen I, Bernal AJ, Blaukopf C, Lee K, Øbro J, Pettolino F,Roberts A, Mikkelsen JD, Knox JP, et al (2007) High-throughputmapping of cell-wall polymers within and between plants using novelmicroarrays. Plant J 50: 1118–1128

920 Plant Physiol. Vol. 174, 2017

Johnson et al.

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 18: Insights into the Evolution of Hydroxyproline-Rich · Insights into the Evolution of Hydroxyproline-Rich ... (Godl et al., 1997; Ligrone et al., ... AGPs in similar processes in all

Moore JP, Nguema-Ona E, Fangel JU, Willats WG, Hugo A, Vivier MA(2014) Profiling the main cell wall polysaccharides of grapevine leavesusing high-throughput and fractionation methods. Carbohydr Polym 99:190–198

Motose H, Sugiyama M, Fukuda H (2004) A proteoglycan mediates inductiveinteraction during plant vascular development. Nature 429: 873–878

Nguema-Ona E, Coimbra S, Vicré-Gibouin M, Mollet JC, Driouich A(2012) Arabinogalactan proteins in root and pollen-tube cells: distribu-tion and functional aspects. Ann Bot (Lond) 110: 383–404

Pattathil S, Avci U, Baldwin D, Swennes AG, McGill JA, Popper Z,Bootten T, Albert A, Davis RH, Chennareddy C, et al (2010) A com-prehensive toolkit of plant cell wall glycan-directed monoclonal anti-bodies. Plant Physiol 153: 514–525

Paulsen BS, Craik DJ, Dunstan DE, Stone BA, Bacic A (2014) The Yarivreagent: behaviour in different solvents and interaction with a gum ar-abic arabinogalactan-protein. Carbohydr Polym 106: 460–468

Pereira AM, Lopes AL, Coimbra S (2016) JAGGER, an AGP essential forpersistent synergid degeneration and polytubey block in Arabidopsis.Plant Signal Behav 11: e1209616

Pereira AM, Masiero S, Nobre MS, Costa ML, Solís MT, Testillano PS,Sprunck S, Coimbra S (2014) Differential expression patterns of arabi-nogalactan proteins in Arabidopsis thaliana reproductive tissues. J ExpBot 65: 5459–5471

Pereira AM, Pereira LG, Coimbra S (2015) Arabinogalactan proteins: risingattention from plant biologists. Plant Reprod 28: 1–15

Popper ZA, Michel G, Hervé C, Domozych DS, Willats WG, Tuohy MG,Kloareg B, Stengel DB (2011) Evolution and diversity of plant cellwalls: from algae to flowering plants. Annu Rev Plant Biol 62: 567–590

Schaefer L, Schaefer RM (2010) Proteoglycans: from structural compoundsto signaling molecules. Cell Tissue Res 339: 237–246

Schaper E, Korsunsky A, Pe�cerska J, Messina A, Murri R, Stockinger H,Zoller S, Xenarios I, Anisimova M (2015) TRAL: tandem repeat anno-tation library. Bioinformatics 31: 3051–3053

Schultz CJ, Gilson P, Oxley D, Youl JJ, Bacic A (1998) GPI-anchors onarabinogalactan-proteins: implications for signalling in plants. TrendsPlant Sci 3: 426–431

Schultz CJ, Rumsewicz MP, Johnson KL, Jones BJ, Gaspar YM, Bacic A(2002) Using genomic resources to guide research directions: the arabinoga-lactan protein gene family as a test case. Plant Physiol 129: 1448–1463

Sherrier DJ, Taylor GS, Silverstein KA, Gonzales MB, VandenBosch KA(2005) Accumulation of extracellular proteins bearing unique proline-rich motifs in intercellular spaces of the legume nodule parenchyma.Protoplasma 225: 43–55

Shibaya T, Sugawara Y (2009) Induction of multinucleation by b-glucosylYariv reagent in regenerated cells from Marchantia polymorpha proto-plasts and involvement of arabinogalactan proteins in cell plate for-mation. Planta 230: 581–588

Shimizu M, Igasaki T, Yamada M, Yuasa K, Hasegawa J, Kato T, TsukagoshiH, Nakamura K, Fukuda H, Matsuoka K (2005) Experimental determinationof proline hydroxylation and hydroxyproline arabinogalactosylation motifsin secretory proteins. Plant J 42: 877–889

Showalter AM, Basu D (2016) Extensin and arabinogalactan-protein bio-synthesis: glycosyltransferases, research challenges, and biosensors.Front Plant Sci 7: 814

Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM(2015) BUSCO: assessing genome assembly and annotation complete-ness with single-copy orthologs. Bioinformatics 31: 3210–3212

Smith JJ, Muldoon EP, Willard JJ, Lamport DTA (1986) Tomato extensin pre-cursors P1 and P2 are highly periodic structures. Phytochemistry 25: 1021–1030

Somerville C, Bauer S, Brininstool G, Facette M, Hamann T, Milne J,Osborne E, Paredez A, Persson S, Raab T, et al (2004) Toward a sys-tems approach to understanding plant cell walls. Science 306: 2206–2211

Sørensen I, Domozych D, Willats WGT (2010) How have plant cell wallsevolved? Plant Physiol 153: 366–372

Sørensen I, Pettolino FA, Bacic A, Ralph J, Lu F, O’Neill MA, Fei Z, RoseJKC, Domozych DS, Willats WGT (2011) The charophycean green algaeprovide insights into the early origins of plant cell walls. Plant J 68:201–211

Szalkowski AM, Anisimova M (2013) Graph-based modeling of tandemrepeats improves global multiple sequence alignment. Nucleic Acids Res41: e162

Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6:Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 30:2725–2729

Tan L, Eberhard S, Pattathil S, Warder C, Glushka J, Yuan C, Hao Z, ZhuX, Avci U, Miller JS, et al (2013) An Arabidopsis cell wall proteoglycanconsists of pectin and arabinoxylan covalently linked to an arabinoga-lactan protein. Plant Cell 25: 270–287

Tan L, Leykam JF, Kieliszewski MJ (2003) Glycosylation motifs that directarabinogalactan addition to arabinogalactan-proteins. Plant Physiol 132:1362–1369

Tan L, Showalter AM, Egelund J, Hernandez-Sanchez A, Doblin MS,Bacic A (2012) Arabinogalactan-proteins and the research challenges forthese enigmatic plant cell surface proteoglycans. Front Plant Sci 3: 140

Tapken W, Murphy AS (2015) Membrane nanodomains in plants: cap-turing form, function, and movement. J Exp Bot 66: 1573–1586

Tierney ML, Varner JE (1987) The extensins. Plant Physiol 84: 1–2Ulvskov P, Paiva DS, Domozych D, Harholt J (2013) Classification, nam-

ing and evolutionary history of glycosyltransferases from sequencedgreen and red algal genomes. PLoS ONE 8: e76511

van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, DunkerAK, Fuxreiter M, Gough J, Gsponer J, Jones DT, et al (2014) Classifi-cation of intrinsically disordered regions and proteins. Chem Rev 114:6589–6631

van Holst GJ, Clarke AE (1985) Quantification of arabinogalactan-proteinin plant extracts by single radial gel diffusion. Anal Biochem 148: 446–450

Velasquez M, Salter JS, Dorosz JG, Petersen BL, Estevez JM (2012) Recentadvances on the posttranslational modifications of EXTs and their rolesin plant cell walls. Front Plant Sci 3: 93

Velasquez SM, Marzol E, Borassi C, Pol-Fachin L, Ricardi MM, ManganoS, Juarez SP, Salter JD, Dorosz JG, Marcus SE, et al (2015) Low sugar isnot always good: impact of specific O-glycan defects on tip growth inArabidopsis. Plant Physiol 168: 808–813

Velasquez SM, Ricardi MM, Dorosz JG, Fernandez PV, Nadra AD,Pol-Fachin L, Egelund J, Gille S, Harholt J, Ciancia M, et al (2011)O-Glycosylated cell wall proteins are essential in root hair growth.Science 332: 1401–1403

Voxeur A, Höfte H (2016) Cell wall integrity signaling in plants: “To growor not to grow that’s the question.” Glycobiology 26: 950–960

Wickett NJ, Mirarab S, Nguyen N, Warnow T, Carpenter E, Matasci N,Ayyampalayam S, Barker MS, Burleigh JG, Gitzendanner MA, et al(2014) Phylotranscriptomic analysis of the origin and early diversifica-tion of land plants. Proc Natl Acad Sci USA 111: E4859–E4868

Woessner JP, Goodenough UW (1989) Molecular characterization of azygote wall protein: an extensin-like molecule in Chlamydomonas rein-hardtii. Plant Cell 1: 901–911

Woessner JP, Molendijk AJ, van Egmond P, Klis FM, Goodenough UW,Haring MA (1994) Domain conservation in several volvocalean cell wallproteins. Plant Mol Biol 26: 947–960

Xie Y, Wu G, Tang J, Luo R, Patterson J, Liu S, Huang W, He G, Gu S, LiS, et al (2014) SOAPdenovo-Trans: de novo transcriptome assembly withshort RNA-Seq reads. Bioinformatics 30: 1660–1666

Xu WL, Zhang DJ, Wu YF, Qin LX, Huang GQ, Li J, Li L, Li XB (2013)Cotton PRP5 gene encoding a proline-rich protein is involved in fiberdevelopment. Plant Mol Biol 82: 353–365

Yang Y, Smith SA (2013) Optimizing de novo assembly of short-read RNA-seq data for phylogenomics. BMC Genomics 14: 328

Plant Physiol. Vol. 174, 2017 921

Evolution of the Plant HRGP Superfamily

www.plantphysiol.orgon July 3, 2018 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.