http://www.intl-pag.org/18/abstracts/W54_PAGXVIII_396.html

As the sister lineage to seed plants, ferns (i.e. monilophytes) are an important clade for comparative evolutionary studies of land plants. Additionally, with the evolution and maintenance of free-living and photosynthetic gametophyte and sporophyte life stages, ferns are an ideal group for studies of both life-cycle evolution in land plants and genome function in haploid and diploid phases. The development of genomic resources in ferns lags far behind that in other plants, due primarily to large genome sizes and the absence of economic crop species. High-throughput sequencing technologies have now enabled genome-scale studies in non-model organisms. We present an analysis of the gametophyte transcriptome of the bracken fern, Pteridium aquilinum. A full-length enriched, normalized cDNA library was generated with RNA derived from a pool of sexually mature male, female, and hermaphroditic gametophytes and sequenced with the Roche 454 GS FLX Titanium chemistry. A total of 681,722 reads with a mean length of 372.6 bp remained after quality filtering, repeat masking, and primer/vector screening. Cleaned reads were assembled de novo, resulting in 50,658 assembled unigenes with a mean length of 637.65 bp and a total length of 32.65 MB (5.49X unigene read-depth coverage). Unigenes were BLASTed against the inferred proteins of ten complete plant genomes and pseudo-annotated with the GO-slim vocabulary. 34,254 unigenes (68%) had a BLAST best hit and were assigned a tentative functional annotation. We also present an assessment of transcriptome coverage and explore the utility of these data for comparative evolutionary and functional genomic studies in land plants.

Authors: Joshua P. Der(1), Michael S. Barker(2,3), Norman Wickett(4), Claude W. dePamphilis(4) and Paul Wolf(1,5)

(1) Department of Biology, Utah State University, Logan, UT 84322, USA
(2) The Biodiversity Research Centre and Department of Botany, University of British Columbia, Vancouver BC V6T 1Z4, CANADA
(3) Department of Biology and Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN 47405, USA
(4) Department of Biology, Institute of Molecular Evolutionary Genetics, and The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
(5) Ecology Center, Utah State University, Logan, UT 84322, USA
Functional Genomics of Fern Gametophytes:
Transcriptome Sequencing in Pteridium aquilinum
Joshua Der, Michael Barker, Norman Wickett, Claude dePamphilis and Paul Wolf
Acknowledgments
Coauthors:
• Michael Barker (U. British Columbia) - project design and transcriptome assembly
• Norman Wickett and Claude dePamphilis (Penn State U.) - transcriptome annotation and interpretation of results
• Paul Wolf (Utah State U.) - project design, funding, interpretation of results, and general support
Utah State University:
• Aaron Duffy - tissue culture & bioinformatics help
• Mike Pfrender - RNA lab space & equipment
• VP for Research & Center for Integrated BioSystems - research funds
• Dept. of Biology, Center for Integrated BioSystems, & Ecology Center - travel funds
Indiana University:
• Keithanne Mockaitis - cDNA library preparation & 454 sequencing
University of British Columbia:
• Katrina Dlugosch - sequence cleaning script
Penn State University:
• Eric Wafula - general scripting help
Fern Evolution
Phylogeny figure (tip labels): Ferns, Lycophytes, Bryophytes, Seed Plants
Sister to seed plants
Ancient lineage (Devonian)
~11000 extant species
High diversity in morphology, geography, and ecology
Evolved and maintain independent gametophyte and sporophyte generations
Fern life cycle (figure labels): haploid spores (n), meiosis, sperm (n), egg (n), syngamy, zygote (2n)
Fern Genetics
Recessive alleles are not masked in haploid gametophytes
Gametic phase segregation and recombination can be directly observed
Controlled crosses can be performed to produce doubled haploid sporophytes (i.e. complete homozygotes)
Apogamy and apospory can be induced, unlinking ploidy and life stage (Klekowski 1971)
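The point about complete homozygotes can be illustrated with a toy model. The sketch below assumes that the controlled cross is a self-fertilization of a single bisexual gametophyte, so egg and sperm share one haploid genotype; the three loci, their allele names, and the assumption that loci are unlinked are all hypothetical.

```python
import random

def meiosis(sporophyte, rng):
    """Return one haploid spore by drawing one allele per locus.

    Loci are treated as unlinked, which is a simplifying assumption.
    """
    return [rng.choice(pair) for pair in sporophyte]

def self_fertilize(gametophyte):
    """Egg and sperm both come from the same haploid gametophyte,
    so the zygote carries two identical alleles at every locus."""
    return [(allele, allele) for allele in gametophyte]

rng = random.Random(1)
# Hypothetical sporophyte, heterozygous at three loci.
parent = [("A", "a"), ("B", "b"), ("C", "c")]

spore = meiosis(parent, rng)          # grows into a haploid gametophyte
offspring = self_fertilize(spore)     # diploid sporophyte of the next generation
is_homozygous = all(a == b for a, b in offspring)

print("gametophyte:", spore)
print("sporophyte: ", offspring)
print("homozygous at every locus:", is_homozygous)
```

Whichever alleles the spore receives, the resulting sporophyte is homozygous at every locus after a single generation.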
Challenges In Fern Genetics
Limited agronomic importance
Large genome sizes (avg. 10 Gb)
High chromosome numbers (avg. n = 57)
Extensive history of hybridization and polyploidy
Photo credit: Mike Windham
Fern Genomics
Genomic resource development in ferns has lagged far behind that in flowering plants (but wait for Mike's talk next)
No fern genome sequencing projects have been funded
New high throughput sequencing has started to bring the power of genomics to non-model organisms
www.454.com
Bracken Fern: Pteridium aquilinum
Worldwide distribution
Toxic to livestock and weedy in pasture, so has been extensively studied
Highly adaptable and phenotypically plastic
Established culture techniques
Model system for understanding the fern life cycle, gametophyte development, and sex determination
Phylogeny is well characterized
Paleopolyploid with diploid gene expression
Genome size: 1C = 9.8 Gb
Lindman. 1917-1926. Bilder ur Nordens Flora-508
The Fern Gametophyte Transcriptome
How has the fern life cycle influenced genome evolution?
What genes are active in the gametophyte generation?
What is the functional profile of these genes?
Do gametophyte-specific genes experience purifying selection?
Do reproductive proteins have a signature of positive selection or is their rate of molecular evolution elevated?
What is the function of "flowering" gene homologues in fern gametophytes?
Sequence Pre-processing: Cleaned ESTs
RNA from whole gametophytes: male, female, and bisexual
cDNA library normalized and enriched for full-length mRNA
Reads were quality and length filtered, adapter and polyA/T trimmed
Cleaned reads: 681,722
Mean length: 372.60 bp
Total bases: 254 Mb
Histogram of cleaned reads (x-axis: cleaned read length, maximum = 624; y-axis: number of sequences)
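The trimming and length-filtering steps described above can be sketched roughly as follows. This is not the cleaning script used for the project: the length cutoff, the minimum tail length, and the function names are hypothetical, and quality filtering and adapter removal are left out.

```python
import re

MIN_LENGTH = 100   # hypothetical length cutoff, not stated on the slide
MIN_TAIL = 10      # hypothetical minimum homopolymer run treated as a tail

POLY_A_TAIL = re.compile(r"A{%d,}$" % MIN_TAIL)
POLY_T_HEAD = re.compile(r"^T{%d,}" % MIN_TAIL)

def trim_poly_tails(seq):
    """Remove a 3' poly-A run and a 5' poly-T run from one read."""
    seq = POLY_A_TAIL.sub("", seq.upper())
    return POLY_T_HEAD.sub("", seq)

def clean_reads(reads):
    """Trim every read, then drop reads shorter than MIN_LENGTH."""
    trimmed = (trim_poly_tails(read) for read in reads)
    return [read for read in trimmed if len(read) >= MIN_LENGTH]

def summarize(reads):
    """Return (read count, mean length, total bases)."""
    total = sum(len(read) for read in reads)
    return len(reads), total / len(reads), total

# Toy reads: one with a poly-A tail, one with a poly-T head, one too short.
raw = ["ACGT" * 30 + "A" * 15, "T" * 12 + "ACGT" * 30, "ACGT" * 5]
cleaned = clean_reads(raw)
count, mean_length, total_bases = summarize(cleaned)
print(count, mean_length, total_bases)
```

The same summary applied to the real data is consistent with the slide: 681,722 reads at a mean of 372.6 bp come to about 254 Mb.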
EST Assembly: Unigenes
Two-step strategy for EST assembly to reduce redundancy in the unigene set:
1. ESTs were first assembled in MIRA
2. Assembly passed to CAP3 to join additional contigs
Histogram of transcriptome unigenes (CAP3) (x-axis: unigene length, largest transcript = 4897 bp; y-axis: number of sequences)
Total unigenes = 38889
Mean length = 685.76 bp
Total bases = 26.67 Mb
Assembly                  MIRA (1º)    CAP3 (2º)
# singletons              638          183
# 1º contigs              50,020       32,801
# 2º contigs              0            5,905
# unigenes                50,658       38,889
mean unigene length       637.7 bp     685.8 bp
largest unigene length    4,489 bp     4,897 bp
total consensus           32.30 Mb     26.67 Mb
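The figures in the assembly comparison are internally consistent, which a few lines of arithmetic can confirm. The sketch below assumes that the unigene count is the sum of singletons and contigs and that mean unigene length is total consensus divided by unigene count; the class and field names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class AssemblyStats:
    singletons: int
    primary_contigs: int
    secondary_contigs: int
    total_consensus_bp: float

    @property
    def unigenes(self):
        # Assumed definition: every singleton and every contig is one unigene.
        return self.singletons + self.primary_contigs + self.secondary_contigs

    @property
    def mean_unigene_length(self):
        return self.total_consensus_bp / self.unigenes

# Values taken from the MIRA and CAP3 columns of the table.
mira = AssemblyStats(638, 50_020, 0, 32.30e6)
cap3 = AssemblyStats(183, 32_801, 5_905, 26.67e6)

# Share of unigenes removed by the second (CAP3) assembly step.
reduction = 1 - cap3.unigenes / mira.unigenes

for name, stats in (("MIRA", mira), ("CAP3", cap3)):
    print(name, stats.unigenes, round(stats.mean_unigene_length, 1))
print("unigene reduction: {:.1%}".format(reduction))
```

Under these assumptions the second pass reduces the unigene set by a little under a quarter while raising the mean unigene length.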
Transcriptome Coverage
To assess the depth and breadth of transcriptome coverage, we compared our assembly with the predictions from a simulation model using ESTcalc
Wall et al., 2009. BMC Genomics 10:347

Parameters                                 ESTcalc       Actual (CAP3)
Technology                                 454 GSFLX     454 GSFLX (Titanium)
Library type                               normalized    normalized
Reads/plate                                681,722       681,722
Read length                                372.6 bp      372.6 bp

Output                                     ESTcalc       Actual (CAP3)
Total sequence amount                      254 Mb        254.0076 Mb
Total assembled sequence                   26.2 Mb       26.67 Mb
Percent transcriptome (A)                  87 %          ?
Percent of genes tagged (B)                100 %         ?
Unigene count (C)                          32,044        38,889
Mean unigene length (D)                    819 bp        685.8 bp
Singleton yield (E)                        19 %          0.47 %
Percent of genes with 90% coverage         69.8 %        ?
Percent of genes with 100% coverage (F)    23.7 %        ?
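The comparison between the ESTcalc prediction and the CAP3 assembly can be put into relative terms with a short calculation. The sketch below assumes that singleton yield means singletons as a share of all unigenes, which may differ from ESTcalc's own definition; the dictionary keys are illustrative names.

```python
# Values copied from the comparison table.
predicted = {"unigene_count": 32_044, "mean_unigene_length_bp": 819.0, "assembled_mb": 26.2}
observed = {"unigene_count": 38_889, "mean_unigene_length_bp": 685.8, "assembled_mb": 26.67}

def relative_difference(obs, pred):
    """Signed difference of the observed value relative to the prediction."""
    return (obs - pred) / pred

differences = {
    key: relative_difference(observed[key], predicted[key]) for key in predicted
}

# Assumed definition: CAP3 singletons divided by CAP3 unigenes.
singleton_yield = 183 / 38_889

for key, value in differences.items():
    print("{}: {:+.1%}".format(key, value))
print("singleton yield: {:.2%}".format(singleton_yield))
```

On these numbers the assembly contains more and shorter unigenes than the model predicts, while the total assembled sequence is close to the prediction. The 183 singletons among 38,889 unigenes correspond to a fraction of about 0.0047, or roughly 0.47 %.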
Transcriptome Annotation
Two complementary strategies for functional annotation
1. BLAST unigenes against the NCBI nr protein database
GO annotation using Blast2GO
Broad functional perspective with a rich objective GO annotation
2. BLAST to inferred proteomes of 10 complete plant genomes
Pseudo-annotated based on MCL cluster membership in PlantTribes2.0
Tribe and OrthoGroup assignment, GO-slim function, and Arabidopsis gene id & description
Plant gene family classification with detailed information from well-curated reference genomes
Transcriptome Annotation: nr BLASTx
Top-hit species distribution (bar chart; x-axis: BLAST hits, 0 to 5,000). Species in the order shown: Physcomitrella patens, Vitis vinifera, Picea sitchensis, Ricinus communis, Populus trichocarpa, Arabidopsis thaliana, Oryza sativa, Sorghum bicolor, Glycine max, Zea mays, Gossypium hirsutum, Medicago truncatula, unknown, Adiantum capillus-veneris, Ceratopteris richardii, Nicotiana tabacum, Marchantia polymorpha, Solanum tuberosum, Chlamydomonas reinhardtii, Alsophila spinulosa, Ginkgo biloba, Micromonas sp., Pteris vittata, Elaeis guineensis, Pinus taeda, Solanum lycopersicum, Micromonas pusilla, Triticum aestivum, Gossypium barbadense, others
Pie chart: positive BLAST hit 54%, no BLAST hit 46%
21,097 of 38,889 unigenes with positive hit (e-value cutoff 1e-10)
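Counting unigenes with a positive hit can be sketched as below, assuming the BLASTx results are available in the standard 12-column tabular format (BLAST+ `-outfmt 6`), where the e-value is the eleventh column. The unigene and protein identifiers in the example are made up, and treating a hit exactly at the cutoff as positive is an assumption.

```python
import csv
import io

EVALUE_CUTOFF = 1e-10  # cutoff quoted on the slide

def queries_with_hit(blast_tabular, cutoff=EVALUE_CUTOFF):
    """Return the set of query IDs with at least one hit at or below the cutoff.

    Expects tab-separated rows: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = set()
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) <= cutoff:
            hits.add(row[0])
    return hits

# Hypothetical results for three unigenes.
example = io.StringIO(
    "unigene1\tprotA\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t250\n"
    "unigene1\tprotB\t60.0\t150\t60\t0\t1\t450\t1\t150\t1e-12\t90\n"
    "unigene2\tprotC\t40.0\t80\t48\t0\t1\t240\t1\t80\t1e-3\t35\n"
    "unigene3\tprotD\t55.0\t120\t54\t0\t1\t360\t1\t120\t2e-11\t80\n"
)
total_unigenes = 3
hits = queries_with_hit(example)
fraction_with_hit = len(hits) / total_unigenes
print(sorted(hits), "{:.0%}".format(fraction_with_hit))
```

For the real data, 21,097 of 38,889 unigenes is about 54 %.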
Transcriptome Annotation: Blast2GO
Localization of genes is predominantly in the nucleus, mitochondria, and plastids
Cellular Component - GO level 5 (pie chart labels with counts):
endoplasmic reticulum (317)
nucleoplasm (376)
vacuole (274)
Golgi apparatus (212)
microbody (119)
plastid (3,613)
cytoskeleton (238)
nucleus (1,325)
endosome (10)
nucleolus (197)
nuclear lumen (555)
cytosol (448)
mitochondrion (1,967)
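The statement about predominant localization can be checked by ranking the chart's counts. The sketch below simply sorts the level 5 cellular component counts; note that one unigene may carry several GO terms, so the counts are probably not mutually exclusive and are not converted to shares here.

```python
# Counts copied from the level 5 cellular component chart.
cellular_component_counts = {
    "endoplasmic reticulum": 317,
    "nucleoplasm": 376,
    "vacuole": 274,
    "Golgi apparatus": 212,
    "microbody": 119,
    "plastid": 3613,
    "cytoskeleton": 238,
    "nucleus": 1325,
    "endosome": 10,
    "nucleolus": 197,
    "nuclear lumen": 555,
    "cytosol": 448,
    "mitochondrion": 1967,
}

# Rank terms from most to least frequent.
ranked = sorted(cellular_component_counts.items(), key=lambda item: item[1], reverse=True)
top_three = [term for term, _ in ranked[:3]]

for term, count in ranked:
    print("{:<22}{:>6,}".format(term, count))
```

The three most frequent terms are plastid, mitochondrion, and nucleus, in that order.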
Transcriptome Annotation: Blast2GO
Two main biological processes involve metabolism and cellular machinery
Biological Process - GO level 2 (pie chart labels with counts):
multicellular organismal process (166)
localization (1,713)
multi-organism process (15)
growth (41)
establishment of localization (1,713)
reproduction (73)
biological regulation (853)
developmental process (194)
reproductive process (29)
cellular process (7,432)
regulation of biological process (716)
response to stimulus (908)
metabolic process (7,641)
Transcriptome Annotation: Blast2GO
Two main molecular functions are binding (DNA, RNA, and protein) and catalytic activity (hydrolase and transferase activity)
Molecular Function - GO level 2 (pie chart labels with counts):
enzyme regulator activity (106)
binding (8,120)
transcription regulator activity (409)
structural molecule activity (542)
translation regulator activity (1)
transporter activity (908)
molecular transducer activity (357)
catalytic activity (7,915)
Transcriptome Annotation: PlantTribes2.0
25,172 of 38,889 unigenes with positive hit, e-value cutoff 1e-5
Unigenes classified into 7,126 Tribes and 9,548 OrthoGroups
Pie chart: positive BLAST hit 65%, no BLAST hit 35%
Transcriptome Annotation: PlantTribes2.0
Some interesting results:
Single unigene similar to LEAFY
one copy found in seed plants, two in Physcomitrella and Selaginella
Single unigene similar to SEPALLATA3
a gene family absent from gymnosperms, thought to have originated with flowers and required by B and C floral organ identity genes to function
Single unigene similar to PISTILLATA and two unigenes similar to CAULIFLOWER
not known in gymnosperms, Physcomitrella, or Selaginella
WARNING: these annotations are based on BLAST which may return distant homologues. A detailed phylogenetic examination is needed!
Future Work
Sequence the sporophyte transcriptome
Transcriptome profiling in various life stages/tissues (RNA-seq)
Examine gene family evolution in land plants
RNA editing in the chloroplast genome
Population genomics (with mined SSR and SNP loci)
Linkage mapping
Thank You!
Collecting bracken in the Rocky Mountains with my field assistant