The progress of Glossina genomics at RIKEN GSC

Todd Taylor (taylor@gsc.riken.jp)
RIKEN Genomic Sciences Center, Yokohama, Japan (on behalf of Masahira Hattori)

December 15, 2006, IGGI, Sanger, UK

Background

• Sequencing and analysis of human chromosomes 11, 18 and 21
  • Contributed about 4-5% of the human genome sequence
• Sequencing and analysis of chimpanzee genomic regions, including
  • Whole-genome BAC-end sequence analysis
  • Chimpanzee chromosome 22
    • Found differences (mostly minor) in nearly all of the coding genes between human and chimpanzee
  • Chimpanzee Y chromosome
• Development of novel methods for gene and promoter prediction
  • Identifying genes missed by other high-throughput methods
  • Identification of unique regulatory mechanisms

Phase III sequence-related activities

• BAC ends

• Finished BAC clones

• Full length cDNAs

• Whole-genome shotgun

BAC end sequencing
• The first BAC library has been constructed (Yale) and 100,000 BAC end sequences are being produced (RIKEN)
  • Not yet
  • We will be able to sequence the ends of up to 50,000 BACs (100,000 reads)
    • Or possibly more if fosmid ends are sequenced instead?
  • Can start from April 2007
  • Will take about one month
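Each clone is read from both ends, so the read count is simply twice the clone count (50,000 BACs give 100,000 reads), and uses such as BES mapping need the two end reads of each clone paired back up. A minimal Python sketch of both steps, assuming a hypothetical read-naming scheme `<clone>.F` / `<clone>.R`; the names actually produced by the sequencing pipeline may differ.

```python
from collections import defaultdict


def reads_for_clones(n_clones: int, ends_per_clone: int = 2) -> int:
    """Each clone contributes one read per sequenced end."""
    return n_clones * ends_per_clone


def pair_end_reads(read_names):
    """Group end reads by clone, assuming hypothetical names like 'VMRC29-97H16.F'."""
    by_clone = defaultdict(dict)
    for name in read_names:
        clone, _, end = name.rpartition(".")  # split off the trailing end tag
        by_clone[clone][end] = name
    # Clones with both a forward and a reverse read form usable end pairs.
    pairs = {c: (e["F"], e["R"]) for c, e in by_clone.items() if "F" in e and "R" in e}
    # Clones missing one end (failed or not yet sequenced reads).
    singles = sorted(c for c, e in by_clone.items() if not ("F" in e and "R" in e))
    return pairs, singles


if __name__ == "__main__":
    print(reads_for_clones(50_000))
    pairs, singles = pair_end_reads(["VMRC29-97H16.F", "VMRC29-97H16.R", "VMRC29-39G22.F"])
    print(pairs, singles)
```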

Finished BAC clone sequencing
• Five BACs have been fully sequenced (RIKEN) and no serious 'issues' have arisen
  • VMRC29 library (CHORI)
  • 97H16, 39G22, 36N9, 31O6, 3E11
  • 759,387 bp
  • GC level: 38.89%
  • Repeat content: 6.10%
    • Using the Drosophila fruit fly genus repeat library
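The GC level and repeat content are base-count ratios over the 759,387 bp of finished sequence; for example, the 46,333 bases masked by RepeatMasker give 46,333 / 759,387 ≈ 6.10%. A minimal Python sketch of both ratios, assuming masked bases appear either as `N` (hard masking) or as lowercase letters (soft masking); the exact denominator RepeatMasker uses for its own GC figure may differ from this one.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G/C among unambiguous A/C/G/T bases (run on the UNMASKED sequence)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    gc = s.count("G") + s.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


def masked_percent(masked_seq: str) -> float:
    """Percentage of bases hard-masked ('N'/'n') or soft-masked (lowercase).

    Assumes the input has no unsequenced N gaps, as in finished BACs;
    otherwise those gaps would be counted as masked too.
    """
    if not masked_seq:
        return 0.0
    masked = sum(1 for b in masked_seq if b in "Nn" or b.islower())
    return 100.0 * masked / len(masked_seq)


# Repeat content of the five finished BACs, from the reported base counts.
print(round(100 * 46_333 / 759_387, 2))
```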

file name:      gmm_clones
sequences:            5
total length:    759387 bp
GC level:         38.89 %
bases masked:     46333 bp ( 6.10 %)
=====================================================
                    number of   length     percentage
                    elements    occupied   of sequence
-----------------------------------------------------
Retroelements            56     12376 bp     1.63 %
   SINEs:                 0         0 bp     0.00 %
   Penelope              31      2872 bp     0.38 %
   LINEs:                49      7695 bp     1.01 %
     CRE/SLACS            0         0 bp     0.00 %
     L2/CR1/Rex           7      3181 bp     0.42 %
     R1/LOA/Jockey        5      1138 bp     0.15 %
     R2/R4/NeSL           1        51 bp     0.01 %
   LTR elements:          7      4681 bp     0.62 %
     BEL/Pao              2       230 bp     0.03 %
     Gypsy/DIRS1          5      4451 bp     0.59 %

DNA transposons          10      4348 bp     0.57 %
   Tc1-IS630-Pogo         8      2143 bp     0.28 %
   Other (Mirage,         1       126 bp     0.02 %
   P-element, Transib)

Total interspersed repeats:     16724 bp     2.20 %

Small RNA:                3      1357 bp     0.18 %

Simple repeats:         237     12658 bp     1.67 %
Low complexity:         366     15594 bp     2.05 %

The query species was assumed to be "Drosophila fruit fly genus".

Homo sapiens (4.08 %)   Anopheles genus (4.52 %)

RepeatMasker
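Each percentage in the RepeatMasker summary is the occupied length divided by the total length (759,387 bp), and the 46,333 masked bases are the sum of the interspersed-repeat, small RNA, simple-repeat and low-complexity rows (interspersed repeats being retroelements plus DNA transposons, 12,376 + 4,348 = 16,724 bp). A short Python check that recomputes those figures; the numbers are copied from the table and nothing else is assumed.

```python
TOTAL_BP = 759_387

# Occupied lengths (bp) of the top-level categories in the RepeatMasker summary.
CATEGORIES = {
    "Total interspersed repeats": 12_376 + 4_348,  # retroelements + DNA transposons
    "Small RNA": 1_357,
    "Simple repeats": 12_658,
    "Low complexity": 15_594,
}


def percent_of_sequence(bp: int, total: int = TOTAL_BP) -> float:
    """Share of the query sequence occupied, rounded to two decimals as in the table."""
    return round(100.0 * bp / total, 2)


masked_bp = sum(CATEGORIES.values())
for name, bp in CATEGORIES.items():
    print(f"{name:28s} {bp:6d} bp {percent_of_sequence(bp):5.2f} %")
print(f"{'Bases masked':28s} {masked_bp:6d} bp {percent_of_sequence(masked_bp):5.2f} %")
```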

Full-length cDNA sequencing
• Full-length cDNA libraries for G. m. morsitans will be constructed (RIKEN), and Sanger will produce a few hundred full-length sequences from them. RIKEN will do some 5' end sequencing.
  • Full-length cDNA libraries were prepared by Junichi Watanabe (Univ. Tokyo)
  • Sequencing of 9,462 cDNA clones (5' one-pass) was recently completed

Whole-genome shotgun sequencing
• RIKEN applied to Japanese funding sources for a further 3 million shotgun sequences (~3X coverage)
  • We failed to get the funding
  • At present, we have no money for WGS or additional BAC finishing
  • We will apply for more funding
• Japanese-African collaborative projects are looking somewhat hopeful
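The ~3X figure follows the usual redundancy estimate c = N × L / G (number of reads × mean read length / genome size). Under the Lander-Waterman model, a fraction 1 − e^(−c) of the genome is then expected to be covered by at least one read, roughly 95% at 3X. A minimal Python sketch; the read length and genome size in the usage line are hypothetical placeholders chosen to give 3X, not figures from this project.

```python
import math


def redundancy(n_reads: int, mean_read_len: float, genome_size: float) -> float:
    """Sequence coverage c = N * L / G."""
    return n_reads * mean_read_len / genome_size


def fraction_covered(c: float) -> float:
    """Lander-Waterman expected fraction of bases hit by at least one read: 1 - exp(-c)."""
    return 1.0 - math.exp(-c)


# Hypothetical inputs: 3 million reads of 700 bp over a 700 Mb genome.
c = redundancy(3_000_000, 700, 700_000_000)
print(f"{c:.1f}X coverage, ~{100 * fraction_covered(c):.1f}% of bases covered")
```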

Library    Sample Information                        Sequences
TC         Fat Body/Milk Gland                           3,059
GMSG       Salivary Gland                                7,493
GMRE       Reproductive                                  1,502
GMM        Midgut                                        7,015
cDNA       Full Length cDNA Sequences                      190
TUM/TUF    Tsetse Fly Whole Genome cDNA Libraries        9,462
Total Number of Sequences                               28,721

Dataset containing ESTs and partial cDNA sequences
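The per-library counts add up to the 28,721-sequence total (3,059 + 7,493 + 1,502 + 7,015 + 190 + 9,462). When rebuilding such a table from a pooled FASTA file, one option is to group records by the library code at the start of each identifier. A minimal Python sketch, assuming hypothetical IDs such as `>GMSG00123` that begin with the library code, with `TUM` and `TUF` reported together as in the table.

```python
from collections import Counter

# Library codes from the table; TUM and TUF are reported as one row.
LIBRARY_OF_PREFIX = {
    "GMSG": "GMSG", "GMRE": "GMRE", "GMM": "GMM", "cDNA": "cDNA",
    "TUM": "TUM/TUF", "TUF": "TUM/TUF", "TC": "TC",
}


def library_counts(fasta_lines):
    """Count FASTA records per library, assuming (hypothetically) IDs start with the code."""
    counts = Counter()
    prefixes = sorted(LIBRARY_OF_PREFIX, key=len, reverse=True)  # longest match first
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence lines carry no ID
        seq_id = line[1:].strip()
        lib = next((LIBRARY_OF_PREFIX[p] for p in prefixes if seq_id.startswith(p)), "unknown")
        counts[lib] += 1
    return counts


# Counts as reported in the table.
reported = {"TC": 3059, "GMSG": 7493, "GMRE": 1502, "GMM": 7015, "cDNA": 190, "TUM/TUF": 9462}
print(sum(reported.values()))
```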

Strategy and results obtained from preliminary analysis
• 28,721 sequences were assembled into contigs with CAP3, and singletons were identified
  • Total contigs made: 3,857; total singletons: 10,213
• Contigs and singletons were translated in all six reading frames with Transeq
  • 3,857 contigs → 30,942 ORFs
  • 10,213 singletons → 57,860 ORFs
• Selected continuous ORFs containing at least 50 amino acids
• Homology searched against the SwissProt and NR protein databases (BLAT, 33% sequence identity)
• Annotated 2,569 ORFs out of 3,857 contigs and 2,783 ORFs out of 10,213 singletons
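The translation and ORF-selection steps can be sketched as follows. This is not EMBOSS Transeq itself but a minimal Python stand-in using the standard genetic code, and it assumes a "continuous ORF" means a stop-free stretch of translated sequence of at least 50 residues (no start methionine is required).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; '*' marks stops, 'X' any codon with ambiguous bases."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame(seq: str) -> dict:
    """Translations of the three forward (+1..+3) and three reverse (-1..-3) frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def continuous_orfs(seq: str, min_aa: int = 50) -> list:
    """Stop-free stretches of at least min_aa residues in any of the six frames."""
    hits = []
    for frame, protein in six_frame(seq).items():
        for stretch in protein.split("*"):
            if len(stretch) >= min_aa:
                hits.append((frame, stretch))
    return hits
```

Splitting each frame on stop codons keeps the filter independent of start-codon choice, which suits EST and partial cDNA reads that may lack their 5' end.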

A large percentage of ORFs from tsetse fly contigs resemble those of the fruit fly
• Drosophila (84%)
• Anopheles (2%)
• Aedes (3%)
• Others (6%)
• Glossina (5%)

A large percentage of ORFs from tsetse fly singletons resemble those of the fruit fly
• Drosophila (81%)
• Anopheles (2%)
• Aedes (5%)
• Others (9%)
• Glossina (3%)
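Both breakdowns tally the genus of each ORF's best database hit, and each set of percentages sums to 100. One way to compute such a tally is sketched below in Python; it is not necessarily how these charts were produced. It assumes BLAT output in the standard 21-column PSL format, approximates identity as (matches + repMatches) / (matches + repMatches + misMatches) (a simplification of UCSC's own scoring), applies the 33% cut-off from the strategy slide, and uses a hypothetical `genus_of` mapping from target IDs to genera.

```python
from collections import Counter


def psl_identity(fields):
    """Approximate percent identity of one PSL row (cols 1-3: matches, misMatches, repMatches)."""
    matches, mismatches, rep = int(fields[0]), int(fields[1]), int(fields[2])
    aligned = matches + mismatches + rep
    return 100.0 * (matches + rep) / aligned if aligned else 0.0


def best_hits(psl_lines, min_identity=33.0):
    """Keep the highest-identity target per query (qName = col 10, tName = col 14)."""
    best = {}
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # skip PSL header and separator lines
        ident = psl_identity(fields)
        query, target = fields[9], fields[13]
        if ident >= min_identity and ident > best.get(query, ("", -1.0))[1]:
            best[query] = (target, ident)
    return best


def genus_percentages(best, genus_of):
    """Percentage of queries whose best hit falls in each genus ('Others' if unmapped)."""
    counts = Counter(genus_of.get(target, "Others") for target, _ in best.values())
    total = sum(counts.values())
    return {g: round(100.0 * n / total) for g, n in counts.most_common()}
```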

METABROWSER: a resource to analyse the metagenome

Metagenome Analysis PipeLine
• Input: genomic contigs & sequences
• Gene prediction → predicted genes: GLIMMER, GENEMARK, GETORF, CRITICA, MetaGene
• Functional annotation → annotated genes: BLAST, INTERPROSCAN, PLHOST, PROSITESCAN, COGs, Manatee (GO), FingerPRINTscan, JAFA ?, HT-GO-FAT, PubSearch, BLIMPS (BLOCKS), Pfam
• Browser: the user can browse and query the metagenome data
• Advanced analysis: metabolic pathways, comparative genomics, phylogenetic classification, protein interaction, enzyme classification, 16S ribosomal RNA analysis, taxonomic classification, pathogenicity index, origin of replication, secondary structure prediction, fold prediction, other analysis

Metagenome Data Browser: data from our internal projects

METABROWSER: a resource to analyse the metagenome
• Genes
• Proteins
• Novel Pathways
• Comparative Analysis
• Download
• Sequence
• Novel Genomes
• Novel Proteins
• Other Related Information

Current & Future Plans
• Sequencing
  • More if funding allows
• Analysis
  • We can contribute to the informatics of the Glossina genome, including cDNA analysis and annotation
    • But we don't want to duplicate anyone's efforts
  • Also BAC-end sequence (BES) mapping and comparative analysis with Drosophila, mosquito, etc.
• ???

Acknowledgements

• Informatics (RIKEN)
  • Tulika Prakash Srivastava
  • Vineet K. Sharma
  • Todd D. Taylor
• Sequencing & Data Access
  • Atsushi Toyoda (RIKEN)
  • Junichi Watanabe (Univ. Tokyo)
  • Hiroyuki Wakaguri (Univ. Tokyo)
  • Yamashita (Kitasato Univ.)
  • Serap Aksoy (Yale)
  • Geoff Attardo (Yale)
• Other
  • Masahira Hattori (Univ. Tokyo/RIKEN)
  • Yoshiyuki Sakaki (RIKEN)