Next Generation Sequencing in Virus and Parasite Research
Sanger read: >800 bp
GS-FLX read: ~250 bp | 500 bp
Output per run: 100 Mb | 500 Mb
WGS
Annotation
Population Diversity
Pathogen Discovery
Applications Presented
Four main projects in the lab
Brugia malayi Genome Project
Parasitic nematode, causes lymphatic filariasis
• Total scaffolds: ~8250
• Longest scaffold: 6.5 Mb
• Total bases in scaffolds: 71 Mb
• Total span of scaffolds: 80 Mb
Genome size ~100Mb
6 chromosomes in 8250 pieces
Sanger (cloning bias)
Closing the Genome
Next-generation sequencing
Fingerprint maps
Curating the Data
DATABASE
Mapping 5’ and 3’ UTRs
Functional annotation
Re-assemble genome, re-annotate
Brugia malayi Genome Project
PHASE II – Use Next-Gen Data
(Hybrid Sanger-GSFLX assembly) (Confirm UTRs by GSFLX)
Mix of random reads and paired reads
Avg read length: ~220 bp
~100 Mb
GS-FLX Sequencing of Worm gDNA and cDNA
5 runs = 5X coverage of the genome
5’UTR 3’UTR SL gDNA
Paired-Ends and WGS UTRs
Whole Plate 4-well gasket
Mapping of paired and non-paired reads onto genomic assembly
[Figure: hits along the sequence assembly, axis marked at 100% and 80%; paired-ends track]
No apparent Bias
20Mb of Brugia reads = ~0.25X coverage
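The coverage figures quoted on these slides follow from dividing sequenced bases by genome size. A minimal sketch of that arithmetic is below; the function name is illustrative, and using the 80 Mb scaffold span as the denominator for the 20 Mb case is an assumption chosen because it reproduces the quoted ~0.25X.

```python
def fold_coverage(sequenced_bases: float, genome_size: float) -> float:
    """Return fold coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sequenced_bases / genome_size


MB = 1_000_000

# Five GS-FLX runs at ~100 Mb each against a ~100 Mb genome.
five_runs = fold_coverage(5 * 100 * MB, 100 * MB)

# 20 Mb of reads against the 80 Mb scaffold span (assumed denominator).
pilot = fold_coverage(20 * MB, 80 * MB)

print(f"{five_runs:.2f}X, {pilot:.2f}X")
```

Against the full ~100 Mb genome estimate the 20 Mb case would come out at 0.20X instead, so the choice of denominator matters at low coverage.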
Sequencing UTRs of B. malayi
[Figure: UTR tag library construction]
mRNA (5’ P, 3’ AAAA)
CIP, TAP, RNA ligase
RNA oligo with MmeI site
RT-PCR
NlaIII
SAGE tag (unique sequence)
DITAGS (variable length)
Concatenated SAGE tags
Sequencing Results
One sequence run
~50Mb of data in ~400,000 reads
5’UTR 3’UTR SL
Data processing
Raw data
Remove: linker, small tags (<10), identical, junk
Blast against: Genome, EST, Exon, CDS
Unmatched tags
Blast against: small contigs, mitochondrion, bacterial, singletons
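The first cleaning step (remove linker, small tags, identical tags, junk) can be sketched as follows. This is an illustration only, not the lab's pipeline: the linker sequence is made up, and "junk" is assumed here to mean any tag containing characters other than A, C, G, T.

```python
def clean_tags(raw_reads, linker, min_len=10):
    """Strip a known linker, then drop short, junk, and duplicate tags."""
    seen = set()
    kept = []
    for read in raw_reads:
        tag = read.upper().replace(linker.upper(), "")
        if len(tag) < min_len:
            continue  # small tag (<10 by default)
        if set(tag) - set("ACGT"):
            continue  # assumed definition of junk: non-ACGT characters
        if tag in seen:
            continue  # identical to a tag already kept
        seen.add(tag)
        kept.append(tag)
    return kept


HYPOTHETICAL_LINKER = "GGATCC"  # placeholder, not the linker used in the study
reads = [
    "GGATCCACGTACGTACGTAA",  # linker + tag
    "ACGTACGTACGTAA",        # identical tag once the linker is removed
    "GGATCCACGT",            # too short after linker removal
    "ACGTNNNNACGTACGT",      # contains ambiguous bases
    "TTTTGGGGCCCCAAAA",      # clean tag
]
tags = clean_tags(reads, HYPOTHETICAL_LINKER)
print(tags)
```

The surviving tags would then go to the BLAST steps listed above.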
Mapping of Tags
[Figure: EST, 3’-tag, SL-tag and 5’-tag tracks mapped to 40S ribosomal protein S18]
Intra-Host Diversity of Influenza A Virus
Antigenic variants
Drug resistant and sensitive variants
HA1 HA2
566 aa
1,757 nt
Amplicons:
Mapped GS-FLX Sequence Reads on antigenic domain of Hemagglutinin
450bp
Mapped Translated GS-FLX Reads on Epitopes of HA1 Domain
E D A B D B D D E C
Patterns: Non-synonymous mutations are predominantly in epitope regions (13/19 sites)
[Figure: number of reads per variant at epitope sites (B, B, A, A, A, A, D); individual read counts are not legible in this copy]
Identifying rare variants: Drug resistance mutation
Resistant H1N1: 1/437 = 0.2%
agt (S) aat (N)
N31S
#reads
Matrix segment in H1N1 isolate
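Whether a codon change alters the amino acid, and how often it appears among the reads, can be checked with a few lines. The sketch below uses the standard genetic code and the two codons shown on the slide (agt and aat); the function names are illustrative.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_codon(codon):
    """Translate one DNA or RNA codon to its one-letter amino acid."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def is_nonsynonymous(codon_a, codon_b):
    """True if the two codons encode different amino acids."""
    return translate_codon(codon_a) != translate_codon(codon_b)


def variant_frequency(variant_reads, total_reads):
    """Fraction of reads carrying the variant."""
    return variant_reads / total_reads


print(translate_codon("agt"), translate_codon("aat"))
print(is_nonsynonymous("agt", "aat"))
print(f"{variant_frequency(1, 437):.1%}")
```

A single-base change from g to a at the second codon position is enough to switch between serine and asparagine at residue 31.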
SNP Analyses: Probability that Polymorphism is Real
Columns: Base #, A, C, G, N, T, GAP, SNP probability
pbShort (PolyBayes) - Marth Lab, Boston College
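To give a feel for why a probability is attached to each candidate SNP, here is a much simpler stand-in than PolyBayes: a binomial tail that asks how likely it is to see at least k variant reads from sequencing error alone. The per-base error rate of 0.5% is an assumption for illustration, not a value from the slides, and the model ignores base qualities entirely.

```python
from math import comb


def prob_at_least_k_errors(n_reads, k_variant_reads, error_rate):
    """P(X >= k) for X ~ Binomial(n_reads, error_rate)."""
    p_fewer = sum(
        comb(n_reads, i) * error_rate**i * (1 - error_rate) ** (n_reads - i)
        for i in range(k_variant_reads)
    )
    return 1.0 - p_fewer


ASSUMED_ERROR_RATE = 0.005  # hypothetical per-base error rate

# One variant read out of 437, as in the rare-variant example.
p_single = prob_at_least_k_errors(437, 1, ASSUMED_ERROR_RATE)

# Ten variant reads out of 437 for comparison.
p_ten = prob_at_least_k_errors(437, 10, ASSUMED_ERROR_RATE)

print(f"P(>=1 error read)  = {p_single:.3f}")
print(f"P(>=10 error reads) = {p_ten:.6f}")
```

Under this assumed error rate a single variant read is easily explained by error, which is why quality-aware models and error correction matter when calling variants at 0.2% frequency.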
Error Correction (homopolymer tracts)
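Pyrosequencing reads are prone to miscounting the length of homopolymer runs, so a first step in any correction scheme is to locate those runs. The sketch below only finds them; it does not correct anything, and the minimum run length of 3 is an arbitrary assumption.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=3):
    """Return (start, base, length) for each run of identical bases >= min_len."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


runs = homopolymer_runs("ACGTTTTTAGGGC")
print(runs)
```

Positions flagged this way are where an apparent insertion or deletion in a read deserves extra scrutiny before being counted as a real variant.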
Signal Processing: Length Distribution
Adjusting the stringency of quality filters changes the length distribution
Reads slightly shorter BUT average quality is higher
[Figure: read length distribution]
Default: 75,000 reads, avg length 200
Higher stringency: 70,000 reads, avg length 195
Signal Processing: Quality Distribution
Reduce the # of bases BUT increase the proportion of bases of HIGH QUALITY
[Figure: quality score distribution]
Default: 15 million bp
Higher stringency: 14 million bp
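The trade-off on these two slides (fewer bases, higher average quality) can be reproduced on toy data. The sketch below uses a simple mean-quality cutoff as a stand-in for the instrument's signal-processing filters, which it does not replicate; the quality values and threshold are invented. It also checks that the slide totals are consistent with reading 75,000 and 70,000 as read counts.

```python
def summarize(reads):
    """reads: list of per-base quality lists. Returns (n_reads, n_bases, mean_q)."""
    n_bases = sum(len(q) for q in reads)
    mean_q = sum(sum(q) for q in reads) / n_bases if n_bases else 0.0
    return len(reads), n_bases, mean_q


def filter_by_mean_quality(reads, min_mean_q):
    """Keep reads whose mean quality is at least min_mean_q."""
    return [q for q in reads if q and sum(q) / len(q) >= min_mean_q]


# Toy quality scores (invented for illustration).
toy_reads = [[30, 30, 30, 30], [10, 12, 8], [25, 28, 27, 26, 24]]
before = summarize(toy_reads)
after = summarize(filter_by_mean_quality(toy_reads, min_mean_q=20))
print(before, after)

# Slide totals: reads x average length.
default_bases = 75_000 * 200  # 15,000,000 bp
strict_bases = 70_000 * 195   # 13,650,000 bp, about 14 million
print(default_bases, strict_bases)
```

In the toy data the filter removes one read and three bases while the mean quality of what remains goes up, which is the same direction of change the slides report.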
Whole Virus Genome Sequencing
Limitation of read length BUT:
- Isolate single genome (limiting dilution, other?)
- Random prime or specific primers with barcodes
- Use barcode to amplify
- Multiplex: 20 barcodes, 16-well gasket = 320 samples
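The multiplexing capacity above is barcodes times gasket regions, and reads are assigned back to samples by the pair (region, barcode). A minimal sketch, assuming barcodes sit at the 5’ end of each read and must match exactly; the barcode sequences and sample names are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """reads: iterable of (region, sequence); barcodes: dict of barcode -> name."""
    bins = defaultdict(list)
    for region, seq in reads:
        for barcode, name in barcodes.items():
            if seq.startswith(barcode):
                # Store the read with its barcode trimmed off.
                bins[(region, name)].append(seq[len(barcode):])
                break
        else:
            bins[(region, "unassigned")].append(seq)
    return dict(bins)


N_BARCODES, N_REGIONS = 20, 16
capacity = N_BARCODES * N_REGIONS  # 320 samples per run

# Hypothetical barcodes; real ones would be designed to tolerate read errors.
barcodes = {"ACGTAC": "tagA", "TGCATG": "tagB"}
reads = [
    (1, "ACGTACGGGTTT"),
    (1, "TGCATGCCCAAA"),
    (2, "ACGTACTTTT"),
    (2, "GGGGGGGG"),
]
bins = demultiplex(reads, barcodes)
print(capacity, bins)
```

Exact prefix matching is the simplest possible rule; with homopolymer-prone reads a production pipeline would likely allow mismatches or use an aligner.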
Virus Genomic Library Construction - Discovery -
1a. Reverse transcription: RNA to cDNA (RT)
1b. DNA extension from random primers: cDNA or ssDNA to dsDNA (Klenow Exo- DNA polymerase)
2. Amplification from tags (PCR)
3. Size selection & sequencing: select 500 bp amplicons for emulsion PCR and pyrosequencing
[Figure: tagged random primers, drawn as NNNN, annealing along the template]
Multiplexing by Barcoding
Pools
Barcodes mapped onto reads (NUCMER)
MySQL db
BLASTN, BLASTX
Post-Processing Pipeline
Reads clustered and reduced to a unique set
26,750 contigs, BLASTN: 56% match human DNA
12,889 contigs, BLASTX: 120 match viruses
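Reducing reads to a unique set before BLAST saves a large number of redundant searches. The sketch below collapses only exact duplicates, treating a read and its reverse complement as the same sequence; the clustering in the actual pipeline is presumably similarity-based, which this does not attempt.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def unique_reads(reads):
    """Map each distinct sequence (strand-independent) to its read count."""
    counts = {}
    for read in reads:
        seq = read.upper()
        # Use the lexicographically smaller strand as the canonical key.
        key = min(seq, reverse_complement(seq))
        counts[key] = counts.get(key, 0) + 1
    return counts


counts = unique_reads(["AAAC", "GTTT", "aaac", "CCGG"])
print(counts)
```

Keeping the counts alongside the unique sequences preserves abundance information that would otherwise be lost when duplicates are dropped.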
Oral Microbiome Project
Periodontal Disease | Caries
[Figure: pooling design. VIRAL and BACTERIAL libraries, HIGH and LOW groups, four families per pool (Pool 1 shown), barcoded with TagA, TagB, TagC, TagD]
Sample IDs:
BU128, WV409, BK026, BR095
WV001, WV213, BK044, BU130
BR009, WV597, WV631, BU133
BR023, WV041, BU137, WV628
Bacterial Diversity Heat Maps:
Sequencing of 16S rRNA variable region
Sequencing of PCR amplicons 250 bp in size
Acknowledgments
School of Dental Medicine: Mary Marazita
Ghedin Lab, School of Medicine: Jay DePasse, Adam Fitch, Xu Zhang
Graduate School of Public Health: Robert Ferrell, Mike Barmaba
GPCL: Debby Hollingshead, Paul Wood, Janette Lamb
Funding: NIDCR/NIH, CTSI, JDRF, Burroughs-Wellcome Fund