FIND MEANING IN COMPLEXITY © Copyright 2014 by Pacific Biosciences of California, Inc. All rights reserved. For Research Use Only. Not for use in diagnostic procedures. Meredith Ashby, August 2015 The most comprehensive view of the human genome

Meredith Ashby, August 2015 cgs.hku.hk/portal/files/GRC/Events/Seminars/2015... · 2007 ABI 3730 Celera HuRef 2009 Illumina GA SOAP de novo BGI YH 2010 454 GS Flx Titanium Newbler

Page 1

FIND MEANING IN COMPLEXITY

© Copyright 2014 by Pacific Biosciences of California, Inc. All rights reserved. For Research Use Only. Not for use in diagnostic procedures.

Meredith Ashby, August 2015

The most comprehensive view of the human genome

Page 2

Human Genome

The Human Genome contains:
• Over 3 billion DNA base pairs
• Organized into 23 chromosomes
• With 2 copies of each chromosome (one maternal, one paternal)
• Carrying 20,000 genes
• Each encoding an average of 3 proteins

Accessing variation in the human genome enables discovery research.

“Much of the missing heritability (the 'dark matter' of the genome) will probably turn up as the technology advances.” - Francis Collins

Nature 464, 674-675 (1 April 2010)

Source: NHGRI Fact Sheet

Page 3

Human Genetics Research from Past to Future

Technology Waves after Initial Human Genome Sequencing (Past to Future):

1st: Genome Wide Association Studies (GWAS): Finding Disease Genes by associating SNPs

2nd: Exome (Re-) Sequencing: Finding Disease Genes by identifying protein-coding SNPs

3rd: Genome (Re-) Sequencing: Finding Disease Genes by identifying SNPs genome-wide

4th: More Comprehensive Genome (Re-) Sequencing: Comprehensive profiling of Disease Genes (SNPs, SVs, Haplotypes, Splice Variants)

5th: Whole-Genome de novo Sequencing: Comprehensive profiling of the entire genome (SNPs, SVs, Haplotypes, Splice Variants, Epigenetics)

Page 4

Draft genomes, even with trio data, provide insufficient information to resolve the cause of Mendelian disorders in 40% of cases


“Overall, we identified disease-causing variants in 21% of cases, with the proportion increasing to 34% (23/68) for Mendelian disorders and 57% (8/14) in family trios. We also discovered 32 potentially clinically actionable variants in 18 genes unrelated to the referral disorder, although only 4 were ultimately considered reportable. Our results demonstrate the value of genome sequencing for routine clinical diagnosis but also highlight many outstanding challenges.”

“To assess factors influencing the success of whole-genome sequencing for mainstream clinical diagnosis, we sequenced 217 individuals from 156 independent cases or families across a broad spectrum of disorders in whom previous screening had identified no pathogenic variants.”
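The percentages in the quoted passage follow directly from the case counts it gives. A minimal sketch of that arithmetic (the helper name `percent` is illustrative, not from the source):

```python
def percent(numerator, denominator):
    """Return numerator/denominator as a percentage, rounded to a whole number."""
    return round(100 * numerator / denominator)

# Counts taken from the quoted passage above.
mendelian = percent(23, 68)  # Mendelian disorders with a causal variant found
trios = percent(8, 14)       # family trios with a causal variant found
print(mendelian, trios)
```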

Page 5

Eichler et al., Nature Reviews Genetics 11, 446-450 (June 2010) | doi:10.1038/nrg2809

Page 6

Levy et al. (2007) PLoS Biology 5: e254

Structural Variation is the Most Predominant Form of Sequence Polymorphism in the Human Genome

“Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure.”

Page 7

Whole vs. Hole Genome Sequencing

Page 8

Short Reads and the Cost per Genome Dilemma: Quantity vs. Quality

Sequencing cost is down, but assembling a de novo human genome that meets the same scientific standard as the initial work does NOT follow Moore’s law.

Assembly N50 values (figure labels):
• HGP: N50 ~100 kb
• NCBI-34: contig N50 29 Mb
• HuRef: 107 kb
• BGI YH: 7.4 kb
• KB1: 5.5 kb
• NA12878: 24 kb
• CHM1: 144 kb
• RP11: 127 kb

According to the NHGRI website, the definition of “sequencing a genome” changed in 2008, when resequencing became favored, trading quality for lower cost.

The 1000 Genomes Project starts in 2008.
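For readers comparing the N50 figures on this slide: contig N50 is commonly defined as the length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal sketch of that calculation follows; the contig lengths are made up for illustration and are not data from any assembly named here.

```python
def contig_n50(lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the total bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Hypothetical contig lengths in kb (illustration only).
toy_contigs = [100, 80, 60, 40, 20]
print(contig_n50(toy_contigs))  # 80: the two longest contigs cover 180 of 300 kb
```

Because N50 is weighted by bases rather than by contig count, a few very long contigs raise it far more than many short ones.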

Page 9

Why are High Quality Genomes Important?

“To get to a medical grade genome, or a genome that can be used for clinical diagnostic purposes, we need to have the most accurate and complete genome for each individual. We believe that the PacBio SMRT machines will help us reach this goal.”

For the Asian Genome Project, Macrogen and GMI investigators are seeking a more complete Asian reference genome to pursue detailed analyses of the populations in Asia.

"Our goal is to make a complete Asian reference genome for future medical practice," Macrogen's Seo said, noting that the team is pursuing a "medical grade" genome sequence that is highly accurate and can serve as a reference in both research and clinical settings.

Page 10

Figure 2. Worldwide Frequency Distribution of APOBEC3B Deletion

PLoS Genet. 2007 Apr; 3(4): e63.

Using a population specific reference improves the mapping accuracy of short read data

Page 11

PacBio SMRT Sequencing Has Unique Strengths

1. Contiguity

– Average read lengths up to 12 kb

– Some reads >40 kb

2. Uniformity

– Lack of GC content or sequence complexity bias

3. Accuracy

– Achieves >99.999% (QV50)

– Lack of systematic sequencing errors

4. Native DNA

– No DNA amplification

– Epigenome characterization
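The accuracy item above pairs a percentage with a Phred-style quality value (QV). Under the usual definition, QV = -10 · log10(error rate), so 99.999% accuracy corresponds to QV50. A small sketch of the conversion (the function names are illustrative):

```python
import math

def accuracy_to_qv(accuracy):
    """Convert an accuracy fraction (e.g. 0.99999) to a Phred-style QV."""
    error = 1.0 - accuracy
    if error <= 0.0:
        raise ValueError("accuracy must be below 1.0")
    return -10.0 * math.log10(error)

def qv_to_accuracy(qv):
    """Convert a Phred-style QV back to an accuracy fraction."""
    return 1.0 - 10.0 ** (-qv / 10.0)

print(round(accuracy_to_qv(0.99999)))  # 50
print(qv_to_accuracy(50))              # approximately 0.99999
```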

Page 12

Latest P6-C4 Chemistry Read Length Performance

Data per SMRT Cell: 0.5 – 1 Gb

• 20 kb size-selected human library
• 4 hour movie
• P6-C4 chemistry

Page 13

PCR-free Sample Preparation Workflow Means Less Bias

Ross et al. (2013) Characterizing and measuring bias in sequence data. Genome Biology, May 29;14(5):R51

“Pacific Biosciences coverage levels are the least biased”

Page 14

SMRT® Sequencing Accuracy – Errors are Random

1st & 2nd gen (conceptual)

SMRT sequencing
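One way to see why random errors matter: if errors are independent between reads, a majority vote across overlapping reads is wrong only when more than half of the reads err at the same position, which becomes rapidly less likely as coverage grows. Systematic errors, by contrast, repeat in every read and do not average out. The toy binomial model below illustrates the random case. The 15% per-read error rate and the coverage values are assumptions chosen for illustration, not figures from this presentation, and real consensus algorithms are more sophisticated than a simple vote.

```python
from math import comb

def majority_error(per_read_error, coverage):
    """Probability that more than half of `coverage` independent reads are
    wrong at one position (simple binomial model, illustration only)."""
    p = per_read_error
    first_losing_count = coverage // 2 + 1
    return sum(
        comb(coverage, k) * p**k * (1 - p) ** (coverage - k)
        for k in range(first_losing_count, coverage + 1)
    )

# Hypothetical per-read error rate; odd coverages avoid tied votes.
for cov in (1, 5, 15, 31):
    print(cov, majority_error(0.15, cov))
```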


Page 15

Long, Unbiased, Highly Accurate Reads Resolve ‘Difficult-to-Sequence’ Regions

• Resolve long palindromes
• Identify structural variants
• Obtain accurate microsatellite lengths
• Span homopolymeric, low-complexity, and highly repetitive regions
• Delineate tandem repeats

Loomis et al. (2013) Sequencing the unsequenceable: Expanded CGG-repeat alleles of the fragile X gene. Genome Research, 23(1):121-8

Fragile X gene with >2 kb of repeat regions

PacBio reads span extreme CGG repeats and AT-rich regions

Page 16

Human Genome De Novo Assemblies Comparison

Year  Technology            Assembler         Sample     Contig N50 (Mb)
2007  ABI 3730              Celera            HuRef      0.11
2009  Illumina GA           SOAP de novo      BGI YH     0.007
2010  454 GS Flx Titanium   Newbler           KB1        0.006
2010  Illumina GA           ALLPATHS-LG       NA12878    0.024
2013  454 GS, HiSeq, MiSeq  Newbler           RP11_0.7   0.13
2014  HiSeq, BAC clones     Reference-guided  CHM1       0.14
2014  PacBio RS II          FALCON            CHM1       4.38
2015  PacBio RS II          FALCON            CHM13      12.98
2015  PacBio RS II          FALCON            AK1        7.28
2015  PacBio RS II          FALCON            HuRef      10.38
2015  PacBio RS II          FALCON            PC-9*      3.58
2015  PacBio RS II          FALCON            SK-BR-3*   2.56

*cancer cell lines

Data sources: HuRef (Venter) (http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.0050254); BGI YH (http://genome.cshlp.org/content/20/2/265.abstract Table II); KB1 (http://www.nature.com/nature/journal/v463/n7283/full/nature08795.html); NA12878 (http://www.pnas.org/content/early/2010/12/20/1017351108.abstract Table 3); CHM1 Illumina (http://www.ncbi.nlm.nih.gov/assembly/GCF_000306695.2/)

Page 17

Short Read Data Contains Coverage Gaps that Thwart Assembly

[Figure: genome browser tracks: PacBio coverage, PacBio reads, HiSeq coverage, HiSeq reads, chromosome, genes]

PacBio RS II data at http://datasets.pacb.com/2014/Human54x/fast.html

Illumina HiSeq data at http://www.ncbi.nlm.nih.gov/sra/SRR642636
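A minimal sketch of how coverage gaps like those in the figure could be located from per-base read depth. The depth values below are invented, and `coverage_gaps` and `min_depth` are hypothetical names; in practice the depth would come from an alignment tool run on the datasets linked above.

```python
def coverage_gaps(depth, min_depth=1):
    """Return half-open (start, end) intervals where depth < min_depth."""
    gaps = []
    start = None
    for pos, d in enumerate(depth):
        if d < min_depth:
            if start is None:
                start = pos          # a gap opens here
        elif start is not None:
            gaps.append((start, pos))  # the gap closed just before pos
            start = None
    if start is not None:
        gaps.append((start, len(depth)))  # gap runs to the end of the region
    return gaps

# Invented per-base depth for a 10 bp region (illustration only).
toy_depth = [30, 28, 0, 0, 0, 12, 25, 0, 31, 29]
print(coverage_gaps(toy_depth))  # [(2, 5), (7, 8)]
```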

Page 18

SHANK3 – Involved in Autism

HiSeq data miss 1st exon and upstream region entirely

[Figure: genome browser tracks: PacBio coverage, PacBio reads, HiSeq coverage, HiSeq reads, chromosome, genes]

Page 19

SHANK3 – Involved in Autism

85% GC-rich

CCCCGTCACAGCCCCCCAGACCCCCGCCCCGTGGCTCGGCCCCCGCCCTCCGCACACACCT

CCCGCCCCCACCCGGGACCCCGCAAGTAACCCCCCAGCACTGGCCCTGAGCCCTCCCGGCC

CCCGCCTCCGGCGCAGCCCCCTCGCCACCCCCGCTTCCCTCCCGTCTCAGGCCCCCTCCCCC

CGCCGCCCCCGCCCCCGGGGAAGGCAGGCGCCGAGCTGAGCCGGGGCCGATGCAGCTG

AGCCGCGCCGCCGCCGCCGCCGCCGCCGCCCCTGCGGAGCCCCCGGAGCCGCTGTCCCCC

GCGCCGGCCCCGGCCCCGGCCCCCCCCGGCCCCCTCCCGCGCAGCGCGGCCGACGGGGC

TCCGGCGGGGGGGAAGGGGGGGCCGGGGCGCCGCGCGCGGAGTCCCCGGGCGCTCCG

TTCCCCGGCGCGAGCGGCCCCGGCCCGGGCCCCGGCGCGGGGATGGACGGCCCCGGGG

CCAGCGCCGTGGTCGTGCGCGTCGGCATCCCGGACCTGCAGCAGACGGTGAGCCCCG
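A GC fraction like the one quoted above can be computed directly from sequence. A minimal sketch follows; the function names are illustrative, and the sample string is the first line of the sequence shown on this slide.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc_count = sum(1 for base in seq if base in "GC")
    return gc_count / len(seq)

def windowed_gc(seq, window):
    """GC fraction in consecutive, non-overlapping windows of a fixed size."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]

first_line = "CCCCGTCACAGCCCCCCAGACCCCCGCCCCGTGGCTCGGCCCCCGCCCTCCGCACACACCT"
print(f"{gc_fraction(first_line):.0%}")
```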

[Figure: genome browser tracks: PacBio coverage, PacBio reads, HiSeq coverage, HiSeq reads, chromosome, genes]

Page 20

Uncovering Structural Variation in Gold Standard Genomes

Page 21

Types of Structural Variation (SV) in a Human Genome

Hurles et al. (2008) Trends in Genetics 24: 238-245

Page 22

Behavioral Diseases Associated with Structural Variation

Girirajan & Eichler (2010) Human Molecular Genetics 19: R176-187

Page 23

Pang et al. (2014) G3 (Bethesda) 4: 63-65

Structural Variation Detection by 2nd Gen Technologies

“Generating longer reads … can mitigate these shortcomings.”

“We observed that current high-throughput sequencing approaches only detected a fraction of the full size-spectrum of insertions, deletions, and copy number variants compared with a previously published, Sanger-sequenced human genome.”

Page 24

Human Genetic Variation Sequencing with PacBio Long Reads

[Figure: variant types plotted against size of variant (x axis, 1 to 100 Mb).]

Variant types: SNPs; small indels; STRs & VNTRs; large insertions, deletions, CNV; mobile elements; complex variants; phased SVs.

One PacBio read spans most variants. Labels: indels, repeat expansions, structural variants, phased alleles, complex regions.

Assembled PacBio reads span euchromatic genome variation. Labels: L1, Alu, SVA; copy number variation; inversions; phased SNVs; phased alleles; very large SVs; phased haplotypes; large structural rearrangement.

Page 25

Human Genome Sequencing with the PacBio RS II

Chaisson et al. (2014) Nature doi:10.1038/nature13907

Page 26

Updated CHM1 Assembly With PacBio’s Newest Chemistry

Hydatidiform Mole (CHM1) Assembly Contig Alignments to Selected Chromosomes (Chr. 2, Chr. 6, Chr. 20, Chr. 21)

From ~50x CHM1 data with the latest P6 chemistry, we get an assembly of:
• Total = 2.9 Gbp
• N50 = 27.9 Mbp
• Max = 109.3 Mb
• Two contigs ~109 Mb
• Assembly contigs ~ chromosome arms

Assembler code available at https://github.com/PacificBiosciences/FALCON

Page 27

PacBio CHM1 Data vs. GRCh37 & 1000 Genomes Project

• Resolved 26,079 euchromatic structural variants at the base-pair level
• ~22,000 (85%) of these are novel
• 6,796 of the events map within 3,418 genes
• Validation rate of 97%, only a fraction of which are detectable with short reads
• Closes/extends 55% of the remaining gaps in human reference genome

Chaisson et al. (2014) Nature doi:10.1038/nature13907

Page 28

PacBio CHM1 Data vs. GRCh37 & 1000 Genomes Project

• The percentage repeat composition (x axis) of 1-kb sequences flanking insertion sites for Alu, L1 and SVA mobile element insertions.

• Insertion calls from the 1000 Genomes Project (pink) compared to calls from CHM1 using PacBio reads (blue) show that short read data miss repeat-rich insertion events.

Chaisson et al. (2014) Nature doi:10.1038/nature13907

Page 29

SV Insertion Sequence Data Missing from GRCh37 Reference

A net insertional bias of 3.9 megabases (Mb) of additional sequence is revealed in the PacBio CHM1 data when compared to the human reference.

Chaisson et al. (2014) Nature doi:10.1038/nature13907

Page 30

Short Tandem Repeats (STR’s) Easier to Detect with Long Reads

• 6,007 STR’s

• 2,285 of these expanded STRs occur within genes

• 2,760 Tandem Repeats

Page 31

Neurologic Diseases Associated with STRs Occurring within Genes

http://www.dialogues-cns.com/publication/a-glossary-of-relevant-genetic-terms/

Page 32

Revealing the Complexity of Cancer Genomes

Page 33

SK-BR3 Cancer Genome Sequencing, Dick McCombie, CSHL

Davidson et al, 2000

Cancer genomes can have tremendous structural variation, with chromosomal fragmentation, rearrangement, and large duplication events.

Most commonly used Her2-amplified breast cancer cell line

W. Richard McCombie, CSHL http://schatzlab.cshl.edu/presentations/2015/2015.02.27.AGBT%20PacBio%20SKBR3.pdf

Page 34

Genome-wide Alignment Coverage of PacBio Reads Reveals the Highly Aneuploid Nature of this Cancer Genome

• Genome-wide coverage averages around 54X
• Coverage per chromosome varies greatly, as expected in this highly aneuploid genome

[Figure label: Her2]

Total Assembly: 2.64 Gbp
Contig N50: 2.56 Mbp
Max Contig: 23.5 Mbp

W. Richard McCombie, CSHL http://schatzlab.cshl.edu/presentations/2015/2015.02.27.AGBT%20PacBio%20SKBR3.pdf

Page 35

Complex Structural Variant Discovery is Facilitated By Long Reads

[Figure: reads split-mapped between Chromosome A and Chromosome B]

• Alignment-based split read analysis efficiently captures most structural variation events

• Follow up with the Parliament workflow is planned
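To make the idea of split-read analysis concrete, the sketch below labels the junction between two aligned pieces of a single long read. It is a simplified illustration under stated assumptions, not the workflow used for these results: the `Segment` record, the `classify_junction` function, the 50 bp `min_size` cutoff, and the example coordinates are all hypothetical, and insertions (which need read coordinates as well) are not handled.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    """One aligned piece of a read (hypothetical record layout)."""
    chrom: str
    ref_start: int  # 0-based start on the reference
    ref_end: int    # end on the reference (exclusive)
    strand: str     # "+" or "-"

def classify_junction(a, b, min_size=50):
    """Label the junction between two pieces of one read, given in read order."""
    if a.chrom != b.chrom:
        return "inter-chromosomal"
    if a.strand != b.strand:
        return "inversion"
    # Reference distance skipped between the two pieces, in read direction.
    if a.strand == "+":
        gap = b.ref_start - a.ref_end
    else:
        gap = a.ref_start - b.ref_end
    if gap >= min_size:
        return "deletion"      # reference bases are missing from the read
    if gap <= -min_size:
        return "duplication"   # the read jumps back over reference already covered
    return "colinear"

# Invented coordinates for illustration.
left = Segment("chr17", 100, 900, "+")
right = Segment("chr8", 5000, 5800, "+")
print(classify_junction(left, right))  # inter-chromosomal
```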

W. Richard McCombie, CSHL http://schatzlab.cshl.edu/presentations/2015/2015.02.27.AGBT%20PacBio%20SKBR3.pdf

Page 36

• 94 high-confidence inter-chromosomal translocations were identified in the SKBR3 genome assembly

• 1200 Intra-chromosomal translocations were found

A High Quality Assembly Enables the Direct Detection of Interchromosomal Translocations

Her2

W. Richard McCombie, CSHL http://schatzlab.cshl.edu/presentations/2015/2015.02.27.AGBT%20PacBio%20SKBR3.pdf

Page 37

A Gold Standard Assembly Enables Cancer Lesion Reconstruction

[Figure labels: PacBio, Her2, chr17]

By comparing the proportion of reads that are spanning or split at breakpoints we can begin to infer the history of the genetic lesions.

1. Healthy diploid genome
2. Original translocation into chromosome 8
3. Duplication, inversion, and inverted duplication within chromosome 8
4. Final duplication from within chromosome 8

W. Richard McCombie, CSHL http://schatzlab.cshl.edu/presentations/2015/2015.02.27.AGBT%20PacBio%20SKBR3.pdf

Page 38

Gold Standard Assemblies Reveal the Locations of Oncogene Duplications and Gene Fusion Events

Oncogene amplifications
ErbB2 (Her2/neu)  ≈20X
MYC               ≈27X
MET               ≈8X

Known gene fusions        Confirmed by PacBio reads?
TATDN1  GSDMB             Yes
RARA  PKIA                Yes
ANKHD1  PCDH1             Yes
CCDC85C  SETD3            Yes
SUMF1  LRRFIP2            Yes
WDR67 (TBC1D31)  ZNF704   Yes
DHX35  ITCH               Yes
NFS1  PREX1               Yes *read-through transcription
CYTH1  EIF3H              Yes *nested inside 2 translocations

W. Richard McCombie, CSHL http://schatzlab.cshl.edu/presentations/2015/2015.02.27.AGBT%20PacBio%20SKBR3.pdf

Page 39

Her2+ Breast Cancer Reference Genome is Available for Download

http://schatzlab.cshl.edu/data/skbr3/

Available now under the Toronto Agreement:
• Fastq & BAM files of aligned reads
• Interactive Coverage Analysis with BAM.IOBIO

Available soon:
• Whole-genome assembly and methylation analysis
• Comparison to single cell analysis of >100 individual cells
• Iso-Seq data

W. Richard McCombie, CSHL http://schatzlab.cshl.edu/presentations/2015/2015.02.27.AGBT%20PacBio%20SKBR3.pdf

Page 40

Comprehensive View of a Cancer Genome & Epigenome

• PC-9 non-small cell lung cancer cell line

• Apply PacBio sequencing for:
  – De novo long-read assembly of a cancer genome
  – Characterization of gene fusions
  – Genome-wide methylome characterization

• Sequenced both drug-sensitive and drug-resistant samples

[Figure labels: drug-sensitive; sustained treatment with erlotinib; drug-resistant; stably drug-resistant; mutations; MT inhibitor]

Collaboration with M. Classon, V. Janakiraman, E. Stawiski, S. Durinck, S. Seshagiri (Genentech) & Y. Suzuki, S. Morishita (U of Tokyo)

Page 41

PC-9 Cancer Genome De Novo Assembly

PacBio

# of contigs 12,359

Contig N50 1.044 Mb

Max contig length 26.6 Mb

PacBio assembly performed by J. Chin. Collaboration with M. Classon, V. Janakiraman, E. Stawiski, S. Durinck, S. Seshagiri (Genentech) & Y. Suzuki, S. Morishita (U of Tokyo)

Page 42

PC-9 Cancer Genome De Novo Assembly

                   PacBio     Short-read [1]  Improvement
# of contigs       12,359     424,605         34x
Contig N50         1.044 Mb   0.018 Mb        58x
Max contig length  26.6 Mb    0.28 Mb         95x

PacBio assembly performed by J. Chin. Collaboration with M. Classon, V. Janakiraman, E. Stawiski, S. Durinck, S. Seshagiri (Genentech) & Y. Suzuki, S. Morishita (U of Tokyo)

[1] Nagarajan et al. (2012) Whole-genome reconstruction and mutational signatures in gastric cancer. Genome Biology 13: R115.
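The improvement column can be reproduced from the other two columns. Note the direction of each ratio: for contig count, fewer is better, so the short-read value is divided by the PacBio value; for N50 and maximum contig length, larger is better, so the ratio is inverted. A minimal sketch (the dictionary keys are illustrative names):

```python
# Values copied from the comparison table on this slide.
pacbio = {"contigs": 12_359, "n50_mb": 1.044, "max_contig_mb": 26.6}
short_read = {"contigs": 424_605, "n50_mb": 0.018, "max_contig_mb": 0.28}

# Fewer contigs is better: short-read count divided by PacBio count.
contig_fold = short_read["contigs"] / pacbio["contigs"]
# Longer is better: PacBio length divided by short-read length.
n50_fold = pacbio["n50_mb"] / short_read["n50_mb"]
max_fold = pacbio["max_contig_mb"] / short_read["max_contig_mb"]

print(round(contig_fold), round(n50_fold), round(max_fold))  # 34 58 95
```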

Page 43

Mapping against Human Reference Sequence

• Observing unusual sequencing coverage across chromosomes in both samples (CHM1 as control):

• Example: chromosome 3:

[Figure: chromosome 3 coverage, PC-9 vs. CHM1]

Note: the gap in the center is the centromere (no reference sequence available)

Page 44

Long Reads Reveal Gene Fusions

• Example: CPSF3-ASAP2

CPSF3

Collaboration with M. Classon, V. Janakiraman, E. Stawiski, S. Durinck, S. Seshagiri (Genentech) & Y. Suzuki, S. Morishita (U of Tokyo)

Page 45

Gene Fusion Mapping Result: CPSF3-ASAP2

• Exact break points identified:

[Figure labels: ASAP2, CPSF3; same read, split-mapped]

Collaboration with M. Classon, V. Janakiraman, E. Stawiski, S. Durinck, S. Seshagiri (Genentech) & Y. Suzuki, S. Morishita (U of Tokyo)

Break point 1: Chr2: 9,391,888

Break point 2: Chr2: 9,438,556
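Both break points lie on chromosome 2, so the reference distance between them can be read off directly. A small sketch that parses the coordinates as written on this slide (`parse_breakpoint` and `breakpoint_span` are illustrative names):

```python
def parse_breakpoint(text):
    """Parse a string like 'Chr2: 9,391,888' into (chromosome, position)."""
    chrom, pos = text.split(":")
    return chrom.strip(), int(pos.replace(",", "").strip())

def breakpoint_span(first, second):
    """Reference distance in bp between two break points on one chromosome."""
    chrom1, pos1 = parse_breakpoint(first)
    chrom2, pos2 = parse_breakpoint(second)
    if chrom1 != chrom2:
        raise ValueError("break points are on different chromosomes")
    return abs(pos2 - pos1)

print(breakpoint_span("Chr2: 9,391,888", "Chr2: 9,438,556"))  # 46668
```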

Page 46

Detection of DNA Base Modifications Using Kinetics

Example: N6-methyladenine

Flusberg et al. (2010) Nature Methods 7: 461-465

[Figure: example template strands, one containing mA (N6-methyladenine)]

• SMRT Sequencing uses kinetic information from each nucleotide addition to call bases

• This same information can be used to distinguish modified and native bases by comparing results of SMRT Sequencing to an in silico kinetic reference for incorporation dynamics without modifications.

Page 47

Detection of DNA Base Modifications by SMRT Sequencing

Flusberg et al. (2010) Nature Methods 7: 461-465

SMRT Portal v1.3.3+ can recognize and annotate multi-site modified-base signatures.

[Figure panels: 5-mC, 4-mC, 6-mA]

Calculation of IPD ratios across the reference gives information about base modification at every position.
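An IPD (interpulse duration) ratio compares the observed time between nucleotide incorporations at a position with the time expected for unmodified DNA. The sketch below shows the basic ratio only. It is not the SMRT Portal implementation: the function names, the invented IPD values, and the fixed threshold of 2.0 are assumptions for illustration, and production tools apply statistical tests rather than a simple cutoff.

```python
from statistics import mean

def ipd_ratios(observed_ipds, reference_ipds):
    """Per-position ratio of mean observed IPD to the expected (unmodified) IPD.

    observed_ipds: one list of IPD measurements per reference position.
    reference_ipds: one expected IPD value per reference position.
    """
    return [mean(obs) / ref for obs, ref in zip(observed_ipds, reference_ipds)]

def flag_modified(ratios, threshold=2.0):
    """Return positions whose IPD ratio meets a (hypothetical) threshold."""
    return [pos for pos, ratio in enumerate(ratios) if ratio >= threshold]

# Invented measurements for three positions (arbitrary time units).
observed = [[1.0, 1.2, 0.8], [4.0, 6.0, 5.0], [0.9, 1.1, 1.0]]
expected = [1.0, 1.0, 1.0]
ratios = ipd_ratios(observed, expected)
print(flag_modified(ratios))  # [1]
```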

Page 48

Detecting Hypo-methylated CpG Islands in Eukaryotic DNA

• The signal strength for 5mC is weaker than for other methylated bases and so requires higher coverage to identify. However, in eukaryotes methylation is a regional phenomenon.

• Prof. Shinichi Morishita developed a method to differentiate hypo- and hyper-methylated regions by integrating the signals across CpG islands

• 16-fold per-strand coverage maximizes the accuracy of the results

• Algorithm is freely available at https://github.com/hacone/AgIn

http://www.ashg.org/2014meeting/abstracts/fulltext/f140120432.htm

Page 49

Epigenome Characterization

• Global methylation status:

• More hypermethylated CpG islands in drug-resistant sample

[Figure: % hypermethylated CpG islands (y axis, 60 to 85) by chromosome (x axis, 1 to 22), drug-sensitive vs. drug-resistant]

Collaboration with M. Classon, V. Janakiraman, E. Stawiski, S. Durinck, S. Seshagiri (Genentech) & Y. Suzuki, S. Morishita (U of Tokyo)

Page 50

Epigenome Characterization

• Methylation status of CpG islands (https://github.com/hacone/AgIn)

• Chr4: ANKRD17 (breast cancer)

Collaboration with M. Classon, V. Janakiraman, E. Stawiski, S. Durinck, S. Seshagiri (Genentech) & Y. Suzuki, S. Morishita (U of Tokyo)

Page 51

Epigenome Characterization

• Methylation status of CpG islands (https://github.com/hacone/AgIn)

• Chr4: WHSC1 (myelomas)

Collaboration with M. Classon, V. Janakiraman, E. Stawiski, S. Durinck, S. Seshagiri (Genentech) & Y. Suzuki, S. Morishita (U of Tokyo)

Page 52

Epigenome Characterization

• Methylation status of CpG islands (https://github.com/hacone/AgIn)

• Chr4: FGFR3 (fibroblast growth factor receptor 3)

Collaboration with M. Classon, V. Janakiraman, E. Stawiski, S. Durinck, S. Seshagiri (Genentech) & Y. Suzuki, S. Morishita (U of Tokyo)

Page 53

Epigenome Characterization

• Methylation status of CpG islands (https://github.com/hacone/AgIn)

• Chr4: RASSF6 (tumor suppressor gene)

Collaboration with M. Classon, V. Janakiraman, E. Stawiski, S. Durinck, S. Seshagiri (Genentech) & Y. Suzuki, S. Morishita (U of Tokyo)

PacBio Long-read Sequencing Allows for Detailed, De Novo Characterization of Cancer Genome & Epigenome

Page 54

Take Structural Variant Discovery to ‘11’ with Low PacBio Coverage

Page 55

Hybrid Structural Variant Calling Using Multiple Data Sources

Data sources include whole-genome PacBio Long-Read sequencing at 10x coverage (10,000 bp read length)

English et al (2015). Assessing structural variation in a personal genome—towards a human reference diploid genome BMC Genomics

Page 56

Structural Variant Discovery with 10X PacBio Coverage

PBHoney alone, with only 10x PacBio coverage, identifies 4,268 SVs supported by hybrid assembly, representing events “invisible” to PE data.

English et al (2015). Assessing structural variation in a personal genome—towards a human reference diploid genome BMC Genomics

“Applying multiple Parliament workflows, we demonstrate that while method integration is optimal for SV detection in Illumina paired-end data, the addition of long-read data can more than triple the number of SVs detectable in a personal genome.”

Page 57

SV Calling: Illumina-only method comparison

Limitations of Illumina-only methods for SV Calling:

High False Discovery Rate (FDR):
- Ranges from 3% to > 80%

Low Sensitivity:
- Ranges from < 8% to a max of 57%

English et al (2015) BMC Genomics

“Despite these benefits of a multi-algorithm approach, Illumina-only discovery still only recovers approximately half of the 9,777 SVs identified by multi-source Parliament”
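For reference, the two metrics on this slide have standard definitions: FDR is the share of reported calls that are false, and sensitivity is the share of true events that are recovered. A minimal sketch with invented counts (not data from English et al.):

```python
def false_discovery_rate(true_pos, false_pos):
    """Fraction of reported calls that are false: FP / (TP + FP)."""
    return false_pos / (true_pos + false_pos)

def sensitivity(true_pos, false_neg):
    """Fraction of true events that were called: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)

# Invented counts for a hypothetical SV caller (illustration only).
tp, fp, fn = 570, 30, 430
print(false_discovery_rate(tp, fp))  # 0.05
print(sensitivity(tp, fn))           # 0.57
```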

Page 58

PacBio Provides the Most Comprehensive View of Genetic Variation in a Human Genome

“We now have access to a whole new realm of genetic variation that was opaque to us before.”

“Knowing all the variation is going to be a game changer.”

- Professor Evan Eichler, University of Washington

Source: http://www.sciencenewsline.com/articles/2014111017120055.html

Page 59

For Research Use Only. Not for use in diagnostic procedures. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.