Whole Genome Sequencing for Colorectal Cancer. Ulrike (Riki) Peters, Fred Hutchinson Cancer Research Center, University of Washington. Overview: significance and rationale; current efforts on rare and less frequent variants; specific aims and design of whole genome sequencing grant.
WHOLE GENOME SEQUENCING FOR COLORECTAL CANCER
Ulrike (Riki) Peters
Fred Hutchinson Cancer Research Center
University of Washington
Overview
• Significance and rationale
• Current efforts on rare and less frequent variants
• Specific aims and design of whole genome sequencing grant
Progress of Genomic Research (adapted from Green and Guyer Nature 2011)
[Figure: stages of genomic research (structure of genomes, biology of genomes, biology of diseases, advancing medicine, improving healthcare & prevention) across four periods: 1990-2003 (Human Genome Project), 2004-2010, 2011-2020, beyond 2020]
Examples of GWAS for Drug Targets
Drug            Drug target   Drug indication              GWAS trait
Statins         HMGCR         Hypercholesterolemia         LDL, cholesterol
Znt8 agonists   SLC30A8       Type 2 diabetes              Type 2 diabetes
Ustekinumab     IL12B         Psoriasis, Crohn’s disease   Psoriasis, Crohn’s disease
For additional examples, see Sanseau et al. Nat Biotechnol 2012
Examples of GWAS for Drug Repositioning
Drug                Drug target   Current drug indication          GWAS trait
Nepicastat          DBH           Post-traumatic stress disorder   Smoking cessation
Denosumab/AMG-162   TNFSF11       Osteoporosis/bone cancer         Crohn’s disease
Biib-003            LINGO-1       Multiple sclerosis               Essential tremor
Use of GWAS Findings to Inform Screening Decisions (using breast cancer as example)
So et al. Am J Hum Genet 2010
Colors show 10-year risk of breast cancer at different risk percentiles based on 13 GWAS loci
Average 10-year risk of breast cancer for a 50-year-old woman is 2.4%
What Is Known About the Genetic Contribution to Colorectal Cancer
Scandinavian Twin Registry, Lichtenstein et al. New Engl J Med 2000
Cancer site   Heritable factors   Shared environmental factors   Non-shared environmental factors
Prostate      0.42 (0.29-0.50)    0 (0-0.09)                     0.58 (0.50-0.67)
Colorectal    0.35 (0.10-0.48)    0.05 (0-0.23)                  0.60 (0.52-0.70)
Bladder       0.31 (0.00-0.45)    0 (0-0.28)                     0.69 (0.53-0.86)
Breast        0.27 (0.04-0.41)    0.06 (0-0.22)                  0.67 (0.56-0.76)
Lung          0.26 (0.00-0.49)    0.12 (0-0.34)                  0.62 (0.51-0.73)
Colorectal Cancer GWAS
• 21 GWAS loci
• Each SNP associated with a modest increase in risk
Published and Newly Discovered Colorectal Cancer Susceptibility Loci
Houlston Nat Genet 2010; Tomlinson Nat Genet 2008; Zanke Nat Genet 2007; Haiman Nat Genet 2007; Hutter BMC Cancer 2010; Tenesa Nat Genet 2008; Tomlinson Nat Genet 2011; COGENT Nat Genet 2008; Jaeger Nat Genet 2008; Broderick Nat Genet 2007; Peters, Hunter Hum Genet 2011; Dunlop Nat Genet 2012; Peters Gastroenterol (submitted)
Identified within GECCO
Estimated Total Number of GWAS Hits
Phenotype                                Estimated number of GWAS hits (95% CI)   Total genetic variance explained (95% CI)
Height                                   201 (75, 494)                            16.4 (10.6, 30.6)
Crohn’s disease                          142 (71, 244)                            20.0 (15.7, 28.0)
Breast, Prostate and Colorectal Cancer   67 (31, 173)                             17.1 (11.6, 35.8)
Park et al. Nat Genet 2010
=> Known familial syndromes, such as FAP and Lynch Syndrome, explain less than 3-5%
What Explains Missing Heritability of Cancer?
• Additional familial syndromes
• Heritable epigenomic variability
• Gene-gene and gene-environment interaction
• Less frequent and rare variants
• Structural variations / copy number variation (CNV)
• Others, or heritability may be overestimated
Most Genetic Variation is Rare
[Figure: variants by minor allele frequency. Green: ESP; Orange: ENCODE; Blue: HapMap]
• GWAS only investigated ~15% of genetic variation
• Next-generation sequencing can identify rare variants
Feasibility to Identify Genetic Variants by Risk Allele Frequency and Strength of Genetic Effect
Manolio et al. Nature 2009
Overview
• Significance and rationale
• Current efforts on rare and less frequent variants
• Specific aims and design of whole genome sequencing grant
Current efforts in GECCO to Search for Less Frequent and Rare Variants (Genetics and Epidemiology of Colorectal Cancer Consortium)
The global view of genetic contribution to colorectal cancer
GECCO Coordinating Center (FHCRC)
Studies: WHI, ARCTIC, VITAL, DACHS, PLCO, CPS, ASTERISK, DALS, Colo2&3, MEC, PHS, HPFS, NHS, CCFR, MECC, NGCC, HRT-CCFR
~30,000 subjects
U01 and X01, Peters, 2009-2013
• Imputation to 1000 Genomes Project in ~28,000 samples with GWAS
• Exome chip genotyping
  • On about 25,000 samples
  • CIDR pilot
• Whole exome sequencing on 130 high-risk colorectal cancer cases + 30 controls
NHLBI - Exome Sequencing Project
• Whole exome sequencing of 7,000 European and African Americans to identify rare variants associated with common complex diseases
• Sequencing centers
  • Broad
  • University of Washington
• Cohorts
  • Women’s Health Initiative
  • HeartGo
    • ARIC, CARDIA, CHS, FHS, JHS, MESA
  • LungGo
• Phenotypes
  • Early-onset MI
  • Early-onset/FH+ stroke
  • Extreme BMI/T2D
  • Extreme lipids
  • Extreme blood pressure
  • COPD
  • Pulmonary hypertension
  • Cystic fibrosis
Whole Exome vs Whole Genome
• Exome covers only 1-2% of genome
• 88% of all GWAS findings are outside of the well-studied protein-coding regions
  • 78% of GWAS findings with MAF<5%
Junk No More: ENCODE Project Finds "Biochemical Functions for 80% of the Genome"
The ENCODE Project Consortium, "An integrated encyclopedia of DNA elements in the human genome", Nature 2012
Overview
• Significance and rationale
• Current efforts in GECCO on rare and less frequent variants
• Specific aims and design of whole genome sequencing grant
Aims of the U01 Sequencing Grant
• Aim 1. To identify novel CRC susceptibility variants across the genome, mainly variants with allele frequency 0.1-5%
  • Rare variants <1%
  • Less frequent variants 1-5%
  • Common variants >5%
• Aim 2. To investigate whether known environmental risk factors for CRC modify genetic susceptibility to CRC (gene-environment interactions)
Study Design Overview
R01; PI: Peters
Funding Information
• 17% budget cut
• 4 years instead of 5 years
• U01 designation
• Expected start date: before 9/31/12
Total budget cut 33%
Study design by aim:
• Aim 1.1: Whole genome sequencing, N=1,600 cases, 1,600 controls
• Aim 1.2: Imputation of WGS data, N=9,129 cases, 11,728 controls
• Aim 1.2: Association testing of individual & aggregated variants, N=10,729 cases, 13,328 controls; ~18M variants
• Aim 1.3: Replication, N=3,100 cases, 3,100 controls; ~3,000 variants
• Aim 2: Gene-environment interaction analyses (2-stage screening, weighted hypothesis, empirical Bayes), N=10,729 cases, 13,328 controls; ~18M variants
Total sample size is 13,829 cases and 16,428 controls
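The sample sizes on this slide are consistent with one another, which can be checked with a few lines of arithmetic. The sketch below only re-adds the counts quoted on the slide; the variable and function names are illustrative and not part of the proposal.

```python
# Counts copied from the study design slide, as (cases, controls).
sequenced = (1_600, 1_600)     # Aim 1.1: whole genome sequencing
imputed = (9_129, 11_728)      # Aim 1.2: GWAS samples imputed from the WGS data
replication = (3_100, 3_100)   # Aim 1.3: replication genotyping


def add_groups(*groups):
    """Add (cases, controls) pairs element-wise."""
    return tuple(sum(column) for column in zip(*groups))


# Discovery set used for association testing and GxE analyses.
discovery = add_groups(sequenced, imputed)
# Discovery plus replication gives the total quoted on the slide.
total = add_groups(discovery, replication)

print(discovery)  # (10729, 13328)
print(total)      # (13829, 16428)
```

The discovery set matches the N=10,729 cases and 13,328 controls shown for association testing, and adding the replication samples reproduces the stated total.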
Classes of Genetic Variants Being Examined
Variant type                           Definition in this proposal                    Expected #
Single nucleotide variant (SNV)        Single base pair change with MAF>0.1% & <5%    ~13-15M
Single nucleotide polymorphism (SNP)   Single base pair change with MAF>5%            ~5M
Insertion/deletion (indel)             Insertion/deletion or inversion <50bp          ~1.5-2M
Copy number variant (CNV)              Insertion/deletion or inversion >5kb           ~20K
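As an illustration of the frequency classes used in the aims (rare <1%, less frequent 1-5%, common >5%), a variant's minor allele frequency can be mapped to a class as in the sketch below. The slides do not say how the exact boundary values are handled, so treating exactly 1% and exactly 5% as "less frequent" and exactly 0.1% as inside the target range are assumptions here, and the function names are hypothetical.

```python
def classify_by_maf(maf):
    """Map a minor allele frequency (0 to 0.5) to the classes named in Aim 1.

    Assumption: the boundary values 1% and 5% fall in "less frequent".
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must be between 0 and 0.5")
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "less frequent"
    return "common"


def in_primary_target_range(maf):
    """True for the 0.1-5% range that Aim 1 mainly targets (bounds assumed inclusive)."""
    return 0.001 <= maf <= 0.05


for maf in (0.0005, 0.005, 0.03, 0.20):
    print(maf, classify_by_maf(maf), in_primary_target_range(maf))
```

Under these assumptions, a variant at 0.05% is classed as rare but lies below the range the proposal mainly targets, while one at 3% is less frequent and inside it.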
Studies
Study              Cases    Controls   GWAS #SNPs
Studies with GWAS (sequencing and imputation)
ARCTIC             850      800        100K, 500K
DACHS              2,900    2,400      300K
DALS               1,100    1,200      300K, 550K, 610K
HPFS               850      850        730K
MEC                400      400        300K
NHS                500      900        730K
PHS                400      400        730K
PLCO               1,200    1,800      300K, 610K, 500K
ASTERISK           1,000    1,000      300K
VITAL              300      300        300K
WHI                1,300    2,200      300K, 550K
Studies with no GWAS (replication)
North German CCS   4,000    4,000      N/A
CPS-II             1,000    1,000      N/A
MECC               3,400    3,000      N/A
Non-whites         400      650
Total              20,000   21,000
Data Harmonization of Environmental Risk Factors
• Collecting 74 variables in 11 categories
• Multi-step collaborative process leading to common data elements with standardized definitions, permissible values and coding
Meta-analysis across 15 studies
Sequencing and Genotyping
• At Genome Sciences, University of Washington
• Whole genome sequencing
  • At lower depth
  • Illumina HiSeq
  • In years 1 to 3
  • Total ~1,600 cases and 1,600 controls
    • Year 1: ~600
    • Year 2: ~1,000
    • Year 3: ~1,700
• Replication genotyping
  • In years 3 and 4
  • 6,200 samples for 3,000 SNPs
  • 2,400 samples for 384 SNPs
Variant Calling Based on Sequencing Data
• Variant calling
  • Depends on depth of sequencing
  • Multi-sample calling improves accuracy; hence, we will call in batches of increasing numbers of samples
• Structural variation/copy number variant (CNV) calling
  • Indel and CNV calling is error prone and requires genotyping follow-up
  • Follow-up genotyping on 384 SNPs in 1,600 samples
Imputation of Sequencing Data into GWAS
• Imputation
  • Use whole genome sequencing data as reference panel to impute into samples with only GWAS data
• Important points raised:
  • Imputation accuracy improves with increasing sample size of the reference panel (samples with whole genome sequencing data)
  • Imputation accuracy improves with increasingly dense GWAS platforms
• Follow-up genotyping on 384 SNPs in 800 samples
Whole genome sequence: 3,200 samples, ~18M variants
GWAS: 19,000 samples
Statistical Analysis
• Marginal and burden testing
  • Single variant test
  • Aggregated tests to test all rare variants across a defined region, such as a gene
  • Motivation:
    • Mendelian diseases show that multiple different mutations can lead to disease
    • Rare variants tested individually have limited power to show association (unless highly penetrant)
• Gene-environment interaction testing
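To make the idea of an aggregated test concrete, the sketch below collapses all rare variants in one region into a per-sample count and then compares carrier rates between cases and controls. The slides do not specify which aggregated test is planned, so this simple collapsing approach with a two-proportion z-test is a stand-in chosen for illustration; the function names, the 1% threshold default, and the toy genotypes are all assumptions.

```python
import math


def burden_scores(genotypes, mafs, maf_threshold=0.01):
    """Count rare minor alleles per sample across one region (e.g. a gene).

    genotypes: one row per sample, one column per variant, values 0/1/2.
    mafs: minor allele frequency of each variant (same column order).
    """
    rare_columns = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    return [sum(row[j] for j in rare_columns) for row in genotypes]


def carrier_test(scores, is_case):
    """Two-sided two-proportion z-test on rare-allele carrier rates."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    carriers_case = sum(1 for s, c in zip(scores, is_case) if c and s > 0)
    carriers_ctrl = sum(1 for s, c in zip(scores, is_case) if not c and s > 0)
    pooled = (carriers_case + carriers_ctrl) / (n_case + n_ctrl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_case + 1 / n_ctrl))
    if se == 0:
        return 0.0, 1.0  # no variation in carrier status, nothing to test
    z = (carriers_case / n_case - carriers_ctrl / n_ctrl) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Toy data: 4 cases then 4 controls, 3 variants (the third one is common).
mafs = [0.004, 0.008, 0.20]
genotypes = [
    [1, 0, 1], [0, 1, 2], [0, 0, 1], [1, 0, 0],   # cases
    [0, 0, 1], [0, 0, 2], [0, 1, 0], [0, 0, 1],   # controls
]
is_case = [True] * 4 + [False] * 4

scores = burden_scores(genotypes, mafs)
z, p_value = carrier_test(scores, is_case)
print(scores, round(z, 3), round(p_value, 3))
```

No single rare variant in the toy data is carried by more than two samples, but pooled together three of four cases are carriers versus one of four controls, which is the kind of signal aggregation is meant to surface. The normal approximation is not appropriate for samples this small; real analyses would use far larger samples and typically adjust for covariates.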
Advisory Committee
• NCI
  • Stephen Chanock
  • Daniela Seminara
  • Peggy Tucker
• Suggestions for external investigators
  • Mike Boehnke (U of Michigan)
  • Elaine Mardis (Washington U in St. Louis)
  • Nicole Soranzo (Wellcome Trust / Sanger Inst)
  • Stephen Thibodeau (Mayo Clinic, Rochester)
Timeline
Activities   Yr 1   Yr 2   Yr 3   Yr 4   Yr 5
Sample preparation and QA/QC
Whole genome sequencing and variant calling (Aim 1.1)
Imputation and association testing (Aim 1.2)
Replication genotyping (Aim 1.3)
GxE analysis (Aim 2)
Preparation of manuscripts