Analyzing Copy Number Variation in the Human Genome Jeff Bailey S5-432


Page 1: Analyzing Copy Number Variation in the Human Genome

Analyzing Copy Number Variation in the Human Genome

Jeff Bailey, S5-432

Page 2: Analyzing Copy Number Variation in the Human Genome

Continuum of Genomic Variation

Forms of genetic variation, ordered from the nucleotide scale to the cytogenetic scale:

  • Single base-pair changes: point mutations (1 per 800 bp)
  • Small insertions/deletions: frameshift, microsatellite, minisatellite
  • Mobile elements: retroelement insertions (300 bp - 10 kb)
  • Large-scale genomic copy number variation (>10 kb): large-scale deletions, segmental duplications
  • Local rearrangements
  • Chromosomal variation: translocation, inversion, fusion

[Figure labels: Structural Variants (SV); Copy Number Variation]

Page 3: Analyzing Copy Number Variation in the Human Genome

METHOD 1: Copy Number Variation: Array Comparative Genomic Hybridization

[Figure: array CGH intensity plot with a blue line and regions marked >green and >red, labeled Gain, Loss, Gain. Modified from Feuk et al. Nat Rev Genet 2006]

30% of CNVs overlap duplicated regions (variant SD = CNV) (Sebat et al. Science 2004)

Two genomic surveys of normal individuals identified 76 and 255 CNV regions by array CGH (Sebat et al. Science 2004; Iafrate et al. Nat Genet 2004)
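Array CGH calls a gain or loss by comparing test and reference hybridization signal at each probe. A minimal sketch of that comparison, assuming per-probe intensities for both channels and a hypothetical log2-ratio cutoff of 0.3 (the cutoff, the function name and the example intensities are illustrative, not taken from the slides):

```python
import math

def call_probe(test_intensity, ref_intensity, threshold=0.3):
    """Classify one probe from its log2(test/reference) ratio.

    The 0.3 threshold is an illustrative assumption, not a published cutoff.
    """
    ratio = math.log2(test_intensity / ref_intensity)
    if ratio > threshold:
        return "gain"
    if ratio < -threshold:
        return "loss"
    return "neutral"

# Hypothetical (test, reference) intensities for three probes.
probes = [(1500.0, 1000.0), (1000.0, 1000.0), (500.0, 1000.0)]
calls = [call_probe(test, ref) for test, ref in probes]
print(calls)
```

Real pipelines normalize the two channels and segment neighboring probes before calling; this sketch only shows the per-probe ratio step.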

Page 4: Analyzing Copy Number Variation in the Human Genome

Segmental Duplications (SD)

Bailey and Eichler (2006) Nat Rev Genet

Properties:
  • Clustered
  • Complex regions
  • 5.4% of the genome (>90% identity and >1 kb)

Example on chr22: duplications 99.1% identical over 180 kb (VCF/DiGeorge Syndrome, 1 in 3000 births)
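The working definition on this slide (>90% identity and >1 kb) can be written as a simple filter over pairwise alignments. A minimal sketch, assuming each alignment is summarized by its length and fractional identity; the `Alignment` type and function name are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    length_bp: int    # aligned length in base pairs
    identity: float   # fraction of identical bases, 0.0 to 1.0

def is_segmental_duplication(aln, min_identity=0.90, min_length_bp=1000):
    """Apply the slide's thresholds: >90% identity and >1 kb."""
    return aln.identity > min_identity and aln.length_bp > min_length_bp

chr22_example = Alignment(length_bp=180_000, identity=0.991)  # values from the slide
too_short = Alignment(length_bp=800, identity=0.99)           # hypothetical
too_divergent = Alignment(length_bp=5_000, identity=0.85)     # hypothetical

print(is_segmental_duplication(chr22_example))
print(is_segmental_duplication(too_short))
print(is_segmental_duplication(too_divergent))
```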

Page 5: Analyzing Copy Number Variation in the Human Genome

SDs predispose to copy number variation

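Non-allelic Homologous Recombination (Lupski, 1999)

[Figure: chromosomes drawn from centromere (Cen) to telomere (Tel) with duplicated segments D and D' flanking an interval I; misaligned pairing of D with D' produces gametes carrying the hybrid repeats D'-D and D-D']

  • Change in dosage-sensitive genes → phenotype or disease
  • Dynamic regions: predisposed to further rearrangements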
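The reciprocal products of non-allelic homologous recombination can be traced with a toy model. A minimal sketch, assuming each chromatid is an ordered list of labeled segments and that a single crossover occurs between D' on one chromatid and D on the other; the function name and list representation are illustrative:

```python
def nahr_products(chromatid_a, chromatid_b, repeat_a="D'", repeat_b="D"):
    """Cross over between repeat_a on chromatid A and repeat_b on chromatid B.

    Returns the two reciprocal recombinant chromatids as segment lists.
    """
    i = chromatid_a.index(repeat_a)
    j = chromatid_b.index(repeat_b)
    # Each product carries one hybrid repeat at the crossover point.
    product_1 = chromatid_a[:i] + [repeat_a + "/" + repeat_b] + chromatid_b[j + 1:]
    product_2 = chromatid_b[:j] + [repeat_b + "/" + repeat_a] + chromatid_a[i + 1:]
    return product_1, product_2

chromatid = ["Cen", "D", "I", "D'", "Tel"]
duplication, deletion = nahr_products(chromatid, chromatid)
print(duplication)  # interval I appears twice
print(deletion)     # interval I is absent
```

One product gains a copy of the interval I and the other loses it, which is how a dosage change in genes within I can arise.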

Page 6: Analyzing Copy Number Variation in the Human Genome

Complex disease associations

CNV: Disease association

  • CCL3L1: Decreased copies increase HIV/AIDS susceptibility (Gonzalez et al. 2005). Increased copies increase risk of rheumatoid arthritis (McKinney et al. 2008)
  • FCGR3B: Decreased copies increase risk for lupus nephritis (Aitman et al. 2006)
  • APP: Duplication leading to ... (Rovelet-Lecrux et al. 2006)
  • UGT2B17: Deletion associated with 2-fold increased risk of osteoporosis (Yang et al. 2008)
  • Synuclein: Triplication causes Parkinson disease (Singleton et al. 2003)
  • DEFB4: More than 5 copies of beta-defensins associated with 1.7-fold increased risk of psoriasis (Hollox et al. 2008). Fewer than 4 copies associated with 3-fold increased risk for Crohn disease (Fellermann et al. 2006)
  • LCE3B & LCE3C: Multigene deletion of late cornified envelope genes is associated with psoriasis (de Cid et al. 2009)

Three classes of disease-associated CNV:

1) Recurrent germline rearrangements causing congenital disease
2) Rare CNVs causing disease in a small proportion of affected individuals in a Mendelian fashion
3) Common CNVs that are responsible for a proportion of complex genetic risk in many individuals

Page 7: Analyzing Copy Number Variation in the Human Genome

Method 2: End-Sequence Pair (ESP) Analysis

~1.1 million fosmid end-sequence pairs derived from a single donor (sequenced by MIT to help close gaps in the reference genome)

Dataset:
  • 1,122,408 fosmid pairs preprocessed (15.5X genome coverage)
  • 639,204 BEST fosmid pairs (8.8X genome coverage)

Results:
  • Fosmid insert size tightly distributed around mean (40 kb)

Compare fosmid optimal placements to detect deviations from expected:
  • Concordant fosmid: end placements on the reference genome match the expected insert size
  • <32 kb: putative insertion within fosmid
  • >48 kb: putative deletion within fosmid
  • Variant classes: insertion, deletion, inversions

[Figure: fosmid insert placed against the reference genome]

Tuzun*, Bailey*, Sharp* et al. Nat. Genet 2005
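The size thresholds and the coverage figures on this slide can be reproduced with a few lines. A minimal sketch, assuming the distance between the two mapped ends on the reference is already known for each pair and assuming a genome size of 2.9 Gb (the genome size is my assumption, not stated on the slide; end orientation, which would flag inversions, is not modeled):

```python
def classify_pair(mapped_span_bp, min_span=32_000, max_span=48_000):
    """Classify a fosmid end pair by the distance between its ends on the reference.

    A span shorter than expected means the donor carries extra sequence
    (putative insertion); a longer span means the donor lacks sequence
    (putative deletion).
    """
    if mapped_span_bp < min_span:
        return "putative insertion"
    if mapped_span_bp > max_span:
        return "putative deletion"
    return "concordant"

def physical_coverage(n_pairs, mean_insert_bp=40_000, genome_bp=2.9e9):
    """Clone (physical) coverage: total insert length divided by genome size."""
    return n_pairs * mean_insert_bp / genome_bp

print(classify_pair(30_000), classify_pair(40_000), classify_pair(50_000))
print(round(physical_coverage(1_122_408), 1))  # all preprocessed pairs
print(round(physical_coverage(639_204), 1))    # BEST pairs
```

Under these assumptions the two coverage values round to the 15.5X and 8.8X quoted in the dataset description.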

Page 8: Analyzing Copy Number Variation in the Human Genome

Fosmid SV Project (Kidd et al. 2008, Nature 453:56)

  • Fosmid end sequencing of 8 HapMap individuals
  • 1695 structural variants
  • 525 novel insertion sequences

Mechanisms:
  • NAHR: non-allelic homologous recombination
  • NHEJ (non-homologous end joining): repair of double-strand breaks
  • VNTR: strand slippage
  • Retrotransposition: insertion of L1, SVA or Alu element

Page 9: Analyzing Copy Number Variation in the Human Genome

Method 3: Whole Genome Sequencing

Genome Resequencing Studies (Levy et al. PLoS Biol 2007:e266)

  • SNPs: 3.2 M bases
  • Non-SNP: 9.1 M bases
  • Non-SNP variants: 22% of events, 74% of variant bases (9.1 of 12.3 M bases)

Detection signals: read depth, mismapping pairs

Future: perfect whole genome assembly
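Read depth detects copy number because the number of reads covering a region scales with how many copies of it the genome carries. A minimal sketch, assuming depth has already been averaged in fixed windows and that most windows are at the normal diploid state so the median is a usable baseline; the depths and function name are hypothetical:

```python
from statistics import median

def copy_number_from_depth(window_depths, ploidy=2):
    """Estimate integer copy number per window from depth relative to the median.

    Assumes most windows are at normal ploidy, so the median depth
    corresponds to `ploidy` copies.
    """
    baseline = median(window_depths)
    return [round(ploidy * depth / baseline) for depth in window_depths]

# Hypothetical mean read depth in nine consecutive windows.
depths = [30, 31, 29, 15, 14, 30, 46, 44, 30]
copy_numbers = copy_number_from_depth(depths)
print(copy_numbers)
```

Production methods also correct for GC content and mappability, which this sketch leaves out.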

Page 10: Analyzing Copy Number Variation in the Human Genome

Summary of Human Genome Copy Number Variation (12/2006)

Summary of recent analyses of structural variation in the human genome (12/06).

Reference         Analysis               # Individuals  # Events  Av. bp   Median (bp)  Total Mbp
Mills, 2006       Align trace data       36             415434    20       2            8.36
Hinds, 2006       Oligo arrayCGH         1000†                    1379     947          0.14
McCarroll, 2006   HapMap SNP genotyping  269            538       16874    6887         9.08
Conrad, 2006      HapMap SNP genotyping  180*           609       34996    17217        21.31
Tuzun, 2005       Paired End-sequence    1              269       55706    25230        14.98
Redon, 2006       Affymetrix 500K data   269            980       165996   63140        162.68
Iafrate, 2004     BAC Array-CGH          55**           246       146189   150395       35.96
Sharp, 2006       BAC Array-CGH          47             124       170019   164704       21.08
Wong, 2006        BAC Array-CGH          105            1365***   185504   175314       253.21
Sebat, 2004       ROMA-CGH               20             72        350670   199800       25.25
Redon, 2006       BAC Array-CGH          269            913       349880   227889       319.44
All Vars          NA                     NA             323573    1901     2            615.10
All Vars > 1 kb   NA                     NA             4131      148578   93356        613.77

* - effectively independent individuals equal to number of trios
** - 39 healthy controls, 16 with karyotype abnormalities
*** - accounting for only those sites that showed in 2 or more individuals
† - the # Individuals and # Events values are fused as "1000" in the source
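The Total Mbp column in the summary table is the number of events multiplied by the average event size. A minimal check of that relationship on three rows, plus the share of the genome that the 615.10 Mbp total would represent under an assumed genome size of 3,000 Mbp (the genome size is my assumption):

```python
def total_mbp(n_events, mean_bp):
    """Total variant sequence in Mbp: events times average size."""
    return n_events * mean_bp / 1e6

# (events, average bp, Total Mbp as printed in the table)
rows = {
    "Conrad, 2006": (609, 34996, 21.31),
    "Tuzun, 2005": (269, 55706, 14.98),
    "Sebat, 2004": (72, 350670, 25.25),
}
for name, (events, mean_bp, reported) in rows.items():
    print(name, round(total_mbp(events, mean_bp), 2), reported)

GENOME_MBP = 3000  # assumed approximate genome size
genome_fraction = 615.10 / GENOME_MBP
print(round(genome_fraction, 2))
```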

  • 20% of the human genome is CNV?
  • 3000+ genes with exons in these regions are CNV?

(Currently 30% of genome and 9473 genes)

Page 11: Analyzing Copy Number Variation in the Human Genome

How many genes are truly CNV?

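  • Lack of breakpoint precision? BACs: 150-250 kb clones, of which only a part of the sequence may be CNV
  • False positives? Combining multiple studies increases the proportion of false positives, since true positives tend to overlap

[Figure: a BAC clone overlapping a gene and a CNV; calls from Study #1, #2 and #3 labeled as true positives (TP) and false positives (FP)]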
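The false-positive argument is easy to see with numbers. A minimal sketch under two simplifying assumptions of mine: every study recovers the same set of true sites, and each study's false positives are unique to it. The counts are hypothetical:

```python
def union_fp_fraction(true_sites, fp_per_study, n_studies):
    """Fraction of false positives in the union of calls across studies.

    Assumes true positives coincide across studies (counted once) and
    false positives never coincide (counted once per study).
    """
    false_sites = fp_per_study * n_studies
    return false_sites / (true_sites + false_sites)

one_study = union_fp_fraction(true_sites=100, fp_per_study=20, n_studies=1)
three_studies = union_fp_fraction(true_sites=100, fp_per_study=20, n_studies=3)
print(round(one_study, 3), round(three_studies, 3))
```

Because true calls collapse onto the same sites while false calls accumulate, the merged set is proportionally noisier than any single study.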

Page 12: Analyzing Copy Number Variation in the Human Genome

Design of Custom oligonucleotide aCGH

1) Select genomic regions to target for probe design
2) Merge overlapping regions
3) Select oligonucleotide probe sequences (average 12/exon) and place on microarray

  • Equal number of probes per exon (exon size 3 bp to 10 kb).
  • Limitation: NimbleGen algorithm creates equally spaced probes across a region.

Bailey et al. Cytogenet Genome Res 2008
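The step of merging overlapping target regions before probe selection is a standard interval merge. A minimal sketch, assuming regions on one chromosome are given as (start, end) coordinate pairs; the function name and coordinates are illustrative and not the authors' implementation:

```python
def merge_regions(regions):
    """Merge overlapping (start, end) intervals into non-overlapping regions."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend its end if needed.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged

targets = [(100, 200), (150, 300), (400, 500), (450, 480)]
print(merge_regions(targets))
```

Merging first avoids designing duplicate probes for sequence that falls in more than one target region.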

Page 13: Analyzing Copy Number Variation in the Human Genome

Detection Method

[Figure: hybridization data shown as log2 probe intensity across probe regions, aligned to the exon structure, for a 4-exon partial-gene CNV]

Mean intensity difference (Exon 1 to Exon 5): -0.2 SD, +1.1 SD, +1.4 SD, +0.6 SD, +1.2 SD

Step #1: Seed (+1.1 SD, +1.4 SD)

Step #2: Extension (+0.6 SD, +1.2 SD, -0.2 SD)

Bailey et al. Cytogenet Genome Res 2008
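The two-step detection can be illustrated with the per-exon values shown on this slide. A minimal sketch, assuming a seed is two adjacent exons at or above 1.0 SD and that extension continues while neighboring exons stay at or above 0.5 SD; both thresholds and the function name are my assumptions, not the published parameters, and only gains are handled:

```python
def seed_and_extend(scores, seed_thr=1.0, ext_thr=0.5, seed_len=2):
    """Return (first_exon_index, last_exon_index) for each called gain.

    scores: mean intensity difference per exon, in SD units.
    """
    calls = []
    n = len(scores)
    i = 0
    while i <= n - seed_len:
        if all(s >= seed_thr for s in scores[i:i + seed_len]):
            start, end = i, i + seed_len - 1
            # Extend in both directions while the weaker threshold holds.
            while start > 0 and scores[start - 1] >= ext_thr:
                start -= 1
            while end < n - 1 and scores[end + 1] >= ext_thr:
                end += 1
            calls.append((start, end))
            i = end + 1
        else:
            i += 1
    return calls

exon_scores = [-0.2, 1.1, 1.4, 0.6, 1.2]  # Exon 1 to Exon 5, from the slide
cnv_calls = seed_and_extend(exon_scores)
print(cnv_calls)
```

With these assumed thresholds the call spans the second through fifth exons (zero-based indices 1 to 4), a 4-exon event consistent with the partial-gene CNV in the figure.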

Page 14: Analyzing Copy Number Variation in the Human Genome

CNV in RHD

[Figure: per-sample tracks for GM18507, GM18517, GM18956, GM19129, GM12156, GM18502, GM19240, GM12878 and GM18555, shown with the gene model, exons, probe regions and segmental duplications. X-axis: Chr 1 position (kb), ticks at 25,350, 25,370 and 25,390]

Bailey et al. Cytogenet Genome Res 2008

Page 15: Analyzing Copy Number Variation in the Human Genome

Detecting CNVs >500 bp and >5% frequency (Conrad et al. 2009 Nature)

  • 8,599 CNV regions: 3.7% of genome (112.7 Mb)
  • 2 genomes: 1,098 CNVs, 0.78% of genome (24 Mb)

Page 16: Analyzing Copy Number Variation in the Human Genome

Causal CNVs

Conrad, et al. 2009 Nature

Page 17: Analyzing Copy Number Variation in the Human Genome

Infectious Disease Genetics

  • Complex interplay that results in the infectious disease phenotype
  • Potential host defense responses and pathogen virulence are encoded in their respective genomes.
  • SD and CNV represent key mechanisms for adaptation and diversification of responses for both host and pathogen.
  • The study of SD and CNV is necessary to fully understand the genetics and biology of infectious disease pathogenesis.

[Figure: interplay among Human Genome, Pathogen Genome, Vector Genome and Environment]

Page 18: Analyzing Copy Number Variation in the Human Genome

Human CNV typing and association studies

Comprehensive CNV Typing Chip (1st generation)

  • Collaboration with the Eichler Lab
  • Preferentially targeting gene CNVs (5,000 CNVs → 1000 genic regions → 30% host defense)
  • Agilent and NimbleGen oligoarray platforms
  • Defining copy number responsive probes
  • Defining copy specific probes to remove cross-hybridization
  • Case-control studies to examine infectious disease and immune phenotypes for association with CNVs

Page 19: Analyzing Copy Number Variation in the Human Genome

Human Malaria

  • Malaria: 2-3 million deaths per year
  • “Strongest known force for evolutionary selection in the recent history of the human genome” (Kwiatkowski 2005 Am J Hum Genet)
  • HbS, HbC, HbE, thalassemia, ABO, Duffy null, SE Asian ovalocytosis, IL-4, CR1, HLA-DRB ...
  • Hypothesis: Strong selection will have impacted CNVs
  • Testing case-control samples for CNV associations with resistance to infection and cerebral malaria.
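A case-control CNV association test often starts from a 2x2 table of carriers versus non-carriers. A minimal sketch of an odds ratio with a Woolf (log-scale) 95% confidence interval; the counts are hypothetical and the slides do not state which statistic the study uses:

```python
import math

def odds_ratio_ci(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers, z=1.96):
    """Odds ratio and Woolf 95% confidence interval from a 2x2 table.

    All four cell counts must be non-zero.
    """
    odds_ratio = (case_carriers * ctrl_noncarriers) / (case_noncarriers * ctrl_carriers)
    se_log = math.sqrt(
        1 / case_carriers + 1 / case_noncarriers + 1 / ctrl_carriers + 1 / ctrl_noncarriers
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts: 30 of 100 cases and 15 of 100 controls carry the deletion.
estimate, ci_low, ci_high = odds_ratio_ci(30, 70, 15, 85)
print(round(estimate, 2), round(ci_low, 2), round(ci_high, 2))
```

An interval that excludes 1 suggests an association at roughly the 5% level; multi-copy CNVs would need a model that treats copy number as more than a carrier/non-carrier variable.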