Copy number variation (CNV)

Genomic copy number variation

- MGEL OT / 2010. 2. 18 (Thu) -

한소희

Molecular & Genomic Epidemiology Laboratory

Outline

Review of the previous session
- What is molecular epidemiology?
- Why molecular epidemiology research is needed
- Types of biomarkers
- Types of susceptibility markers: SNP / methylation / CNV / microRNA

Definition of CNV and why CNV research is needed

CNV (registry) databases

Experimental methods for detecting CNVs

Statistical methods for analyzing CNVs

What is molecular epidemiology?

A methodology that uses biomarkers* in epidemiological research

* Biomarkers (biological markers)

- A collective term for indicators, measured in biological materials such as blood, serum, plasma, and urine, that reveal what is happening at the physiological, cellular, or molecular level

- Examples: urinary metabolites of carcinogens (e.g., PAHs), DNA adducts, hemoglobin adducts, chromosomal aberrations, gene mutations, oncogene activation, tumor suppressor gene inactivation, genetic polymorphisms of carcinogen-metabolizing enzymes such as glutathione S-transferase (GST) and N-acetyltransferase (NAT), and differences in DNA repair capacity

Why molecular epidemiology research is needed

- Because the diseases of people today have a multifactorial etiology!

ref. ROCHE Genetic Education (http://www.roche.com/home/science/sci_edu/sci_edu_gengen_cdrom.htm)

Types of biomarkers

Exposure markers

Internal dose: carcinogen metabolites in urine

Biologically effective dose: DNA adducts, hemoglobin adducts

Early biological effect: chromosomal aberrations, gene mutations, oncogene activation, tumor suppressor gene inactivation

Altered structure / function

Clinical disease

Susceptibility markers: genetic polymorphisms of carcinogen-metabolizing enzymes, or differences in DNA repair capacity

Types of susceptibility markers

SNP, GWAS

Methylation

CNV

microRNA

Human genome

Comprises about 6 billion chemical bases (nucleotides) of DNA packaged into two sets of 23 chromosomes, one set inherited from each parent.

The DNA encodes an estimated 30,000 genes (later estimates put the number of protein-coding genes closer to 20,000).

It was generally thought that genes were almost always present in two copies in a genome.

However, recent discoveries have revealed that large segments of DNA, ranging in size from thousands to millions of bases, can vary in copy number.

CNV?

In a diploid genome, a segment can also be present in one copy, three copies, or more

(not necessarily two copies)

Trisomy 21 (Down syndrome)

CNV?

• Arise through homologous and non-homologous recombination

• Types of variation: deletions (loss), insertions (gain), inversions, translocations (inversions and balanced translocations rearrange DNA without changing its copy number)

• Cover roughly 5-8% of the human genome

(the most common type of structural variation)

CNV?

Length scales of aberrations/variations/polymorphisms

Generally refers to segments of about 1 kb or larger

- Average size: 29 kb to 523 kb

- Intermediate between the resolution of microscopic analysis (5-10 Mb) and that of sequence analysis (1-700 bp)
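The length scales above can be turned into a rough size classifier. This is an illustrative sketch only: the bin edges (100 bp, 1 kb, 5 Mb) are assumptions loosely taken from these slides (CNVs of about 1 kb or more, DGV's separate 100 bp-1 kb InDel category, and the ~5 Mb limit of microscopic analysis), not a published standard, and `size_class` is a hypothetical helper.

```python
# Hypothetical bin edges, loosely based on the slides; not a published standard.
INDEL_MIN_BP = 100              # DGV lists InDels of 100 bp-1 kb separately
CNV_MIN_BP = 1_000              # CNVs are generally taken to be about 1 kb or larger
MICROSCOPIC_MIN_BP = 5_000_000  # lower end of what microscopic analysis resolves


def size_class(length_bp):
    """Return a rough length-scale label for a variant spanning length_bp bases."""
    if length_bp <= 0:
        raise ValueError("length must be positive")
    if length_bp < INDEL_MIN_BP:
        return "small variant (SNV or short indel)"
    if length_bp < CNV_MIN_BP:
        return "indel (100 bp-1 kb)"
    if length_bp < MICROSCOPIC_MIN_BP:
        return "CNV (submicroscopic)"
    return "microscopically visible aberration"


for length in (1, 500, 29_000, 523_000, 8_000_000):
    print(f"{length:>9,} bp -> {size_class(length)}")
```

The bins describe detection resolution rather than a strict definition: a CNV larger than 5 Mb is still a CNV, it is simply also visible under the microscope.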

Common vs. de novo CNV

Common CNV

Perfect linkage disequilibrium (LD) between a CNV and nearby SNPs can be either good or bad:

Good: the CNV offers a new candidate for the causal mutation

Bad: the causal mutation may instead be one of the SNPs (the association alone cannot tell them apart)
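To see what "perfect LD" means numerically, a CNV can be coded as a per-person copy number and a SNP as an allele count (0, 1, 2); the squared correlation r² between the two is then a genotype-level proxy for LD (it approximates, but is not identical to, haplotype-based r²). The genotypes and the function name `ld_r2` below are hypothetical.

```python
def ld_r2(cnv_copies, snp_alleles):
    """Squared Pearson correlation between CNV copy number and SNP allele count."""
    n = len(cnv_copies)
    if n != len(snp_alleles) or n < 2:
        raise ValueError("need two equal-length lists covering at least 2 people")
    mx = sum(cnv_copies) / n
    my = sum(snp_alleles) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(cnv_copies, snp_alleles))
    sxx = sum((x - mx) ** 2 for x in cnv_copies)
    syy = sum((y - my) ** 2 for y in snp_alleles)
    if sxx == 0 or syy == 0:
        raise ValueError("a monomorphic variant has no defined LD")
    return sxy * sxy / (sxx * syy)


# Hypothetical genotypes for six people.
cnv = [2, 1, 0, 2, 1, 2]        # copies of a deletion-polymorphic segment
tag_snp = [0, 1, 2, 0, 1, 0]    # allele count that tracks the deletion exactly
other_snp = [0, 1, 1, 0, 2, 0]  # allele count that only partly tracks it

print(round(ld_r2(cnv, tag_snp), 3))    # perfect LD: r^2 = 1
print(round(ld_r2(cnv, other_snp), 3))  # imperfect LD: r^2 < 1
```

With r² = 1, the SNP and the CNV carry identical information across these people, which is exactly why an association signal cannot say which of the two is causal.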

de novo CNV

De novo CNVs can be large and therefore cover many genes, which makes it hard to tell which gene is responsible

Why CNV research is needed

• If individuals differ in the amount (copy number) of genomic material …

this produces differences in phenotypes such as physiological function and biological responses to external stimuli

The more information we have about these variants, the better we can understand individual differences in disease risk and in drug efficacy and side effects

(enabling personalized medicine)

PLoS Genet 3(10): e190 (2007)

Why CNV research is needed

• "A new type of genetic marker in whole-genome association studies" → association between CNVs and disease (a new type of genetic marker)

PLoS Genet 3(10): e190 (2007)

Why CNV research is needed

• Most SNPs found through disease genome association analyses act mainly as markers of a disease/trait locus and rarely explain the biological cause

→ CNVs are candidate causal variants for diseases/traits

Current status of CNV publications

PubMed search on 18 Feb 2010: "copy number variation" ("copy number variation"[All Fields] OR "copy number polymorphism"[All Fields] AND "humans"[MeSH Terms])

407 articles (96 reviews)

- Nature (IF: 31.434) (1)
- PLoS Genet. (IF: 8.883) (1)
- Genome Biol. (IF: 6.153) (2)
- Eur J Hum Genet. (IF: 3.925) (1)
- BMC Bioinformatics (IF: 3.781) (2)
- BMC Med Genet. (IF: 2.762) (1)
- BMC Genet. (IF: 2.350) (1)
- PLoS One. (-) (1)
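A search like this can be re-run programmatically through NCBI's E-utilities `esearch` endpoint. The sketch below builds the request URL and reads the `count` field from the JSON reply; the helper names are hypothetical, network access is assumed for `pubmed_count`, and a count obtained today will not match the 2010 figure because PubMed keeps growing and being re-indexed. Restricting by publication date (`datetype=pdat` with `mindate`/`maxdate`) only approximates the original snapshot.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query string copied from the slide.
QUERY = ('("copy number variation"[All Fields] OR "copy number polymorphism"[All Fields] '
         'AND "humans"[MeSH Terms])')


def esearch_url(term, maxdate=None):
    """Build an esearch URL that asks for the hit count only (no ID list) as JSON."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": 0}
    if maxdate:  # mindate and maxdate must be given together
        params.update({"datetype": "pdat", "mindate": "1800/01/01", "maxdate": maxdate})
    return ESEARCH + "?" + urlencode(params)


def pubmed_count(term, maxdate=None):
    """Query PubMed (network access required) and return the number of hits."""
    with urlopen(esearch_url(term, maxdate), timeout=30) as resp:
        return int(json.load(resp)["esearchresult"]["count"])


if __name__ == "__main__":
    print(esearch_url(QUERY, maxdate="2010/02/18"))
```

PubMed processes Boolean operators left to right, so the query is read as ("copy number variation" OR "copy number polymorphism") AND humans.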

CNV registry databases

• DGV (Database of Genomic Variants, http://projects.tcag.ca/variation/)

- Freely available to the public at no cost

- Provides information on genomic variants identified in control samples

- Displays variation data in a genome browser

- Its data are known to be overestimated, but until an international standard is established, use of the database is expected to keep growing


Citing URL http://projects.tcag.ca/variation/


Citing URL http://projects.tcag.ca/variation/

Total entries: 49,988 (hg18)
- CNVs: 29,133
- Inversions: 914
- InDels (100 bp-1 kb): 19,941

Total CNV loci: 8,410

Articles cited: 35

Last updated: Aug 05, 2009

Experimental methods for CNV detection

• Genome-wide detection

- Scans the entire genome to characterize the distribution and features of CNVs.

- Mostly based on microarrays (test DNA is hybridized to a chip, and the fluorescence intensity of the hybridized probes is compared with a dataset obtained from a reference).

• Target-specific detection

- Analyzes CNVs by targeting specific genomic regions.

- Often PCR-based.

Table 1. Methods for detecting CNVs in the human genome
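For the target-specific, PCR-based route, one widely used calculation (an assumption here; the slides do not name a specific assay) is relative quantification with the comparative Ct (ΔΔCt) method. The target region is amplified alongside a reference gene assumed to be present in two copies, and the result is scaled against a calibrator sample of known copy number: CN = CN_cal × 2^-(ΔCt_sample - ΔCt_cal). The sketch assumes ~100% amplification efficiency (doubling per cycle); `copy_number_ddct` and the Ct values are hypothetical.

```python
def copy_number_ddct(ct_target, ct_ref, cal_ct_target, cal_ct_ref, cal_copies=2):
    """Estimate the copy number of a target region by the comparative Ct method.

    ct_target / ct_ref: Ct values of the target region and the reference gene in the test sample.
    cal_ct_target / cal_ct_ref: the same for a calibrator sample carrying cal_copies copies.
    Assumes both assays double their product every cycle (100% efficiency).
    """
    delta_sample = ct_target - ct_ref
    delta_cal = cal_ct_target - cal_ct_ref
    delta_delta = delta_sample - delta_cal
    return cal_copies * 2 ** (-delta_delta)


# Hypothetical Ct values; the calibrator is a known two-copy sample.
print(copy_number_ddct(26.0, 24.0, 25.0, 24.0))  # target 1 cycle later than expected -> 1.0
print(copy_number_ddct(24.0, 24.0, 25.0, 24.0))  # target 1 cycle earlier than expected -> 4.0
```

Because each cycle doubles the product, a one-cycle shift corresponds to a twofold change, so small Ct errors translate directly into copy-number error; in practice replicates are averaged and the estimate is rounded to an integer.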

Statistical methods for CNV analysis

Pre-processing
- Genotyping
- Normalization
- Converting intensities to log R ratios (LRRs); reference sample, e.g., NA10851
- Merge

CNV detection
- Segmentation
- Defining and detecting CNV regions

CNV defining
- Comparison (case vs. control)
- Check in databases or the literature whether the CNV has been reported in previous studies!
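The intensity-to-LRR conversion can be illustrated with a simplified log R ratio: each probe's test intensity is divided by the reference sample's intensity, the ratios are median-centered so that most probes (assumed to be two-copy) sit at 0, and the result is put on a log2 scale. This is an assumption-laden sketch, not the procedure of any particular platform (Illumina-style LRRs, for example, use expected intensities from genotype clusters rather than a single reference sample), and `log_r_ratios` is a hypothetical name.

```python
import math
from statistics import median


def log_r_ratios(test, reference):
    """Per-probe log2(test / reference), median-centred so a typical probe is 0."""
    if len(test) != len(reference) or not test:
        raise ValueError("test and reference need the same, non-zero number of probes")
    if any(v <= 0 for v in test) or any(v <= 0 for v in reference):
        raise ValueError("intensities must be positive")
    ratios = [t / r for t, r in zip(test, reference)]
    centre = median(ratios)  # removes overall brightness differences between arrays
    return [math.log2(q / centre) for q in ratios]


# Hypothetical intensities: the test array is twice as bright overall,
# and probe 5 has only half the expected signal (a one-copy deletion).
reference = [100, 200, 500, 60, 1000, 80, 120]
test = [200, 400, 1000, 120, 1000, 160, 240]
print(log_r_ratios(test, reference))  # -> [0.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0]
```

Median-centering is the simplest normalization that removes array-wide brightness; it relies on most probes being copy-neutral, which holds for constitutional CNVs but can fail for heavily rearranged tumor genomes.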


Statistical tools for CNV analysis

Freely available

ITALICS, CGHcall, cghMCR, MANOR, SMAP, CNVDetector, GIMscan, CAPweb, CGHmix, CGHScan, CNIT, CNVphaser, RjaCGH, cghFLasso, ImaGene CGH, ISACGH, smoothseg, STAC, VAMP, CNAT, GIM, CNAG, CGHFusion, SiDCoN, SNPchip, dChip, CLAC, CGHseg, HaarSeg, PLASQ, AdaCGH, Rankcopy, CARAT, PennCNV, QuantiSNP, GLAD, CNVTools, CNVFinder, DNAcopy, SW-array, MVA-package, GEMCA, CNRLMM, GADA, BioHMM, HMM, CGHPro, BirdSuite, ICE, ACE, Wavelet, and many others

Commercial

HelixTree, NEXUS, Partek, etc.

Vendor-provided (commercial)

NimbleScan, Illumina BeadStudio, Agilent CGH Analytics

Statistical analysis of CNVs: an example using Nexus

Fig 1. DNA Copy Number Analysis Workflow (ref. Zhiwei Che, BioDiscovery, Inc., 2009)


Load Image and Perform QC


Save & Visualize Data

Identification of Copy Number Change Events

Experiment results in a table (Expr. = intensity in the test sample; Ratio = test/control copy-number ratio):

Probe location   Expr.   Control   Ratio   Log2 ratio
Chr1:10-20        150      100      3/2      +0.58
Chr1:50-60        300      200      3/2      +0.58
Chr1:70-90        500      500      2/2       0
Chr1:100-120       60       60      2/2       0
Chr1:250-300      500     1000      1/2      -1
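The log-ratio column follows from log2(Expr. / Control), and under the idealized assumption that signal is proportional to copy number in a diploid genome, the estimated copy number is 2 × 2^(log ratio). A short sketch that recomputes the table (helper names are hypothetical):

```python
import math


def log2_ratio(expr, control):
    """Log2 of experimental over control intensity."""
    return math.log2(expr / control)


def copies_from_log2_ratio(lr, ploidy=2):
    """Invert log2(copies / ploidy), assuming signal proportional to copy number."""
    return ploidy * 2 ** lr


rows = [  # (probe location, Expr., Control) from the table
    ("Chr1:10-20", 150, 100),
    ("Chr1:50-60", 300, 200),
    ("Chr1:70-90", 500, 500),
    ("Chr1:100-120", 60, 60),
    ("Chr1:250-300", 500, 1000),
]
for location, expr, control in rows:
    lr = log2_ratio(expr, control)
    print(f"{location:<14} {lr:+.2f}  ~{copies_from_log2_ratio(lr):.1f} copies")
```

This is why a single-copy gain appears at about +0.58 rather than +1: the ratio is 3/2, not 2/1, while a single-copy loss (1/2) lands exactly at -1.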

Data loading (loading the CEL files of 100 individuals takes a little under 5 hours)


Data loading


A 2 Mb deletion on chromosome 5


Settings Window: Analysis Panel

• Significance Threshold

• Max Contiguous Probe Spacing

• Min Number of Probes per Segment

• High gain

• Gain

• Loss

• Big Loss

Significance Threshold

• Adjusts the sensitivity of the calling algorithm

• A smaller value is more stringent: stronger evidence is required before a new segment (cluster) is created.

• The significance threshold should be set based on the expected noise.

• For oligo arrays, we recommend 1×10⁻⁶

• For BAC arrays (lower density), we recommend 5×10⁻⁵
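How a significance threshold and a minimum number of probes per segment interact can be seen in a toy segmentation routine. The sketch below is a simplified, CBS-style recursive split: it tests every interval against the rest of the segment with a pooled two-sample t statistic and a normal-approximation p-value, without multiple-testing adjustment. It is not the algorithm Nexus uses; the parameter names only mirror the settings panel, and the log ratios are hypothetical. The exhaustive interval search is O(n³), so it suits only short toy inputs.

```python
import math
from statistics import fmean


def _t_stat(inner, outer):
    """Absolute pooled-variance two-sample t statistic, inner vs outer probes."""
    mi, mo = fmean(inner), fmean(outer)
    ss = sum((v - mi) ** 2 for v in inner) + sum((v - mo) ** 2 for v in outer)
    dof = max(len(inner) + len(outer) - 2, 1)
    se = math.sqrt(ss / dof * (1 / len(inner) + 1 / len(outer)))
    if se == 0:
        return math.inf if mi != mo else 0.0
    return abs(mi - mo) / se


def _best_split(x, min_probes):
    """Try every interval x[i:j]; every resulting piece must have >= min_probes probes."""
    best = (None, None, 0.0)
    for i in range(len(x)):
        for j in range(i + min_probes, len(x) + 1):
            pieces = [p for p in (x[:i], x[i:j], x[j:]) if p]
            if len(pieces) < 2 or any(len(p) < min_probes for p in pieces):
                continue
            t = _t_stat(x[i:j], x[:i] + x[j:])
            if t > best[2]:
                best = (i, j, t)
    return best


def segment(x, significance_threshold=1e-6, min_probes=3, start=0):
    """Recursively split log ratios; return (start, end_exclusive, mean) per segment."""
    if min_probes < 1:
        raise ValueError("min_probes must be at least 1")
    x = list(x)
    i, j, t = _best_split(x, min_probes)
    p_value = math.erfc(t / math.sqrt(2))  # two-sided normal approximation
    if i is None or p_value >= significance_threshold:
        return [(start, start + len(x), fmean(x))]
    segments = []
    for a, b in ((0, i), (i, j), (j, len(x))):
        if b > a:
            segments += segment(x[a:b], significance_threshold, min_probes, start + a)
    return segments


# Hypothetical log ratios: normal, a one-copy deletion (about -1), normal again.
lrr = [0.02, -0.01, 0.0, 0.01, -0.02, 0.0,
       -0.98, -1.01, -1.0, -0.99, -1.02,
       0.01, 0.0, -0.01, 0.02, 0.0]
for s, e, m in segment(lrr):
    print(f"probes {s}-{e - 1}: mean log ratio {m:+.2f}")
```

Searching intervals rather than single breakpoints matters here: a plain one-breakpoint split struggles to detect a short deletion sitting in the middle of normal probes. Loosening the threshold lets noisier breakpoints through, and raising `min_probes` suppresses short segments, which is the trade-off the settings panel exposes.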


Effect of Significance Threshold

Significance = 1E-6

High gain: gain of two or more copies

Gain: single-copy gain

Loss: hemizygous loss

Big loss: homozygous loss

Fig 2. Result example from Nexus (ref. Zhiwei Che, BioDiscovery, Inc., 2009)
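Once segments have a mean log ratio, they can be labeled with the four categories above. The cutoffs below are hypothetical: they sit halfway between neighbouring integer copy numbers on the log2 scale for a diploid genome. They are not Nexus defaults; real probe signals are compressed and noisy, so tools such as Nexus let users set lower, platform-specific thresholds.

```python
import math

PLOIDY = 2
# Hypothetical cutoffs halfway between integer copy numbers, on the log2(copies / 2) scale.
HIGH_GAIN = math.log2(3.5 / PLOIDY)  # about +0.81: four or more copies
GAIN = math.log2(2.5 / PLOIDY)       # about +0.32: three copies
LOSS = math.log2(1.5 / PLOIDY)       # about -0.42: one copy (hemizygous)
BIG_LOSS = math.log2(0.5 / PLOIDY)   # -2.0: zero copies (homozygous)


def call_segment(mean_log_ratio):
    """Map a segment's mean log2 ratio to a Nexus-style event label."""
    if mean_log_ratio >= HIGH_GAIN:
        return "High gain"
    if mean_log_ratio >= GAIN:
        return "Gain"
    if mean_log_ratio <= BIG_LOSS:
        return "Big loss"
    if mean_log_ratio <= LOSS:
        return "Loss"
    return "No change"


for lr in (1.0, 0.58, 0.05, -1.0, -3.0):
    print(f"{lr:+.2f} -> {call_segment(lr)}")
```

A homozygous loss has a theoretical log ratio of minus infinity; in practice residual background signal keeps it finite, which is why a generous cutoff such as -2 is used in this sketch.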

Many tumors have gross copy-number (CN) changes

A lung cancer cell line vs. matched normal lymphoblast, from Nannya et al., Cancer Res 2005;65:6071-6079

Current state of statistical CNV analysis

• Association studies using CNVs are still difficult.

Human Molecular Genetics (Impact Factor 7.249)
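The case vs. control comparison step can be sketched, for a single CNV region, as a 2×2 table of carriers and non-carriers tested with Fisher's exact test. This is one common choice rather than a method described in the slides, the counts are hypothetical, and it sidesteps exactly what makes CNV association hard in practice: uncertain calls and CNV boundaries that differ between individuals and platforms.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls. Columns: CNV carriers, non-carriers.
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):  # hypergeometric probability of x carriers among the cases
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


# Hypothetical counts for one CNV region: 8 of 10 cases vs 1 of 6 controls carry it.
print(round(fisher_exact_two_sided(8, 2, 1, 5), 4))  # -> 0.035
```

Scanning thousands of CNV regions this way requires a multiple-testing correction, and carrier status is only as reliable as the upstream segmentation and calling.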

References

http://g-was.blogspot.com/2009/11/cnv-copy-number-variation.html, 2009

문상훈, 2009 Asian Institute in Statistical Genetics and Genomics, 2009

서을주, Analysis methods for copy number variants (CNV)

Wentian Li, Copy Number Variations: a new type of genetic marker in whole-genome association studies, 2008

Thank you.