57
Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA [email protected]

Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA [email protected]

Embed Size (px)

Citation preview

Page 1: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Genetics in Epidemiology

Nazarbayev UniversityJuly 2012

Jan Dorman, PhDUniversity of PittsburghPittsburgh, PA, [email protected]

Page 2: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Genetics in Epidemiology

• Is important because– It focuses on heritable & non-modifiable

determinants of disease– It allows examination of gene-gene & gene-

environment interactions– It can contribute to personalized medicine

• Is being transformed because– Human Genome Project is complete– Genetic variation can be now examined across the

entire genome at a very low cost– Contribution of GWAS has been enormous in terms

of identifying disease-susceptibility genes

Page 3: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Human Genome Project

• February 2010 marked the 10th anniversary of the completion of the human genome project

• Initial sequence was finished early because of advancements in genome sequence technology

• Resulted in drastically reduced labor & delivery costs

Page 4: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Human Genome Sequencing Costs

• 2000 – Human Genome Project– $3 billion

• 2007 – James Watson– $2 million

• 2009 – Illumina & Helicos– $50,000

• 2010 – Illumina HiSeq– $10,000

• 2014 – Multiple companies– $1,000

Page 5: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu
Page 6: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Genetics in Epidemiology

• Is there evidence of familial aggregation of the disorder (phenotype)?– Is a positive family history an independent risk

factor for the disorder?• For many chronic disorders, a positive family

history is associated with odds ratios between 2-6

• Is there evidence of heritability?– A heritability of 50% indicates that ~ ½ of the

variation in disease risk in a population is due to genetics

Page 7: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

J Intern Med 2008;263:16

Page 8: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Candidate Gene Approach

• Are there potential candidate genes?– Genes that are selected based on known

biological, physiological, or functional relevance to the phenotype under investigation

– Approach is limited by its reliance on existing knowledge about the biology of disease

– Associations may be population-specific

• E.g., type 2 diabetes– Genes encoding molecules known to primarily

influence pancreatic β-cell or insulin action• ABCC8 (sulphonylurea receptor), INS, INSR, etc.

PLOS Bio 2003;1:41

Page 9: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Alternative Approach

• Genome-wide association studies (GWAS) – Hypothesis: common genetic variants (>5%) ; common

diseases (traits)• Limited number of variants, each with a small effect• No a priori hypotheses• Power to identify rare variants (1-5%) is limited

– First publication was in 2005• Complement factor H & age-related macular degeneration

– Require• Large, well-characterized populations• Genotyping across the entire genome• Sophisticated data analysis – collaborate on this!!

Page 10: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Monogenetic vs. Common Disorders

Page 11: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

GWAS

• 2 tiered approach– 1st tier: genotyping identifies the ‘discovery set’– 2nd tier: discovery set genotyped in another population

• Replication is a requirement for publication

– 3rd tier: rule out false positives & false negatives• Requires consortia

• Possible because– High-density genotype platforms

• By 2007 – chips contained 500,000 – 1,000,000 markers

– DNA samples were available from well-characterized epidemiological cohorts

Page 12: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

GWAS Example

NEJM 2010; 362:166

Page 13: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

GWAS Example

NEJM 2010; 362:166

Page 14: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

GWAS Example

NEJM 2010; 362:166

Page 15: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

GWAS

• Have identified novel gene-disease (trait) associations– Most alleles are common (>5%)– Most have small effect sizes (OR ~1.5)

• Are providing insights into pathways of complex diseases

Page 16: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Published Genome-Wide Associationspublished for 249 traits

NHGRI GWA Catalogwww.genome.gov/GWAStudies

Page 17: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Genetics Review

Page 18: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Anatomy of the Cell

Page 19: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Chromosomes, Genes & DNA

• Somatic cells are diploid - 46 chromosomes– 22 pairs autosomes; 1 pair sex chromosomes

• Each pair of autosomes is homologous– Contains the same genes in the same order– 1 is maternal, the other is paternal

• Chromosome are composed of deoxyribonucleic acid (DNA)– Genome contains 3 billion base pairs (haploid)– ~1% encode proteins

• Genes are located on chromosomes

Page 20: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Human Karyogram

Page 21: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Figure of a Chromosome

Page 22: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

DNA Double Helix

Page 23: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Base Pairs of a Double Helix

T

A

C

G

Page 24: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Structure of a Gene

A gene is a functional unit that includes introns, exons enhancer & promoter sequences & untranslated sequences at the 5’ & 3’ ends

Page 25: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Transcription Results in mRNA

Page 26: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Primary Transcript

Page 27: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

mRNA Processing

Page 28: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

From Genes to Proteins via mRNA

• Proteins consist of 1+ polypeptide chains• Polypeptides chains are made of amino acids• There are 20 amino acids

– Their order in is determined by the mRNA sequence read in triplet

• Genetic code– 64 combinations of 3 bases called codons– 3 are stop codons (UAA, UGA, UAG)

• Genetic code is degenerate• Genetic code is universal

Page 29: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Genetic Code

Page 30: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

mRNA Determines AA Sequence

Page 31: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Translation is Protein Synthesis

Page 32: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Post-Translation Modifications

Page 33: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Advancements in Biotechnology

Page 34: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Original Method for DNA Sequencing

Page 35: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Polymerase Chain Reaction (PCR)

• Revolutionized molecular genetics• Exploits the in vivo processes of DNA

replication to copy short DNA fragments in vitro within a few hours

• Exponential increase of target DNA sequences

• Highly sensitive – need small amount of template DNA

• DNA ‘photocopier’

Page 36: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

PCR - Cycle 1

5’ A C G T T A C C G T G A A C G T C T T A 3’

3’ T G C A A T G G C A C T T G C A G A A T 5’

Denaturation, ~30 seconds

H bonds dissolve at 95oC

Page 37: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

PCR - Cycle 1

5’ A C G T T A C C G T G A A C G T C T T A 3’

3’ T G C A A T G G C A C T T G C A G A A T 5’

3’ C A G A AT 5’

5’ A C G T T A 3’

Anneal primers, ~30 seconds at 35-65oC

Temperature determined by sequence / length

Page 38: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

PCR - Cycle 1

5’ A C G T T A C C G T G A A C G T C T T A 3’

3’ T G C A A T G G C A C T T G C A G A A T 5’

T G C A A T G G C A C T T G C A G A A T 5’

5’ A C G T T A C C G T G A A C G T C T T A

Extension of Primers, ~30 seconds at 70-75oC

Taq polymerase - thermostable

Page 39: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Post-Genome Era

Page 40: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Human Genetic Variation

• Single nucleotide polymorphisms (SNPs)• Tandem repeat Sequences

– Microsatellites (<8 bp)– Minisatellites (VNTRs; 8-100 bp)

• Copy number variants (CNVs; 1Kb – 1Mb)• Insertions – deletions (indels; 100bp – 1Kb)

• Note: size limitations are arbitrary – no biological basis & definitions are not consistent across studies

Page 41: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

SNPs

• ~10 million SNPs in human genome & counting• Most common type of genetic variation• 2 alleles; e.g., A → T • Occurs across the entire genome & in stable

regions• Many SNPs are in linkage disequilibrium

– SNPs close together are more likely to travel together in a block than SNPs far apart

– Can use 1 ‘tagging’ SNP per block – cost effective

Page 42: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Linkage Disequilibrium

Haplotype Block

NEJM 2007; 356:1094

Page 43: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

SNPs ‘Tag’ Haplotype Blocks

NEJM 2007; 356:1094

Page 44: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

International HapMap

• Emerged as next logical step after sequencing human genome

• Goal was to create a public genome-wide database of common genetic variants

• Genotyped SNPs from 270 samples from: – Nigeria, Utah, Han Chinese, Japanese

• Phase I– Typed 1 million common SNPs (>5%) to

characterize LD patterns

• Phase II– Typed 3 million rare SNPs (1-5%)

Page 45: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

DNA Microarray

Used to genotype 500,000 – 1+ million SNPs

Page 46: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

International HapMap

• Where are the SNPs?– 12% occur in protein coding regions– 8% occur in gene regulatory regions– 40% occur in non-coding introns– 40% occur in intergenic sequences– Regions of high linkage disequilibrium are similar

across populations

• HapMap was instrumental in facilitating GWAS

Page 47: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Tandem Repeat Sequences

• 100,000+ TRSs in human genome• Microsatellites (VNTRs)

– Repeat units (8 – 100 bp)

• Minisatellites – Repeat units (2 – 8 bp)– Eg., CAGCAGCAGCAGCAGCGACAG

• More than 200 diseases genes indentified – E.g., Huntington’s disease, Fragile X syndrome

Page 48: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Copy Number Variants

• Size is 1 Kb to 1 Mb– Duplications or deletions

• Less is known about CNV– Term was introduced in 2004

• Are ubiquitous & reflect 12% of human genome• May span multiple genes• May change gene dosage or effect transcription

and translation• Are creating a CNV map along with HapMap• Associated with autism, schizophrenia, lupus,

Crohn’s disease, rheumatoid arthritis

Page 49: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Copy Number Variants

Page 50: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Indels

• Insertions & deletions– Size 100 bp to 1 Kb– Millions in genome– Introduced in 2006

• Phenotype may depend on gene dosage• May occur within genes or in promoter• Also creating an indel map

Page 51: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Consequences of Genetic Variation

• No change – In a non-coding region– In a coding region - genetic code is degenerate

• Change in 1 amino acid of a protein• Change in multiple amino acids of a protein• A truncated protein• Change in gene expression

– In a regulatory region or splice site

• Next generation GWAS will be based on markers other than SNPs– Tandem repeats, CNV, indels

Page 52: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Genetic Variation Databases

Database Content Address

dbSNP SNPs covering the human genome

http://www.ncbi.nlm.nih.gov/projects/SNPs

HapMap Catalog of variants from HapMap Project

http://hapmap.org

1000 Genome Project Extension of HapMap – aim to catalog 95% of variants with 1% freq

www.1000genomes.org

UCSC Genome Bioinformatics

Reference human genome sequence with annotation

http://genome.ucsc.edu

Ensembl Genome browser, annotation, comparative genomics

http://www.ensembl.org/index.html

Page 53: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Genetic Variation Databases

Database Content Address

GeneCards Database of human genes linked to relevant databases

http://www.genecards.org

PharmGKB SNPs involved in drug metabolism

http://www.pharmgkb.org

DGV Database of Genomic Variants, including CNV

http://projects.tcag.ca/variation

SCAN SNP & CNV annotation based on gene function & expression

http://www.scandb.org/newinterface

OMIM Online Mendelian Inheritance in Man – over 12,000 genes

http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim

Page 54: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Genetic Variation Databases

Database Content Address

HuGE navigator Human genome epidemiology knowledge base

http://hugenavigator.net/HuGENavigator/home.do

Best Pract Res Clin Endo Metab, 2012. 26:119.

Page 55: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Collecting DNA

• Sources of DNA– Blood samples – Buccal brushes – Saliva samples – Dried blood spots

• Depends on– Conditions at time of collection– Resources available to process samples– What other biological samples will be collected– Long & short term storage– Quality control

Page 56: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Saliva vs. Blood Samples

• Considerations– Lower cost– More convenient & acceptable to patients– Increases compliance– Lower mean yield of DNA– But quality is comparable– No difference in success from high throughput

genotyping

Page 57: Genetics in Epidemiology Nazarbayev University July 2012 Jan Dorman, PhD University of Pittsburgh Pittsburgh, PA, USA jsd@pitt.edu

Other Considerations

• Informed consent- What analysis can be performed now?- What analysis can be performed in the future?- Who has control of the specimen?- Do you need to ‘re-consent’ the participants due

to IRB changes?- Will you inform participants of results?