63
SNP Resources: Finding SNPs, SNP Resources: Finding SNPs, Databases and Data Extraction Databases and Data Extraction Debbie Nickerson Debbie Nickerson [email protected] du SeattleSNPs SeattleSNPs

SNP Resources: Finding SNPs, Databases and Data Extraction

  • Upload
    tayte

  • View
    52

  • Download
    0

Embed Size (px)

DESCRIPTION

SNP Resources: Finding SNPs, Databases and Data Extraction. Debbie Nickerson [email protected] SeattleSNPs. Complex inheritance/disease. Many Other Genes. Variant Gene. Environment. Disease. DiabetesHeart DiseaseSchizophrenia ObesityMultiple SclerosisCeliac Disease - PowerPoint PPT Presentation

Citation preview

Page 1: SNP Resources:  Finding SNPs, Databases and Data Extraction

SNP Resources: Finding SNPs,SNP Resources: Finding SNPs,Databases and Data ExtractionDatabases and Data Extraction

Debbie NickersonDebbie [email protected]

SeattleSNPsSeattleSNPs

Page 2: SNP Resources:  Finding SNPs, Databases and Data Extraction

Complex inheritance/disease

Variant Gene

Disease

Diabetes Heart Disease SchizophreniaObesity Multiple Sclerosis Celiac DiseaseCancer Asthma Autism

Many OtherGenes

Environment

Two hypotheses:Two hypotheses:1- common disease/common variant?1- common disease/common variant?2- common disease/many rare variants?2- common disease/many rare variants?

Page 3: SNP Resources:  Finding SNPs, Databases and Data Extraction

Copy-Number Variants

Genomic VariationFr

equ

ency

Size

Single Nucleotide Polymorphisms

Small indels

cytogenetic

structural variation

duplications

deletionsinsertions

inversions

Human Genetic Variation

• Gene-rich, eg immune response, drug metabolism

• Abundant1 bp 1 chr

Page 4: SNP Resources:  Finding SNPs, Databases and Data Extraction

Total sequence variation in humans

Population size: 6x109 (diploid)

Mutation rate: 2x10–8 per bp per generation

Expected “hits”: 240 for each bp

Every variant compatible with life exists in the population

BUT: Most are vanishingly rare

Compare 2 haploid genomes: 1 SNP per 1331 bp*

*The International SNP Map Working Group, Nature 409:928 - 933 (2001)

Page 5: SNP Resources:  Finding SNPs, Databases and Data Extraction

Building Maps of Single Nucleotide Polymorphisms

(SNPs)

ATTCGGCATGAAATTCGGGATGAA

Developed in two overlapping phases:Developed in two overlapping phases:1)1) SNP DiscoverySNP Discovery2)2) SNP GenotypingSNP Genotyping

Page 6: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs: Sequence-based SNP Mining

RANDOM Sequence Overlap - SNP Discovery

GTTACGCCAATACAGGTTACGCCAATACAGGGATCCAGGAGATTACCATCCAGGAGATTACCGTTACGCCAATACAGGTTACGCCAATACAGCCATCCAGGAGATTACCATCCAGGAGATTACC

Genomic Genomic

RRSRRSLibraryLibrary

ShotgunShotgunOverlapOverlap

BACBACLibraryLibrary

BACBACOverlapOverlap

DNASEQUENCING

mRNAmRNA

cDNAcDNALibraryLibrary

ESTESTOverlapOverlap

RandomRandomShotgunShotgun

Align toAlign toReferenceReference

> 11 Million SNPs

G

C

Validated - 5.6 MILLON SNPS

Page 7: SNP Resources:  Finding SNPs, Databases and Data Extraction

Increasing Sample Size Improves SNP DiscoveryIncreasing Sample Size Improves SNP Discovery

GTTACGCCAATACAGGTTACGCCAATACAGGGATCCAGGAGATTACCATCCAGGAGATTACCGTTACGCCAATACAGGTTACGCCAATACAGCCATCCAGGAGATTACCATCCAGGAGATTACC{{2 chromosomes2 chromosomes

0.0 0.2 0.3 0.4 0.50.10.0

0.5

1.0

Minor Allele Frequency (MAF)

2

88

4824

16

8

96

HapMapHapMapBased on Based on ~ 6-8 ~ 6-8 ChromosomesChromosomesrandomrandom

Candidate Candidate GeneGeneSequencingSequencing

New 1000 Genome

Program

Fra

ctio

n o

f S

NP

s D

isc

ove

red

Page 8: SNP Resources:  Finding SNPs, Databases and Data Extraction

Genotype - Phenotype Studies

What SNPs are available?

How do I find the common SNPs?How do I find the common SNPs?

What is the validation/quality of the SNPs?What is the validation/quality of the SNPs?

Are these SNPs informative in my population/samples?Are these SNPs informative in my population/samples?

What can I download information?What can I download information?

How do I pick the “best” SNPs? - Dana CrawfordHow do I pick the “best” SNPs? - Dana Crawford

You have candidate gene/region/pathway of interest and samples ready to study:

Page 9: SNP Resources:  Finding SNPs, Databases and Data Extraction

Minimal SNP information for genotyping/characterizationMinimal SNP information for genotyping/characterization

• What is the SNP? Flanking sequence and alleles. FASTA format>snp_nameACCGAGTAGCCAG[A/G]ACTGGGATAGAAC

• dbSNP reference SNP # (rs #)• Where is the SNP mapped? Exon, promoter, UTR, etc• How was it discovered? Method • What assurances do you have that it is real? Validated how?• What population – African, European, etc?• What is the allele frequency of each SNP? Common (>5%), rare• Are other SNPs associated - redundant?• Is genotyping data for control populations available?

Page 10: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs: Databases and Extraction

How do I find and download SNP data for analysis/genotyping?

1. SeattleSNPs - Candidate gene website

2. Other web applications 2. Other web applications GVSGVS

HapMap Genome BrowserHapMap Genome Browser

3. Entrez Gene- dbSNP- Entrez SNP

Page 11: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs: Databases and ExtractionFinding SNPs: Databases and Extraction

How do I find and download SNP data for analysis/genotyping?How do I find and download SNP data for analysis/genotyping?

1. SeattleSNPs - Candidate gene website1. SeattleSNPs - Candidate gene website

2. Other web applications 2. Other web applications GVSGVS

HapMap Genome BrowserHapMap Genome Browser

3. Entrez Gene3. Entrez Gene- dbSNP- dbSNP- Entrez SNP- Entrez SNP

Page 12: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs: Seattle SNPs Candidate GenesFinding SNPs: Seattle SNPs Candidate Genes

pga.gs.washington.edupga.gs.washington.edu

Page 13: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs: SeattleSNPs Candidate Genes Finding SNPs: SeattleSNPs Candidate Genes

Example - PCSK9

Page 14: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs: SeattleSNPs Candidate Genes Finding SNPs: SeattleSNPs Candidate Genes

Page 15: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs: SeattleSNPs Candidate Genes Finding SNPs: SeattleSNPs Candidate Genes

Page 16: SNP Resources:  Finding SNPs, Databases and Data Extraction
Page 17: SNP Resources:  Finding SNPs, Databases and Data Extraction
Page 18: SNP Resources:  Finding SNPs, Databases and Data Extraction

AD

ED

Page 19: SNP Resources:  Finding SNPs, Databases and Data Extraction

SNP_pos <tab> Ind_ID <tab> allele1 <tab> allele2SNP_pos <tab> Ind_ID <tab> allele1 <tab> allele2Repeat for all individualsRepeat for all individualsRepeat for next SNPRepeat for next SNP

Page 20: SNP Resources:  Finding SNPs, Databases and Data Extraction

PolyPhen - PolyPhen - PolyPolymorphism morphism PhenPhenotypingotypingStructural protein characteristics and evolutionary comparisonStructural protein characteristics and evolutionary comparison

SIFT = Sorting Intolerant From TolerantSIFT = Sorting Intolerant From TolerantEvolutionary comparison of non-synonymous SNPsEvolutionary comparison of non-synonymous SNPs

Page 21: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs: SeattleSNPs Candidate GenesFinding SNPs: SeattleSNPs Candidate Genes

pga.gs.washington.edupga.gs.washington.edu

Page 22: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs: Databases and ExtractionFinding SNPs: Databases and Extraction

How do I find and download SNP data for analysis/genotyping?How do I find and download SNP data for analysis/genotyping?

1. SeattleSNPs - Candidate gene website1. SeattleSNPs - Candidate gene website

2. Other web applications 2. Other web applications GVSGVS

HapMap Genome BrowserHapMap Genome Browser

3. Entrez Gene3. Entrez Gene- dbSNP- dbSNP- Entrez SNP- Entrez SNP

Page 23: SNP Resources:  Finding SNPs, Databases and Data Extraction

Provides rapid analysis of 4.5 million Provides rapid analysis of 4.5 million genotypedgenotyped SNPs from SNPs from dbSNP and the HapMap dbSNP and the HapMap

Mapped to human genome build 36 (hg18)Mapped to human genome build 36 (hg18) Displays genotype data in text and image formatsDisplays genotype data in text and image formats

Displays tagSNPs or clusters of informative SNPs in text and Displays tagSNPs or clusters of informative SNPs in text and image formatsimage formats

Displays linkage disequilibrium (LD) in text and image Displays linkage disequilibrium (LD) in text and image formatsformats

Online tutorial provided at OpenHelix.comOnline tutorial provided at OpenHelix.com

GVS: Genome Variation Serverhttp://gvs.gs.washington.edu/GVS/http://gvs.gs.washington.edu/GVS/

Page 24: SNP Resources:  Finding SNPs, Databases and Data Extraction

GVS: Genome Variation Server

http://gvs.gs.washington.edu/GVS/http://gvs.gs.washington.edu/GVS/

LDLR

Page 25: SNP Resources:  Finding SNPs, Databases and Data Extraction
Page 26: SNP Resources:  Finding SNPs, Databases and Data Extraction

GVS: Genome Variation Server

Page 27: SNP Resources:  Finding SNPs, Databases and Data Extraction

GVS: Genome Variation Server

•Table of genotypes•Image of visual genotypes

Page 28: SNP Resources:  Finding SNPs, Databases and Data Extraction

GVS: Genome Variation ServerGenotypes displayed in prettybase table and visual genotype graphicGenotypes displayed in prettybase table and visual genotype graphic

Page 29: SNP Resources:  Finding SNPs, Databases and Data Extraction

GVS: Genome Variation Server

Page 30: SNP Resources:  Finding SNPs, Databases and Data Extraction

GVS: Genome Variation ServerDense genotypes around a candidate gene can be integratedDense genotypes around a candidate gene can be integratedwith broader HapMap genotypeswith broader HapMap genotypes

= Seattle \SNP discovery (1/200 bp)

= HapMap SNPs (~1/1000 bp)

High Density Genic Coverage (SeattleSNPs)

Low Density Genome Coverage (HapMap)

Page 31: SNP Resources:  Finding SNPs, Databases and Data Extraction

GVS: Genome Variation ServerDense genotypes around a candidate gene can be integratedDense genotypes around a candidate gene can be integratedwith lower-density HapMap genotypeswith lower-density HapMap genotypes

QuickTime™ and aTIFF (LZW) decompressor

are needed to see this picture.

QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.

Page 32: SNP Resources:  Finding SNPs, Databases and Data Extraction

GVS: Genome Variation Server

CombinedCombined

CommonCommon

A.A. Common samples-Common samples-combined variationscombined variations

B. Combined samples- B. Combined samples- common variationscommon variations

C.C. Combined samples- Combined samples- combined variationscombined variations

Page 33: SNP Resources:  Finding SNPs, Databases and Data Extraction

GVS: Genome Variation Server

A.A. Common samples- combined variationsCommon samples- combined variations

Combined variationsCombined variations

-Com

mon

sam

ples

--C

omm

on s

ampl

es-

Page 34: SNP Resources:  Finding SNPs, Databases and Data Extraction

QuickTime™ and aTIFF (LZW) decompressor

are needed to see this picture.

GVS: Genome Variation ServerB. Combined samples- common variationsB. Combined samples- common variations

-Com

bine

d sa

mpl

es-

-Com

bine

d sa

mpl

es-

Hap

Map

Hap

Map

Se

att

leS

NP

s

Page 35: SNP Resources:  Finding SNPs, Databases and Data Extraction

GVS: Genome Variation Server

C. Combined samples- combined variationsC. Combined samples- combined variations-C

ombi

ned

sam

ples

--C

ombi

ned

sam

ples

-

Combined variationsCombined variations

Page 36: SNP Resources:  Finding SNPs, Databases and Data Extraction
Page 37: SNP Resources:  Finding SNPs, Databases and Data Extraction
Page 38: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs: Databases and ExtractionFinding SNPs: Databases and Extraction

How do I find and download SNP data for analysis/genotyping?How do I find and download SNP data for analysis/genotyping?

1. SeattleSNPs - Candidate gene website1. SeattleSNPs - Candidate gene website

2. Other web applications 2. Other web applications GVSGVS

HapMap Genome BrowserHapMap Genome Browser

3. Entrez Gene3. Entrez Gene- dbSNP- dbSNP- Entrez SNP- Entrez SNP

Page 39: SNP Resources:  Finding SNPs, Databases and Data Extraction

www.hapmap.orgwww.hapmap.org

Page 40: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs: HapMap BrowserFinding SNPs: HapMap Browser

Page 41: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs: HapMap BrowserFinding SNPs: HapMap Browser

1.1. HapMap data sets are useful because individual genotype HapMap data sets are useful because individual genotype data in deeply sampled populations can be used to data in deeply sampled populations can be used to determine optimal genotyping strategies (tagSNPs) or determine optimal genotyping strategies (tagSNPs) or perform population genetic analyses (linkage perform population genetic analyses (linkage disequilbrium)disequilbrium)

2.2. Data are specific to the HapMap project (not all dbSNP)Data are specific to the HapMap project (not all dbSNP) HapMap data is available in dbSNPHapMap data is available in dbSNP

3.3. Visualization of data and direct access to Visualization of data and direct access to SNP data, individual genotypes, and LD analysisSNP data, individual genotypes, and LD analysis

possible in the browser and formats can be saved possible in the browser and formats can be saved for Haploviewfor Haploview

Page 42: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs: Databases and ExtractionFinding SNPs: Databases and Extraction

How do I find and download SNP data for analysis/genotyping?How do I find and download SNP data for analysis/genotyping?

1. SeattleSNPs - Candidate gene website1. SeattleSNPs - Candidate gene website

2. Other web applications 2. Other web applications GVSGVS

HapMap Genome BrowserHapMap Genome Browser

3. Entrez Gene3. Entrez Gene- dbSNP- dbSNP- Entrez SNP- Entrez SNP

Page 43: SNP Resources:  Finding SNPs, Databases and Data Extraction

NCBI - Database ResourceNCBI - Database Resource

www.ncbi.nlm.nih.gov

PCSK9

Page 44: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs using NCBI databasesFinding SNPs using NCBI databaseshttp://www.ncbi.nlm.nih.gov/

Page 45: SNP Resources:  Finding SNPs, Databases and Data Extraction
Page 46: SNP Resources:  Finding SNPs, Databases and Data Extraction

DefaultDefaultView cSNPsView cSNPs

Page 47: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs using NCBI databasesFinding SNPs using NCBI databaseshttp://www.ncbi.nlm.nih.gov/

Page 48: SNP Resources:  Finding SNPs, Databases and Data Extraction
Page 49: SNP Resources:  Finding SNPs, Databases and Data Extraction

PCSK9

Page 50: SNP Resources:  Finding SNPs, Databases and Data Extraction
Page 51: SNP Resources:  Finding SNPs, Databases and Data Extraction
Page 52: SNP Resources:  Finding SNPs, Databases and Data Extraction

Finding SNPs - Entrez SNP SummaryFinding SNPs - Entrez SNP Summary

1.1. dbSNP is useful for investigating detailed information on a dbSNP is useful for investigating detailed information on a small number SNPs - and it’s good for a picture of the genesmall number SNPs - and it’s good for a picture of the gene

2.2. Entrez SNP is a direct, fast database for querying SNP dataEntrez SNP is a direct, fast database for querying SNP data

3.3. Data from Entrez SNP can be retrieved in batches for many SNPsData from Entrez SNP can be retrieved in batches for many SNPs

4.4. Entrez SNP data can be “limited” to specific subsets of SNPsEntrez SNP data can be “limited” to specific subsets of SNPsand formatted in plain text for easy parsing and manipulationand formatted in plain text for easy parsing and manipulation

5.5. More detailed queries can be formed using specific “field tags” More detailed queries can be formed using specific “field tags” for retrieving SNP data for retrieving SNP data

Page 53: SNP Resources:  Finding SNPs, Databases and Data Extraction

SummarySummary Finding SNPs: Databases and ExtractionFinding SNPs: Databases and Extraction

Reviewing candidate genes using views and resources in Reviewing candidate genes using views and resources in

- SeattleSNPs- SeattleSNPs

Integration of dense, gene-centric SNP maps with genomic Integration of dense, gene-centric SNP maps with genomic HapMap SNPs HapMap SNPs

- GVS - GVS

HapMap viewerHapMap viewer

NCBI databases through Entrez portal-Entrez Gene, dbSNP, Entrez SNP-many ways to retrieve and format data

Page 54: SNP Resources:  Finding SNPs, Databases and Data Extraction

Genome Variation Server: GVS

GWAS AsthmaMoffatt et al Nature 448: 470-473, 2007

Page 55: SNP Resources:  Finding SNPs, Databases and Data Extraction
Page 56: SNP Resources:  Finding SNPs, Databases and Data Extraction
Page 57: SNP Resources:  Finding SNPs, Databases and Data Extraction

New Variation to Consider - Structural Variation

Types of Structural Variants

Insertions/DeletionsInversions DuplicationsTranslocations

Size:Large-scale (>100 kb) intermediate-scale (500 bp–100 kb)Fine-scale (1–500 bp) More than 10% of

the genome sequence

Nature 447: 161-165, 2007

Page 58: SNP Resources:  Finding SNPs, Databases and Data Extraction

Detection of Outliers of the Distribution

X-linked SNPX-linked SNP Unknown SNPUnknown SNP

Page 59: SNP Resources:  Finding SNPs, Databases and Data Extraction

Genetic Strategy - New Insights

allele frequency HIGHLOW

effectsize

WEAK

STRONG

LINKAGE ASSOCIATION

??

Ardlie, Kruglyak & Seielstad (2002) Nat. Genet. Rev. 3: 299-309

Common DiseaseMany Rare Variants

Page 60: SNP Resources:  Finding SNPs, Databases and Data Extraction

High Density Lipoprotein (HDL)

Sequencing Known Candidate Genes for Functional VariationFrom Individuals at the Tails of the Trait Distribution

Low HDL High HDLInd

ivid

uals

Page 61: SNP Resources:  Finding SNPs, Databases and Data Extraction

ABCA1 and HDL-C

• Observed excess of rare, nonsynonymous variants in low HDL-C samples at ABCA1

• Demonstrated functional relevance in cell culture

–Cohen et al, Science 305, 869-872, 2004

Many examples emerging

Common Disease Rare Variants

Page 62: SNP Resources:  Finding SNPs, Databases and Data Extraction

Personalized Human Genome Sequencing

Solexa - an example

Page 63: SNP Resources:  Finding SNPs, Databases and Data Extraction