Upload
tayte
View
52
Download
0
Tags:
Embed Size (px)
DESCRIPTION
SNP Resources: Finding SNPs, Databases and Data Extraction. Debbie Nickerson [email protected] SeattleSNPs. Complex inheritance/disease. Many Other Genes. Variant Gene. Environment. Disease. DiabetesHeart DiseaseSchizophrenia ObesityMultiple SclerosisCeliac Disease - PowerPoint PPT Presentation
Citation preview
SNP Resources: Finding SNPs,SNP Resources: Finding SNPs,Databases and Data ExtractionDatabases and Data Extraction
Debbie NickersonDebbie [email protected]
SeattleSNPsSeattleSNPs
Complex inheritance/disease
Variant Gene
Disease
Diabetes Heart Disease SchizophreniaObesity Multiple Sclerosis Celiac DiseaseCancer Asthma Autism
Many OtherGenes
Environment
Two hypotheses:Two hypotheses:1- common disease/common variant?1- common disease/common variant?2- common disease/many rare variants?2- common disease/many rare variants?
Copy-Number Variants
Genomic VariationFr
equ
ency
Size
Single Nucleotide Polymorphisms
Small indels
cytogenetic
structural variation
duplications
deletionsinsertions
inversions
Human Genetic Variation
• Gene-rich, eg immune response, drug metabolism
• Abundant1 bp 1 chr
Total sequence variation in humans
Population size: 6x109 (diploid)
Mutation rate: 2x10–8 per bp per generation
Expected “hits”: 240 for each bp
Every variant compatible with life exists in the population
BUT: Most are vanishingly rare
Compare 2 haploid genomes: 1 SNP per 1331 bp*
*The International SNP Map Working Group, Nature 409:928 - 933 (2001)
Building Maps of Single Nucleotide Polymorphisms
(SNPs)
ATTCGGCATGAAATTCGGGATGAA
Developed in two overlapping phases:Developed in two overlapping phases:1)1) SNP DiscoverySNP Discovery2)2) SNP GenotypingSNP Genotyping
Finding SNPs: Sequence-based SNP Mining
RANDOM Sequence Overlap - SNP Discovery
GTTACGCCAATACAGGTTACGCCAATACAGGGATCCAGGAGATTACCATCCAGGAGATTACCGTTACGCCAATACAGGTTACGCCAATACAGCCATCCAGGAGATTACCATCCAGGAGATTACC
Genomic Genomic
RRSRRSLibraryLibrary
ShotgunShotgunOverlapOverlap
BACBACLibraryLibrary
BACBACOverlapOverlap
DNASEQUENCING
mRNAmRNA
cDNAcDNALibraryLibrary
ESTESTOverlapOverlap
RandomRandomShotgunShotgun
Align toAlign toReferenceReference
> 11 Million SNPs
G
C
Validated - 5.6 MILLON SNPS
Increasing Sample Size Improves SNP DiscoveryIncreasing Sample Size Improves SNP Discovery
GTTACGCCAATACAGGTTACGCCAATACAGGGATCCAGGAGATTACCATCCAGGAGATTACCGTTACGCCAATACAGGTTACGCCAATACAGCCATCCAGGAGATTACCATCCAGGAGATTACC{{2 chromosomes2 chromosomes
0.0 0.2 0.3 0.4 0.50.10.0
0.5
1.0
Minor Allele Frequency (MAF)
2
88
4824
16
8
96
HapMapHapMapBased on Based on ~ 6-8 ~ 6-8 ChromosomesChromosomesrandomrandom
Candidate Candidate GeneGeneSequencingSequencing
New 1000 Genome
Program
Fra
ctio
n o
f S
NP
s D
isc
ove
red
Genotype - Phenotype Studies
What SNPs are available?
How do I find the common SNPs?How do I find the common SNPs?
What is the validation/quality of the SNPs?What is the validation/quality of the SNPs?
Are these SNPs informative in my population/samples?Are these SNPs informative in my population/samples?
What can I download information?What can I download information?
How do I pick the “best” SNPs? - Dana CrawfordHow do I pick the “best” SNPs? - Dana Crawford
You have candidate gene/region/pathway of interest and samples ready to study:
Minimal SNP information for genotyping/characterizationMinimal SNP information for genotyping/characterization
• What is the SNP? Flanking sequence and alleles. FASTA format>snp_nameACCGAGTAGCCAG[A/G]ACTGGGATAGAAC
• dbSNP reference SNP # (rs #)• Where is the SNP mapped? Exon, promoter, UTR, etc• How was it discovered? Method • What assurances do you have that it is real? Validated how?• What population – African, European, etc?• What is the allele frequency of each SNP? Common (>5%), rare• Are other SNPs associated - redundant?• Is genotyping data for control populations available?
Finding SNPs: Databases and Extraction
How do I find and download SNP data for analysis/genotyping?
1. SeattleSNPs - Candidate gene website
2. Other web applications 2. Other web applications GVSGVS
HapMap Genome BrowserHapMap Genome Browser
3. Entrez Gene- dbSNP- Entrez SNP
Finding SNPs: Databases and ExtractionFinding SNPs: Databases and Extraction
How do I find and download SNP data for analysis/genotyping?How do I find and download SNP data for analysis/genotyping?
1. SeattleSNPs - Candidate gene website1. SeattleSNPs - Candidate gene website
2. Other web applications 2. Other web applications GVSGVS
HapMap Genome BrowserHapMap Genome Browser
3. Entrez Gene3. Entrez Gene- dbSNP- dbSNP- Entrez SNP- Entrez SNP
Finding SNPs: Seattle SNPs Candidate GenesFinding SNPs: Seattle SNPs Candidate Genes
pga.gs.washington.edupga.gs.washington.edu
Finding SNPs: SeattleSNPs Candidate Genes Finding SNPs: SeattleSNPs Candidate Genes
Example - PCSK9
Finding SNPs: SeattleSNPs Candidate Genes Finding SNPs: SeattleSNPs Candidate Genes
Finding SNPs: SeattleSNPs Candidate Genes Finding SNPs: SeattleSNPs Candidate Genes
AD
ED
SNP_pos <tab> Ind_ID <tab> allele1 <tab> allele2SNP_pos <tab> Ind_ID <tab> allele1 <tab> allele2Repeat for all individualsRepeat for all individualsRepeat for next SNPRepeat for next SNP
PolyPhen - PolyPhen - PolyPolymorphism morphism PhenPhenotypingotypingStructural protein characteristics and evolutionary comparisonStructural protein characteristics and evolutionary comparison
SIFT = Sorting Intolerant From TolerantSIFT = Sorting Intolerant From TolerantEvolutionary comparison of non-synonymous SNPsEvolutionary comparison of non-synonymous SNPs
Finding SNPs: SeattleSNPs Candidate GenesFinding SNPs: SeattleSNPs Candidate Genes
pga.gs.washington.edupga.gs.washington.edu
Finding SNPs: Databases and ExtractionFinding SNPs: Databases and Extraction
How do I find and download SNP data for analysis/genotyping?How do I find and download SNP data for analysis/genotyping?
1. SeattleSNPs - Candidate gene website1. SeattleSNPs - Candidate gene website
2. Other web applications 2. Other web applications GVSGVS
HapMap Genome BrowserHapMap Genome Browser
3. Entrez Gene3. Entrez Gene- dbSNP- dbSNP- Entrez SNP- Entrez SNP
Provides rapid analysis of 4.5 million Provides rapid analysis of 4.5 million genotypedgenotyped SNPs from SNPs from dbSNP and the HapMap dbSNP and the HapMap
Mapped to human genome build 36 (hg18)Mapped to human genome build 36 (hg18) Displays genotype data in text and image formatsDisplays genotype data in text and image formats
Displays tagSNPs or clusters of informative SNPs in text and Displays tagSNPs or clusters of informative SNPs in text and image formatsimage formats
Displays linkage disequilibrium (LD) in text and image Displays linkage disequilibrium (LD) in text and image formatsformats
Online tutorial provided at OpenHelix.comOnline tutorial provided at OpenHelix.com
GVS: Genome Variation Serverhttp://gvs.gs.washington.edu/GVS/http://gvs.gs.washington.edu/GVS/
GVS: Genome Variation Server
http://gvs.gs.washington.edu/GVS/http://gvs.gs.washington.edu/GVS/
LDLR
GVS: Genome Variation Server
GVS: Genome Variation Server
•Table of genotypes•Image of visual genotypes
GVS: Genome Variation ServerGenotypes displayed in prettybase table and visual genotype graphicGenotypes displayed in prettybase table and visual genotype graphic
GVS: Genome Variation Server
GVS: Genome Variation ServerDense genotypes around a candidate gene can be integratedDense genotypes around a candidate gene can be integratedwith broader HapMap genotypeswith broader HapMap genotypes
= Seattle \SNP discovery (1/200 bp)
= HapMap SNPs (~1/1000 bp)
High Density Genic Coverage (SeattleSNPs)
Low Density Genome Coverage (HapMap)
GVS: Genome Variation ServerDense genotypes around a candidate gene can be integratedDense genotypes around a candidate gene can be integratedwith lower-density HapMap genotypeswith lower-density HapMap genotypes
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.
GVS: Genome Variation Server
CombinedCombined
CommonCommon
A.A. Common samples-Common samples-combined variationscombined variations
B. Combined samples- B. Combined samples- common variationscommon variations
C.C. Combined samples- Combined samples- combined variationscombined variations
GVS: Genome Variation Server
A.A. Common samples- combined variationsCommon samples- combined variations
Combined variationsCombined variations
-Com
mon
sam
ples
--C
omm
on s
ampl
es-
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
GVS: Genome Variation ServerB. Combined samples- common variationsB. Combined samples- common variations
-Com
bine
d sa
mpl
es-
-Com
bine
d sa
mpl
es-
Hap
Map
Hap
Map
Se
att
leS
NP
s
GVS: Genome Variation Server
C. Combined samples- combined variationsC. Combined samples- combined variations-C
ombi
ned
sam
ples
--C
ombi
ned
sam
ples
-
Combined variationsCombined variations
Finding SNPs: Databases and ExtractionFinding SNPs: Databases and Extraction
How do I find and download SNP data for analysis/genotyping?How do I find and download SNP data for analysis/genotyping?
1. SeattleSNPs - Candidate gene website1. SeattleSNPs - Candidate gene website
2. Other web applications 2. Other web applications GVSGVS
HapMap Genome BrowserHapMap Genome Browser
3. Entrez Gene3. Entrez Gene- dbSNP- dbSNP- Entrez SNP- Entrez SNP
www.hapmap.orgwww.hapmap.org
Finding SNPs: HapMap BrowserFinding SNPs: HapMap Browser
Finding SNPs: HapMap BrowserFinding SNPs: HapMap Browser
1.1. HapMap data sets are useful because individual genotype HapMap data sets are useful because individual genotype data in deeply sampled populations can be used to data in deeply sampled populations can be used to determine optimal genotyping strategies (tagSNPs) or determine optimal genotyping strategies (tagSNPs) or perform population genetic analyses (linkage perform population genetic analyses (linkage disequilbrium)disequilbrium)
2.2. Data are specific to the HapMap project (not all dbSNP)Data are specific to the HapMap project (not all dbSNP) HapMap data is available in dbSNPHapMap data is available in dbSNP
3.3. Visualization of data and direct access to Visualization of data and direct access to SNP data, individual genotypes, and LD analysisSNP data, individual genotypes, and LD analysis
possible in the browser and formats can be saved possible in the browser and formats can be saved for Haploviewfor Haploview
Finding SNPs: Databases and ExtractionFinding SNPs: Databases and Extraction
How do I find and download SNP data for analysis/genotyping?How do I find and download SNP data for analysis/genotyping?
1. SeattleSNPs - Candidate gene website1. SeattleSNPs - Candidate gene website
2. Other web applications 2. Other web applications GVSGVS
HapMap Genome BrowserHapMap Genome Browser
3. Entrez Gene3. Entrez Gene- dbSNP- dbSNP- Entrez SNP- Entrez SNP
NCBI - Database ResourceNCBI - Database Resource
www.ncbi.nlm.nih.gov
PCSK9
Finding SNPs using NCBI databasesFinding SNPs using NCBI databaseshttp://www.ncbi.nlm.nih.gov/
DefaultDefaultView cSNPsView cSNPs
Finding SNPs using NCBI databasesFinding SNPs using NCBI databaseshttp://www.ncbi.nlm.nih.gov/
PCSK9
Finding SNPs - Entrez SNP SummaryFinding SNPs - Entrez SNP Summary
1.1. dbSNP is useful for investigating detailed information on a dbSNP is useful for investigating detailed information on a small number SNPs - and it’s good for a picture of the genesmall number SNPs - and it’s good for a picture of the gene
2.2. Entrez SNP is a direct, fast database for querying SNP dataEntrez SNP is a direct, fast database for querying SNP data
3.3. Data from Entrez SNP can be retrieved in batches for many SNPsData from Entrez SNP can be retrieved in batches for many SNPs
4.4. Entrez SNP data can be “limited” to specific subsets of SNPsEntrez SNP data can be “limited” to specific subsets of SNPsand formatted in plain text for easy parsing and manipulationand formatted in plain text for easy parsing and manipulation
5.5. More detailed queries can be formed using specific “field tags” More detailed queries can be formed using specific “field tags” for retrieving SNP data for retrieving SNP data
SummarySummary Finding SNPs: Databases and ExtractionFinding SNPs: Databases and Extraction
Reviewing candidate genes using views and resources in Reviewing candidate genes using views and resources in
- SeattleSNPs- SeattleSNPs
Integration of dense, gene-centric SNP maps with genomic Integration of dense, gene-centric SNP maps with genomic HapMap SNPs HapMap SNPs
- GVS - GVS
HapMap viewerHapMap viewer
NCBI databases through Entrez portal-Entrez Gene, dbSNP, Entrez SNP-many ways to retrieve and format data
Genome Variation Server: GVS
GWAS AsthmaMoffatt et al Nature 448: 470-473, 2007
New Variation to Consider - Structural Variation
Types of Structural Variants
Insertions/DeletionsInversions DuplicationsTranslocations
Size:Large-scale (>100 kb) intermediate-scale (500 bp–100 kb)Fine-scale (1–500 bp) More than 10% of
the genome sequence
Nature 447: 161-165, 2007
Detection of Outliers of the Distribution
X-linked SNPX-linked SNP Unknown SNPUnknown SNP
Genetic Strategy - New Insights
allele frequency HIGHLOW
effectsize
WEAK
STRONG
LINKAGE ASSOCIATION
??
Ardlie, Kruglyak & Seielstad (2002) Nat. Genet. Rev. 3: 299-309
Common DiseaseMany Rare Variants
High Density Lipoprotein (HDL)
Sequencing Known Candidate Genes for Functional VariationFrom Individuals at the Tails of the Trait Distribution
Low HDL High HDLInd
ivid
uals
ABCA1 and HDL-C
• Observed excess of rare, nonsynonymous variants in low HDL-C samples at ABCA1
• Demonstrated functional relevance in cell culture
–Cohen et al, Science 305, 869-872, 2004
Many examples emerging
Common Disease Rare Variants
Personalized Human Genome Sequencing
Solexa - an example