Toward the genetic basis of adaptation using arrays
Justin Borevitz
Ecology & Evolution
University of Chicago
http://naturalvariation.org/talks
Arabidopsis thaliana
• Genome Sequence 2000 (120Mb)
• 20 strains by Perlegen, Weigel, Nordborg, Ecker
• ~1% sequence variation, ~3000 collected lines
• A. lyrata, Capsella rubella sister species JGI 2006
• >5300 Research Labs (17th annual conference)
• Field study data to come, Annie Schmidt et al
• 340k Sequence Indexed collection of KO lines
• Gene Expression Atlas >300 tissues, time points
• 15,000 full length cDNAs in recombination clones
Widely Distributed
http://www.inra.fr/qtlat/NaturalVar/NewCollection.htm
Olivier Loudet
Aranzana et al., PLoS Genetics (2005); Sung Kim, Keyan Zhao
Local Population Variation
Scott Hodges
Ivan Baxter
Seasonal Variation
Matt Horton
Megan Dunning
Developmental Plasticity == Behavior
Seasons in the Growth Chamber
• Changing Day length
• Cycle Light Intensity
• Cycle Light Colors
• Cycle Temperature
Sweden Spain
Day Length
Figure: hour of day (0:00 to 22:00) by month (Sep to Aug); series: Sweden, Spain, standard.
Light Intensity
Figure: light intensity (0 to 1400 W/m2) by month (Sep to Aug); series: Sweden, Spain, standard.
Temperature
Figure: temperature (-10 to 35 degrees C) by month (Sep to Aug); series: Spain High, Spain Low, Sweden High, Sweden Low, standard.
Talk Outline
• Single Feature Polymorphisms (SFPs)
– Methylation
– Potential deletions
– Genetic Mapping
• Resequencing/ Haplotypes
• Variation Scanning
Tiling Arrays vs Resequencing Arrays
• AtTILE1, universal whole genome array
– 25mer every ~35 bp, > 6.5 million features
– single array, many individuals
• Re-sequencing array, 120 Mbp × 8 features
– ~1 billion features, 8 wafers
– 20 accessions available mid year
– Perlegen, Max Planck (Weigel), USC (Nordborg), Salk (Ecker)
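The feature counts quoted above can be checked with simple arithmetic. The sketch below assumes only the numbers on the slide (120 Mbp genome, one 25mer every ~35 bp, 8 resequencing features per base); the factor of two applied to the tiling count is an assumption (for example two strands, or a perfect-match/mismatch pair per position), since the slide does not say how the >6.5 million figure is reached.

```python
# Back-of-envelope feature counts from the slide's numbers.
GENOME_BP = 120_000_000          # ~120 Mb genome (from the slide)
TILING_SPACING_BP = 35           # one 25mer roughly every 35 bp (from the slide)
RESEQ_FEATURES_PER_BP = 8        # 8 features per base (from the slide)


def tiling_positions(genome_bp: int, spacing_bp: int) -> int:
    """Number of probe start positions when tiling at a fixed spacing."""
    return genome_bp // spacing_bp


def reseq_features(genome_bp: int, features_per_bp: int) -> int:
    """Total features needed to interrogate every base."""
    return genome_bp * features_per_bp


positions = tiling_positions(GENOME_BP, TILING_SPACING_BP)
print(f"tiling positions:       {positions:,}")
# ASSUMPTION: two features per position (not stated on the slide).
print(f"tiling features (x2):   {2 * positions:,}")
print(f"resequencing features:  {reseq_features(GENOME_BP, RESEQ_FEATURES_PER_BP):,}")
```

Under that assumption the tiling count lands near 6.9 million, in line with "> 6.5 million", and the resequencing count of 960 million matches "~1 billion".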
GeneChip
RNA DNA
Universal Whole Genome Array
• Transcriptome Atlas: expression levels, tissue specificity
• Gene Discovery: gene model correction, non-coding/micro-RNA, antisense transcription
• Alternative Splicing
• Comparative Genome Hybridization (CGH): insertions/deletions
• Methylation
• Chromatin Immunoprecipitation (ChIP chip)
• Polymorphism SFPs: discovery/genotyping
Control for hybridization/genetic polymorphisms to understand true EXPRESSION polymorphisms
True cis variation == Allele Specific Expression
Figure: annotation tracks along the chromosome (bp): conservation, SNP, SFP, methylated features (M, mSFP), ORFa and ORFb (start, poly-A), Transcriptome Atlas, deletion.
Improved Genome Annotation
Potential Deletions
Delta   p0     FALSE    Called    FDR
1.00    0.95   18865    160145    11.2%
1.25    0.95   10477    132390     7.5%
1.50    0.95    6545    115042     5.4%
1.75    0.95    4484    102385     4.2%
2.00    0.95    3298     92027     3.4%
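The FDR column in the table above is reproduced by p0 × FALSE / Called. The sketch below checks that arithmetic for every row. Reading FALSE as the number of calls expected under a permutation null and p0 as the estimated fraction of unchanged features is an interpretation, not something the slide states.

```python
# Recompute the FDR column from the other columns of the table.
# ASSUMPTION: FDR = p0 * FALSE / Called (this matches all five rows).
ROWS = [
    # (delta, p0, false_calls, called)
    (1.00, 0.95, 18865, 160145),
    (1.25, 0.95, 10477, 132390),
    (1.50, 0.95, 6545, 115042),
    (1.75, 0.95, 4484, 102385),
    (2.00, 0.95, 3298, 92027),
]


def estimated_fdr(p0: float, false_calls: int, called: int) -> float:
    """Estimated false discovery rate among the called features."""
    return p0 * false_calls / called


fdr_percent = [round(100 * estimated_fdr(p0, f, c), 1) for _, p0, f, c in ROWS]
for (delta, *_), pct in zip(ROWS, fdr_percent):
    print(f"delta={delta:.2f}  FDR={pct:.1f}%")
```

Raising the Delta threshold trades calls for confidence: the number of called features drops from 160145 to 92027 while the estimated FDR falls from 11.2% to 3.4%.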
SFP detection on tiling arrays
        Intergenic   Exon     Intron
SFPs    60770        23519    17216
Total   685575       665524   301648
%       8.86%        3.53%    5.71%

SFPs/gene   0       >=1    >=2    >=3    >=4    >=5
Genes       16322   9146   4304   2495   1687   1121
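The percentage row above is the SFP count divided by the total feature count in each annotation class. The sketch below recomputes it and also derives the share of genes carrying at least one SFP, assuming the ">=k" columns are cumulative counts (they decrease monotonically, which fits that reading).

```python
# SFP rate per annotation class, using the counts from the table.
COUNTS = {
    # class: (SFPs, total features)
    "intergenic": (60770, 685575),
    "exon": (23519, 665524),
    "intron": (17216, 301648),
}


def sfp_rate(n_sfp: int, n_total: int) -> float:
    """Fraction of features in a class that were called as SFPs."""
    return n_sfp / n_total


rates = {name: sfp_rate(*pair) for name, pair in COUNTS.items()}
for name, rate in rates.items():
    print(f"{name:<10} {100 * rate:.2f}%")

# ASSUMPTION: the '0' and '>=1' columns partition all genes on the array.
GENES_WITHOUT_SFP = 16322
GENES_WITH_SFP = 9146
share_with_sfp = GENES_WITH_SFP / (GENES_WITHOUT_SFP + GENES_WITH_SFP)
print(f"genes with >=1 SFP: {100 * share_with_sfp:.1f}%")
```

Exon features show the lowest SFP rate and intergenic features the highest, with introns in between.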
Methods for labeling
• Extract 100 ng genomic DNA (single leaf)
• Digest with either MspI or HpaII (CCGG)
• Label with biotin random primers
• Hybridize to array
• Fit model
methylated features and mSFPs
>10,000 of 100,000 at 5% FDR
Enzyme effect, on CCGG features GxE
276 at 15% FDR
mQTL?
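The "fit model" step is not specified on the slides. One plausible reading, sketched below, is a per-feature linear model on log intensity with a genotype term, an enzyme term, and their interaction. HpaII is blocked by CpG methylation at CCGG while MspI is not, so under this reading an enzyme effect flags a methylated feature and a genotype-by-enzyme interaction flags an mSFP. The function name, the 0/1 coding, and the model form are all assumptions for illustration.

```python
import numpy as np


def fit_feature_model(log_intensity, genotype, enzyme):
    """Least-squares fit of y = b0 + bG*G + bE*E + bGE*G*E for one feature.

    genotype and enzyme are 0/1 codes per hybridization (hypothetical coding:
    genotype 0 = reference accession, enzyme 0 = MspI, 1 = HpaII).
    """
    g = np.asarray(genotype, dtype=float)
    e = np.asarray(enzyme, dtype=float)
    y = np.asarray(log_intensity, dtype=float)
    design = np.column_stack([np.ones_like(g), g, e, g * e])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    names = ["intercept", "genotype", "enzyme", "genotype_x_enzyme"]
    return dict(zip(names, coef))


# Toy, noise-free data: 2 genotypes x 2 enzymes x 2 replicates.
genotype = np.array([0, 0, 1, 1, 0, 0, 1, 1])
enzyme = np.array([0, 1, 0, 1, 0, 1, 0, 1])
y = 10.0 - 1.0 * genotype - 0.5 * enzyme + 0.8 * genotype * enzyme
fit = fit_feature_model(y, genotype, enzyme)
print({k: round(float(v), 3) for k, v in fit.items()})
```

In practice each coefficient would be tested across all features and the calls thresholded at a chosen FDR, as with the 5% and 15% figures quoted above.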
SFP Resequencing
• Advantages
– Discovery and typing tool
– Indels, rare variants, HMM tool
– Quantitative score
– Good for low polymorphism < 1%
• Caveats
– No SNP knowledge, synonymous?
– Bad for high polymorphism > 1%
• Rearrangements, Reference sequence
Natural Variation on Tiling Arrays
Potential Deletions
>500 potential deletions
45 confirmed by Ler sequence
23 (of 114) transposons
Disease Resistance(R) gene clusters
Single R gene deletions
Genes involved in Secondary metabolism
Unknown genes
Potential Deletions Suggest Candidate Genes
FLOWERING1 QTL
Chr1 (bp)
Flowering Time QTL caused by a natural deletion in FLM
FLM
FLM natural deletion
(Werner et al PNAS 2005)
Chip genotyping of a Recombinant Inbred Line
29kb interval
Map bibb
100 bibb mutant plants
100 wt mutant plants
Array Mapping
Hazen et al Plant Physiology 2005
eXtreme Array Mapping
Figure: histogram of Kas/Col RILs under red light; counts vs. hypocotyl length (mm), 6 to 14 mm.
15 tallest RILs pooled vs 15 shortest RILs pooled
LOD
eXtreme Array Mapping
Allele frequencies determined by SFP genotyping. Thresholds set by simulations
Figure: LOD (0 to 16) vs. map position (0 to 100 cM).
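The slides say thresholds for the pooled allele-frequency comparison were set by simulations but give no details. A minimal sketch of one way to do that follows: at a marker unlinked to the trait, each RIL carries either parental allele with probability 0.5, so the difference in allele frequency between two pools of 15 lines can be simulated directly. The function name and parameters are hypothetical, and the sketch ignores array measurement noise and genome-wide multiple testing.

```python
import numpy as np


def null_frequency_threshold(pool_size=15, n_sim=10_000, alpha=0.05, seed=0):
    """Simulated threshold on |freq(tall pool) - freq(short pool)|.

    ASSUMPTION: under the null, each RIL in a pool carries the Kas allele
    with probability 0.5, independently of the phenotype.
    """
    rng = np.random.default_rng(seed)
    tall = rng.binomial(pool_size, 0.5, size=n_sim) / pool_size
    short = rng.binomial(pool_size, 0.5, size=n_sim) / pool_size
    return float(np.quantile(np.abs(tall - short), 1.0 - alpha))


threshold = null_frequency_threshold()
print(f"null threshold on pool frequency difference: {threshold:.3f}")

# A marker fixed for opposite alleles in the two pools is far beyond it.
observed_difference = abs(1.0 - 0.0)
print("exceeds threshold:", observed_difference > threshold)
```

A marker near a QTL such as RED2 would show a frequency difference well beyond this null threshold, while unlinked markers would stay below it most of the time.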
Composite Interval Mapping
RED2 QTL
Chromosome 2
RED2 QTL 12cM
Red light QTL RED2 from 100 Kas/ Col RILs
Drosophila, Chao-Qiang Lai -Tufts University
Array Haplotyping
• What about Diversity/selection across the genome?
• A genome wide estimate of population genetics parameters: θw, π, Tajima’s D, ρ
• LD decay, Haplotype block size
• Deep population structure?
• Col, Lz, Bur, Ler, Bay, Shah, Cvi, Kas, C24, Est, Kin, Mt, Nd, Sorbo, Van, Ws2, Fl-1, Ita-0, Mr-0, St-0, Sah-0
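Three of the parameters named above (θw, π, Tajima's D) have standard estimators that can be computed from a 0/1 matrix of accessions by polymorphic sites. The sketch below uses the textbook formulas (Watterson 1975; Tajima 1989) on a toy matrix. Treating SFP calls as biallelic sites is an assumption here; the slides elsewhere call their own statistic "Tajima's D like", and ρ is not covered.

```python
import math
import numpy as np


def diversity_stats(haplotypes):
    """Return S, pi, Watterson's theta, and Tajima's D for a 0/1 matrix.

    Rows are inbred accessions (haplotypes), columns are sites.
    Values are totals over the sites supplied, not per-base-pair rates.
    """
    h = np.asarray(haplotypes)
    n = h.shape[0]
    alt = h.sum(axis=0)
    segregating = (alt > 0) & (alt < n)
    s = int(segregating.sum())
    k = alt[segregating]
    # Mean number of pairwise differences.
    pi = float((k * (n - k)).sum() / (n * (n - 1) / 2))
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    theta_w = s / a1
    if s == 0:
        return {"S": 0, "pi": 0.0, "theta_w": 0.0, "tajima_d": float("nan")}
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return {"S": s, "pi": pi, "theta_w": theta_w, "tajima_d": d}


# Toy data: 4 accessions, 2 sites, both at intermediate frequency.
toy = [[0, 0], [0, 1], [1, 1], [1, 0]]
stats = diversity_stats(toy)
print(stats)
```

Intermediate-frequency variants push π above θw and D above zero, while an excess of rare variants does the opposite.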
Array Haplotyping
Inbred lines
Low effective recombination due to partial selfing
Extensive LD blocks
Col Ler Cvi Kas Bay Shah Lz Nd
Chromosome 1, ~500 kb
Distribution of T-stats
Figure: histogram of frequency vs. T statistic (bins from -4 to 3.5), null (permutation) vs. actual.
Call categories: Not Col, Col, NA, NA duplications
32,427 calls
208,729
12,250 SFPs
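The comparison of actual T statistics against a permutation null can be sketched as follows. The slides do not give the test or the replicate design, so the pooled-variance t statistic, the three-versus-three layout, and the function names are assumptions for illustration.

```python
from itertools import combinations

import numpy as np


def t_stats(intensity, is_reference):
    """Pooled-variance t statistic per feature (rows), test minus reference."""
    ref = intensity[:, is_reference]
    alt = intensity[:, ~is_reference]
    n_ref, n_alt = ref.shape[1], alt.shape[1]
    pooled_var = (
        (n_ref - 1) * ref.var(axis=1, ddof=1)
        + (n_alt - 1) * alt.var(axis=1, ddof=1)
    ) / (n_ref + n_alt - 2)
    diff = alt.mean(axis=1) - ref.mean(axis=1)
    return diff / np.sqrt(pooled_var * (1 / n_ref + 1 / n_alt))


def permutation_null(intensity, n_reference):
    """T statistics under every relabeling of arrays into the two groups."""
    n_arrays = intensity.shape[1]
    null = []
    for ref_idx in combinations(range(n_arrays), n_reference):
        mask = np.zeros(n_arrays, dtype=bool)
        mask[list(ref_idx)] = True
        null.append(t_stats(intensity, mask))
    return np.concatenate(null)


# Toy data: 200 features, 3 reference (Col) and 3 test arrays.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.2, size=(200, 6))
x[:20, 3:] -= 2.0  # 20 simulated SFPs hybridize less in the test accession
labels = np.array([True, True, True, False, False, False])
actual = t_stats(x, labels)
null = permutation_null(x, n_reference=3)
print(actual[:20].round(1))
print("null size:", null.size)
```

Features whose actual T statistic falls in a tail that the null rarely reaches are the candidate SFPs; the ratio of null to actual calls beyond a cutoff gives a rough FDR.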
Sequence confirmation of SFPs
Accession   SFP   SNP   Total   FPR    FDR     Sensitivity
bay         44    61    1375    0.8%   25.0%   54.1%
bur         47    57    1320    1.1%   29.8%   57.9%
cvi         69    92    1325    1.2%   21.7%   58.7%
ler         41    51    1466    0.6%   22.0%   62.7%
lz          37    40    1441    0.5%   18.9%   75.0%
mr          67    87    1191    1.1%   17.9%   63.2%
mt          46    48    1413    0.9%   26.1%   70.8%
sorbo       37    53    1317    0.9%   29.7%   49.1%
ws          29    47    1369    0.3%   13.8%   53.2%
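The three rate columns in the table above are consistent with one reading of the count columns: SFP is the number of SFP calls among sequenced features, SNP is the number of features with a confirmed polymorphism, and Total is the number of features checked. The sketch below recomputes the rates for two rows under that reading. The false-positive counts are not in the table; they are back-calculated from the FDR column and are an assumption.

```python
def confirmation_metrics(sfp_calls, snps, total, false_positives):
    """FPR, FDR, and sensitivity from sequence-confirmation counts."""
    true_positives = sfp_calls - false_positives
    return {
        "FPR": false_positives / (total - snps),   # among non-polymorphic features
        "FDR": false_positives / sfp_calls,        # among SFP calls
        "sensitivity": true_positives / snps,      # among polymorphic features
    }


# ASSUMPTION: false-positive counts inferred from the FDR column
# (bay: 25.0% of 44 calls = 11; lz: 18.9% of 37 calls = 7).
bay = confirmation_metrics(sfp_calls=44, snps=61, total=1375, false_positives=11)
lz = confirmation_metrics(sfp_calls=37, snps=40, total=1441, false_positives=7)
for name, m in (("bay", bay), ("lz", lz)):
    print(name, {k: f"{100 * v:.1f}%" for k, v in m.items()})
```

The same inferred counts reproduce all three columns for the other accessions as well, which supports this reading of the table.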
SFPs for reverse genetics
http://naturalvariation.org/sfp
14 Accessions, 30,950 SFPs
Chromosome Wide Diversity
Diversity 50kb windows
Tajima’s D like 50kb windows
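A windowed diversity scan of this kind can be sketched by binning features into 50 kb windows and taking the fraction called as SFPs in each. The slides do not define their diversity measure, so the fraction of polymorphic features is an assumed stand-in, and the function name is hypothetical.

```python
import numpy as np


def windowed_sfp_fraction(positions, is_sfp, window_bp=50_000):
    """Fraction of features called as SFPs in consecutive fixed-size windows.

    positions: feature start coordinates (bp) on one chromosome.
    is_sfp:    0/1 call per feature.
    Windows with no features are returned as NaN.
    """
    pos = np.asarray(positions, dtype=np.int64)
    calls = np.asarray(is_sfp, dtype=float)
    window = pos // window_bp
    n_windows = int(window.max()) + 1
    n_features = np.bincount(window, minlength=n_windows)
    n_sfps = np.bincount(window, weights=calls, minlength=n_windows)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(n_features > 0, n_sfps / n_features, np.nan)


# Toy example: four features spread over four 50 kb windows.
fractions = windowed_sfp_fraction([10, 49_999, 50_000, 160_000], [1, 0, 1, 1])
print(fractions)
```

The same binning works for any per-feature statistic, so a Tajima's D like score could be summarized over the same windows.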
RPS4 unknown
R genes vs bHLH
Selection
Figure: histogram of frequency vs. Tajima's D like statistic (bins from -1 to 0.8), R genes vs. bHLH.
Experimental Design of Association Study
• Sample > 2000 wild strains, ~50-100 SNPs
• Select 384 unstructured reference fine mapping set
• SFP resequencing of 384 lines
• Haplotype map/ LD recombination blocks
• Scan Genome for variation/selection
• Measure phenotype in Seasonal Chambers
• Associate Quantitative phenotypes with HapMap
Review
• Single Feature Polymorphisms (SFPs) can be used to
• Methylation Variable sites
• Potential deletions (candidate genes)
• eXtreme Array Mapping
• Haplotyping
• Diversity/Selection
• Association Mapping
NaturalVariation.org
USC
Magnus Nordborg
Paul Marjoram
Max Planck
Detlef Weigel
Scripps
Sam Hazen
University of Michigan
Sebastian Zollner
University of Chicago
Xu Zhang
Evadne Smith
Ken Okamoto
Michigan State
Shinhan Shui
Purdue
Ivan Baxter
University of Guelph, Canada
Dave Wolyn
Sainsbury Laboratory
Jonathan Jones