Casey M. Bergman Faculty of Life Sciences University of Manchester [email protected] Inferring functional constraints on Drosophila noncoding DNA from patterns of sequence evolution.


Page 1: Inferring functional constraints on Drosophila noncoding ...bergmanlab.genetics.uga.edu/wp-content/uploads/...Conservation d_simulans d_sechellia d_yakuba d_erecta d_ananassae d_pseudoobscura

Casey M. Bergman

Faculty of Life Sciences, University of Manchester

[email protected]

Inferring functional constraints on Drosophila noncoding DNA from patterns of sequence evolution.

Page 2

Outline of Talk

• Noncoding DNA, cis-regulatory annotation and Drosophila as a system

• Conserved noncoding sequences are selectively constrained.

• Spatial constraints on noncoding sequences

Page 3

Higher organisms have a higher proportion of noncoding DNA

Bacteria: 15%
Yeast: 30%
Worm: 70%
Fly: 75%

Page 4

The function of most noncoding DNA is unknown & unannotated

Bioinformatic & functional analysis of noncoding DNA ⇒

Genome organization

Transcriptional regulation

[Figure: map of a genomic region. Legend entry: Exon. Gene labels: Mef2, CG15863, CG12130, CG1418, CG12133, Adam, CG12134, eve, TER94, Pka-R2, CG12128. Additional label: BS 1360.]

Page 5

[Figure: map of a genomic region with gene labels Mef2, CG15863, CG12130, CG1418, CG12133, Adam, CG12134, eve, TER94, Pka-R2, CG12128 and the label BS 1360, annotated with "Enhancers", "Transposable elements" and "(A)n". Enhancer labels as extracted: AR3/7, 2, APRCQ4/6, mes, 15RP2.]

Goal: comprehensive functional annotation of noncoding sequences in Drosophila

Page 6

Why is annotation of cis-regulatory sequences important?

• Better understand development

• Better understand mechanisms of transcription

• Provide material for forward genetics

• Provide material for evolutionary biology

• Generate data for systems biology

Page 7

Why Drosophila as a model system?

~120 Mb of euchromatin

~15,000 genes

75% noncoding

Compact, deletion bias

Page 8

“Pseudogenes” decay rapidly by deletion in Drosophila

Petrov and Hartl (1998) Mol. Biol. Evol. 15:293-302

Page 9

Genes with complex expression have longer intergenic regions in compact genomes

Nelson, Hersh & Carroll (2004) Genome Biology 5:R25

Page 10

Longer introns & intergenic regions have slower rates of sequence evolution in Drosophila

Halligan & Keightley (2006) Genome Research 16:875-884

Page 11

A wealth of comparative genomic data exists for the genus Drosophila

http://species.flybase.net

http://rana.lbl.gov/drosophila

Page 12

image from Pavel Tomancak (MPI-Dresden)

Thousands of candidate expression patterns: BDGP embryonic in situ database

http://www.fruitfly.org/cgi-bin/ex/insitu.pl

Page 13

Systematic annotation of cis-regulatory data in Drosophila: FlyReg & REDfly databases

Bergman et al. (2005) Bioinformatics 21:1747-1749

Gallo et al. (2006) Bioinformatics 22:381-383

[Figure: genome browser view of chr2R (coordinate ticks 5485000 to 5500000) covering CG12134, eve and TER94. Tracks: FlyBase Protein-Coding Genes; FlyReg: Drosophila DNase I Footprint Database; Regulatory elements from ORegAnno. Footprint labels: bcd, eve, gt, hb, kni, Kr, Med, pan, prd, tin, ttk, zfh1 and Unspecified. ORegAnno entries: OREG0005967 through OREG0005985.]

Page 14

ORegAnno: Open Regulatory Annotation

Montgomery et al. (2006) Bioinformatics 22:637-640

Page 15

cis-regulatory annotation & systems biology

[Figure: diagram with gene labels shn, Abd-A, fkh, ko, Dll, dpp, mus209, tsh, bcd, salm, Antp, dl, Ubx, zen, kni, ftz, eve, hb, tll, Kr, Trl, grh, cad, h, en, gt, ttk.]

Ashburner & Bergman (2005) Genome Research 15:1661-1667

Page 16

Outline of Talk

• Noncoding DNA, cis-regulatory annotation and Drosophila as a system

• Conserved noncoding sequences are selectively constrained.

• Spatial constraints on noncoding sequences

Page 17

Pattern of noncoding sequence evolution in Drosophila: the eve stripe 2 enhancer

[Figure: comparison of mel with sim, yak, ere, tak, ana and pse; scale bar 500 bp; a conserved "block" and an intervening "spacer" are labeled.]

Page 18

Are conserved blocks functionally constrained or simply mutational cold spots?

Bergman & Kreitman (2001) Genome Research 11:1335-1345

Clark (2001) Genome Research 11:1319-1320

median: 19 bp

Page 19

Using population genetics to test the mutational cold-spot hypothesis

If blocks are functionally constrained we predict the following:

1. Excess of rare derived mutations in blocks relative to spacers

(Non-parametric test - blocks vs. spacers, frequency spectrum)

Page 20

Using population genetics to test the mutational cold-spot hypothesis

If blocks are functionally constrained we predict the following:

1. Excess of rare derived mutations in blocks relative to spacers

(Non-parametric test - blocks vs. spacers, frequency spectrum)

2. Excess of polymorphisms in blocks relative to fixed differences

(“MK” test - blocks vs. spacers, polymorphism & divergence)

Page 21

Using population genetics to test the mutational cold-spot hypothesis

If blocks are functionally constrained we predict the following:

1. Excess of rare derived mutations in blocks relative to spacers

(Non-parametric test - blocks vs. spacers, frequency spectrum)

2. Excess of polymorphisms in blocks relative to fixed differences

(“MK” test - blocks vs. spacers, polymorphism & divergence)

[Figure: Fraction of SNPs (y) by Derived Allele Frequency (x, bins of width 0.1 from 0 to 1.0); series shown: spacer.]

Page 22

Using population genetics to test the mutational cold-spot hypothesis

If blocks are functionally constrained we predict the following:

1. Excess of rare derived mutations in blocks relative to spacers

(Non-parametric test - blocks vs. spacers, frequency spectrum)

2. Excess of polymorphisms in blocks relative to fixed differences

(“MK” test - blocks vs. spacers, polymorphism & divergence)

[Figure: Fraction of SNPs (y) by Derived Allele Frequency (x, bins of width 0.1 from 0 to 1.0); series shown: block, spacer.]

Page 23

Conserved blocks in humans are not mutational cold-spots

Drake et al. (2005) Nat. Genet. 38:223-7

[Figure: panels for Yoruba (African), American (European) and Beijing+Tokyo (Asian) samples; series: block, spacer.]

Page 24

Using population genetics to test the mutational cold-spot hypothesis

If blocks are functionally constrained we predict the following:

1. Excess of rare derived mutations in blocks relative to spacers

(Non-parametric test - blocks vs. spacers, frequency spectrum)

2. Excess of polymorphisms in blocks relative to fixed differences

(“MK” test - blocks vs. spacers, polymorphism & divergence)

[Figure: schematic of block, spacer and block regions with rows labeled Divergence (div.) and Polymorphism (π).]

Page 25

Harvesting data from GenBank using PDA: a pipeline to study polymorphism

Casillas & Barbadilla (2004) Nucl. Acids Res. 32:W166-W169

[Flowchart: PDA pipeline, with steps numbered 1a, 1b and 2 to 9. Recoverable labels:
• Get sequences & annotations: input from sequences from GenBank, corresponding to the Drosophila genus
• Gene, CDS, exon, intron, 5’UTR, 3’UTR, promoter
• Group by species & gene; sequences organized in categories; minimum of 2 sequences per category
• Muscle; MSA parameters; alignments with scores; alignment validation; align quality values; low quality sequences excluded
• Sequences subgroups; alignments subgroups
• Read gene annotations; extract gene regions; sequences, positions and orientations
• Diversity Analysis Module: polymorphism, syn & non-syn polymorphisms, linkage disequilibrium, codon bias
• MySQL database; seq. manipulations; external programs
• Web-based output: alignments, Jalview, output, output diversity analysis]

Page 26

Highly conserved noncoding sequences - the UCSC PhastCons track

[Figure: UCSC genome browser view of chr2R (coordinate ticks 5485000 to 5500000) covering CG12134, eve and TER94. Tracks: FlyBase Protein-Coding Genes; 12 Flies, Mosquito, Honeybee, Beetle Multiz Alignments & phastCons Scores (Conservation, with species rows d_simulans, d_sechellia, d_yakuba, d_erecta, d_ananassae, d_pseudoobscura, d_persimilis, d_willistoni, d_virilis, d_mojavensis, d_grimshawi); PhastCons Conserved Elements (12 Flies, Mosquito, Honeybee, Beetle), each element labeled with a lod score, ranging from lod=10 to lod=1301.]

Page 27

The data: alignments of ~12 D. melanogaster alleles with reference sequence and 1 D. simulans allele

High frequency derived spacer allele

Low frequency derived block alleles
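The polarization implied by this data layout can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the function name `derived_allele_frequency`, the toy alignment columns, and the rule of skipping any site that is not biallelic or whose outgroup base is absent from the sample are all assumptions made for the example.

```python
from collections import Counter
from typing import Optional


def derived_allele_frequency(ingroup: str, outgroup: str) -> Optional[float]:
    """Return the derived allele frequency (DAF) at one alignment column.

    ingroup  -- one base per sampled D. melanogaster allele
    outgroup -- the single D. simulans base, assumed to be the ancestral state
    Returns None when the site cannot be polarized under these assumptions.
    """
    bases = [b for b in ingroup.upper() if b in "ACGT"]
    counts = Counter(bases)
    ancestral = outgroup.upper()
    # With a single outgroup sequence, only biallelic sites where the
    # outgroup base matches one of the two alleles can be polarized.
    if len(counts) != 2 or ancestral not in counts:
        return None
    derived = sum(n for base, n in counts.items() if base != ancestral)
    return derived / len(bases)


# Hypothetical columns (not data from the talk), 12 ingroup alleles each.
rare = derived_allele_frequency("AAAAAAAAAAAG", "A")    # 1 derived copy of 12
common = derived_allele_frequency("GGGGGGGGGGAA", "A")  # 10 derived copies of 12
```

In the slide's terms, `rare` corresponds to a low frequency derived allele and `common` to a high frequency derived allele.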

Page 28

Summary of the polymorphism data sets

             Glinka (2003) +   Glinka (2003) +   Orengo (2004)
             Ometto (2005)     Ometto (2005)
             African           European          European
Intronic     167               173               28
Intergenic   90                93                80
Total loci   257               266               108
# Alleles    11.7              11.8              12.7
bp block     30,683            33,292            28,721
bp spacer    79,317            87,379            47,590

Page 29

Single nucleotide polymorphisms & fixed differences are reduced in conserved blocks

66% reduction in polymorphism

77% reduction in divergence

[Figure: bar chart of observed number for block and spacer. Polymorphism: spacer 3334, block 437. Divergence: spacer 4854, block 374.]

Casillas, Barbadilla & Bergman (2007) Mol. Biol. Evol. 24:2222-2234

Page 30

Conserved blocks have an excess of rare derived point mutations

KS test: p < 6 x 10^-11

[Figure: Frequency (y) by derived allele frequency, DAF (x, bins 0.1 to 1.0), for block and spacer sites.]

Casillas, Barbadilla & Bergman (2007) Mol. Biol. Evol. 24:2222-2234
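As a rough sketch of what a two-sample KS comparison of frequency spectra measures, the statistic is the largest gap between the two empirical cumulative distributions. The slide does not say how the test was run; the function name `ks_statistic` and the DAF values below are hypothetical, and the p-value calculation is left out (a library routine such as `scipy.stats.ks_2samp` would supply one).

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every distinct value in either sample.
    for t in sorted(set(xs) | set(ys)):
        fx = bisect_right(xs, t) / len(xs)
        fy = bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d


# Hypothetical derived allele frequencies (not data from the talk).
block_daf = [1 / 12, 1 / 12, 1 / 12, 2 / 12, 3 / 12, 6 / 12]
spacer_daf = [1 / 12, 3 / 12, 5 / 12, 8 / 12, 10 / 12, 11 / 12]
d = ks_statistic(block_daf, spacer_daf)
```

A spectrum skewed toward rare derived alleles, as in the toy `block_daf`, rises faster at low frequencies and so produces a larger statistic.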

Page 31

Excess of polymorphism in conserved blocks relative to fixed differences between species

         Block   Spacer
Poly.      437     3334
Div.       374     4854

χ2 test: p < 5 x 10^-13

[Figure: bar chart of the polymorphism : divergence ratio for block and spacer.]

Casillas, Barbadilla & Bergman (2007) Mol. Biol. Evol. 24:2222-2234
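The 2x2 comparison on this slide can be checked with a short calculation. The sketch below assumes a Pearson chi-square with one degree of freedom and no continuity correction; the slide does not state which variant was used, and the function name `chi2_2x2` is made up for the example.

```python
import math
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (1 d.f., no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts from the slide: rows = polymorphism, divergence; columns = block, spacer.
poly_block, poly_spacer = 437, 3334
div_block, div_spacer = 374, 4854

ratio_block = poly_block / div_block     # polymorphism : divergence in blocks
ratio_spacer = poly_spacer / div_spacer  # polymorphism : divergence in spacers
chi2, p = chi2_2x2(poly_block, poly_spacer, div_block, div_spacer)
```

Under that assumption the ratio is higher in blocks than in spacers and the p-value falls below the 5 x 10^-13 bound quoted on the slide.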

Page 32

Excess polymorphism is observed in both intergenic and intronic conserved blocks

[Figure: two bar charts of the polymorphism : divergence ratio for block and spacer, one panel for Intergenic and one for Intronic. χ2 tests reported: p < 5 x 10^-5 and p < 3 x 10^-9.]

Casillas, Barbadilla & Bergman (2007) Mol. Biol. Evol. 24:2222-2234

Page 33

Blocks and spacers are more constrained than 4-fold degenerate “silent” sites in genes.

Block vs 4-fold: P = 2.48e−12

Spacer vs 4-fold: P = 0.00471

[Figure: Frequency (y) by derived allele frequency, DAF (x, bins 0.1 to 1.0), for block, spacer and 4-fold sites.]

Casillas, Barbadilla & Bergman (2007) Mol. Biol. Evol. 24:2222-2234

Page 34

Conserved noncoding sequences in Drosophila are selectively constrained for point mutations

• Reduction in polymorphism and divergence in blocks

• Excess of rare alleles in blocks

• Excess of polymorphism relative to divergence in blocks

• Not due to use of spacers as inappropriate control sequences, differences in GC content, or alignment error

• Both intergenic and intronic blocks are constrained

Page 35

Outline of Talk

• Noncoding DNA, cis-regulatory annotation and Drosophila as a system

• Conserved noncoding sequences are selectively constrained.

• Spatial constraints on noncoding sequences

Page 36

Pattern of noncoding sequence evolution in Drosophila: the eve stripe 2 enhancer

[Figure: comparison of mel with sim, yak, ere, tak, ana and pse; scale bar 500 bp; annotation: | slope | ~ 1.]

Page 37

Spacing between conserved noncoding sequences is maintained in divergent Drosophila species

r = 0.85, p < 10^-6

[Figure: scatter plot of spacer interval length (log[bp]) in D. pseudoobscura (y) against spacer interval length (log[bp]) in D. melanogaster (x); both axes run from 0.0 to 4.0.]

Bergman et al. (2002) Genome Biology 3:0086.
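A correlation of this kind can be sketched as a Pearson correlation on log-transformed lengths. The helper `loglog_fit` and the interval lengths below are hypothetical (they are not the data behind r = 0.85), and the use of log10 and an ordinary least-squares slope are assumptions.

```python
import math
from typing import Sequence, Tuple


def loglog_fit(x_bp: Sequence[float], y_bp: Sequence[float]) -> Tuple[float, float]:
    """Return (Pearson r, least-squares slope) of log10(y) against log10(x)."""
    lx = [math.log10(v) for v in x_bp]
    ly = [math.log10(v) for v in y_bp]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Hypothetical orthologous spacer lengths in bp (not data from the talk).
mel = [12, 40, 95, 310, 1200]
pse = [24, 80, 190, 620, 2400]  # every interval exactly twice as long
r, slope = loglog_fit(mel, pse)
```

With these toy values both r and the slope equal 1 up to rounding, because a uniform length change only shifts the points on log axes; real data scatter around the line.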

Page 38

Indels are under constraint in mammalian noncoding DNA

Lunter et al. (2005) PLoS Comp. Biol 2:e5

[Figure: Log10 (Frequency) (y) by Distance between indels (x); series: Unique noncoding, Ancestral Repeat.]

Page 39

Similar constraints on insertions and deletions in blocks and spacers

[Figure: two panels comparing block and spacer indels. First panel: Frequency (y) against x bins labeled 0.1 to 1.0. Second panel: polymorphism : divergence ratio, χ2 test: p = 0.029. A further χ2 test is reported: p = 0.568.]

Casillas, Barbadilla & Bergman (2007) Mol. Biol. Evol. 24:2222-2234

Page 40

Different selective constraint on indels not due to low power or small sample size

Indel Observed

         Block   Spacer
Poly.       66      380
Div.       107      901

χ2 test: p = 0.029

Point Mutation Observed

         Block   Spacer
Poly.      437     3334
Div.       374     4854

χ2 test: p < 5 x 10^-13

Point Mutation Rescaled to Indel Observed

         Block   Spacer
Poly.       51      394
Div.        72      935

χ2 test: p = 0.007
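The slide does not say how the rescaled table was produced. One reading that reproduces its numbers is to scale each row of the point-mutation table down to the corresponding row total of the indel table and truncate to whole counts. This is an inference from the counts shown, not a documented method, and the function name `rescale_rows` is hypothetical.

```python
from typing import List


def rescale_rows(table: List[List[int]], row_totals: List[int]) -> List[List[int]]:
    """Scale each row of `table` to the given row total, truncating to integers."""
    rescaled = []
    for row, target in zip(table, row_totals):
        total = sum(row)
        # Integer floor division keeps the arithmetic exact.
        rescaled.append([x * target // total for x in row])
    return rescaled


# Rows = polymorphism, divergence; columns = block, spacer (counts from the slide).
point = [[437, 3334], [374, 4854]]
indel = [[66, 380], [107, 901]]

rescaled = rescale_rows(point, [sum(row) for row in indel])
```

Down-sampling the point mutations to the indel sample size in this way keeps the block-to-spacer proportions while matching the number of observations, which is what makes the two χ2 tests comparable in power.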

Page 41

A molecular interpretation of conservation in Drosophila noncoding regions

[Figure legend: conserved noncoding sequence; spacer intervals; transcription factors.]

Page 42

A molecular interpretation of conservation in Drosophila noncoding regions

[Figure legend: conserved noncoding sequence; spacer intervals; transcription factors.]

Page 43

A hierarchical model of spatial constraints on cis-regulatory regions

[Figure: nested diagram with levels labeled binding site, module, enhancer and cis-regulatory region; scale labels ~ 5 bp, ~ 100 bp and ~ kbp.]

after Ondek et al. (1988) Nature 333:40-45

Page 44

Acknowledgements

Marty Kreitman

Michael Ashburner

Sue Celniker, Gerry Rubin, Eddy Rubin

Sonia Casillas, Antonio Barbadilla

Stephen Montgomery, Obi Griffiths

Marc Halfon, Steve Gallo