Alternative splicing: A playground of evolution


Mikhail Gelfand

Research and Training Center for Bioinformatics

Institute for Information Transmission Problems

Alternative splicing of human (and mouse) genes

5% Sharp, 1994 (Nobel lecture)

35% Mironov-Fickett-Gelfand, 1999

38% Brett-…-Bork, 2000 (ESTs/mRNA)

22% Croft et al., 2000 (ISIS database)

55% Kan et al., 2001 (11% AS patterns conserved in mouse ESTs)

42% Modrek et al., 2001 (HASDB)

~33% CELERA, 2001

59% Human Genome Consortium, 2001

28% Clark and Thanaraj, 2002

all? Kan et al., 2002 (17-28% with total minor isoform frequency > 5%)

41% (mouse) FANTOM & RIKEN, 2002

60% (mouse) Zavolan et al., 2003

• Evolution of alternative exon-intron structure
  – human-mouse
  – Drosophila and Anopheles
• Evolution of alternative splicing sites: MAGE-A family of CT antigens
• Evolutionary rate in constitutive and alternative regions
  – human-mouse
  – human SNPs
• Alternative splicing and protein structure

Data and Methods (routine)

• known alternative splicing
  – HASDB (human, ESTs+mRNAs)
  – ASMamDB (mouse, mRNAs+genes)
• additional variants
  – UniGene (human and mouse EST clusters)
• complete genes and genomic DNA
  – GenBank (full-length mouse genes)
  – human genome
• TBLASTN (initial identification of orthologs: mRNAs against genomic DNA)
• BLASTN (human mRNAs against genome)
• Pro-EST (spliced alignment, ESTs and mRNA against genomic DNA)
• Pro-Frame (spliced alignment of proteins against genomic DNA)
  – confirmation of orthology:
    • same exon-intron structure for at least one isoform
    • >70% identity over the entire protein length
  – analysis of conservation of human alternative splicing in the mouse genome: align human protein to mouse genomic DNA; the isoform is conserved if
    • all exons or parts of exons are conserved
    • all sites are conserved
  – same procedure for mouse proteins and human DNA

We do not require that the isoform is actually observed as mRNA or ESTs
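The two checks above (confirmation of orthology, then conservation of an isoform) can be sketched as simple predicates. This is a minimal sketch, not the Pro-Frame implementation: the `SplicedAlignment` record and its fields are hypothetical, and requiring both orthology criteria on the same isoform is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SplicedAlignment:
    """Hypothetical summary of one protein-to-genome spliced alignment."""
    identity: float                   # fraction of identical residues over the whole protein
    same_exon_intron_structure: bool  # does the alignment reproduce the exon-intron structure?
    exons_aligned: List[bool]         # one flag per exon (or exon part) of the isoform
    sites_conserved: List[bool]       # one flag per splice site of the isoform


def is_confirmed_ortholog(alignments, min_identity=0.70):
    """Orthology is confirmed if at least one isoform keeps the exon-intron
    structure and aligns with more than min_identity over its full length.
    (Applying both criteria to the same isoform is an assumption.)"""
    return any(a.same_exon_intron_structure and a.identity > min_identity
               for a in alignments)


def isoform_conserved(alignment):
    """An isoform counts as conserved if all exons (or exon parts) and all
    splice sites are conserved in the other genome."""
    return all(alignment.exons_aligned) and all(alignment.sites_conserved)


# Illustrative, made-up alignments
good = SplicedAlignment(0.82, True, [True, True], [True, True, True])
poor = SplicedAlignment(0.65, False, [True, False], [True, True])
print(is_confirmed_ortholog([good, poor]), isoform_conserved(good), isoform_conserved(poor))
```

Note that neither predicate asks whether the isoform is observed as mRNA or ESTs in the second species, in line with the statement above.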

166 gene pairs

Known alternative splicing: 126 human genes, 124 mouse genes
(42 in human only, 84 in both human and mouse, 40 in mouse only)

Elementary alternatives

Cassette exon

Alternative donor site

Alternative acceptor site

Retained intron

Human genes

                          mRNA                  EST
                     cons.  non-cons.     cons.  non-cons.
Cassette exons         56      25           74      26
Alt. donors            18       7           16      10
Alt. acceptors         13       5           19      15
Retained introns        4       3            5       0
Total                  96      30          114      51
Total genes            45      28           41      44

Conserved elementary alternatives: 69% (EST) - 76% (mRNA)

Genes with all isoforms conserved: 57 (45%)

Mouse genes

                          mRNA                  EST
                     cons.  non-cons.     cons.  non-cons.
Cassette exons         70       5           39       9
Alt. donors            24       6           17       6
Alt. acceptors         15       6           16       9
Retained introns        8       7           10       4
Total                 117      24           82      28
Total genes            68      22           30      26

Conserved elementary alternatives: 75% (EST) - 83% (mRNA)

Genes with all isoforms conserved: 79 (64%)
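The percentages of conserved elementary alternatives follow directly from the "Total" rows of the human and mouse tables. A short worked check, using only those totals (the dictionary layout and function name are illustrative):

```python
# (conserved, non-conserved) totals of elementary alternatives, from the "Total" rows
totals = {
    ("human", "mRNA"): (96, 30),
    ("human", "EST"): (114, 51),
    ("mouse", "mRNA"): (117, 24),
    ("mouse", "EST"): (82, 28),
}


def conserved_fraction(conserved, non_conserved):
    """Share of elementary alternatives that are conserved."""
    return conserved / (conserved + non_conserved)


percentages = {key: round(100 * conserved_fraction(*counts))
               for key, counts in totals.items()}

for (species, source), pct in percentages.items():
    print(f"{species:5s} {source:4s} {pct}% conserved, {100 - pct}% non-conserved")
```

The complements of these values are the non-conserved shares for mRNA-derived and EST-derived alternatives in each species.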

Real or aberrant non-conserved AS?

• 24-31% human vs. 17-25% mouse elementary alternatives are not conserved

• 55% human vs 36% mouse genes have at least one non-conserved variant

• denser coverage of human genes by ESTs:
  – pick up rare (tissue- and stage-specific) => younger variants
  – pick up aberrant (non-functional) variants

• 17-24% mRNA-derived elementary alternatives are non-conserved (compared to 25-32% EST-derived ones)

Comparison to other studies. Modrek and Lee, 2003: skipped exons

• inclusion level is a good predictor of conservation
  – 98% constitutive exons are conserved
  – 98% major form exons are conserved
  – 28% minor form exons are conserved
• inclusion level of conserved exons in human and mouse is highly correlated
• Minor non-conserved form exons are errors? No:
  – minor form exons are supported by multiple ESTs
  – 28% of minor form exons are upregulated in one specific tissue
  – 70% of tissue-specific exons are not conserved
  – splicing signals of conserved and non-conserved exons are similar


Fruit fly and mosquito

• Technically more difficult than human-mouse:
  – incomplete genomes
  – difficulties in alignment, especially at gene termini
  – changes in exon-intron structure irrespective of alternative splicing (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)

Methods

• Pro-Frame: Align Dme (D. melanogaster) protein isoforms to Dps (D. pseudoobscura) and Aga (Anopheles gambiae) genes

• coding segments: regions in Dme genes between Dme intron shadows

• We follow the fate of Dme exons and coding segments in Dps and Aga genomes

• slices: regions between all exon-exon junctions (intron shadows) from all three genomes (Dme, Dps, Aga) mapped to Dme isoforms

• a slice is conserved if it aligns with at least 35% identity
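The slice definition can be illustrated with a small sketch: pool the intron shadows from all three genomes, cut the Dme isoform at those positions, and score each resulting slice. Everything here is an assumption made for illustration: coordinates are columns of a pairwise alignment, gaps count as mismatches, and the 35% figure is treated as a minimum. It is not the procedure actually used with Pro-Frame.

```python
def slices(isoform_length, intron_shadows):
    """Cut [0, isoform_length) at every intron shadow; return half-open intervals."""
    cuts = sorted({p for p in intron_shadows if 0 < p < isoform_length})
    bounds = [0] + cuts + [isoform_length]
    return list(zip(bounds[:-1], bounds[1:]))


def identity(a, b):
    """Fraction of identical columns in two aligned strings of equal length.
    Gap columns ('-') are counted as mismatches (an assumption)."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have equal length")
    if not a:
        return 0.0
    return sum(x == y and x != "-" for x, y in zip(a, b)) / len(a)


def conserved_slices(dme, other, shadows_by_genome, threshold=0.35):
    """Label each slice of the Dme isoform as conserved or not in `other`."""
    all_shadows = set().union(*shadows_by_genome.values())
    return [((start, end), identity(dme[start:end], other[start:end]) >= threshold)
            for start, end in slices(len(dme), all_shadows)]


# Made-up aligned fragments and shadow positions (alignment columns)
dme_row = "MKTAYIAKQR"
aga_row = "MKTAY-----"
shadows = {"Dme": [5], "Dps": [5], "Aga": [8]}
result = conserved_slices(dme_row, aga_row, shadows)
print(result)
```

Pooling shadows from all three genomes means a Dme exon that is divided in another genome is scored piece by piece rather than as a whole.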

Conservation of coding segments

                                         constitutive segments   alternative segments
D. melanogaster – D. pseudoobscura               97%                   75-80%
D. melanogaster – Anopheles gambiae              77%                   ~45%

Conservation of D. melanogaster elementary alternatives in D. pseudoobscura genes

• retained introns are the least conserved
• mutually exclusive exons are as conserved as constitutive exons

[Figure: stacked bars, 0% to 100%, for constant exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon. Colours: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved]

Conservation of D. melanogaster elementary alternatives in Anopheles gambiae genes

• ~30% joined, ~10% divided exons (fewer introns in Aga)
• mutually exclusive exons are conserved exactly
• cassette exons are the least conserved

[Figure: stacked bars, 0% to 100%, for constant exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon. Colours: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved]

CG1517: cassette exon in Drosophila, alternative acceptor site in Anopheles
[Diagram: exon-intron structure in Dme and Dps vs. Aga]

CG31536: cassette exon in Drosophila, shorter cassette exon and alternative donor site in Anopheles
[Diagram: exon-intron structure in Dme and Dps vs. Aga]

CG1587: alternative acceptor site in Drosophila, candidate retained intron in intronless gene of Anopheles
[Diagram: exon-intron structure in Dme, Dps and Aga]


Alternative splicing in a multigene family: the MAGEA family of cancer/testis specific antigens

• A locus at the X chromosome containing eleven recently duplicated genes: two subfamilies of four genes each and three single genes

• Retrogene: one protein-coding exon, multiple different 5’-UTR exons

• Mutations create new splicing sites or disrupt existing sites

Birth of donor sites (new GT in alternative initial exon 5)

Ancestral gene: GCCAGGCACGCGGATCCTGACGTTCACATCTAGGGCT
MAGEA3          GCCAGGCACGTGAGTCCTGAGGTTCACATCTACGGCT
MAGEA6          GCCAGGCACGTGAGTCCTGAGGTTCACATCTACGGCT
MAGEA2          GCCAAGCACGCGGATCCTGACGTTCACATGTACGGCT
MAGEA12         GCCAAGCACGCGGATCCTGACGTTCACATCTGTGGCT
MAGEA1          GCCAGGCACTCGGATCTTGACGTCCCCATCCAGGGCT
MAGEA4          --CAGGCACTCGGATCTTGACATCCACATCGAGGGCT
MAGEA5          GACAGGCACACCCATTCTGACGTCCACATCCAGGGCT
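A new donor site of this kind can be spotted by comparing GT dinucleotides column by column between the ancestral sequence and a present-day gene. The sketch below uses the ancestral, MAGEA3 and MAGEA1 rows of the alignment above; the function names are illustrative, and a GT by itself is only a candidate, since a functional donor site also needs a reasonable match to the rest of the consensus.

```python
def dinucleotide_columns(seq, dinucleotide="GT"):
    """0-based alignment columns where the dinucleotide starts."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


def gained_sites(ancestral, descendant, dinucleotide="GT"):
    """Columns carrying the dinucleotide in the descendant but not in the ancestor.
    Both rows must come from the same alignment so that columns are comparable."""
    old = set(dinucleotide_columns(ancestral, dinucleotide))
    return [i for i in dinucleotide_columns(descendant, dinucleotide) if i not in old]


ANCESTRAL = "GCCAGGCACGCGGATCCTGACGTTCACATCTAGGGCT"
MAGEA3 = "GCCAGGCACGTGAGTCCTGAGGTTCACATCTACGGCT"
MAGEA1 = "GCCAGGCACTCGGATCTTGACGTCCCCATCCAGGGCT"

print(gained_sites(ANCESTRAL, MAGEA3))  # columns 9 and 13: the GTGAGT stretch
print(gained_sites(ANCESTRAL, MAGEA1))  # no new GT
```

The same comparison with `dinucleotide="AG"` gives candidate new acceptor sites.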

Birth of an acceptor site (new AG and polyY tract in MAGEA8-specific cassette exon 3)

MAGEA3          TTGAGGGTACC-----------CCTGGGA---CAGAATGCGGA
MAGEA6          TTGAGGGTACC-----------CCTGGGA---CAGAATGCGGA
MAGEA2          TTGAGGGTACT-----------CCTGGGC---CAGAATGCAGA
MAGEA12         TTGAGGGTACC-----------CCTGGGC---CAGAACGCTGA
MAGEA1          CTGAGGGTACC-----------CCAGGAC---CAGAACACTGA
MAGEA4          TTGAGGGTACC-----------ACAGGGC---CAGAACGCAGA
MAGEA5          TTGAGGGCACC-----------CTTGGGC---CAGAACACAGA
MAGEA8          TTGAGGGTACCCTCGATGGTTCTCCTAGCAGGCAAAAAACAGA
MAGEA9          TCGAGGGTACC-----------TCCAGGC---CAGAGAAACTC
MAGEA10         CTGAGGGTACC-----------CCCAGCC---CATAACACAGA
MAGEA11         TTGAGGGTTCC-----------TCCTGGC---CAGAACACAGA

Birth of an alternative donor site (enhanced match to the consensus (AG) in cassette exon 2)

Ancestral gene: GAGCTCCAGGAACmAGGCAGTGAGGCCTTGGTCTG
MAGEA3          GAGCTCCAGGAACAAGGCAGTGAGGACTTGGTCTG
MAGEA6          GAGCTCCAGGAACAAGGCAGTGAGGACTTGGTCTG
MAGEA2          GAGCTCCAGGAACCAGGCAGTGAGGCCTTGGTCTG
MAGEA12         GAGTTCCAAGAACAAGGCAGTGAGGCCTTGGTCTG
MAGEA1          GAGCTCCAGGAACCAGGCAGTGAGGCCTTGGTCTG
MAGEA4          GAGCTCCAGGAACAAGGCAGTGAGGCCTTGGTCTG
MAGEA5          GAGCTCCAGGAAACAGACACTGAGGCCTTGGTCTG
MAGEA8          GAGCTCCAGGAACCAGGCTGTGAGGTCTTGGTCTG
MAGEA9          GAGCTCCAGGAA----GCAGGCAGGCCTTGGTCTG
MAGEA10         GAGCTCCAGGGACTGTGAGGTGAGGCCTTGGTCTA
MAGEA11         AAGCTCCAAAAACTGAGCAGTGAGGCCTTGGTCTC

Birth of an alternative acceptor site (enhanced polyY tract in cassette exon 4)

Ancestral gene: AGGGGCCCCCATGTGGTCGACAGACACAGTGG
MAGEA3          AGGGGCCCCTATGTGGTGGACAGATGCAGTGG
MAGEA6          AGGGGCCCCTATGTGGTGGACAGATGCAGTGG
MAGEA2          AGGGGCCCCCATCTGGTCGACAGATGCAGTGG
MAGEA12         AGGGGCCCCCATGTAGTCGACAGACACAGTGG
MAGEA1          AGGGACCCCCATCTGGTCTAAAGACAGAGCGG
MAGEA4          AGGGACCCCCATCTGGTCTACAGACACAGTGG
MAGEA5          AGGGGCCCCCATCTGGTGGATAGACAGAGTGG
MAGEA8          AGGGACCCCCATGTGGGCAACAGACTCAGTGG
MAGEA9          AGGGAGGCCC-TGTGTTCGACAGACACAGTGG
MAGEA10         AGGGAACCCC-TCTTTTCTACAGACACAGTGG
MAGEA11         AAAGAGCCCCATATGGTCCACAACTACAGTGG


Concatenates of constitutive and alternative regions in all genes: different evolutionary rates

Columns (left-to-right) – (1) constitutive regions; (2–4) alternative regions: N-end, internal, C-end

                       Constitutive   N-end alternative   Internal alternative   C-end alternative
dN/dS                     0.176            0.199                0.187                 0.301
Amino-acid identity       0.886            0.874                0.878                 0.807

• Relatively more non-synonymous substitutions in alternative regions (higher dN/dS ratio)
• Lower amino acid identity in alternative regions

Individual genes: the ratio of non-synonymous to synonymous substitution rates (dn/ds) tends to be larger in alternative regions (vertical axis) than in constitutive regions (horizontal axis)

[Scatter plot: dn/ds in constitutive regions (C, horizontal axis) vs. dn/ds in alternative regions (A, vertical axis), both on logarithmic scales from 0.001 to 10. Label: dn/ds (con) – dn/ds (alt). Series: complete genes, N-terminal regions, internal regions, C-terminal regions]


Na/Ns (alternative) > Na/Ns (constitutive) for all evidence levels

[Figure: Na/Ns (vertical axis, 0.7 to 1.4) by evidence level (EST-1, EST-2, EST-3, EST-4, EST-5, mRNA, protein). Series: const, alt, average (Zhao et al.)]


Alternative splicing avoids disrupting domains (and non-domain units)

[Figure, panel a: stacked bars, Expected vs. Observed, 0% to 100%. Categories: domains completely; non-domain functional units completely; no annotated unit affected; domains partially; non-domain functional units partially. Data labels as extracted: 6%, 10%, 15%, 37%, 40%, 34%, 21%, 19%, 6%, 13%]

Control:

fix the domain structure; randomly place alternative regions
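The control can be sketched as a simple randomization: keep the annotated domain coordinates fixed, drop alternative regions of the observed lengths at random positions, and record how each placement relates to the domains. This is a minimal sketch under stated assumptions, not the procedure used for the slide: it handles a single protein, uses half-open residue intervals, ignores non-domain functional units, and all names are hypothetical.

```python
import random


def overlaps(a, b):
    """True if half-open intervals a and b share at least one residue."""
    return a[0] < b[1] and b[0] < a[1]


def contains(outer, inner):
    """True if interval `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def classify(region, domains):
    """'none' if no domain is touched, 'complete' if every touched domain is
    covered entirely, 'partial' if at least one touched domain is cut."""
    hit = [d for d in domains if overlaps(region, d)]
    if not hit:
        return "none"
    if any(not contains(region, d) for d in hit):
        return "partial"
    return "complete"


def expected_fractions(protein_length, domains, region_lengths, n_trials=1000, seed=0):
    """Expected class frequencies when regions of the given lengths are placed
    uniformly at random while the domain structure stays fixed."""
    rng = random.Random(seed)
    counts = {"none": 0, "partial": 0, "complete": 0}
    for _ in range(n_trials):
        for length in region_lengths:
            start = rng.randint(0, protein_length - length)
            counts[classify((start, start + length), domains)] += 1
    total = n_trials * len(region_lengths)
    return {label: count / total for label, count in counts.items()}


# Made-up protein: 100 residues, one domain at residues 40-59, one 10-residue region
expected = expected_fractions(100, [(40, 60)], [10])
print(expected)
```

Comparing such expected fractions with the observed ones is what the Expected and Observed bars summarize.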

… and this is not simply a consequence of the (disputed) exon-domain correlation

[Figure: AS and exon boundaries and SMART domains. Vertical axis: ratio (observed/expected), 0 to 1. Groups: nonAS_Exons, AS_Exons, AS, each shown for mouse and human. Series: inside domains, outside domains]

Positive selection towards domain shuffling (not simply avoidance of disrupting domains)

[Figure, panel a: the same Expected vs. Observed stacked bars as in the earlier figure. Panel b: Expected vs. Observed for domains completely, non-domain units completely, no annotated units affected]

Short (<50 aa) alternative splicing events within domains target protein functional sites

[Figure, panel a: the same Expected vs. Observed stacked bars as in the earlier figure. Panel c: Expected vs. Observed for Prosite patterns unaffected, Prosite patterns affected, FT positions unaffected, FT positions affected]

An attempt at integration

• AS is often young (as opposed to degenerating)
• young AS isoforms are often minor and tissue-specific
• … but still functional
  – although unique isoforms may be the result of aberrant splicing
• AS often arises from duplication of exons
• … or point mutations creating splicing sites
• … or intron insertions
• AS regions show evidence for positive selection
  – excess non-synonymous and damaging SNPs
  – excess non-synonymous codon substitutions
• AS tends to shuffle exons and target functional sites in proteins
• Thus AS may serve as a testing ground for new functions without sacrificing old ones

Acknowledgements

• Discussions
  – Vsevolod Makeev (GosNIIGenetika)
  – Eugene Koonin (NCBI)
  – Igor Rogozin (NCBI)
  – Dmitry Petrov (Stanford)
  – Dmitry Frishman (GSF, TUM)
• Data
  – King Jordan (NCBI)
• Support
  – Ludwig Institute of Cancer Research
  – Howard Hughes Medical Institute
  – Russian Academy of Sciences (program “Molecular and Cellular Biology”)
  – Russian Fund of Basic Research

Authors

• Andrei Mironov (Moscow State University) – spliced alignment
• Ramil Nurtdinov (Moscow State University) – human/mouse, data
• Irena Artamonova (GSF/MIPS) – human/mouse, MAGE-A
• Dmitry Malko (GosNIIGenetika, Moscow) – mosquito/drosophila
• Ekaterina Ermakova (Moscow State University) – evolution of alternative/constitutive regions
• Vasily Ramensky (Institute of Molecular Biology, Moscow) – SNPs
• Shamil Sunyaev (EMBL, now Harvard University Medical School) – protein structure
• Eugenia Kriventseva (EBI, now EMBL) – protein structure
