Page 1

A Non-EST-Based Method for Exon-Skipping Prediction

Rotem Sorek, Ronen Shemesh, Yuval Cohen, Ortal Basechess, Gil Ast and Ron Shamir

Genome Research August 2004

楊佳熒

Page 2

Homologous human and mouse exons are, on average, 85% identical in their sequences, but introns are more poorly conserved (Waterston et al., Nature, 2002).

[Figure: Segments and blocks >300 kb in size with conserved synteny in human are superimposed on the mouse genome.]

Page 3

References

• Sorek, R. et al. Intronic Sequences Flanking Alternatively Spliced Exons Are Conserved Between Human and Mouse. Genome Research, 2003.

• Sorek, R. et al. How prevalent is functional alternative splicing in the human genome? Trends in Genetics, 2004.

• Sorek, R. et al. A Non-EST-Based Method for Exon-Skipping Prediction. Genome Research, 2004.

Page 4

What is Exon-Skipping?

[Figure: a gene with six exons (exon1 to exon6) and four ESTs from dbEST (est1 to est4) aligned against it.]
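The idea behind the figure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes each EST has already been aligned to the gene and reduced to the ordered list of exon numbers it covers. The EST contents below are hypothetical, since the slide does not say which exons each EST spans.

```python
from typing import Dict, List, Set


def skipped_exons(est_exons: List[int]) -> Set[int]:
    """Exons lying strictly between two exons that are adjacent in the EST."""
    skipped: Set[int] = set()
    for left, right in zip(est_exons, est_exons[1:]):
        # If the EST joins exon `left` directly to exon `right`,
        # every exon numbered in between was skipped in that transcript.
        skipped.update(range(left + 1, right))
    return skipped


def skipping_support(ests: Dict[str, List[int]]) -> Dict[int, List[str]]:
    """Map each skipped exon to the names of the ESTs that skip it."""
    support: Dict[int, List[str]] = {}
    for name, exons in ests.items():
        for exon in sorted(skipped_exons(exons)):
            support.setdefault(exon, []).append(name)
    return support


# Hypothetical exon content of the four ESTs (an assumption for illustration).
ESTS = {
    "est1": [1, 2, 3, 4],
    "est2": [2, 3, 5, 6],  # joins exon3 directly to exon5, so exon4 is skipped
    "est3": [3, 4, 5],
    "est4": [4, 5, 6],
}

print(skipping_support(ESTS))
```

With these made-up inputs, exon4 is included by est1, est3 and est4 but skipped by est2, which is the pattern an EST-based method would report as an exon-skipping event.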

Page 5

Intronic Sequences Flanking Alternatively Spliced Exons Are Conserved Between Human and Mouse

Rotem Sorek and Gil Ast

Genome Research July 2003

Page 6

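Objective and Result

1. Alternatively spliced conserved exons

[Figure: human gene (exon1, exon2, exon3) with human est1 and est2, and mouse gene (exon1, exon2, exon3) with mouse est1 and est2. Flanking intronic regions are labeled A1, B1 and A2, B2.]

Alternatively spliced internal exons: 3583

Alternatively spliced conserved exons: 243

Flanking intronic sequence conserved in mouse: upstream 223/243 = 92%, downstream 199/243 = 82%, both 188/243 = 77%

2. Constitutively spliced conserved exons

[Figure: human gene (exon1, exon2, exon3) with human est1 to est4, and mouse gene (exon1, exon2, exon3) with a mouse EST. Flanking intronic regions are labeled C1, D1 and C2, D2.]

Constitutively spliced internal exons: 7557

Constitutively spliced conserved exons: 1966

Flanking intronic sequence conserved in mouse: upstream 886/1966 = 45%, downstream 691/1966 = 35%, both 343/1966 = 17%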
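As a quick sanity check, the percentages on this slide can be recomputed from the raw counts. The sketch below assumes the three fractions per exon class are ordered upstream, downstream, both, matching the feature table shown later in the deck; the dictionary layout and names are illustrative only.

```python
# Counts taken from the slide; the upstream/downstream/both order is an assumption.
COUNTS = {
    "alternative": {"total": 243, "upstream": 223, "downstream": 199, "both": 188},
    "constitutive": {"total": 1966, "upstream": 886, "downstream": 691, "both": 343},
}


def percent(hits: int, total: int) -> int:
    """Percentage rounded to the nearest integer, as shown on the slide."""
    return round(100 * hits / total)


for exon_class, c in COUNTS.items():
    for region in ("upstream", "downstream", "both"):
        print(f"{exon_class:12s} {region:10s} {c[region]}/{c['total']} = "
              f"{percent(c[region], c['total'])}%")
```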

Page 7

Per-position conservation near alternatively and constitutively spliced exons

Page 8

Example: Human KCND3 gene (exons 4 to 8), RefSeq: NM_004980

Page 9

KCND3 gene exon information

Page 10

KCND3 gene exon 6 sequence (bold; alternatively spliced exon)

Page 11

Compare to chimpanzee genome (NM_004980)

Page 12

Compare to chimpanzee genome (NM_172198)

Page 13

Review: Finding exon-skipping events that are conserved between human and mouse

243 conserved exon-skipping events (25%)

737 (980 - 243) non-conserved exon-skipping events (75%)

Page 14

How prevalent is functional alternative splicing in the human genome?

Rotem Sorek, Ron Shamir and Gil Ast

Trends in Genetics, Vol. 20, February 2004

Page 15

Motivation

1. How many of these predicted splice variants are functional?

2. How many are the result of aberrant splicing (noise in the data)?

Page 16

The influence of alternatively spliced exons on the protein-coding sequence

Conserved alternatively spliced exons: 139 of 191 (73%) are peptide cassettes

Non-conserved alternatively spliced exons: 109 of 510 (21%) are peptide cassettes

Page 17

Features differentiating between conserved alternatively spliced exons and non-conserved alternatively spliced exons

Features | Conserved alternatively spliced exons | Non-conserved alternatively spliced exons

Average size | 87 | 116

Percentage of exons whose length is a multiple of three | 77% (147/191) | 40% (206/510)

Percentage of exons that are "peptide cassettes" | 73% (139/191) | 21% (109/510)

Percentage of exon insertions that result in a longer protein by a nearby stop codon | 61% (27/44) | 8% (25/304)

Percentage of exon insertions that result in a protein <100 amino acids | 9% (4/44) | 30% (91/304)

Average supporting expressed sequences | 9 | 2.2

(unlabeled row) | 30% | 62%

Page 18

Conclusion

1. We show that conserved (functional) cassette exons possess unique characteristics in size, repeat content and in their influence on the protein.

2. By contrast, most non-conserved cassette exons do not share these characteristics.

3. We conclude that a portion of the exon-skipping evidence in EST databases is not functional, and might result from aberrant rather than regulated splicing.

Page 19

Review: Intronic Sequences Flanking Alternatively Spliced Exons Are Conserved Between Human and Mouse

1. Alternatively spliced conserved exons

Alternatively spliced internal exons: 3583

Alternatively spliced conserved exons: 243

Flanking intronic sequence conserved in mouse: upstream 223/243 = 92%, downstream 199/243 = 82%, both 188/243 = 77%

2. Constitutively spliced conserved exons

Constitutively spliced internal exons: 7557

Constitutively spliced conserved exons: 1966

Flanking intronic sequence conserved in mouse: upstream 886/1966 = 45%, downstream 691/1966 = 35%, both 343/1966 = 17%

Page 20

Review: Features Differentiating Between Alternatively Spliced and Constitutively Spliced Exons

Features | Alternatively spliced exons | Constitutively spliced exons

Average size | 87 | 128

Percent exons whose length is a multiple of 3 | 73% (177/243) | 37% (642/1753)

Percent exons with upstream intronic elements conserved in mouse | 92% (223/243) | 45% (788/1753)

Percent exons with downstream intronic elements conserved in mouse | 82% (199/243) | 35% (611/1753)

Percent exons with both upstream and downstream intronic elements conserved in mouse | 77% (188/243) | 17% (292/1753)

Page 21

A Non-EST-Based Method for Exon-Skipping Prediction

Rotem Sorek, Ronen Shemesh, Yuval Cohen, Ortal Basechess, Gil Ast and Ron Shamir

Genome Research August 2004

Page 22

Objective

1. Our goal was to find a combination of features that would detect a substantial fraction of the alternative exons.

2. The features we chose are the following:

   1) exon length

   2) whether the exon length is divisible by 3

   3) percent identity when aligned to the mouse counterpart

   4) conservation in the upstream and downstream intronic sequences

Page 23

Result

1. The best rule is:

   1) at least 95% identity with the mouse exon counterpart

   2) exon size is a multiple of three

   3) a best local alignment of at least 15 intronic nucleotides upstream of the exon with at least 85% identity

   4) a perfect match of at least 12 intronic nucleotides downstream of the exon

2. This combination of features identified 76 exons (31% of the 243 alternatively spliced exons in the training set), whereas none of the 1753 constitutively spliced exons matched these features.
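The four-part rule above can be written down as a small predicate. This is a hedged sketch, not the authors' implementation: the `ExonFeatures` fields are hypothetical names, the alignment statistics are assumed to be precomputed by some human-mouse alignment step, and `longest_exact_match` (a longest-common-substring helper) is only one possible reading of "a perfect match of at least 12 intronic nucleotides".

```python
from dataclasses import dataclass


def longest_exact_match(a: str, b: str) -> int:
    """Length of the longest substring shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1  # extend the match ending one base earlier
                best = max(best, cur[j])
        prev = cur
    return best


@dataclass
class ExonFeatures:
    """Hypothetical per-exon features; field names are illustrative."""
    length: int                   # exon length in nucleotides
    exon_identity: float          # percent identity to the mouse exon
    upstream_aln_length: int      # best local alignment length, upstream intron
    upstream_aln_identity: float  # percent identity of that alignment
    downstream_exact_match: int   # longest perfect match, downstream intron


def is_candidate_alternative(f: ExonFeatures) -> bool:
    """Apply the four thresholds stated on the slide."""
    return (
        f.exon_identity >= 95.0
        and f.length % 3 == 0
        and f.upstream_aln_length >= 15
        and f.upstream_aln_identity >= 85.0
        and f.downstream_exact_match >= 12
    )


# Made-up downstream intronic sequences for human and mouse.
human_down = "GTAAGTATTTCCTGAGC"
mouse_down = "GTAAGTATTTCCTAAGC"

example = ExonFeatures(
    length=87,
    exon_identity=97.0,
    upstream_aln_length=20,
    upstream_aln_identity=90.0,
    downstream_exact_match=longest_exact_match(human_down, mouse_down),
)
print(is_candidate_alternative(example))
```

All four conditions must hold together, which is why the rule is strict: it recovers only part of the alternative exons but, per the slide, matches none of the constitutive ones in the training set.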

Page 24

To test this classifier in a genome-wide manner (cont.)

108,983 human exons for which a mouse counterpart could be identified

Using these rules, 952 candidate exons (~1%) were found.

1. For 453 (48%) of the 952 candidate alternative exons, there was EST/cDNA evidence of exon skipping.

2. Only 17% of the 453 exons classified by our rule had their exon skipping supported by only one EST.

3. The rest were supported by two or more.

Page 25

To test this classifier in a genome-wide manner (cont.)

108,983 human exons for which a mouse counterpart could be identified

Searching ESTs and cDNA: 7495 exons (7% of our entire set) had exon-skipping support.

1. In comparison, skipping was supported by only a single EST in 46% of the total 7495 exons.

2. This suggests that our classification rule enriches for alternatively spliced exons with a higher probability of being "real" relative to alternative exons merely supported by EST evidence.

Page 26

To test this classifier in a genome-wide manner

1. For the remaining 499 candidate alternative exons (952 - 453), no EST/cDNA showing an exon-skipping event was found.

2. Using the UCSC genome browser to check, we found that for 190 additional exons there was a human expressed sequence showing patterns of alternative splicing other than exon skipping:

   1) Alternative donor/acceptor 22%

   2) Intron retention 17%

   3) Mutually exclusive exon 7%

3. Thus, for 643 (453 + 190; 68%) of the 952 candidate alternative exons identified by this method, there was independent evidence for alternative splicing in dbEST.
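The bookkeeping behind these genome-wide figures is simple enough to re-derive. The sketch below only restates the counts quoted on the slides (108,983 exons, 952 candidates, 453 with skipping evidence, 190 with other evidence, 7495 EST-supported skipped exons); the variable names are illustrative.

```python
# Counts quoted in the slides.
total_exons = 108_983         # human exons with an identifiable mouse counterpart
candidates = 952              # exons matching the classification rule
skipping_supported = 453      # candidates with EST/cDNA exon-skipping evidence
other_as_supported = 190      # candidates with other alternative-splicing evidence
est_skipped_exons = 7495      # exons in the full set with exon-skipping support


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


remaining = candidates - skipping_supported            # no skipping evidence found
any_evidence = skipping_supported + other_as_supported

print(f"candidates: {percent(candidates, total_exons)}% of all exons")
print(f"skipping evidence: {percent(skipping_supported, candidates)}% of candidates")
print(f"remaining without skipping evidence: {remaining}")
print(f"any independent evidence: {any_evidence} "
      f"({percent(any_evidence, candidates)}% of candidates)")
print(f"EST-supported skipping in full set: {percent(est_skipped_exons, total_exons)}%")
```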

Page 27

Conclusion

1. We show that a substantial fraction of the splice variants in the human genome could not be identified through current human EST or cDNA data.

2. In the future, we hope this method can be developed into a more general alternative-splicing predictor that would identify other types of alternative splicing.

Page 28

Classification of alternative splicing

1. Skipped Exons

2. Multiple Skipped Exons

3. Alternative Donor / Acceptors

4. Retained Introns