Alternative Splicing (a review by Liliana Florea, 2005)
CS 498 SS
Saurabh Sinha
11/30/06
What is alternative splicing?
• The first result of transcription is “pre-mRNA”
• This undergoes “splicing”, i.e., introns are excised and exons remain, to form mRNA
• This splicing process may involve different combinations of exons, leading to different mRNAs, and different proteins
• This is alternative splicing
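As a rough illustration of the idea above, the sketch below joins different subsets of exons from a toy pre-mRNA. The sequence, the exon coordinates and the `splice` helper are all made up for this example; real splicing is driven by splice-site signals, not by a coordinate list.

```python
def splice(pre_mrna, exons, include):
    """Join the chosen exons, in order, to form an mRNA.

    exons   -- list of (start, end) pairs, 0-based and end-exclusive
    include -- set of exon indices to keep
    """
    return "".join(
        pre_mrna[start:end]
        for i, (start, end) in enumerate(exons)
        if i in include
    )

# Toy pre-mRNA (hypothetical): exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuuag" + "GAUUAC" + "guaagcccag" + "UGGUAA"
exons = [(0, 6), (16, 22), (32, 38)]

full_length = splice(pre_mrna, exons, {0, 1, 2})   # all three exons
exon_skipped = splice(pre_mrna, exons, {0, 2})     # middle exon left out

print(full_length)
print(exon_skipped)
```

The two printed strings are two different mRNAs from the same gene, which is the sense in which one gene can yield several proteins.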
Alternative splicing
• Important regulatory mechanism, for modulating gene and protein content in the cell
• Large-scale genomic data suggest that as many as 60% of human genes undergo alternative splicing
Significance
• Number of human genes has recently been estimated to be about 20-25 K.
• Not significantly greater than in much less complex organisms
• Alternative splicing is a potential explanation of how a large variety of proteins can be achieved with a small number of genes
• Errors in the splicing mechanism are implicated in diseases such as cancer
What happens in alternative splicing?
• Different combinations of exons within a gene are spliced from the RNA precursor, to be included in mRNA
• The combination depends on tissue type, developmental stage, disease etc.
• Thus different proteins are produced under these different conditions
• Different types of alternative splicing on next slide
http://bib.oxfordjournals.org/cgi/content/full/7/1/55/F1
exon inclusion/exclusion
alternative 5’ exon
alternative 3’ exon
intron retention
5’ alternative UTR
3’ alternative UTR
Bioinformatics of Alt. splicing
• Two main goals:
– Find out cases of alt. splicing
• What are the different forms (“isoforms”) of a gene?
– Find out how alt. splicing is regulated
• What are the sequence motifs controlling alt. splicing, and deciding which isoform will be produced?
Identification of splice variants
• All cells have the same genome
• But not all cells have the same “transcriptome” (i.e., set of transcripts)
– Different cells may express different (alternative) transcripts of the same gene
• Goal of bioinformatics is to find “splice forms”, i.e., what are the alternative splicing events?
Identification of splice variants
• Direct comparison between sequences of different cDNA isoforms
– Q: What is cDNA? How is this different from a gene’s DNA?
– cDNA is “complementary DNA”, obtained by reverse transcription from mRNA. It has no introns
• Direct comparison reveals differences in the isoforms
• But this difference could be part of an exon, a whole exon, or a set of exons
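A minimal sketch of direct comparison, assuming two made-up cDNA strings and using Python's standard `difflib.SequenceMatcher` as a stand-in for a real sequence alignment program. The helper name `isoform_differences` is hypothetical.

```python
from difflib import SequenceMatcher

def isoform_differences(cdna_a, cdna_b):
    """Return the non-matching regions between two cDNA sequences.

    Each item is (operation, segment_in_a, segment_in_b), where operation
    is 'delete', 'insert' or 'replace' as reported by difflib.
    """
    matcher = SequenceMatcher(None, cdna_a, cdna_b, autojunk=False)
    return [
        (tag, cdna_a[i1:i2], cdna_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Toy isoforms (hypothetical): isoform_b lacks the middle block of isoform_a.
isoform_a = "ATGGCC" + "GATTAG" + "TGGTAA"
isoform_b = "ATGGCC" + "TGGTAA"

for difference in isoform_differences(isoform_a, isoform_b):
    print(difference)
```

The comparison reports that a block is missing from one isoform, but by itself it cannot say whether that block is part of an exon, one exon, or several exons. That needs the exon-intron structure.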
Copyright restrictions may apply.
Florea, L. Brief Bioinform 2006 7:55-69; doi:10.1093/bib/bbk005
[Figure: Bioinformatics methods for identifying alternative splicing: direct comparison]
Identification of splice variants
• Comparison of exon-intron structures (the gene’s architecture)
• Where do the exon-intron structures come from?
– Align cDNA (no introns) with genomic sequence (with introns)
– This gives us the intron and exon structure
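The sketch below assumes the cDNA-to-genome alignment has already been done and has produced, for each transcript, a list of aligned blocks in genomic coordinates (the exons). The coordinates and the helper names are hypothetical; the point is only that introns fall out as the gaps between blocks, and that two structures can then be compared exon by exon.

```python
def introns_from_exons(exon_blocks):
    """Introns are the genomic gaps between consecutive aligned exon blocks."""
    blocks = sorted(exon_blocks)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:])
    ]

def compare_structures(exons_a, exons_b):
    """Split the exons of two transcripts into shared and transcript-specific."""
    a, b = set(exons_a), set(exons_b)
    return {
        "shared": sorted(a & b),
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
    }

# Toy structures (hypothetical genomic coordinates, end-exclusive).
transcript_a = [(100, 200), (300, 400), (500, 600)]
transcript_b = [(100, 200), (500, 600)]

print(introns_from_exons(transcript_a))
print(introns_from_exons(transcript_b))
print(compare_structures(transcript_a, transcript_b))
```

Here the exon present only in the first transcript is a candidate exon inclusion/exclusion event.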
[Figure: Bioinformatics methods for identifying alternative splicing: comparison of exon-intron structures]
Identification of splice variants
• Alignment tools
• Align cDNA sequence to genomic sequence
• Why shouldn’t this be a perfect match with gaps (introns)?
– Sequencing errors, polymorphisms, etc.
• Special-purpose alignment programs exist for this task
Identifying full-length alt. spliced transcripts
• The previous methods identify only parts of an alt. spliced transcript
• It is much more difficult to identify full-length alternatively spliced transcripts
• Such methods include “gene indices”
Gene indices
• Compare all EST (expressed sequence tag) sequences against one another
• Identify significant overlaps
• Group and assemble sequences with compatible overlaps into clusters
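The three steps above can be sketched as follows. This is a toy version under strong assumptions: overlaps are exact suffix-prefix matches (real tools tolerate mismatches), the minimum overlap `min_len` is an arbitrary value that must be at least 1, and the ESTs are made up. Clusters are formed with a small union-find; assembly of each cluster is not shown.

```python
from itertools import combinations

def overlap_len(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def cluster_ests(ests, min_len=4):
    """Group ESTs that are linked by a chain of significant overlaps."""
    parent = list(range(len(ests)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All-against-all comparison: this is the quadratic step.
    for i, j in combinations(range(len(ests)), 2):
        if overlap_len(ests[i], ests[j], min_len) or overlap_len(ests[j], ests[i], min_len):
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(ests)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Toy ESTs (hypothetical): the first three chain together, as do the last two.
ests = ["ACGTACGGA", "CGGATTCA", "TTCAGGCT", "GGGCCCAAA", "CCAAATTT"]
print(cluster_ests(ests))
```

The `combinations` loop makes the cost of comparing every pair of ESTs explicit.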
Gene indices
Problems with gene indices
• Overclustering: paralogs may get clustered together
– What are paralogs?
– Related but distinct genes in the same species
• Underclustering: if number of ESTs is not sufficient
• Computationally expensive:
– Quadratic time complexity
Splice graphs
• Nodes: Exons
• Edges: Introns
• Gene: directed acyclic graph
• Each path in this DAG is an alternative transcript
Splice graph
Splice graphs
• Combinatorially generate all possible alt. transcripts
• But not all such transcripts are going to be present
• Need scores for candidate transcripts, in order to differentiate between the biologically relevant ones and the artifactual ones
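A minimal sketch of the enumeration step, assuming a toy splice graph with four exons. The per-edge support counts (for example, how many ESTs span each junction) and the "weakest junction" score are illustrative assumptions only, not the scoring scheme of any particular method.

```python
def enumerate_transcripts(edges, source, sink):
    """Depth-first enumeration of all source-to-sink paths in a DAG."""
    paths = []

    def walk(node, path):
        if node == sink:
            paths.append(path)
            return
        for nxt in edges.get(node, []):
            walk(nxt, path + [nxt])

    walk(source, [source])
    return paths

def weakest_junction(path, support):
    """Illustrative score: the least-supported edge (intron) along the path."""
    return min(support[(a, b)] for a, b in zip(path, path[1:]))

# Toy splice graph (hypothetical): nodes are exons, edges are introns.
edges = {"E1": ["E2", "E3"], "E2": ["E3", "E4"], "E3": ["E4"]}
support = {
    ("E1", "E2"): 12, ("E1", "E3"): 2,
    ("E2", "E3"): 9, ("E2", "E4"): 1,
    ("E3", "E4"): 11,
}

transcripts = enumerate_transcripts(edges, "E1", "E4")
ranked = sorted(transcripts, key=lambda p: weakest_junction(p, support), reverse=True)
for path in ranked:
    print("-".join(path), weakest_junction(path, support))
```

Even this small graph yields three candidate transcripts; the number of paths can grow exponentially with the number of alternative exons, which is why some ranking of candidates is needed.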
Splice variants from microarray data
• Affymetrix GeneChip technology uses 22 probes collected from exons or straddling exon boundaries
• When an exon is alternatively spliced, expression level of its probes will be different in different experiments
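One simple way to act on this observation is sketched below, assuming one summarized intensity per exon per experiment. The intensities, the tissue names, the normalization by the gene's mean exon level, and the threshold of 1.0 (a two-fold change) are all assumptions made for illustration, not part of the GeneChip platform or of the review.

```python
from math import log2
from statistics import mean

def relative_exon_levels(exon_intensity):
    """log2 of each exon's intensity relative to the gene's mean exon intensity."""
    gene_level = mean(exon_intensity.values())
    return {exon: log2(value / gene_level) for exon, value in exon_intensity.items()}

def candidate_spliced_exons(sample_a, sample_b, threshold=1.0):
    """Exons whose relative level differs between two experiments.

    Normalizing by the gene-level mean separates a change in overall gene
    expression from a change that is specific to one exon.
    """
    rel_a = relative_exon_levels(sample_a)
    rel_b = relative_exon_levels(sample_b)
    return [exon for exon in sample_a if abs(rel_a[exon] - rel_b[exon]) >= threshold]

# Toy probe intensities (hypothetical): the gene is less expressed in liver
# overall, but only exon E2 drops out of proportion.
brain = {"E1": 800, "E2": 820, "E3": 780}
liver = {"E1": 400, "E2": 50, "E3": 420}

print(candidate_spliced_exons(brain, liver))
```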
[Figure: Bioinformatics methods for identifying alternative splicing: splice variants from microarray data]
Part 2: Regulation of alternative splicing
Biological mechanism
• Splicing of pre-mRNA is a complex cellular process
• “Spliceosome” is a complex of several molecules that assembles onto each intron and catalyzes the excision of the intron
• Splice sites (5’ or donor splice site and 3’ or acceptor splice site) play a major role in splicing
• Additional sites in introns and exons, apart from the splice signals, also contribute to splicing
Biological mechanism
• Cis-regulatory elements (again !)
• Promote (“splicing enhancers”) or repress (“splicing silencers”) the inclusion of the exon in the mRNA
• Can be located in exons or introns
Bioinformatics methods
• Goal: find the cis-regulatory elements that mediate splicing (alternative splicing)
• Early work: find consensus sequences (motifs) of splicing enhancers
• More advanced work: Position weight matrices (PWMs)
Bioinformatics representations of splicing regulatory motifs: (a) consensus sequence and (b) position weight matrix (PWM)
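To make the two representations concrete, the sketch below builds a log-odds PWM from a handful of aligned sites and reads a consensus off it. The sites are made up, and the pseudocount of 1 and the uniform background of 0.25 per base are common but arbitrary choices assumed here.

```python
from math import log2

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM: one dict per position, mapping base -> log2(freq / background)."""
    pwm = []
    total = len(sites) + 4 * pseudocount
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def score(pwm, kmer):
    """Sum of per-position log-odds; higher means a better match to the motif."""
    return sum(column[base] for column, base in zip(pwm, kmer))

def consensus(pwm):
    """The highest-scoring base at each position."""
    return "".join(max(column, key=column.get) for column in pwm)

# Toy aligned sites (hypothetical), all of the same length.
sites = ["GAAGAA", "GAAGAT", "GACGAA", "AAAGAA"]
pwm = build_pwm(sites)

print(consensus(pwm))
print(score(pwm, "GAAGAA"), score(pwm, "CTTCTT"))
```

The consensus keeps only the most frequent base per position, while the PWM also records how strongly each position prefers it, so near-matches still receive a graded score.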
Motif finding (again !)
• Statistical overrepresentation
• Find k-mers that occur more often in one class of sequences than in another
• Should be statistically significant
• Exonic splicing enhancers (ESE) are more likely to occur in exons than in introns; hence find 6-mers (k=6) statistically overrepresented in exons compared to introns
• Calculate z-score of count
– (Count - mean)/(standard deviation)
– Homework 1
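A minimal sketch of the z-score computation. The slide does not say how the mean and standard deviation are obtained, so a binomial null model is assumed here: each of the n k-mer positions in exons shows a given k-mer with probability p, where p is that k-mer's (pseudocount-smoothed) frequency in introns. The sequences are made up.

```python
from collections import Counter
from math import sqrt

def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def z_scores(exons, introns, k=6, pseudocount=1.0):
    """z = (count - mean) / sd for each exonic k-mer under a binomial null."""
    exon_counts = kmer_counts(exons, k)
    intron_counts = kmer_counts(introns, k)
    n_exon = sum(exon_counts.values())
    n_intron = sum(intron_counts.values())
    scores = {}
    for kmer, count in exon_counts.items():
        # Assumed null: k-mer frequency in introns, smoothed to avoid p = 0.
        p = (intron_counts[kmer] + pseudocount) / (n_intron + pseudocount * 4 ** k)
        expected = n_exon * p
        sd = sqrt(n_exon * p * (1 - p))
        scores[kmer] = (count - expected) / sd
    return scores

# Toy sequences (hypothetical).
exon_seqs = ["GAAGAAGAAGAA", "TTGAAGAACC"]
intron_seqs = ["GTAAGTTTTTTTTTCAG", "GTATGTCCCCCCCCAG"]

scores = z_scores(exon_seqs, intron_seqs)
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))
```

With real data, overlapping occurrences of a k-mer are not independent, so the binomial standard deviation is only an approximation.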
Motif finding
• Other standard approaches of motif finding also adopted:
– MEME & Gibbs sampling
• Comparative genomics
– Find conserved sites in introns
– Find conserved sites in exons. This has to be done carefully, because exons are already under selective pressure.
Summary
• Alternative splicing is very important
• Bioinformatics for finding alternatively spliced forms
• Bioinformatics for finding regulatory mechanisms