GENE Exon 1 Intron Exon 3 Intron Exon 4 Exon 2 Intron Promoter Enhancer mRNA transcript Exon 1 Intron Exon 3 Intron Exon 4 Exon 2 Intron 5’-untranslated region 5’ 3’ Poly(A) signal 3’-untranslated region Mataure mRNA Transcription Processing The Organization of an Eukaryotic Gene Exon 1 Exon 3 Exon 4 Exon 2 3’ (AAAAAA)n 7-mG cap start stop 5’


The Organization of a Eukaryotic Gene

GENE: Enhancer, Promoter, Exon 1, Intron, Exon 2, Intron, Exon 3, Intron, Exon 4

Transcription

mRNA transcript (5’ to 3’): 5’-untranslated region, Exon 1, Intron, Exon 2, Intron, Exon 3, Intron, Exon 4, 3’-untranslated region, Poly(A) signal

Processing

Mature mRNA (5’ to 3’): 7-mG cap, Exon 1, Exon 2, Exon 3, Exon 4, (AAAAAA)n, with start and stop codons marked


Gene identification involves 4 main stages

Find non-coding features of interest in the sequence
• CpG islands
• Tandemly and dispersed repeats
• Promoter regions (TATA box, cap signal, CCAAT-box)
• Transcription factors, Poly-A sites

Find the putative coding region(s) in the sequence
• Open reading frame

Determine the exon-intron organization
• Branch point signal: CT(G,A)A(C,T)
• 5’ and 3’ splice sites: AG/GUAAGU--------------PyPyPyPyPyPyPyPy-CAG/G

Identify the gene
• motif, signal and pattern
• BLAST, FASTA
• Functional studies
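The branch point signal and the 5’ splice-site consensus listed on this slide can be searched for with regular expressions. The following is a minimal sketch, assuming DNA input (T in place of U); the function name `find_motifs` and the test sequence are hypothetical and not part of the original slides.

```python
import re

# Consensus patterns from the slide, rewritten as regular expressions
# over the DNA alphabet (the slide writes the splice site with U).
PATTERNS = {
    "branch_point": "CT[GA]A[CT]",  # CT(G,A)A(C,T)
    "donor_site": "GTAAGT",         # intronic part of AG/GUAAGU
}


def find_motifs(seq, patterns=PATTERNS):
    """Return {name: [0-based start positions]} for each pattern.

    A zero-width lookahead is used so that overlapping matches are reported.
    """
    seq = seq.upper()
    hits = {}
    for name, pattern in patterns.items():
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", seq)]
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence: exon end, donor site, branch point, Py tract, acceptor.
    toy = "AAGGTAAGTCCCTGACTTTTTTTTCAGG"
    print(find_motifs(toy))
```

Such short consensus matches occur frequently by chance, so hits of this kind are only candidate signals and would normally be combined with other evidence.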


GENE FINDERS

Banbury Cross  http://igs-server.cnrs-mrs.fr/igs/banbury
FGENEH  http://genomic.sanger.ac.uk/gf/gf.shtml
GeneID  http://www1.imim.es/geneid.html
GeneMachine  http://genome.nhgri.nih.gov/genemachine
GeneParser  http://beagle.colorado.edu/_eesnyder/GeneParser.htl
GENSCAN  http://genes.mit.edu/GENSCAN.html
Genotator  http://www.fruitfly.org/_nomi/genotator/
GRAIL  http://compbio.ornl.gov/tools/index.shtml
GRAIL-EXP  http://compbio.ornl.gov/grailexp/
HMMgene  http://www.cbs.dtu.dk/services/HMMgene/
MZEF  http://www.cshl.org/genefinder
PROCRUSTES  http://www-hto.usc.edu/software/procrustes
RepeatMasker  http://ftp.genome.washington.edu/RM/RepeatMasker.html
Sputnik  http://rast.abajian.com/sputnik/


GENSCAN Web Server at MIT

Gene prediction programs:


GENSCAN Performance Data

           Accuracy per nucleotide    Accuracy per exon
Method     Sn     Sp     AC           Sn     Sp     (Sn+Sp)/2   ME     WE
GENSCAN    0.93   0.93   0.91         0.78   0.81   0.80        0.09   0.05
FGENEH     0.77   0.85   0.78         0.61   0.61   0.61        0.15   0.11
GeneID     0.63   0.81   0.67         0.44   0.45   0.45        0.28   0.24
GenePa2    0.66   0.79   0.66         0.35   0.39   0.37        0.29   0.17
GenLang    0.72   0.75   0.69         0.50   0.49   0.50        0.21   0.21
GRAILII    0.72   0.84   0.75         0.36   0.41   0.38        0.25   0.10
SORFIND    0.71   0.85   0.73         0.42   0.47   0.45        0.24   0.14
Xpound     0.61   0.82   0.68         0.15   0.17   0.16        0.32   0.13
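The Sn and Sp columns can be illustrated with a small calculation. This sketch assumes the convention commonly used in gene-prediction benchmarks, Sn = TP/(TP+FN) and Sp = TP/(TP+FP), which the slide itself does not define; the function names and the toy coordinates are hypothetical.

```python
def positions(exons):
    """Expand inclusive (start, end) exon intervals into a set of positions."""
    covered = set()
    for start, end in exons:
        covered.update(range(start, end + 1))
    return covered


def nucleotide_accuracy(annotated, predicted):
    """Return (Sn, Sp) per nucleotide.

    Sn: fraction of annotated coding bases that were predicted.
    Sp: fraction of predicted coding bases that are annotated.
    """
    true_pos = positions(annotated)
    pred_pos = positions(predicted)
    tp = len(true_pos & pred_pos)
    sn = tp / len(true_pos) if true_pos else 0.0
    sp = tp / len(pred_pos) if pred_pos else 0.0
    return sn, sp


def exon_accuracy(annotated, predicted):
    """Return (Sn, Sp) per exon; an exon counts only if both ends match exactly."""
    exact = len(set(annotated) & set(predicted))
    sn = exact / len(annotated) if annotated else 0.0
    sp = exact / len(predicted) if predicted else 0.0
    return sn, sp


if __name__ == "__main__":
    # Hypothetical toy annotation and prediction (not data from the slide).
    annotated = [(100, 199), (300, 399)]
    predicted = [(100, 199), (350, 449)]
    print(nucleotide_accuracy(annotated, predicted))
    print(exon_accuracy(annotated, predicted))
```

The toy case shows why exon-level scores in the table are lower than nucleotide-level scores: a partially overlapping exon still contributes correct bases but does not count as a correct exon.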


Accuracy as a Function of Exon Length

Length        Annotated exons                  Predicted exons
range (bp)    No.    %Exact  %Part  %Miss      No.    %Exact  %Part  %Wrong
<= 24         89     38      8      52         44     77      11     11
25 - 49       163    58      15     25         124    76      6      18
50 - 74       248    70      12     16         204    85      9      6
75 - 99       382    85      8      6          389    84      6      10
100 - 124     351    84      9      7          366    81      8      11
125 - 149     425    88      8      4          460    81      10     7
150 - 174     261    88      9      2          283    81      11     7
175 - 199     167    91      7      2          188    81      12     7
200 - 299     353    90      8      1          390    82      8      8
>= 300        211    66      19     1          204    69      20     10
Total         2650   81      10     8          2678   81      10     9


cDNA and genomic DNA alignment and matrix analysis:

GRAIL 2
10138 - 11018 +
12608 - 12748 x
13530 - 13923 x

GENSCAN
10138 - 11018 +
11268 - 11341 +
11450 - 11518 +
11644 - 11808 +
11989 - 12144 +
12360 - 12454 x
12608 - 12748 x

FGENES
1880 - 1908 x
5061 - 5175 x
5900 - 6049 x
8317 - 8544 +
10357 - 11018 +
11268 - 11341 +
11450 - 11518 +
11644 - 11864 +
polyA: 12521 +


What to do next?

The predictions made by these programs are just that: predictions.

NEVER TRUST A COMPUTER!


Automatic sequencer


One gene: one promoter, one transcript, one protein.

Gene structure: promoter; exons; introns


DNA → RNA → Protein


Simple Mathematics:

Human Genome: 3 × 10^9 bp

Human Genes (1.5% of the genome): 40,000 genes

In a given cell type at a certain stage, it is estimated that around 25 to 50% of the genes are transcribed or expressed: 10,000 to 20,000 genes.


40,000 × 35% × 5~10 (splicing) = 70,000 ~ 140,000

+ 40,000 × 65% = 26,000

Total: 96,000 ~ 166,000
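The arithmetic on these two slides can be reproduced in a few lines. This is a sketch under one reading of the figures, namely that 35% of genes yield 5 to 10 splice variants each and the remaining 65% yield a single transcript; that interpretation and the function names are assumptions, not statements from the slides.

```python
def expressed_range(genes=40_000, low_pct=25, high_pct=50):
    """Range of genes expressed in a given cell type (integer percentages)."""
    return genes * low_pct // 100, genes * high_pct // 100


def estimate_transcripts(genes=40_000, spliced_pct=35, variants=(5, 10)):
    """Estimated (low, high) number of distinct transcripts.

    Assumed reading of the slide: `spliced_pct` percent of genes produce
    between variants[0] and variants[1] transcripts; the rest produce one.
    """
    spliced = genes * spliced_pct // 100
    unspliced = genes - spliced
    low = spliced * variants[0] + unspliced
    high = spliced * variants[1] + unspliced
    return low, high


def coding_bases(genome_bp=3_000_000_000, tenths_of_percent=15):
    """Bases covered by genes when they make up 1.5% of the genome."""
    return genome_bp * tenths_of_percent // 1000


if __name__ == "__main__":
    print(expressed_range())       # 25% and 50% of 40,000
    print(estimate_transcripts())  # spliced plus unspliced transcripts
    print(coding_bases())          # 1.5% of 3 x 10^9 bp
```

Integer percentages are used instead of floating-point fractions so that the results match the slide's round numbers exactly.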


The subset of genes expressed in a given cell or tissue type such as the prostate may be defined as the transcriptome, the dynamic link between the genome, the proteome, and the cellular phenotype associated with physical characteristics.


Genome: DNA Sequence and Genes
• SNPs
• Splicing variants

Transcriptome: Entire mRNA Complement
• Spatial/Temporal Expression
• Aberrant expression profiles

Proteomics: Entire Protein Complement
• Functional proteomics: profiling
• Structural proteomics: 3-D structure
• Protein interactions: genetic networks


Unknown sequence (http://www.wiley.com/legacy/products/subject/life/bioinformatics/questions_10.html)ATGGAGAATAGTCTTAGATGTGTTTGGGTACCCAAGCTGGCTTTTGTACTCTTCGGAGCTTCCTTGCTCA GCGCGCATCTTCAAGTAACCGGTTTTCAAATTAAAGCTTTCACAGCACTGCGCTTCCTCTCAGAACCTTCTGATGCCGTCACAATGCGGGGAGGAAATGTCCTCCTCGACTGCTCCGCGGAGTCCGACCGAGGAGTTCCAGTGATCAAGTGGAAGAAAGATGGCATTCATCTGGCCTTGGGAATGGATGAAAGGAAGCAGCAACTTTCAAATGGGTCTCTGCTGATACAAAACATACTTCATTCCAGACACCACAAGCCAGATGAGGGACTTTACCAATGTGAGGCATCTTTAGGAGATTCTGGCTCAATTATTAGTCGGACAGCAAAAGTTGCAGTAGCAGGACCACTGAGGTTCCTTTCACAGACAGAATCTGTCACAGCCTTCATGGGAGACACAGTGCTACTCAAGTGTGAAGTCATTGGGGAGCCCATGCCAACAATCCACTGGCAGAAGAACCAACAAGACCTGACTCCAATCCCAGGTGACTCCCGAGTGGTGGTCTTGCCCTCTGGAGCATTGCAGATCAGCCGACTCCAACCGGGGGACATTGGAATTTACCGATGCTCAGCTCGAAATCCAGCCAGCTCAAGAACAGGAAATGAAGCAGAAGTCAGAATTTTATCAGATCCAGGACTGCATAGACAGCTGTATTTTCTGCAAAGACCATCCAATGTAGTAGCCATTGAAGGAAAAGATGCTGTCCTGGAATGTTGTGTTTCTGGCTATCCTCCACCAAGTTTTACCTGGTTACGAGGCGAGGAAGTCATCCAACTCAGGTCTAAAAAGTATTCTTTATTGGGTGGAAGCAACTTGCTTATCTCCAATGTGACAGATGATGACAGTGGAATGTATACCTGTGTTGTCACATATAAAAATGAGAATATTAGTGCCTCTGCAGAGCTCACAGTCTTGGTTCCGCCATGGTTTTTAAATCATCCTTCCAACCTGTATGCCTATGAAAGCATGGATATTGAGTTTGAATGTACAGTCTCTGGAAAGCCTGTGCCCACTGTGAATTGGATGAAGAATGGAGATGTGGTCATTCCTAGTGATTATTTTCAGATAGTGGGAGGAAGCAACTTACGGATACTTGGGGTGGTGAAGTCAGATGAAGGCTTTTATCAATGTGTGGCTGAAAATGAGGCTGGAAATGCCCAGACCAGTGCACAGCTCATTGTCCCTAAGCCTGCAATCCCAAGCTCCAGTGTCCTCCCTTCGGCTCCCAGAGATGTGGTCCCTGTCTTGGTTTCCAGCCGATTTGTCCGTCTCAGCTGGCGCCCACCTGCAGAAGCGAAAGGGAACATTCAAACTTTCACGGTCTTTTTCTCCAGAGAAGGTGACAACAGGGAACGAGCATTGAATACAACACAGCCTGGGTCCCTTCAGCTCACTGTGGGAAACCTGAAGCCAGAAGCCATGTACACCTTTCGAGTTGTGGCTTACAATGAATGGGGACCGGGAGAGAGTTCTCAACCCATCAAGGTGGCCACACAGCCTGAGTTGCAAGTTCCAGGGCCAGTAGAAAACCTGCAAGCTGTATCTACCTCACCTACCTCAATTCTTATTACCTGGGAACCCCCTGCCTATGCAAACGGTCCAGTCCAAGGTTACAGATTGTTCTGCACTGAGGTGTCCACAGGAAAAGAACAGAATATAGAGGTTGATGGACTATCTTATAAACTGGAAGGCCTGAAAAAATTCACCGAATATAGTCTTCGATTCTTAGCTTATAATCGCTATGGTCCGGGCGTCTCTACTGATGATATAACAGTGGTTACACTTTCTGACGTGCCAAGTGCCCCGCCTCAGAACGTCTCCCTGGAAGT
GGTCAATTCAAGAAGTATCAAAGTTAGCTGGCTGCCTCCTCCATCAGGAACACAAAATGGATTTATTACCGGCTATAAAATTCGACACAGAAAGACGACCCGCAGGGGTGAGATGGAAACACTGGAGCCAAACAACCTCTGGTACCTATTCACAGGACTGGAGAAAGGAAGTCAGTACAGTTTCCAGGTGTCAGCCATGACA


TATA box


http://sullivan.bu.edu/~mfrith/HPD.html


http://www.epd.isb-sib.ch/


http://transfac.gbf.de/


http://binfo.ym.edu.tw/passdb/index.html


http://cgsigma.cshl.org/new_alt_exon_db2/


http://www.bioinformatics.ucla.edu/HASDB/


“Briefly, gene-finding strategies can be grouped into three major categories. Content-based methods rely on the overall, bulk properties of a sequence in making a determination. Characteristics considered here include how often particular codons are used, the periodicity of repeats, and the compositional complexity of the sequence. Because different organisms use synonymous codons with different frequency, such clues can provide insight into determining regions that are more likely to be exons. In site-based methods, the focus turns to the presence or absence of a specific sequence, pattern, or consensus. These methods are used to detect features such as donor and acceptor splice sites, binding sites for transcription factors, polyA tracts, and start and stop codons. Finally, comparative methods make determinations based on sequence homology. Here, translated sequences are subjected to database searches against protein sequences (cf. Chapter 8) to determine whether a previously characterized coding region corresponds to a region in the query sequence.”
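As a small illustration of the content-based category described in the quotation, codon usage can be tabulated from an in-frame coding sequence. This is only a sketch: the function name `codon_usage` and the toy sequence are hypothetical, and real gene finders combine such statistics with many other signals.

```python
from collections import Counter


def codon_usage(cds):
    """Relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy CDS: ATG GCT GCT GCC TAA
    for codon, freq in sorted(codon_usage("ATGGCTGCTGCCTAA").items()):
        print(codon, round(freq, 2))
```

Comparing such frequencies for a candidate region against a table derived from known genes of the same species is one way the synonymous-codon bias mentioned above can be used as evidence.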


Ab Initio Gene Discovery

Protein-coding sequences within a whole genome sequence can be identified using a process known as ab initio gene discovery, in which software that recognizes features common to protein-coding transcripts is used.
– The existence of long open reading frames
– Particularly ones for which the codon bias is typical of that observed for the species being studied
– Proximity of transcriptional and translational initiation motifs and 3’ polyadenylation site
– Splicing consensus sequences at putative intron-exon boundaries
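The first feature in the list, a long open reading frame, can be found with a simple scan. The sketch below assumes an ATG start codon, the standard stop codons TAA, TAG and TGA, and forward-strand frames only; the function name `find_orfs` and the `min_codons` threshold are hypothetical choices, and this is not how any particular gene finder is implemented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for ORFs on the forward strand.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    Only the first ATG after the previous stop is used as the start in each frame.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    # Hypothetical toy sequence with one ORF: ATG AAA TTT GGG TAA
    print(find_orfs("CCATGAAATTTGGGTAACC"))
```

To cover both strands, the same function could be applied to the reverse complement of the sequence; in eukaryotic genomic DNA, introns interrupt ORFs, which is why the splice-site features in the list are also needed.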


Hidden Markov Models

An example of the HMM

Sequence: TTACTTGACGCCAGAAATCTATATTTGGTAACCCGACGCTAA
States:   NNNNNNNNNRRRRRNNNNNNNNNNNNNNNNNRRRRRRRRNNN

N = Normal region, R = GC-rich region

     A     T     G     C     Mean length
-----------------------------------------
N    0.3   0.3   0.2   0.2   10
R    0.1   0.1   0.4   0.4   5
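Given the emission table above, the most probable state labelling of a sequence can be computed with the Viterbi algorithm. This sketch makes two assumptions that the slide does not state: the chain starts in state N, and the self-transition probability of a state is 1 − 1/(mean length), giving 0.9 for N and 0.8 for R. The function name `viterbi` is hypothetical, and the input is assumed to be a non-empty string over A, C, G, T.

```python
import math

EMIT = {
    "N": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},
    "R": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}
# Assumed transitions derived from the mean lengths 10 (N) and 5 (R).
TRANS = {
    "N": {"N": 0.9, "R": 0.1},
    "R": {"R": 0.8, "N": 0.2},
}


def viterbi(seq, start="N"):
    """Return the most probable state path for `seq`, beginning in `start`."""
    states = list(EMIT)
    # Log-probabilities avoid numerical underflow on long sequences.
    score = {
        s: math.log(EMIT[s][seq[0]]) if s == start else -math.inf
        for s in states
    }
    back = []
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for t in states:
            best = max(states, key=lambda s: score[s] + math.log(TRANS[s][t]))
            new_score[t] = (
                score[best] + math.log(TRANS[best][t]) + math.log(EMIT[t][base])
            )
            pointers[t] = best
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("TTACTTGACGCCAGAAATCTATATTTGGTAACCCGACGCTAA"))
```

The labelling returned for the slide's sequence depends on the assumed transition probabilities and need not coincide with the illustrative labelling shown on the slide.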


Hidden Markov Models (cont.)

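Pr(sequence) = Σ over all hidden-state paths of Pr(sequence | hidden states) × Pr(hidden states)

For the sequence TGCC, taking the first state to be N:

Pr(TGCC) = Pr(TGCC|NNNN) Pr(NNNN)
         + Pr(TGCC|NNNR) Pr(NNNR)
         + Pr(TGCC|NNRN) Pr(NNRN)
         + Pr(TGCC|NRNN) Pr(NRNN)
         + Pr(TGCC|NNRR) Pr(NNRR)
         + Pr(TGCC|NRNR) Pr(NRNR)
         + Pr(TGCC|NRRN) Pr(NRRN)
         + Pr(TGCC|NRRR) Pr(NRRR)

For the first term, with Pr(N→N) = 1 − 1/10 = 0.9 from the mean length of a normal region:

Pr(TGCC|NNNN) Pr(NNNN)
= Pr(T|N) Pr(G|N) Pr(C|N) Pr(C|N) × Pr(N→N) Pr(N→N) Pr(N→N)
= (0.3 × 0.2 × 0.2 × 0.2) × (0.9 × 0.9 × 0.9)
≈ 0.00175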
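The sum over the eight paths can be checked numerically. The sketch below enumerates every path that starts in N and compares the total with the forward algorithm, which computes the same quantity without enumeration. The transition probabilities are assumed to follow from the mean lengths (stay probability 1 − 1/mean, so 0.9 for N and 0.8 for R); only the 0.9 appears on the slide, and the function names are hypothetical.

```python
from itertools import product

EMIT = {
    "N": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},
    "R": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}
MEAN_LENGTH = {"N": 10, "R": 5}

# Assumption: stay probability = 1 - 1/mean length; otherwise switch state.
TRANS = {}
for state, other in (("N", "R"), ("R", "N")):
    stay = 1 - 1 / MEAN_LENGTH[state]
    TRANS[state] = {state: stay, other: 1 - stay}


def path_probability(seq, path):
    """Pr(seq | path) * Pr(path), with the first state taken as given."""
    p = EMIT[path[0]][seq[0]]
    for i in range(1, len(seq)):
        p *= TRANS[path[i - 1]][path[i]] * EMIT[path[i]][seq[i]]
    return p


def sequence_probability(seq, start="N"):
    """Brute-force sum over all state paths that begin in `start`."""
    total = 0.0
    for rest in product("NR", repeat=len(seq) - 1):
        total += path_probability(seq, start + "".join(rest))
    return total


def forward_probability(seq, start="N"):
    """Same quantity via the forward algorithm (linear in sequence length)."""
    f = {s: EMIT[s][seq[0]] if s == start else 0.0 for s in EMIT}
    for base in seq[1:]:
        f = {
            t: EMIT[t][base] * sum(f[s] * TRANS[s][t] for s in EMIT)
            for t in EMIT
        }
    return sum(f.values())


if __name__ == "__main__":
    print(path_probability("TGCC", "NNNN"))  # the slide's first term
    print(sequence_probability("TGCC"))
    print(forward_probability("TGCC"))
```

Enumeration grows as 2^(n−1) paths for a sequence of length n, which is why practical HMM gene finders rely on dynamic programming such as the forward and Viterbi algorithms.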


http://genes.mit.edu/GENSCAN.html


http://compbio.ornl.gov/grailexp/


Predicts   GenScan        GRAIL          MZEF           HMMgene
1          10138-11018    10138-11022    11464-11518    10138-11018
2          11268-11341    12608-12711    12024-12079    11268-11341
3          11450-11518    13530-13923    13530-13892    11450-11518
4          11644-11808    15698-15771    14980-15052    11644-11808
5          11989-12144    16358-16532    16358-16451    15002-15052
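One simple way to compare such predictions is to count how many programs report exactly the same exon. The sketch below uses the coordinates from the table, assuming the values are listed per row in the column order GenScan, GRAIL, MZEF, HMMgene; the function names and the two-program threshold are hypothetical choices.

```python
from collections import Counter

# Coordinates from the slide; the column assignment is an assumption.
PREDICTIONS = {
    "GenScan": [(10138, 11018), (11268, 11341), (11450, 11518),
                (11644, 11808), (11989, 12144)],
    "GRAIL": [(10138, 11022), (12608, 12711), (13530, 13923),
              (15698, 15771), (16358, 16532)],
    "MZEF": [(11464, 11518), (12024, 12079), (13530, 13892),
             (14980, 15052), (16358, 16451)],
    "HMMgene": [(10138, 11018), (11268, 11341), (11450, 11518),
                (11644, 11808), (15002, 15052)],
}


def exact_support(predictions):
    """Count, for each exon, how many programs predict it with identical ends."""
    counts = Counter()
    for exons in predictions.values():
        counts.update(set(exons))
    return counts


def consensus_exons(predictions, min_programs=2):
    """Exons predicted identically by at least `min_programs` programs."""
    support = exact_support(predictions)
    return sorted(exon for exon, n in support.items() if n >= min_programs)


if __name__ == "__main__":
    for start, end in consensus_exons(PREDICTIONS):
        print(f"{start}-{end}")
```

Exact matching is strict: predictions that share only one boundary, such as 13530-13923 and 13530-13892, are not counted as agreeing, so an overlap-based criterion may be preferable in practice.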


http://research.nhgri.nih.gov/genemachine/


http://genome.ucsc.edu/index.html


GCG and SeqWeb

Function                          Command
Sequence manipulation             Reverse
ORF searching                     Frames, Map, Translate
Mapping (restriction sites)       Map (-minc) (-maxc), Mapsort (-exclude) (-digest), Mapplot
Mapping (transcription factors)   Map tfsites

Availability marks (+/-) as extracted from the GCG and SeqWeb columns: +++++++++++ + and ++++++++--+ -


Programs used in this exercise:(1) Sequence manipulation – reverse(3)  ORF Searching – frames , map , translate(4)  Mapping (restriction sites) – map (-minc, -maxc), mapsort(-exclude, -digest), mapplot, plasmidmap(5)  Mapping (transcription factor) – map(tfsites).

Sequences used in this exercise:
gb:z18853 (C. elegans mRNA for capping protein alpha subunit), cds: 10-858
gb:x03795 (Human mRNA for platelet-derived growth factor A-chain, PDGF-A), cds: 388-1020
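The ORF-searching step of the exercise can be illustrated with a small six-frame scan. This is only a teaching sketch in Python, not the GCG `frames` or `translate` implementation; the function names (`find_orfs`, `cds_codons`) and the `min_codons` parameter are hypothetical. The helper `cds_codons` simply checks that an annotated CDS range, such as the two listed above, is a whole number of codons.

```python
# Minimal six-frame ORF scan (illustrative only; not the GCG algorithm).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Yield (strand, frame, start, end) for each ATG...stop ORF.

    Coordinates are 1-based and refer to the strand being scanned;
    the stop codon is included in the reported range.
    """
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq).upper())):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield (strand, frame + 1, start + 1, i + 3)
                    start = None


def cds_codons(first, last):
    """Number of codons (including the stop) in a CDS given 1-based inclusive coordinates."""
    length = last - first + 1
    if length % 3:
        raise ValueError("CDS length is not a multiple of three")
    return length // 3


if __name__ == "__main__":
    print(list(find_orfs("CCATGAAATGACC")))
    print(cds_codons(10, 858), cds_codons(388, 1020))
```

For the annotated ranges, 10-858 spans 849 nt (283 codons including the stop) and 388-1020 spans 633 nt (211 codons).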

Exercise89-10


cDNA and genomic DNA alignment and matrix analysis:

GRAIL 2
10138 - 11018 +
12608 - 12748 x
13530 - 13923 x

GENSCAN
10138 - 11018 +
11268 - 11341 +
11450 - 11518 +
11644 - 11808 +
11989 - 12144 +
12360 - 12454 x
12608 - 12748 x

FGENES
1880 - 1908 x
5061 - 5175 x
5900 - 6049 x
8317 - 8544 +
10357 - 11018 +
11268 - 11341 +
11450 - 11518 +
11644 - 11864 +
polyA: 12521 +
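One simple way to compare the three predictions is to list the exons that two or more programs report with identical coordinates. The sketch below does that in Python. The coordinates are transcribed from the slide, and reading the fused labels as "GRAIL 2" followed by 10138 and "FGENES" followed by 1880 is an assumption; the function names are hypothetical.

```python
from collections import Counter

# Exon coordinates transcribed from the slide. Where the program label and the
# first number were fused in the source, the split chosen here is an assumption.
PREDICTIONS = {
    "GRAIL 2": [(10138, 11018), (12608, 12748), (13530, 13923)],
    "GENSCAN": [(10138, 11018), (11268, 11341), (11450, 11518), (11644, 11808),
                (11989, 12144), (12360, 12454), (12608, 12748)],
    "FGENES": [(1880, 1908), (5061, 5175), (5900, 6049), (8317, 8544),
               (10357, 11018), (11268, 11341), (11450, 11518), (11644, 11864)],
}


def shared_exons(predictions, min_programs=2):
    """Return exons predicted with identical coordinates by at least min_programs programs."""
    counts = Counter()
    for exons in predictions.values():
        counts.update(set(exons))  # count each program at most once per exon
    return sorted(exon for exon, n in counts.items() if n >= min_programs)


def overlaps(a, b):
    """True if two inclusive (start, end) intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    for start, end in shared_exons(PREDICTIONS):
        print(f"{start} - {end}")
```

Exact agreement is a strict criterion: FGENES 10357 - 11018 shares its 3' boundary with the GRAIL 2 and GENSCAN exon 10138 - 11018 but is not counted, so `overlaps` is included for a looser comparison.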


Gene Expression Studies


EST: Expressed Sequence Tags

dbEST is a division of GenBank that contains sequence data and other information on "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms.


In silico cloning:

To perform an electronic cDNA library screen, the EST sequences retrieved in this way can be used as queries in a BLASTN search of dbEST to identify overlapping ESTs. This procedure can be repeated with the newly identified ESTs until no additional hits are found. The isolated ESTs can then be assembled into sequence contigs using computer software, such as UniGene.

[Figure: a query sequence extended by overlapping ESTs (EST 1, EST 2, EST 3) toward a full-length mRNA sequence]
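The iterative screen described on this slide can be sketched as a loop that keeps searching with each newly found EST until nothing new turns up. The Python below is a toy version under stated assumptions: a shared k-mer stands in for a BLASTN hit, the EST "database" is an in-memory dictionary, and the names `in_silico_screen` and `kmers` are hypothetical. A real screen would run BLASTN against dbEST and apply score thresholds.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def in_silico_screen(query, est_db, k=6):
    """Collect ESTs reachable from the query through chains of overlaps.

    est_db maps EST names to sequences. Two sequences are treated as
    overlapping if they share at least one k-mer (a crude stand-in for
    a BLASTN hit, assumed here for illustration only).
    """
    found = {}
    frontier = [query]
    while frontier:                      # repeat until no additional hits are found
        probe_kmers = kmers(frontier.pop(), k)
        for name, seq in est_db.items():
            if name not in found and probe_kmers & kmers(seq, k):
                found[name] = seq
                frontier.append(seq)     # newly identified ESTs become queries
    return found


if __name__ == "__main__":
    mrna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # made-up sequence
    ests = {
        "EST1": mrna[6:22],
        "EST2": mrna[16:32],
        "EST3": mrna[26:39],
        "unrelated": "TTTTTTTTTTTT",
    }
    print(sorted(in_silico_screen(mrna[0:12], ests)))
```

The collected ESTs would then be passed to an assembler to build contigs; that step is not shown here.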



There are many sequencing-related errors in dbEST.

[Figure: overlapping ESTs (EST 1, EST 2, EST 3)]


C. elegans a.a. sequences

Human EST sequences

CGI: Comparative Gene Identification

Ortholog: homologous genes that have diverged from each other after speciation events (e.g., human beta-globin and chimp beta-globin).


Genome sequence of the nematode C. elegans: a platform for investigating biology

The C. elegans Sequencing Consortium

97 Mb
257 YACs (20% only in YAC)
2527 cosmids
113 fosmids
44 PCR
19,099 predicted genes
18,891 proteins here (16,260 reviewed)

EST: 67,815 ESTs from 40,379 clones

7432 genes

A multicellular organism genome


Read protein sequences from a dataset (e.g., the C. elegans proteome)

Run a TBLASTN search against the HGI or EST databases

Parse the BLAST results and store them in an Oracle database

Rule-based neural network

Predictions: Known genes, Gene Family, New Genes, No match
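The parse-and-store step of this pipeline might look roughly like the following. This is a sketch under several assumptions: it reads the 12-column tabular format produced by current BLAST+ (`-outfmt 6`), which postdates these slides; it uses Python's built-in SQLite in place of Oracle; and the table name, function names, and the sample hit line are made up. The classification rules behind "Known genes", "Gene Family", "New Genes" and "No match" are not given in the source, so they are not implemented.

```python
import sqlite3

# Column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def load_hits(lines, conn):
    """Parse tab-separated BLAST hits and insert them into a 'hits' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits "
        "(qseqid TEXT, sseqid TEXT, pident REAL, length INTEGER, evalue REAL, bitscore REAL)"
    )
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        rec = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
        conn.execute(
            "INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?)",
            (rec["qseqid"], rec["sseqid"], float(rec["pident"]),
             int(rec["length"]), float(rec["evalue"]), float(rec["bitscore"])),
        )
    conn.commit()


def best_hits(conn, max_evalue=1e-10):
    """Return (query, subject, bit score) for the top-scoring hit of each query."""
    return conn.execute(
        "SELECT qseqid, sseqid, MAX(bitscore) FROM hits "
        "WHERE evalue <= ? GROUP BY qseqid ORDER BY qseqid",
        (max_evalue,),
    ).fetchall()


if __name__ == "__main__":
    sample = [  # hypothetical hits, not real search results
        "worm_prot_1\tTHC_A\t50.9\t477\t214\t20\t1\t470\t1\t1431\t1e-131\t468",
        "worm_prot_1\tTHC_B\t35.0\t200\t130\t5\t10\t209\t4\t603\t2e-27\t123",
        "worm_prot_2\tTHC_C\t25.0\t40\t30\t0\t1\t40\t1\t120\t5.0\t20",
    ]
    conn = sqlite3.connect(":memory:")
    load_hits(sample, conn)
    print(best_hits(conn))
```

Keeping the hits in a database makes it straightforward to re-run the downstream rules with a different E-value cutoff without repeating the search.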


[THC195737---------------------------------------------

MTRHGKNSTAASVYTYHERRRDAKASGYGTLHARLGADSIKEFHCCSLTLQPCRNPVISPTGYIF

--------]

DREAILENILAQKKAYAKKLKEYEKQVAEESAAAKIAEGQAETFTKRTQFSAIESTPSRTGAVAT

[THC195737--------------------

PRPEVGSLKRQGGVMSTEIAAKVKAHGEEGVMSNMKGDKSTSLPSFWIPELNPTAVATKLEKPSS

----------------------------------------------------]

KVLCPVSGKPIKLKELLEVKFTPMPGTETAAHRKFLCPVTRDELTNTTRCAYLKKSKSVVKYDVV

[THC195737----------------------]

EKLIKGDGIDPINGEPMSEDDIIELQRGGTGYSATNETKAKLIRPQLELQ*

U58746

*nucleotide sequence error


Translation of   1 MTRHGKNCTAGAVYTYHEKKKDTAASGYGTQNIRLSRDAVKDFDCCCLSLQPCHD 55
U58746           1 MTRHGKNSTAASVYTYHERRRDAKASGYGTLHARLGADSIKEFHCCSLTLQPCRN 55
                   *******.** .******...*. ****** . ** *..*.* **.*.****.

Translation of  56 PVVTPDGYLYEREAILEYILHQKKEIARQMKAYEKQRGTRREEQKELQRAASQDH 110
U58746          56 PVISPTGYIFDREAILENILAQKKAYAKKLKEYEKQVAEESAAAKIAEGQAETFT 110
                   **..* **...****** ** *** *...* **** * . *

Translation of 111 VRGFLEKESAIVSRPLNPFTAKALSGTSPD-----------DVQPGPSVGPPSKD 154
U58746         111 KRTQFSAIESTPSRTGAVATPRPEVGSLKRQGGVMSTEIAAKVKAHGEEGVMSNM 165
                   * . ** * . *. *. * *

Translation of 155 K-DK--VLPSFWIPSLTPEAKATKLEKPSRTVTCPMSGKPLRMSDLTPVHFTPLD 206
U58746         166 KGDKSTSLPSFWIPELNPTAVATKLEKPSSKVLCPVSGKPIKLKELLEVKFTPMP 220
                   * ** ******* *.* * ******** * **.****... .* *.***.

Translation of 207 SSVDRVGLITRSER-YVCAVTRDSLSNATPCAVLRPSGAVVTLECVEKLIRKDMV 260
U58746         221 ------GTETAAHRKFLCPVTRDELTNTTRCAYLKKSKSVVKYDVVEKLIKGDGI 269
                   * * . * ..* **** *.*.* ** *. * .** . *****. * .

Translation of 261 DPVTGDKLTDRDIIVLQRGGTGFAGSGVKLQAEKSRPVMQA 301
U58746         270 DPINGEPMSEDDIIELQRGGTGYSAT-NETKAKLIRPQLELQ 310
                   **..*. ... *** *******.. . .* ** ..

(44%/59%)
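The pair of figures above presumably reports percent identity and percent similarity for the alignment. Identity can be computed directly from two gapped, equal-length alignment rows, as in the Python sketch below; the function name is hypothetical. Similarity ("positives") additionally needs a substitution matrix, which the slide does not name, so it is left out.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of alignment columns in which both rows carry the same residue.

    Columns where both rows are gaps are ignored; a gap opposite a residue
    counts as a mismatch.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    columns = [(a, b) for a, b in zip(row_a, row_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # First 14 columns of the block starting at positions 155 / 166 above.
    print(round(percent_identity("K-DK--VLPSFWIP", "KGDKSTSLPSFWIP"), 1))
```

Different tools use different denominators (alignment length, shorter sequence, or aligned residues only), so the value from this sketch need not match the slide's figure exactly.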


[THC171302--MVFGENQDLIRTHFQKEADKVRAMKTNWGLFTRTRMIAQSDYDFIVTYQQAENEAERSTVLSVFKEK-------------------------------------------------------------------AVYAFVHLMSQISKDDYVRYTLTLIDDMLREDVTRTIIFEDVAVLLKRSPFSFFMGLLHRQDQYIVH-------------------------------------------------------------------ITFSILTKMAVFGNIKLSGDELDYCMGSLKEAMNRGTNNDYIVTAVRCMQTLFRFDPYRVSFVNING-------------------------------------------------------------------YDSLTHALYSTRKCGFQIQYQIIFCMWLLTFNGHAAEVALSGNLIQTISGILGNCQKEKVIRIVVST-----------------] [THC177150--------------------------------------------LRNLITSNQDVYMKKQAALQMIQNRIPTKLDHLENRKFTDVDLVEDMVYLQTELKKVVQVLTSFDEY-------------------------------------------------------------------ENELRQGSLHWSPAHKCEVFWNENAHRLNDNRQELLKLLVAMLEKSNDPLVLCVAAHDIGEFVRYYP------------------------------------------------]RGKLKVEQLGGKEAMMRLLTVKDPNVRYHALLAAQKLMINNWKDLGLEI

U50199


gi|2895578 (AF041338) vacuolar proton pump subunit SFD alpha is...   927  0.0
gi|2895576 (AF041337) vacuolar proton pump subunit SFD beta iso...   885  0.0
gi|1213557 (U50199) coded for by C. elegans cDNA yk89e9.5; code...   468  e-131
gi|1086810 (U41109) similar to S. cerevisiae vacular H(+)-ATPas...   335  5e-91
gnl|PID|e351278 (Z99532) hypothetical protein [Schizosaccharomy...   185  5e-46
sp|P41807|VM13_YEAST VACUOLAR ATP SYNTHASE 54 KD SUBUNIT (V-ATP...   123  2e-27

gi|1213557 (U50199) coded for by C. elegans cDNA yk89e9.5; coded for by C. elegans cDNA cm7g5; coded for by C. elegans cDNA cm14b9; coded for by C. elegans cDNA yk52g5.5; coded for by C. elegans cDNA yk76e5.5; coded for by C. elegans cDNA yk131f11.5; c...
Length = 470
Score = 468 bits (1192), Expect = e-131
Identities = 243/477 (50%), Positives = 314/477 (64%), Gaps = 20/477 (4%)

Human gene: 483 aa


gi|2895578 (AF041338) vacuolar proton pump subunit SFD alpha isoform [Bos taurus]
Length = 483
Score = 927 bits (2369), Expect = 0.0
Identities = 460/483 (95%), Positives = 465/483 (96%)

Query: 1   MTKMDIRGAVDAAVPTNIIAAKAAEVRANKVNWQSYLQGQMISAEDCEFIQRFEMKRSPE 60
           MTKMDIRGAVDAAVPTNIIAAKAAEVRANKVNWQSYLQGQMIS+EDCEFIQRFEMKRSPE
Sbjct: 1   MTKMDIRGAVDAAVPTNIIAAKAAEVRANKVNWQSYLQGQMISSEDCEFIQRFEMKRSPE 60

Query: 61  EKQEMLQTEGSQCAKTFINLMTHICKEQTVQYILTMVDDMLQENHQRVSIFFDYARCSKN 120
           EKQEMLQTEGSQ AKTFINLMTHI KEQTVQYILT+VDD LQENHQRVSIFFDYA+ SKN
Sbjct: 61  EKQEMLQTEGSQRAKTFINLMTHISKEQTVQYILTLVDDTLQENHQRVSIFFDYAKRSKN 120

Query: 121 TAWPYFLPILNRQDPFTVHMAARIIAKLAAWGKELMEGSDLNYYFNWIKTQLSSQKLRGS 180
           TAW YFLP+LNRQD FTVHM ARIIAKLAAWGKELMEGSDLNYYFNWIKTQLSSQKLRGS
Sbjct: 121 TAWSYFLPMLNRQDLFTVHMTARIIAKLAAWGKELMEGSDLNYYFNWIKTQLSSQKLRGS 180

Query: 181 GVAVETGTVSSSDSSQYVQCVAGCLQLMLRVNEYRFAWVEADGVNCIMGVLSNKCGFQLQ 240
           GV  ETGTVSSSDSSQYVQCVAGCLQLMLRVNEYRFAWVEADGVNCIMGVLSNKCGFQLQ
Sbjct: 181 GVTAETGTVSSSDSSQYVQCVAGCLQLMLRVNEYRFAWVEADGVNCIMGVLSNKCGFQLQ 240


Query: 241 YQMIFSIWLLAFSPQMCEHLRRYNIIPVLSDILQESVKEKVTRIILAAFRNFLEKSTERE 300
           YQMIFS+WLLAFSPQMCEHLRRYNIIPVLSDILQESVKEKVTRIILAAFRNFLEKS ERE
Sbjct: 241 YQMIFSVWLLAFSPQMCEHLRRYNIIPVLSDILQESVKEKVTRIILAAFRNFLEKSVERE 300

Query: 301 TRQEYALAMIQCKVLKQLENLEQQKYDDEDISEDIKFLLEKLGESVQDLSSFDEYSSELK 360
           TRQEYALAMIQCKVLKQLENLEQQKYDDEDISEDIKFLLEKLGESVQDLSSFDEYSSELK
Sbjct: 301 TRQEYALAMIQCKVLKQLENLEQQKYDDEDISEDIKFLLEKLGESVQDLSSFDEYSSELK 360

Query: 361 SGRLEWSPVHKSEKFWRENAVRLNEKNYELLKILTKLLEVSDDPQXLAVAAHDVGXYVRX 420
           SGRLEWSPVHKSEKFWREN  RLNEKNYELLKILTKLLEVSDDPQ LAVAAHDVG YVR
Sbjct: 361 SGRLEWSPVHKSEKFWRENPARLNEKNYELLKILTKLLEVSDDPQVLAVAAHDVGEYVRH 420

Query: 421 YPRGKRVIEQXGGKQLVMNHMHHEXQQVRYNALLAVQKLMVHNWEYLGKQLQSEQPQTXA 480
           YPRGKRVIEQ GGKQLVMNHMHHE QQVRYNALLAVQKLMVHNWEYLGKQLQSEQPQT A
Sbjct: 421 YPRGKRVIEQLGGKQLVMNHMHHEDQQVRYNALLAVQKLMVHNWEYLGKQLQSEQPQTAA 480

Query: 481 ARS 483
           ARS
Sbjct: 481 ARS 483


[AA134689-----------------------------------------------MSLNGFGEHTRSASHAGSWYNANQRDLDRQLTKWLDNAGPRIGTARALISPHAGYSYCGETAAYAF--------------------------]KQVVSSAVERVFILGPSHVVALNGCAITTCSKYRTPLGDLIVDHKINEELRATRHFDLMDRRDEES [THC196496-------------------------------------EHSIEMQLPFIAKVMGSKRYTIVPVLVGSLPGSRQQTYGNIFAHYMEDPRNLFVISSDFCHWGERF------------------------------------------------------------------SFSPYDRHSSIPIYEQITNMDKQGMSAIETLNPAAFNDYLKKTQNTICGRNPILIMLQAAEHFRIS-----------------------------------]NNHTHEFRFLHYTQSNKVRSSVDSSVSYASGVLFVHPN

U64857


Translation of   1 MSNR---VVCREASHAGSWYTASGPQLNAQLEGWLSQVQSTKRPARAIIAPHAGY 52
U64857           1 MSLNGFGEHTRSASHAGSWYNANQRDLDRQLTKWLDNAGPRIGTARALISPHAGY 55
                   ** .* ********.* * ** ** . ***.*.*****

Translation of  53 TYCGSCAAHAYKQVDPSITRRIFILGPSHHVPLSRCALSSVDIYRTPLYDLRIDQ 107
U64857          56 SYCGETAAYAFKQVVSSAVERVFILGPSHVVALNGCAITTCSKYRTPLGDLIVDH 110
                   .*** .** *.*** * *.******* * * **... ***** ** .*.

Translation of 108 KIYGELWKTGMFERMSLQTDEDEHSIEMHLPYTAKAMESHKDEFTIIPVLVGALS 162
U64857         111 KINEELRATRHFDLMDRRDEESEHSIEMQLPFIAKVMGSKR--YTIVPVLVGSLP 163
                   ** ** * *. * . .* ******.**. ** * *.. .**.*****.*

Translation of 163 ESKEQEFGKLFSKYLADPSNLFVVSSDFCHWGQRFRYSYYD-ESQGEIYRSIEHL 216
U64857         164 GSRQQTYGNIFAHYMEDPRNLFVISSDFCHWGERFSFSPYDRHSSIPIYEQITNM 218
                   *..* .* .*..*. ** ****.********.** .* ** * ** * ..

Translation of 217 DKMGMSIIEQLDPVSFSNYLKKYHNTICGRHPIGVLLNAITELQK-NGMNMSFSF 270
U64857         219 DKQGMSAIETLNPAAFNDYLKKTQNTICGRNPILIMLQAAEHFRISNNHTHEFRF 273
                   ** *** ** * * .* **** .******.** ..*.* . *. . * *

Translation of 271 LNYAQSSQCRNWQDSSVSYAAGALTVH 297
U64857         274 LHYTQSNKVRSSVDSSVSYASGVLFVHPN 302
                   *.*.** . * *******.* * **


[THC132858-------------------]MKQFKRGIERDGTGFVVLMAEEAEDMWHIYNLIRIGDIIKASTIRKVVSETSTGTTSSQRVHTM

LTVSVESIDFDPGAQELHLKGRNIEENDIVKLGAYHTIDLEPNRKFTLQKTEWDSIDLERLNLA

[THC85433------------------------------------------LDPAQAADVAAVVLHEGLANVCLITPAMTLTRAKIDMTIPRKRKGFTSQHEKGLEKFYEAVSTA--------------------------------------------] {AA938998*****************FMRHVNLQVVKCVIVASRGFVKDAFMQHLIAHADANGKKFTTEQRAKFMLTHSSSGFKHALKEV*******} [THC200182----------------------------------------------------LETPQVALRLADTKAQGEVKALNQFLELMSTEPDRAFYGFNHVNRANQELAIETLLVADSLFRA-----------------------------------------------]QDIETRRKYVRLVESVREQNGKVHIFSSMHVSGEQLAQLTGCAAILRFPMPDLDDEPMDEN

Z36238


Translation of   1 MKLVRKNIEKDNAGQVTLVPEEPEDMWHTYNLVQVGDSLRASTIRKVQTESSTGS 55
Z36238           1 MKQFKRGIERDGTGFVVLMAEEAEDMWHIYNLIRIGDIIKASTIRKVVSETSTGT 55
                   ** ...**.*..* * *. ** ***** ***...** ..******* .*.***.

Translation of  56 VGSNRVRTTLTLCVEAIDFDSQACQLRVKGTNIQENEYVKMGAYHTIELEPNRQF 110
Z36238          56 TSSQRVHTMLTVSVESIDFDPGAQELHLKGRNIEENDIVKLGAYHTIDLEPNRKF 110
                   *.**.* **..**.**** * .*..** **.**. **.******.*****.*

Translation of 111 TLAKKQWDSVVLERIEQACDPAWSADVAAVVMQEGLAHICLVTPSMTLTRAKVEV 165
Z36238         111 TLQKTEWDSIDLERLNLALDPAQAADVAAVVLHEGLANVCLITPAMTLTRAKIDM 165
                   ** * .***. ***. * *** .*******..****..**.**.*******...

Translation of 166 NIPRKRKGNCSQHDRALERFYEQVVQAIQRHIHFDVVKCILVASPGFVREQFCDY 220
Z36238         166 TIPRKRKGFTSQHEKGLEKFYEAVSTAFMRHVNLQVVKCVIVASRGFVKDAFMQH 220
                   .******* .***.. **.*** * * **.. ****..*** ***.. *

Translation of 221 MFQQAVKTDNKLLLGNRSKFLQVHASSGHKYSLKEALCDPTVLARLSDTKAAGEV 275
Z36238         221 LIAHADANGKKFTTEQRAKFMLTHSSSGFKHALKEVLETPQVALRLADTKAQGEV 275
                   . .* . * .*.**. *.*** * .*** * * * **.**** ***


sp|P48612|PELO_DROME PELOTA PROTEIN >gi|973224 (U27197) pelota ...   520  e-147
sp|P50444|YNU6_CAEEL HYPOTHETICAL 42.9 KD PROTEIN R74.6 IN CHRO...   446  e-125
gi|3941543 (AF069497) pelota [Arabidopsis thaliana]                  385  e-106
pir||S45456 DOM34 protein - yeast (Saccharomyces cerevisiae) >g...   236  2e-61
sp|P33309|DO34_YEAST DOM34 PROTEIN >gi|295608 (L11277) DOM34 [S...   212  2e-54
gnl|PID|e304505 (Z86109) unknown [Saccharomyces pastorianus]         199  3e-50
gi|2622770 (AE000923) cell division protein [Methanobacterium t...   155  4e-37
gnl|PID|d1031529 (AP000006) 356aa long hypothetical protein [Py...   146  3e-34
sp|Q57638|Y174_METJA HYPOTHETICAL PROTEIN MJ0174 >gi|2127805|pi...   145  6e-34
gi|2649765 (AE001046) cell division protein pelota (pelA) [Arch...   116  3e-25

sp|P50444|YNU6_CAEEL HYPOTHETICAL 42.9 KD PROTEIN R74.6 IN CHROMOSOME III >gi|3879163|gnl|PID|e1348805 (Z36238) Similar to the DOM34 protein of saccharomyces cerevisiae (Swiss Prot accession number P33309) [Caenorhabditis elegans]
Length = 381
Score = 446 bits (1136), Expect = e-125
Identities = 215/371 (57%), Positives = 282/371 (75%)

BLASTP (Jan. 10, 1999)


C. elegans from WormPept: 18,452 entries, HGI searches

* Families: 3,934
* Known Gene: 7,954
* New Contig: 3,456
* Undetermined: 2,070

<100 aa: 1,038

* 150 full-length genes so far; more are expected following GAP closure and 5' RACE.
* 110 CGI genes were included in the human reference genes.

83% between Human & C. elegans
11% C. elegans specific
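The two percentages can be reproduced from the category counts on this slide if one assumes that the 83% is the share of entries falling into the Families, Known Gene and New Contig categories, and that the 11% is the Undetermined share, both taken over all 18,452 entries. That reading is an inference from the numbers, not something the slide states. A short Python check:

```python
# Category counts from the slide (C. elegans WormPept vs. HGI).
COUNTS = {
    "Families": 3934,
    "Known Gene": 7954,
    "New Contig": 3456,
    "Undetermined": 2070,
    "<100 aa": 1038,
}


def percent(part, total):
    """Return part/total as a percentage rounded to the nearest integer."""
    return round(100 * part / total)


total = sum(COUNTS.values())
# Assumption: "matched" means any category with a human counterpart.
matched = COUNTS["Families"] + COUNTS["Known Gene"] + COUNTS["New Contig"]

if __name__ == "__main__":
    print(total)                                   # all WormPept entries
    print(percent(matched, total))                 # share with a human match
    print(percent(COUNTS["Undetermined"], total))  # share without a match
```

The five categories sum to 18,452, so the short peptides (<100 aa) are counted in the total but in neither percentage under this reading.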


C. elegans from WormPept: 18,452 entries, MGI searches

* Novel Genes: 11,407
* Known Genes: 4,151
* Undetermined: 1,856
Short peptide: 1,038

84% between Mouse & C. elegans
10% C. elegans specific


[Figure: C. elegans protein length (y-axis, 0 to 1200) plotted against human CGI protein length (x-axis, 0 to 1000)]

Successful GAP-closure results were obtained on 11/12 novel genes


Drosophila from FlyDB: 13,146 entries, HGI release 4.0 searches

Expect E = 10

* Known Gene: 8,053
* Families: 3,560
* New Contig: 1,315
* Undetermined: 218

<100 aa: 564

94% between Human & Drosophila
2% Drosophila specific

March 24, 2000 - Science

Expect E = 10^-10

60% between Human & Drosophila


Table 1. The exon-intron junction sequences of the human crooked neck gene

Exon     5' sequence                      3' sequence
1 (EX3)  -                                GTTGACAAAAACAgtaggtcacaaaatgatt
2        gttcttactcccttgcagGGCGCCACGGCTC  CAAAGTGGCCAAGgtaggcgatcgcgagggg
3        tgtcttctttcttaaaagGTGAAAAACAAAG  AAGGAAAAGGAAGgtcagtcagtgtggtatc
4        ctttttgccttcttccagACTTTTGAAGATA  AGGAGATTCAAAGgtaaaattactgagagtg
5        caaatcttgcttcttaagGGCTCGATCCATA  TTAATCAGTTCTGgtaagtttctgatctaac
6        cgggtgattttgttacagGTACAAGTACACG  ATTTATGAGCGATatatcctttggacgagat
7        tccttaattcccttgcacTTGTCCTCGTGCA  AAATCAGAAAGAGgtaagtatacctaacttc
8        catcctgtctgcatttagTTTGAAAGGGTAC  AGAAGAAGTGAAGgtgagcactggtgtggat
9        aactggttgtctttctagGCGAATCCACACA  ATTGGAGGCAAAGgtgaaaaaacagaattat
10       tttcttatcctaatacagGATCCTGAGAGGA  TCCTCACAAAAAGgtatgtttgctctaaagt
11       tgtgcttttgttttctagTTCACATTTGCCA  CAGAAGAGCATTGgtaagtaaagaaaggatc
12       ggttattttattttccagGGAACTTCCATAG  AGACATGCCAGAGgtgagcatctcaagtcaa
13       tttacttttcttccttagGTGCTTTGGAAAT  GCAGCATGTCAAGgtatccttgctttgtaga
14       taataactttttttaaagGTATGGATCAGCT  GACTGATGATGGGgtaagaactctgccctgg
15       cattttttatttctgtagTCTGATGCAGGCT  -

Note. Exon sequence is shown in uppercase letters, and intron sequence is shown in lowercase letters. Exon 2 was used as the translation initiation exon in most cells examined.
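Because Table 1 marks introns in lowercase, the dinucleotides at each intron boundary can be read off mechanically and compared with the usual GT...AG pattern. The Python sketch below does this for a subset of rows copied from the table (exons 5 to 8); the function names and the data layout are hypothetical.

```python
# (exon number, 5' junction, 3' junction) for exons 5-8, copied from Table 1.
JUNCTIONS = [
    (5, "caaatcttgcttcttaagGGCTCGATCCATA", "TTAATCAGTTCTGgtaagtttctgatctaac"),
    (6, "cgggtgattttgttacagGTACAAGTACACG", "ATTTATGAGCGATatatcctttggacgagat"),
    (7, "tccttaattcccttgcacTTGTCCTCGTGCA", "AAATCAGAAAGAGgtaagtatacctaacttc"),
    (8, "catcctgtctgcatttagTTTGAAAGGGTAC", "AGAAGAAGTGAAGgtgagcactggtgtggat"),
]


def intron_part(junction):
    """Return the lowercase (intron) portion of a junction sequence."""
    return "".join(c for c in junction if c.islower())


def intron_ends(rows):
    """Yield (upstream exon, downstream exon, donor, acceptor) for each intron.

    The donor dinucleotide starts the intron after the upstream exon;
    the acceptor dinucleotide ends the intron before the downstream exon.
    """
    for (up, _, three_prime), (down, five_prime, _) in zip(rows, rows[1:]):
        donor = intron_part(three_prime)[:2]
        acceptor = intron_part(five_prime)[-2:]
        yield (up, down, donor, acceptor)


def non_canonical(rows):
    """Return introns whose ends are not gt...ag."""
    return [x for x in intron_ends(rows) if (x[2], x[3]) != ("gt", "ag")]


if __name__ == "__main__":
    for up, down, donor, acceptor in intron_ends(JUNCTIONS):
        print(f"intron {up}-{down}: {donor}...{acceptor}")
```

Run on these rows, the check singles out the intron between exons 6 and 7, whose ends read at...ac rather than gt...ag. AT-AC introns are a rare but known class, so this is worth verifying against the genomic sequence rather than assuming a transcription error.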


(A)30

(A)15

ESTs: AI001718, AA157950, AI800228, AI336977, AI814570, AI968687, AI924865, AW004018

ESTs: AA458635, AI049780, AW014044, AA665047, AI003171, AW512558

ESTs: AI018501, AW952970, AA825980, AA195126

(A)10


Type I isoform

Type II isoform

exon 2

exon 2

5' sequence of exon 2: gttcttactcccttgcag (intron) GGCGCCACGGCTC (exon 2)

SNP


SNP (single nucleotide polymorphism): a single-base DNA variation found in more than 1% of the population.


Why are SNPs Important?

• Most common form of genetic variation
  - Genetic linkage studies
  - Genome-wide association studies
• Indicate predisposition to
  - Disease predisposition and onset
  - Drug tolerance
  - Drug efficacy
• Genome SNP scans will uncover gene function and define new drug targets
• SNPs will enable physicians to personalize therapy


Human Variations

0.3%

2%


Responder

Non-responder

Toxic responder (adverse drug reaction)

SNP


                          Celera-PFP      TSC           Kwok
#SNP                      2,104,820       585,811       438,032
RefHuman cSNP (missense)  2,525 (0.12%)   613 (0.14%)   995 (0.17%)
non-conservative          1,187           251           398

Only a few cSNPs?

Low-frequency functional variants are needed in human disease gene discovery


EST-based SNP discovery:

Multiple EST entries (>10) and Phred scores were required for SNP discovery.
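The two requirements on this slide, enough ESTs covering a position and acceptable Phred scores, can be expressed as a simple filter over one alignment column. The Python sketch below is only an illustration: the slide gives the depth requirement (>10) but not the Phred cutoff or the number of supporting reads, so `min_phred=20` and `min_alt_reads=2` are assumptions, as are the function names. A Phred score Q corresponds to an error probability of 10^(-Q/10).

```python
from collections import Counter


def phred_to_error(q):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def call_snp(column, reference, min_depth=11, min_phred=20, min_alt_reads=2):
    """Return the variant base at one aligned position, or None.

    column is a list of (base, phred) pairs, one per EST covering the position.
    min_depth=11 encodes the slide's ">10" EST entries; min_phred and
    min_alt_reads are illustrative assumptions, not values from the source.
    """
    if len(column) < min_depth:
        return None  # too few ESTs cover this position
    good = [base for base, q in column if q >= min_phred]
    alt = Counter(base for base in good if base != reference)
    if not alt:
        return None
    base, support = alt.most_common(1)[0]
    return base if support >= min_alt_reads else None


if __name__ == "__main__":
    column = [("A", 30)] * 9 + [("G", 30)] * 3
    print(call_snp(column, reference="A"))
    print(phred_to_error(20))
```

Setting `min_alt_reads=1` would also accept variants seen in a single EST, at the cost of admitting more sequencing errors.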


Human reference protein sequences: 9848.

dbEST dataset: 2,208,221.

NP number  Gene name  a.a.  cSNP

NP_002664  PLEXIN B1 [HOMO SAPIENS].  2135  3
NP_006856  LEUKOCYTE IMMUNOGLOBULIN-LIKE RECEPTOR; SUBFAMILY A (WITHOUT TM  439  3
NP_006831  LEUKOCYTE IMMUNOGLOBULIN-LIKE RECEPTOR; SUBFAMILY B (WITH TM AND  590  3
NP_005755  EPITHELIAL PROTEIN UP-REGULATED IN CARCINOMA; MEMBRANE ASSOCIATED  114  3
NP_006411  BREFELDIN A-INHIBITED GUANINE NUCLEOTIDE-EXCHANGE PROTEIN 2 [HOMO  1785  3
NP_004273  BCL2-ASSOCIATED ATHANOGENE 2; BAG-FAMILY MOLECULAR CHAPERONE  211  3
NP_003772  UDP-GAL:BETAGLCNAC BETA 1;3-GALACTOSYLTRANSFERASE; POLYPEPTIDE 3  331  3
NP_006854  LEUKOCYTE IMMUNOGLOBULIN-LIKE RECEPTOR; SUBFAMILY A (WITH TM  489  3
NP_003881  IGG FC BINDING PROTEIN [HOMO SAPIENS].  5404  3
NP_031385  MLN51 PROTEIN [HOMO SAPIENS].  534  4
NP_031400  SINE OCULIS HOMEOBOX (DROSOPHILA) HOMOLOG 6 [HOMO SAPIENS].  246  4
NP_004087  EUKARYOTIC TRANSLATION INITIATION FACTOR 4E BINDING PROTEIN 2 [HOMO  120  4
NP_005057  SPLICING FACTOR PROLINE/GLUTAMINE RICH (POLYPYRIMIDINE  707  4
NP_004266  TRF-PROXIMAL PROTEIN [HOMO SAPIENS].  209  4
NP_004900  TAXOL RESISTANCE ASSOCIATED GENE 3 [HOMO SAPIENS].  110  4
NP_004731  TGF-BETA-1-INDUCED ANTIAPOPTOTIC FACTOR 1 [HOMO SAPIENS].  115  4
NP_004810  SYMPLEKIN [HOMO SAPIENS].  1142  4
NP_004800  STOMATIN-LIKE PROTEIN 1 [HOMO SAPIENS].  394  4

FVD project (version 1):


Human reference protein sequences: 9848.

total: 5,046,910 residues.

dbEST dataset: 2,208,221.

Predicted non-synonymous cSNP: 55,433.

average cSNP per protein: 5.62.

average length per protein: 514.48 a.a.

Variant EST = 1: 40,215.
Variant ESTs > 1: 15,218.

dbSNP match: 1,074
268 (synonymous)
838 (non-synonymous)


NP_006323, IFI30


Potential error residues in reference proteins:

4,432 (0:>1)


Cancer Fetal Adult

* 0 0 32,357

0 * 0 11,441

0 0 * 18,859

PROTEIN PHOSPHATASE 1; CATALYTIC SUBUNIT