Gene Structure and Identification III
BIO520 Bioinformatics Jim Lund
Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8
For real prediction we need…
• Solve the protein folding problem
• Solve the molecular docking/binding problem
• Develop realistic simulations of molecules in cells
• Simulate multicellular systems
Promoter/Enhancer analysis
• Regulatory Sequences
  – Known Consensus Sequences
  – Consensus Sequence Generation
    • Using functional (experimental) data
• HBB as an example
Gene Regulatory Sequences
• Functional sites
  – Consensus
  – Experimental tests
• Inferred sites
  – Transcriptome analysis
Sequence Logos
• http://weblogo.berkeley.edu/
Position Weight Matrix:
PO    A    C    G    T   Consensus
01    6    4    4    6   N
02    4    9    3    4   N
03   12    4    3    1   A
04    6    1   11    2   R
05    3    2   11    4   G
06    3    3    4   10   N
07    3   10    3    4   N
08   11    2    4    3   A
09    4    9    3    4   N
10    3    6    3    8   N
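One common way to use a count matrix like this is to convert it to log-odds scores and slide it along a sequence. The sketch below does that with the counts from the matrix above. The pseudocount of 1, the uniform 25% background, and the example sequence are assumptions made for illustration, and this is not necessarily how any tool named in these slides scores sites.

```python
import math

BASES = "ACGT"

# Counts copied from the matrix above: one row per position, columns A, C, G, T.
COUNTS = [
    (6, 4, 4, 6),
    (4, 9, 3, 4),
    (12, 4, 3, 1),
    (6, 1, 11, 2),
    (3, 2, 11, 4),
    (3, 3, 4, 10),
    (3, 10, 3, 4),
    (11, 2, 4, 3),
    (4, 9, 3, 4),
    (3, 6, 3, 8),
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn raw counts into log2(observed frequency / background) scores.

    The pseudocount and the uniform background are assumptions of this sketch.
    """
    matrix = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        matrix.append(
            {base: math.log2(((count + pseudocount) / total) / background)
             for base, count in zip(BASES, row)}
        )
    return matrix


def score_window(matrix, window):
    """Sum the per-position scores for a window as wide as the matrix."""
    return sum(column[base] for column, base in zip(matrix, window))


def best_hit(matrix, sequence):
    """Return (score, start) of the best-scoring window; sequence must be A/C/G/T only."""
    width = len(matrix)
    hits = [
        (score_window(matrix, sequence[start:start + width]), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(hits)


if __name__ == "__main__":
    matrix = log_odds_matrix(COUNTS)
    # Made-up test sequence: the highest-count base at each position, flanked by Ts.
    sequence = "TTTT" + "ACAGGTCACT" + "TTTT"
    score, start = best_hit(matrix, sequence)
    print(f"best window starts at {start} with score {score:.2f}")
```

A real scan would also score the reverse strand and apply a score threshold, which this sketch leaves out.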
EUKARYOTES
• More complex signals
  – Basal/core promoter
  – Promoter
  – Enhancers
• More genes
• More dispersed signals
  – Larger promoters, distant enhancers, regulatory sites in introns
• Combinatoric regulation common
Basal Promoter Analysis
Myers and Maniatis, Genes VI, 831
• TATA-box           -25 to -30     TBP
• CCAAT-box          -212 to -57    CTF/NF1
• GC-box             -164 to +1     SP1
• K C W K Y Y Y Y    +1 to +5       cap signal
[Figure: basal promoter diagram marking the TATA, CAAT, and GC elements relative to +1]
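A degenerate consensus such as the cap signal K C W K Y Y Y Y can be searched for by expanding each IUPAC symbol into a character class. The sketch below is one simple way to do this with a regular expression. The function names and the test sequence are made up for illustration, and exact pattern matching like this ignores position relative to +1.

```python
import re

# Standard IUPAC nucleotide codes used in consensus sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Expand an IUPAC consensus into a regex; spaces in the pattern are ignored."""
    symbols = pattern.replace(" ", "").upper()
    return "".join(IUPAC[symbol] for symbol in symbols)


def find_sites(pattern, sequence):
    """Return (start, matched_text) for every match, including overlapping ones."""
    # A lookahead wrapper lets the regex engine report overlapping matches.
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up sequence containing one match to the cap signal consensus.
    print(find_sites("K C W K Y Y Y Y", "AAGCATCTCCAA"))
```

Start positions are 0-based offsets into the input string, so they need shifting before comparison with the coordinates quoted on the slide.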
Finding Pol II sites (transcription start site)
• Promoter Scan
• TSSG/TSSW (TSSP for plants)
• Core-Promoter
• FPROM
• BCM Search Launcher
Enhancer Elements
• Octamer    OCT1, OCT2
• κB         NF-κB
• ATF        ATF
• AP1…       AP1
• ……..
Consensus Sequence Databases
• TRANSFAC
• TFD (transcription factor database)
Consensus Sequence Databases
• Finding sites in promoter regions:
  – TESS
    • http://www.cbil.upenn.edu/cgi-bin/tess/tess
  – TFSEARCH
    • http://www.cbrc.jp/research/db/TFSEARCH.html
  – BCM Search Launcher
    • http://searchlauncher.bcm.tmc.edu/seq-search/gene-search.html
HBB promoter (TESS)
Sequence-based algorithms for identifying enhancer binding sites
• Genes from:
  – Microarray transcription analysis
  – ChIP-chip experiments
  – Orthologous sequences
  – Experimental/other
• Programs for finding consensus sites:
  – MEME analysis of clusters
  – AlignACE
  – BioProspector/CompareProspector
Practical Gene Finding
• Use ALL tools
  – Predictive: Stitch together a consensus
    • ORF finders
    • Find patterns (and WWW pattern searches)
    • Neural network and HMM predictors: GRAIL, Genscan…
  – Comparative
    • BLASTN, BLASTX
    • Compare genomes (human:mouse)
  – cDNA, protein, genetic evidence
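As an illustration of the simplest tool in this list, the sketch below finds ATG-to-stop open reading frames in all six frames. It is a toy ORF finder written for these notes, not the algorithm of any named program. The `min_codons` default and the test sequence are arbitrary assumptions, and introns are ignored, so on eukaryotic genomic DNA it would only pick up single exons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_on_strand(seq, min_codons):
    """Return half-open (start, end) spans of ATG...stop ORFs in the three forward frames.

    The span includes the stop codon; the codon count excludes it.
    """
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None
    return found


def find_orfs(seq, min_codons=30):
    """Return (strand, start, end) for ORFs on both strands, in forward-strand coordinates."""
    seq = seq.upper()
    length = len(seq)
    orfs = [("+", start, end) for start, end in orfs_on_strand(seq, min_codons)]
    for start, end in orfs_on_strand(reverse_complement(seq), min_codons):
        # Map reverse-strand positions back onto the forward strand.
        orfs.append(("-", length - end, length - start))
    return orfs


if __name__ == "__main__":
    # Made-up sequence: ATG, three GCT codons, then a TAA stop.
    print(find_orfs("CC" + "ATG" + "GCT" * 3 + "TAA" + "GG", min_codons=3))
```

In practice the output of a scan like this is only one line of evidence, to be combined with the comparative and cDNA evidence listed above.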
ORFs-aldolase gene
Genomic DNA-cDNA alignment
[Figure: workflow diagram; labels: DNA sequencing, cDNA, Align (GAP), Infer Promoter/Enhancer, Test in cis]
Comparative Genomics
• Conservation of coding regions
• Identification of transcription signals
  – “words” in common
• Example: yeast comparisons
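The idea of "words" in common can be sketched as a search for k-mers shared by the upstream regions of related genes. The code below is a minimal version under simplifying assumptions: exact matches only, one strand, and a fixed word length. The sequences in the usage example are invented, not real yeast promoters.

```python
def kmers(sequence, k):
    """Return the set of all length-k words in a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_words(sequences, k=6):
    """Return the sorted k-mers that occur in every sequence (exact matches only)."""
    common = kmers(sequences[0], k)
    for sequence in sequences[1:]:
        common &= kmers(sequence, k)
    return sorted(common)


if __name__ == "__main__":
    # Invented upstream fragments that share one 7-base word.
    upstream = ["AAATGACTCATTT", "CCTGACTCAGG", "GTGACTCAC"]
    print(shared_words(upstream, k=7))
```

Real comparisons usually work from an alignment and tolerate mismatches, so exact shared words are best read as a first-pass filter.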
Ensembl prediction pipeline
RepeatMasker
Genscan
BLAST Genscan peptides vs. protein, UniGene, EST, vertebrate mRNA
Pmatch all human proteins and cDNAs
MiniGenewise
MiniEst2genome
Genes
DNA
Genscan features
• Model both strands at once
• Each state may output a string of symbols (according to some probability distribution)
• Explicit intron/exon length modeling
• Advanced splice site modeling
• Complete intron/exon annotation for sequence
• Able to predict multiple genes and partial/whole genes
• Parameters learned from annotated genes
• Separate parameter training for different C+G content groups (< 43%, 43-51%, 51-57%, > 57% C+G content)
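The last feature above implies a preprocessing step: measure the C+G content of the input and pick the matching parameter set. The sketch below shows that classification using the four ranges from the slide. How values exactly on a boundary (43, 51, or 57%) are assigned is an assumption here, since the slide does not say, and the function names are illustrative.

```python
def gc_percent(sequence):
    """Return C+G content as a percentage of the unambiguous A/C/G/T bases."""
    sequence = sequence.upper()
    counted = sum(sequence.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (sequence.count("C") + sequence.count("G")) / counted


def gc_group(sequence):
    """Assign a sequence to one of the four C+G content groups listed above.

    Assumption: each boundary value falls into the higher group.
    """
    gc = gc_percent(sequence)
    if gc < 43:
        return "<43%"
    if gc < 51:
        return "43-51%"
    if gc < 57:
        return "51-57%"
    return ">57%"


if __name__ == "__main__":
    for seq in ("ATATATGC", "ACGT", "GCGCGCAT"):
        print(seq, f"{gc_percent(seq):.1f}%", gc_group(seq))
```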
GENSCAN predictions
Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------
 7.00 Prom +  63096  63135   40                               -2.75
 7.01 Init +  63183  63274   92  2  2  103   77   142 0.997  14.61
 7.02 Intr +  63403  63625  223  1  1   83   96   181 0.999  15.61
 7.03 Term +  64524  64652  129  2  0  101   50    83 0.373   3.00
 7.04 PlyA +  64758  64763    6                                1.05

 8.00 Prom +  70508  70547   40                               -4.75
 8.01 Init +  70595  70686   92  1  2  103   77   133 0.990  13.71
 8.02 Intr +  70817  71039  223  2  1  100   96   217 0.999  20.91
 8.03 Term +  71890  72018  129  0  0  116   43   119 0.827   7.40
 8.04 PlyA +  72126  72131    6                                1.05

 9.00 Prom +  74399  74438   40                               -8.25
 9.01 Sngl +  76602  76847  246  2  0   71   50   218 0.886  11.13
 9.02 PlyA +  76928  76933    6                                1.05
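Output in this layout is easy to post-process because every row is whitespace-separated. The sketch below parses the gene 8 rows shown above into dictionaries and keeps exons whose probability (the P column) meets a cutoff. The parser is written for this listing only, it assumes exon rows have 13 fields and promoter/poly-A rows have 7, and the 0.9 cutoff is an arbitrary illustration rather than a recommended value.

```python
# Gene 8 rows copied from the GENSCAN listing above.
SAMPLE = """\
8.00 Prom + 70508 70547 40 -4.75
8.01 Init + 70595 70686 92 1 2 103 77 133 0.990 13.71
8.02 Intr + 70817 71039 223 2 1 100 96 217 0.999 20.91
8.03 Term + 71890 72018 129 0 0 116 43 119 0.827 7.40
8.04 PlyA + 72126 72131 6 1.05
"""

EXON_TYPES = {"Init", "Intr", "Term", "Sngl"}


def parse_row(line):
    """Parse one whitespace-separated row into a dictionary."""
    fields = line.split()
    gene, feature = fields[0].split(".")
    record = {
        "gene": int(gene),
        "feature": int(feature),
        "type": fields[1],
        "strand": fields[2],
        "begin": int(fields[3]),
        "end": int(fields[4]),
        "length": int(fields[5]),
        "score": float(fields[-1]),  # Tscr is always the last column
    }
    if len(fields) == 13:
        # Only exon rows carry the probability column (assumed layout).
        record["probability"] = float(fields[11])
    return record


def confident_exons(text, min_probability=0.9):
    """Return exon records whose probability is at least min_probability."""
    records = [parse_row(line) for line in text.splitlines() if line.strip()]
    return [
        r for r in records
        if r["type"] in EXON_TYPES and r.get("probability", 0.0) >= min_probability
    ]


if __name__ == "__main__":
    for exon in confident_exons(SAMPLE):
        print(exon["type"], exon["begin"], exon["end"], exon["probability"])
```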
GENSCAN predicted exons
Annotated predicted exons
HBB gene
• HBB exons 1-3
  – 70545..70686
  – 70817..71039
  – 71890..72150
• GENSCAN
  – 70595..70686
  – 70817..71039
  – 71890..72018
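The two coordinate lists above can be compared exon by exon to see where the prediction and the annotation agree. The sketch below pairs the exons in order and reports boundary offsets and overlap, with the coordinates copied from the slide. Pairing by position in the list is an assumption that works here because both lists have three exons.

```python
# Coordinates copied from the lists above (1-based, inclusive).
ANNOTATED = [(70545, 70686), (70817, 71039), (71890, 72150)]
GENSCAN = [(70595, 70686), (70817, 71039), (71890, 72018)]


def compare_exons(annotated, predicted):
    """Pair exons in order and report boundary differences and overlap in bases."""
    report = []
    pairs = zip(annotated, predicted)
    for number, ((a_start, a_end), (p_start, p_end)) in enumerate(pairs, start=1):
        overlap = max(0, min(a_end, p_end) - max(a_start, p_start) + 1)
        report.append({
            "exon": number,
            "start_offset": p_start - a_start,  # positive: prediction starts later
            "end_offset": p_end - a_end,        # negative: prediction ends earlier
            "overlap": overlap,
        })
    return report


if __name__ == "__main__":
    for row in compare_exons(ANNOTATED, GENSCAN):
        print(row)
```

With these numbers, the internal boundaries match exactly and the only differences are at the 5' end of exon 1 (50 bases) and the 3' end of exon 3 (132 bases). That pattern would be consistent with the prediction covering only the coding portion of the terminal exons, though the slide itself does not state the reason.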