Transformational Grammars and PROSITE Patterns
Roland Miezianko
CIS 595 - Bioinformatics
Prof. Vucetic
Agenda
• Transformational Grammars
  – Definition
  – The Chomsky Hierarchy
• Finite State Automata
  – FMR-1 Triplet Repeat Region
  – Regular Grammar Example
• PROSITE
  – Patterns in Regular Grammar Form
Assumptions
• Biological sequences have so far been treated as one-dimensional strings of independent and uncorrelated symbols.
• To understand secondary structures, we need to address the interactions among base pairs.
Secondary Structures
• The 3-D folding of proteins and nucleic acids involves extensive physical interactions between residues that are not adjacent in the primary sequence. [1]
• We require a model of secondary structure that reflects the interactions among base pairs.
Modeling Strings
• General theories for modeling strings of symbols have been developed by computational linguists
  – Chomsky in 1956, 1959
  – Interested in how a brain or computer program could algorithmically determine whether a sentence was grammatical or not
Transformational Grammars
• Transformational grammars consist of:
  – Symbols
    • Abstract nonterminal symbols
    • Terminal symbols
  – Rewriting rules (productions)
    • A --> B
Transformational Grammars, Example
Example grammar
• Two-letter terminal alphabet: {a, b}
• Single nonterminal letter: S
• Three productions:
  S -> aS
  S -> bS
  S -> e (e = special blank terminal symbol)
Example derivation in our simple grammar:
S -> aS -> abS -> abbS -> abb
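The derivation above can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides: the names `PRODUCTIONS` and `derive` are made up for illustration, and the blank symbol e is represented by the empty string.

```python
# Productions of the example grammar: S -> aS | bS | e.
# Assumption: the blank terminal e is modelled as the empty string "".
PRODUCTIONS = {"S": ["aS", "bS", ""]}


def derive(target):
    """Return the derivation steps for `target`, or None if it cannot be derived."""
    steps = ["S"]
    current = "S"
    for ch in target:
        rhs = ch + "S"
        if rhs not in PRODUCTIONS["S"]:
            return None  # no production emits this symbol
        # Rewrite the trailing nonterminal S with S -> aS or S -> bS.
        current = current[:-1] + rhs
        steps.append(current)
    # Finish with S -> e, which removes the nonterminal.
    steps.append(current[:-1])
    return steps


print(" -> ".join(derive("abb")))  # S -> aS -> abS -> abbS -> abb
```

Because the nonterminal always sits at the right end of the string, each step only rewrites the last character, which is the property that makes this grammar regular.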
Chomsky Hierarchy
• Four types of restrictions on a grammar's productions result in four classes of grammars:
  – Regular Grammars
  – Context-Free Grammars
  – Context-Sensitive Grammars
  – Unrestricted Grammars
Chomsky Hierarchy
[Figure: nested classes of grammars, regular ⊂ context-free ⊂ context-sensitive ⊂ unrestricted]
Automata
• Each class of grammar has a corresponding abstract computational device called an automaton

Grammar              Parsing Automaton
Regular              Finite State Automaton
Context-Free         Push-Down Automaton
Context-Sensitive    Linear Bounded Automaton
Unrestricted         Turing Machine
FMR-1 Triplet Repeat Region
• The FMR-1 gene sequence contains a CGG triplet that is repeated a number of times
• The number of triplets is highly variable between individuals
• Increased copy number is associated with a genetic disease
FMR-1 Triplet Repeat Region
• The FSA matches any string from the “language” that contains the strings:
GCG CTG
GCG CGG CTG
GCG CGG CGG CTG
GCG CGG CGG CGG CGG … CTG
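One way to picture such an acceptor is as a table of state transitions, one nucleotide at a time. The sketch below is an illustration written to match the strings listed above, not the automaton drawn on the original slide; the state names and the `accepts` function are hypothetical.

```python
# Transition table for an FSA accepting GCG (CGG)* CTG.
# State names are illustrative assumptions, not taken from the slides.
TRANSITIONS = {
    ("start", "G"): "g1",
    ("g1", "C"): "g2",
    ("g2", "G"): "loop",    # leading GCG has been read
    ("loop", "C"): "c1",    # start of either CGG or CTG
    ("c1", "G"): "r2",
    ("r2", "G"): "loop",    # one CGG repeat read, ready for another
    ("c1", "T"): "t2",
    ("t2", "G"): "accept",  # trailing CTG has been read
}


def accepts(seq):
    """Return True if `seq` (spaces ignored) is in the language GCG (CGG)* CTG."""
    state = "start"
    for base in seq.upper().replace(" ", ""):
        state = TRANSITIONS.get((state, base))
        if state is None:
            return False  # no transition defined: reject
    return state == "accept"


print(accepts("GCG CTG"))          # True
print(accepts("GCG CGG CGG CTG"))  # True
print(accepts("GCG CGG"))          # False
```

The automaton needs no counter or stack: the loop back to the `loop` state is enough to allow any number of CGG copies, which is why a finite state machine suffices here.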
FMR-1 Triplet Repeat Region
The regular grammar for our finite state automaton matches any number of copies of CGG
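The grammar itself did not survive in this transcript, so the following is only a hand-written sketch of what a right-linear grammar for the language GCG (CGG)* CTG could look like. The nonterminal names W1 to W6 and the function `in_language` are assumptions made for illustration. Each production has the form A -> xB (one terminal followed by one nonterminal) or A -> x.

```python
# A right-linear grammar for GCG (CGG)* CTG, written by hand for illustration.
# Each entry maps a nonterminal to (terminal, next nonterminal) pairs;
# None as the next nonterminal means the derivation ends (A -> x).
GRAMMAR = {
    "S":  [("g", "W1")],
    "W1": [("c", "W2")],
    "W2": [("g", "W3")],
    "W3": [("c", "W4")],
    "W4": [("g", "W5"), ("t", "W6")],  # continue a CGG repeat, or start CTG
    "W5": [("g", "W3")],               # CGG complete, back to the branch point
    "W6": [("g", None)],
}


def in_language(seq, grammar=GRAMMAR, start="S"):
    """Return True if `seq` (spaces ignored) can be derived from `start`."""
    current = {start}  # nonterminals the derivation could currently be at
    for base in seq.lower().replace(" ", ""):
        current = {
            nxt
            for nt in current if nt is not None
            for term, nxt in grammar[nt] if term == base
        }
        if not current:
            return False
    return None in current  # the derivation must have ended exactly here


print(in_language("gcg cgg cgg ctg"))  # True
print(in_language("gcg cgg"))          # False
```

Each nonterminal plays the role of one automaton state, which is the sense in which a regular grammar and a finite state automaton describe the same language.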
PROSITE Patterns
• The PROSITE database is an example of a biological application of regular grammars
  – Unlike methods which assign scores to alignments, PROSITE patterns either match a sequence or do not.
PROSITE Patterns
• A pattern consists of a string of pattern elements separated by dashes and terminated by a period
  – Single letter – that specific residue
  – [ ] – any one of the enclosed letters
  – { } – any letter except the enclosed letters
  – x – any residue can occur
  – x(y) – any y residues in a row
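Since each pattern element maps onto a familiar regular-expression construct, the syntax above can be translated element by element. The converter below is a minimal sketch covering only the five element types listed on this slide; `prosite_to_regex` is a hypothetical helper, not a PROSITE tool, and the pattern in the usage line is a made-up toy example.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    pattern = pattern.strip().rstrip(".")  # drop the terminating period
    parts = []
    for element in pattern.split("-"):
        # An element is a letter, [..] or {..}, optionally followed by (count).
        m = re.fullmatch(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, count = m.groups()
        if core in ("x", "X"):
            regex = "."                      # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # anything but the enclosed letters
        else:
            regex = core                     # a letter or [..] is already valid
        if count:
            regex += "{" + count + "}"       # x(y): repeat y times
        parts.append(regex)
    return "".join(parts)


print(prosite_to_regex("A-x(2)-[ST]-{P}."))  # A.{2}[ST][^P]
```

The translation is purely local: no element depends on any other, which reflects the fact that PROSITE patterns stay within the regular languages.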
PROSITE Patterns
[RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYM].
RNP-1 Motif
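To make the match-or-no-match behaviour concrete, the RNP-1 pattern above can be written by hand as a regular expression and run over a sequence. This is only a sketch: `find_rnp1` is a hypothetical helper, and the test sequences are made-up toy strings, not real proteins.

```python
import re

# Hand translation of [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYM].
RNP1 = re.compile(r"[RK]G[^EDRKHPCG][AGSCI][FY][LIVA].[FYM]")


def find_rnp1(sequence):
    """Return (start index, matched text) for each non-overlapping RNP-1 match."""
    return [(m.start(), m.group()) for m in RNP1.finditer(sequence.upper())]


# Toy sequences, invented for illustration only.
print(find_rnp1("MMKGFGFVTFAA"))  # [(2, 'KGFGFVTF')]
print(find_rnp1("MMKGEGFVTF"))    # [] because E is excluded at position 3
```

Note that the result is a plain yes or no per position: there is no score to rank a near miss such as the second sequence, which differs from a match by a single residue.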
Conclusion
• Transformational grammars are useful for developing acceptors of sequences of different lengths and for matching specific multi-sequence regions.
• Higher-order grammars in the Chomsky hierarchy are more difficult to program and apply.
References
[1] Durbin, R. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.
[2] Gibson, G. A Primer of Genome Science. Sinauer Associates, Inc. Publishers, 2002.
[3] Mount, D. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, 2001.
[4] PROSITE Database http://us.expasy.org/prosite/
Questions and Answers