1. Computing Patterns in Strings, Bill Smyth, to appear in 2003
2. Pattern matching and text compression algorithms, M. Crochemore and T. Lecroq, Chapter 8 in the “Computer Science and Engineering Handbook”, to appear in 2003
More to appear this year…
Is “Pattern Matching” a community?
1. Conferences
2. Bibliographies
3. Pattern Matching Forum
4. Software and Animations
Conferences
Pattern Matching:
CPM, SPIRE, Prague Stringology Conference
Sister Conferences:
SIGIR, LATIN, DCC, KDD, …
Theory Conferences:
STOC, FOCS, ICALP, SODA, ESA, WADS, SWAT, STACS,…
The Collection of Computer Science Bibliographies
Bibliographies on Theory/Foundations of Computer Science
Bibliography on Pattern Matching
Author: Thierry Lecroq
Maintained by: U. of Karlsruhe, Germany
Number of references: 2184
Last update: November 29, 2002
Number of online publications: 2
Supported: yes
Most recent reference: 2002
Pattern Matching Pointers
Maintained by: Stefano Lonardi
Contents (last updated: Mon Aug 12 16:01:47 PDT 2002)
People
Pattern Matching Discussion Boards
Conference announcements
Resources: on-line bibliographies, journals, proceedings, software, newsgroups.
The purpose of this page is to serve as an index to information relevant to Pattern Matching/Computational Biology researchers. We prefer to point to information rather than store it locally. We include all submissions that seem appropriate. However, inclusion should not be interpreted as an endorsement of a contribution's accuracy or importance.
Introduction
Combinatorial Pattern Matching addresses issues of searching and matching strings and more complicated patterns such as trees, regular expressions, graphs, point sets, and arrays. The goal is to derive non-trivial combinatorial properties for such structures and then to exploit these properties in order to achieve improved performance for the corresponding computational problem.
Pattern Matching Forum
Pattern Matching Discussion Board: Pattern Matching Problems
Subtopic | Msgs | Last Updated
<String Libray> anyone? | 4 |
Matching pattern for stable marriage problem | 2 | 11/07 07:00am
Point pattern matching | 2 |
Pattern matching of strings | 5 |
Bit Pattern Matching | 2 |
String matching? | 3 |
Pattern Matching in C | 5 | 12/30 05:17pm
Person demographics matching | 1 |
X-ray image comparison and matching | 2 | 07/18 04:03am
I ,m aware of how I start or use algorithm approach to difine the similarty between two shapes | 1 |
I ,m aware of how I start or use algorithm approach to difine the similarty between two shapes | 2 |
String Matching Animations
1. 30 Exact String Matching Algorithms Animated in Java, Christian Charras and Thierry Lecroq, http://www-igm.univ-mlv.fr/~lecroq/string/
2. Java Applets for Sequence Comparison Algorithms, Christian Charras and Thierry Lecroq
3. Animations around the Globe: Stephen Campbell, UK; Gusfield, USA; Navarro, Chile; Buhler, Germany; Cássia, Brazil; Michailidis, Macedonia
Applications in Pattern Matching
1. Computational Biology
2. Compilers
3. Search Engines
4. XML
5. Musicology
6. Meteorology
7. Image Processing/X-Ray
8. Databases/SQL
9. Data Mining
10. TCP-IP Routing Tables
11. Soundex
12. Data Compression
13. P2P Networking (Napster, Kazaa, Gnutella)
14. Intrusion Detection
Computational Biology
1. Sequence Alignment - LCS, Edit Distance,…
2. Multiple Sequence Alignment
3. Gene Finding
4. Phylogeny
5. Physical Mappings
6. Genome Rearrangements
7. DNA Chips and Gene Networks
Sequence Alignment – Edit Distance
Dynamic programming: O(nm)
D(i,j)     A  A  C  T
        0  1  2  3  4
     A  1  0  1  2  3
     G  2  1  1  2  3
     T  3  2  2  2  2
Edit Operations
1. Insert
2. Delete
3. Mismatch
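The table on this slide can be reproduced with a short sketch of the standard O(nm) recurrence. It assumes unit cost for each of the three edit operations; the function name `edit_distance_table` is hypothetical and chosen only for this illustration.

```python
def edit_distance_table(s, t):
    """Return the full table D, where D[i][j] is the unit-cost edit
    distance between the prefixes s[:i] and t[:j]."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i  # delete all i characters of s[:i]
    for j in range(1, m + 1):
        D[0][j] = j  # insert all j characters of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j] + 1,             # delete s[i-1]
                D[i][j - 1] + 1,             # insert t[j-1]
                D[i - 1][j - 1] + mismatch,  # match or mismatch
            )
    return D


# Rows are indexed by "AGT", columns by "AACT", as on the slide.
table = edit_distance_table("AGT", "AACT")
for row in table:
    print(row)
```

The bottom-right entry, `table[3][4]`, is the edit distance between the two full strings.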
Sequence Alignment
Databases
1. Nucleic Acids: ENTREZ, SRS, BankIt, EMBL, NDB, dbEST
2. Proteins: SwissProt, PIR, OWL, Molecules ‘R Us
3. Chromosome Maps: CEPH-Genethon, CHLC, NCBI
4. Factors and Motifs: TFD, Prosite (New!)
5. Enzymes: REBASE, ENZYME, EC Enzyme DB, Merops
6. Organism specific databases: Many!
Alignment Algo’s: BLAST, FASTA, PAM, Prosite, BLOCKS, BLOSUM, Teiresias
Sequence Alignment
Online Approximate Queries
Indexing with Errors
[Figure: a pattern queried against a database]
• For (small!) constant distance, it seems that there may be hope…
Multiple Sequence Alignment
Problem: Given strings S1,…,Sk, find a string S closest to all of them
Closest: sum-of-pairs, distance-from-consensus
Solution: Dynamic Programming, exponential in k
NP-completeness, heuristics, approximations
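As an illustration of the heuristic route, here is a minimal sketch that restricts the candidates for S to the input strings themselves and picks the one with the smallest total distance to the others. Both the restriction and the use of unit-cost edit distance are assumptions made for this example; the names `edit_distance` and `center_string` are hypothetical.

```python
def edit_distance(s, t):
    """Unit-cost edit distance, keeping only two rows of the DP table."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # match or mismatch
        prev = cur
    return prev[-1]


def center_string(strings):
    """Among the inputs, return (string, total) with the smallest sum of
    distances to all inputs. This is a heuristic, not an optimal consensus."""
    scored = ((s, sum(edit_distance(s, t) for t in strings)) for s in strings)
    return min(scored, key=lambda pair: pair[1])


best = center_string(["aaa", "aab", "abb"])
print(best)  # ('aab', 2)
```

Because edit distance obeys the triangle inequality, the best input string is never more than a factor of two worse than the best unrestricted S under the distance-from-consensus measure, which is one reason this kind of shortcut is a common starting point.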
Multiple Alignment to a Phylogenetic Tree
{ aba cdaa daab mada dag lab abda daa }
[Figure: tree with nodes labeled aba, abda, cdaa, daa, mada, dgab, lab, dag and a cost on each edge]
Optimal alignment: 2+2+2+2+2+1+1=12
DNA Chips – Sequencing by Hybridization
1. Choosing Cell Populations
2. mRNA Extraction and Reverse Transcription
3. Fluorescent Labeling of cDNA's
4. Hybridization to a DNA Microarray
5. Scanning the Hybridized Array
6. Interpreting the Scanned Image
From Jeremy Buhler’s Web pages
Compilers
Grammars (EBNF): Regular Expressions
Regular Expression Search: search for a regular expression of length m in a text of length n.
Time: O(nm), since 1968!
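One classical way to reach an O(nm) bound is to simulate a nondeterministic automaton for the expression while tracking the set of active states, so that each text character costs time proportional to the automaton size. The sketch below assumes the automaton is already built; the hand-written NFA for `a(b|c)*d` and the name `nfa_search` are hypothetical and serve only as an illustration.

```python
def nfa_search(text, start, accept, delta, eps):
    """Return every end position i such that some substring text[j:i]
    is accepted. delta maps (state, char) to a set of states and eps
    maps a state to the states reachable by one epsilon move."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            s = stack.pop()
            for nxt in eps.get(s, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    current = closure({start})
    ends = []
    for i, ch in enumerate(text, 1):
        moved = set()
        for s in current:
            moved |= delta.get((s, ch), set())
        # Re-adding the start state lets a match begin at any position.
        current = closure(moved | {start})
        if accept in current:
            ends.append(i)
    return ends


# Hypothetical NFA for a(b|c)*d: 0 -a-> 1, 1 -b/c-> 1, 1 -d-> 2.
delta = {(0, "a"): {1}, (1, "b"): {1}, (1, "c"): {1}, (1, "d"): {2}}
ends = nfa_search("xxabcbdyad", start=0, accept=2, delta=delta, eps={})
print(ends)  # [7, 10]
```

The two reported positions are the ends of the substrings `abcbd` and `ad`.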
Parameterized/Function Matching
Parameterized Matching
Pattern: a b b c a b
Text: z x x x y y z x y z y z x y w z x y
f(a)=x, f(b)=y, f(c)=z
Prog.c
int a,b;
a=1;
a = g(a)*5+f(a);
b=2;
a = func(a,b);
a = a*g(b);
b=1;
b = g(b)*5+f(b);
….
Pattern
c=1;
c = g(c)*5+f(c);
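The mapping f built up on these slides can be checked with a naive sketch that tries every text window and verifies that a one-to-one renaming of pattern symbols produces it. It assumes every symbol is a parameter (no fixed symbols) and runs in O(nm) time, so it illustrates the definition rather than an efficient algorithm; the name `parameterized_matches` is hypothetical.

```python
def parameterized_matches(pattern, text):
    """Return the start positions where some one-to-one renaming f of
    the pattern symbols turns the pattern into the text window."""
    m = len(pattern)
    hits = []
    for start in range(len(text) - m + 1):
        forward, backward = {}, {}  # f and its inverse for this window
        for p, t in zip(pattern, text[start:start + m]):
            # setdefault records the first pairing and then rejects any
            # later pairing that disagrees, in either direction.
            if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
                break
        else:
            hits.append(start)
    return hits


hits = parameterized_matches("abbcab", "zxxxyyzxyzyzxywzxy")
print(hits)  # [3]
```

The single match starts at index 3 (counting from 0), where the window `xyyzxy` corresponds to f(a)=x, f(b)=y, f(c)=z.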
Standard Web Search Engine Architecture
[Figure: crawl the web; check for duplicates and store the documents; create an inverted index; the search engine servers take the user query, retrieve DocIds from the inverted index, and show results to the user]
Search Engine Queries
Search on inverted index (use ranking schemes: most cited, link analysis, most visited, term frequency); PageRank™
Challenges
1. Distributed queries.
2. Boolean queries.
Distributed Queries
[Figure: many inverted indexes side by side]
Boolean Queries
[Figure: query terms BFS and Dijkstra, each pointing to a list of DocIds in the inverted index; example lists: 563 33 2131 … and 12 78 33 …]
And = Intersection
Or = Union
Not = not in list
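The three rules above map directly onto set operations over posting lists. The following sketch builds a tiny inverted index and evaluates one query of each kind. The document texts are invented for the example, and the assignment of the slide's DocIds to the terms BFS and Dijkstra is an assumption; `build_inverted_index` is a hypothetical helper name.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of DocIds containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


# Hypothetical documents, keyed by the DocIds shown on the slide.
docs = {
    563: "BFS traversal",
    33: "BFS versus Dijkstra",
    2131: "BFS queue",
    12: "Dijkstra shortest path",
    78: "Dijkstra with a heap",
}
index = build_inverted_index(docs)
bfs = index.get("bfs", set())
dijkstra = index.get("dijkstra", set())

both = bfs & dijkstra     # And = intersection
either = bfs | dijkstra   # Or = union
not_bfs = set(docs) - bfs # Not = not in list

print(sorted(both), sorted(either), sorted(not_bfs))
```

Production engines usually keep posting lists sorted and merge them instead of materializing sets, but the Boolean semantics are the same.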
XML
XQuery: Query language for XML
Query types: …, path expression, …
More extensive: Tree Pattern Matching, Kilpeläinen [92] (Hot in Automata community)
Musicology
1. Music Comparison
2. Music Information Retrieval
3. Music Pattern Induction
Attributes: Duration and Pitch
Properties: transposition invariance, polyphony, and the musical context
Copyright Infringement: (US federal court)
Pitch sequences
Musicology
Lerdahl and Jackendoff:
“the importance of parallelism (that is, approximate or literal repetition) in musical structure cannot be overestimated. The more parallelism one can detect, the more internally coherent an analysis becomes, and the less independent information must be processed and retained in hearing or remembering a piece.”
Musicology
Pattern Matching?
1. Find approximate match to pitch sequences, with distance defined by properties.
2. Can music be de-polyphony-ized? i.e. create multiple monophony tracks by differentiating patterns?
3. Automatic detection of transposition invariance?
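For question 3, one simple and widely used idea is to compare sequences of pitch intervals (differences between consecutive notes) instead of absolute pitches, since transposing a melody leaves its intervals unchanged. The sketch below performs exact matching on intervals only. It assumes monophonic input given as MIDI-style note numbers and ignores duration; the function names are hypothetical.

```python
def intervals(pitches):
    """Differences between consecutive pitches."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def transposed_occurrences(pattern, melody):
    """Return the note indices where the pattern occurs in the melody
    under some transposition (same interval sequence)."""
    if not pattern or len(pattern) > len(melody):
        return []
    p, m = intervals(pattern), intervals(melody)
    k = len(p)
    return [i for i in range(len(m) - k + 1) if m[i:i + k] == p]


# C major scale, and a three-note pattern rising by 2 then 1 semitones.
melody = [60, 62, 64, 65, 67, 69, 71, 72]
pattern = [67, 69, 70]
occurrences = transposed_occurrences(pattern, melody)
print(occurrences)  # [1, 5]
```

Approximate matching, as in question 1, would replace the exact comparison of interval windows with a distance such as edit distance over the interval sequences.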
Musicology
Projects in MIR (Music Information Retrieval):
1. University of Waikato, New Zealand
2. University of Massachusetts, US
3. King's College and City University in London, UK
4. Université Pierre et Marie Curie, France
5. Università degli Studi di Milano, Italy
6. University of Helsinki, Finland
7. more…
New annual conference since 2000: ISMIR
Meteorology
Immediate weather prediction = Atmospheric Models, e.g. Eta, RUCS, AVN/MRF, Ensembles, MM5, ARP5, MOS, Global Ocean Model, etc. Based upon the work of hydrodynamicist V. Bjerknes (1904).
Long-term weather prediction = Pattern Search: El Niño, La Niña, Climate Prediction Centers, Military, NASA, Air Quality Research, Large Scale Computers
Meteorology
Weather Pattern Recognition
Difficult!
Measure? temperature, wind speed, wind direction, σθ
Atmospheric image recognition
Pei and Lin (1995) operations: scaling, rotation, translation, and skew (due to the curvature of the earth)
Meteorology
Weather Pattern Recognition
Measure: temperature, wind speed, wind direction, σθ
Parameters: height of measurement (3, 10 meters off ground), elevation, barometric pressure, cloudiness, stability measurement,…
Motif Discovery
Motif = Pattern of the form x1-x2-…-xn, where “-” is a bounded gap
Biological Databases – Find all frequent Motifs
Current research: Suffix trees for text with gaps
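To make the motif definition concrete, the sketch below locates occurrences of a given motif by translating it into a regular expression. It assumes that each "-" stands for between 0 and `max_gap` arbitrary characters, which is one possible reading of "bounded gap". It only searches for a known motif and does not discover frequent motifs; the function names are hypothetical.

```python
import re


def motif_regex(parts, max_gap):
    """Compile x1-x2-...-xn into a regex whose gaps allow 0..max_gap
    arbitrary characters. The lookahead makes overlapping hits visible."""
    gap = ".{0,%d}" % max_gap
    body = gap.join(re.escape(part) for part in parts)
    return re.compile("(?=(%s))" % body)


def motif_occurrences(parts, max_gap, text):
    """Return every start position at which the motif occurs."""
    return [m.start() for m in motif_regex(parts, max_gap).finditer(text)]


# AC-GT with gaps of at most 2: the third AC is followed by a gap of 3.
hits = motif_occurrences(["AC", "GT"], 2, "ACGTxxACxGTACxxxGT")
print(hits)  # [0, 6]
```

Scanning with a regex costs time proportional to the text for every query, which is why index structures such as the suffix trees mentioned above are of interest for large databases.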
Approximation Algorithms
Examples
1. Shortest Common Superstring
2. Edit Distance with Block operations
3. Phylogenetic trees
4. Evolutionary trees
5. …