A First Attempt towards a Logical Model for the PBMS
PANDA Meeting, Milano, 18 April 2002
National Technical University of Athens
PANDA: Patterns for Next-Generation Database Systems
P. Vassiliadis. PANDA Meeting, Milano, 18 April 2002
Overview
• General Understanding of the PBMS
• Mathematical Background
• MetaModel: Entities and Language
• The Software Engineering Perspective
• Conclusions
General Framework
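[Figure: layered architecture of the PBMS.
• Meta-Pattern Type Layer: the Meta_Pattern Type and the Language. Meta-Pattern Type + Pattern Types = PBMS Catalog.
• Pattern Type Layer: Association Rule Type, DBSCAN Cluster Type, Decision Tree Type. Each pattern type belongs to the Meta_Pattern Type.
• Pattern Layer (= PBMS Content): Assoc. Rule 1, Assoc. Rule 2, …, Assoc. Rule n; Cluster 1, Cluster 2, Cluster 3; Decision Tree 1. Each pattern belongs to its pattern type.
• Raw Data, with the Ass. Rule Algorithm, the Dec. Tree Algorithm and the DBSCAN Cluster Algorithm between the raw data and the Pattern Layer.]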
General Idea
Meta-Pattern Type + Language, set side by side with Relation + Language:
• Meta-Pattern Type: a Name, a Condensed Expression, an Extension, and a Language
• Relation: a Name, a Schema, an Extension, and Relational Calculus

• Pattern Type: AssociationRuleType; head :- body; ext(AssociationRuleType)
• Relational Table: Buys; session_id, date, item, price; ext(Buys)

• Pattern: Buys(x,_,beer,_) :- Buys(x,_,pampers,_)
• Tuple: Buys(34,4/4/2002,beer,2)
Mathematical Background
Assumptions from the definition:
• There exists a data space and a pattern space.
• There always exist M:N relationships among data and patterns.
[Figure: Data Space and Pattern Space]
Characteristics of data and pattern space
• Each data item is characterized by a finite number of features N.
• dom(x) is the domain of each feature x.
• Data space: $D^N \subseteq \mathrm{dom}(A_1) \times \dots \times \mathrm{dom}(A_N)$
• Proposal: all dom(x) are countably infinite + consider cases for $D^N$ (whether it is finite or not).
• Each pattern is characterized by a finite number of features M.
• Pattern space: $D^M \subseteq \mathrm{dom}(A_1) \times \dots \times \mathrm{dom}(A_M)$
• Proposal: all dom(x) are countably infinite + $D^M$ is clearly finite.
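A minimal sketch of the data space as a set of feature tuples drawn from the product of the feature domains. The slide proposes countably infinite domains; the small finite domains, the feature names and the sample tuples below are assumptions made only so that the product can be enumerated.

```python
from itertools import product

# Hypothetical finite feature domains (the slide proposes countably
# infinite ones; finite domains are used here so they can be listed).
domains = {
    "item": ["beer", "pampers", "milk"],
    "price": [1, 2],
}


def product_space(doms):
    """Enumerate dom(A1) x ... x dom(AN) for finite domains."""
    return set(product(*doms.values()))


full_space = product_space(domains)

# An observed data space: a subset of the full product of the domains.
data_space = {("beer", 2), ("pampers", 1)}

print(len(full_space))           # number of tuples in the full product
print(data_space <= full_space)  # the data space lies inside the product
```

The same construction would apply to the pattern space, with M pattern features in place of the N data features.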
Statistical Measures
The data-pattern relationship $f_{DP}$ has:
• participation measures for the relationship;
• importance measures for a data item;
• importance measures for a pattern.
[Figure: Data Space and Pattern Space]
Statistical Measures
• Richness of representation = $\frac{\text{relationships captured by the condensed representation}}{\text{total number of relationships}}$
• Compactness of the representation = $\frac{\mathrm{size}(D^M) \cdot M}{\mathrm{size}(D^N) \cdot N}$
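The two ratios can be computed directly once the counts are known. The sketch below reads richness as captured relationships over total relationships, and compactness as (size of the pattern space times M) over (size of the data space times N). The function names and the numbers in the usage lines are hypothetical.

```python
def richness(captured_relationships, total_relationships):
    """Fraction of data-pattern relationships kept by the condensed representation."""
    if total_relationships <= 0:
        raise ValueError("total_relationships must be positive")
    return captured_relationships / total_relationships


def compactness(pattern_space_size, m, data_space_size, n):
    """(size of pattern space * M features) / (size of data space * N features)."""
    if data_space_size <= 0 or n <= 0:
        raise ValueError("data space size and N must be positive")
    return (pattern_space_size * m) / (data_space_size * n)


# Hypothetical counts, for illustration only.
example_richness = richness(80, 100)
example_compactness = compactness(10, 3, 1000, 4)
print(example_richness, example_compactness)
```

Under this reading, a smaller compactness value means the pattern space is a more condensed representation of the data space.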
Pattern Types
• Intensional Description of a Pattern Type, as follows:
– PID
– Explicit Relationship: $f_{DP}^{i}: D^N \to D_i^M$
– Relationship Expression
– Statistical Measures
• Extensional Description (or Pattern Extension) of a Pattern Type: a finite set of patterns
• Data extension of a Pattern Type: a countable (?) set of data items
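As a rough illustration, the four parts of a pattern type description can be held in a small record. This is only a sketch: the class and field names are hypothetical, the explicit relationship is stored as (PID, RID) pairs, RID is read as a raw-data record identifier (an assumption), and RID125 is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PatternType:
    """The four parts of a pattern type description (hypothetical field names)."""
    pid: str
    relationship_expression: str
    statistical_measures: dict = field(default_factory=dict)
    # Explicit relationship kept as (PID, RID) pairs.
    explicit_relationship: set = field(default_factory=set)

    def data_extension(self):
        """Data items related to this pattern type through the explicit relationship."""
        return {rid for pid, rid in self.explicit_relationship if pid == self.pid}


rule = PatternType(
    pid="PID123",
    relationship_expression="Buys(x,_,beer,_) :- Buys(x,_,pampers,_)",
    statistical_measures={"coverage": 0.80, "confidence": 0.90},
    # RID125 is a made-up second data item.
    explicit_relationship={("PID123", "RID124"), ("PID123", "RID125")},
)
print(sorted(rule.data_extension()))
```

In this sketch the data extension is finite by construction; whether it should be allowed to be countably infinite is the open question flagged on the slide.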
Example
Pattern Type Intensional Description, with [a small part of] the Pattern Type Extensional Description:
• PID: PID123
• Explicit Relationship: $f_{DP}^{i}: D^N \to D_i^M$ = {(PID123,RID124),…}
• Relationship Expression: Buys(x,_,beer,_) :- Buys(x,_,pampers,_)
• Statistical Measures: Coverage=80%, Confidence=90%
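To make the rule concrete, the sketch below evaluates Buys(x,_,beer,_) :- Buys(x,_,pampers,_) over a tiny Buys table and derives a confidence value. Only the tuple (34, 4/4/2002, beer, 2) comes from the slides; every other row is invented. Confidence is computed with the usual association-rule definition (sessions satisfying body and head, over sessions satisfying the body), which is an assumption about what the slide means. Coverage is not defined in the slides, so it is not computed here.

```python
# Buys(session_id, date, item, price); all rows except the first are made up.
buys = [
    (34, "4/4/2002", "beer", 2),
    (34, "4/4/2002", "pampers", 10),
    (35, "4/4/2002", "pampers", 10),
    (36, "5/4/2002", "beer", 2),
    (37, "5/4/2002", "pampers", 10),
    (37, "5/4/2002", "beer", 2),
]


def sessions_buying(rows, item):
    """Session ids x for which Buys(x, _, item, _) holds."""
    return {session_id for session_id, _, bought, _ in rows if bought == item}


body_sessions = sessions_buying(buys, "pampers")  # rule body
head_sessions = sessions_buying(buys, "beer")     # rule head
satisfying = body_sessions & head_sessions

# Usual association-rule confidence (an assumed definition).
confidence = len(satisfying) / len(body_sessions)

# Explicit relationship as (PID, session id) pairs for the sessions that satisfy the rule.
explicit_relationship = {("PID123", s) for s in satisfying}
print(confidence, sorted(explicit_relationship))
```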
Meta-Pattern Types
• Intensional Description of a Meta-Pattern Type, as follows:
– Name
– Condensed Expression
– [Meta]Statistical Measures
– ?? Schema Attributes ??
• Extensional Description of a Meta-Pattern Type: a finite set of pattern types
Example
Meta-Pattern Type Intensional Description:
• Name: AssociationRuleType
• Condensed Expression: head :- body
• [Meta]Statistical Measures: Coverage: Float[0..1], Confidence: Float[0..1]
• Schema Attributes ??: PID, Head, Body ??

[Small part of] Meta-Pattern Type Extensional Description: one pattern type, shown through its Pattern Type Intensional Description and [a small part of] its Pattern Type Extensional Description:
• PID: PID123
• Explicit Relationship: $f_{DP}^{i}: D^N \to D_i^M$ = {(PID123,RID124),…}
• Relationship Expression: Buys(x,_,beer,_) :- Buys(x,_,pampers,_)
• Statistical Measures: Coverage=80%, Confidence=90%
Which language to choose?
• Relational Calculus, Datalog and Stratified Datalog?
– Powerful, but not elegant for all the patterns that we might want to express…
• Constraint database approach?
– We cannot guarantee a finite representation of the result for non-linear constraints…
• Remove recursion?
– Cannot express interesting patterns like transitive closure…
• Only linear constraints?
– Cannot express interesting patterns like cyclic clusters…
– Approximation of polynomials through sets of linear constraints? Not elegant…
• Forget constraints and describe every pattern type as a simple predicate?
– Loss of all the declarative information on the nature of the pattern type…
• So, what to do? Possible dead-end due to the paradigm?
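The remark about recursion can be illustrated with the standard two-rule Datalog program for transitive closure, evaluated here by naive fixpoint iteration. The predicate names edge and path and the sample edges are illustrative and do not come from the slides. Without the recursive second rule, only paths of a fixed, bounded length could be derived.

```python
def transitive_closure(edges):
    """Naive fixpoint evaluation of the Datalog program
           path(x, y) :- edge(x, y).
           path(x, z) :- path(x, y), edge(y, z).
    """
    path = set(edges)  # first rule: every edge is a path
    while True:
        # One application of the recursive rule to the facts derived so far.
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_facts = derived - path
        if not new_facts:  # fixpoint reached: nothing new can be derived
            return path
        path |= new_facts


edges = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(edges)
print(sorted(closure))
```

The number of iterations depends on the length of the longest path in the data, which is why no fixed non-recursive expression covers every input.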
How to build it?
• Each of the pattern types implemented as a Class.
• The different pattern types defined as specializations of a Generic Pattern Class.
• Treat pattern types as predicates, with semantics computed by a computationally complete procedural language [e.g., PL/SQL, C++, …]?
– Instead of fundamental research we turn to feasibility issues…
• What about behavior?
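A minimal sketch of the class-based option: a generic pattern class with specializations whose semantics are given procedurally, as a predicate over data items. All class, method and attribute names are hypothetical. The cluster is simplified to a centre and a radius purely for illustration; this is not how DBSCAN clusters are actually shaped.

```python
import math
from abc import ABC, abstractmethod


class GenericPattern(ABC):
    """Generic pattern class; concrete pattern types specialize it."""

    def __init__(self, pid, measures=None):
        self.pid = pid
        self.measures = dict(measures or {})

    @abstractmethod
    def matches(self, data_item):
        """Procedural semantics: does this data item correspond to the pattern?"""


class AssociationRule(GenericPattern):
    def __init__(self, pid, head_item, body_item, measures=None):
        super().__init__(pid, measures)
        self.head_item = head_item
        self.body_item = body_item

    def matches(self, session_items):
        # A session supports the rule if it contains both body and head.
        return self.body_item in session_items and self.head_item in session_items


class Cluster(GenericPattern):
    def __init__(self, pid, center, radius, measures=None):
        super().__init__(pid, measures)
        self.center = center
        self.radius = radius

    def matches(self, point):
        # Simplified membership test: inside the ball around the centre.
        return math.dist(self.center, point) <= self.radius


rule = AssociationRule("PID123", "beer", "pampers", {"confidence": 0.9})
cluster = Cluster("PID200", (0.0, 0.0), 2.0)
print(rule.matches({"beer", "pampers"}), cluster.matches((1.0, 1.0)))
```

The trade-off raised on the slide shows up here: the semantics live in opaque `matches` code rather than in a declarative expression that the system could reason about.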
General Framework
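[Figure: the general framework redrawn for "How to build it?".
• PBMS Catalog (Meta-Pattern Type + Pattern Types): a Generic Class with a set of DDL/DML Languages; the Association Rule Class, Cluster Class and Decision Tree Class are linked to the Generic Class by ISA.
• Pattern Layer (= PBMS Content): Assoc. Rule 1, Assoc. Rule 2, …, Assoc. Rule n; Cluster 1, Cluster 2, Cluster 3; Decision Tree 1. Each is linked to its class by IN.]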
Conclusions
• Followed the Datalog paradigm (need for deductive capabilities), enhanced with constraints (need for elegance)
• Reduced the problem to the specification of a proper language for the description of pattern types
• Fundamental language limitations when constraints are considered
• Dilemma:
– Change paradigm?
– Stick with this paradigm and focus on engineering issues?
– … Any other suggestions? …
Thank you …
Definitions from the minutes of Athens meeting
• A pattern is a compact, semantically rich representation of raw data.
• A Pattern-Based Management System (PBMS) is a system for handling (storing / processing / retrieving) patterns extracted from raw data, in order to efficiently support pattern matching and to exploit pattern-related operations generating intensional information.
Issues around the pattern definition
• A mapping from the original raw data space to a less populated (compact) pattern space is always possible, preserving (or documenting) as much knowledge as possible from the raw data space (rich in semantics).
• An M:N mapping between raw data space and pattern space is permitted.
• Perhaps several levels of representation / abstraction exist (different levels of granularity, multi-dimensionality, recursion, hierarchies, etc.).
Issues around the PBMS definition
• A PBMS will cooperate with a DBMS storing raw data;
• A PBMS processes different kinds of queries (because of different user needs) on raw data and returns more intuitive results to users;
• A PBMS is useful in order to process those queries more efficiently than a normal DBMS would do;
• A PBMS will have its own mechanisms for representing and storing its entries (patterns), for posing and processing queries, and for efficiently retrieving its entries.
Query Language Issues
• Given a datum, which pattern does it refer to? Which are the data that correspond to this pattern?
• Zoom in to and zoom out of a pattern. Pattern union, difference.
• Composition of patterns (i.e., if A → B and B → C, then derive A → C).
• What are the values of the statistical measures for this pattern? Which patterns fulfill a certain constraint on a statistical measure?
• Which are the patterns in the PBMS catalog? Which are the attributes or the statistical measures for this pattern type? Which pattern types relate to a certain statistical measure?
• Closed Form of the Language.
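A few of the query kinds listed above can be sketched over plain Python sets. This is only an illustration under assumed representations: the data-pattern relationship is a set of (PID, RID) pairs, rules are (antecedent, consequent) pairs, and the catalog maps a PID to its statistical measures. All function names and sample values other than PID123 and RID124 are hypothetical.

```python
def patterns_for_datum(relationship, rid):
    """Given a datum, which patterns does it refer to?"""
    return {pid for pid, r in relationship if r == rid}


def data_for_pattern(relationship, pid):
    """Which are the data that correspond to this pattern?"""
    return {r for p, r in relationship if p == pid}


def compose(rules):
    """One composition step: from A -> B and B -> C derive A -> C.
    Trivial results of the form A -> A are skipped."""
    return {(a, c) for (a, b) in rules for (b2, c) in rules if b == b2 and a != c}


def select_by_measure(catalog, measure, predicate):
    """Which patterns fulfill a constraint on a statistical measure?"""
    return {pid for pid, m in catalog.items() if measure in m and predicate(m[measure])}


relationship = {("PID123", "RID124"), ("PID123", "RID125"), ("PID200", "RID124")}
rules = {("pampers", "beer"), ("beer", "chips")}
catalog = {"PID123": {"confidence": 0.9}, "PID200": {"confidence": 0.5}}

print(patterns_for_datum(relationship, "RID124"))
print(compose(rules))
print(select_by_measure(catalog, "confidence", lambda v: v >= 0.8))
```

Each function takes sets or mappings and returns a set, which loosely mirrors the closed-form requirement: the result of one query can be fed into another.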