A First Attempt towards a Logical Model for the PBMS. PANDA Meeting, Milano, 18 April 2002. National Technical University of Athens. PANDA: Patterns for Next-Generation Database Systems.

Page 1: A First Attempt towards a Logical Model for the PBMS

A First Attempt towards a Logical Model for the PBMS

PANDA Meeting, Milano, 18 April 2002
National Technical University of Athens

Patterns for Next-Generation Database Systems

PANDA

Page 2: A First Attempt towards a Logical Model for the PBMS

P. Vassiliadis. PANDA Meeting, Milano, 18 April 2002


Overview

• General Understanding of the PBMS
• Mathematical Background
• MetaModel: Entities and Language
• The Software Engineering Perspective
• Conclusions


Page 4: A First Attempt towards a Logical Model for the PBMS


General Framework

[Figure: layered architecture of the PBMS. Labels recovered from the diagram:]

• Meta-Pattern Type Layer: Meta_Pattern Type, with the Language.
• Pattern Type Layer: Association Rule Type, DBSCAN Cluster Type, Decision Tree Type; each belongs to the Meta_Pattern Type.
• Pattern Layer (= PBMS Content): Cluster 1, Cluster 2, Cluster 3; Assoc. Rule 1, Assoc. Rule 2, …, Assoc. Rule n; Decision Tree 1; each pattern belongs to its pattern type.
• Algorithms: Ass. Rule Algorithm, Dec. Tree Algorithm, DBSCAN Cluster Algorithm.
• Raw Data.
• Meta-Pattern Type + Pattern Types = PBMS Catalog.

Page 5: A First Attempt towards a Logical Model for the PBMS


General Idea

Meta-Pattern Type + Language (compare: Relation + Language)
• Meta-pattern type: a Name, a Condensed Expression, an Extension, and a Language.
• Relation: a Name, a Schema, an Extension, and Relational Calculus.

Pattern Type (compare: Relational Table)
• Pattern type: AssociationRuleType; head :- body; ext(AssociationRuleType).
• Relational table: Buys; session_id, date, item, price; ext(Buys).

Pattern (compare: Tuple)
• Pattern: Buys(x,_,beer,_) :- Buys(x,_,pampers,_)
• Tuple: Buys(34,4/4/2002,beer,2)
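
The analogy between a tuple and a pattern can be made concrete with a minimal Python sketch. It stores a few Buys tuples and checks the pattern Buys(x,_,beer,_) :- Buys(x,_,pampers,_) against them. Only the first tuple comes from the slide; the other tuples and all helper names are assumptions made for illustration, not part of the proposed model.

```python
from typing import NamedTuple


class Buys(NamedTuple):
    session_id: int
    date: str
    item: str
    price: float


# ext(Buys): the first tuple is from the slide, the rest are made-up sample data.
EXT_BUYS = [
    Buys(34, "4/4/2002", "beer", 2),
    Buys(34, "4/4/2002", "pampers", 7),
    Buys(35, "4/4/2002", "pampers", 7),
    Buys(36, "5/4/2002", "beer", 2),
]


def sessions_buying(item, ext):
    """Sessions x for which Buys(x, _, item, _) holds."""
    return {t.session_id for t in ext if t.item == item}


def rule_matches(head_item, body_item, ext):
    """Split the sessions satisfying the body by whether the head also holds."""
    body = sessions_buying(body_item, ext)
    head = sessions_buying(head_item, ext)
    return body & head, body - head


supported, violated = rule_matches("beer", "pampers", EXT_BUYS)
print(supported, violated)
```

The tuple is a fact about one session, while the pattern is a statement about the whole extension: evaluating it yields the sessions that support the rule and those that contradict it.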

Page 6: A First Attempt towards a Logical Model for the PBMS


Overview

• General Understanding of the PBMS
• Mathematical Background
• MetaModel: Entities and Language
• The Software Engineering Perspective
• Conclusions

Page 7: A First Attempt towards a Logical Model for the PBMS


Mathematical Background

Assumptions from the definition:

• There exists a data space and a pattern space.
• There always exist M:N relationships among data and patterns.

[Figure: Data Space and Pattern Space]

Page 8: A First Attempt towards a Logical Model for the PBMS


Characteristics of data and pattern space

• Each data item is characterized by a finite number of features N.
• dom(x) is the domain of each feature x.
• Data space: $D^N \subseteq \mathrm{dom}(A_1) \times \dots \times \mathrm{dom}(A_N)$
• Proposal: all dom(x) are countably infinite; consider cases for $D^N$ (whether it is finite or not).

• Each pattern is characterized by a finite number of features M.
• Pattern space: $D^M \subseteq \mathrm{dom}(A_1) \times \dots \times \mathrm{dom}(A_M)$
• Proposal: all dom(x) are countably infinite; $D^M$ is clearly finite.

Page 9: A First Attempt towards a Logical Model for the PBMS


Statistical Measures

The data-pattern relationship $f_{DP}$ has:

• participation measures for the relationship;
• importance measures for a data item;
• importance measures for a pattern.

[Figure: Data Space and Pattern Space]

Page 10: A First Attempt towards a Logical Model for the PBMS


Statistical Measures

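• Richness of the representation:
  $\text{richness} = \dfrac{\text{relationships captured by the condensed representation}}{\text{total number of relationships}}$

• Compactness of the representation:
  $\text{compactness} = \dfrac{\mathrm{size}(D^M) \cdot M}{\mathrm{size}(D^N) \cdot N}$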
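
As a rough illustration of the two ratios, the following Python sketch computes them for made-up sizes. The function and parameter names are assumptions, and the slide does not say how the relationships are counted, so the counts are simply passed in.

```python
def richness(captured_relationships: int, total_relationships: int) -> float:
    """Share of data-pattern relationships kept by the condensed representation."""
    if total_relationships <= 0:
        raise ValueError("total_relationships must be positive")
    return captured_relationships / total_relationships


def compactness(size_dm: int, m: int, size_dn: int, n: int) -> float:
    """size(D^M) * M divided by size(D^N) * N; smaller means more compact."""
    if size_dn <= 0 or n <= 0:
        raise ValueError("data space size and feature count must be positive")
    return (size_dm * m) / (size_dn * n)


# Hypothetical numbers: 1000 data items with 4 features each,
# summarized by 10 patterns with 3 features each.
example_richness = richness(800, 1000)
example_compactness = compactness(10, 3, 1000, 4)
print(example_richness, example_compactness)
```

With these numbers the patterns occupy 30 feature values against 4000 for the raw data, while still capturing 800 of the 1000 relationships.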

Page 11: A First Attempt towards a Logical Model for the PBMS


Overview

• General Understanding of the PBMS
• Mathematical Background
• MetaModel: Entities and Language
• The Software Engineering Perspective
• Conclusions

Page 12: A First Attempt towards a Logical Model for the PBMS


General Framework

[Figure: the layered General Framework diagram (Meta-Pattern Type Layer, Pattern Type Layer, Pattern Layer, Raw Data), repeated from Page 4. Meta-Pattern Type + Pattern Types = PBMS Catalog; Pattern Layer = PBMS Content.]

Page 13: A First Attempt towards a Logical Model for the PBMS


Pattern Types

• Intensional Description of a Pattern Type, as follows:
  – PID
  – Explicit Relationship: $f_{DP_i}: D^N \to D_i^M$
  – Relationship Expression
  – Statistical Measures

• Extensional Description (or Pattern Extension) of a Pattern Type: a finite set of patterns

• Data extension of a Pattern Type: a countable (?) set of data items
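
One way to read the description above is as a record with four fields. The Python sketch below is only an illustration under that assumption: the class and field names are invented here, the explicit relationship is stored as a set of (pattern id, data item id) pairs, and the identifier RID125 is made up.

```python
from dataclasses import dataclass, field


@dataclass
class PatternType:
    pid: str
    relationship_expression: str
    # Explicit relationship, stored as (pattern id, data item id) pairs.
    explicit_relationship: set = field(default_factory=set)
    statistical_measures: dict = field(default_factory=dict)

    def data_extension(self) -> set:
        """Data items that take part in the explicit relationship."""
        return {rid for _, rid in self.explicit_relationship}


pattern_type = PatternType(
    pid="PID123",
    relationship_expression="Buys(x,_,beer,_) :- Buys(x,_,pampers,_)",
    explicit_relationship={("PID123", "RID124"), ("PID123", "RID125")},
    statistical_measures={"Coverage": 0.80, "Confidence": 0.90},
)
print(sorted(pattern_type.data_extension()))
```

Keeping the explicit relationship next to the relationship expression means the data extension can be read off directly, without re-evaluating the expression over the raw data.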

Page 14: A First Attempt towards a Logical Model for the PBMS


Example

Pattern Type Intensional Description, paired with [a small part of] the Pattern Type Extensional Description:

• PID: PID123
• Explicit Relationship: $f_{DP_i}: D^N \to D_i^M$ = {(PID123,RID124),…}
• Relationship Expression: Buys(x,_,beer,_) :- Buys(x,_,pampers,_)
• Statistical Measures: Coverage=80%, Confidence=90%

Page 15: A First Attempt towards a Logical Model for the PBMS


General Framework

[Figure: the layered General Framework diagram (Meta-Pattern Type Layer, Pattern Type Layer, Pattern Layer, Raw Data), repeated from Page 4. Meta-Pattern Type + Pattern Types = PBMS Catalog; Pattern Layer = PBMS Content.]

Page 16: A First Attempt towards a Logical Model for the PBMS


Meta-Pattern Types

• Intensional Description of a Meta-Pattern Type, as follows:
  – Name
  – Condensed Expression
  – [Meta]Statistical Measures
  – ?? Schema Attributes ??

• Extensional Description of a Meta-Pattern Type: a finite set of pattern types

Page 17: A First Attempt towards a Logical Model for the PBMS


Example

Meta-Pattern Type Intensional Description, paired with [a small part of] the Meta-Pattern Type Extensional Description:

• Name: AssociationRuleType
• Condensed Expression: head :- body
• [Meta]Statistical Measures: Coverage: Float[0..1], Confidence: Float[0..1]
• Schema Attributes ??: PID, Head, Body ??

Pattern Type Intensional Description, paired with [a small part of] the Pattern Type Extensional Description:

• PID: PID123
• Explicit Relationship: $f_{DP_i}: D^N \to D_i^M$ = {(PID123,RID124),…}
• Relationship Expression: Buys(x,_,beer,_) :- Buys(x,_,pampers,_)
• Statistical Measures: Coverage=80%, Confidence=90%
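
The relation between the two levels in this example can be sketched in Python: the meta-pattern type declares which statistical measures exist and their range, and a pattern type's measures are checked against that declaration. All class and method names below are hypothetical; the slide only gives the declarations Coverage: Float[0..1] and Confidence: Float[0..1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasureSpec:
    """Declaration of one statistical measure, e.g. Coverage: Float[0..1]."""
    name: str
    low: float = 0.0
    high: float = 1.0

    def accepts(self, value: float) -> bool:
        return self.low <= value <= self.high


@dataclass
class MetaPatternType:
    name: str
    condensed_expression: str
    measures: tuple

    def validate(self, measures: dict) -> list:
        """Return a list of problems; an empty list means the measures conform."""
        problems = []
        for spec in self.measures:
            if spec.name not in measures:
                problems.append(f"missing {spec.name}")
            elif not spec.accepts(measures[spec.name]):
                problems.append(f"{spec.name} out of range")
        return problems


association_rule_type = MetaPatternType(
    name="AssociationRuleType",
    condensed_expression="head :- body",
    measures=(MeasureSpec("Coverage"), MeasureSpec("Confidence")),
)

ok = association_rule_type.validate({"Coverage": 0.80, "Confidence": 0.90})
bad = association_rule_type.validate({"Coverage": 80})
print(ok, bad)
```

In this reading the meta level acts like a schema for the catalog: Coverage=80% is stored as 0.80 so that it fits the declared range.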

Page 18: A First Attempt towards a Logical Model for the PBMS


Which language to choose?

• Relational Calculus, Datalog and Stratified Datalog?
  – Powerful but not elegant for all the patterns that we might want to express…

• Constraint database approach?
  – We cannot guarantee a finite representation of the result for non-linear constraints…

Page 19: A First Attempt towards a Logical Model for the PBMS


Which language to choose?

Page 20: A First Attempt towards a Logical Model for the PBMS


Which language to choose?

• Remove recursion?
  – Cannot express interesting patterns like transitive closure…

• Only linear constraints?
  – Cannot express interesting patterns like cyclic clusters…
  – Approximation of polynomials through sets of linear constraints? Not elegant…

• Forget constraints and describe every pattern type as a simple predicate?
  – Loss of all the declarative information on the nature of the pattern type…

• So, what to do? Possible dead end due to the paradigm?
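
To illustrate why dropping recursion loses transitive closure, here is a minimal Python sketch of naive bottom-up evaluation of the two usual Datalog rules for reachability. The rule names path and edge and the sample edges are assumptions for illustration. The loop runs until no new fact is derived, and the number of rounds depends on the data, which is what a fixed set of non-recursive rules cannot reproduce for every input.

```python
def transitive_closure(edges):
    """Naive fixpoint evaluation of:
         path(x, y) :- edge(x, y).
         path(x, z) :- path(x, y), edge(y, z).
    """
    path = set(edges)
    while True:
        # One application of the recursive rule: join path with edge.
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_facts = derived - path
        if not new_facts:
            return path
        path |= new_facts


edges = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(edges)
print(sorted(closure))
```

A chain of k edges needs k - 1 applications of the recursive rule before the fixpoint is reached, so the amount of work is not bounded by the rule set alone.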

Page 21: A First Attempt towards a Logical Model for the PBMS


Overview

• General Understanding of the PBMS
• Mathematical Background
• MetaModel: Entities and Language
• The Software Engineering Perspective
• Conclusions

Page 22: A First Attempt towards a Logical Model for the PBMS


How to build it?

• Each of the pattern types is implemented as a Class.
• The different pattern types are defined as specializations of a Generic Pattern Class.

• Treat pattern types as predicates, with semantics computed by a computationally complete procedural language [e.g., PL/SQL, C++, …]?
  – Instead of fundamental research we turn to feasibility issues…

• What about behavior?
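
The class-based option can be sketched in a few lines of Python. This is an assumption-laden illustration, not the design from the talk: the class names, the matches method, and the circular cluster shape are all invented here. Each pattern type is a specialization of a generic class, and the semantics of "does this datum satisfy the pattern" is computed procedurally.

```python
from abc import ABC, abstractmethod


class GenericPattern(ABC):
    """Generic Pattern Class: every pattern has an id and statistical measures."""

    def __init__(self, pid, measures=None):
        self.pid = pid
        self.measures = dict(measures or {})

    @abstractmethod
    def matches(self, datum) -> bool:
        """Predicate semantics, computed procedurally by each subclass."""


class AssociationRule(GenericPattern):
    def __init__(self, pid, body_item, head_item, measures=None):
        super().__init__(pid, measures)
        self.body_item = body_item
        self.head_item = head_item

    def matches(self, datum) -> bool:
        # datum: the set of items bought in one session.
        return self.body_item in datum and self.head_item in datum


class Cluster(GenericPattern):
    def __init__(self, pid, center, radius, measures=None):
        super().__init__(pid, measures)
        self.center = center
        self.radius = radius

    def matches(self, datum) -> bool:
        # datum: a point; membership is a non-linear (circular) condition.
        squared = sum((a - b) ** 2 for a, b in zip(datum, self.center))
        return squared <= self.radius ** 2


rule = AssociationRule("PID123", "pampers", "beer", {"Confidence": 0.90})
cluster = Cluster("C1", center=(0.0, 0.0), radius=2.0)
print(rule.matches({"beer", "pampers"}), cluster.matches((1.0, 1.0)))
```

The circular cluster is easy to express here, but the condition is buried in code, which is the loss of declarative information that the language discussion warns about.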

Page 23: A First Attempt towards a Logical Model for the PBMS


How to build it?

[Figure: General Framework, class-based variant. Labels recovered from the diagram:]

• Generic Class, with a Set of DDL/DML Languages.
• Association Rule Class, Cluster Class, Decision Tree Class: each ISA Generic Class.
• Pattern Layer (= PBMS Content): Cluster 1, Cluster 2, Cluster 3; Assoc. Rule 1, Assoc. Rule 2, …, Assoc. Rule n; Decision Tree 1; each pattern is IN its class.
• Meta-Pattern Type + Pattern Types = PBMS Catalog.

Page 24: A First Attempt towards a Logical Model for the PBMS


Overview

• General Understanding of the PBMS
• Mathematical Background
• MetaModel: Entities and Language
• The Software Engineering Perspective
• Conclusions

Page 25: A First Attempt towards a Logical Model for the PBMS


Conclusions

• Followed the Datalog paradigm (need for deductive capabilities), enhanced with constraints (need for elegance)

• Reduced the problem to the specification of a proper language for the description of pattern types

• Fundamental language limitations once constraints are considered

• Dilemma:
  – Change paradigm?
  – Stick with this paradigm and focus on engineering issues?
  – …Any other suggestions?…

Page 26: A First Attempt towards a Logical Model for the PBMS


Thank you …

Page 27: A First Attempt towards a Logical Model for the PBMS


Definitions from the minutes of Athens meeting

• A pattern is a compact and semantically rich representation of raw data.

• A Pattern-Based Management System (PBMS) is a system for handling (storing / processing / retrieving) patterns extracted from raw data in order to efficiently support pattern matching and to exploit pattern-related operations generating intensional information.

Page 28: A First Attempt towards a Logical Model for the PBMS


Issues around the pattern definition

• A mapping from the original raw data space to a less populated (compact) pattern space is always possible, preserving (or documenting) as much knowledge as possible from the raw data space (rich in semantics).

• An M:N mapping between raw data space and pattern space is permitted.

• Perhaps several levels of representation / abstraction exist (different levels of granularity, multi-dimensionality, recursion, hierarchies, etc.).

Page 29: A First Attempt towards a Logical Model for the PBMS


Issues around the PBMS definition

• A PBMS will cooperate with a DBMS storing raw data;

• A PBMS processes different kinds of queries (because of different user needs) on raw data and returns more intuitive results to users;

• A PBMS is useful in order to process those queries more efficiently than a normal DBMS would do;

• A PBMS will have its own mechanisms for representing and storing its entries (patterns), posing and processing queries, efficiently retrieving its entries.

Page 30: A First Attempt towards a Logical Model for the PBMS


Query Language Issues

• Given a datum, which pattern does it refer to? Which are the data that correspond to this pattern?

• Zoom-in, zoom-out a pattern. Pattern union, difference.

• Composition of patterns (i.e., if A → B and B → C, then derive A → C).

• What are the values of the statistical measures for this pattern? Which patterns fulfill a certain constraint on a statistical measure?

• Which are the patterns in the PBMS catalog? Which are the attributes or the statistical measures for this pattern type? Which pattern types relate to a certain statistical measure?

• Closed Form of the Language.
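
The composition operation listed above can be sketched as a function from a set of rules to a set of rules, which also shows what a closed form means in practice: the result can be fed back into the same operation. This is a minimal Python illustration under stated assumptions: rules are reduced to (antecedent, consequent) pairs, the item names are made up, and the statistical measures of a derived rule are left out because the slides do not define how to compute them.

```python
def compose(rules):
    """One composition step: from A -> B and B -> C, derive A -> C.

    Returns only the newly derived rules, again as (antecedent, consequent) pairs.
    """
    derived = {
        (a, c)
        for (a, b) in rules
        for (b2, c) in rules
        if b == b2 and a != c
    }
    return derived - set(rules)


rules = {("pampers", "beer"), ("beer", "chips")}
new_rules = compose(rules)
print(new_rules)
```

Because the output has the same shape as the input, queries such as union or difference of patterns could be applied to it without leaving the pattern space.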