Berendt: Advanced databases, winter term 2007/08, http://www.cs.kuleuven.be/~berendt/teaching/2007w/adb/

Advanced databases –

Inferring implicit/new knowledge from data(bases):

Tying it all together (a start)

Bettina Berendt

Katholieke Universiteit Leuven, Department of Computer Science

http://www.cs.kuleuven.be/~berendt/teaching/2007w/adb/

Last update: 6 December 2007

Goal 1 for today

Wrap up yesterday's lecture and discussion, and prepare you for the next assignment

Goal 2 for today: identify "missing links" and point to solution approaches

(on the board)

Agenda

Naïve Bayes [remaining from yesterday]

Changing representation: LSI [rem. from yesterday]

Ont.+KDD: Apriori and taxonomies

KDD+DB: Constrained pattern mining – ex. WUM

KDD+DB: Inductive databases (very brief)

KDD+Ont.: Induction and Semantic Web (very brief)

Mining association rules

Apriori: (slides from D. Delic)

Mining generalized association rules: (Karlsruhe slides)
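
The two approaches named above can be sketched together. The following is a minimal illustration, not the code from the referenced slides: a plain level-wise Apriori, plus the basic idea behind generalized association rules, which is to extend every transaction with the taxonomy ancestors of its items before mining. The taxonomy, the transactions, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical taxonomy (child -> parent) and transactions, for illustration only.
PARENT = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
    "shoes": "footwear",
    "hiking boots": "footwear",
}
TRANSACTIONS = [
    {"shirt"},
    {"jacket", "hiking boots"},
    {"ski pants", "hiking boots"},
    {"shoes"},
    {"shoes"},
    {"jacket"},
]


def extend_with_ancestors(transaction, parent):
    """Add all taxonomy ancestors of each item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        while item in parent:
            item = parent[item]
            extended.add(item)
    return frozenset(extended)


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    level = {}
    for item in {i for t in transactions for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        next_level = {}
        for cand in candidates:
            # Prune: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in level for sub in combinations(cand, k - 1)):
                s = support(cand)
                if s >= min_support:
                    next_level[cand] = s
        level = next_level
    return frequent


plain = apriori(TRANSACTIONS, min_support=0.3)
generalized = apriori(
    [extend_with_ancestors(t, PARENT) for t in TRANSACTIONS], min_support=0.3
)
print(sorted(sorted(s) for s in generalized if len(s) > 1))
```

On this toy data, no pair of leaf items is frequent, but {outerwear, hiking boots} is. The sketch does not remove redundant itemsets that contain an item together with its own ancestor (e.g. {jacket, outerwear}); a real implementation would prune those.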

Main interestingness measures of association rules

Support of a rule A → B

= no. of instances with A and B / no. of all instances

Confidence of a rule A → B

= no. of instances with A and B / no. of instances with A

= support (A & B) / support (A)

Lift of a rule A → B

= support (A & B) / [ support (A) * support (B) ]

What does this measure, and in what numerical interval can it be?
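
A small worked example may help with the question above. Lift compares the observed support of A & B with the support expected if A and B were statistically independent: a value of 1 indicates independence, values above 1 a positive association, values below 1 a negative one. It is never negative and is bounded above by 1 / max(support(A), support(B)). The sketch below computes the three measures on hypothetical transactions; the function names and the data are assumptions made for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(a, b, transactions):
    """support(A & B) / support(A); assumes support(A) > 0."""
    return support(a | b, transactions) / support(a, transactions)


def lift(a, b, transactions):
    """support(A & B) / (support(A) * support(B)); assumes both supports > 0."""
    return support(a | b, transactions) / (
        support(a, transactions) * support(b, transactions)
    )


# Hypothetical data: 8 transactions.
transactions = (
    [{"jacket", "umbrellas"}] * 3 + [{"jacket"}] + [{"umbrellas"}] + [{"shoes"}] * 3
)
A, B = {"jacket"}, {"umbrellas"}

print(support(A | B, transactions))        # 0.375
print(confidence(A, B, transactions))      # 0.75
print(lift(A, B, transactions))            # 1.5
print(lift(A, {"shoes"}, transactions))    # 0.0 (never bought together)
```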

Interestingness measures

Interestingness as a constraint

So we're not interested in

"show me all patterns"

but in

"show me all patterns that are interesting, i.e., that have properties X"

→ constraints!

Examples from MINERULE

MINE RULE exemple AS

SELECT DISTINCT 1..n Item AS BODY, 1..1 Item AS HEAD, SUPPORT, CONFIDENCE

WHERE HEAD.Item = "umbrellas" // also other fields, e.g. Date

FROM Purchase

GROUP BY Tid

HAVING COUNT(*) < 6

EXTRACTING RULES WITH SUPPORT: 0.06, CONFIDENCE: 0.9

E.g., jacket, flight_Dublin → umbrellas (0.08, 0.93)
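
To make the role of each clause concrete, the MINE RULE statement above can be emulated in a few lines of Python. This is only a sketch under stated assumptions, not the MINERULE implementation: the purchase rows are hypothetical, each (Tid, Item) row is assumed to be distinct, and support is assumed to be computed over the groups that pass the HAVING clause.

```python
from collections import defaultdict
from itertools import combinations


def mine_rules(purchases, head_item, max_group_size, min_support, min_confidence):
    """Emulate the statement: return [(body, head, support, confidence), ...]."""
    # GROUP BY Tid
    groups = defaultdict(set)
    for tid, item in purchases:
        groups[tid].add(item)
    # HAVING COUNT(*) < max_group_size (rows per Tid assumed distinct)
    baskets = [items for items in groups.values() if len(items) < max_group_size]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    # Candidate bodies (1..n items) from baskets containing the head (1..1 item).
    bodies = set()
    for basket in baskets:
        if head_item in basket:
            others = sorted(basket - {head_item})
            for size in range(1, len(others) + 1):
                bodies.update(frozenset(c) for c in combinations(others, size))

    # EXTRACTING RULES WITH SUPPORT: min_support, CONFIDENCE: min_confidence
    rules = []
    for body in bodies:
        s = support(body | {head_item})
        c = s / support(body)
        if s >= min_support and c >= min_confidence:
            rules.append((body, head_item, s, c))
    return rules


# Hypothetical Purchase rows (Tid, Item); Tid 6 has 6 items and fails HAVING.
purchases = [
    (1, "jacket"), (1, "flight_Dublin"), (1, "umbrellas"),
    (2, "jacket"), (2, "flight_Dublin"), (2, "umbrellas"),
    (3, "jacket"), (3, "shoes"),
    (4, "flight_Dublin"), (4, "shoes"),
    (5, "shoes"),
    (6, "jacket"), (6, "flight_Dublin"), (6, "shoes"),
    (6, "hat"), (6, "scarf"), (6, "gloves"),
]
rules = mine_rules(purchases, "umbrellas", 6, 0.06, 0.9)
for body, head, s, c in rules:
    print(sorted(body), "->", head, (s, c))
```

On this data only the body {jacket, flight_Dublin} reaches the confidence threshold; the single-item bodies have confidence 2/3 and are filtered out by the constraint.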

Agenda

Naïve Bayes [remaining from yesterday]

Changing representation: LSI [rem. from yesterday]

Ont.+KDD: Apriori and taxonomies

KDD+DB: Constrained pattern mining – ex. WUM

KDD+DB: Inductive databases (very brief)

KDD+Ont.: Induction and Semantic Web (very brief)

The site

Business understanding / problem definition:

* How do users search in this online catalog?

* Which search criteria are popular?

* Which are efficient?

[Berendt & Spiliopoulou, VLDB Journal 2000]

The concept hierarchies / site ontology (excerpt)

SEITE1-...LI (1st page of a list) or SEITEn-...LI (further page); "Seite" = page

LA ("Land" = country/state), SA ("Schulart" = school type), SU ("Suche" = search)

Sequence mining – one result pattern: successful search for a school in Germany

[Figure: one example pattern, with its steps annotated as a refinement, a repetition, and a continuation]

select t
from node a b, template a * b as t
where a.url startswith "SEITE1-" and a.occurrence = 1
and b.url contains "1SCHULE" and b.occurrence = 1
and (b.support / a.support) >= 0.2

(Berendt & Spiliopoulou, VLDB J. 2000)
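
One simplified reading of the query above can be sketched in Python. It is not the WUM implementation: `occurrence = 1` is read as "first visit to that page within a session", the template `a * b` as "a followed by b after any number of steps", and `b.support / a.support` as the share of sessions reaching a that go on to reach b. The session data and the concept names such as "SEITE1-LA-LI" are hypothetical.

```python
from collections import Counter


def first_occurrences(session):
    """Yield (position, page) for the first visit to each page in a session."""
    seen = set()
    for position, page in enumerate(session):
        if page not in seen:
            seen.add(page)
            yield position, page


def mine_template(sessions, a_pred, b_pred, min_confidence):
    """Return {(a_page, b_page): confidence} for the template a * b."""
    a_support = Counter()
    ab_support = Counter()
    for session in sessions:
        firsts = list(first_occurrences(session))
        for i, a_page in firsts:
            if not a_pred(a_page):
                continue
            a_support[a_page] += 1
            for j, b_page in firsts:
                if j > i and b_pred(b_page):
                    ab_support[(a_page, b_page)] += 1
    return {
        (a_page, b_page): count / a_support[a_page]
        for (a_page, b_page), count in ab_support.items()
        if count / a_support[a_page] >= min_confidence
    }


# Hypothetical sessions over concept names (not taken from the study).
SESSIONS = [
    ["SEITE1-LA-LI", "SEITEn-LA-LI", "1SCHULE"],
    ["SEITE1-LA-LI", "SEITE1-SA-LI", "1SCHULE"],
    ["SEITE1-LA-LI", "SEITEn-LA-LI"],
    ["SEITE1-SU-LI", "SEITE1-SU-LI", "1SCHULE"],
    ["1SCHULE", "SEITE1-LA-LI"],
]

patterns = mine_template(
    SESSIONS,
    a_pred=lambda url: url.startswith("SEITE1-"),
    b_pred=lambda url: "1SCHULE" in url,
    min_confidence=0.2,
)
for (a_page, b_page), conf in sorted(patterns.items()):
    print(a_page, "*", b_page, conf)
```

Raising `min_confidence` tightens the constraint: with a threshold of 0.6, the pattern starting at "SEITE1-LA-LI" (confidence 0.5 on this data) would no longer be returned.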

/liste.html?offset=920&zeilen=20&anzahl=1323&sprache=de&sw_kategorie=de&erscheint=&suchfeld=&suchwert=&staat=de&region=by&schultyp=


Sequences

Generalized sequences, navigation patterns, hits in WUM

Aggregated Logs: The basic internal representation in WUM

The confidence measure for generalized sequences

Templates in the query language MINT, g-sequences, and navigation patterns

Interestingness measures: Support (hits) and confidence

Aggregated Logs, queries, and query results

The basic idea of the WUM algorithm

MINT can express 3 types of constraints ("predicates")

The WUM gseqm algorithm

(B predicates)

Also for higher-order structures (graphs): Ex. MolFea

Agenda

Naïve Bayes [remaining from yesterday]

Changing representation: LSI [rem. from yesterday]

Ont.+KDD: Apriori and taxonomies

KDD+DB: Constrained pattern mining – ex. WUM

KDD+DB: Inductive databases (very brief)

KDD+Ont.: Induction and Semantic Web (very brief)

The basic idea

(on the board)

Agenda

Naïve Bayes [remaining from yesterday]

Changing representation: LSI [rem. from yesterday]

Ont.+KDD: Apriori and taxonomies

KDD+DB: Constrained pattern mining – ex. WUM

KDD+DB: Inductive databases (very brief)

KDD+Ont.: Induction and Semantic Web (very brief)

(One) basic idea

(on the board)

Next lecture

Naïve Bayes [remaining from yesterday]

Changing representation: LSI [rem. from yesterday]

Ont.+KDD: Apriori and taxonomies

KDD+DB: Constrained pattern mining – ex. WUM

KDD+DB: Inductive databases (very brief)

KDD+Ont.: Induction and Semantic Web (very brief)

Applications

References and background reading; acknowledgements

Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, pages 207-216, Washington, D.C., May 1993. http://citeseer.ist.psu.edu/agrawal93mining.html

(presentation from Delic, D. (2002). Mining Association Rules with Rough Sets and Large Itemsets - A Comparative Study.)

Ramakrishnan Srikant and Rakesh Agrawal. Mining Generalized Association Rules. In Proc. of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, September 1995. http://citeseer.ist.psu.edu/srikant95mining.html

(presentation from http://www.kde.cs.uni-kassel.de/lehre/ss2004/kdd/folien/4Folie_VII.3_Assoziationsregeln.pdf)

P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the right interestingness measure for association patterns. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2002. http://citeseer.ist.psu.edu/tan02selecting.html

MINERULE: R. Meo, G. Psaila and S. Ceri, An extension to SQL for mining association rules. Data Mining and Knowledge Discovery, Vol. 2 (2), pp. 195-224, 1998. http://www.springerlink.com/index/L57188431Q027L73.pdf

WUM and the Schulweb study: Berendt, B. & Spiliopoulou, M. (2000). Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal, 9, 56-75. http://vasarely.wiwi.hu-berlin.de/Home/berendt-spiliopoulou-vldbj00.pdf

MolFea (esp. the example): S. Kramer, L. De Raedt, C. Helma. Molecular Feature Mining in HIV Data. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, 2001.

De Raedt, L. (2002) A perspective on inductive databases. SIGKDD Explorations. Volume 4, Issue 2, 69-77. http://owl-workshop.man.ac.uk/acceptedLong/submission_25.pdf