Data Mining for Enlightenment, Bettina Berendt, www.cs.kuleuven.be/~berendt


1

Data Mining for Enlightenment

Bettina Berendt

www.cs.kuleuven.be/~berendt


2

Basics

Data Mining (DM) – used in the sense of Knowledge Discovery:

“the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”

(Fayyad et al., 1996)

Enlightenment:

“man's emergence from his self-imposed immaturity”

(Kant, 1784)


3

Data mining for ...


4

Putting it together: (One) first try

What makes people happy?

Classification learning


5

Results: Corpus-derived happiness factors

yay 86.67

shopping 79.56

awesome 79.71

birthday 78.37

lovely 77.39

concert 74.85

cool 73.72

cute 73.20

lunch 73.02

books 73.02

goodbye 18.81

hurt 17.39

tears 14.35

cried 11.39

upset 11.12

sad 11.11

cry 10.56

died 10.07

lonely 9.50

crying 5.50

[Mihalcea & Liu, Proc. CAAW 2006]
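A minimal sketch of how word-level scores like these could be derived from labelled posts. It assumes the score is the percentage of a word's occurrences that fall in posts labelled happy; the function name `happiness_factors`, the `min_count` parameter, and the toy corpus are hypothetical and not taken from the cited study.

```python
from collections import Counter

def happiness_factors(posts, min_count=1):
    """Return {word: percentage of its occurrences found in happy posts}.

    `posts` is an iterable of (text, is_happy) pairs. Assumption: the score
    is 100 * happy occurrences / all occurrences of the word.
    """
    happy, total = Counter(), Counter()
    for text, is_happy in posts:
        for word in text.lower().split():
            total[word] += 1
            if is_happy:
                happy[word] += 1
    # Words rarer than min_count are dropped because their ratio is unreliable.
    return {w: 100.0 * happy[w] / n for w, n in total.items() if n >= min_count}

# Invented toy corpus: (text, labelled happy?)
toy_posts = [
    ("yay concert tonight", True),
    ("yay birthday lunch", True),
    ("cried after the concert", False),
    ("so lonely and sad", False),
]
factors = happiness_factors(toy_posts)
```

On a real corpus a higher `min_count` would be advisable, since a word seen only once or twice always scores 0 or 100.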


6

The approach: DM for human learning

5 wrong (but popular) metaphors about the Internet

Refutation and DM tool support

Successful learning is / has ...

articulation and reflection

social; multiple perspectives

active / constructive

situated and authentic; multiple contexts


7

Interdisciplinary challenges

Engineering challenges

[Reputation challenges]

Interdisciplinary / application question challenges

Computational / DM methods challenges


8

Metaphor 1


9

The Internet is a textbook


10

Multi-purpose tools (with DM) for situated and authentic Internet use

Text and link analysis


11

Metaphor 2


12

The Internet is television


13

DM for active/constructive information use: Can you organize these results some more?


14

DM for active/constructive information use (1): Intelligent bibliography creation

Organisation of the literature / bibliography construction

[Berendt, Dingel, & Hanser, Proc. ECDL 2006; Berendt & Krause, submitted; Berendt & Kolbe, in prep.]

Citation-based clustering, text analysis (TF.IDF, ...) for semi-automatic ontology learning; embedded in authoring tool
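As an illustration of the TF.IDF weighting named on this slide, here is a minimal sketch using one common variant (relative term frequency times the natural log of inverse document frequency). The function name `tf_idf` and the toy titles are hypothetical; the cited tool may use a different weighting variant.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * ln(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Invented toy titles standing in for bibliography entries.
toy_docs = ["web usage mining", "web content mining", "ontology learning from text"]
toy_weights = tf_idf(toy_docs)
```

Terms that occur in every document get weight 0, which is why such terms are poor candidates for concept labels.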


15

Metaphor 3


16

The Internet is a pile of rubbish (biased / extremist / subjective)


17

DM for analyzing multiple perspectives:

[Fortuna, Galleguillos, & Cristianini, in press]

What characterizes different news sources?

Nearest neighbour / best reciprocal hit for document matching; Kernel Canonical Correlation Analysis and vector operations for finding topics and characteristic keywords
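The best-reciprocal-hit idea for matching documents across two news sources can be sketched as follows. This is a simplified illustration that assumes bag-of-words vectors and cosine similarity; the names `cosine` and `best_reciprocal_hits` and the toy headlines are hypothetical, and the cited work may use different features.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_reciprocal_hits(source_a, source_b):
    """Pair (i, j) only if j is i's nearest neighbour in B and i is j's in A."""
    if not source_a or not source_b:
        return []
    vecs_a = [Counter(d.lower().split()) for d in source_a]
    vecs_b = [Counter(d.lower().split()) for d in source_b]
    best_for_a = [max(range(len(vecs_b)), key=lambda j: cosine(va, vecs_b[j])) for va in vecs_a]
    best_for_b = [max(range(len(vecs_a)), key=lambda i: cosine(vecs_a[i], vb)) for vb in vecs_b]
    # Require a non-zero similarity so unrelated documents are never paired.
    return [(i, j) for i, j in enumerate(best_for_a)
            if best_for_b[j] == i and cosine(vecs_a[i], vecs_b[j]) > 0]

# Invented toy headlines from two hypothetical sources.
source_a = ["election results announced in capital", "storm hits coast"]
source_b = ["coast battered by storm", "capital election results", "football final tonight"]
pairs = best_reciprocal_hits(source_a, source_b)
```

The reciprocity check is what keeps the football headline unmatched: it has a nearest neighbour by default, but no document in the other source picks it back.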


18

DM for exploring multiple perspectives

Hyperlinks from blogs to mainstream news media (Germany, USA)

[Berendt, Schlegel, & Koch, in Kommunikation, Partizipation und Wirkungen im Social Web, in press]

How do different news media source / refer to one another?

HTML wrapping and link analysis; (not shown: Named Entity Recognition for retrieving textual links)
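A rough sketch of the link-analysis step: extract hyperlinks from a blog page and count which hosts they point to. It uses only the Python standard library; the class `LinkCollector`, the function `linked_domains`, and the example hosts are hypothetical and do not reproduce the wrapper used in the cited study.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def linked_domains(html):
    """Count outgoing links per host; relative links have no host and are skipped."""
    parser = LinkCollector()
    parser.feed(html)
    hosts = (urlparse(href).netloc.lower() for href in parser.hrefs)
    return Counter(host for host in hosts if host)

# Invented blog snippet with placeholder hosts.
toy_page = (
    '<p><a href="http://news.example.com/a">A</a> '
    '<a href="http://news.example.com/b">B</a> '
    '<a href="http://paper.example.org/x">X</a> '
    '<a href="/about">About</a></p>'
)
domain_counts = linked_domains(toy_page)
```

Summing such counters over many blogs gives a blog-to-news-source link matrix that can then be compared across countries.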


19

DM for making people active explorers of multiple perspectives and multiple contexts

[Berendt & Trümper, PASCAL Symposium, 2008]

Clustering for semi-automatic ontology learning; Named Entity Recognition; multi-dimensional similarity construct and filtering for nearest-neighbour search


20

Metaphor 4


21

The Internet is a dark cave


22

Social tagging for making people see, explore and generate multiple perspectives

See also [Vuorikari, Ochoa, & Duval, submitted]


23

Social browsing with the semantic pointer for making people see & explore multiple perspectives

[Ferlež, PASCAL Symposium, 2008; www.jureferlez.name/2007/07/text-mining-for-semantically-enabled.html]

Inter-page text-block similarity analysis; client-side usage tracking and real-time matching


24

DM for exploring how multiple perspectives evolve

[Griffith, 2007; http://wikiscanner.virgil.gr/]

Why is Scientology an uncontroversial organisation?

Usage tracking, feature construction by table lookup
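Feature construction by table lookup can be illustrated with a small sketch: each anonymous edit carries an IP address, and a lookup table of address ranges adds an "organisation" feature to it. The table `ORG_RANGES`, the function names, and the organisations are hypothetical; the ranges are reserved documentation addresses, not real assignments.

```python
import ipaddress

# Hypothetical lookup table: address range -> owning organisation.
ORG_RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "Example Org A"),
    (ipaddress.ip_network("198.51.100.0/24"), "Example Org B"),
]

def organisation_for(ip):
    """Return the organisation whose range contains `ip`, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for network, organisation in ORG_RANGES:
        if addr in network:
            return organisation
    return None

def add_org_feature(edits):
    """Return copies of the edit records with an added 'organisation' field."""
    return [dict(edit, organisation=organisation_for(edit["ip"])) for edit in edits]

# Invented edit log entries.
toy_edits = [
    {"page": "Some article", "ip": "192.0.2.17"},
    {"page": "Another article", "ip": "203.0.113.5"},
]
enriched = add_org_feature(toy_edits)
```

A linear scan is enough for a sketch; a real range table would call for a sorted or tree-based lookup.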


25

DM for exploring how multiple perspectives evolve


26

Metaphor 5


27

The Internet is a library


28

... but you’re a document too!

[Owad, 2006; www.applefritter.com/bannedbooks]

Where do people live who will buy the Qur’an soon?


29

DM for demonstrating the Internet’s inference capabilities (how to create that book map)

Attribute matching in different schemas, view construction
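To make the inference concrete, the sketch below matches attributes across two differently named schemas and defines a view over the join, using an in-memory SQLite database. All table names, column names, and records are invented for illustration; this is not the procedure of the cited demonstration, only the general pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two independent sources with different attribute names (all data invented).
CREATE TABLE wishlists (owner_name TEXT, city TEXT, book TEXT);
CREATE TABLE directory (full_name TEXT, town TEXT, postcode TEXT);

INSERT INTO wishlists VALUES
    ('Ada Example', 'Springfield', 'Book X'),
    ('Bob Sample', 'Shelbyville', 'Book Y');
INSERT INTO directory VALUES
    ('ada example', 'SPRINGFIELD', '00001'),
    ('Carol Test', 'Springfield', '00002');

-- Attribute matching: owner_name ~ full_name and city ~ town, case-insensitive.
-- View construction: the join becomes a queryable "book map".
CREATE VIEW book_map AS
SELECT w.book, d.postcode
FROM wishlists AS w
JOIN directory AS d
  ON lower(w.owner_name) = lower(d.full_name)
 AND lower(w.city) = lower(d.town);
""")
rows = conn.execute("SELECT book, postcode FROM book_map").fetchall()
```

Neither table alone says where a reader of a given book lives; the view does, which is the point of the slide.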


30

DM for articulation and reflection

Repetition Organisation Elaboration

[Berendt, in Neues Handbuch Hochschullehre, 2006]

Proxy server

Logfile

ASP

Usage tracking, semantic graph coarsening
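One possible reading of graph coarsening on usage data, offered as an assumption rather than the method of the cited work: map each visited URL to a higher-level concept and aggregate page-to-page transitions into concept-to-concept edges. The function `coarsen`, the concept labels, and the sessions are hypothetical.

```python
from collections import Counter

def coarsen(sessions, concept_of):
    """Aggregate page transitions into weighted concept-to-concept edges.

    `sessions` is a list of URL sequences; `concept_of` maps URL -> concept.
    Unmapped URLs fall into the concept "other".
    """
    edges = Counter()
    for session in sessions:
        concepts = [concept_of.get(url, "other") for url in session]
        for src, dst in zip(concepts, concepts[1:]):
            if src != dst:  # design choice: ignore moves within one concept
                edges[(src, dst)] += 1
    return edges

# Invented sessions and concept mapping.
toy_sessions = [
    ["/intro", "/intro/2", "/quiz", "/glossary"],
    ["/intro", "/quiz"],
]
toy_concepts = {
    "/intro": "Reading",
    "/intro/2": "Reading",
    "/quiz": "Exercise",
    "/glossary": "Reference",
}
concept_edges = coarsen(toy_sessions, toy_concepts)
```

The coarsened graph is small enough to show learners as a picture of their own navigation.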


31

A conclusion ... and a vision


32

A conclusion ... and a vision

New happiness factors:

yay 86.67

shopping 79.56

awesome 79.71

learning 86.67

understanding 79.56

democracy 79.71


33

Caveat 1: Data preparation

One approach:

Tools for active (interactive) wrapper learning


34

Caveat 2: “Digging and surfing”

Reductive understanding is not always adequate and/or desired

Person

Context

Task

...

One approach: Treat it as a competency


35

Caveat 3: Cultural/economic bias

Land area, Population, Internet users

[www.worldmapper.org]


36

… Questions? Comments? Other?

Thank you …