1

Data Mining for Enlightenment

Bettina Berendt
www.cs.kuleuven.be/~berendt

2

Basics

Data Mining (DM) – used in the sense of Knowledge Discovery:

“the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”

(Fayyad et al., 1996)

Enlightenment:

“man's emergence from his self-imposed immaturity”

(Kant, 1784)

3

Data mining for ...

4

Putting it together: (One) first try

What makes people happy?

Classification learning

5

Results: Corpus-derived happiness factors

yay 86.67

shopping 79.56

awesome 79.71

birthday 78.37

lovely 77.39

concert 74.85

cool 73.72

cute 73.20

lunch 73.02

books 73.02

goodbye 18.81

hurt 17.39

tears 14.35

cried 11.39

upset 11.12

sad 11.11

cry 10.56

died 10.07

lonely 9.50

crying 5.50

[Mihalcea & Liu, Proc. CAAW 2006]
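A minimal sketch of how such a per-word score could be derived. It assumes the happiness factor is the percentage of a word's occurrences that fall in posts labelled happy; the exact definition is the one given by Mihalcea & Liu. The function name and the toy posts are invented for illustration.

```python
from collections import Counter

def happiness_factors(happy_posts, sad_posts, min_count=1):
    """Return {word: percentage of its occurrences found in happy posts}.

    Assumed definition, for illustration only.
    """
    happy = Counter(w for post in happy_posts for w in post.lower().split())
    sad = Counter(w for post in sad_posts for w in post.lower().split())
    factors = {}
    for word in happy.keys() | sad.keys():
        total = happy[word] + sad[word]
        if total >= min_count:  # ignore words that are too rare to score
            factors[word] = 100.0 * happy[word] / total
    return factors

# Toy data, invented for this example (not from the corpus in the slides).
happy_posts = ["yay concert tonight", "lovely lunch yay"]
sad_posts = ["cried after the concert", "so sad and lonely"]
factors = happiness_factors(happy_posts, sad_posts)
```

On a real corpus a higher min_count would be needed so that words seen only once or twice do not end up at 0 or 100.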

6

The approach: DM for human learning

5 wrong (but popular) metaphors

about the Internet

articulation and reflection

social; multiple perspectives

active / constructive

situated and authentic; multiple contexts

Successful learning

is / has ...

Refutation and DM tool support

7

Interdisciplinary challenges

Engineering challenges

[Reputation challenges]

Interdisciplinary / application question challenges

Computational / DM methods challenges

8

Metaphor 1

9

The Internet is a textbook

10

Multi-purpose tools (with DM) for situated and authentic Internet use

Text and link analysis

11

Metaphor 2

12

The Internet is television

13

DM for active/constructive information use: Can you organize these results some more?

14

Organisation of the literature / bibliography construction

DM for active/constructive information use (1): Intelligent bibliography creation

[Berendt, Dingel, & Hanser, Proc. ECDL 2006; Berendt & Krause, submitted; Berendt & Kolbe, in prep.]

Citation-based clustering, text analysis (TF.IDF, ...) for semi-automatic ontology learning; embedded in authoring tool
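As an illustration of the TF.IDF weighting named above, here is a minimal sketch using only the standard library. It uses one common variant (relative term frequency times log of inverse document frequency); the variant used in the cited tool is not stated in the slides, and the toy documents are invented.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * log(N / document frequency)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Toy titles, invented for this example.
docs = ["web usage mining", "web link analysis", "ontology learning from text"]
weights = tf_idf(docs)
```

Terms that occur in many documents (here "web") receive lower weights than terms specific to one document (here "usage"), which is what makes the weights useful as input for clustering.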

15

Metaphor 3

16

The Internet is a pile of rubbish (biased / extremist / subjective)

17

DM for analyzing multiple perspectives:

[Fortuna, Galleguillos, & Cristianini, in press]

What characterizes different news sources?

Nearest neighbour / best reciprocal hit for document matching; Kernel Canonical Correlation Analysis and vector operations for finding topics and characteristic keywords
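A minimal sketch of best-reciprocal-hit matching between two news sources: two documents are paired only if each is the other's nearest neighbour. Jaccard word overlap stands in for the similarity measure here as an assumption; the cited work may use a different one. All names and headlines are invented.

```python
def jaccard(a, b):
    """Word-set overlap between two texts (stand-in similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def best_reciprocal_hits(source_a, source_b, sim=jaccard):
    """Return index pairs (i, j) where a[i] and b[j] are mutual nearest neighbours.

    Both lists are assumed to be non-empty.
    """
    # Nearest neighbour in B for every document in A, and vice versa.
    best_ab = [max(range(len(source_b)), key=lambda j: sim(a, source_b[j]))
               for a in source_a]
    best_ba = [max(range(len(source_a)), key=lambda i: sim(source_a[i], b))
               for b in source_b]
    # Keep only the pairs on which both directions agree.
    return [(i, j) for i, j in enumerate(best_ab) if best_ba[j] == i]

# Toy headlines, invented for this example.
source_a = ["election results announced in capital",
            "storm hits coast overnight"]
source_b = ["coast battered as storm hits",
            "capital hears election results",
            "football final tonight"]
pairs = best_reciprocal_hits(source_a, source_b)
```

The reciprocity check is what keeps the unrelated third headline in source_b from being matched, even though it still has a (meaningless) nearest neighbour in source_a.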

18

DM for exploring multiple perspectives

Hyperlinks from blogs to mainstream news media: Germany, USA

[Berendt, Schlegel, & Koch, in Kommunikation, Partizipation und Wirkungen im Social Web, in press]

How do different news media source / refer to one another?

HTML wrapping and link analysis; (not shown: Named Entity Recognition for retrieving textual links)
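A minimal sketch of the wrapping and link-analysis step: extract the hyperlinks from a blog page and count which hosts they point to. It uses only the standard library; the class name, function name, and the example page are invented, and a real study would also need crawling and a list of which hosts count as mainstream media.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def linked_hosts(html):
    """Count outgoing links per target host; relative links are skipped."""
    parser = LinkCollector()
    parser.feed(html)
    hosts = (urlparse(link).netloc.lower() for link in parser.links)
    return Counter(host for host in hosts if host)

# Invented example page using reserved example domains.
page = ('<p><a href="http://news.example.org/a">A</a> '
        '<a href="http://news.example.org/b">B</a> '
        '<a href="/about">About</a> '
        '<a href="http://paper.example.com/x">X</a></p>')
counts = linked_hosts(page)
```

Aggregating such counts over many blogs gives the edge weights of a blog-to-media link graph.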

19

DM for making people active explorers of multiple perspectives and multiple contexts

[Berendt & Trümper, PASCAL Symposium, 2008]

Clustering for semi-automatic ontology learning; Named Entity Recognition; multi-dimensional similarity construct and filtering for nearest-neighbour search

20

Metaphor 4

21

The Internet is a dark cave

22

Social tagging for making people see, explore and generate multiple perspectives

See also [Vuorikari, Ochoa, & Duval, submitted]

23

Social browsing with the semantic pointer for making people see & explore multiple perspectives

[Ferlež, PASCAL Symposium, 2008; www.jureferlez.name/2007/07/text-mining-for-semantically-enabled.html]

Inter-page text-block similarity analysis; client-side usage tracking and real-time matching

24

DM for exploring how multiple perspectives evolve

[Griffith, 2007; http://wikiscanner.virgil.gr/]

Why is Scientology an uncontroversial organisation?

Usage tracking, feature construction by table lookup
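A minimal sketch of feature construction by table lookup: each anonymous edit carries an IP address, and a table of address ranges adds the owning organisation as a new attribute. This only illustrates the idea; it is not the WikiScanner implementation. The table, the organisation names, and the edits are invented and use reserved documentation address ranges.

```python
import ipaddress

# Hypothetical lookup table: address range -> organisation.
ORG_RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "Example Org A"),
    (ipaddress.ip_network("198.51.100.0/24"), "Example Org B"),
]

def organisation_for(ip_string):
    """Return the organisation owning the address, or None if unknown."""
    ip = ipaddress.ip_address(ip_string)
    for network, organisation in ORG_RANGES:
        if ip in network:
            return organisation
    return None

# Invented edit log: (article title, editor IP address).
edits = [("Article X", "192.0.2.17"), ("Article Y", "203.0.113.5")]

# The lookup result becomes a new feature of every edit record.
labelled = [(title, ip, organisation_for(ip)) for title, ip in edits]
```

A linear scan is enough for a small table; a large table of address ranges would call for a sorted structure or a prefix tree.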

25

DM for exploring how multiple perspectives evolve

26

Metaphor 5

27

The Internet is a library

28

... but you’re a document too!

[Owad, 2006; www.applefritter.com/bannedbooks]

Where do people live who will buy the Qur’an soon?

29

DM for demonstrating the Internet’s inference capabilities (how to create that book map)

Attribute matching in different schemas, view construction

30

DM for articulation and reflection

Repetition Organisation Elaboration

[Berendt, in Neues Handbuch Hochschullehre, 2006]

Proxy server

Logfile

ASP

Usage tracking, semantic graph coarsening

31

A conclusion ... and a vision

32

A conclusion ... and a vision

New happiness factors:

yay 86.67

shopping 79.56

awesome 79.71

learning 86.67

understanding 79.56

democracy 79.71

33

Caveat 1: Data preparation

One approach:

Tools for active (interactive) wrapper learning

34

Caveat 2: “Digging and surfing”

Reductive understanding is not always adequate and/or desired

Person

Context

Task

...

One approach: Treat it as a competency

35

Caveat 3: Cultural/economic bias

Land area, Population, Internet users

[www.worldmapper.org]

36

… Questions? Comments? Other?

Thank you …
