Lecture 5: Automatic cluster detection. Lecture 6: Artificial neural networks. Lecture 7: Evaluation of discovered knowledge. Brief introduction to lectures. Transparencies prepared by Ho Tu Bao [JAIST].


Page 1

Lecture 5: Automatic cluster detection
Lecture 6: Artificial neural networks
Lecture 7: Evaluation of discovered knowledge

Brief introduction to lectures

Transparencies prepared by Ho Tu Bao [JAIST]

Page 2

Lecture 5: Automatic Cluster Detection

• One of the most widely used KDD techniques for unsupervised data (objects are grouped without predefined class labels).

• Content of the lecture
  1. Introduction
  2. Partitioning clustering
  3. Hierarchical clustering
  4. Software and case-studies

• Prerequisite: Nothing special

Page 3

Partitioning Clustering

A partition of a set of n objects X = {x1, x2, ..., xn} is a collection of K disjoint non-empty subsets P1, P2, ..., PK of X (K ≤ n), often called clusters, satisfying the following conditions:

(1) they are disjoint: Pi ∩ Pj = ∅ for all Pi and Pj, i ≠ j
(2) their union is X: P1 ∪ P2 ∪ ... ∪ PK = X

• Each cluster must contain at least one object
• Each object must belong to exactly one group

Denote the partition P = {P1, P2, ..., PK}; P1, ..., PK are called the components of P.
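As an aside (not part of the original transparencies), the two conditions translate directly into a small check. The following Python sketch, with our own hypothetical function name `is_partition`, verifies that a collection of subsets is non-empty, pairwise disjoint, and covers X:

```python
def is_partition(X, components):
    """Check that `components` is a partition of X:
    non-empty, pairwise disjoint, and covering all of X."""
    if any(len(P_i) == 0 for P_i in components):
        return False                      # each cluster must contain at least one object
    union = set()
    for P_i in components:
        if union & set(P_i):              # overlap with an earlier component
            return False                  # components must be disjoint
        union |= set(P_i)
    return union == set(X)                # the union must be exactly X

# Example partition of X = {x1, ..., x10}, written with the integers 1..10
X = range(1, 11)
P = [{1, 4, 7, 9, 10}, {2, 8}, {3, 5, 6}]
print(is_partition(X, P))                 # True
```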

Page 4

Partitioning Clustering

What is a "good" partitioning clustering?

Key ideas: Objects in each group are similar, and objects in different groups are dissimilar.

Minimize the within-group distance and maximize the between-group distance.

Notice: There are many ways to define the "within-group distance" (e.g. the average distance to the group's center, or the average distance between all pairs of objects) and the "between-group distance". It is in general infeasible to find the optimal clustering, because the number of possible partitions grows extremely quickly with n.

Example: P = {P1, P2, P3} = {{x1, x4, x7, x9, x10}, {x2, x8}, {x3, x5, x6}} is a partition of X = {x1, ..., x10} into three clusters.
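To make "within-group" and "between-group" distance concrete, here is a small illustration in Python (ours, not the lecture's code), using one of the possible definitions mentioned above: the average distance of a group's objects to its centroid, and the distance between group centroids. The function names are our own.

```python
import math

def centroid(points):
    """Mean of a list of d-dimensional points."""
    d = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(d)]

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def within_group_distance(group):
    """Average distance of the group's objects to the group centroid."""
    c = centroid(group)
    return sum(euclid(p, c) for p in group) / len(group)

def between_group_distance(g1, g2):
    """Distance between the two group centroids."""
    return euclid(centroid(g1), centroid(g2))

# Toy 2-D data: a good clustering keeps within-group distances small
# and between-group distances large.
g1 = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)]
g2 = [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
print(within_group_distance(g1), within_group_distance(g2))
print(between_group_distance(g1, g2))
```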

Page 5

Hierarchical Clustering

A hierarchical clustering is a sequence of partitions in which each partition is nested into the next partition in the sequence.

Partition Q is nested into partition P if every component of Q is a subset of a component of P.

(This definition is for bottom-up hierarchical clustering. In the case of top-down hierarchical clustering, "next" becomes "previous".)

P = {x1, x4, x7, x9, x10}, {x2, x8}, {x3, x5, x6}
Q = {x1, x4, x9}, {x7, x10}, {x2, x8}, {x3, x5}, {x6}
(Q is nested into P: every component of Q is contained in a component of P.)
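The nesting condition can be checked mechanically. A minimal sketch (our own helper name `is_nested`), using the example above with the objects x1..x10 written as the integers 1..10:

```python
def is_nested(Q, P):
    """Return True if partition Q is nested into partition P,
    i.e. every component of Q is a subset of some component of P."""
    return all(any(q <= p for p in P) for q in Q)

P = [{1, 4, 7, 9, 10}, {2, 8}, {3, 5, 6}]
Q = [{1, 4, 9}, {7, 10}, {2, 8}, {3, 5}, {6}]
print(is_nested(Q, P))   # True:  Q refines P
print(is_nested(P, Q))   # False: P does not refine Q
```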

Page 6

Bottom-up Hierarchical Clustering

{x1, x2, x3, x4, x5, x6}
{x1, x2, x3, x4}, {x5, x6}
{x1, x2, x3}, {x4}, {x5, x6}
{x1, x2}, {x3}, {x4}, {x5}, {x6}
{x1}, {x2}, {x3}, {x4}, {x5}, {x6}

[Figure: dendrogram over x1 x2 x3 x4 x5 x6, read from the singletons upward through these successive merges]
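Such a sequence of nested partitions is what a bottom-up (agglomerative) algorithm produces: start from singletons and repeatedly merge the closest pair of clusters. The sketch below is our own illustration using single-link distance on a one-dimensional toy data set; the lecture does not prescribe a particular linkage or implementation.

```python
def single_link_distance(c1, c2, dist):
    """Single-link distance: closest pair of objects across the two clusters."""
    return min(dist[a][b] for a in c1 for b in c2)

def agglomerative(objects, dist):
    """Return the sequence of nested partitions, from all singletons
    up to one cluster containing every object."""
    clusters = [frozenset([o]) for o in objects]
    sequence = [list(clusters)]
    while len(clusters) > 1:
        # find the pair of clusters with the smallest single-link distance
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: single_link_distance(clusters[ab[0]], clusters[ab[1]], dist),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        sequence.append(list(clusters))
    return sequence

# Toy distance matrix over objects x1..x6 (points on a line)
names = ["x1", "x2", "x3", "x4", "x5", "x6"]
coords = {"x1": 0.0, "x2": 0.5, "x3": 1.2, "x4": 2.5, "x5": 6.0, "x6": 6.3}
dist = {a: {b: abs(coords[a] - coords[b]) for b in names} for a in names}

for level in agglomerative(names, dist):
    print([sorted(c) for c in level])
```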

Page 7

Top-Down Hierarchical Clustering

{x1, x2, x3, x4, x5, x6}
{x1, x2, x3, x4}, {x5, x6}
{x1, x2, x3}, {x4}, {x5, x6}
{x1, x2}, {x3}, {x4}, {x5}, {x6}
{x1}, {x2}, {x3}, {x4}, {x5}, {x6}

[Figure: the same dendrogram over x1 x2 x3 x4 x5 x6, now read from the whole set downward as successive splits]

Page 8

OSHAM: Hybrid Model

[Figure: concept hierarchy discovered by OSHAM from the Wisconsin Breast Cancer data, showing the attributes, the discovered concepts, brief descriptions of the concepts, and multiple-inheritance concepts]

Page 9

Lecture 1: Overview of KDD
Lecture 2: Preparing data
Lecture 3: Decision tree induction
Lecture 4: Mining association rules
Lecture 5: Automatic cluster detection
Lecture 6: Artificial neural networks
Lecture 7: Evaluation of discovered knowledge

Brief introduction to lectures

Page 10

Lecture 6: Neural networks

• One of the most widely used KDD classification techniques.
• Content of the lecture
  1. Neural network representation
  2. Feed-forward neural networks
  3. Using the back-propagation algorithm (a minimal example is sketched below)
  4. Case-studies
• Prerequisite: Nothing special
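As a preview of items 2 and 3 (not code from the lecture), the sketch below trains a tiny feed-forward network with back-propagation on the XOR problem using NumPy; the layer sizes, learning rate, and number of epochs are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: a classic problem that a single-layer network cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer with 4 units, one output unit
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)
lr = 0.5

for epoch in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)          # hidden activations
    out = sigmoid(h @ W2 + b2)        # network output

    # backward pass (gradients of squared error, using sigmoid' = s(1 - s))
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # gradient-descent updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(np.round(out, 2))   # should approach [[0], [1], [1], [0]]
```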

Page 11

Lecture 1: Overview of KDD
Lecture 2: Preparing data
Lecture 3: Decision tree induction
Lecture 4: Mining association rules
Lecture 5: Automatic cluster detection
Lecture 6: Artificial neural networks
Lecture 7: Evaluation of discovered knowledge

Brief introduction to lectures

Page 12

Lecture 7: Evaluation of discovered knowledge

• Evaluation is an essential step when applying KDD classification techniques: it estimates how well a discovered model performs on unseen data.
• Content of the lecture
  1. Cross validation
  2. Bootstrapping (a small sketch follows below)
  3. Case-studies
• Prerequisite: Nothing special
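As an illustration of item 2 (not the lecture's code), the sketch below estimates error by bootstrapping: resample the data with replacement, train on the resample, and test on the objects left out. The names `bootstrap_error`, `train`, and `error` are our own placeholders for any induction method and error measure.

```python
import random

def bootstrap_error(data, train, error, n_resamples=200, seed=0):
    """Out-of-bag error estimate: train on a resample drawn with replacement,
    test on the objects that did not appear in the resample, and average."""
    rnd = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        sample = [rnd.choice(data) for _ in range(len(data))]
        in_sample = set(id(x) for x in sample)
        out_of_bag = [x for x in data if id(x) not in in_sample]
        if not out_of_bag:
            continue
        model = train(sample)
        estimates.append(error(model, out_of_bag))
    return sum(estimates) / len(estimates)

# Toy usage: the "model" is the training-set mean, the error is mean absolute deviation
data = [1.0, 1.2, 0.9, 5.0, 1.1, 0.95, 1.05]
train = lambda s: sum(s) / len(s)
error = lambda m, test: sum(abs(x - m) for x in test) / len(test)
print(bootstrap_error(data, train, error))
```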

Page 13

Out-of-sample testing

[Figure: out-of-sample testing flow. Historical data (warehouse) → sampling method → sample data; the sample is split by a sampling method into training data (2/3) and testing data (1/3); an induction method builds a model from the training data, and error estimation on the testing data yields the error.]

The quality of the test-sample estimate depends on the number of test cases and on the validity of the independence assumption.
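A minimal sketch of the hold-out procedure shown in the diagram, with the 2/3 : 1/3 split; the function name is our own, and `train`/`error` stand for any induction method and error measure.

```python
import random

def out_of_sample_error(data, train, error, train_fraction=2/3, seed=0):
    """Hold-out estimate: train on ~2/3 of the data, test on the remaining ~1/3."""
    rnd = random.Random(seed)
    shuffled = data[:]
    rnd.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    training_data, testing_data = shuffled[:cut], shuffled[cut:]
    model = train(training_data)          # induction method
    return error(model, testing_data)     # error estimation on unseen cases
```

The larger the testing set, the more reliable the estimate, which is the dependence on the number of test cases noted above.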

Page 14

Cross Validation

[Figure: cross-validation flow. Historical data (warehouse) → sampling method → sample data; the sample is split by a sampling method into n samples (Sample 1, Sample 2, ..., Sample n) that are mutually exclusive and of equal size; the induction method and error estimation are iterated over the samples, each run producing a model and the run's error, and the run errors are combined into the overall error estimate.]

10-fold cross validation appears adequate (n = 10).
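A minimal sketch of n-fold cross validation as described in the diagram (our own code; `train` and `error` are placeholders, as in the hold-out sketch above). The folds are mutually exclusive and of nearly equal size, and the run errors are averaged.

```python
import random

def k_fold_error(data, train, error, k=10, seed=0):
    """k-fold cross validation: split into k mutually exclusive folds of
    (nearly) equal size, train on k-1 folds, test on the held-out fold,
    and average the k error estimates."""
    rnd = random.Random(seed)
    shuffled = data[:]
    rnd.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    estimates = []
    for i in range(k):
        testing_data = folds[i]
        training_data = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train(training_data)
        estimates.append(error(model, testing_data))
    return sum(estimates) / k
```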

Page 15

Evaluation: k-fold cross validation (k = 3)

Given a data set and a method to be evaluated:
• randomly split the data set into 3 subsets of equal size;
• in each run, use 2 of the subsets as training data to find knowledge;
• test on the remaining subset as testing data to evaluate the accuracy;
• average the accuracies as the final evaluation.

[Figure: the data set split into subsets 1, 2, 3; three runs, each training on two of the subsets and testing on the remaining one.]
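For comparison, the same 3-fold procedure can be run with an off-the-shelf toolkit. The snippet below uses scikit-learn and its bundled Iris data set purely as an illustration; it is not the software referred to in the lectures.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=3)
print(scores, scores.mean())   # per-fold accuracies and their average
```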

Page 16

Outline of the presentation

Objectives, Prerequisite and Content

Brief Introduction to Lectures

Discussion and Conclusion

This presentation summarizes the content and organization of the lectures in the module "Knowledge Discovery and Data Mining".