Data Mining

김한준 (Han-joon Kim), University of Seoul
Data Mining Lab., Univ. of Seoul, Copyright © 2008


Data, Information, Knowledge
  Shallow knowledge: SQL
  OLAP knowledge: Data Warehousing, OLAP
  Hidden knowledge: Data Mining

Data Analysis and OLAP
  Online Analytical Processing (OLAP)
    Interactive analysis of data, allowing data to be summarized and viewed in different ways in an online fashion (with negligible delay)
  Data that can be modeled as dimension attributes and measure attributes are called multidimensional data.
    Measure attributes
      measure some value
      can be aggregated upon
      e.g. the attribute number of the sales relation
    Dimension attributes
      define the dimensions on which measure attributes (or aggregates thereof) are viewed
      e.g. the attributes item_name, color, and size of the sales relation

Cross Tabulation of sales by item-name and color

The table above is an example of a cross-tabulation (cross-tab), also referred to as a pivot-table.
  Values for one of the dimension attributes form the row headers
  Values for another dimension attribute form the column headers
  Other dimension attributes are listed on top
  Values in individual cells are (aggregates of) the values of the measure attribute for the dimension values that specify the cell

Data Cube

A data cube is a multidimensional generalization of a cross-tab
  Can have n dimensions; we show 3 below
  Cross-tabs can be used as views on a data cube

Online Analytical Processing
  Pivoting: changing the dimensions used in a cross-tab
  Slicing: creating a cross-tab for fixed values only
    Sometimes called dicing, particularly when values for multiple dimensions are fixed
  Rollup: moving from finer-granularity data to a coarser granularity
  Drill down: the opposite operation, moving from coarser-granularity data to finer-granularity data
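The cross-tab and the OLAP operations above can be sketched in a few lines of Python. This is only an illustration: the rows of the sales relation are made-up data, and the helper names cross_tab, slice_rows, and rollup are hypothetical. Only the attribute names item_name, color, size, and number come from the slides.

```python
from collections import defaultdict

# Hypothetical rows of the sales relation: (item_name, color, size, number).
# item_name, color, and size are dimension attributes; number is the measure.
SALES = [
    ("skirt", "dark", "small", 2),
    ("skirt", "dark", "large", 6),
    ("skirt", "pastel", "medium", 5),
    ("dress", "dark", "medium", 4),
    ("dress", "white", "small", 3),
    ("shirt", "white", "large", 7),
]
DIMS = {"item_name": 0, "color": 1, "size": 2}


def cross_tab(rows, row_dim, col_dim):
    """Sum the measure for every (row_dim, col_dim) pair of dimension values."""
    table = defaultdict(int)
    for row in rows:
        table[(row[DIMS[row_dim]], row[DIMS[col_dim]])] += row[3]
    return dict(table)


def slice_rows(rows, **fixed):
    """Slicing/dicing: keep only rows whose dimensions match the fixed values."""
    return [r for r in rows if all(r[DIMS[d]] == v for d, v in fixed.items())]


def rollup(rows, dim):
    """Rollup: aggregate away every dimension except dim (coarser granularity)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[DIMS[dim]]] += row[3]
    return dict(totals)


by_item_color = cross_tab(SALES, "item_name", "color")  # cross-tab view of the cube
by_item_size = cross_tab(SALES, "item_name", "size")    # pivoting: different dimensions
dark_only = cross_tab(slice_rows(SALES, color="dark"), "item_name", "size")  # slice
per_item = rollup(SALES, "item_name")                   # rollup to item level

print(by_item_color)
print(per_item)
```

Drill down is the reverse direction: going from per_item back to by_item_color requires the finer-granularity rows, which is why the sketch keeps the base relation around instead of storing only the aggregates.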

Data Warehousing

Data Warehouse Schema

Knowledge Discovery in Large Databases, Data Mining


Data Mining
  Entertainment (Yahoo)
    Comic & Animation, Movie & Film, Film Festival, Film Making, Animation, Computer Animation, Festival, Anime, Comic Books, Editorial Cartoons, Magazine, Comic Strip, News & Media, Magazine, Short Films, Screen Writing, Animated Gifs, Magazine, Conventions, Cartoonist, Review, Manga, History
  Manual, Automatic


Data Mining Application
  eCRM: push
  Event detection


Data Mining
  Classification
  Clustering
  Outlier Detection
  Association Rule Mining
  Sequential Pattern Mining

Data Mining Pattern
  Classification
    IF age < 25 and salary > 40,000 THEN sports cars
    Prediction model
  Clustering
    A: age = 30s and job = IT and address = Seoul
    Description model
  Association rules
    98% of people who purchase diapers also buy beer
  Sequential pattern

Data Mining: Classification (Categorization)
[Figure labels: Learning, Model, Training Data, Unknown Data, Classification; customer classes: least loyal, common, profitable]

Classification and Machine Learning
  Pattern, Model (Intelligence)
  Customer classes: least loyal, common, profitable

Classification and Machine Learning
[Figure labels: Observed training data, Learner, Model, Unknown data, Classifier, Categorized data]
  Model of good credit (25 = 50 )
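The learner/classifier split named on this slide can be sketched as two functions: one that turns labeled training data into a model, and one that applies the model to unknown data. The sketch below uses a nearest-centroid model, which is a choice made here for brevity and not something the slides specify. The two numeric features and all training points are invented for illustration; only the class labels come from the slides.

```python
from collections import defaultdict
from math import dist

# Hypothetical training data: ((feature_1, feature_2), class label).
TRAINING_DATA = [
    ((2, 1), "least loyal"), ((1, 2), "least loyal"),
    ((5, 5), "common"), ((6, 4), "common"),
    ((9, 9), "profitable"), ((10, 8), "profitable"),
]


def learn(training_data):
    """Learner: build a model (one centroid per class) from labeled examples."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), label in training_data:
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    return {label: (s[0] / counts[label], s[1] / counts[label])
            for label, s in sums.items()}


def classify(model, point):
    """Classifier: assign the class whose centroid is closest to the point."""
    return min(model, key=lambda label: dist(model[label], point))


model = learn(TRAINING_DATA)
print(classify(model, (8, 9)))
print(classify(model, (2, 2)))
```

The model is the only thing passed from the learning step to the classification step, which mirrors the figure: training data feeds the learner, and unknown data feeds the classifier.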

Classification: Web directory-based Search Engine

Classification: CRM (Customer Relationship Management)
  Direct Mail Marketing

Classification
  450,000 ~ 545,000


Clustering
  Young urban career women
  Teenager having a computer

Clustering
  Summarization of large data
    Understand the large customer data
  Data organization
    Manage the large customer data
  Outlier detection
    Find unusual customer data
  Classification/Association Rule Mining

Clustering
  Classification/Association Rule Mining
  cluster, class
  Cluster, Association Rule Mining

Clustering
  Clusty.com (Vivisimo, Inc.)


Association (Rule) Mining
  Basket Analysis

Association (Rule) Mining
  X => Y
  Support (statistical significance): |X ∪ Y| / N
  Confidence (accuracy): |X ∪ Y| / |X|
  Here |X ∪ Y| is the number of transactions containing every item of X and Y, |X| is the number containing every item of X, and N is the total number of transactions.

Association Rules
  Example:
    1 => 3 with 50% support and 66% confidence
    3 => 1 with 50% support and 100% confidence
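The two rules in this example can be checked against the four transactions listed in the deck's purchased-items sheet ({1, 2, 3}, {1, 4}, {1, 3}, {2, 5, 6}). The following is a minimal sketch; the function names support and confidence are chosen here and are not from the slides.

```python
# Transactions from the deck's purchased-items sheet (transaction id -> items).
TRANSACTIONS = {
    1: {1, 2, 3},
    2: {1, 4},
    3: {1, 3},
    4: {2, 5, 6},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Support of lhs and rhs together, divided by the support of lhs."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


for lhs, rhs in [({1}, {3}), ({3}, {1})]:
    s = support(lhs | rhs, TRANSACTIONS)
    c = confidence(lhs, rhs, TRANSACTIONS)
    print(f"{sorted(lhs)} => {sorted(rhs)}: support {s:.0%}, confidence {c:.1%}")
```

Items 1 and 3 appear together in 2 of 4 transactions, so both rules have 50% support. Item 1 appears in 3 transactions, which gives 2/3 confidence for 1 => 3 (the slide rounds this down to 66%); item 3 appears in 2 transactions, both with item 1, which gives 100% for 3 => 1.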

Association Rules: Apriori algorithm
  Li: Large Item Set
  Ci: Candidate Item Set
  B, C => E (support 0.5, confidence 1.0)
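The interplay of candidate itemsets (Ci) and large itemsets (Li) can be sketched as a level-wise loop. The transaction database below is an assumption, not taken from the slides: it is a small toy database chosen so that B, C => E comes out with support 0.5 and confidence 1.0, matching the figures quoted above. The minimum support of 0.5 is likewise assumed.

```python
from itertools import combinations

# Assumed toy database (not in the slides).
TRANSACTIONS = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]
MIN_SUPPORT = 0.5  # assumed threshold


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in TRANSACTIONS) / len(TRANSACTIONS)


def apriori():
    """Return {k: L_k}, where L_k is the set of large (frequent) k-itemsets."""
    items = sorted(set().union(*TRANSACTIONS))
    candidates = [frozenset([i]) for i in items]  # C_1
    large, k = {}, 1
    while candidates:
        level = {c for c in candidates if support(c) >= MIN_SUPPORT}  # L_k
        if not level:
            break
        large[k] = level
        # C_(k+1): join L_k with itself, then prune any candidate that has
        # a k-subset outside L_k (the Apriori property).
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k))]
        k += 1
    return large


large = apriori()
rule_support = support({"B", "C", "E"})
rule_confidence = rule_support / support({"B", "C"})
print(large)
print(rule_support, rule_confidence)
```

The pruning step is what keeps the candidate sets small: {A, B, C} and {A, C, E} are never counted against the database because {A, B} and {A, E} already failed the support threshold at the previous level.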

47 80% 74% Association Rule

Summary

Data Mining Process
  Interactive, iterative, ongoing processing

Data Mining
  Classification
    Machine learning approach
    Supervised
  Clustering
    Unsupervised
  Association Rule Mining
    Unsupervised

Data Mining
  Retail/Marketing
    Shelf planning, supermarket inventory planning
  Banking
    Identify "loyal" customers
  Insurance
    Claim analysis
    Identify risky customers
  Medicine
    History, identify
  DM

Data Mining: Confluence of Multiple Disciplines
  Database Technology
  Statistics
  Other Disciplines
  Information Science
  Machine Learning
  Visualization

Data Mining, Text Mining

Text Mining
  Differences from data mining
    Analyzes both raw data and textual information at the same time
    Requires complicated feature selection technologies
    May include linguistic, lexical, and contextual techniques

Text Mining: Feature Selection
  Stopword removal
  Zipf's Law
  DF (document frequency)-based
  χ² statistics-based
  Mutual Information
  Term Strength
  etc.
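Two of the feature selection techniques listed here, stopword removal and document-frequency-based selection, are simple enough to sketch together. The mini-corpus, the stopword list, the min_df threshold, and the function names below are all hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

# Hypothetical mini-corpus and stopword list, for illustration only.
DOCS = [
    "the market rallied as the bank cut rates",
    "the bank raised rates and the market fell",
    "the team won the match in the final minute",
    "a late goal won the match for the team",
]
STOPWORDS = {"the", "a", "as", "and", "in", "for"}


def document_frequency(docs, stopwords):
    """Count, for each non-stopword term, the number of documents containing it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()) - stopwords)  # set(): count each document once
    return df


def select_features(docs, stopwords, min_df):
    """Keep the terms whose document frequency is at least min_df."""
    df = document_frequency(docs, stopwords)
    return sorted(term for term, count in df.items() if count >= min_df)


features = select_features(DOCS, STOPWORDS, min_df=2)
print(features)
```

Terms that occur in only one document are dropped as too rare to generalize, and stopwords are dropped regardless of frequency. The other listed criteria (χ² statistics, mutual information, term strength) would replace the simple count with a score that also uses class labels or document similarity.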

Text Mining: Text Classification
[Figure: Feature Extraction (values 0.191149, 0.134847, 0.114641, 0.109833, 0.099062, 0.084554, ...), then Classification]

Text Clustering

Association Rule

XML Schema Version of Bank DTD

.. definitions of customer and depositor .

Sheet: Sheet1
Transaction Id    Purchased Items
1                 {1, 2, 3}
2                 {1, 4}
3                 {1, 3}
4                 {2, 5, 6}