Data Mining
Data Mining Lab., Univ. of Seoul, Copyright 2008
Data, information, knowledge

[Figure: Data, Information, Knowledge. Knowledge levels: shallow knowledge, OLAP knowledge, hidden knowledge. Tools: SQL, Data Warehousing, OLAP, Data Mining.]
Data Analysis and OLAP
Online Analytical Processing (OLAP)
- Interactive analysis of data, allowing data to be summarized and viewed in different ways in an online fashion (with negligible delay).
- Data that can be modeled as dimension attributes and measure attributes are called multidimensional data.
- Measure attributes
  - measure some value
  - can be aggregated upon
  - e.g. the attribute number of the sales relation
- Dimension attributes
  - define the dimensions on which measure attributes (or aggregates thereof) are viewed
  - e.g. the attributes item_name, color, and size of the sales relation

Cross Tabulation of sales by item-name and color
[Table: cross-tab of sales by item-name and color]
- The table is an example of a cross-tabulation (cross-tab), also referred to as a pivot-table.
- Values for one of the dimension attributes form the row headers.
- Values for another dimension attribute form the column headers.
- Other dimension attributes are listed on top.
- Values in individual cells are (aggregates of) the measure attribute for the dimension-attribute values that specify the cell.

Data Cube
- A data cube is a multidimensional generalization of a cross-tab.
- Can have n dimensions (the slide's figure shows 3).
- Cross-tabs can be used as views on a data cube.

Online Analytical Processing
- Pivoting: changing the dimensions used in a cross-tab.
- Slicing: creating a cross-tab for fixed values only. Sometimes called dicing, particularly when values for multiple dimensions are fixed.
- Rollup: moving from finer-granularity data to a coarser granularity.
- Drill down: the opposite operation, that of moving from coarser-granularity data to finer-granularity data.
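The cross-tab, slicing, and rollup operations described above can be sketched in a few lines of plain Python. This is only an illustration: the rows in `SALES` are hypothetical, and the helper names `aggregate` and `slice_rows` are made up for this sketch. The attribute names (item_name, color, size, number) are the ones used for the sales relation.

```python
from collections import defaultdict

# Hypothetical rows of the sales relation: (item_name, color, size, number).
# The first three are dimension attributes; 'number' is the measure attribute.
SALES = [
    ("skirt", "dark", "small", 2),
    ("skirt", "dark", "medium", 6),
    ("skirt", "pastel", "small", 11),
    ("dress", "dark", "medium", 20),
    ("dress", "pastel", "large", 10),
    ("shirt", "dark", "large", 14),
]
DIMENSIONS = ("item_name", "color", "size")


def aggregate(rows, dims):
    """Sum the measure 'number', grouped by the chosen dimension attributes."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)


def slice_rows(rows, **fixed):
    """Slicing: keep only rows whose dimension values equal the fixed values."""
    return [
        r for r in rows
        if all(r[DIMENSIONS.index(d)] == v for d, v in fixed.items())
    ]


# Cross-tab of sales by item_name and color (size is aggregated away).
cross_tab = aggregate(SALES, ("item_name", "color"))

# Rollup: move to a coarser granularity by dropping the color dimension.
rollup = aggregate(SALES, ("item_name",))

# Slice: fix color = "dark", then view by item_name and size.
dark_only = aggregate(slice_rows(SALES, color="dark"), ("item_name", "size"))
```

Drill down is the reverse of the rollup step: going from `rollup` back to `cross_tab` adds the color dimension again, which requires the finer-granularity rows.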
Data Warehousing

Data Warehouse Schema

Knowledge Discovery in Large Databases, Data Mining
Data Mining
[Figure: Entertainment category tree (Yahoo). Node labels: Comic & Animation, Movie & Film, Film Festival, Film Making, Animation, Computer Animation, Festival, Anime, Comic Books, Editorial Cartoons, Magazine, Comic Strip, News & Media, Magazine, Short Films, Screen Writing, Animated Gifs, Magazine, Conventions, Cartoonist, Review, Manga, History. Annotations: Manual, Automatic.]
Data Mining Application
- eCRM: push
- Event detection
Data Mining
- Classification
- Clustering
- Outlier Detection
- Association Rule Mining
- Sequential Pattern Mining
Data Mining Pattern
- Classification: IF age < 25 AND salary > 40,000 THEN sports cars (Prediction Model)
- Clustering: A: age = 30s AND job = IT AND address = Seoul (Description Model)
- Association rules: 98% of people who purchase diapers also buy beer
- Sequential pattern
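The classification pattern above (IF age < 25 and salary > 40,000 THEN sports cars) can be read as a predicate applied to each customer record. The following is a minimal sketch: the function name `predicts_sports_car` and the customer records are hypothetical and serve only to show how such a rule acts as a prediction model.

```python
def predicts_sports_car(age, salary):
    """Rule from the slide: IF age < 25 AND salary > 40,000 THEN sports cars."""
    return age < 25 and salary > 40_000


# Hypothetical customer records, for illustration only.
customers = [
    {"name": "A", "age": 23, "salary": 52_000},
    {"name": "B", "age": 23, "salary": 30_000},
    {"name": "C", "age": 41, "salary": 90_000},
]

# Customers for whom the rule predicts interest in sports cars.
targets = [
    c["name"] for c in customers
    if predicts_sports_car(c["age"], c["salary"])
]
```

Only customer A satisfies both conditions, so the rule selects A alone.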
Data Mining: Classification (Categorization)
[Figure labels: Learning, Model, Training Data, Unknown Data, Classification. Classes: least loyal, common, profitable.]

Classification and Machine Learning
[Figure labels: Pattern, Model (Intelligence). Classes: least loyal, common, profitable.]

Classification and Machine Learning
[Figure labels: Learner, Classifier, Observed training data, Unknown data, Model, Categorized data, Model of good credit (25 = 50 ).]
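The learner/classifier flow in the figure (training data goes into a learner, the learner produces a model, and a classifier uses the model to categorize unknown data) can be sketched as two functions. The slides do not name a learning algorithm, so this sketch swaps in a nearest-centroid classifier as a stand-in. The features (age, income in thousands), the labels, and the training rows are all hypothetical.

```python
import math
from collections import defaultdict


def learn(training_data):
    """Learner: build a model (one centroid per class) from labeled examples."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), label in training_data:
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    return {
        label: (s[0] / counts[label], s[1] / counts[label])
        for label, s in sums.items()
    }


def classify(model, point):
    """Classifier: assign the label whose centroid is nearest to the point."""
    return min(model, key=lambda label: math.dist(model[label], point))


# Hypothetical observed training data: ((age, income in thousands), credit label).
TRAINING = [
    ((30, 60), "good"),
    ((40, 80), "good"),
    ((22, 15), "bad"),
    ((60, 10), "bad"),
]

model = learn(TRAINING)

# Unknown data is categorized with the learned model.
label_1 = classify(model, (35, 65))
label_2 = classify(model, (45, 12))
```

The split into `learn` and `classify` mirrors the figure: the model is the only thing passed from the learning phase to the classification phase.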
Classification: Web directory-based Search Engine

Classification: CRM (Customer Relationship Management)
- Direct Mail Marketing

Classification
- 450,000 ~ 545,000
Clustering
- Young urban career women
- Teenager having a computer
Clustering
- Summarization of large data: understand the large customer data
- Data organization: manage the large customer data
- Outlier detection: find unusual customer data
- Classification/Association Rule Mining

Clustering: Classification/Association Rule Mining
- Cluster, class
- Cluster, Association Rule Mining

Clustering

Clustering: Clusty.com (Vivisimo Inc.)
Association (Rule) Mining
- Basket Analysis

Association (Rule) Mining
Rule: $X \Rightarrow Y$
- Support (statistical significance): $|X \cup Y| / N$
- Confidence (accuracy): $|X \cup Y| / |X|$
Here $|S|$ is the number of transactions that contain every item of $S$, and $N$ is the total number of transactions.

Association Rules
Example:
- 1 => 3 with 50% support and 66% confidence
- 3 => 1 with 50% support and 100% confidence
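The support and confidence figures in the example can be checked with a short sketch. The four transactions below are the ones listed in the deck's Sheet1 table ({1, 2, 3}, {1, 4}, {1, 3}, {2, 5, 6}). The function names `count`, `support`, and `confidence` are chosen here for illustration.

```python
# Transactions from the deck's Sheet1 table (transaction ids 1 to 4).
TRANSACTIONS = [{1, 2, 3}, {1, 4}, {1, 3}, {2, 5, 6}]


def count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """Support of the rule X => Y: |X u Y| / N."""
    return count(x | y, transactions) / len(transactions)


def confidence(x, y, transactions):
    """Confidence of the rule X => Y: |X u Y| / |X|."""
    return count(x | y, transactions) / count(x, transactions)


s_13 = support({1}, {3}, TRANSACTIONS)      # rule 1 => 3
c_13 = confidence({1}, {3}, TRANSACTIONS)
s_31 = support({3}, {1}, TRANSACTIONS)      # rule 3 => 1
c_31 = confidence({3}, {1}, TRANSACTIONS)
```

Items 1 and 3 appear together in 2 of the 4 transactions, giving 50% support for both rules. Item 1 appears in 3 transactions, so 1 => 3 has confidence 2/3 (the slide's 66%), while item 3 appears in 2, so 3 => 1 has confidence 2/2 = 100%.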
Association Rules: Apriori Algorithm
- Li: large item set
- Ci: candidate item set
- Example rule: B, C => E (support 0.5, confidence 1.0)
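The alternation between candidate item sets (Ci) and large item sets (Li) can be sketched as below. This is a simplified, unoptimized version of the Apriori idea, not the exact procedure from the slide's figure. It runs on the four transactions from the deck's Sheet1 table rather than on the slide's B, C, E example, whose transactions are not available here. The minimum support of 0.5 is an assumption for the demonstration.

```python
from itertools import combinations

# Transactions from the deck's Sheet1 table.
TRANSACTIONS = [frozenset(t) for t in ({1, 2, 3}, {1, 4}, {1, 3}, {2, 5, 6})]


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def frequent(candidates):
        # Keep the candidates (Ck) that are large, i.e. meet min_support (Lk).
        result = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                result[c] = s
        return result

    items = {item for t in transactions for item in t}
    large = frequent({frozenset([i]) for i in items})  # L1
    all_large = dict(large)
    k = 2
    while large:
        # Join step: unions of two large (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in large for b in large if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in large for s in combinations(c, k - 1))
        }
        large = frequent(candidates)  # Lk
        all_large.update(large)
        k += 1
    return all_large


large_itemsets = apriori(TRANSACTIONS, min_support=0.5)
```

With these transactions, L1 is {1}, {2}, {3} and L2 is {1, 3}; no 3-item candidate can be formed, so the loop stops. Rules such as 1 => 3 are then derived from the large item sets by computing confidence.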
Association Rule: 47, 80%, 74%

Summary

Data Mining Process
- Interactive, iterative, ongoing processing
Data Mining
- Classification: machine learning approach, supervised
- Clustering: unsupervised
- Association Rule Mining: unsupervised
Data Mining
- Retail/Marketing: shelf planning, supermarket inventory planning
- Banking: identify "loyal" customers
- Insurance: claim analysis, identify risky customers
- Medicine: history-based identification
- DM

Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Visualization, Information Science, Other Disciplines.]
Data Mining, Text Mining

Text Mining
Differences from data mining
- Analyzes both raw data and textual information at the same time
- Requires complicated FEATURE SELECTION technologies
- May include linguistic, lexical, and contextual techniques

Text Mining
Feature Selection
- Stopwords
- Zipf's Law
- DF (document frequency)-based
- $\chi^2$ statistics-based
- Mutual Information
- Term Strength
- etc.
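Two of the feature selection methods listed above, stopword removal and DF (document frequency)-based selection, can be sketched together. The documents, the stopword list, and the threshold `min_df=2` are hypothetical and chosen only to keep the example small. The other listed methods (chi-square statistics, mutual information, term strength) are not shown.

```python
from collections import Counter

# Hypothetical stopword list and documents, for illustration only.
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to"}

DOCS = [
    "the price of the stock is rising",
    "stock market news and analysis",
    "the team won the football match",
    "football news of the week",
]


def document_frequency(docs, stopwords):
    """Count, for each non-stopword term, the number of documents containing it."""
    df = Counter()
    for doc in docs:
        # A set ensures each term is counted at most once per document.
        terms = set(doc.lower().split()) - stopwords
        df.update(terms)
    return df


def select_features(docs, stopwords, min_df=2):
    """Keep terms that appear in at least min_df documents."""
    df = document_frequency(docs, stopwords)
    return sorted(term for term, n in df.items() if n >= min_df)


features = select_features(DOCS, STOPWORDS)
```

Terms that occur in only one document (such as "price" or "week") are dropped, which shrinks the feature space before classification or clustering.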
Text Mining: Text Classification
[Figure: Feature Extraction yields weighted terms (weights 0.191149, 0.134847, 0.114641, 0.109833, 0.099062, 0.084554, ...), which feed Classification.]

Text Clustering

Association Rule
XML Schema Version of Bank DTD
... definitions of customer and depositor ...
Sheet: Sheet1
Transaction Id | Purchased Items
1 | {1, 2, 3}
2 | {1, 4}
3 | {1, 3}
4 | {2, 5, 6}