WEKA 3.6.1 Manual
AS714 Data Mining
DATA MINING TOOL: WEKA

Prepared by four students (IDs 5020428016, 5020428005, 5020428006, 5020428012)
AS 714, semester 1, academic year 2552
This report covers Knowledge Discovery in Databases (KDD), data mining, and data mining software, focusing on Weka, an open-source data mining package.
10 (DTM#2)
19 2552
1. Downloading WEKA

Step 1: Open http://www.cs.waikato.ac.nz/ml/weka/ in a web browser.
Step 2: Click Download.
Step 3: Choose the Windows version. Under the stable GUI version for Windows, click "here" to download weka-3-6-1jre.exe.
Step 4: The web browser opens the downloads page. If a pop-up appears, use the direct link or choose a mirror (Figure 3).
Click Use this mirror.
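Step 5: When the download starts, click Run to install WEKA right away, or click Save to store the installer on the hard disk.
If you click Save, weka-3-6-1jre.exe is saved so you can install it later. Click Cancel to stop the download.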
Install
Download
Download
2. Installing WEKA

Step 1: In this example, the Weka 3.6.1 installer was saved on drive G:. Open My Computer and go to G:\.
Step 2: In G:\, double-click weka-3-6-1jre.
Step 3: The Weka 3.6.1 setup wizard starts. Click Next.
Click I Agree.
Click Next to continue the installation.
Choose an installation folder on C:\ and click Next.
Click Install.
Cancel
J2SE
Choose Typical and click Accept.
C:\
Click Finish.
Weka 3.6.1
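3. About WEKA

WEKA (Waikato Environment for Knowledge Analysis) is software that can be downloaded free of charge under the GNU General Public License. It is written in Java, implements machine learning algorithms, and has a Graphical User Interface (GUI). WEKA can read data from:
1. ASCII files in ARFF, CSV, or C4.5 format
2. a URL
3. a database through JDBC

ARFF files
1. ARFF stands for Attribute-Relation File Format.
2. An ARFF file is an ASCII text file.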
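An ARFF file has this structure:
@relation name
@attribute att-name type
@data
where type is numeric, real, or a list of nominal values {v1, v2, ..., vn}.

To create an ARFF file:
- Create a text file, e.g., with Notepad.
- Write @relation relation_name.
- Write one @attribute att_name value line for each attribute.
- Write @data, then list the data after it, one instance per line with values separated by commas (e.g., 1,2,3,4).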
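As a rough illustration of this structure, the Python sketch below reads a small ARFF text into its relation name, attribute list, and data rows. It is not WEKA's own loader. `parse_arff` is a hypothetical helper that assumes simple files (unquoted names, no sparse rows, no string or date attributes) and treats lines starting with % as comments. The example input is a cut-down version of weather.arff.

```python
def parse_arff(text):
    """Split a simple ARFF document into relation name, attributes and data rows.

    Each attribute is returned as (name, type), where type is a string such as
    "numeric" or "real", or a list of values for a nominal attribute.
    """
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        keyword = line.split()[0].lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            name, spec = line.split(None, 2)[1:]
            spec = spec.strip()
            if spec.startswith("{"):  # nominal: {v1, v2, ..., vn}
                values = [v.strip() for v in spec.strip("{}").split(",")]
                attributes.append((name, values))
            else:  # numeric / real
                attributes.append((name, spec.lower()))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


example = """@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
"""
relation, attributes, rows = parse_arff(example)
print(relation)       # weather
print(attributes[0])  # ('outlook', ['sunny', 'overcast', 'rainy'])
print(rows[1])        # ['overcast', '83', 'yes']
```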
Example: the first rows of sample01.csv (the full file is listed in the Appendix):

ID,SEX,PASS/FAIL,Score,Class
1,M,Pass,45.5,B
2,F,Pass,56.78,B
3,M,Pass,89,A
4,F,Pass,77,A
5,M,Fail,32,C
6,F,Fail,12,D
7,M,Fail,35,C
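Weka shows univariate statistics for each attribute. What it shows depends on whether the attribute is Nominal or Numeric.
- SEX is a Nominal attribute, so Weka counts each value: M = 5, F = 5.
- Score is a Numeric attribute, so Weka shows summary statistics over its 10 values: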
Minimum = 10
Maximum = 89
Mean = 48.728
StdDev = 26.585
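These figures can be checked by hand. The sketch below assumes the sample standard deviation (divisor n - 1), which is the choice that reproduces the StdDev of 26.585. `numeric_summary` is a hypothetical helper, and the Score values come from sample01.csv in the Appendix.

```python
import math


def numeric_summary(values):
    """Minimum, maximum, mean and sample standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return {"min": min(values), "max": max(values),
            "mean": mean, "stddev": math.sqrt(variance)}


# Score column of sample01.csv (see the Appendix)
scores = [45.5, 56.78, 89, 77, 32, 12, 35, 62, 68, 10]
summary = numeric_summary(scores)
print({name: round(value, 3) for name, value in summary.items()})
# {'min': 10, 'max': 89, 'mean': 48.728, 'stddev': 26.585}
```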
Starting the WEKA 3.6.1 Explorer

Double-click the WEKA icon, or choose Start > Programs > Weka 3.6.1 > Weka 3.6.

The Weka GUI Chooser of WEKA 3.6.1 opens. It has two parts:
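Applications
1. Explorer: a GUI (Graphical User Interface) for exploring data
2. Experimenter: an environment for running experiments and comparing learning schemes
3. KnowledgeFlow: a drag-and-drop, data-flow interface to essentially the same functions as the Explorer
4. Simple CLI: a simple command-line interface for running WEKA commands directly

Menu bar
1. Program
- LogWindow: opens a log window that captures everything written to stdout and stderr.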
- Memory usage: displays how much memory WEKA is using.
- Exit: closes WEKA.

2. Visualization
- Plot: draws a 2D plot of a dataset.
- ROC: displays a previously saved ROC (receiver operating characteristic) curve.
- TreeVisualizer: displays directed graphs, such as a decision tree.
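- GraphVisualizer: visualizes graphs in XML BIF or DOT format, e.g., Bayesian networks.
- BoundaryVisualizer: visualizes classifier decision boundaries in two dimensions.

3. Tools
- ArffViewer: an MDI (Multiple Document Interface) application for viewing ARFF files in spreadsheet form.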
- SqlViewer: a worksheet for running SQL queries against a database.
- Bayes net editor: an application for editing, visualizing, and learning Bayes nets.

4. Help
- Weka homepage: opens the WEKA homepage in a browser (http://www.cs.waikato.ac.nz/~ml/weka/).
- HOWTOs, code snippets, etc.: opens the WEKA Wiki (http://weka.wiki.sourceforge.net/).
- Weka on Sourceforge: opens WEKA's project page on Sourceforge.net (http://sourceforge.net/projects/weka/).
- SystemInfo: lists information about the Java/WEKA environment, e.g., the CLASSPATH.
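4. Explorer

User interface: section tabs
1. Preprocess: choose and modify the data to work on.
2. Classify: train and test learning schemes that classify or perform regression.
3. Cluster: learn clusters for the data.
4. Associate: learn association rules for the data.
5. Select attributes: select the most relevant attributes in the data.
6. Visualize: view an interactive 2D plot of the data.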
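Other parts of the Explorer window:
- Status Box: shows messages about what is going on.
- Log Button: opens a separate window with a time-stamped log of actions.
- Weka status icon (the bird): moves around while a process is running.
- Graphical output

1. Preprocessing

Loading Data
1. Open file: opens a data file stored on the hard disk.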
2. Open URL: loads data from a URL address.
3. Open DB: loads data from a database.
4. Generate: generates artificial data. Choose a DataGenerator.
Working with filters

Weka's filters are divided into two groups:
- Supervised: attribute filters and instance filters
- Unsupervised: attribute filters and instance filters

For example, the Remove attribute filter deletes attributes (Figure 43).
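Example: in the Preprocess tab, click Open file and open weather.arff. The data have 5 attributes (outlook, temperature, humidity, windy, play) and 14 instances. Selecting outlook shows that it is a Nominal attribute with 3 values:
- sunny: 5
- overcast: 4
- rainy: 5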
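The counts shown for a nominal attribute are plain value frequencies. A minimal sketch, assuming the outlook column has been copied by hand from the data section of weather.arff:

```python
from collections import Counter

# outlook values of the 14 instances, in the order they appear in weather.arff
outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
           "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"]

counts = Counter(outlook)
for value in ["sunny", "overcast", "rainy"]:  # order of the @attribute declaration
    print(value, counts[value])
# sunny 5
# overcast 4
# rainy 5
```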
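Click Visualize All to see the distribution of every attribute.
In the Visualize tab, Weka shows a matrix of scatter plots. Set PlotSize and PointSize, then click Update.

Scatter Plot
Click Select Attributes to choose the attributes to plot (hold Ctrl to select more than one), then click Update.

Linear regression in Weka
- Open the Classify tab.
- Under Classifier, click Choose and select LinearRegression from the functions group.
- Under Test options, select Use training set.
- Make sure a numeric class attribute (marked (Num)) is selected, then click Start. The results appear in Classifier output.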
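LinearRegression fits a linear model to a numeric class. As a simplified illustration, the sketch below uses one predictor, plain least squares, and hypothetical toy data. WEKA's LinearRegression handles several attributes at once and has extra options, such as attribute selection, that this sketch omits. The helper names are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    return intercept + slope * x


# Hypothetical toy data that lie exactly on y = 1 + 2x
intercept, slope = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)              # 1.0 2.0
print(predict(intercept, slope, 5))  # 11.0
```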
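2. Classification

Classifiers are grouped into:
- bayes: Bayesian classifiers, e.g., NaiveBayes
- functions: e.g., LinearRegression, Logistic, MultilayerPerceptron
- lazy: instance-based learners, e.g., IBk
- meta: meta-classifiers that wrap or combine other classifiers, e.g., Bagging
- misc: miscellaneous classifiers
- trees: decision trees, e.g., J48 and Id3
- rules: rule learners, e.g., ZeroR and OneR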
3. Clustering
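Logistic regression in Weka
- Open the Classify tab.
- Under Classifier, click Choose and select Logistic from the functions group.
- Under Test options, select Use training set.
- Select the nominal class attribute (marked (Nom)), here play, then click Start. The results appear in Classifier output.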
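Logistic models the probability of the nominal class with the sigmoid function. As a simplified sketch on hypothetical toy data (one numeric attribute, a 0/1 class), the weights can be fitted by plain gradient ascent on the log-likelihood. WEKA's Logistic builds a multinomial model with a ridge estimator and its own optimizer, so its numbers will differ. The helper names, learning rate, and epoch count are assumptions of this sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(w0 + w1 * x) by batch gradient ascent.

    ys must be 0/1 labels. The gradient of the average log-likelihood is
    mean(y - p) for w0 and mean((y - p) * x) for w1.
    """
    w0 = w1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(w0 + w1 * x)
            g0 += error
            g1 += error * x
        w0 += learning_rate * g0 / n
        w1 += learning_rate * g1 / n
    return w0, w1


# Hypothetical toy data: larger x values are more often in class 1
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
w0, w1 = fit_logistic_regression(xs, ys)
for x in (1, 8):
    print(x, round(sigmoid(w0 + w1 * x), 2))
```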
4. Associate
Click Choose to select an Associator, then click Start.
5. Select attributes
6. Visualize
Appendix
sample01.csv used in Weka:

ID,SEX,PASS/FAIL,Score,Class
1,M,Pass,45.5,B
2,F,Pass,56.78,B
3,M,Pass,89,A
4,F,Pass,77,A
5,M,Fail,32,C
6,F,Fail,12,D
7,M,Fail,35,C
8,F,Pass,62,B
9,M,Pass,68,B+
10,F,Fail,10,D
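Weka can open sample01.csv directly, but it helps to see what the equivalent ARFF file looks like. A minimal sketch of the conversion, assuming a column is numeric when every value parses as a number and nominal otherwise, with values listed in order of first appearance. `csv_to_arff` is a hypothetical helper. It writes attribute names as-is, without the quoting ARFF needs for names containing spaces, and it does not handle missing values.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text (first row = attribute names) into ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for col, name in enumerate(header):
        column = [row[col] for row in data]
        try:
            for value in column:
                float(value)  # raises ValueError for non-numeric text
            attr_type = "numeric"
        except ValueError:
            distinct = list(dict.fromkeys(column))  # de-duplicate, keep order
            attr_type = "{" + ",".join(distinct) + "}"
        lines.append(f"@attribute {name} {attr_type}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines)


sample01 = """ID,SEX,PASS/FAIL,Score,Class
1,M,Pass,45.5,B
2,F,Pass,56.78,B
3,M,Pass,89,A
4,F,Pass,77,A
5,M,Fail,32,C
6,F,Fail,12,D
7,M,Fail,35,C
8,F,Pass,62,B
9,M,Pass,68,B+
10,F,Fail,10,D"""
print(csv_to_arff(sample01, "sample01"))
```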
Filters in Weka
- Supervised: attribute filters and instance filters
- Unsupervised: attribute filters and instance filters
Remove
Supervised filters
- Attribute filters: AttributeSelection, ClassOrder, Discretize, NominalToBinary
- Instance filters: Resample, SpreadSubsample, StratifiedRemoveFolds
AttributeSelection
- Set the evaluator and the search method.
- Click OK.
- Click Apply.

ClassOrder
- Set classOrder and seed.
- Click OK, then Apply.

Discretize
- Set attributeIndices. Click Help for a description of every option.
- Click OK.
- Click Apply.
Discretize Help
NominalToBinary
- NominalToBinary converts nominal attributes into binary attributes.
- Click OK, then Apply.
Resample
- Set sampleSizePercent, the size of the sample as a percentage of the data.
- Click OK, then Apply.

SpreadSubsample
- Set distributionSpread, the maximum allowed ratio between the most and least frequent classes.
- Click OK, then Apply.

StratifiedRemoveFolds
- Set fold, the fold to output.
- Click OK.
- Click Apply.

Unsupervised filters
(Details of each filter are available from Help in Weka.)
- Attribute filters: Add, AddCluster, AddExpression, AddNoise, ClusterMembership, Copy, Discretize, FirstOrder, MakeIndicator, MergeTwoValues, NominalToBinary, Normalize, NumericToBinary, NumericTransform, Obfuscate, PKIDiscretize, RandomProjection, Remove, RemoveType, RemoveUseless, ReplaceMissingValues, Standardize, StringToNominal, StringToWordVector, SwapValues, TimeSeriesDelta, TimeSeriesTranslate
- Instance filters: Normalize, NonSparseToSparse, Randomize, RemoveFolds, RemoveMisclassified, RemovePercentage, RemoveRange, RemoveWithValues, Resample, SparseToNonSparse

Unsupervised attribute filters covered in this section:
- Add filter
- AddExpression filter
- NominalToBinary filter
- NumericToBinary filter
- NumericTransform filter
- Remove filter
- ReplaceMissingValues filter
- Standardize filter
- AddCluster filter
- Discretize filter
- Normalize filter
- RemoveType filter
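Add filter
- Add inserts a new attribute whose values are all missing.
- Click OK.
- Click Apply.

AddCluster filter
- AddCluster adds an attribute holding the cluster of each instance, computed by a clusterer such as SimpleKMeans.
- Set ignoredAttributeIndices to the attributes the clusterer should ignore.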
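- Click OK.
- Click Apply.

AddExpression filter
- AddExpression adds a new attribute computed from an expression over existing attributes.
- Click OK.
- Click Apply.

Discretize filter
- Set attributeIndices.
- Set bins, the number of intervals.
- Intervals have equal width by default (useEqualFrequency = False). Set useEqualFrequency to True to get equal-depth (equal-frequency) intervals.
- Click OK.
- Click Apply.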
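The two interval schemes can be sketched as follows. This is a simplified illustration, not WEKA's implementation. Equal width splits [min, max] into intervals of the same length. Equal frequency places cut points halfway between sorted values so that each interval holds roughly the same number of instances. The helper names are hypothetical, and the temperature values are taken from weather.arff. Ties, such as the two 75s, can make equal-frequency intervals uneven.

```python
def equal_width_cut_points(values, bins):
    """Cut points splitting [min, max] into `bins` intervals of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    return [low + width * i for i in range(1, bins)]


def equal_frequency_cut_points(values, bins):
    """Cut points placing roughly the same number of values in each interval."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, bins):
        k = i * n // bins  # index of the first value of the next interval
        cuts.append((ordered[k - 1] + ordered[k]) / 2)
    return cuts


def assign_bin(value, cuts):
    """0-based interval index; a value equal to a cut point goes to the lower interval."""
    return sum(value > c for c in cuts)


temperature = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]  # weather.arff
width_cuts = equal_width_cut_points(temperature, 3)
freq_cuts = equal_frequency_cut_points(temperature, 3)
print(width_cuts)  # [71.0, 78.0]
print(freq_cuts)   # [69.5, 75.0]
print([assign_bin(t, width_cuts) for t in temperature])
```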
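MergeTwoValues filter
- MergeTwoValues merges two values of a nominal attribute into a single value.
- Set attributeIndex.
- Set firstValueIndex and secondValueIndex.
- Click OK.
- Click Apply.

NominalToBinary filter
- NominalToBinary converts nominal attributes into binary (0/1) attributes.
- Set attributeIndices.
- Click OK.
- Click Apply.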
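The basic idea behind NominalToBinary is one 0/1 column per nominal value ("one-hot" coding). A minimal sketch with a hypothetical helper. WEKA's filter has extra rules that this sketch ignores; for example, it can keep a two-valued attribute as a single binary attribute.

```python
def nominal_to_binary(column, values):
    """Replace a nominal column by one 0/1 indicator column per declared value."""
    return [[1 if value == v else 0 for v in values] for value in column]


outlook_values = ["sunny", "overcast", "rainy"]  # order from @attribute outlook
encoded = nominal_to_binary(["sunny", "rainy", "overcast"], outlook_values)
print(encoded)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```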
Normalize filter
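- Normalize rescales numeric values to the range 0-1.
- Click Apply.

NumericToBinary filter
- NumericToBinary converts numeric values to binary: 0 stays 0, and every non-zero value becomes 1.
- Click Apply.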
NumericTransform filter
- NumericTransform applies a Java method to numeric attributes, e.g., abs (absolute value).
- Click OK.
- Click Apply.
Remove filter
- Remove deletes the attributes listed in attributeIndices.
- Click OK.
- Click Apply.

RemoveType filter
- RemoveType removes all attributes of the type set in attributeType.
- Click OK.
- Click Apply.

ReplaceMissingValues filter
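- ReplaceMissingValues replaces missing values with the mean (numeric attributes) or the mode (nominal attributes).

Standardize filter
- Standardize converts numeric values to z-scores (mean 0, standard deviation 1).
- Click OK.
- Click Apply.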
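Normalize and Standardize both rescale each numeric attribute: x' = (x - min) / (max - min) for Normalize and z = (x - mean) / s for Standardize. A minimal sketch with hypothetical helper names. Using the sample standard deviation s (divisor n - 1) is an assumption of this sketch. Both functions also assume the values are not all equal.

```python
import math


def normalize(values):
    """Min-max scaling: the smallest value maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]  # assumes high > low


def standardize(values):
    """z-scores: subtract the mean and divide by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]  # assumes std > 0


print(normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
print(standardize([1, 2, 3]))   # [-1.0, 0.0, 1.0]
```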
Unsupervised instance filters covered in this section:
- Randomize
- RemoveFolds
- RemovePercentage
- RemoveRange
- RemoveWithValues
- Resample

Randomize filter
- Randomize shuffles the order of the instances.
- Click OK.
- Click Apply.

RemoveFolds filter
- RemoveFolds splits the data into numFolds folds and outputs one of them. Set numFolds.
- Click Save, OK, and then Apply.

RemovePercentage filter
- RemovePercentage removes a percentage of the instances. Set percentage.
- Click OK.
- Click Apply.
RemoveRange filter
- RemoveRange removes the instances listed in instancesIndices.
- Click OK.
- Click Apply.
RemoveWithValues filter
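- RemoveWithValues removes instances based on the value of the attribute at attributeIndex.
- For a numeric attribute, set splitPoint, the threshold used to select instances.
- Click OK.
- Click Apply.

Resample filter
- Resample draws a random subsample of the data. Set sampleSizePercent.
- Click Save, OK, and then Apply.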
- In summary, Weka's filters are divided into Supervised and Unsupervised filters.
Association rules in Weka
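- Market Basket analysis: finding items that customers buy together.
- The data are transactions, each identified by a TID.
- Each transaction is converted to Boolean form: an item in the transaction is marked 1 (y), and an item not in it is marked ?, which Weka treats as a missing value. For example, the transaction T100,I1,I2 becomes
T100, 1, 1, ?, ?, ?
- The result is saved as market.arff.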
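The conversion from transactions to Boolean rows can be sketched as follows. The item list I1 to I5 is assumed from the T100 example above, and `encode_transactions` is a hypothetical helper.

```python
def encode_transactions(transactions, items):
    """One row per transaction: the TID, then 1 if the item was bought, else ? (missing)."""
    rows = []
    for tid, bought in transactions:
        flags = ["1" if item in bought else "?" for item in items]
        rows.append(", ".join([tid] + flags))
    return rows


items = ["I1", "I2", "I3", "I4", "I5"]
print(encode_transactions([("T100", {"I1", "I2"})], items))
# ['T100, 1, 1, ?, ?, ?']
```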
Running Apriori
- Open the Associate tab.
- Choose Apriori as the Associator.

Apriori settings
- min support: set lowerBoundMinSupport, e.g., 0.2 (20%).
- min confidence: set minMetric with metricType = Confidence, e.g., 0.5 (50%).
- numRules: the number of rules to find, e.g., 100.
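Support is the fraction of transactions that contain an itemset. The confidence of a rule A ==> B is support(A and B) / support(A). The sketch below is the textbook level-wise Apriori: it finds the frequent itemsets, then keeps every rule that meets the minimum confidence. It runs on hypothetical transactions, not the contents of market.arff. WEKA's Apriori works somewhat differently in practice: it starts from a high minimum support and lowers it step by step toward lowerBoundMinSupport until it has found numRules rules.

```python
from itertools import combinations


def apriori(transactions, min_support, min_confidence):
    """Level-wise Apriori: frequent itemsets, then rules antecedent ==> consequent."""
    n = len(transactions)

    def support(itemset):
        # fraction of transactions that contain every item of the itemset
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # join: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # prune: every (k-1)-subset must be frequent; then check support
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and support(c) >= min_support]

    rules = []
    for itemset, itemset_support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = itemset_support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), confidence))
    return frequent, rules


# Hypothetical transactions (not the contents of market.arff)
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
frequent, rules = apriori(transactions, min_support=0.2, min_confidence=0.5)
for antecedent, consequent, confidence in rules:
    if confidence == 1.0:
        print(sorted(antecedent), "==>", sorted(consequent))
```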
Apriori
For market.arff, Apriori finds 16 rules, for example:
Rule 1: I5 ==> I1
Rule 2: I4 ==> I2
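- If the data are not transactions but Nominal or Ordinal attributes, Weka uses dummy coding: each value of a Nominal or Ordinal attribute becomes an item. For example, outlook has the values overcast, sunny, and rainy, so weather.nominal.arff gives the items
outlook = overcast, outlook = sunny, outlook = rainy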
weather.nominal.arff
For weather.nominal.arff, Apriori finds 8 rules, for example:
Rule 1: outlook = overcast ==> play = yes
Rule 2: temperature = cool ==> humidity = normal
Rule 3: windy = FALSE ==> play = yes
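Summary
- Data with Nominal or Ordinal attributes can be used as they are.
- Transaction data must be converted into Nominal attributes, with ? (missing value) for absent items. Each row has the form TID, attri_1, attri_2, ..., attri_n, where attri_i is y if item i is in transaction TID and ? otherwise.
- In the Associate tab, choose Apriori as the Associator.
- Set min support, min confidence, and numRules.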
Classification in Weka
- ID3
- J48
- Classifier groups: Bayes, Functions, Lazy, Meta, Misc, Trees, Rules
weather.nominal.arff has 4 attributes plus the class attribute play, and 14 instances.
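- Evaluation options include k-fold cross validation and leave-one-out.
- Alternatively, split the data into training, validation, and test data, e.g., 4/10 training data, 3/10 validation data, and 3/10 test data.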
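The evaluation schemes above differ only in how the instance indices are split. A minimal sketch with hypothetical helper names and a fixed random seed. Leave-one-out is k-fold cross validation with k equal to the number of instances. For a nominal class, WEKA also stratifies the folds; this sketch does not.

```python
import random


def k_fold_indices(n, k, seed=1):
    """Shuffle the indices 0..n-1 and deal them into k folds (sizes differ by at most 1)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validation_splits(n, k, seed=1):
    """Yield (train, test) index lists; every fold is the test set exactly once."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def train_validation_test_split(n, seed=1):
    """Shuffle and split into 4/10 training, 3/10 validation and 3/10 test data."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train, n_valid = round(n * 0.4), round(n * 0.3)
    return (indices[:n_train],
            indices[n_train:n_train + n_valid],
            indices[n_train + n_valid:])


# 10-fold cross validation on the 14 weather instances
splits = list(cross_validation_splits(14, 10))
print([len(test) for _, test in splits])  # [2, 2, 2, 2, 1, 1, 1, 1, 1, 1]
# leave-one-out: k equals the number of instances
loo = list(cross_validation_splits(14, 14))
print(len(loo))  # 14
```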
Using the Weka Explorer
- Open the Weka Explorer.
- Open weather.nominal.arff.
- Apply a Filter if needed, then go to the Classify tab.
Number of distinct values per attribute:
- outlook: 3
- temperature: 3
- humidity: 2
- windy: 2
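- Under Classifier, click Choose and select classifiers > trees.
- Under Test options, select Use training set.

The output shows the results on the training data, including a Confusion matrix. Its rows are the actual classes and its columns are the predicted classes, so correctly classified instances lie on the diagonal.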
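The confusion matrix and the accuracy can be computed as follows. This is a minimal sketch on hypothetical predictions, with hypothetical helper names. It follows the layout described above: rows are actual classes and columns are predicted classes.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows = actual class, columns = predicted class ("classified as")."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of instances on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical predictions for five instances of the class play
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]
m = confusion_matrix(actual, predicted, ["yes", "no"])
print(m)            # [[2, 1], [1, 1]]
print(accuracy(m))  # 0.6
```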
The data file weather.arff:

@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
- Discretize the numeric attributes: under Filter, click Choose and select filters > unsupervised > attribute > Discretize.
- Set bins to 3.
- Click OK.
- Click Apply.
ID3
- In the Classify tab, click Choose and select classifiers > trees > Id3.
- Under Test options, select Use training set.
- Click Start.
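At each node, ID3 chooses the attribute with the highest information gain. The sketch below computes that gain for the root node, using the outlook and play columns of weather.arff; weather.nominal.arff has the same outlook and play values. The helper names are hypothetical. On these data, outlook has the highest gain of the four attributes, which is why it ends up at the root.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits: -sum p * log2(p) over the class proportions p."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Entropy before the split minus the weighted entropy of each branch."""
    n = len(labels)
    remainder = 0.0
    for value in set(attribute_values):
        branch = [y for x, y in zip(attribute_values, labels) if x == value]
        remainder += len(branch) / n * entropy(branch)
    return entropy(labels) - remainder


outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
           "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]
print(round(entropy(play), 3))                    # 0.94
print(round(information_gain(outlook, play), 3))  # 0.247
```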
The resulting tree classifies play on the training set with 100% accuracy.

=== Confusion Matrix ===

a b