154
Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann) WEKA: A Machine Learning Toolkit The Explorer Classification and Regression Clustering Association Rules Attribute Selection Data Visualization The Experimenter The Knowledge Machine Learning with WEKA

Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

Embed Size (px)

Citation preview

Page 1: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

Department of Computer Science, University of Waikato, New Zealand

Bernhard Pfahringer(based on material by Eibe Frank, Mark

Hall, and Peter Reutemann)

WEKA: A Machine Learning Toolkit

The Explorer• Classification and

Regression• Clustering• Association Rules• Attribute Selection• Data Visualization

The Experimenter The Knowledge

Flow GUI Other Utilities Conclusions

Machine Learning with WEKA

Page 2: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 2

WEKA: the bird

Copyright: Martin Kramer ([email protected])

The Weka or woodhen (Gallirallus australis) is an endemic bird of New Zealand. (Source: WikiPedia)

Page 3: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 3

WEKA: the software Machine learning/data mining software written in

Java (distributed under the GNU Public License) Used for research, education, and applications Complements “Data Mining” by Witten & Frank Main features:

Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods

Graphical user interfaces (incl. data visualization) Environment for comparing learning algorithms

Page 4: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 4

History Project funded by the NZ government since 1993

Develop state-of-the art workbench of data mining tools Explore fielded applications Develop new fundamental methods

Page 5: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 5

History (2) Late 1992 - funding was applied for by Ian Witten 1993 - development of the interface and infrastructure

WEKA acronym coined by Geoff Holmes WEKA’s file format “ARFF” was created by Andrew Donkin

ARFF was rumored to stand for AAndrew’s RRidiculous FFile FFormat Sometime in 1994 - first internal release of WEKA

TCL/TK user interface + learning algorithms written mostly in C Very much beta software Changes for the b1 release included (among others):

“Ambiguous and Unsupported menu commands removed.”“Crashing processes handled (in most cases :-)”

October 1996 - first public release: WEKA 2.1

Page 6: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 6

History (3) July 1997 - WEKA 2.2

Schemes: 1R, T2, K*, M5, M5Class, IB1-4, FOIL, PEBLS, support for C5

Included a facility (based on Unix makefiles) for configuring and running large scale experiments

Early 1997 - decision was made to rewrite WEKA in Java Originated from code written by Eibe Frank for his PhD Originally codenamed JAWS (JAJAva WWeka SSystem)

May 1998 - WEKA 2.3 Last release of the TCL/TK-based system

Mid 1999 - WEKA 3 (100% Java) released Version to complement the Data Mining book Development version (including GUI)

Page 7: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 7

WEKA: versions There are several versions of WEKA:

WEKA 3.4: “book version” compatible with description in data mining book

WEKA 3.5.5: “development version” with lots of improvements

This talk is based on a nightly snapshot of WEKA 3.5.5 (12-Feb-2007)

Page 8: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 8

@relation heart-disease-simplified

@attribute age numeric@attribute sex { female, male}@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}@attribute cholesterol numeric@attribute exercise_induced_angina { no, yes}@attribute class { present, not_present}

@data63,male,typ_angina,233,no,not_present67,male,asympt,286,yes,present67,male,asympt,229,yes,present38,female,non_anginal,?,no,not_present...

WEKA only deals with “flat” files

Page 9: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 9

@relation heart-disease-simplified

@attribute age numeric@attribute sex { female, male}@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}@attribute cholesterol numeric@attribute exercise_induced_angina { no, yes}@attribute class { present, not_present}

@data63,male,typ_angina,233,no,not_present67,male,asympt,286,yes,present67,male,asympt,229,yes,present38,female,non_anginal,?,no,not_present...

WEKA only deals with “flat” files

Page 10: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 10

Page 11: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 11

java -jar weka.jar

Page 12: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 12

Explorer(pre-processing the data) Data can be imported from a file in various

formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL

database (using JDBC) Pre-processing tools in WEKA are called “filters” WEKA contains filters for:

Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …

Page 13: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 13

Page 14: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 14

Page 15: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 15

Page 16: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 16

Page 17: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 17

Page 18: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 18

Page 19: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 19

Page 20: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 20

Page 21: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 21

Page 22: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 22

Page 23: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 23

Page 24: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 24

Page 25: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 25

Page 26: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 26

Page 27: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 27

Page 28: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 28

Page 29: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 29

Page 30: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 30

Page 31: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 31

Page 32: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 32

Page 33: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 33

Page 34: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 34

Explorer: building “classifiers” Classifiers in WEKA are models for predicting

nominal or numeric quantities Implemented learning schemes include:

Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …

“Meta”-classifiers include: Bagging, boosting, stacking, error-correcting output

codes, locally weighted learning, …

Page 35: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 35

Page 36: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 36

Page 37: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 37

Page 38: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 38

Page 39: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 39

Page 40: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 40

Page 41: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 41

Page 42: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 42

Page 43: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 43

Page 44: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 44

Page 45: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 45

Page 46: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 46

Page 47: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 47

Page 48: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 48

Page 49: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 49

Page 50: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 50

Page 51: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 51

Page 52: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 52

Page 53: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 53

Page 54: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 54

Page 55: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 55

Page 56: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 56

Page 57: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 57

Page 58: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 58

Page 59: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 59

Page 60: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 60

Page 61: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 61

Page 62: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 62

Page 63: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 63

Page 64: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 64

Page 65: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 65

Page 66: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 66

Page 67: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 67

Page 68: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 68

Page 69: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 69

Page 70: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 70

Page 71: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 71

Page 72: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 72

Page 73: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 73

Page 74: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 74

Page 75: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 75

Page 76: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 76

Page 77: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 77

Page 78: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 78

Page 79: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 79

Page 80: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 80

Page 81: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 81

Page 82: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 82

Page 83: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 83

Page 84: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 84

Page 85: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 85

Page 86: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 86

Page 87: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 87

Page 88: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 88

Page 89: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 89

Page 90: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 90

Explorer: clustering data WEKA contains “clusterers” for finding groups of

similar instances in a dataset Some implemented schemes are:

k-Means, EM, Cobweb, X-means, FarthestFirst Clusters can be visualized and compared to “true”

clusters (if given) Evaluation based on loglikelihood if clustering

scheme produces a probability distribution

Page 91: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 91

Page 92: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 92

Page 93: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 93

Page 94: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 94

Page 95: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 95

Page 96: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 96

Page 97: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 97

Page 98: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 98

Page 99: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 99

Page 100: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 100

Explorer: finding associations WEKA contains the Apriori algorithm (among

others) for learning association rules Works only with discrete data

Can identify statistical dependencies between groups of attributes: milk, butter bread, eggs (with confidence 0.9 and

support 2000) Apriori can compute all rules that have a given

minimum support and exceed a given confidence

Page 101: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 101

Page 102: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 102

Page 103: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 103

Page 104: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 104

Explorer: attribute selection Panel that can be used to investigate which

(subsets of) attributes are the most predictive ones Attribute selection methods contain two parts:

A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking

An evaluation method: correlation-based, wrapper, information gain, chi-squared, …

Very flexible: WEKA allows (almost) arbitrary combinations of these two

Page 105: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 105

Page 106: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 106

Page 107: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 107

Page 108: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 108

Page 109: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 109

Page 110: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 110

Page 111: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 111

Explorer: data visualization Visualization very useful in practice: e.g. helps to

determine difficulty of the learning problem WEKA can visualize single attributes (1-d) and

pairs of attributes (2-d) To do: rotating 3-d visualizations (Xgobi-style)

Color-coded class values “Jitter” option to deal with nominal attributes (and

to detect “hidden” data points) “Zoom-in” function

Page 112: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 112

Page 113: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 113

Page 114: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 114

Page 115: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 115

Page 116: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 116

Page 117: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 117

Page 118: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 118

Page 119: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 119

Page 120: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 120

Page 121: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 121

Page 122: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 122

Performing experiments Experimenter makes it easy to compare the

performance of different learning schemes For classification and regression problems Results can be written into file or database Evaluation options: cross-validation, learning

curve, hold-out Can also iterate over different parameter settings Significance-testing built in!

Page 123: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 123

Page 124: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 124

Page 125: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 125

Page 126: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 126

Page 127: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 127

Page 128: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 128

Page 129: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 129

Page 130: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 130

Page 131: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 131

Page 132: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 132

Page 133: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 133

The Knowledge Flow GUI

Java-Beans-based interface for setting up and running machine learning experiments

Data sources, classifiers, etc. are beans and can be connected graphically

Data “flows” through components: e.g.,“data source” -> “filter” -> “classifier” -> “evaluator”

Layouts can be saved and loaded again later cf. Clementine ™

Page 134: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 134

Page 135: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 135

Page 136: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 136

Page 137: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 137

Page 138: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 138

Page 139: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 139

Page 140: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 140

Page 141: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 141

Page 142: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 142

Page 143: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 143

Page 144: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 144

Page 145: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 145

Page 146: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 146

Page 147: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 147

Page 148: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 148

Page 149: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 149

Sourceforge.net – Downloads

Page 150: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 150

Sourceforge.net – Web Traffic

WekaWiki launched – 05/2005

WekaDoc Wiki introduced – 12/2005

Page 151: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 151

Projects based on WEKA 45 projects currently (30/01/07) listed on the WekaWiki Incorporate/wrap WEKA

GRB Tool Shed - a tool to aid gamma ray burst research YALE - facility for large scale ML experiments GATE - NLP workbench with a WEKA interface Judge - document clustering and classification RWeka - an R interface to Weka

Extend/modify WEKA BioWeka - extension library for knowledge discovery in biology WekaMetal - meta learning extension to WEKA Weka-Parallel - parallel processing for WEKA Grid Weka - grid computing using WEKA Weka-CG - computational genetics tool library

Page 152: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 152

WEKA and PENTAHO Pentaho – The leader in Open Source Business

Intelligence (BI) September 2006 – Pentaho acquires the Weka project

(exclusive license and SF.net page) Weka will be used/integrated as data mining component in

their BI suite Weka will be still available as GPL open source software Most likely to evolve 2 editions:

Community edition BI oriented edition

Page 153: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 153

Limitations of WEKA

Traditional algorithms need to have all data in main memory

==> big datasets are an issue Solution:

Incremental schemes Stream algorithms

MOA “MMassive OOnline AAnalysis”

(not only a flightless bird, but also extinct!)

Page 154: Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

04/21/23 University of Waikato 154

Conclusion: try it yourself! WEKA is available at

http://www.cs.waikato.ac.nz/ml/weka Also has a list of projects based on WEKA (probably incomplete list of) WEKA contributors:

Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong

Wang, Zhihai Wang