Lecture 4: The Weka Package

Machine Learning for Language Technology

Marina Santini, Uppsala University
Department of Linguistics and Philology, September 2013


The Weka workbench is a collection of state-of-the-art machine learning algorithms and data preprocessing tools. It includes virtually all the algorithms described in this book. It is designed so that you can quickly try out existing methods on new datasets in flexible ways. It provides extensive support for the whole process of experimental data mining, including preparing the input data, evaluating learning schemes statistically, and visualizing the input data and the result of learning. As well as a wide variety of learning algorithms, it includes a wide range of preprocessing tools. This diverse and comprehensive toolkit is accessed through a common interface so that its users can compare different methods and identify those that are most appropriate for the problem at hand. (Witten and Frank, 2005)


Outline (re: Witten & Frank, 2005)

Introduction to Weka (Ch. 9)

Getting Started: The Explorer (Ch. 10)

The basic methods (4.3, 4.6, 4.7)

Implementations (6.1, 6.3, 6.4)

Evaluation (5.1-5.6)

Assignment 1


Introduction: What is Weka?

WEKA: Waikato Environment for Knowledge Analysis. The name also refers to the weka, a flightless bird native to New Zealand.

The Weka workbench is a collection of state-of-the-art machine learning algorithms and data preprocessing tools.

Open-source software written in Java, released under the GNU General Public License.

http://www.cs.waikato.ac.nz/ml/weka/downloading.html


The interface: The Explorer

Uploading the input (ARFF format);

Preprocessing;

Building a classifier;

Tuning the parameters;

Examining the output (evaluation).


Uploading the input (2nd_set_7webgenres.arff)
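ARFF (Attribute-Relation File Format) is Weka's native input format: a header names the relation (@relation) and declares each attribute (@attribute, either numeric or with a braced list of nominal values), and @data introduces one comma-separated instance per line, with ? marking a missing value. The reader below is a hypothetical minimal sketch of that layout, not Weka's own loader: it ignores quoted values, string and date attributes, and the sparse format. The sample data set is invented for illustration and is not the 2nd_set_7webgenres.arff file.

```python
def parse_arff(text):
    """Parse a minimal dense ARFF document into (relation, attributes, rows).

    Handles only numeric and nominal attributes, '%' comment lines and
    unquoted comma-separated data rows; '?' becomes None (missing value).
    """
    relation = None
    attributes = []  # (name, "numeric"/"real"/"integer") or (name, [nominal values])
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if not in_data:
            if lower.startswith("@relation"):
                relation = line.split(None, 1)[1]
            elif lower.startswith("@attribute"):
                _, name, spec = line.split(None, 2)
                spec = spec.strip()
                if spec.startswith("{"):
                    values = [v.strip() for v in spec.strip("{}").split(",")]
                    attributes.append((name, values))
                else:
                    attributes.append((name, spec.lower()))
            elif lower.startswith("@data"):
                in_data = True
            continue
        # Data section: one instance per line, values in attribute order.
        row = []
        for (_, kind), value in zip(attributes, line.split(",")):
            value = value.strip()
            if value == "?":
                row.append(None)
            elif kind in ("numeric", "real", "integer"):
                row.append(float(value))
            else:
                row.append(value)
        rows.append(row)
    return relation, attributes, rows


# Hypothetical toy file, for illustration only.
SAMPLE = """% toy data set
@relation toy
@attribute length numeric
@attribute genre {blog, news}
@data
120, blog
?, news
"""
```

Calling `parse_arff(SAMPLE)` yields the relation name, the two attribute declarations and two instances, the second with a missing `length`.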


Preprocessing
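A common preprocessing step is rescaling numeric attributes so that no attribute dominates merely because of its units. The sketch below does min-max normalization to [0, 1], in the spirit of Weka's unsupervised Normalize filter; the function name is hypothetical, and leaving missing values untouched and mapping a constant column to 0 are choices of this sketch rather than documented Weka behaviour.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1]; None (missing) stays None."""
    present = [x for x in column if x is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    # A constant column has no spread, so every present value maps to 0.0.
    return [None if x is None else (0.0 if span == 0 else (x - lo) / span)
            for x in column]
```

For example, `[2.0, 4.0, None, 6.0]` becomes `[0.0, 0.5, None, 1.0]`.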


Building a classifier
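Before building a real classifier it helps to know the baseline it has to beat. ZeroR, the classifier the Explorer selects by default, ignores all attributes and predicts the most frequent class (or the mean, for a numeric class). A minimal sketch of the nominal case follows; the `fit`/`predict` interface is a hypothetical convention for these examples, not Weka's API.

```python
from collections import Counter


class ZeroR:
    """Majority-class baseline: the attributes X are never looked at."""

    def fit(self, X, y):
        # Remember the most frequent class label in the training data.
        self.prediction = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.prediction for _ in X]
```

Any classifier whose accuracy does not clearly exceed this baseline has learned little from the attributes.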


Methods & Implementations

Decision Trees

J4.8 (J48 in the Weka menus) is Weka's implementation of C4.5 revision 8.
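C4.5, and hence J4.8, ranks candidate splits by gain ratio: the information gain of an attribute divided by the entropy of the attribute's own value distribution (the split information), which penalizes attributes with many values. The sketch below computes it for a single nominal attribute only; numeric thresholds, missing values and pruning are left out, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` on the nominal attribute `values`."""
    n = len(labels)
    branches = defaultdict(list)  # attribute value -> class labels in that branch
    for v, y in zip(values, labels):
        branches[v].append(y)
    # Expected entropy after the split, weighted by branch size.
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute itself
    return 0.0 if split_info == 0 else info_gain / split_info
```

With labels `a, a, b, b`, a two-valued attribute that separates the classes perfectly scores 1.0, whereas an ID-like attribute with four distinct values scores only 0.5: its information gain is the same 1 bit, but its split information is 2 bits.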

Instance-Based Learning

IBk is a k-nearest-neighbor classifier that uses Euclidean distance by default; other options include Manhattan, Chebyshev and Minkowski distances. The number of nearest neighbors (default k = 1) can be specified explicitly in the parameter window.
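The distance options listed for IBk are all special cases of the Minkowski distance: p = 1 gives Manhattan, p = 2 Euclidean, and p tending to infinity Chebyshev. The sketch below implements that family together with a plain k-nearest-neighbor majority vote. Attribute values are used as given, without rescaling; a tied vote goes to the label of the closest tied neighbour; and the names are hypothetical, not IBk's code.

```python
import math
from collections import Counter


def minkowski(a, b, p=2.0):
    """p=1: Manhattan, p=2: Euclidean, p=math.inf: Chebyshev."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def knn_predict(train_X, train_y, query, k=1, p=2.0):
    """Majority label among the k training instances closest to query."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: minkowski(pair[0], query, p))
    # Counter keeps first-seen order on ties, i.e. the nearer neighbour wins.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Between (0, 0) and (3, 4) the three named distances are 5 (Euclidean), 7 (Manhattan) and 4 (Chebyshev).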

Linear Models

In VotedPerceptron, each weight vector produced during training contributes a number of votes that reflects how many successive training instances it classified correctly before it was replaced. SMO implements the sequential minimal optimization algorithm for training a support vector classifier (SVM) using polynomial or Gaussian kernels (Platt 1998, Keerthi et al. 2001).

Logistic builds linear logistic regression models.
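The voting idea behind VotedPerceptron can be sketched as follows, assuming Freund and Schapire's linear algorithm without a kernel and without any of Weka's further options. Each time the current weight vector misclassifies a training instance it is retired together with its survival count, and at prediction time every retired vector casts that many votes for the sign of its output. Class labels are assumed to be +1/-1, and the function names are hypothetical.

```python
def train_voted_perceptron(X, y, epochs=10):
    """Return a list of (weights, bias, votes); labels y must be +1 or -1."""
    n_features = len(X[0])
    w, b, c = [0.0] * n_features, 0.0, 0
    survivors = []
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:
                # Mistake: retire the current vector with its vote count.
                if c > 0:
                    survivors.append((w, b, c))
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                c = 1
            else:
                c += 1  # the vector survived one more instance
    survivors.append((w, b, c))
    return survivors


def predict_voted(survivors, x):
    """Weighted vote of all retained vectors; a tie is resolved as -1."""
    total = 0
    for w, b, c in survivors:
        s = sum(wj * xj for wj, xj in zip(w, x)) + b
        total += c * (1 if s > 0 else -1)
    return 1 if total > 0 else -1
```

Vectors that survive long dominate the vote, which makes the method less sensitive than the plain perceptron to the last few updates.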


Tuning Parameters
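Tuning a parameter amounts to comparing candidate settings by an estimate of accuracy. The sketch below picks k for a nearest-neighbor classifier by leave-one-out accuracy. It uses squared Euclidean distance, which ranks neighbours exactly as Euclidean distance does, and breaks ties in favour of the smaller k. All names and the toy data are hypothetical; these are not IBk options.

```python
from collections import Counter


def knn_label(train, query, k):
    """train: list of (features, label); majority vote of the k nearest."""
    ranked = sorted(train,
                    key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out: classify each instance using all the others."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        correct += knn_label(rest, x, k) == y
    return correct / len(data)


def select_k(data, candidates):
    """Candidate k with the best leave-one-out accuracy (ties -> smallest k)."""
    return max(candidates, key=lambda k: (loo_accuracy(data, k), -k))


# Hypothetical toy data: two clusters, with one noisy "b" inside the "a" cluster.
TOY = [([0], "a"), ([1], "a"), ([2], "b"), ([3], "a"), ([4], "a"),
       ([20], "b"), ([21], "b"), ([22], "b")]
```

On `TOY`, k = 1 is misled by the noisy point (leave-one-out accuracy 0.75), while k = 3 outvotes it (0.875), so `select_k(TOY, [1, 3])` returns 3. Because the same data both choose k and score it, the winning accuracy is optimistic; a fair estimate needs data kept out of the selection.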


Evaluation
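The Explorer's classifier output includes the overall accuracy, per-class precision, recall and F-measure, and a confusion matrix; by default these are estimated with 10-fold cross-validation. The sketch below shows how the per-class figures follow from counts of (actual, predicted) pairs. The function name is hypothetical, and reporting 0.0 when a ratio is undefined is a choice of this sketch.

```python
from collections import Counter


def per_class_metrics(actual, predicted):
    """Return (accuracy, {class: (precision, recall, f_measure)})."""
    labels = sorted(set(actual) | set(predicted))
    pairs = Counter(zip(actual, predicted))  # confusion-matrix cells
    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(n for (a, p), n in pairs.items() if p == c and a != c)
        fn = sum(n for (a, p), n in pairs.items() if a == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f)
    accuracy = sum(pairs[(c, c)] for c in labels) / len(actual)
    return accuracy, report
```

With actual classes `a, a, a, b, b` and predictions `a, a, b, b, a`, accuracy is 0.6, class `a` gets precision, recall and F-measure of 2/3, and class `b` gets 0.5 for all three.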


Compare Results
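When two classifiers have been evaluated on the same cross-validation folds, one simple way to ask whether their difference is more than noise is a paired t statistic over the per-fold scores. The sketch below is the plain textbook statistic. It does not correct for the overlap between training sets across folds, so it tends to overstate significance, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """Paired t statistic for per-fold scores of two classifiers (k >= 2 folds)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    # Mean difference divided by its standard error (sample standard deviation).
    return mean(diffs) / (stdev(diffs) / math.sqrt(k))
```

The result is compared with the t distribution with k - 1 degrees of freedom; for 10 folds, the two-sided 5% critical value is about 2.262.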


Assignment 1

Classification: decision trees, nearest neighbors and a linear classifier of your choice.

Software package: Weka.

Data sets: German plural; English past tense.

Send the WRITTEN REPORT to: [email protected]

Report deadline: Fri 4 Oct 2013 (week 40).


Thank you and Good Luck!