The Weka workbench is a collection of state-of-the-art machine learning algorithms and data preprocessing tools. It includes virtually all the algorithms described in this book. It is designed so that you can quickly try out existing methods on new datasets in flexible ways. It provides extensive support for the whole process of experimental data mining, including preparing the input data, evaluating learning schemes statistically, and visualizing the input data and the result of learning. As well as a wide variety of learning algorithms, it includes a wide range of preprocessing tools. This diverse and comprehensive toolkit is accessed through a common interface so that its users can compare different methods and identify those that are most appropriate for the problem at hand. (Witten and Frank, 2005)
Lecture 4: The Weka Package
Marina Santini, Uppsala University
Department of Linguistics and Philology, September 2013
Lec 4: The Weka Package
Machine Learning for Language Technology
Outline
Re: Witten & Frank (2005)
Introduction to Weka (Ch. 9)
Getting Started: The Explorer (Ch. 10)
The basic methods (4.3, 4.6, 4.7)
Implementations (6.1, 6.3, 6.4)
Evaluation (5.1-5.6)
Assignment 1
Introduction: What is Weka?
WEKA: Waikato Environment for Knowledge Analysis
Weka: the name of a flightless bird living in New Zealand
The Weka workbench is a collection of state-of-the-art machine learning algorithms and data preprocessing tools
Open source code (GNU General Public License), written in Java
http://www.cs.waikato.ac.nz/ml/weka/downloading.html
The interface: The Explorer
Uploading the input (ARFF format);
Preprocessing;
Building a classifier;
Tuning the parameters;
Examining the output (evaluation).
Uploading the input (2nd_set_7webgenres.arff)
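To make the ARFF input format concrete, here is a minimal sketch of what a small ARFF file looks like and how its header and data sections can be read. The relation name, attribute names and values are invented for illustration and are not taken from 2nd_set_7webgenres.arff. The parser is a hypothetical helper, not part of Weka; it assumes a dense file with unquoted names, only numeric and nominal attributes, and no missing values.

```python
# Hypothetical miniature ARFF file (all names and values are invented).
ARFF_TEXT = """\
% Lines starting with a percent sign are comments
@relation webgenres_toy

@attribute avg_sentence_length numeric
@attribute has_first_person {yes,no}
@attribute genre {blog,faq,eshop}

@data
14.2,yes,blog
9.5,no,faq
7.1,no,eshop
"""


def parse_arff(text):
    """Parse a simple dense ARFF string into (relation, attributes, rows).

    attributes is a list of (name, kind) pairs, where kind is the string
    "numeric" or a list of the allowed nominal values.
    """
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            # After @data, each line is one instance with comma-separated values.
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                row.append(float(value) if kind == "numeric" else value)
            rows.append(row)
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):
                # Nominal attribute: keep the list of allowed values.
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            else:
                kind = kind.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation)
print(attributes)
print(rows)
```

In this toy file the last attribute (genre) plays the role of the class to be predicted.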
Preprocessing
Building a classifier
Methods & Implementations
Decision Trees
J4.8 is Weka’s implementation of C4.5 revision 8.
Instance-Based Learning
IBk is a k-nearest-neighbor classifier that uses the Euclidean distance by default; other options include the Manhattan, Chebyshev and Minkowski distances. The number of nearest neighbors (default k=1) can be specified explicitly in the parameter window.
Linear Models
In VotedPerceptron, each weight vector contributes a certain number of votes.
SMO implements the sequential minimal optimization algorithm for training a support vector classifier (SVM), using polynomial or Gaussian kernels (Platt 1998, Keerthi et al. 2001).
Logistic builds linear logistic regression models.
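The distance options listed for IBk can be illustrated with a small k-nearest-neighbor sketch. This is not Weka's code: the function names and the toy data are assumptions made for illustration, and details such as attribute normalization, distance weighting and tie handling are left out.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a, b):
    return minkowski(a, b, 2)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Largest absolute difference along any single dimension."""
    return max(abs(x - y) for x, y in zip(a, b))


def knn_classify(train, query, k=1, distance=euclidean):
    """Return the majority label among the k training items closest to query.

    train is a list of (vector, label) pairs; k=1 mirrors the default
    mentioned above.
    """
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy training set (invented): two points per class.
train = [
    ((0.0, 0.0), "a"),
    ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"),
    ((6.0, 5.0), "b"),
]

print(knn_classify(train, (0.5, 0.2)))                             # k=1, Euclidean
print(knn_classify(train, (5.0, 4.0), k=3, distance=manhattan))    # k=3, Manhattan
```

Changing k or the distance function here corresponds to changing the same options in the parameter window, although the decisions made by the real implementation may differ on real data.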
Tuning Parameters
Evaluation
Compare Results
Assignment 1
Classification: Decision Trees, Nearest Neighbors and a linear classifier of your choice;
Software package: Weka;
Data sets: German plural, English past tense
Send WRITTEN REPORT to: [email protected]
Report deadline Fri 4 Oct 2013, week 40.
Thank you and Good Luck!