Open Machine Learning

The open experiment database: meta-learning for the masses

Joaquin Vanschoren @joavanschoren


DESCRIPTION

This talk explores the possibility of turning machine learning research into open science and proposes concrete approaches to achieve this goal.


Page 1: Open Machine Learning

The open experiment database: meta-learning for the masses

Joaquin Vanschoren @joavanschoren

Page 2: Open Machine Learning

The Polymath story

Tim Gowers

Page 3: Open Machine Learning

Machine Learning: are we doing it right?

Page 4: Open Machine Learning

Computer Science

• The scientific method:
  • Make a hypothesis about the world
  • Generate predictions based on this hypothesis
  • Design experiments to verify/falsify the predictions
  • Predictions verified: the hypothesis might be true
  • Predictions falsified: the hypothesis is wrong

Page 5: Open Machine Learning

Computer Science

• The scientific method (for ML):
  • Make a hypothesis about (the structure of) the given data
  • Generate models based on this hypothesis
  • Design experiments to measure the accuracy of the models
  • Good performance: it works (on this data)
  • Bad performance: it doesn't work on this data
  • Aggregates ("it works 60% of the time") are not useful

Page 6: Open Machine Learning

Computer Science

• The scientific method (for ML):
  • Make a hypothesis about (the structure of) the given data
  • Generate models based on this hypothesis
  • Design experiments to measure the accuracy of the models
  • Good performance: it works (on this data)
  • Bad performance: it doesn't work on this data
  • Aggregates ("it works 60% of the time") are not useful

How can we characterize the data on which an algorithm works well?

Page 7: Open Machine Learning

Computer Science

• The scientific method (for ML):
  • Make a hypothesis about (the structure of) the given data
  • Generate models based on this hypothesis
  • Design experiments to measure the accuracy of the models
  • Good performance: it works (on this data)
  • Bad performance: it doesn't work on this data
  • Aggregates ("it works 60% of the time") are not useful

How can we characterize the data on which an algorithm works well? What is the effect of parameter settings?

Page 8: Open Machine Learning

Meta-Learning

• The science of understanding which algorithms work well on which types of data

• Hard: requires a thorough understanding of both data and algorithms

• Requires good data: extensive experimentation

• Why is this separate from other ML research?
  • A thorough algorithm evaluation = a meta-learning study
  • The original authors know their algorithms and data best, have large sets of experiments, and are (presumably) interested in knowing on which data their algorithms work well (or not)

Page 9: Open Machine Learning

Meta-Learning

With the right tools, can we make everyone a meta-learner?

[Diagram: large sets of experiments, datasets, and source code from ML algorithm design feed meta-learning tasks: algorithm selection, algorithm characterization, data characterization, bias-variance analysis, learning curves, data insight, algorithm insight, and algorithm comparison.]

Page 10: Open Machine Learning

Open Machine Learning

Page 11: Open Machine Learning

Open science

WorldWide Telescope

Page 12: Open Machine Learning

Open science

Microarray Databases

Page 13: Open Machine Learning

Open science

GenBank

Page 14: Open Machine Learning

Open machine learning?

• We can also be `open':
  • Simple, common formats to describe experiments, workflows, algorithms, ...
  • A platform to share, store, query, and interact

• We can go (much) further:
  • Share experiments automatically (open-source ML tools)
  • Experiment on-the-fly (cheap, no expensive instruments)
  • Controlled experimentation (experimentation engine)

Page 15: Open Machine Learning

Formalizing machine learning

• Unique names for algorithms, datasets, evaluation measures, data characterizations,... (ontology)

• Based on DMOP, OntoDM, KDOntology, EXPO,...

• Simple, structured way to describe algorithm setups, workflows and experiment runs

• Detailed enough to reproduce all experiments

Page 16: Open Machine Learning

Run

[Diagram: a run is the execution of a predefined setup. It takes input data, runs on a machine, and produces output data. Also recorded: start time, author, status, ...]
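The run concept can be pictured as a small record type. A minimal sketch in Python; the class and field names are my own illustration, not the actual database schema:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class Run:
    """One run: the execution of a predefined setup (field names are illustrative)."""
    setup_id: str                  # the (fully defined) setup that was executed
    machine: str                   # where it ran
    input_data: List[str]          # references to input data
    output_data: List[str]         # references to output data (models, evaluations, ...)
    author: str = "unknown"
    status: str = "finished"       # e.g. running / finished / failed
    start_time: datetime = field(default_factory=datetime.now)

# Example record for a hypothetical cross-validation run
run = Run(setup_id="setup:weka.SMO-cv", machine="node-03",
          input_data=["dataset:iris"], output_data=["evaluations:1", "predictions:2"])
print(run.status, run.start_time)
```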

Page 24: Open Machine Learning

Setup

Plan of what we want to do

[Diagram: setup subtypes are algorithm setup, function setup, workflow setup, and experiment setup, linked by part-of relations and parameter settings (p=...). Setups are hierarchical, parameterized, and either abstract or concrete.]
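Because setups are hierarchical (part-of), parameterized, and either abstract or concrete, they can be modeled as a small recursive structure. A sketch with names of my own choosing, not the real schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Setup:
    """Plan of what we want to do (illustrative sketch)."""
    kind: str                                             # algorithm | function | workflow | experiment
    name: str
    parameters: Dict[str, Optional[object]] = field(default_factory=dict)  # parameter settings (p=...)
    parts: List["Setup"] = field(default_factory=list)    # part-of hierarchy

    def is_concrete(self) -> bool:
        """Abstract vs. concrete: concrete once every parameter, here and below, has a value."""
        return (all(v is not None for v in self.parameters.values())
                and all(p.is_concrete() for p in self.parts))

kernel = Setup("function", "Weka.RBF", {"G": 0.01})
learner = Setup("algorithm", "Weka.SMO", {"C": 0.01}, parts=[kernel])
print(learner.is_concrete())   # True: every parameter setting has a value
```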

Page 30: Open Machine Learning

Algorithm Setup

Fully defined algorithm configuration

[Diagram: an algorithm setup is part of a larger setup and consists of an implementation (of an algorithm, which in turn implements a mathematical function and has algorithm qualities), parameter settings (p=...) for the implementation's parameters (p=?), and component function setups. Everything carries unique names. Roles: learner, base-learner, kernel, ...]
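Concretely, a fully defined algorithm configuration combines an implementation, parameter settings, and component setups that fill named roles. A sketch as a plain mapping; the key names are my own, while the implementation names and parameter values are taken from the workflow example later in the deck:

```python
# Illustrative encoding of a fully defined algorithm setup (key names are mine).
smo_setup = {
    "implementation": "Weka.SMO",            # unique implementation name
    "parameter_settings": {"C": 0.01},       # p=... for the declared parameters (p=?)
    "components": [                          # part-of: component setups with roles
        {"role": "kernel",
         "setup": {"implementation": "Weka.RBF",
                   "parameter_settings": {"G": 0.01}}},
    ],
}

def roles(setup):
    """Roles filled by component setups (learner, base-learner, kernel, ...)."""
    return [component["role"] for component in setup.get("components", [])]

print(roles(smo_setup))   # ['kernel']
```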

Page 38: Open Machine Learning

Setup

[Diagram recap: setup subtypes (algorithm setup, function setup, workflow setup, experiment setup) and their part-of relations.]

Page 39: Open Machine Learning

Workflow Setup

[Diagram: a workflow setup is a setup composed of other setups (e.g., algorithm setups) joined by connections from a source to a target. A workflow is defined by its components, connections, and parameters (inputs). Also: ports, datatypes.]

Page 42: Open Machine Learning

Workflow Example

[Diagram: an example workflow, 1:mainFlow, with components 2:loadData (Weka.ARFFLoader, location=http://...), 3:crossValidate (Weka.Evaluation, F=10), 4:learner (Weka.SMO, C=0.01), and 5:kernel (Weka.RBF, G=0.01); an S=1 setting also appears. Connections carry data, evaluations (eval), and predictions (pred) between the components and the main flow's url, evaluations, and predictions ports; logRuns flags (true/false) mark which sub-runs are stored. The second build of the slide shows the resulting outputs: evaluations and predictions, stored as Weka.Instances.]
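Reading the diagram off the slide, the example workflow can be written down as components plus connections. A sketch in plain Python; the component names, implementations, and parameter values come from the slide, while the key names and the exact wiring of the connections are my own reading of the figure (the S=1 setting is omitted because its owner is not clear from the transcript):

```python
# Illustrative encoding of the example workflow "1:mainFlow".
workflow = {
    "name": "1:mainFlow",
    "components": {
        "2:loadData":      {"implementation": "Weka.ARFFLoader",
                            "parameters": {"location": "http://..."},  # URL as on the slide
                            "logRuns": True},
        "3:crossValidate": {"implementation": "Weka.Evaluation",
                            "parameters": {"F": 10},                   # 10 folds
                            "logRuns": True},
        "4:learner":       {"implementation": "Weka.SMO",
                            "parameters": {"C": 0.01},
                            "logRuns": False},
        "5:kernel":        {"implementation": "Weka.RBF",
                            "parameters": {"G": 0.01}},
    },
    # (source component, port) -> (target component, port); my reading of the arrows
    "connections": [
        (("1:mainFlow", "url"),       ("2:loadData", "location")),
        (("2:loadData", "data"),      ("3:crossValidate", "data")),
        (("3:crossValidate", "eval"), ("1:mainFlow", "evaluations")),
        (("3:crossValidate", "pred"), ("1:mainFlow", "predictions")),
        (("4:learner", "par"),        ("3:crossValidate", "par")),
        (("5:kernel", "par"),         ("4:learner", "kernel")),
    ],
}
print(len(workflow["components"]), "components,", len(workflow["connections"]), "connections")
```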

Page 44: Open Machine Learning

Setup

[Diagram recap: setup subtypes (algorithm setup, function setup, workflow setup, experiment setup) and their part-of relations.]

Page 45: Open Machine Learning

Experiment Setup

[Diagram: an experiment setup is a setup composed of other setups (algorithm setups, workflows) together with experiment variables (<X>). Also: experiment design, description, literature reference, author, ...]

Page 47: Open Machine Learning

Experiment Setup

Page 48: Open Machine Learning

Experiment Setup

Variables: labeled tuples that can be referenced in setups
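One way to picture these variables: a labeled tuple of values that an (abstract) experiment setup references by label, so that expanding the variables yields one concrete setup per combination. A minimal sketch; the representation is my own, not the actual format:

```python
from itertools import product

# Variables: labeled tuples of values that setups can reference by label.
variables = {
    "learner": ("Weka.SMO", "Weka.J48", "Weka.NaiveBayes"),
    "folds":   (10,),
}

# An abstract experiment setup referencing the variables as <label>.
template = "crossValidate(<learner>, F=<folds>)"

labels = list(variables)
for combination in product(*(variables[label] for label in labels)):
    concrete = template
    for label, value in zip(labels, combination):
        concrete = concrete.replace(f"<{label}>", str(value))
    print(concrete)
# crossValidate(Weka.SMO, F=10)
# crossValidate(Weka.J48, F=10)
# crossValidate(Weka.NaiveBayes, F=10)
```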

Page 49: Open Machine Learning

Run

[Diagram recap: a run executes a setup on a machine, with input and output data. Also: start time, author, status, ...]

Page 50: Open Machine Learning

Run

[Diagram: the output data of a run can be a dataset, an evaluation, a model, or predictions; a dataset can in turn be the source data of further runs, and has data qualities.]

Page 53: Open Machine Learning

ExpML

[Slide shows the example workflow (1:mainFlow with 2:loadData, 3:crossValidate, 4:learner, 5:kernel and their parameter settings) written out in the ExpML markup format.]
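As a rough idea of what the machine-readable version might look like, here is the example workflow serialized to XML from Python. The element and attribute names are purely illustrative; the actual ExpML vocabulary defined for the experiment database may differ.

```python
import xml.etree.ElementTree as ET

# Build an XML description of the example workflow (tag names are illustrative only).
flow = ET.Element("workflow", name="1:mainFlow")
for comp_name, implementation, params in [
    ("2:loadData",      "Weka.ARFFLoader", {"location": "http://..."}),
    ("3:crossValidate", "Weka.Evaluation", {"F": "10"}),
    ("4:learner",       "Weka.SMO",        {"C": "0.01"}),
    ("5:kernel",        "Weka.RBF",        {"G": "0.01"}),
]:
    component = ET.SubElement(flow, "component", name=comp_name, implementation=implementation)
    for key, value in params.items():
        ET.SubElement(component, "parameter_setting", name=key, value=value)

print(ET.tostring(flow, encoding="unicode"))
```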

Page 54: Open Machine Learning

Demo (preview)

Page 55: Open Machine Learning

Learning curves

[Plot: predictive accuracy (0.2 to 1.0) versus percentage of the original dataset size (10% to 100%) for RandomForest, C4.5, LogisticRegression, RacedIncrementalLogitBoost (DecisionStump), NaiveBayes, and SVM (RBF).]

Examples
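For context, learning curves like the one above plot predictive accuracy against the fraction of the original dataset used for training. If you wanted to compute such a curve yourself rather than query it from the experiment database, scikit-learn's learning_curve utility is one way; the dataset and classifiers below are placeholders, not the ones from the slide.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)
sizes = np.linspace(0.1, 1.0, 10)   # 10%, 20%, ..., 100% of the training data

for clf in (RandomForestClassifier(random_state=0), GaussianNB()):
    train_sizes, _, test_scores = learning_curve(
        clf, X, y, train_sizes=sizes, cv=10, scoring="accuracy")
    # Mean cross-validated accuracy at each training-set size
    print(type(clf).__name__, np.round(test_scores.mean(axis=1), 3))
```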

Page 56: Open Machine Learning

When does one algorithm outperform another?

Examples

Page 57: Open Machine Learning

When does one algorithm outperform another?

Examples

Page 58: Open Machine Learning

Bias-variance profile + effect of dataset size

Examples

Page 59: Open Machine Learning

Bias-variance profile + effect of dataset size

boosting

bagging

Examples

Page 60: Open Machine Learning

Bias-variance profile + effect of dataset size

Examples

Page 61: Open Machine Learning

Taking it further: seamless integration

• Webservice for sharing and querying experiments (a hypothetical sketch follows after this list)

• Integrate experiment sharing in ML tools (WEKA, KNIME, RapidMiner, R, ....)

• Mapping implementations, evaluation measures,...

• Online platform for custom querying, community interaction

• Semantic wiki: algorithm/data descriptions, rankings, ...
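To make the webservice idea concrete, this is roughly what querying stored evaluations from a script could look like. The endpoint, parameters, and response format below are invented for illustration; the slides do not specify the actual API.

```python
import requests

# Hypothetical endpoint and parameters -- purely illustrative, not a real API.
BASE = "http://expdb.example.org/api"
response = requests.get(f"{BASE}/evaluations",
                        params={"algorithm": "Weka.SMO",
                                "measure": "predictive_accuracy"},
                        timeout=10)
response.raise_for_status()
for record in response.json():          # assumed: a JSON list of evaluation records
    print(record.get("dataset"), record.get("value"))
```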

Page 62: Open Machine Learning

Experimentation Engine

• Controlled experimentation (Delve, MLComp); see the sketch after this list:
  • Download datasets, build training/test sets
  • Feed training and test sets to algorithms, retrieve predictions/models
  • Run a broad set of evaluation measures

• Benchmarking (cross-validation), learning curve analysis, bias-variance analysis, workflows(!)

• Compute data properties for new datasets
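A toy version of such an engine, using scikit-learn in place of the Weka-based components from the talk: load a dataset, build cross-validation folds, feed them to each algorithm, and collect a broad set of evaluation measures. This is a sketch of the idea only, not the actual engine behind the experiment database.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# 1. Load a dataset and choose the evaluation measures to compute.
X, y = load_iris(return_X_y=True)
measures = ["accuracy", "f1_macro", "roc_auc_ovr"]

# 2. Build 10-fold train/test splits, feed them to each algorithm,
#    and retrieve the evaluations.
for algorithm in (SVC(kernel="rbf", C=0.01, gamma=0.01, probability=True),
                  DecisionTreeClassifier(random_state=0)):
    scores = cross_validate(algorithm, X, y, cv=10, scoring=measures)
    summary = {m: round(scores[f"test_{m}"].mean(), 3) for m in measures}
    print(type(algorithm).__name__, summary)
```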

Page 63: Open Machine Learning

Why would you use it? (seeding)

• Let the system run the experiments for you

• Immediate, highly detailed benchmarks (no repeats)

• Up to date, detailed results (vs. static, aggregated in journals)

• All your results organized online (private?), anytime, anywhere

• Interact with people (weird results?)

• Get credit for all your results (e.g. citations), including unexpected results

• Visibility, new collaborations

• Check whether your algorithm is really the best (e.g. active testing)

• On which datasets does it perform well/badly?

Page 64: Open Machine Learning

Question

Is open machine learning possible?

Page 65: Open Machine Learning

http://expdb.cs.kuleuven.be

Thanks · Gracias · Xie Xie · Danke · Dank U · Merci · Efharisto · Dhanyavaad · Grazie · Spasiba · Kia ora · Tesekkurler · Diolch · Köszönöm · Arigato · Hvala · Toda