Vowpal Wabbit

VOWPAL WABBIT

Paul Mineiro, Microsoft

Open Data Science Conference, Boston 2015

@opendatasci

Vowpal Wabbit: What Is It?

Machine Learning Toolkit and Research Vehicle

Open Source: https://github.com/JohnLangford/vowpal_wabbit/

Commercially Supported: http://azure.microsoft.com/en-us/services/machine-learning/

Currently sponsored by Microsoft Research

Formerly sponsored by Yahoo! Research

It’s a Mindset!

Iterate quickly

Smash giant data sets

Go beyond classification

Iterate quickly

Sub-Linear Debugging

Key Technology: Online Learning

Key Concept: Progressive Validation Loss

Goal: Rapid Interactive Experimentation

Latency kills productivity.
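Progressive validation loss can be sketched in a few lines of plain Python. This is an illustrative sketch, not vw's implementation: the learner is a hand-written online logistic regression, and `progressive_validation`, `make_toy_stream`, and the toy data are names and assumptions made up for this example. The point it shows is that each example is scored before the model learns from it, so the running average loss behaves like a held-out estimate and is available from the first example onward.

```python
import math
import random


def make_toy_stream(n, seed=0):
    """Hypothetical data stream: label is 1 when the single feature x is positive."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield {"bias": 1.0, "x": x}, int(x > 0)


def progressive_validation(examples, lr=0.5):
    """Online logistic regression that reports progressive validation loss.

    Returns the final weights and the running average log loss after each example.
    """
    weights = {}
    total_loss = 0.0
    curve = []
    for t, (features, label) in enumerate(examples, start=1):
        # 1. Predict with the current weights; the example is still unseen.
        score = sum(weights.get(k, 0.0) * v for k, v in features.items())
        score = max(-30.0, min(30.0, score))  # keep exp() well-behaved
        prob = 1.0 / (1.0 + math.exp(-score))
        prob = min(max(prob, 1e-12), 1.0 - 1e-12)
        # 2. Charge the loss of that prediction to the running average.
        total_loss -= label * math.log(prob) + (1 - label) * math.log(1.0 - prob)
        curve.append(total_loss / t)
        # 3. Only now take an SGD step on the example.
        for k, v in features.items():
            weights[k] = weights.get(k, 0.0) - lr * (prob - label) * v
    return weights, curve
```

Watching `curve` during a run gives feedback long before a full pass over the data finishes, which is what makes rapid interactive experimentation possible.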

Sub-Linear Debugging: Pitfalls

Bias-Variance Tradeoffs (“Learning Curves Cross”)

A change that looks worse on a small prefix of the data can look better on the full data set, and vice versa.

Lower Bias: model class matches target better.

Lower Variance: fit less sensitive to training set.

Ideal: push on both.

Usually: pushing on just one, e.g.,

New features: lowering bias, increasing variance.

Regularizing: lowering variance.

Smash giant data sets

There’s no data like more data

Subject to the Bayes limit, larger training sets admit beneficial tradeoffs of bias for variance, potentially resulting in substantially lower generalization error.
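For squared loss, the statement above can be read off the standard bias-variance decomposition. The notation is assumed for illustration and does not come from the slides: f is the true regression function, \hat f_D is the model fit on training set D, and \sigma^2 is the noise variance, which plays the role of the Bayes limit.

```latex
\mathbb{E}_{D,\varepsilon}\big[(y - \hat f_D(x))^2\big]
  = \underbrace{\big(\mathbb{E}_D[\hat f_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\big[(\hat f_D(x) - \mathbb{E}_D[\hat f_D(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{Bayes limit}},
\qquad y = f(x) + \varepsilon,\quad \mathbb{E}[\varepsilon] = 0,\quad \operatorname{Var}(\varepsilon) = \sigma^2 .
```

The variance term typically shrinks as the training set grows, so with more data a richer, lower-bias model class becomes affordable; the \sigma^2 term does not shrink no matter how much data is added.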

Smash giant data sets

Strategy 1: Multinode

Multinode Training

Start cluster spanning daemon

Start (many) vw and point them at the daemon

Two strategies available:

iterative (SGD + Averaging)

L-BFGS

Both might work poorly for non-convex problems

such as matrix factorization
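The iterative strategy (SGD + averaging) can be illustrated in a single process. This is a minimal sketch under stated assumptions, not vw's cluster code: the shards stand in for nodes, the model is least-squares regression, and `sgd_pass`, `average`, and `train_with_averaging` are hypothetical names. Each "node" runs SGD over its own shard starting from the shared weights, and the per-node weights are averaged after every pass.

```python
def sgd_pass(weights, shard, lr):
    """One sequential SGD pass of least-squares regression over one shard."""
    w = list(weights)
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def average(models):
    """Coordinate-wise mean of several weight vectors."""
    n = len(models)
    return [sum(coords) / n for coords in zip(*models)]


def train_with_averaging(shards, dim, passes=20, lr=0.1):
    """Alternate local SGD passes (one per shard) with a global averaging step."""
    w = [0.0] * dim
    for _ in range(passes):
        # On a real cluster these passes would run in parallel, one per node.
        local_models = [sgd_pass(w, shard, lr) for shard in shards]
        w = average(local_models)
    return w
```

Averaging is well-behaved here because the objective is convex. For a non-convex model, two nodes can settle into different but equivalent solutions whose average is poor, which is one way to read the warning above.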

Smash giant data sets

Strategy 2: Multicore

Multicore Training

Start several vw in daemon mode

Shared (lock-free!) state

Send data to children via netcat

… and then hope for the best.
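The netcat step can also be done from a script. The sketch below is a plain TCP client in Python and makes several assumptions that should be checked against your vw version: that a daemon is already listening (the command in the comment and port 26542 are quoted from memory of vw's command-line help), and that it answers every example line with exactly one reply line. `score_with_daemon` is a hypothetical helper name.

```python
import socket

# Assumed way to start the daemon (verify with `vw --help`):
#   vw --daemon --port 26542 --num_children 4


def score_with_daemon(lines, host="localhost", port=26542):
    """Send one example per line over TCP and collect one reply line per example."""
    replies = []
    with socket.create_connection((host, port)) as conn:
        with conn.makefile("r", encoding="utf-8", newline="\n") as reader:
            for line in lines:
                # Each example must end with exactly one newline.
                conn.sendall((line.rstrip("\n") + "\n").encode("utf-8"))
                replies.append(reader.readline().strip())
    return replies
```

Running several such clients at once, one per data file, is the scripted equivalent of piping files through netcat in parallel.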

Go beyond classification

Structured Prediction

Exploration Learning

Structured Prediction: What Is It?

Shit we understood first        Everything else

Linear Dynamics                 Non-linear Dynamics

Equilibrium Thermodynamics      Non-equilibrium Thermodynamics

Classification                  Structured Prediction

Structured Prediction: Examples

Task: Image Segmentation

Input: [figure not preserved]

Output: [figure not preserved]

Task: Machine Translation

Input: Ces deux principes se tiennent à la croisée de la philosophie, de la politique, de l’économie, de la sociologie et du droit.

Output: Both principles lie at the crossroads of philosophy, politics, economics, sociology, and law.

Task: Syntactic Analysis

Input: The monster ate a big sandwich.

Output: [figure: parse of the same sentence, not preserved]

Structured Prediction Haiku

A joint prediction

Across a single input

Loss measured jointly

Hal Daumé III

Structured Prediction via Reduction

(Imperatively) Define Search Space:

Process your input

Make calls to predict

Inform vw about losses experienced

Testing uses exactly same code as training

Example: Entity and Relation Extraction

James Earl Ray pleaded guilty in Memphis, Tenn. to the assassination of civil rights leader Martin Luther King Junior.

Entities:

James Earl Ray: Person

Memphis, Tenn.: Location

Martin Luther King Junior: Person

Relation:

kill (James Earl Ray, Martin Luther King Junior)

ER Search Space Pseudocode

preds = {}

// Predict entities
foreach pos in input:   // left to right
    thispred = predict(input, pos, preds, 'entity')
    preds = preds ∪ {(pos, thispred)}
    if (label) loss(label, thispred, 'entity')

// Predict relations
foreach pair in pairs(preds):   // every pair of predicted entities
    thispred = predict(input, pair, preds, 'relation')
    preds = preds ∪ {(pair, thispred)}
    if (label) loss(label, thispred, 'relation')

Enforce constraints here, inside the search space code.
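A runnable Python version of the search space is sketched below. It is an illustration under stated assumptions, not vw's search API: `run_search`, `toy_predict`, and `LossRecorder` are hypothetical names, the input is assumed to be already chunked into mentions, the learned policy is replaced by a fixed lookup table, and the constraint (a "kill" relation is only offered between two Persons) is one guess at what enforcing constraints could look like. The same `run_search` function is used with and without gold labels, mirroring the point that testing uses the same code as training.

```python
from itertools import permutations

ENTITY_LABELS = ("Person", "Location", "Other")
RELATION_LABELS = ("kill", "none")


def run_search(mentions, predict, loss=None, gold=None):
    """Walk the search space once; report losses only when gold labels are given."""
    preds = {}
    # Predict entities, left to right; earlier predictions are visible to later ones.
    for pos in range(len(mentions)):
        this_pred = predict(mentions, pos, preds, "entity", ENTITY_LABELS)
        preds[pos] = this_pred
        if gold is not None:
            loss(gold.get(pos, "Other"), this_pred, "entity")
    # Predict relations for every ordered pair of mentions.
    for pair in permutations(range(len(mentions)), 2):
        # Assumed constraint: "kill" is only offered between two Persons.
        both_people = preds[pair[0]] == "Person" and preds[pair[1]] == "Person"
        allowed = RELATION_LABELS if both_people else ("none",)
        this_pred = predict(mentions, pair, preds, "relation", allowed)
        preds[pair] = this_pred
        if gold is not None:
            loss(gold.get(pair, "none"), this_pred, "relation")
    return preds


def toy_predict(mentions, where, preds, task, allowed):
    """Stand-in for a learned policy: a fixed lookup restricted to `allowed`."""
    if task == "entity":
        table = {"James Earl Ray": "Person", "Memphis, Tenn.": "Location",
                 "Martin Luther King Junior": "Person"}
        guess = table.get(mentions[where], "Other")
    else:
        head, tail = mentions[where[0]], mentions[where[1]]
        is_kill = (head, tail) == ("James Earl Ray", "Martin Luther King Junior")
        guess = "kill" if is_kill else "none"
    return guess if guess in allowed else allowed[-1]


class LossRecorder:
    """Counts 0/1 losses across all entity and relation decisions."""

    def __init__(self):
        self.total = 0

    def __call__(self, gold_label, predicted, task):
        self.total += int(gold_label != predicted)
```

Calling `run_search(mentions, toy_predict)` gives test-time predictions; passing a `LossRecorder` and a gold dictionary as well gives the training-time walk over the same search space.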

Play with it.

https://github.com/JohnLangford/vowpal_wabbit

Ask questions.

https://groups.yahoo.com/neo/groups/vowpal_wabbit/info

Have fun.

FIN
