Comp 540 Statistical Machine Learning:
Principles and Applications
Devika Subramanian
Computer Science, Rice University
(c) Devika Subramanian, 2008 2
Class information
4 credits
Tuesday/Thursday 1:00 to 2:20 at Duncan 1075
Instructor: Devika Subramanian
(c) Devika Subramanian, 2008 3
Goals of course
Introduce several state-of-the-art algorithms in statistical machine learning.
Show how each of these algorithms applies to real-world problems in science and engineering.
Provide practice in applying algorithms by solving real problems.
(c) Devika Subramanian, 2008 4
Auxiliary goals of course
To give you experience in independent research.
To give you practice in oral presentation of technical material.
To train you to write high-quality technical papers.
(c) Devika Subramanian, 2008 5
Course work
A term project (groups of at most two):
Project proposal
Interim progress reports (5)
Project presentation
Project report (a technical paper)
An oral technical presentation
(c) Devika Subramanian, 2008 6
What is machine learning?
Use data to build models useful for decision making.
[Diagram: a learning algorithm takes training data and prior knowledge and produces a predictive model, which reprograms the decision rules of a system; the system produces responses based on inputs from the environment, according to pre-determined rules.]
(c) Devika Subramanian, 2008 7
An example from finance
[Diagram: a learning algorithm takes economic indicators and the price of GOOG, plus prior knowledge, and produces a prediction of the GOOG price, which drives an investment policy applied to the stock market.]
Validation of the learned model: are you making money?
(c) Devika Subramanian, 2008 8
Issues
What data should be gathered to make predictions? (feature selection)
What kind of model should be learned (e.g., a deterministic function of observed data, a probabilistic prediction on action choice)? (model selection)
How can we be sure we have a good predictive model? (model assessment or validation)
What algorithms should we use to learn these models from data? How can we scale them to work on large data sets in real time?
(c) Devika Subramanian, 2008 9
An example from biology
[Diagram: a learning algorithm takes flow cytometry measurements of cancer cells and prior knowledge, and produces a signaling network; therapeutic interventions (drugs) based on the network are applied to the cells, whose responses feed back as new data.]
(c) Devika Subramanian, 2008 10
T-cell signaling network
[Figure: learned signaling network over phospho-proteins (PKC, PKA, Raf, Mek, P44/42, Akt, Jnk, P38, Plcγ) and phospho-lipids (PIP2, PIP3, perturbed in the data), compared against the expected pathway: 15/17 classic edges and 17/17 reported edges recovered, 3 missed, some reversed.]
Sachs et al., Science 2005
(c) Devika Subramanian, 2008 11
Bcr-Abl signal transduction pathways in CML
Sawyers CL. NEJM. 1999; 340(17):1331
(c) Devika Subramanian, 2008 12
Inhibiting Bcr-Abl kinase (Gleevec)
(c) Devika Subramanian, 2008 13
Spam filtering
[Diagram: a learning algorithm takes training data and prior knowledge and produces the probability that a message is spam; a spam labeling policy labels each message in the mail stream as spam/ham, and user feedback flows back as new training data.]
(c) Devika Subramanian, 2008 14
Standard system building methodology
Analyze problem: interview human experts, gather requirements, understand how decisions are made.
Design a solution: handcraft system models and devise algorithms for decision making.
Implement and test the system.
[Diagram: a system mapping inputs to outputs.]
(c) Devika Subramanian, 2008 15
When do we need machine learning?
When we don’t know how to calculate outputs from inputs (cancer cells).
When requirements change rapidly (spam filtering).
When the environments in which systems operate change rapidly (the stock market).
When there is tremendous individual variability, and therefore a need for customization (spam filtering, cancer cells).
(c) Devika Subramanian, 2008 16
Machine learning
Principles, methods, and algorithms for prediction and modeling on the basis of past experience.
(c) Devika Subramanian, 2008 17
Statistical machine learning
Machine learning is already at the heart of speech recognition and handwriting recognition.
Statistical learning methods are transforming information retrieval (Google) and retail (Amazon, Walmart).
Statistical learning methods are creating opportunities in databases, computer graphics, robotics, computer vision, networking, operating systems, and computer security.
(c) Devika Subramanian, 2008 18
Role of ML in CS
Data is a new source of power for computer science.
Every computer science student should learn the fundamentals of machine learning and statistical thinking.
By combining engineered frameworks with models learned from data, we can develop the high-performance systems of the future.
(c) Devika Subramanian, 2008 19
Learning in context
[Diagram: machine learning sits at the intersection of artificial intelligence (uncertainty, multi-agent systems), control theory and operations research (MDPs), statistics, applied mathematics, algorithms, systems/software engineering, and data mining.]
(c) Devika Subramanian, 2008 20
Outline of rest of lecture
Learning, by example
The structure of the course
One more example application of learning
(c) Devika Subramanian, 2008 21
A problem
Sort incoming mail into bins based on zip code.
Great variability in handwriting makes it hard to write a fixed set of rules for recognizing digits.
(c) Devika Subramanian, 2008 22
Learn from examples
The machine aligns the letter so that a camera can take an image of it and extract the zip code (segmentation and pre-processing for digit extraction).
(c) Devika Subramanian, 2008 23
How supervised learning works
Steps:
Entertain a set of possibilities
Adjust predictions based on feedback
Rethink the possibilities
[Figure: example digit images with labels 3, 2, 2]
(adapted from T. Jaakkola) (c) Devika Subramanian, 2008 24
Key questions
Data and assumptions
What data is available for the learning task?
What can we assume about the problem?
Representation
How should we represent the examples? (feature selection)
Method and estimation
What are the possible hypotheses?
How do we adjust our predictions based on feedback?
(slides 23-30 adapted from T. Jaakkola)
(c) Devika Subramanian, 2008 25
Key questions (contd.)
Evaluation
How well are we doing?
Model selection
Can we do even better by selecting a richer class of hypotheses?
(c) Devika Subramanian, 2008 26
Data and assumptions
How are the digits generated, and how reliable are the labels?
(c) Devika Subramanian, 2008 27
Data representation
The representation of the input can be:
A bitmap
Extracted features of the bitmap (number of curves, loops, etc.)
The representation can make the learning problem easy or difficult.
(c) Devika Subramanian, 2008 28
Method and estimation
Bitmap representation (8 by 8) of the input, stored as a 64-bit vector x.
Hypothesis: y = sign(w · x), where w is a parameter vector we learn from the data.
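A minimal sketch of learning such a w with the classic perceptron update (Python/NumPy; the usage names at the bottom are illustrative, not from the slides):

```python
import numpy as np

def perceptron(X, y, epochs=50):
    """Learn w for the hypothesis y = sign(w . x).

    X: (n, d) array of inputs (e.g., flattened 8x8 bitmaps, d = 64)
    y: (n,) array of labels in {-1, +1}
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if np.sign(w @ xi) != yi:   # mistake-driven update
                w += yi * xi
    return w

# Hypothetical usage, distinguishing two digit classes:
# w = perceptron(X_train, y_train)
# y_pred = np.sign(X_test @ w)
```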
(c) Devika Subramanian, 2008 29
Model assessment
Look at average classification error as a function of the number of examples.
[Figure: average error vs. number of examples]
(c) Devika Subramanian, 2008 30
Model selection
Our classifier is limited; can we make it more flexible?
Is there an entirely different type of classifier that will be more suitable?
(c) Devika Subramanian, 2008 31
Outline of course
Learning techniques
Supervised learning
Regression
Linear, locally weighted, polynomial, additive
Nearest neighbor, prototype methods
Classification
Discriminative: logistic regression, perceptrons, neural nets, SVMs
Generative: LDA, QDA, naïve Bayes, Bayesian nets
(c) Devika Subramanian, 2008 32
Outline (contd.)
Unsupervised learning
Clustering and k-means
Expectation maximization and Gaussian mixtures
Factor analysis: PCA, ICA, Isomap
Learning from sequential data
HMMs and CRFs
Reinforcement learning
Learning theory
Bias/variance tradeoff, overfitting and regularization
Ensemble learning: boosting and bagging
Model assessment and selection
(c) Devika Subramanian, 2008 33
Outline (contd.)
Applications
Elevator control and backgammon playing
Learning from forests of sensors
Face and handwriting recognition
Text mining and information extraction
Learning regulatory networks from biological data
Gene finding
And others that interest you!
(c) Devika Subramanian, 2008 34
The basic methods
Supervised learning
Unsupervised learning
Reinforcement learning
(c) Devika Subramanian, 2008 35
Supervised learning
[Diagram: labeled training examples (digit images 8, 3, 6, 0, 1) feed a learning algorithm, which produces a classifier; the classifier then labels new examples, e.g., a new image of an 8.]
(c) Devika Subramanian, 2008 36
Supervised learning
The desired model is a classifier function y = f(x).
Training examples are pairs of the form (x1, y1), …, (xn, yn), where xi is a vector denoting an input and yi is its corresponding classification.
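As a concrete, if simplistic, instance of building y = f(x) from labeled pairs, here is a one-nearest-neighbor sketch (NumPy; the function and argument names are illustrative):

```python
import numpy as np

def nearest_neighbor_classify(X_train, y_train, x):
    """Predict the label of x as the label of its closest training
    example -- a simplest-possible way to realize y = f(x) from
    pairs (x1, y1), ..., (xn, yn)."""
    distances = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(distances)]
```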
(c) Devika Subramanian, 2008 37
Graphics: image analogies
A : A′ :: B : ?
Hertzmann, Jacobs, Oliver, Curless, Salesin, SIGGRAPH 2001 (c) Devika Subramanian, 2008 38
Learning texture maps
[Figure: example image pairs]
(c) Devika Subramanian, 2008 39
Unsupervised learning
The model is a probability density function p on the space of inputs X (the joint probability distribution on X).
Training data are samples x1, …, xn from X.
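For intuition, a minimal density-estimation sketch: fitting a single Gaussian p(x) to unlabeled samples by maximum likelihood (NumPy; a mixture model, covered later, would generalize this):

```python
import numpy as np

def fit_gaussian(X):
    """Fit p(x) = N(mu, Sigma) to samples x1..xn by maximum likelihood."""
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)   # sample covariance estimate
    return mu, Sigma

def log_density(x, mu, Sigma):
    """Evaluate log p(x) under the fitted Gaussian."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.solve(Sigma, diff))
```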
(c) Devika Subramanian, 2008 40
Clustering
Finding structure in the data
Mixture models
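A basic k-means sketch, the simplest of the clustering methods in the course outline (NumPy; the iteration cap and seed are arbitrary choices):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Alternate assigning points to the nearest center and
    recomputing each center as the mean of its cluster."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # distance of every point to every center: (n, k)
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):          # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```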
(c) Devika Subramanian, 2008 41
Clustering of microarray data
(c) Devika Subramanian, 2008 42
Reinforcement learning
[Diagram: an agent observes state s and reward r from the environment and emits action a.]
Agent’s goal: choose actions to maximize total reward.
The action selection rule is called a “policy”: a = π(s).
(c) Devika Subramanian, 2008 43
Methods for reinforcement learning
Direct
Start with an initial policy π
Experiment with the environment to decide how to improve π
Repeat
Model based
Experiment with the environment to learn how it behaves (dynamics + rewards)
Compute the optimal policy π
(c) Devika Subramanian, 2008 44
Reinforcement learning
The desired output is an action selection policy π.
Training examples are <s, a, r, s’> tuples collected by the agent interacting with the environment.
(c) Devika Subramanian, 2008 45
Temporal Difference Learning
TD-Gammon [Tesauro]
Neural network (input: raw board information)
A more intelligent weight update rule
Self-play (300,000 games)
Human expert level
[Diagram: two copies of the program play against each other, alternating actions.]
(c) Devika Subramanian, 2008 46
Fundamental questions in machine learning
Incorporating prior knowledge
We shouldn’t force systems to learn everything from scratch; they should bootstrap off of what is already known.
Incorporating learned structures into larger systems
How to embed learning in a larger system (good examples: speech recognizers and handwriting recognizers)
(c) Devika Subramanian, 2008 47
Fundamental questions (contd.)
Making learning algorithms (particularly reinforcement learning) practical.
Unsupervised and supervised learning have close ties to statistics; many practically fielded algorithms come from there.
Trading off accuracy, sample size, and hypothesis complexity.
We need both theoretical results and experimental methodologies for making these tradeoffs.
(c) Devika Subramanian, 2008 48
An example from cognitive science
[Diagram: a learning algorithm takes training data from a human learning the NRL task, plus prior knowledge, and produces a predictive model that suggests interventions to aid learning.]
(c) Devika Subramanian, 2008 49
Submarine School 101: The NRL Navigation Task
50% of the class is weeded out by this game!
• Pilot a submarine to a goal through a minefield in a limited time period
• Distance to mines revealed via seven discrete sonars
• Time remaining, as-the-crow-flies distance to goal, and bearing to goal are given
• Actions communicated via a joystick interface
(c) Devika Subramanian, 2008 50
The NRL Navigation Task
The mine configuration changes with every game.
The game has a strategic and a visual-motor component!
(c) Devika Subramanian, 2008 51
Learning curves
[Figure: success % vs. episode (1-736) for subjects S1-S5.]
Successful learners look similar: plateaus between improvements.
Unsuccessful learners are DOA!
The Navy takes 5 days to tell whether a person succeeds or fails. (c) Devika Subramanian, 2008 52
Task Questions
Is the game hard? What is the source of complexity?
Why does human performance plateau out at 80%? Is that a feature of the human learning system or the game? Can machine learners achieve higher levels of competence?
Can we understand why humans learn or fail to learn the task? Can we detect inability to learn early enough to intervene?
How can we actively shape human learning on this task?
(c) Devika Subramanian, 2008 53
Mathematical characteristics of the NRL task
A partially observable Markov decision process which can be made fully observable by augmenting the state with the previous action.
State space of size 10^14; at each step, a choice of 153 actions (17 turns and 9 speeds).
Feedback at the end of up to 200 steps.
Challenging for both humans and machines.
(c) Devika Subramanian, 2008 54
Reinforcement learning
“a way of programming agents by reward and punishment without needing to specify how the task is to be achieved”
[Kaelbling, Littman, & Moore, 96]
(c) Devika Subramanian, 2008 55
Reinforcement learning
[Diagram: the learner observes the task state, takes an action, and receives feedback.]
Sutton and Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
1. Observe state, s_t
2. Decide on an action, a_t
3. Perform action
4. Observe new state, s_{t+1}
5. Observe reward, r_{t+1}
6. Learn from experience
7. Repeat
The learned policy is a mapping π : S → A.
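A minimal tabular Q-learning sketch of the seven-step loop above; the env interface (reset/step/actions) is a hypothetical stand-in, not something specified on these slides:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning following the observe/act/learn loop above.

    Assumes env exposes reset() -> state, step(action) ->
    (next_state, reward, done), and a list env.actions.
    """
    Q = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()                       # 1. observe state s_t
        done = False
        while not done:
            if random.random() < epsilon:     # 2. decide on action a_t
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)         # 3-5. act, observe s_{t+1}, r_{t+1}
            best_next = max(Q[(s2, a2)] for a2 in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # 6. learn
            s = s2                            # 7. repeat
    return Q
```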
(c) Devika Subramanian, 2008 56
Reinforcement learning/NRL task
Representational hurdles
State and action spaces have to be manageably small.
Good intermediate feedback, in the form of a non-deceptive progress function, is needed.
Algorithmic hurdles
An appropriate credit assignment policy is needed to handle the two types of failures (timeouts and explosions are different).
Learning is too slow to converge (because there are up to 200 steps in a single training episode).
(c) Devika Subramanian, 2008 57
State space design
Binary distinction on each sonar: is it > 50?
Six equivalence classes on bearing: {12}, {1,2}, {3,4}, {5,6,7}, {8,9}, {10,11}
State space size = 2^7 * 6 = 768.
Discretization of actions:
speed: 0, 20 and 40
turn: -32, -16, -8, 0, 8, 16, 32
Automated discovery of abstract state spaces for reinforcement learning, Griffin and Subramanian, 2001.
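A sketch of this discretization; the thresholds and class boundaries follow the slide, while the function and argument names are mine:

```python
def abstract_state(sonars, bearing):
    """Map raw NRL sensors to an abstract state.

    sonars:  seven range readings; each becomes one bit (> 50 or not),
             giving 2**7 = 128 sonar patterns.
    bearing: clock position 1..12, mapped to one of six classes,
             for 128 * 6 = 768 abstract states.
    """
    sonar_bits = tuple(int(s > 50) for s in sonars)
    bearing_classes = [(12,), (1, 2), (3, 4), (5, 6, 7), (8, 9), (10, 11)]
    bearing_class = next(i for i, c in enumerate(bearing_classes)
                         if bearing in c)
    return sonar_bits, bearing_class
```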
(c) Devika Subramanian, 2008 58
The dense reward function
r(s, a, s’) = 0 if s’ is a state where the player hits a mine
            = 1 if s’ is a goal state
            = 0.5 if s’ is a timeout state
            = 0.75 if s is an all-blocked state and s’ is a not-all-blocked state
            = 0.5 + (diff in sum of sonars)/1000 if s’ is an all-blocked state
            = 0.5 + (diff in range)/1000 + abs(bearing - 6)/40 otherwise
The first three cases provide feedback at the end; the last three provide useful feedback during play.
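The same cases written as code, for concreteness; the state field names (hit_mine, at_goal, etc.) are hypothetical, and the sign conventions on the two "diff" terms are assumptions not pinned down by the slide:

```python
def reward(s, a, s2):
    """Sketch of the dense reward above for a transition s --a--> s2."""
    if s2.hit_mine:
        return 0.0
    if s2.at_goal:
        return 1.0
    if s2.timed_out:
        return 0.5
    if s.all_blocked and not s2.all_blocked:
        return 0.75
    if s2.all_blocked:
        # "diff in sum of sonars": direction of the difference is assumed
        return 0.5 + (sum(s2.sonars) - sum(s.sonars)) / 1000.0
    # "diff in range" plus the bearing term, signs assumed as written
    return (0.5 + (s.range_to_goal - s2.range_to_goal) / 1000.0
            + abs(s2.bearing - 6) / 40.0)
```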
(c) Devika Subramanian, 2008 59
Credit assignment policy
Penalize only the last action in a sequence that ends in an explosion.
Penalize all actions in a sequence that ends in a timeout.
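A sketch of this policy applied to a finished episode (the penalty value and data layout are assumptions):

```python
def assign_credit(trajectory, outcome, penalty=-1.0):
    """Return (state, action, penalty) updates for one episode.

    trajectory: list of (state, action) pairs in time order.
    outcome: "explosion", "timeout", or anything else (no penalty).
    """
    if outcome == "explosion":
        # blame only the final action
        return [trajectory[-1] + (penalty,)]
    if outcome == "timeout":
        # blame every action in the episode
        return [sa + (penalty,) for sa in trajectory]
    return []
```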
(c) Devika Subramanian, 2008 60
Simplification of value estimation
Estimate the average local reward for each action in each state.
[Figure: a trajectory from s through s’ to a terminal state t, collecting rewards r1, r2, r3 along the way.]
Q(s,a) is the sum of rewards from s to the terminal state, learned by the update
Q'(s,a) = α [r + max_a' Q(s',a')] + (1 − α) Q(s,a)
Instead of learning Q, we maintain the approximation
Q'(s,a) = (running average of rewards at a for s) × (pct of wins from s)
Open question: when does this approximation work?
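A sketch of maintaining that approximation with running counts; the class layout and the convention that updates arrive once per visited (s, a) with the episode outcome are assumptions:

```python
from collections import defaultdict

class AvgRewardQ:
    """Maintain Q'(s,a) = (running avg reward at (s,a)) * (pct of wins from s)."""

    def __init__(self):
        self.sum_r = defaultdict(float)   # total reward seen at (s, a)
        self.count = defaultdict(int)     # visits to (s, a)
        self.wins = defaultdict(int)      # winning episodes through s
        self.visits = defaultdict(int)    # episodes through s

    def update(self, s, a, r, won):
        self.sum_r[(s, a)] += r
        self.count[(s, a)] += 1
        self.visits[s] += 1
        self.wins[s] += int(won)

    def q(self, s, a):
        if not self.count[(s, a)] or not self.visits[s]:
            return 0.0
        avg_r = self.sum_r[(s, a)] / self.count[(s, a)]
        return avg_r * (self.wins[s] / self.visits[s])
```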
(c) Devika Subramanian, 2008 61
Results of learning complete policy
Blue: learn turns only
Red: learn turn and speed
Humans make more effective use of training examples, but the Q-learner gets to near 100% success.
Griffin and Subramanian, 2000 (c) Devika Subramanian, 2008 62
Full Q learner/1500 episodes
(c) Devika Subramanian, 2008 63
Full Q learner/10000 episodes
(c) Devika Subramanian, 2008 64
Full Q learner/failure after 10K
(c) Devika Subramanian, 2008 65
Why learning takes so long
[Figure: states where 3 or fewer of the 153 action choices are correct!]
Griffin and Subramanian, 2000
(c) Devika Subramanian, 2008 66
Lessons from machine learning
Task level
The task is hard because states in which the action choice is critical occur less than 5% of the time.
Staged learning makes the task significantly easier.
A locally non-deceptive reward function speeds up learning.
Reinforcement learning
Long sequences of moves make credit assignment hard; a new, cheap approximation to the global value function makes learning possible for such problems.
An algorithm for automatic discretization of large, irregular state spaces.
Griffin and Subramanian, 2000, 2001
(c) Devika Subramanian, 2008 67
Task Questions
Is the game hard? Is it hard for machines? What is the source of complexity?
Why does human performance plateau out at 80%? Is that a feature of the human learning system or the game? Can machine learners achieve higher levels of competence?
Can we understand why humans learn or fail to learn the task? Can we detect inability to learn early enough to intervene?
How can we actively shape human learning on this task?
(c) Devika Subramanian, 2008 68
Tracking human learning
[Diagram: a learning algorithm takes (sensor panel, joystick action) time-course data and prior knowledge, and produces a model: a strategy mapping sensor panels to joystick actions, which is used to select interventions to aid learning.]
Extract the strategy and study its evolution over time.
(c) Devika Subramanian, 2008 69
Challenges
High dimensionality of the visual data (11 dimensions spanning a space of size 10^14)
Large volumes of data
Noise in the data
Non-stationarity: policies change over time
(c) Devika Subramanian, 2008 70
Embedded learner design
Representation
Use the raw visual-motor data stream to induce policies/strategies.
Learning
Direct models: a lookup table mapping the sensors at time t and the action at t-1 to a distribution of actions at time t (a 1st-order Markov model).
Decision-making
Compute the “derivative” of the policies over time, and use it (1) to classify the learner and select interventions, and (2) to build behaviorally equivalent models of subjects.
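A sketch of building such a direct model as a lookup table (assumes sensor panels arrive as hashable values, e.g., tuples; names are illustrative):

```python
from collections import defaultdict, Counter

def build_direct_model(stream):
    """Lookup table mapping (sensors_t, action_{t-1}) to a distribution
    over actions at time t -- the 1st-order Markov model above.

    stream: iterable of (sensor_panel, action) pairs in time order.
    """
    table = defaultdict(Counter)
    prev_action = None
    for sensors, action in stream:
        table[(sensors, prev_action)][action] += 1
        prev_action = action
    # normalize counts into action distributions
    return {key: {a: n / sum(c.values()) for a, n in c.items()}
            for key, c in table.items()}
```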
(c) Devika Subramanian, 2008 71
Strategy: mapping from sensors to action distributions
[Figure: strategies are estimated over a sliding window of w games.]
(c) Devika Subramanian, 2008 72
Surely, this can’t work!
There are 10^14 sensor configurations possible in the NRL Navigation task.
However, only about 10^3 to 10^4 of those configurations are actually observed by humans in a training run of 600 episodes.
Exploit sparsity in the sensor configuration space to build a direct model of the subject.
(c) Devika Subramanian, 2008 73
How do strategies evolve over time?
Distance function between strategies: KL divergence
Δ(Π(i, i+w), Π(i+w−s, i+2w−s))
[Figure: two sliding windows of width w over the game sequence, with overlap s.]
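A sketch of comparing two windowed strategies with an averaged KL divergence; the smoothing constant and the restriction to states observed in both windows are assumptions, not details from the slides:

```python
import math

def strategy_divergence(p1, p2, eps=1e-6):
    """Average KL divergence between two strategies.

    Each strategy is a dict mapping a sensor state to a dict of
    action probabilities (as built by the direct model above).
    """
    shared = set(p1) & set(p2)          # states observed in both windows
    total = 0.0
    for s in shared:
        actions = set(p1[s]) | set(p2[s])
        total += sum(p1[s].get(a, eps) *
                     math.log(p1[s].get(a, eps) / p2[s].get(a, eps))
                     for a in actions)
    return total / max(len(shared), 1)
```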
(c) Devika Subramanian, 2008 74
Results: model derivative
Siruguri and Subramanian, 2002
(c) Devika Subramanian, 2008 75
Before shift (episode 300)
(c) Devika Subramanian, 2008 76
After shift (episode 320)
(c) Devika Subramanian, 2008 77
Model derivative for Hei
Siruguri and Subramanian, 2002
(c) Devika Subramanian, 2008 78
How humans learn
Subjects have relatively static periods of action policy choice, punctuated by radical shifts.
Successful learners have conceptual shifts during the first part of training; unsuccessful ones keep trying till the end of the protocol!
(c) Devika Subramanian, 2008 79
Behaviorally equivalent models
[Diagram: the learned model interacts with the NRL task in place of the subject.]
(c) Devika Subramanian, 2008 80
Generating behaviorally equivalent models
To compute the action a associated with the current sensor configuration s in a given segment:
Take the 100 nearest neighbors of s in the lookup table.
Perform locally weighted regression (LWR) on these 100 (s, a) pairs.
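A sketch of this neighbor-based LWR prediction; the slides specify only the 100 neighbors and LWR, so the Gaussian kernel and bandwidth tau are assumptions:

```python
import numpy as np

def lwr_predict(S, A, s_query, k=100, tau=1.0):
    """Predict the action for sensor configuration s_query from its
    k nearest (s, a) pairs via locally weighted regression.

    S: (n, d) array of stored sensor configurations
    A: (n,) array of the corresponding actions
    """
    dist = np.linalg.norm(S - s_query, axis=1)
    idx = np.argsort(dist)[:k]                       # the 100 nearest neighbors
    # sqrt of Gaussian kernel weights, so least squares applies weight w
    w = np.sqrt(np.exp(-dist[idx] ** 2 / (2 * tau ** 2)))
    Sk = np.hstack([np.ones((len(idx), 1)), S[idx]])  # intercept column
    theta, *_ = np.linalg.lstsq(w[:, None] * Sk, w * A[idx], rcond=None)
    return np.concatenate(([1.0], s_query)) @ theta
```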
(c) Devika Subramanian, 2008 81
Subject Cea: Day 5, games 1 through 9
[Figures: for each of the nine games, the subject’s trajectory and the model’s trajectory are shown side by side.]
(c) Devika Subramanian, 2008 90
Comparison with global methods
Siruguri and Subramanian, 2002
(c) Devika Subramanian, 2008 91
Summary
We can model subjects on the NRL task in real time, achieving excellent fits to their learning curves, using the available visual-motor data stream.
This is one of the first efforts in cognitive science to directly use objective visual-motor performance data to derive the evolution of strategy on a complex task.
(c) Devika Subramanian, 2008 92
Where’s the science?
(c) Devika Subramanian, 2008 93
Lessons
Learn simple models from objective, low-level data!
Non-stationarity is commonplace; we need to design algorithms that are robust to it.
Fast new algorithms for detecting change-points and building predictive stochastic models for massive, noisy, non-stationary vector time series data.
(c) Devika Subramanian, 2008 94
Neural correlates
Are there neural correlates to strategy shifts observed in the visual-motor data?
(c) Devika Subramanian, 2008 95
Task Questions
Can we adapt training protocols in the NRL task by identifying whether subjects are struggling with strategy formulation, visual-motor control, or both?
Can we use analysis of EEG data gathered during learning, as well as visual-motor performance data, to correlate ‘brain events’ with ‘visual-motor performance events’? Can this correlation separate subjects with different learning difficulties?
(c) Devika Subramanian, 2008 96
The (new) NRL Navigation Task
(c) Devika Subramanian, 2008 97
Gathering performance data
(c) Devika Subramanian, 2008 98
Fusing EEG and visual-motor data
[Pipeline: EEG data → artifact removal → coherence computation → visualization mechanism, fused with the visual-motor performance data.]
(c) Devika Subramanian, 2008 99
Measuring functional connectivity in the brain
Coherence provides a means to measure synchronous activity between two brain areas.
It is the normalized cross-power spectrum, a measure of the similarity of two signals in the frequency domain:
C_xy(f) = |S_xy(f)|^2 / [S_xx(f) S_yy(f)]
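A sketch of computing band-averaged coherence between two channels with SciPy's Welch-based estimator; the segment length, sampling rate, and band edges in the usage line are illustrative values, not from the slides:

```python
import numpy as np
from scipy.signal import coherence

def band_coherence(x, y, fs, f_lo, f_hi):
    """Mean magnitude-squared coherence
    C_xy(f) = |S_xy(f)|^2 / (S_xx(f) S_yy(f))
    between two channels, averaged over the band [f_lo, f_hi] Hz."""
    f, Cxy = coherence(x, y, fs=fs, nperseg=256)
    band = (f >= f_lo) & (f <= f_hi)
    return Cxy[band].mean()

# Hypothetical usage: gamma-band coherence between two electrodes
# lrgs = band_coherence(ch_front, ch_back, fs=256, f_lo=40, f_hi=52)
```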
(c) Devika Subramanian, 2008 100
Topological coherence map
[Figure: coherence map over the scalp, front to back.]
(c) Devika Subramanian, 2008 101
Frequency bands
Coherence map of connections in each band:
Δ (0-5 Hz), θ (5-9 Hz), α (9-14 Hz), β (14-30 Hz), γ (40-52 Hz)
(c) Devika Subramanian, 2008 102
Subject moh progression chart
(c) Devika Subramanian, 2008 103
Results (subject moh)
(c) Devika Subramanian, 2008 104
Results
Subject bil progression chart
(c) Devika Subramanian, 2008 105
Results (subject bil)
Baluch, Zouridakis, Stevenson and Subramanian, 2005, 2006
(c) Devika Subramanian, 2008 106
Subject G
Subject is in the skill refinement phase.
Subject is a near-expert performer.
(c) Devika Subramanian, 2008 107
Subject V
Subject never learned a good strategy.
It wasn’t for lack of trying...
(c) Devika Subramanian, 2008 108
Results
There are distinct EEG coherence map signatures associated with different learning difficulties: lack of strategy, and shifting between too many strategies.
Subjects in our study who moved from a low level of performance to a high level of performance show front-to-back synchrony in the gamma range, or long-range gamma synchrony (LRGS). [Baluch, Zouridakis, Stevenson, Subramanian 2007]
We are conducting experiments on more subjects to confirm these findings (14 subjects so far, with more being collected right now).
(c) Devika Subramanian, 2008 109
What else is this good for?
Using EEG readouts to analyze the effectiveness of video games for relieving pre-operative stress in children (A. Patel, UMDNJ).
Using EEG to read the emotional state of players in immersive video games (M. Zyda, USC).
Analyzing human performance on any visual-motor task with a significant strategic component.
(c) Devika Subramanian, 2008 110
Questions?