Thomas G. Dietterich Department of Computer Science Oregon State University Corvallis, Oregon 97331 http://www.cs.orst.edu/~tgd Machine Learning: Making Computer Science Scientific

Thomas G. Dietterich

Department of Computer Science

Oregon State University

Corvallis, Oregon 97331

http://www.cs.orst.edu/~tgd

Machine Learning: Making Computer Science

Scientific

Acknowledgements

VLSI Wafer Testing Tony Fountain

Robot Navigation Didac Busquets Carles Sierra Ramon Lopez de Mantaras

NSF grants IIS-0083292 and ITR-085836

Outline

Three scenarios where standard software engineering methods fail

Machine learning methods applied to these scenarios

Fundamental questions in machine learning

Statistical thinking in computer science

Scenario 1: Reading Checks

Find and read “courtesy amount” on checks:

Possible Methods:

Method 1: Interview humans to find out what steps they follow in reading checks

Method 2: Collect examples of checks and the correct amounts. Train a machine learning system to recognize the amounts

Scenario 2: VLSI Wafer Testing

Wafer test: Functional test of each die (chip) while on the wafer

Which Chips (and how many) should be tested?

Tradeoff:

Test all chips on wafer? Avoid cost of packaging bad chips; incur cost of testing all chips

Test none of the chips on the wafer? May package some bad chips; no cost of testing on wafer
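
The tradeoff above can be made concrete with a small expected-profit calculation. This is a minimal sketch, not the model used in the talk: the per-chip quantities `revenue`, `package_cost`, and `test_cost` are hypothetical, every chip is assumed to be bad independently with probability `p_bad`, and a bad chip that gets packaged is assumed to earn nothing.

```python
def profit_test_all(p_bad, revenue, package_cost, test_cost):
    """Expected profit per chip when every chip is tested on the wafer.

    Bad chips are caught before packaging, so only good chips are packaged.
    """
    return -test_cost + (1.0 - p_bad) * (revenue - package_cost)


def profit_test_none(p_bad, revenue, package_cost):
    """Expected profit per chip when no chip is tested on the wafer.

    Every chip is packaged; only the good ones earn revenue.
    """
    return -package_cost + (1.0 - p_bad) * revenue


def better_policy(p_bad, revenue, package_cost, test_cost):
    """Return the name of the policy with the higher expected profit."""
    all_ = profit_test_all(p_bad, revenue, package_cost, test_cost)
    none = profit_test_none(p_bad, revenue, package_cost)
    return "test all" if all_ > none else "test none"
```

Under these assumptions the difference between the two profits simplifies to `p_bad * package_cost - test_cost`, so testing every chip pays off only when the expected packaging cost wasted on a bad chip exceeds the cost of one test.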

Possible Methods

Method 1: Guess the right tradeoff point

Method 2: Learn a probabilistic model that captures the probability that each chip will be bad. Plug this model into a Bayesian decision-making procedure to optimize expected profit

Scenario 3: Allocating mobile robot camera

Binocular

No GPS

Camera tradeoff

Mobile robot uses camera both for obstacle avoidance and landmark-based navigation

Tradeoff:

If camera is used only for navigation, robot collides with objects

If camera is used only for obstacle avoidance, robot gets lost

Possible Methods

Method 1: Manually write a program to allocate the camera

Method 2: Experimentally learn a policy for switching between obstacle avoidance and landmark tracking

Challenges for SE Methodology

Standard SE methods fail when…

1) System requirements are hard to collect

2) The system must resolve difficult tradeoffs

(1) System requirements are hard to collect

There are no human experts (example: cellular telephone fraud)

Human experts are inarticulate (example: handwriting recognition)

The requirements are changing rapidly (example: computer intrusion detection)

Each user has different requirements (example: e-mail filtering)

(2) The system must resolve difficult tradeoffs

VLSI Wafer testing: Tradeoff point depends on probability of bad chips, relative costs of testing versus packaging

Camera Allocation for Mobile Robot: Tradeoff depends on probability of obstacles, number and quality of landmarks

Machine Learning: Replacing guesswork with data

In all of these cases, the standard SE methodology requires engineers to make guesses:

Guessing how to do character recognition

Guessing the tradeoff point for wafer test

Guessing the tradeoff for camera allocation

Machine Learning provides a way of making these decisions based on data

Outline

Three scenarios where software engineering methods fail

Machine learning methods applied to these scenarios

Fundamental questions in machine learning

Statistical thinking in computer science

Basic Machine Learning Methods

Supervised Learning

Density Estimation

Reinforcement Learning

Supervised Learning

[Figure: Training Examples (digit images labeled 8, 3, 6, 0, 1) → Learning Algorithm → Classifier; New Examples → Classifier → 8]

AT&T/NCR Check Reading System

Recognition transformer is a neural network trained on 500,000 examples of characters

The entire system is trained given entire checks as input and dollar amounts as output

LeCun, Bottou, Bengio & Haffner (1998) Gradient-Based Learning Applied to Document Recognition

Check Reader Performance

82% of machine-printed checks correctly recognized

1% of checks incorrectly recognized

17% “rejected”: check is presented to a person for manual reading

Fielded by NCR in June 1996; reads millions of checks per month
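
The three-way split into correct, incorrect, and rejected checks can be illustrated with a reject option. This is a minimal sketch under the assumption that the recognizer emits a confidence score for each candidate amount; the function names, the threshold value, and the example scores are hypothetical and are not taken from the NCR system.

```python
def read_or_reject(scores, threshold):
    """Return the top-scoring label, or None to send the item to a human.

    scores: dict mapping candidate label -> confidence in [0, 1].
    """
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    return label if confidence >= threshold else None


def summarize(decisions, truths):
    """Fractions of items read correctly, read incorrectly, and rejected."""
    n = len(truths)
    rejected = sum(d is None for d in decisions)
    correct = sum(d is not None and d == t for d, t in zip(decisions, truths))
    wrong = n - rejected - correct
    return correct / n, wrong / n, rejected / n


# Hypothetical recognizer outputs paired with the true amounts.
examples = [
    ({"$12.00": 0.95, "$72.00": 0.05}, "$12.00"),
    ({"$40.10": 0.55, "$46.10": 0.45}, "$40.10"),
    ({"$8.50": 0.90, "$3.50": 0.10}, "$3.50"),
    ({"$99.99": 0.85, "$90.99": 0.15}, "$99.99"),
]
decisions = [read_or_reject(scores, threshold=0.8) for scores, _ in examples]
rates = summarize(decisions, [truth for _, truth in examples])
```

Raising the threshold moves checks from the "incorrect" column to the "rejected" column at the price of more manual reading, which is the kind of tradeoff the reported 82% / 1% / 17% split reflects.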

Supervised Learning Summary

Desired classifier is a function y = f(x)

Training examples are desired input-output pairs (x_i, y_i)
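
To make the summary concrete, the sketch below builds a function y = f(x) from labeled pairs. It uses 1-nearest-neighbor only because it is short; the check reader described earlier is a neural network, and the toy feature vectors and labels here are hypothetical.

```python
def train_nearest_neighbor(examples):
    """'Train' a 1-nearest-neighbor classifier by memorizing the examples.

    examples: list of (x, y) pairs, where x is a tuple of numbers.
    Returns a classifier f such that y = f(x).
    """
    def f(x):
        def squared_distance(example):
            xi, _ = example
            return sum((a - b) ** 2 for a, b in zip(xi, x))
        _, y = min(examples, key=squared_distance)
        return y
    return f


# Hypothetical two-feature training set with labels "0" and "1".
training = [
    ((0.0, 0.0), "0"),
    ((0.1, 0.2), "0"),
    ((1.0, 1.0), "1"),
    ((0.9, 0.8), "1"),
]
classifier = train_nearest_neighbor(training)
prediction = classifier((0.8, 0.9))
```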

Density Estimation

[Figure: Training Examples → Learning Algorithm → Density Estimator; Partially-tested wafer → Density Estimator → P(chip_i is bad) = 0.42]

On-Wafer Testing System

Trained density estimator on 600 wafers from mature product (HP; Corvallis, OR)

Probability model is “naïve Bayes” mixture model with four components (trained with EM)

[Figure: naïve Bayes network in which node W is the parent of chip nodes C1, C2, C3, . . ., C209]
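
Once such a mixture has been trained, predicting an untested chip amounts to updating the belief about the hidden component W and then averaging over it. The sketch below shows only that inference step, not EM training; it uses two components and three chips with made-up parameters, whereas the system described here has four components.

```python
def component_posterior(weights, bad_prob, observed):
    """P(W = k | observed test results) for each mixture component k.

    weights[k]     : prior probability of component k
    bad_prob[k][i] : P(chip i is bad | W = k)
    observed       : dict mapping tested chip index -> True if the chip was bad
    """
    joint = []
    for w, probs in zip(weights, bad_prob):
        likelihood = w
        for i, is_bad in observed.items():
            likelihood *= probs[i] if is_bad else 1.0 - probs[i]
        joint.append(likelihood)
    total = sum(joint)
    return [j / total for j in joint]


def prob_bad(weights, bad_prob, observed, chip):
    """P(chip is bad | observed), marginalizing over the component W."""
    posterior = component_posterior(weights, bad_prob, observed)
    return sum(p * probs[chip] for p, probs in zip(posterior, bad_prob))


# Hypothetical parameters: a "mostly good" and a "mostly bad" wafer type.
weights = [0.7, 0.3]
bad_prob = [[0.1, 0.1, 0.1], [0.8, 0.8, 0.8]]
before = prob_bad(weights, bad_prob, {}, chip=2)
after = prob_bad(weights, bad_prob, {0: True}, chip=2)
```

Seeing one bad chip makes the "mostly bad" wafer type more likely, so the predicted probability for the untested chip rises; this coupling between chips is what makes partial testing informative.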

One-Step Value of Information

Choose the larger of:

Expected profit if we predict remaining chips, package, and re-test

Expected profit if we test chip Ci, then predict remaining chips, package, and re-test [for all Ci not yet tested]
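
The comparison above can be sketched as follows. This is an illustrative simplification, not the fielded procedure: it assumes a small naïve Bayes mixture with hypothetical parameters, treats a packaged bad chip as earning nothing at re-test, and uses made-up values for `revenue`, `package_cost`, and `test_cost`.

```python
def posterior_bad(weights, bad_prob, observed):
    """Return P(chip i is bad | observed) for every chip i under the mixture."""
    joint = []
    for w, probs in zip(weights, bad_prob):
        for i, is_bad in observed.items():
            w *= probs[i] if is_bad else 1.0 - probs[i]
        joint.append(w)
    total = sum(joint)
    n_chips = len(bad_prob[0])
    return [sum(j / total * probs[i] for j, probs in zip(joint, bad_prob))
            for i in range(n_chips)]


def profit_if_stop(weights, bad_prob, observed, revenue, package_cost):
    """Expected profit if we stop testing and package only the chips worth it."""
    p_bad = posterior_bad(weights, bad_prob, observed)
    profit = 0.0
    for i, p in enumerate(p_bad):
        if i in observed:
            p = 1.0 if observed[i] else 0.0  # tested chips are known exactly
        profit += max(0.0, (1.0 - p) * revenue - package_cost)
    return profit


def profit_if_test(weights, bad_prob, observed, chip,
                   revenue, package_cost, test_cost):
    """Expected profit if we test `chip` once and then stop."""
    p = posterior_bad(weights, bad_prob, observed)[chip]
    expected = -test_cost
    for is_bad, prob in ((True, p), (False, 1.0 - p)):
        if prob > 0.0:
            outcome = dict(observed)
            outcome[chip] = is_bad
            expected += prob * profit_if_stop(
                weights, bad_prob, outcome, revenue, package_cost)
    return expected


def one_step_voi(weights, bad_prob, observed, revenue, package_cost, test_cost):
    """Return ('stop', profit) or (chip_index, profit), whichever is larger."""
    best = ("stop", profit_if_stop(weights, bad_prob, observed,
                                   revenue, package_cost))
    for chip in range(len(bad_prob[0])):
        if chip not in observed:
            value = profit_if_test(weights, bad_prob, observed, chip,
                                   revenue, package_cost, test_cost)
            if value > best[1]:
                best = (chip, value)
    return best


# Hypothetical two-component model over three chips.
weights = [0.5, 0.5]
bad_prob = [[0.05, 0.05, 0.05], [0.9, 0.9, 0.9]]
choice = one_step_voi(weights, bad_prob, {}, 10.0, 4.0, 0.1)
```

Calling `one_step_voi` repeatedly, adding each test result to `observed`, gives a greedy testing loop that stops as soon as no single extra test is expected to pay for itself.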

On-Wafer Chip Test Results

[Figure: bar chart of Profit ($K), axis from $1,160 to $1,230, comparing “Test all” with “VOI testing”]

3.8% increase in profit

Density Estimation Summary

Desired output is a joint probability distribution P(C1, C2, …, C203)

Training examples are points X = (C1, C2, …, C203) sampled from this distribution

Reinforcement Learning

[Figure: agent-environment loop; the Environment sends state s and reward r to the agent, and the agent sends action a to the Environment]

Agent’s goal: Choose actions to maximize total reward

Action Selection Rule is called a “policy”: a = π(s)

Reinforcement Learning for Robot Navigation

Learning from rewards and punishments in the environment:

Give reward for reaching goal

Give punishment for getting lost

Give punishment for collisions

Experimental Results: % of trials in which robot reaches goal

Busquets, Lopez de Mantaras, Sierra, Dietterich (2002)

Reinforcement Learning Summary

Desired output is an action selection policy

Training examples are <s,a,r,s’> tuples collected by the agent interacting with the environment
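
One standard way to turn such tuples into a policy is tabular Q-learning. The sketch below is illustrative only: the five-state corridor environment, the reward of 1 at the goal, and the learning-rate and discount values are assumptions, and nothing here is meant to reproduce the method used in the robot navigation experiments.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal
LEFT, RIGHT = 0, 1


def step(state, action):
    """Toy corridor environment: returns (reward, next_state, done)."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    return (1.0 if done else 0.0), next_state, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn Q(s, a) from <s, a, r, s'> tuples gathered by random exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice((LEFT, RIGHT))      # exploratory behavior
            r, s_next, done = step(s, a)       # one <s, a, r, s'> tuple
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Action selection rule: in each state pick the highest-valued action."""
    return [max((LEFT, RIGHT), key=lambda a: q[s][a])
            for s in range(N_STATES - 1)]


policy = greedy_policy(q_learning())
```

Because Q-learning is off-policy, the agent can explore at random while still learning the values of the greedy policy, which in this corridor is to move right in every state.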

Outline

Three scenarios where software engineering methods fail

Machine learning methods applied to these scenarios

Fundamental questions in machine learning

Statistical thinking in computer science

Fundamental Issues in Machine Learning

Incorporating Prior Knowledge

Incorporating Learned Structures into Larger Systems

Making Reinforcement Learning Practical

Triple Tradeoff: accuracy, sample size, hypothesis complexity

Incorporating Prior Knowledge

How can we incorporate our prior knowledge into the learning algorithm?

Difficult for decision trees, neural networks, support-vector machines, etc.: mismatch between form of our knowledge and the way the algorithms work

Easier for Bayesian networks: express knowledge as constraints on the network

Incorporating Learned Structures into Larger Systems

Success story: Digit recognizer incorporated into check reader

Challenges:

Larger system may make several coordinated decisions, but learning system treated each decision as independent

Larger system may have complex cost function: errors in thousands place versus the cents place: $7,236.07
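
The cost-function point can be seen with the amount on the slide. In this small sketch the two misread strings are hypothetical; each differs from the true amount in exactly one digit, yet the dollar cost differs by five orders of magnitude, which a per-digit error count cannot express.

```python
from decimal import Decimal


def misread_cost(true_amount, read_amount):
    """Dollar cost of a misread: the absolute difference between amounts."""
    return abs(Decimal(true_amount) - Decimal(read_amount))


def digit_errors(true_amount, read_amount):
    """Number of character positions where the two strings differ."""
    return sum(a != b for a, b in zip(true_amount, read_amount))


true = "7236.07"
thousands_error = "1236.07"   # one wrong digit in the thousands place
cents_error = "7236.01"       # one wrong digit in the cents place
```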

Making Reinforcement Learning Practical

Current reinforcement learning methods do not scale well to large problems

Need robust reinforcement learning methodologies

The Triple Tradeoff

Fundamental relationship between:

amount of training data

size and complexity of hypothesis space

accuracy of the learned hypothesis

Explains many phenomena observed in machine learning systems

Learning Algorithms

Set of data points

Class H of hypotheses

Optimization problem: Find the hypothesis h in H that best fits the data

[Figure: Training Data selects a hypothesis h from the Hypothesis Space]
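
Viewed this way, a learning algorithm is a search over H. The sketch below uses a deliberately tiny hypothesis space, threshold rules on one number, and exhaustive search; both the data and the eleven candidate thresholds are hypothetical.

```python
def training_error(threshold, data):
    """Fraction of examples misclassified by h(x) = (x >= threshold)."""
    mistakes = sum((x >= threshold) != label for x, label in data)
    return mistakes / len(data)


def best_hypothesis(hypothesis_space, data):
    """Search H for the hypothesis h with the lowest training error."""
    return min(hypothesis_space, key=lambda t: training_error(t, data))


data = [(0.1, False), (0.3, False), (0.4, False),
        (0.6, True), (0.8, True), (0.9, True)]
H = [i / 10 for i in range(11)]   # candidate thresholds 0.0, 0.1, ..., 1.0
h = best_hypothesis(H, data)
```

Real hypothesis spaces are far too large to enumerate, so practical algorithms replace the exhaustive `min` with gradient descent, greedy tree growing, or similar optimization procedures.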

Triple Tradeoff

Amount of Data – Hypothesis Complexity – Accuracy

[Figure: Accuracy versus Hypothesis Space Complexity, one curve each for N = 10, N = 100, and N = 1000]

Triple Tradeoff (2)

[Figure: Accuracy versus Number of training examples N, one curve each for hypothesis spaces H1, H2, and H3, which differ in Hypothesis Complexity]

Intuition

With only a small amount of data, we can only discriminate between a small number of different hypotheses

As we get more data, we have more evidence, so we can consider more alternative hypotheses

Complex hypotheses give better fit to the data

Fixed versus Variable-Sized Hypothesis Spaces

Fixed size: ordinary linear regression; Bayes net with fixed structure; neural networks

Variable size: decision trees; Bayes nets with variable structure; support vector machines

Corollary 1: Fixed H will underfit

[Figure: Accuracy versus Number of training examples N for hypothesis spaces H1 and H2, with the “underfit” region marked]

Corollary 2: Variable-sized H will overfit

[Figure: Accuracy versus Hypothesis Space Complexity for N = 100, with the “overfit” region marked]

Ideal Learning Algorithm: Adapt complexity to data

[Figure: Accuracy versus Hypothesis Space Complexity, one curve each for N = 10, N = 100, and N = 1000]

Adapting Hypothesis Complexity to Data Complexity

Find hypothesis h to minimize error(h) + λ complexity(h)

Many methods for adjusting λ: cross-validation, MDL
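
A penalized objective of this kind can be sketched with a family of hypothesis spaces of growing size. The example below is an assumption-laden toy: hypotheses are piecewise-constant functions with k equal-width bins, complexity(h) is taken to be k, the weight on the complexity term is called `lam`, and the data are synthetic. It shows the selection step only; choosing the weight itself, for example by cross-validation, is not shown.

```python
import random


def fit_bins(data, k):
    """Hypothesis with k equal-width bins on [0, 1): predict the bin mean."""
    sums, counts = [0.0] * k, [0] * k
    for x, y in data:
        b = min(int(x * k), k - 1)
        sums[b] += y
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def error(means, data):
    """Mean squared error of the binned hypothesis on the data."""
    k = len(means)
    return sum((y - means[min(int(x * k), k - 1)]) ** 2
               for x, y in data) / len(data)


def select_complexity(data, sizes, lam):
    """Pick k minimizing error(h_k) + lam * complexity(h_k), complexity = k."""
    return min(sizes, key=lambda k: error(fit_bins(data, k), data) + lam * k)


# Synthetic data: a noisy step function on [0, 1).
rng = random.Random(0)
data = [(x, (1.0 if x > 0.5 else 0.0) + rng.gauss(0.0, 0.3))
        for x in (rng.random() for _ in range(200))]
sizes = [1, 2, 4, 8, 16, 32]
train_errors = [error(fit_bins(data, k), data) for k in sizes]
```

Training error alone never gets worse as k grows, so without the penalty the most complex hypothesis space always looks at least as good; the penalty term is what lets the chosen complexity depend on how much the data actually support.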

Outline

Three scenarios where software engineering methods fail

Machine learning methods applied to these scenarios

Fundamental questions in machine learning

Statistical thinking in computer science

The Data Explosion

NASA Data: 284 Terabytes (as of August, 1999)

Earth Observing System: 194 G/day

Landsat 7: 150 G/day

Hubble Space Telescope: 0.6 G/day

http://spsosun.gsfc.nasa.gov/eosinfo/EOSDIS_Site/index.html

The Data Explosion (2)

Google indexes 2,073,418,204 web pages

US Year 2000 Census: 62 Terabytes of scanned images

Walmart Data Warehouse: 7 (500?) Terabytes

Missouri Botanical Garden TROPICOS plant image database: 700 Gbytes

Old Computer Science Conception of Data

Store Retrieve

New Computer Science Conception of Data

[Figure: Store → Build Models → Solve Problems; Problems go in and Solutions come out]

Machine Learning: Making Data Active

Methods for building models from data

Methods for collecting and/or sampling data

Methods for evaluating and validating learned models

Methods for reasoning and decision-making with learned models

Theoretical analyses

Machine Learning and Computer Science

Natural language processing

Databases and data mining

Computer architecture

Compilers

Computer graphics

Hardware Branch Prediction

Source: Jiménez & Lin (2000) Perceptron Learning for Predicting the Behavior of Conditional Branches

Instruction Scheduler for New CPU

The performance of modern microprocessors depends on the order in which instructions are executed

Modern compilers rearrange instruction order to optimize performance (“instruction scheduling”)

Each new CPU design requires modifying the instruction scheduler

Instruction Scheduling

Moss et al. (1997): a machine-learning scheduler can beat the performance of commercial compilers and match the performance of a research compiler.

Training examples: small basic blocks

Experimentally determine optimal instruction order

Learn preference function

Computer Graphics: Video Textures

Generate new video by splicing together short stretches of old video

Original frames: A B C D E F

Spliced video: B D E D E F A

Apply reinforcement learning to identify good transition points

Arno Schödl, Richard Szeliski, David H. Salesin, Irfan Essa (SIGGRAPH 2000)

Video Textures

Arno Schödl, Richard Szeliski, David H. Salesin, Irfan Essa (SIGGRAPH 2000)

You can find this video at Virtual Fish Tank Movie

Graphics: Image Analogies

[Figure: image analogy A : A’ :: B : ?]

Hertzmann, Jacobs, Oliver, Curless, Salesin (2000) SIGGRAPH

Learning to Predict Textures

A(p) A’(p)

B(q) B’(q)

Find p to minimize Euclidean distance between [image] and [image]

B’(q) := A’(p)
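
The matching rule above can be sketched on one-dimensional "images". This is a heavily simplified illustration: it assumes the two things being compared are small neighborhoods of A around p and of B around q, it ignores the neighborhoods of A’ and B’ and the multi-scale search used in the published method, and the signals below are made up.

```python
def neighborhood(signal, i, radius=1):
    """Values around position i, clamping indices at the borders."""
    last = len(signal) - 1
    return [signal[min(max(j, 0), last)]
            for j in range(i - radius, i + radius + 1)]


def best_match(a, b, q, radius=1):
    """Position p in A whose neighborhood is closest (Euclidean) to B's at q."""
    target = neighborhood(b, q, radius)

    def squared_distance(p):
        return sum((u - v) ** 2
                   for u, v in zip(neighborhood(a, p, radius), target))

    return min(range(len(a)), key=squared_distance)


def analogy(a, a_prime, b, radius=1):
    """Build B' by copying A'(p) for the best-matching p at every q."""
    return [a_prime[best_match(a, b, q, radius)] for q in range(len(b))]


a = [0, 0, 1, 1, 0, 0]
a_prime = [0, 5, 9, 9, 5, 0]     # hypothetical "filtered" version of A
b = [1, 1, 0, 0, 0, 1]
b_prime = analogy(a, a_prime, b)
```

Minimizing the squared distance picks the same p as minimizing the Euclidean distance, so the square root is omitted.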

Image Analogies

[Figure: image analogy example]

A video can be found at

Image Analogies Movie

Summary

Standard Software Engineering methods fail in many application problems

Machine Learning methods can replace guesswork with data to make good design decisions

Machine Learning and Computer Science

Machine Learning is already at the heart of speech recognition and handwriting recognition

Statistical methods are transforming natural language processing (understanding, translation, retrieval)

Statistical methods are creating opportunities in databases, computer graphics, robotics, computer vision, networking, and computer security

Computer Power and Data Power

Data is a new source of power for computer science

Every computer science student should learn the fundamentals of machine learning and statistical thinking

By combining engineered frameworks with models learned from data, we can develop the high-performance systems of the future