1
COMP3503 Intro to Inductive Modeling
with
Daniel L. Silver
2
Agenda
Deductive and Inductive Modeling
Learning Theory and Generalization
Common Statistical Methods
3
The KDD Process
[Figure: Data Sources → Data Consolidation → Consolidated Data (Data Warehouse) → Selection and Preprocessing → Prepared Data → Data Mining → Patterns & Models → Interpretation and Evaluation → Knowledge (e.g. p(x)=0.02)]
4
Deductive and Inductive Modeling
5
Induction versus Deduction
[Figure: Examples A, B and C relate to a Model or General Rule. Induction: bottom-up construction of the rule from the examples. Deduction: top-down verification of the rule against the examples.]
6
Deductive Modeling
Top-down (toward the data) verification of an hypothesis
The hypothesis is generated within the mind of the data miner
Exploratory tools such as OLAP and data visualization software are used
Models tend to be used for description
7
Inductive Modeling
Bottom-up (from the data) development of an hypothesis
The hypothesis is generated by the technology directly from the data
Statistical and machine learning tools such as regression, decision trees and artificial neural networks are used
Models can be used for prediction
8
Inductive Modeling
Objective: Develop a general model or hypothesis from specific examples
Function approximation (curve fitting)
Classification (concept learning, pattern recognition)
[Figures: classes A and B separated in (x1, x2) space; a curve f(x) fitted over x]
9
Learning Theory and Generalization
10
Inductive Modeling = Learning
Basic Framework for Inductive Learning
[Figure: the Environment supplies training examples (x, f(x)) to the Inductive Learning System, which produces an induced model or hypothesis. Testing examples are passed through the hypothesis to produce output classifications (x, h(x)); the question is whether h(x) ≈ f(x).]
A problem of representation and search for the best hypothesis, h(x).
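A minimal Python sketch of this framework, assuming a toy one-dimensional task (all names and data are illustrative): the environment supplies (x, f(x)) pairs, a simple space of threshold hypotheses is searched on the training examples, and h(x) ≈ f(x) is checked on the testing examples.

import numpy as np
rng = np.random.default_rng(0)

def f(x):                                   # the true (unknown) target function
    return (x > 0.5).astype(int)

x_train = rng.random(100); y_train = f(x_train)      # training examples (x, f(x))
x_test  = rng.random(50);  y_test  = f(x_test)       # testing examples

def h(x, t):                                # candidate hypothesis: "positive if x > t"
    return (x > t).astype(int)

# Search the hypothesis space (here, 101 thresholds) for the best fit to the training data
best_t = min(np.linspace(0, 1, 101),
             key=lambda t: np.mean(h(x_train, t) != y_train))

print("test accuracy:", np.mean(h(x_test, best_t) == y_test))   # does h(x) ≈ f(x)?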
11
Inductive Modeling = Data Mining
Ideally, an hypothesis (model) is:
• Complete – covers all potential examples
• Consistent – no conflicts
• Accurate – able to generalize to previously unseen examples
• Valid – presents a truth
• Transparent – human readable knowledge
12
Inductive Modeling
Generalization
The objective of learning is to achieve good generalization to new cases, otherwise just use a look-up table.
Generalization can be defined as a mathematical interpolation or regression over a set of training points (see the sketch below):
[Figure: training points and an interpolating curve f(x) over x]
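A tiny illustrative sketch of the difference from a look-up table, assuming a few made-up training points; numpy.interp generalizes between them by linear interpolation.

import numpy as np

x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([0.0, 0.8, 0.9, 0.1])    # observed f(x) at the training points

x_new = 1.5                                 # previously unseen case: no table entry
print(np.interp(x_new, x_train, y_train))   # interpolated estimate -> 0.85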
13
Inductive Modeling
Generalization
Generalization accuracy can be guaranteed for a specified confidence level given a sufficient number of examples
Models can be validated for accuracy by using a previously unseen test set of examples
14
Learning Theory
Probably Approximately Correct (PAC) theory of learning (Leslie Valiant, 1984)
Poses questions such as:
• How many examples are needed for good generalization?
• How long will it take to create a good model?
Answers depend on:
• Complexity of the actual function
• The desired level of accuracy of the model (e.g. 75%)
• The desired confidence in finding a model with this accuracy (e.g. 19 times out of 20 = 95%)
15
Learning Theory
[Figure: the space of all possible examples, containing positive (+) and negative (-) instances; the true concept c and the hypothesis h overlap, and the shaded region is where c and h disagree.]
The true error of a hypothesis h is the probability that h will misclassify an instance drawn at random from X:
error(h) = P[c(x) ≠ h(x)]
16
Learning Theory
Three notions of error:
Training Error
• How often the training set is misclassified
Test Error
• How often an independent test set is misclassified
True Error
• How often the entire population of possible examples would be misclassified
• Must be estimated from the Test Error
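A small illustrative sketch of the three notions, assuming a toy problem where the whole population can also be sampled (purely to show that the Test Error is the practical estimate of the True Error).

import numpy as np
rng = np.random.default_rng(1)

def c(x):                                   # true class function
    return (np.sin(3 * x) > 0).astype(int)

def h(x):                                   # some fixed hypothesis (model)
    return (x < 1.0).astype(int)

x_train = rng.uniform(0, 2, 30)
x_test  = rng.uniform(0, 2, 30)
x_pop   = rng.uniform(0, 2, 100_000)        # stands in for "all possible examples"

training_error = np.mean(h(x_train) != c(x_train))
test_error     = np.mean(h(x_test)  != c(x_test))    # estimate of the true error
true_error     = np.mean(h(x_pop)   != c(x_pop))     # ≈ P[c(x) ≠ h(x)]
print(training_error, test_error, true_error)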
17
Linear and Non-Linear Problems
Linear Problems
• Linear functions
• Linearly separable classifications
Non-linear Problems
• Non-linear functions
• Not linearly separable classifications
[Figures: classes A and B separable by a straight line in (x1, x2) space and a linear f(x) over x; classes A and B not separable by a straight line and a non-linear f(x) over x]
18
Inductive Bias
Every inductive modeling system has an Inductive Bias
Consider a simple set of training examples like the following:
[Figure: a handful of training points f(x) plotted over x]
Go to generalize.xls
19
Inductive Bias
Can you think of any biases that you commonly use when you are learning something new?
Is there one best inductive bias?
20
Inductive Modeling Methods
Automated Exploration/Discovery
• e.g. discovering new market segments
• distance and probabilistic clustering algorithms
Prediction/Classification
• e.g. forecasting gross sales given current factors
• statistics (regression, K-nearest neighbour)
• artificial neural networks, genetic algorithms
Explanation/Description
• e.g. characterizing customers by demographics
• inductive decision trees/rules
• rough sets, Bayesian belief nets
[Figures: clusters A and B in (x1, x2) space; a forecast f(x) over x; an example rule: if age > 35 and income < $35k then ...]
21
Common Statistical Methods
22
Linear Regression
Y = b0 + b1 X1 + b2 X2 + ...
The coefficients b0, b1, ... determine a line (or hyperplane for higher dimensions) that fits the data
Closed-form solution via least squares (computes the smallest sum of squared distances between the examples and the predicted values of Y)
Inductive bias: the solution can be modeled by a straight line or hyperplane
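A minimal sketch of the closed-form least-squares fit using NumPy, on synthetic data (all values are illustrative; b0, b1, b2 follow the notation above).

import numpy as np
rng = np.random.default_rng(0)

X = rng.random((100, 2))                            # inputs X1, X2
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.1, 100)

A = np.column_stack([np.ones(len(X)), X])           # prepend a column of 1s for the intercept b0
b, *_ = np.linalg.lstsq(A, y, rcond=None)           # minimizes the sum of squared errors
print("b0, b1, b2 =", b)                            # roughly recovers 3.0, 2.0, -1.5

y_hat = A @ b                                       # predicted values of Y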
23
Linear Regression
Y = b0 + b1 X1 + b2 X2 + ...
A great way to start, since it assumes you are modeling a simple function ... Why?
24
Logistic Regression
Y = 1/(1 + e^-Z)
where Z = b0 + b1 X1 + b2 X2 + ...
Output is in [0,1] and represents a probability
The coefficients b0, b1, ... determine an S-shaped non-linear curve that best fits the data
The coefficients are estimated using an iterative maximum-likelihood method (sketched below)
Inductive bias: the solution can be modeled by this S-shaped non-linear surface
[Figure: S-shaped logistic curve, Y rising from 0 to 1 as a function of Z]
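A minimal sketch of fitting the coefficients by an iterative maximum-likelihood method, here plain gradient ascent on the log-likelihood for illustration (statistical packages typically use Newton-Raphson/IRLS); the data and settings are made up.

import numpy as np
rng = np.random.default_rng(0)

# Synthetic data generated from a known logistic model
X = rng.normal(size=(200, 2))
z_true = -0.5 + 2.0 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(200) < 1 / (1 + np.exp(-z_true))).astype(float)

A = np.column_stack([np.ones(len(X)), X])   # column of 1s for the intercept b0
b = np.zeros(3)
for _ in range(5000):                       # climb the log-likelihood surface
    p = 1 / (1 + np.exp(-A @ b))            # Y = 1/(1 + e^-Z), with Z = A @ b
    b += 0.01 * A.T @ (y - p)               # gradient of the log-likelihood
print("b0, b1, b2 =", b)                    # roughly recovers -0.5, 2.0, -1.0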
25
Logistic Regression
Y = 1/(1 + e^-Z)
where Z = b0 + b1 X1 + b2 X2 + ...
Can be used for classification problems
The output can be used as the probability of being of the class (or positive)
Alternatively, any value above a cut-off (typically 0.5) is classified as being a positive example
... A logistic regression Javascript page
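Continuing the illustrative sketch from the previous slide (reusing the fitted coefficients b), the model can be used as a classifier with the usual 0.5 cut-off.

import numpy as np
x_new = np.array([1.0, 0.3, -0.2])          # [1, X1, X2] for a new case
p_new = 1 / (1 + np.exp(-x_new @ b))        # predicted probability of the positive class
label = int(p_new > 0.5)                    # classify at the 0.5 cut-off
print(p_new, label)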
27
Learning Theory
[Figure: the example space X of pairs (x, c(x)), containing positive (+) and negative (-) instances; the true concept c and the hypothesis h overlap, and the shaded region is where c and h disagree.]
The true error of a hypothesis h is the probability that h will misclassify an instance drawn at random from X:
err(h) = P[c(x) ≠ h(x)]
where x = input attributes, c = true class function (e.g. “likes product”), h = hypothesis (model)
28
Generalization
PAC - A Probabilistic Guarantee
|H| = number of possible hypotheses in the modeling system
ε = desired true error, where 0 < ε < 1
δ = failure probability, so the desired confidence is (1 - δ), where 0 < δ < 1
Then the number of training examples m required to select (with confidence 1 - δ) a hypothesis h with err(h) < ε is given by:
m ≥ (1/ε)(ln|H| + ln(1/δ))
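A small illustrative calculation of the bound, plugging in the accuracy and confidence figures used earlier (the hypothesis-space size |H| is an assumed example value).

import math

H_size = 2 ** 16        # assumed number of hypotheses the modeling system can express
eps    = 0.25           # desired true error (i.e. 75% accuracy)
delta  = 0.05           # failure probability; 1 - delta = 95% confidence

m = math.ceil((1 / eps) * (math.log(H_size) + math.log(1 / delta)))
print("training examples required:", m)     # 57 for these values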