1
COMP3503 Intro to Inductive Modeling
with
Daniel L. Silver
2
Agenda
Deductive and Inductive Modeling
Learning Theory and Generalization
Common Statistical Methods
3
The KDD Process
[Figure: Data Sources → Data Consolidation → Consolidated Data (Data Warehouse) → Selection and Preprocessing → Prepared Data → Data Mining → Patterns & Models → Interpretation and Evaluation → Knowledge (e.g. p(x)=0.02)]
4
Deductive and Inductive Modeling
5
Induction versus Deduction
[Figure: Examples A, B and C relate to a Model or General Rule. Induction: bottom-up construction of the rule from the examples. Deduction: top-down verification of the rule against the examples.]
6
Deductive Modeling
Top-down (toward the data) verification of an hypothesis
The hypothesis is generated within the mind of the data miner
Exploratory tools such as OLAP and data visualization software are used
Models tend to be used for description
7
Inductive Modeling
Bottom-up (from the data) development of an hypothesis
The hypothesis is generated by the technology directly from the data
Statistical and machine learning tools such as regression, decision trees and artificial neural networks are used
Models can be used for prediction
8
Inductive Modeling
Objective: Develop a general model or hypothesis from specific examples
Function approximation (curve fitting)
Classification (concept learning, pattern recognition)
[Figures: classes A and B separated in (x1, x2) space; a curve f(x) fitted over x]
9
Learning Theory and Generalization
10
Inductive Modeling = Learning
Basic Framework for Inductive Learning
[Figure: the Environment supplies training examples (x, f(x)) to the Inductive Learning System, which produces an induced model or hypothesis. Testing examples are passed through the hypothesis to produce output classifications (x, h(x)); the question is whether h(x) ≈ f(x).]
A problem of representation and search for the best hypothesis, h(x).
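A minimal Python sketch of this framework, assuming a toy one-dimensional task (all names and data are illustrative): the environment supplies (x, f(x)) pairs, a simple space of threshold hypotheses is searched on the training examples, and h(x) ≈ f(x) is checked on the testing examples.

import numpy as np
rng = np.random.default_rng(0)

def f(x):                                   # the true (unknown) target function
    return (x > 0.5).astype(int)

x_train = rng.random(100); y_train = f(x_train)      # training examples (x, f(x))
x_test  = rng.random(50);  y_test  = f(x_test)       # testing examples

def h(x, t):                                # candidate hypothesis: "positive if x > t"
    return (x > t).astype(int)

# Search the hypothesis space (here, 101 thresholds) for the best fit to the training data
best_t = min(np.linspace(0, 1, 101),
             key=lambda t: np.mean(h(x_train, t) != y_train))

print("test accuracy:", np.mean(h(x_test, best_t) == y_test))   # does h(x) ≈ f(x)?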
11
Inductive Modeling = Data Mining
Ideally, an hypothesis (model) is:
• Complete – covers all potential examples
• Consistent – no conflicts
• Accurate – able to generalize to previously unseen examples
• Valid – presents a truth
• Transparent – human readable knowledge
12
Inductive Modeling
Generalization
The objective of learning is to achieve good generalization to new cases, otherwise just use a look-up table.
Generalization can be defined as a mathematical interpolation or regression over a set of training points (see the sketch below):
[Figure: training points and an interpolating curve f(x) over x]
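A tiny illustrative sketch of the difference from a look-up table, assuming a few made-up training points; numpy.interp generalizes between them by linear interpolation.

import numpy as np

x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([0.0, 0.8, 0.9, 0.1])    # observed f(x) at the training points

x_new = 1.5                                 # previously unseen case: no table entry
print(np.interp(x_new, x_train, y_train))   # interpolated estimate -> 0.85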
13
Inductive Modeling
Generalization
Generalization accuracy can be guaranteed for a specified confidence level given a sufficient number of examples
Models can be validated for accuracy by using a previously unseen test set of examples
14
Learning Theory
Probably Approximately Correct (PAC) theory of learning (Leslie Valiant, 1984)
Poses questions such as:
• How many examples are needed for good generalization?
• How long will it take to create a good model?
Answers depend on:
• Complexity of the actual function
• The desired level of accuracy of the model (e.g. 75%)
• The desired confidence in finding a model with this accuracy (e.g. 19 times out of 20 = 95%)
15
Learning Theory
[Figure: the space of all possible examples, containing positive (+) and negative (-) instances; the true concept c and the hypothesis h overlap, and the shaded region is where c and h disagree.]
The true error of a hypothesis h is the probability that h will misclassify an instance drawn at random from X:
error(h) = P[c(x) ≠ h(x)]
16
Learning Theory
Three notions of error:
Training Error
• How often the training set is misclassified
Test Error
• How often an independent test set is misclassified
True Error
• How often the entire population of possible examples would be misclassified
• Must be estimated from the Test Error
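A small illustrative sketch of the three notions, assuming a toy problem where the whole population can also be sampled (purely to show that the Test Error is the practical estimate of the True Error).

import numpy as np
rng = np.random.default_rng(1)

def c(x):                                   # true class function
    return (np.sin(3 * x) > 0).astype(int)

def h(x):                                   # some fixed hypothesis (model)
    return (x < 1.0).astype(int)

x_train = rng.uniform(0, 2, 30)
x_test  = rng.uniform(0, 2, 30)
x_pop   = rng.uniform(0, 2, 100_000)        # stands in for "all possible examples"

training_error = np.mean(h(x_train) != c(x_train))
test_error     = np.mean(h(x_test)  != c(x_test))    # estimate of the true error
true_error     = np.mean(h(x_pop)   != c(x_pop))     # ≈ P[c(x) ≠ h(x)]
print(training_error, test_error, true_error)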
17
Linear and Non-Linear Problems
Linear Problems
• Linear functions
• Linearly separable classifications
Non-linear Problems
• Non-linear functions
• Not linearly separable classifications
[Figures: classes A and B separable by a straight line in (x1, x2) space and a linear f(x) over x; classes A and B not separable by a straight line and a non-linear f(x) over x]
18
Inductive Bias
Every inductive modeling system has an Inductive Bias
Consider a simple set of training examples like the following:
[Figure: a handful of training points f(x) plotted over x]
Go to generalize.xls
19
Inductive Bias
Can you think of any biases that you commonly use when you are learning something new?
Is there one best inductive bias?
20
Inductive Modeling Methods
Automated Exploration/Discovery
• e.g. discovering new market segments
• distance and probabilistic clustering algorithms
Prediction/Classification
• e.g. forecasting gross sales given current factors
• statistics (regression, K-nearest neighbour)
• artificial neural networks, genetic algorithms
Explanation/Description
• e.g. characterizing customers by demographics
• inductive decision trees/rules
• rough sets, Bayesian belief nets
[Figures: clusters A and B in (x1, x2) space; a forecast f(x) over x; an example rule: if age > 35 and income < $35k then ...]
21
Common Statistical Methods
22
Linear Regression
Y = b0 + b1 X1 + b2 X2 + ...
The coefficients b0, b1, ... determine a line (or hyperplane for higher dimensions) that fits the data
Closed-form solution via least squares (computes the smallest sum of squared distances between the examples and the predicted values of Y)
Inductive bias: the solution can be modeled by a straight line or hyperplane
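A minimal sketch of the closed-form least-squares fit using NumPy, on synthetic data (all values are illustrative; b0, b1, b2 follow the notation above).

import numpy as np
rng = np.random.default_rng(0)

X = rng.random((100, 2))                            # inputs X1, X2
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.1, 100)

A = np.column_stack([np.ones(len(X)), X])           # prepend a column of 1s for the intercept b0
b, *_ = np.linalg.lstsq(A, y, rcond=None)           # minimizes the sum of squared errors
print("b0, b1, b2 =", b)                            # roughly recovers 3.0, 2.0, -1.5

y_hat = A @ b                                       # predicted values of Y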
23
Linear Regression
Y = b0 + b1 X1 + b2 X2 + ...
A great way to start, since it assumes you are modeling a simple function ... Why?
24
Logistic Regression
Y = 1/(1 + e^-Z)
where Z = b0 + b1 X1 + b2 X2 + ...
Output is in [0,1] and represents a probability
The coefficients b0, b1, ... determine an S-shaped non-linear curve that best fits the data
The coefficients are estimated using an iterative maximum-likelihood method (sketched below)
Inductive bias: the solution can be modeled by this S-shaped non-linear surface
[Figure: S-shaped logistic curve, Y rising from 0 to 1 as a function of Z]
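A minimal sketch of fitting the coefficients by an iterative maximum-likelihood method, here plain gradient ascent on the log-likelihood for illustration (statistical packages typically use Newton-Raphson/IRLS); the data and settings are made up.

import numpy as np
rng = np.random.default_rng(0)

# Synthetic data generated from a known logistic model
X = rng.normal(size=(200, 2))
z_true = -0.5 + 2.0 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(200) < 1 / (1 + np.exp(-z_true))).astype(float)

A = np.column_stack([np.ones(len(X)), X])   # column of 1s for the intercept b0
b = np.zeros(3)
for _ in range(5000):                       # climb the log-likelihood surface
    p = 1 / (1 + np.exp(-A @ b))            # Y = 1/(1 + e^-Z), with Z = A @ b
    b += 0.01 * A.T @ (y - p)               # gradient of the log-likelihood
print("b0, b1, b2 =", b)                    # roughly recovers -0.5, 2.0, -1.0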
25
Logistic Regression
Y = 1/(1 + e^-Z)
where Z = b0 + b1 X1 + b2 X2 + ...
Can be used for classification problems
The output can be used as the probability of being of the class (or positive)
Alternatively, any value above a cut-off (typically 0.5) is classified as being a positive example
... A logistic regression Javascript page
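Continuing the illustrative sketch from the previous slide (reusing the fitted coefficients b), the model can be used as a classifier with the usual 0.5 cut-off.

import numpy as np
x_new = np.array([1.0, 0.3, -0.2])          # [1, X1, X2] for a new case
p_new = 1 / (1 + np.exp(-x_new @ b))        # predicted probability of the positive class
label = int(p_new > 0.5)                    # classify at the 0.5 cut-off
print(p_new, label)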
27
Learning Theory
[Figure: the example space X of pairs (x, c(x)), containing positive (+) and negative (-) instances; the true concept c and the hypothesis h overlap, and the shaded region is where c and h disagree.]
The true error of a hypothesis h is the probability that h will misclassify an instance drawn at random from X:
err(h) = P[c(x) ≠ h(x)]
where x = input attributes, c = true class function (e.g. “likes product”), h = hypothesis (model)
28
Generalization
PAC - A Probabilistic Guarantee
|H| = number of possible hypotheses in the modeling system
ε = desired true error, where 0 < ε < 1
δ = failure probability, so the desired confidence is (1 - δ), where 0 < δ < 1
Then the number of training examples m required to select (with confidence 1 - δ) a hypothesis h with err(h) < ε is given by:
m ≥ (1/ε)(ln|H| + ln(1/δ))
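A small illustrative calculation of the bound, plugging in the accuracy and confidence figures used earlier (the hypothesis-space size |H| is an assumed example value).

import math

H_size = 2 ** 16        # assumed number of hypotheses the modeling system can express
eps    = 0.25           # desired true error (i.e. 75% accuracy)
delta  = 0.05           # failure probability; 1 - delta = 95% confidence

m = math.ceil((1 / eps) * (math.log(H_size) + math.log(1 / delta)))
print("training examples required:", m)     # 57 for these values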