Introduction to Machine Learning
Lecture 4
Slides based on Francisco Herrera's course on Data Mining
Albert Orriols i Puig
[email protected]
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Recap of Lecture 3
Typically, techniques in ML have been divided into different paradigms
Inductive learning
Explanation-based learning
Analogy-based learning
Evolutionary learning
Connectionist learning
Recap of Lecture 3
Problems that we’ll study
1. Data classification: C4.5, kNN, Naïve Bayes …
2. Statistical learning: SVM
3. Association analysis: A-priori
4. Link mining: PageRank
5. Clustering: k-means
6. Reinforcement learning: Q-learning, XCS
7. Regression
8. Genetic Fuzzy Systems
Today’s Agenda
Situation: Where Are We?
Classification
Prediction
Clustering
Association
Data Mining Systems
Situation: Where Are We?
The input consists of examples described by different characteristics
Situation: Where Are We?
What can we do with a bunch of examples?
It depends on the type of examples we have
Classification: Find the class to which a new instance belongs
E.g.: Find whether a new patient has cancer or not
Numeric prediction: A variation of classification in which the output consists of numeric classes
E.g.: Find the frequency of cancerous cells found
Regression: Find a function that fits your examples
E.g.: Find a function that controls your chain process
Association: Find associations among your problem attributes or variables
E.g.: Find relations such as: a patient with high blood pressure is more likely to have a heart attack
Clustering: Process to cluster/group the instances into classes
E.g.: Group clients whose purchases are similar
Data Classification
[Figure: data classification flow. Labels: Dataset (information based on experience), Training set, Test set, Learner (knowledge extraction), Model, New instance, Predicted output]
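The split of the dataset into a training set and a test set can be sketched as a simple holdout split. This is only an illustration: the function name `holdout_split`, the 30% test fraction and the fixed random seed are assumptions, not something the lecture prescribes.

```python
import random


def holdout_split(dataset, test_fraction=0.3, seed=0):
    """Shuffle the examples and split them into (training_set, test_set).

    test_fraction and seed are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(dataset)           # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    test_set = shuffled[:n_test]       # held out, never shown to the learner
    training_set = shuffled[n_test:]   # given to the learner to build the model
    return training_set, test_set


# Hypothetical stand-ins for real examples.
examples = list(range(10))
training_set, test_set = holdout_split(examples)
```

The learner would then see only `training_set`; `test_set` stays aside to evaluate the resulting model.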
Example of Data Classification
[Figure: Data Set → Classification Model. How?]
The classification model can be implemented in several ways:
• Rules
• Decision trees
• Mathematical formulae
Classification as a Two-Step Process
Model usage: to classify future or unknown objects
Estimate the accuracy of the model
The known label of each test sample is compared with the label predicted by the system
The accuracy rate is the proportion of test examples that are correctly classified by the model
The test set is independent of the training set
If the experts think that the model is acceptable
Then, use the model to predict unknown examples
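The accuracy estimate described above can be sketched in a few lines. The model here is a made-up threshold rule and the labelled test set is hypothetical; both are assumptions chosen only to make the computation concrete.

```python
def accuracy(model, test_set):
    """Proportion of test examples whose predicted label equals the known label."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return correct / len(test_set)


def threshold_model(x):
    """Hypothetical model (assumption): predict "yes" when the value is at least 5."""
    return "yes" if x >= 5 else "no"


# Hypothetical labelled test set, kept separate from any training data.
labelled_test_set = [(7, "yes"), (2, "no"), (6, "no"), (9, "yes")]

rate = accuracy(threshold_model, labelled_test_set)  # 3 of 4 correct
```

Whether a given accuracy rate is acceptable remains a judgment for the experts, as the slide says.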
Going to the Real World
Definition: Given a collection of annotated data (in this case katydids and grasshoppers), decide what type of insect the following one is
[Figure: labeled examples of katydids and grasshoppers]
Going to the Real World
How can I put a katydid or a grasshopper into my computer?
Going to the Real World
Thus, the classification problem has been reduced to
Insect ID   Abdomen Length   Antennae Length   Insect Class
1           2.7              5.5               Grasshopper
2           8.0              9.1               Katydid
3           0.9              4.7               Grasshopper
4           1.1              3.1               Grasshopper
5           5.4              8.5               Katydid
6           2.9              1.9               Grasshopper
7           6.1              6.6               Katydid
8           0.5              1.0               Grasshopper
9           8.3              6.6               Katydid
10          8.1              4.7               Katydid
We have an observation with abdomen length 5.1 and antennae length 7. Which class does it belong to?
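One possible way to answer the question for the new observation is a k-nearest-neighbours vote over the ten insects in the table. This is a sketch, not the lecture's prescribed method: the choice of k = 3, the Euclidean distance and the names `INSECTS` and `knn_classify` are assumptions.

```python
import math
from collections import Counter

# Training data from the insect table on this slide:
# (abdomen length, antennae length) -> insect class
INSECTS = [
    ((2.7, 5.5), "Grasshopper"),
    ((8.0, 9.1), "Katydid"),
    ((0.9, 4.7), "Grasshopper"),
    ((1.1, 3.1), "Grasshopper"),
    ((5.4, 8.5), "Katydid"),
    ((2.9, 1.9), "Grasshopper"),
    ((6.1, 6.6), "Katydid"),
    ((0.5, 1.0), "Grasshopper"),
    ((8.3, 6.6), "Katydid"),
    ((8.1, 4.7), "Katydid"),
]


def knn_classify(training, query, k=3):
    """Return the majority class among the k training examples nearest to query."""
    nearest = sorted(training, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


prediction = knn_classify(INSECTS, (5.1, 7.0))
print(prediction)  # Katydid
```

With k = 3 the nearest neighbours are insects 7, 5 and 1, so two of the three votes go to Katydid. A different k or distance measure could give a different answer.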
Going to the Real World
Actually, we could write that
How do I classify this domain?
How to Create Classification Models
We will study some of these methods:
The decision tree C4.5
The instance-based classifier kNN
The probabilistic classifier Naïve Bayes
Regression or Prediction
Prediction vs data classification
Similarities: Both learn from a data set
Difference:
In classification, each example has a class associated
In prediction, each example has a numerical value associated
How to Extract a Model?
Prediction works analogously to data classification
Use an algorithm to build a model
Use this model to predict the new unknown example
Types of regression
Linear and multiple regression
Non-linear regression
Two of the most-used approaches to regression
Neural networks
Fuzzy rule-based systems
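The two steps above (build a model, then use it on a new example) can be illustrated with the simplest case listed, linear regression with one input, fitted by ordinary least squares. The function names and the data points are assumptions made up for this sketch.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)  # must be non-zero: xs cannot all be equal
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted model to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training examples: each input has a numerical value associated.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

model = fit_line(xs, ys)       # step 1: build the model
estimate = predict(model, 4)   # step 2: predict the new unknown example
```

Neural networks and fuzzy rule-based systems follow the same build-then-predict pattern with more expressive models.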
Clustering
The clustering problem
Given a database D = {t1, t2, …, tn} of transactions and an integer value k, the clustering problem is to define a mapping f: D → {1, …, k} where each ti is assigned to one cluster Kj, 1 <= j <= k
Main difference with classification
In classification, each example is labeled with a class
In clustering, examples are not labeled
Examples of clustering
Segment customer database based on similar buying patterns
Group houses in a town into neighborhoods based on similar features
Identify new plant species
Identify similar web usage patterns
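As one concrete instance of such a mapping f, here is a minimal k-means sketch (k-means is the clustering method named in the course outline). Several details are assumptions made for illustration: the first k points are used as initial centroids, the iteration count is fixed, and clusters are numbered from 0 instead of 1.

```python
import math


def kmeans(points, k, iterations=20):
    """Return one cluster index (0 .. k-1) per point, i.e. the mapping f."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment


# Hypothetical unlabeled data: two visibly separate groups.
data = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels = kmeans(data, k=2)
```

No class labels are given to the algorithm; the groups emerge only from the distances between examples.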
Example of Clustering
Put these people in different clusters
What are the key points?
Define what’s similar
Group similar things into clusters
Size of the clusters?
Which type of clustering do I want?
Hierarchical clustering?
Partition-based clustering?
Are They Similar?
How to Group the Elements?
Which Type of Clustering?
Many types of clustering
Hierarchical: Nested set of clusters
Partition-based: One set of clusters
Incremental: Each element handled one at a time
Simultaneous: All elements handled together
Overlapping/non-overlapping
[Figure: hierarchical clustering vs. partition-based clustering]
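To make "nested set of clusters" concrete, here is a small agglomerative sketch that starts with one cluster per point and repeatedly merges the two closest clusters. The single-linkage distance, the function name `agglomerate` and the sample points are assumptions for illustration; the lecture does not fix a particular hierarchical method.

```python
import math


def agglomerate(points):
    """Single-linkage agglomerative clustering.

    Returns a list of levels; each level is a list of clusters, and each
    cluster is a sorted list of point indices. Every level is nested in the next.
    """
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point
    levels = [[sorted(c) for c in clusters]]
    while len(clusters) > 1:
        best = None  # (distance, index a, index b) of the closest pair of clusters
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
        levels.append([sorted(c) for c in clusters])
    return levels


# Hypothetical points: two tight pairs far from each other.
sample = [(0, 0), (0, 1), (5, 5), (5, 6.5)]
hierarchy = agglomerate(sample)
```

A partition-based method would return only one of these levels, for example the one with two clusters, instead of the whole hierarchy.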
Association Rules
Given a set of items I = {I1, I2, …, Im} and a database of transactions D = {t1, t2, …, tn} where ti = {Ii1, Ii2, …, Iik} and Iij ∈ I
The association rule problem is to identify all the rules of the form
X → Y
Rules with minimum support and confidence
Support: Fraction of transactions which contain both X and Y
Confidence: Measure of how often items in Y appear in transactions that contain X
Example Association Rules
I = {Beer, Bread, Jelly, Milk, PeanutButter}
Support of {Bread, PeanutButter} is 60%
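Support and confidence can be computed directly from their definitions. The transaction table for this example did not survive in the text, so the five transactions below are an assumption: they use the item set I given above and were chosen to be consistent with the stated 60% support of {Bread, PeanutButter}. The function names are also illustrative.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, x, y=()):
    """Fraction of transactions that contain both X and Y."""
    return count(transactions, set(x) | set(y)) / len(transactions)


def confidence(transactions, x, y):
    """Among transactions containing X, the fraction that also contain Y.

    Raises ZeroDivisionError if no transaction contains X.
    """
    return count(transactions, set(x) | set(y)) / count(transactions, x)


# Assumed transactions (not recovered from the slide), consistent with the 60% figure.
transactions = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]

s = support(transactions, {"Bread"}, {"PeanutButter"})     # 3 of 5 transactions
c = confidence(transactions, {"Bread"}, {"PeanutButter"})  # 3 of the 4 with Bread
```

An association rule miner would keep only the rules X → Y whose support and confidence both reach user-chosen minimum thresholds.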
Example Association Rules
Before Finishing…
Some environments that contain algorithms to perform data classification, regression, clustering and association rule mining
KEEL: http://www.keel.es
Weka: http://www.cs.waikato.ac.nz/ml/weka/
Rapid Miner: http://rapid-i.com/content/blogcategory/38/69/
Next Class
Start with data classification
C4.5