22
Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1

Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Introduction to Data Mining and MachineLearning TechniquesIza Moise, Evangelos Pournaras, Dirk Helbing

Iza Moise, Evangelos Pournaras, Dirk Helbing 1

Page 2: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Overview

Main principles of data miningDefinitionSteps of a data mining processSupervised vs. unsupervised data mining

Applications

Data mining functionalities

Iza Moise, Evangelos Pournaras, Dirk Helbing 2

Page 3: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Definition

Data mining

is the automated process of discovering interesting (non-trivial, pre-viously unknown, insightful and potentially useful) information orpatterns, as well as descriptive, understandable, and predictive mod-els from (large-scale) data.

Also known as:

• Knowledge discovery in databases (KDD)

• Data analysis

• Information harvesting

• Business intelligence

Iza Moise, Evangelos Pournaras, Dirk Helbing 3

Page 4: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Goals

X search consistent patterns and/or systemic relationshipsbetween data

X validate the findings by applying the detected patterns to newsubsets of data

X predict new findings on new datasets

Iza Moise, Evangelos Pournaras, Dirk Helbing 4

Page 5: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Data Mining is...

• k-means clustering

• decision trees

• neural networks

• Bayesian networks

Iza Moise, Evangelos Pournaras, Dirk Helbing 5

Page 6: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Data Mining is not...

• Data warehousing

• SQL / Ad Hoc Queries / Reporting

• Software Agents

• Online Analytical Processing (OLAP)

• Data Visualization

Iza Moise, Evangelos Pournaras, Dirk Helbing 6

Page 7: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Knowledge Discovery Process

Iza Moise, Evangelos Pournaras, Dirk Helbing 7

Page 8: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Steps of a data mining process

1. Exploration

2. Model building and validation

3. Deployment → prediction

Iza Moise, Evangelos Pournaras, Dirk Helbing 8

Page 9: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Exploration

• data preparation→ data cleaning, data transformation etc.

• elaborate exploratory analyses using a wide variety ofgraphical and statistical methods, depending on the nature ofthe analytic problem

• determine the general nature of models that can be taken intoaccount in the next stage

Iza Moise, Evangelos Pournaras, Dirk Helbing 9

Page 10: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Model building and validation, and Deployment

• Model building and validation:→ consider various models and choose the best one→ validate the model on existing data

• Deployment:→ apply the model to new data

Iza Moise, Evangelos Pournaras, Dirk Helbing 10

Page 11: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Model building and validation, and Deployment

• Model building and validation:→ consider various models and choose the best one→ validate the model on existing data

• Deployment:→ apply the model to new data

Iza Moise, Evangelos Pournaras, Dirk Helbing 10

Page 12: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Model building and validation, and Deployment

• Model building and validation:→ consider various models and choose the best one→ validate the model on existing data

• Deployment:→ apply the model to new data

Iza Moise, Evangelos Pournaras, Dirk Helbing 10

Page 13: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Two styles of data mining: Predictive andDescriptive

1. Predictive→ Predict the value of a specific attribute (target or dependent)based on the value of other attributes (explanatory orpredictors)

2. Descriptive→ Derive patterns that summarise the relationships amongdata points

Iza Moise, Evangelos Pournaras, Dirk Helbing 11

Page 14: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Supervised vs. Unsupervised

Supervised

• predictive or directed

• useful when you have a specifictarget value to predict aboutyour data

• Classification, Regression,Anomaly Detection

Unsupervised

• descriptive or undirected

• finds hidden structure andrelation within the data

• Clustering, Association, FeatureExtraction

Iza Moise, Evangelos Pournaras, Dirk Helbing 12

Page 15: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Supervised Data Mining

• a pre-specified target variable

• explanatory variables or predictors and ↔ one (or more)dependent variables or target

• goal: specify relationships between predictors and target

• training: the model is given many training data where the targetis known

• testing: the model is applied to data for which the target isunknown

Iza Moise, Evangelos Pournaras, Dirk Helbing 13

Page 16: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Unsupervised Data Mining

• determine the existence of classes or clusters in the data

• exploratory analysis

• all variable are treated in the same way

Iza Moise, Evangelos Pournaras, Dirk Helbing 14

Page 17: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Overview

Main principles of data miningDefinitionSteps of a data mining processSupervised vs. unsupervised data mining

Applications

Data mining functionalities

Iza Moise, Evangelos Pournaras, Dirk Helbing 15

Page 18: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Fielded Applications

• Web mining→ PageRank, Search Engine, Recommenders, Social Networks

• Screening Images

• Diagnosis• Marketing and Sales

→ Market basket analysis, Direct marketing

• Biology, biomedicine, astronomy, chemistry

Iza Moise, Evangelos Pournaras, Dirk Helbing 16

Page 19: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Beers and Diapers

Iza Moise, Evangelos Pournaras, Dirk Helbing 17

Page 20: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

Overview

Main principles of data miningDefinitionSteps of a data mining processSupervised vs. unsupervised data mining

Applications

Data mining functionalities

Iza Moise, Evangelos Pournaras, Dirk Helbing 18

Page 21: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

1. Supervised Data MiningX ClassificationX RegressionX Outlier detectionX Frequent pattern mining

2. Unsupervised Data MiningX ClusteringX Feature Extraction

→ definition

→ real use-cases

→ method

→ pros and cons

Iza Moise, Evangelos Pournaras, Dirk Helbing 19

Page 22: Introduction to Data Mining and Machine Learning Techniques · 2016-04-06 · Definition Data mining is theautomatedprocess of discoveringinteresting(non-trivial, pre-viously unknown,

1. Supervised Data MiningX ClassificationX RegressionX Outlier detectionX Frequent pattern mining

2. Unsupervised Data MiningX ClusteringX Feature Extraction

→ definition

→ real use-cases

→ method

→ pros and cons

Iza Moise, Evangelos Pournaras, Dirk Helbing 19