27-18 September 2012, Data Mining, dr Iwona Schab

Semester timetable

ORGANIZATIONAL ISSUES, INTRODUCTION TO DATA MINING

1 Sources of data in business, administration, science and technology.
2 The process of discovering knowledge in data; the role of data mining in this process.
3 Data mining and Business Intelligence.
4 SEMMA methodology.
5 Data preparation: sampling, cleaning, normalization and standardization.
6 Association rules discovery.
7 Classification problems: case studies.

8 Rule induction systems: algorithms, knowledge representation.
9 Decision trees: partition rules and pruning.
10 Classification based on probability distributions: naive Bayes estimation and Bayesian networks.
11 Grouping problems: case studies.
12 Cluster analysis: combinatorial and hierarchical methods.
13 Modeling response to direct mail marketing.
14 Churn analysis.
15 Text mining.
16 Web mining.
17 Data mining in Life Science.
18 Comparative analysis of algorithms implemented in SAS Enterprise Miner and WEKA software.
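
Item 5 of the timetable names normalization and standardization as data preparation steps. As a rough illustration only, the sketch below assumes that "normalization" means min-max rescaling to [0, 1] and "standardization" means z-scores (mean 0, unit standard deviation); the course may define these terms differently. The function names and the sample data are hypothetical.

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Rescale values linearly to the [0, 1] interval (assumed meaning of 'normalization')."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Centre on mean 0 and scale to unit population standard deviation (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Made-up illustrative data, not taken from the course.
incomes = [2000, 3000, 4000, 5000, 6000]
normalized = min_max_normalize(incomes)
standardized = standardize(incomes)
```

Min-max rescaling keeps the original ordering and bounds the range, while z-scores make variables measured in different units comparable; which one is appropriate depends on the algorithm applied afterwards.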

Literature

Basic

Paolo Giudici, Applied Data Mining. Statistical Methods for Business and Industry, Wiley, New York 2011

Supplementary

Selected papers to be circulated

Daniel T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, Wiley, New York 2005

Daniel T. Larose, Data Mining Methods and Models, Wiley, New York 2006

Statistical Analysis?

Data Mining

to mine = to extract (e.g. precious, hidden resources from the Earth)

Different definitions and understandings depending on the user

New discipline developed from computing and statistics

In-depth search to find additional information (previously unnoticed in the mass of data available)

Data preparation and "structuring the unstructured" needed

Machine learning = finding relations and regularities in data

Generalisation from the observed data to new, unobserved cases
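
The idea of generalising from observed data to unobserved cases can be sketched with a very small example. The code below is only an illustration under stated assumptions: it uses a one-nearest-neighbour rule (chosen here for brevity, not named on the slide), and the data, labels and function name are hypothetical.

```python
def nearest_neighbour_predict(train, x):
    """Predict the label of x as the label of the closest observed point (1-NN)."""
    closest = min(train, key=lambda pair: abs(pair[0] - x))
    return closest[1]

# Hypothetical observed cases: (monthly spend, customer segment).
observed = [(10, "low"), (15, "low"), (20, "low"),
            (80, "high"), (90, "high"), (95, "high")]

# New cases that were not available when the regularity was "learned".
unseen = [(12, "low"), (85, "high"), (30, "low")]

# Generalisation is judged on the unseen cases, not on the observed ones.
correct = sum(nearest_neighbour_predict(observed, x) == y for x, y in unseen)
accuracy = correct / len(unseen)
```

Evaluating on cases held back from the learning step is what separates finding a regularity from merely memorising the observed data.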

KDD Process (Knowledge Discovery in Databases)
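
The diagram for this slide is not preserved in the text. As a hedged sketch, the code below assumes the commonly cited five KDD stages (selection, preprocessing, transformation, data mining, interpretation/evaluation), which may differ from what the slide showed. All function names, fields and records are hypothetical, and the "mining" step is deliberately trivial.

```python
from collections import Counter

def select(records):
    """Selection: keep only the fields relevant to the question."""
    return [{"age": r.get("age"), "bought": r.get("bought")} for r in records]

def preprocess(records):
    """Preprocessing: drop records with missing values."""
    return [r for r in records if r["age"] is not None and r["bought"] is not None]

def transform(records):
    """Transformation: bin age into two coarse groups."""
    return [{"group": "under_40" if r["age"] < 40 else "40_plus",
             "bought": r["bought"]} for r in records]

def mine(records):
    """Data mining: purchase rate per age group (a deliberately trivial pattern)."""
    totals, buyers = Counter(), Counter()
    for r in records:
        totals[r["group"]] += 1
        buyers[r["group"]] += int(r["bought"])
    return {g: buyers[g] / totals[g] for g in totals}

def interpret(patterns):
    """Interpretation/evaluation: report the group with the highest rate."""
    return max(patterns, key=patterns.get)

# Made-up raw records; the "city" field is dropped at the selection stage.
raw = [
    {"age": 25, "bought": True, "city": "Warsaw"},
    {"age": 31, "bought": True, "city": "Krakow"},
    {"age": None, "bought": False, "city": "Gdansk"},
    {"age": 52, "bought": False, "city": "Warsaw"},
    {"age": 47, "bought": True, "city": "Poznan"},
]
patterns = mine(transform(preprocess(select(raw))))
best_group = interpret(patterns)
```

In this reading, data mining is one stage of the wider process, and most of the code is spent on preparing the data before any pattern is extracted.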

Software

www.sgh.waw.pl/ogolnouczelniane/ci/aplikacje/oprogramowanie/

SAS/STAT

SAS Enterprise Miner

Other: Statistica, SPSS, WEKA