A Brief Tutorial On Data Mining-20140701

An introduction to data mining and RapidMiner, OMNI-Lab, 20140701

A BRIEF TUTORIAL ON DATA MINING

Xiaming Chen, OMNI-Lab 2014-07

OUTLINE

• What's Data Mining?

• A Hands-on Practice

WHAT'S DATA MINING

• Science: probability, statistics, graph theory etc.

• Techniques: clustering, classification, regression, prediction etc.

• A way to think about this world.

In textbooks

WHAT'S DATA MINING

• Science? Maybe

In reality

Research on Social Networks

WHAT'S DATA MINING

• Prediction? Yes!

In reality

The Highest Creature Intelligence (100%)

Anti-Prediction

US Election, Bayes Selection!

WHAT'S DATA MINING

• The world, thinking? Spying!

In reality

“Illegal SPYING below!”

WHAT'S DATA MINING

• You Need, You Learn, You Become an Expert

Insights, Thinking, Programming

HANDS-ON PRACTICE

• Tools to Facilitate Your Data Analysis

• Commercial

  • SAS

  • IBM SPSS

  • Matlab etc.

• Free/Open Source

  • RapidMiner + Weka

  • R (my favorite)

  • Python + SciPy + scikit-learn

  • Hadoop/Spark etc.

HANDS-ON PRACTICE

• Example: RapidMiner + StoneFlakes

http://archive.ics.uci.edu/ml/datasets/StoneFlakes

HANDS-ON PRACTICE

• RapidMiner (ad-free)

• A Java-based IDE for ML, data mining, text mining etc.

• Modular design, graphic interface, zero-line coding

• Complete Process logic: data ETL, visualization, modeling, prediction, reports etc.

• Growing extension market

• CLI and API for other programs

• Can call Weka and R functions

Download: http://www.rapidminer.com/
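The process logic listed above (data ETL, modeling, prediction) can be sketched outside RapidMiner as well. The following is a minimal pure-Python illustration, not RapidMiner's API: the toy data, the column names `x1`, `x2`, `label`, and the helper functions `extract`, `fit`, and `predict` are all assumptions made up for this example, and the model is a simple nearest-centroid classifier chosen for brevity.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical toy data (not from the slides): two numeric features and a
# class label. "?" marks a missing value.
RAW = """x1,x2,label
1.0,1.2,a
0.8,1.0,a
?,0.9,a
4.0,4.2,b
4.4,3.8,b
"""


def extract(text):
    """ETL step: parse CSV text and drop rows that contain a missing value."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        if "?" in rec.values():
            continue
        rows.append(((float(rec["x1"]), float(rec["x2"])), rec["label"]))
    return rows


def fit(rows):
    """Modeling step: compute one centroid (per-feature mean) for each class."""
    groups = defaultdict(list)
    for features, label in rows:
        groups[label].append(features)
    return {
        label: tuple(mean(column) for column in zip(*points))
        for label, points in groups.items()
    }


def predict(model, point):
    """Prediction step: return the label whose centroid is closest to point."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(point, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


rows = extract(RAW)
model = fit(rows)
print(predict(model, (1.0, 1.0)))
print(predict(model, (4.0, 4.0)))
```

In RapidMiner each of these three functions would roughly correspond to one or more operators wired together in the graphical process view, with no code written by hand.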

HANDS-ON PRACTICE

• StoneFlakes

  • StoneFlakes.csv: flake attribute information

  • annotation.csv: inventory properties

Formatted: http://io.hsiamin.com/data/StoneFlakes.tar.gz
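Since the dataset comes as two files, a first step in most tools is to join the flake attributes with the inventory properties. Below is a hedged sketch of that join using only the Python standard library. The rows are made-up stand-ins, and the column names (`ID`, `LBI`, `RTI`, `group`, `site`), the shared `ID` key, and the use of `?` for missing values are assumptions; check the real files before relying on any of them.

```python
import csv
import io

# Made-up stand-ins for StoneFlakes.csv and annotation.csv.
# Real column names and values may differ; these are assumptions.
FLAKES_CSV = """ID,LBI,RTI
s1,1.23,27.0
s2,?,26.5
s3,1.10,29.1
"""

ANNOTATION_CSV = """ID,group,site
s1,1,site_A
s2,2,site_B
"""


def read_table(text, key="ID"):
    """Index CSV rows by the key column; '?' entries become None."""
    table = {}
    for rec in csv.DictReader(io.StringIO(text)):
        table[rec[key]] = {
            name: (None if value == "?" else value)
            for name, value in rec.items()
            if name != key
        }
    return table


def inner_join(left, right):
    """Keep only keys present in both tables and merge their columns."""
    return {k: {**left[k], **right[k]} for k in left if k in right}


flakes = read_table(FLAKES_CSV)
annotation = read_table(ANNOTATION_CSV)
merged = inner_join(flakes, annotation)

for key, columns in merged.items():
    print(key, columns)
```

To run this against the real files, the `io.StringIO(text)` call could be replaced with a file object opened via `open(path, newline="")`.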

HANDS-ON PRACTICE

• Demo

SUMMER COURSE

• Spatial-temporal Data Analysis

• Yu Zheng, MSR

• July 1 ~ 31, 2014

• Tuesday and Thursday afternoons, 2:00 ~ 5:40

• Minhang campus, Shangyuan building, Room 316

http://www.hsiamin.com

Thanks

caesar0301@github
