Data mining project presentation


Classification Technique KNN in Data Mining

on the “Iris” dataset

COMP722 Data Mining
Kaiwen Qi, UNC

Spring 2012

Outline

Dataset introduction
Data processing
Data analysis
KNN & implementation
Testing

Dataset

Raw dataset: Iris (http://archive.ics.uci.edu/ml/datasets/Iris)

150 total records

50 records Iris Setosa

50 records Iris Versicolour

50 records Iris Virginica

5 Attributes

Sepal length in cm (continuous number)

Sepal width in cm (continuous number)

Petal length in cm (continuous number)

Petal width in cm (continuous number)

Class (nominal data: Iris Setosa, Iris Versicolour, Iris Virginica)
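For illustration, records in the UCI iris.data format could be parsed as below. This is a minimal sketch, not the project's actual code: the function name is an assumption, and a few sample rows stand in for the real 150-row file.

```python
import csv
from io import StringIO

# A few sample rows in the UCI iris.data format (sepal length, sepal width,
# petal length, petal width, class); the real file has 150 rows.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

def load_iris(fileobj):
    """Parse rows into (features, label) pairs: four floats and a class name."""
    records = []
    for row in csv.reader(fileobj):
        if not row:
            continue  # the UCI file ends with a blank line
        *features, label = row
        records.append(([float(x) for x in features], label))
    return records

records = load_iris(StringIO(SAMPLE))
print(len(records))  # 3 sample records
```

With the real file, `load_iris(open("iris.data"))` would yield all 150 records.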

[Figure: (a) raw data; (b), (c) data organization]

Classification Goal

Task

Data Processing

Original data

Data Processing

• Balanced distribution

Data Analysis

Statistics

Data Analysis

Histogram


KNN

KNN algorithm

The unknown point, the green circle, is classified as a square when K is 5: squares form the majority among its 5 nearest neighbors. The distance between two points is the Euclidean distance d(p, q) = √((p₁ − q₁)² + (p₂ − q₂)² + … + (pₙ − qₙ)²).
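The nearest-neighbor vote can be sketched with toy 2-D points (the labels and coordinates below are made up for illustration):

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

# Toy labeled points: three squares near the query, four triangles farther out.
candidates = [((1.0, 1.0), "square"), ((1.2, 0.9), "square"),
              ((0.8, 1.1), "square"), ((2.0, 2.0), "triangle"),
              ((2.1, 1.9), "triangle"), ((5.0, 5.0), "triangle"),
              ((6.0, 6.0), "triangle")]
query = (1.0, 1.0)

# Sort by Euclidean distance and vote among the K nearest.
k = 5
nearest = sorted(candidates, key=lambda c: dist(query, c[0]))[:k]
label = Counter(cls for _, cls in nearest).most_common(1)[0][0]
print(label)  # square (3 squares vs. 2 triangles among the 5 nearest)
```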

KNN

Advantages

Simplicity of implementation; it is good at dealing with numeric attributes.

Does not build a model; it just imports the dataset, with very low computational overhead.

Does not need to compute a useful attribute subset. Compared with naïve Bayes, we do not need to worry about a lack of available probability data.

Implementation of KNN Algorithm

Algorithm: KNN. Assign a classification label from training data to an unlabeled tuple.

Input: K, the number of neighbors, and a dataset that includes the training data.
Output: A string that indicates the unknown tuple’s classification.

Method:
(1) Create a distance array whose size is K.
(2) Initialize the array with the distances between the unlabeled tuple and the first K records in the dataset.
(3) Let i = K + 1.
(4) Calculate the distance between the unlabeled tuple and the i-th record in the dataset; if that distance is smaller than the biggest distance in the array, replace the old maximum distance with the new one; i = i + 1.
(5) Repeat step (4) until i is greater than the dataset size (150).
(6) Count the classes in the array; the class with the biggest count is the mining result.
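The steps above might be sketched in Python as follows. This is an illustrative reimplementation, not the project's actual code; the function name and the tiny training set are assumptions.

```python
from collections import Counter
from math import dist

def knn_classify(k, dataset, query):
    """KNN along the lines of the slides' method: seed a k-slot list with the
    first k training records, then scan the rest, replacing the current
    farthest kept neighbor whenever a closer record is found."""
    # Each slot holds (distance, class label).
    nearest = [(dist(query, feats), label) for feats, label in dataset[:k]]
    for feats, label in dataset[k:]:
        d = dist(query, feats)
        worst = max(range(k), key=lambda i: nearest[i][0])
        if d < nearest[worst][0]:  # closer than the current farthest
            nearest[worst] = (d, label)
    # Majority vote over the k kept neighbors.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Tiny hypothetical training set in the Iris attribute format.
train = [([5.1, 3.5, 1.4, 0.2], "Iris-setosa"),
         ([4.9, 3.0, 1.4, 0.2], "Iris-setosa"),
         ([7.0, 3.2, 4.7, 1.4], "Iris-versicolor"),
         ([6.4, 3.2, 4.5, 1.5], "Iris-versicolor"),
         ([6.3, 3.3, 6.0, 2.5], "Iris-virginica")]
print(knn_classify(3, train, [5.0, 3.4, 1.5, 0.2]))  # Iris-setosa
```

Keeping only a k-slot array, as the slides describe, avoids sorting all 150 distances; each new record is compared against the current maximum only.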

Implementation of KNN

UML

Testing

Testing (K=7, total 150 tuples)

Testing (K=7, 60% of data as training data)

Testing

Input random distribution dataset

Random dataset

Accuracy test: accuracy = (number of correctly classified test tuples) / n, where n is the number of test tuples.
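A minimal sketch of such an accuracy test, assuming records are (features, label) pairs and a simple KNN classifier; the names are assumptions, and a toy separable dataset stands in for the 150 Iris rows:

```python
import random
from collections import Counter
from math import dist

def knn_classify(k, train, query):
    # Majority vote among the k nearest training records (Euclidean distance).
    nearest = sorted(train, key=lambda r: dist(query, r[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def knn_accuracy(k, records, train_fraction=0.6, seed=0):
    """Shuffle, split into training and test portions, and score: accuracy is
    the number of correctly classified test tuples divided by the test size n."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)  # random distribution of classes across the split
    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]
    correct = sum(1 for feats, label in test
                  if knn_classify(k, train, feats) == label)
    return correct / len(test)

# Toy demo with two well-separated classes (a real run would use the Iris rows).
data = [([float(i), 0.0], "A") for i in range(10)] + \
       [([float(i) + 100.0, 0.0], "B") for i in range(10)]
print(knn_accuracy(3, data))  # 1.0 on this separable toy data
```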

Performance

Comparison

Decision tree

Advantages
• Comprehensibility.
• Constructs a decision tree without any domain knowledge.
• Handles high-dimensional data.
• By eliminating unrelated attributes and pruning the tree, it simplifies the classification calculation.

Disadvantages
• Requires good-quality training data.
• Usually runs in memory.
• Not good at handling continuous number features.

Naïve Bayesian

Advantages
• Relatively simple.
• Classifies by simply calculating attribute frequencies from the training data, without any other operations (e.g. sort, search).

Disadvantages
• The assumption of attribute independence often does not hold.
• There may be no available probability data to calculate probabilities.

Conclusion

KNN is a simple algorithm with high classification accuracy for datasets with continuous attributes.

It shows high performance with balanced distribution training data as input.

Thanks! Questions?
