BOĞAZİÇİ UNIVERSITY

DEPARTMENT OF MANAGEMENT INFORMATION SYSTEMS

MATLAB AS A DATA MINING ENVIRONMENT

Problem

• MATLAB is primarily a command-line environment

• No single tool brings many data mining functions together

• The command line is hard for novice users to work with

• Need for developing interfaces for data mining functions

Solution

• Designing a data mining environment within MATLAB that combines many data mining functionalities

• Using MATLAB's GUI Development Environment (GUIDE) for interface design

The Project

• This study continues last year's student project “Developing Data Mining Platform”

• In this project, the data mining functions previously added to MATLAB are wrapped in graphical user interfaces, so that all of them can be used from the interface

Methodology

• CRISP-DM Methodology

• Data Understanding

• Data Preparation

• Modeling

• Evaluation

MATLAB Environment

• High-level language for technical computing

• Development environment for managing code, files, and data

• Interactive tools for iterative exploration, design, and problem solving

• Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration

• 2-D and 3-D graphics functions for visualizing data

• Tools for building custom graphical user interfaces

• Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel

Menu Structures

• File

• Read

  • Read_From_File

  • Read_From_ODBC

• Save

• Save As...

• Exit

Menu Structures - File

• Read_From_File

  • Retrieves data from text files and writes it to the tool's spreadsheet format

• Read_From_ODBC

  • Retrieves data from a data source via an ODBC driver and writes it to the tool's spreadsheet format
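The Read_From_File step can be illustrated with a minimal sketch. It is written in Python with the standard library only, not in the tool's MATLAB code, and it assumes a comma-delimited file with a header row. The names `read_delimited` and `_to_number` are hypothetical and do not come from the project. The ODBC path is not shown.

```python
import csv
import io


def _to_number(field):
    """Convert a text field to float when possible; empty fields become None."""
    field = field.strip()
    if field == "":
        return None  # treated as a missing value
    try:
        return float(field)
    except ValueError:
        return field  # keep non-numeric text as-is


def read_delimited(text, delimiter=","):
    """Parse delimited text into (column_names, rows), a simple table layout."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        rows.append([_to_number(field) for field in record])
    return header, rows


# Hypothetical sample data, used only for illustration.
sample = "age,income,city\n34,4200,Istanbul\n51,,Ankara\n"
columns, table = read_delimited(sample)
```

Here the empty income field in the second record is kept as `None`, so a later missing-value step can find it.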

Menu Structures

• Data

• Run MATLAB Command

• Create List

• Add List

• Remove List

• Set Meta

• Add Meta

• List Meta

• Descriptives

Menu Structures - Data

• Run MATLAB Command

  • Works like the MATLAB Command Window

• Create List

  • Creates variable lists for the data mining functionalities

• Add List

  • Adds new variable names to a variable list and merges lists

• Remove List

  • Removes variable names from a variable list
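The three list operations can be sketched as follows. This is a Python illustration under the assumption that a variable list is an ordered collection of unique column names. The slides do not describe the tool's actual data structure, and the function names are hypothetical.

```python
def create_list(names):
    """Create a variable list: names in first-seen order, duplicates dropped."""
    return list(dict.fromkeys(names))


def add_to_list(var_list, *others):
    """Add single names or merge whole lists into a copy of var_list."""
    merged = list(var_list)
    for other in others:
        items = [other] if isinstance(other, str) else other
        for name in items:
            if name not in merged:
                merged.append(name)
    return merged


def remove_from_list(var_list, names):
    """Return a copy of var_list without the given names."""
    drop = set(names)
    return [name for name in var_list if name not in drop]


# Hypothetical column names, used only for illustration.
inputs = create_list(["age", "income", "age"])
inputs = add_to_list(inputs, "city", ["income", "tenure"])
inputs = remove_from_list(inputs, ["city"])
# inputs is now ["age", "income", "tenure"]
```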

Menu Structures - Data

• Set Meta

  • Sets the metadata value of a variable

• Add Meta

  • Adds new values to the metadata of a variable

• List Meta

  • Shows the values and their metadata values

• Descriptives

  • Displays statistics of the selected variable
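A Descriptives-style summary for one variable might look like the sketch below. It is Python, not the tool's MATLAB code. The slides do not say which statistics the tool reports, so the set chosen here (count, missing, mean, median, standard deviation, minimum, maximum) is an assumption, as is the use of `None` for missing values.

```python
import statistics


def descriptives(values):
    """Summarize one numeric variable, ignoring missing (None) entries.

    Needs at least two non-missing values, because the sample standard
    deviation is undefined for fewer.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "std": statistics.stdev(present),  # sample standard deviation (n - 1)
        "min": min(present),
        "max": max(present),
    }


# Hypothetical data, used only for illustration.
income = [4200.0, None, 3100.0, 5300.0, 3800.0]
summary = descriptives(income)
```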

Menu Structures

• Preparation

• Missing_Value

• Sampling

• Transformation

• Discretization

Menu Structures - Preparation

• Missing_Value

  • Replaces the missing values of variables, or removes rows according to the number of missing values in the row

• Sampling

  • Selects samples from the specified data set using the selected sampling method

• Transformation

  • Transforms the columns into specified ranges

• Discretization

  • Transforms the data into discrete values according to given intervals
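The four preparation steps can be sketched in a few lines each. This is a Python illustration, not the tool's MATLAB code, and several choices in it are assumptions the slides leave open: mean replacement for missing values, simple random sampling without replacement, min-max scaling for the transformation, and interval edges given as a sorted list. All function names are hypothetical.

```python
import bisect
import random


def drop_sparse_rows(rows, max_missing):
    """Remove rows that have more than max_missing missing (None) fields."""
    return [r for r in rows if sum(v is None for v in r) <= max_missing]


def fill_missing_with_mean(column):
    """Replace None entries with the mean of the non-missing values."""
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]


def simple_random_sample(rows, n, seed=0):
    """Draw n rows without replacement; the seed makes the draw repeatable."""
    return random.Random(seed).sample(rows, n)


def rescale(column, low=0.0, high=1.0):
    """Min-max transformation of a column into the range [low, high]."""
    cmin, cmax = min(column), max(column)
    if cmax == cmin:
        return [low for _ in column]  # constant column: avoid dividing by zero
    scale = (high - low) / (cmax - cmin)
    return [low + (v - cmin) * scale for v in column]


def discretize(column, edges):
    """Map each value to the index of its interval; edges must be sorted."""
    return [bisect.bisect_right(edges, v) for v in column]


# Hypothetical data: columns are age and income.
rows = [[20.0, 1000.0], [None, None], [40.0, None], [60.0, 3000.0]]
kept = drop_sparse_rows(rows, max_missing=1)
ages = [r[0] for r in kept]
incomes = fill_missing_with_mean([r[1] for r in kept])
scaled_ages = rescale(ages)
age_bins = discretize(ages, edges=[30.0, 50.0])
subset = simple_random_sample(kept, 2)
```

With edges `[30.0, 50.0]`, values below 30 fall in interval 0, values from 30 up to but not including 50 in interval 1, and values of 50 or more in interval 2.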

Menu Structures

• Functionality

• Association

• Classification

• Clustering

• Regression

Menu Structures - Functionality

• Association

  • Extracts association rules from the specified data set

• Classification

  • Trains a neural network, computes its errors, and returns the trained network and the errors in a structure

  • Supports cross-validation and bootstrap techniques

• Clustering

  • Performs k-means clustering and reports the distances between clusters and the cluster sizes

• Regression

  • Applies multiple linear regression

  • Computes the beta values and errors and returns both in a structure

  • Supports cross-validation and bootstrap techniques
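The Regression entry, with its cross-validation option, can be illustrated by the sketch below. It is plain Python, not the tool's MATLAB code, and it rests on assumptions: betas come from the normal equations, folds are assigned by interleaving row indices rather than at random, and the returned dictionary stands in for the "structure" the slides mention. All names are hypothetical, and bootstrap is not shown.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(xs, ys):
    """Least-squares betas (intercept first) from the normal equations X'X b = X'y."""
    design = [[1.0] + list(x) for x in xs]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(betas, x):
    """Evaluate the fitted linear model at one input row."""
    return betas[0] + sum(b * v for b, v in zip(betas[1:], x))


def cross_validate(xs, ys, k):
    """k-fold cross-validation; returns betas fit on all data plus per-fold errors."""
    n = len(xs)
    fold_mse = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # interleaved folds (an assumption)
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        betas = fit(train_x, train_y)
        errors = [(ys[i] - predict(betas, xs[i])) ** 2 for i in sorted(test_idx)]
        fold_mse.append(sum(errors) / len(errors))
    return {"betas": fit(xs, ys), "fold_mse": fold_mse, "cv_mse": sum(fold_mse) / k}


# Hypothetical noise-free data generated from y = 1 + 2*a - 0.5*b.
xs = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0), (6.0, 5.0)]
ys = [1.0 + 2.0 * a - 0.5 * b for a, b in xs]
result = cross_validate(xs, ys, k=3)
```

Because the sample data is exactly linear, the recovered betas are close to 1, 2 and -0.5, and the cross-validated error is close to zero. Real data would give a nonzero error that can be compared between models.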

• DEMO

Conclusion

• User interfaces were designed for the data mining functions.

• Unlike the previous work, this study also covers some pre-processing functions and data models such as association.

• It gives the data mining functions a visual front end and increases user flexibility by embedding different data mining functions and models in the tool.

Recommendations

• The association tool can be embedded in the tool with some modification, and the set of data models and data mining functions can be extended.

• The reporting capabilities of the tool can be improved, and its functions and reports can be served over the internet using web services.

Thank you...