
Improving quality of graduate students by data mining

Asst. Prof. Kitsana Waiyamai, Ph.D.

Dept. of Computer Engineering, Faculty of Engineering, Kasetsart University

Bangkok, Thailand


Content

PART I
- Introduction to data mining
- Data mining technique: association rule discovery
- Data mining technique: data classification

PART II
- Improving quality of graduate students by data mining

Conclusion


What Is Data Mining?

Knowledge Discovery from Data (KDD), also called data mining:

The process of nontrivial extraction of patterns from data. Patterns that are:

• implicit,

• previously unknown, and

• potentially useful

Patterns must be comprehensible for human users.


Knowledge Discovery Process: Iterative & Interactive Process

- Mining objective
- Data sources: databases, flat files, complex data, data warehouses
- Preprocessing data: gathering, cleaning and selecting data
- Search for patterns (data mining): neural nets, machine learning, statistics and others
- Analyst reviews output
- Interpret results
- Report findings
- Take actions based on findings


What kind of data can be mined?

- Relational databases
- Data warehouses
- Transactional databases and flat files
- Advanced DB systems and information repositories:
  - Object-oriented and object-relational databases
  - Spatial databases
  - Time-series data and temporal data
  - Text databases, multimedia databases
  - Heterogeneous and legacy databases
  - World Wide Web
  - Bioinformatic data


Two modes of data mining

Predictive data mining
- Predict behavior based on historic data
- Use data with known results to build a model that can later be used to explicitly predict values for different data
- Methods: classification, prediction, etc.

Descriptive data mining
- Describe patterns in existing data that may be used to guide decisions
- Methods: association rule discovery, sequence pattern discovery, clustering, etc.


Data Mining Techniques

- Data clustering
- Association rule discovery
- Data classification
- Outlier detection
- Data regression
- Etc.


Association Rule Discovery

Customer transaction database:

ID  Items Bought
1   Juice, Coke, Beer
2   Juice, Coke, Wine
3   Juice, Gin
4   Coke, Beer
5   Juice, Coke, Beer, Wine
6   Gin

Items and the transactions in which they occur:

Item   Transactions the Item Occurs In
Juice  1, 2, 3, 5
Coke   1, 2, 4, 5
Beer   1, 4, 5
Wine   2, 5
Gin    3, 6

Association rules with a minimum support of 50% (support >= 50%):

Rule           Supporting Transactions  Confidence
Juice => Coke  1, 2, 5                  75%
Coke => Juice  1, 2, 5                  75%
Coke => Beer   1, 4, 5                  75%
Beer => Coke   1, 4, 5                  100%
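The support and confidence figures on this slide can be recomputed with a short Python sketch. The transactions are the six from the slide. The function names are illustrative assumptions, and the sketch only enumerates single-item rules, which is all the slide shows. Support is the share of all transactions that contain both items; confidence is the share of transactions containing the left-hand item that also contain the right-hand item.

```python
from itertools import permutations

# The six customer transactions from the slide.
transactions = {
    1: {"Juice", "Coke", "Beer"},
    2: {"Juice", "Coke", "Wine"},
    3: {"Juice", "Gin"},
    4: {"Coke", "Beer"},
    5: {"Juice", "Coke", "Beer", "Wine"},
    6: {"Gin"},
}


def supporting(items):
    """Return the sorted IDs of transactions that contain every item in `items`."""
    return sorted(tid for tid, basket in transactions.items() if items <= basket)


def rule_stats(lhs, rhs):
    """Return (supporting transactions, support, confidence) for lhs => rhs."""
    both = supporting({lhs, rhs})
    support = len(both) / len(transactions)
    confidence = len(both) / len(supporting({lhs}))
    return both, support, confidence


def find_rules(min_support=0.5):
    """Enumerate single-item rules whose support reaches min_support."""
    items = sorted(set().union(*transactions.values()))
    rules = {}
    for lhs, rhs in permutations(items, 2):
        both, support, confidence = rule_stats(lhs, rhs)
        if support >= min_support:
            rules[(lhs, rhs)] = (both, support, confidence)
    return rules


rules = find_rules()
for (lhs, rhs), (tids, support, confidence) in rules.items():
    print(f"{lhs} => {rhs}  transactions={tids}  "
          f"support={support:.0%}  confidence={confidence:.0%}")
```

Each of the four rules is supported by exactly 3 of the 6 transactions, so the threshold has to be read as "at least 50%" for the rules to qualify.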


Data Classification

Classification is the process of assigning new objects to predefined categories or classes.

- Given a set of labeled records
- Build a model
- Predict labels for future unlabeled records

Example: Age, educational background, annual income, current debts, housing location => making a decision

Degree=“Master” and Income=7500 => Credit=“Excellent”


Three-Step Process of Classification

1. Model construction: Training Data -> Classifier Model
2. Model evaluation: Testing Data -> Classifier Model
3. Classification: Unseen Data -> Classifier Model
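The three steps can be sketched in plain Python. Everything in this sketch is an illustrative assumption: the records, the income bands, and the very simple learner, which keeps one rule per value of the single most accurate attribute. It is not the classifier used in the talk; it only shows how training data, testing data, and unseen data play different roles.

```python
from collections import Counter, defaultdict

# Hypothetical labeled records: (features, credit rating).
training_data = [
    ({"degree": "Master", "income": "high"}, "Excellent"),
    ({"degree": "Bachelor", "income": "high"}, "Excellent"),
    ({"degree": "Master", "income": "medium"}, "Excellent"),
    ({"degree": "Bachelor", "income": "medium"}, "Excellent"),
    ({"degree": "Master", "income": "low"}, "Fair"),
    ({"degree": "Bachelor", "income": "low"}, "Fair"),
]
testing_data = [
    ({"degree": "Master", "income": "high"}, "Excellent"),
    ({"degree": "Bachelor", "income": "low"}, "Fair"),
    ({"degree": "Bachelor", "income": "medium"}, "Excellent"),
    ({"degree": "Master", "income": "low"}, "Fair"),
]


def build_model(records):
    """Step 1, model construction: pick the attribute whose per-value
    majority label classifies the training records best."""
    default = Counter(label for _, label in records).most_common(1)[0][0]
    best = None
    for attr in records[0][0]:
        counts = defaultdict(Counter)
        for features, label in records:
            counts[features[attr]][label] += 1
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        correct = sum(rules[f[attr]] == label for f, label in records)
        if best is None or correct > best[0]:
            best = (correct, attr, rules)
    _, attr, rules = best
    return {"attribute": attr, "rules": rules, "default": default}


def classify(model, features):
    """Step 3, classification: apply the learned rules to one record."""
    value = features[model["attribute"]]
    return model["rules"].get(value, model["default"])


def evaluate(model, records):
    """Step 2, model evaluation: accuracy on records held out from training."""
    hits = sum(classify(model, f) == label for f, label in records)
    return hits / len(records)


model = build_model(training_data)
accuracy = evaluate(model, testing_data)
prediction = classify(model, {"degree": "Master", "income": "high"})
print(model["attribute"], accuracy, prediction)
```

Keeping the testing data separate from the training data is what makes the accuracy in step 2 an estimate of how the model will behave on unseen records.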


Data Mining Tools

- ANGOSS KnowledgeStudio
- IBM Intelligent Miner
- Megaputer PolyAnalyst
- SAS Enterprise Miner
- SGI Mineset
- SPSS Clementine
- Many others

More at http://www.kdnuggets.com/software


Data Mining Projects

Checklist:
- Start with well-defined questions
- Define measures of success and failure

Main difficulty: no automation
- Understanding the problem
- Data preparation
- Selection of the right mining methods
- Interpretation


Using Data Mining for Improving Quality of Engineering Graduates

Objective: Discover knowledge from large databases of engineering student records.

Discovered knowledge is useful for:

- Assisting in the development of new curricula
- Improving existing curricula
- Helping students select the appropriate major


Using a data mining technique to help students select their majors

Motivation:
- Major selection is a very important factor in a student's success.
- Students lack experience with, and information about, each major.

Solution:
- Find the profiles of good students for each major using the student profile database and the course enrollment student databases (10 years of data).
- Determine the most appropriate major for each student.


A Data Mining based Approach for Improving Quality of Engineering Graduates

[Architecture diagram. Components: student profile database and course enrollment student databases (on DB2 and SQL Server), Data Mining Tool, Java Servlet, User.]


Data for Data Mining

Student profile database:

Stu_code  Sex   Address  Sch_GPA  .....  GPA
37058063  male  Bangkok  2.5      .....  2.3
37058167  male  Songkla  3.4      .....  3.2
........  ....  .......  ......   ....   ....

Course enrollment student databases:

Stu_code  Sub_code  Term  Year  Grade
37058063  204111    1     2537  C+
37058063  403111    1     2537  D
37058063  208111    1     2537  B+


Data preparation for a classification model

Student profile database:

Stu_code  Sex   Address  Sch_GPA  .....  GPA
37058063  male  Bangkok  2.5      .....  2.3
37058167  male  Songkla  3.4      .....  3.2
........  ....  .......  ......   ....   ....

+

Course enrollment student databases:

Stu_code  Sub_code  Term  Year  Grade
37058063  204111    1     2537  C+
37058063  403111    1     2537  D
37058063  208111    1     2537  B+

The two sources are combined into one row per student, with each course grade replaced by a level (Low, Medium, High):

Stu_code  Sex   204111  403111  …      GPA
37058063  male  Medium  Low     ....   2.3
37058167  male  High    High    .....  3.2
.......   ....  ......  ......  .....  ......
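The preparation step above can be sketched in Python as a join plus a grade discretization. Only two mappings are visible on the slide (C+ becomes Medium, D becomes Low); the rest of the `GRADE_LEVEL` table, including B+ as High, is an assumption made for illustration, as is the function name. The slide does not show enrollment rows for student 37058167, so that row keeps only its profile columns here.

```python
# Assumed grade-to-level mapping. Only "C+" -> "Medium" and "D" -> "Low"
# are confirmed by the slide; the other entries are illustrative guesses.
GRADE_LEVEL = {
    "A": "High", "B+": "High", "B": "High",
    "C+": "Medium", "C": "Medium",
    "D+": "Low", "D": "Low", "F": "Low",
}

# Rows copied from the slide (elided columns are left out).
profiles = [
    {"Stu_code": "37058063", "Sex": "male", "Address": "Bangkok",
     "Sch_GPA": 2.5, "GPA": 2.3},
    {"Stu_code": "37058167", "Sex": "male", "Address": "Songkla",
     "Sch_GPA": 3.4, "GPA": 3.2},
]
enrollments = [
    # (Stu_code, Sub_code, Term, Year, Grade)
    ("37058063", "204111", 1, 2537, "C+"),
    ("37058063", "403111", 1, 2537, "D"),
    ("37058063", "208111", 1, 2537, "B+"),
]


def build_training_rows(profiles, enrollments):
    """Return one row per student: profile columns plus one column per
    course holding the discretized grade level."""
    rows = {p["Stu_code"]: dict(p) for p in profiles}
    for stu_code, sub_code, _term, _year, grade in enrollments:
        if stu_code in rows:
            rows[stu_code][sub_code] = GRADE_LEVEL[grade]
    return list(rows.values())


rows = build_training_rows(profiles, enrollments)
for row in rows:
    print(row)
```

Turning each course into its own column gives the classifier a fixed set of attributes per student, which is the shape a decision tree learner expects.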


Global Classification Model

A global decision tree determines which major is appropriate for each student.

Each internal node represents a test on the student’s profile.

Each leaf node represents an appropriate major to be selected.
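A decision tree of this shape can be represented and traversed with a few lines of Python. The tree below is entirely hypothetical: the tested course codes are borrowed from the data slides, while the branch structure and the major names (Computer, Electrical, Civil) are invented for illustration and do not come from the talk.

```python
# Hypothetical global tree: internal nodes test one profile attribute,
# leaf nodes name the major to select.
TREE = {
    "test": "204111",
    "branches": {
        "High": {"major": "Computer"},
        "Medium": {
            "test": "403111",
            "branches": {
                "High": {"major": "Electrical"},
                "Medium": {"major": "Civil"},
                "Low": {"major": "Civil"},
            },
        },
        "Low": {"major": "Civil"},
    },
}


def select_major(node, row):
    """Follow the tests from the root until a leaf node is reached."""
    while "major" not in node:
        node = node["branches"][row[node["test"]]]
    return node["major"]


student = {"204111": "Medium", "403111": "High"}
print(select_major(TREE, student))
```

Because every path ends in exactly one leaf, a tree like this can only ever propose a single major per student.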


Drawbacks of Global Classification Model

- Low precision (~50%) due to the large number of majors.
- The number of students differs from department to department, so the model cannot correctly predict the best major to select.
- The model proposes a single major; a set of possible majors ordered by appropriateness score would be preferable.


Classification Model for Each Major

- A decision tree predicts whether a student is likely to be a good student in a given major.

- Good students are those who graduate within 4 years and rank in the top 40% of a given major.

- Leaf nodes represent two classes: Good and Bad.
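The Good/Bad definition can be turned into a labeling function for one major's training data. This is a sketch under stated assumptions: the slide does not say what the ranking is based on, so ranking by GPA is assumed here, as is rounding the 40% cutoff up. The student records are hypothetical.

```python
import math


def label_students(students, max_years=4, top_fraction=0.40):
    """Label each student of one major as "Good" or "Bad".

    Good = graduated within max_years AND ranked in the top top_fraction.
    Ranking by GPA and rounding the cutoff up are assumptions.
    """
    ranked = sorted(students, key=lambda s: s["gpa"], reverse=True)
    cutoff = math.ceil(len(ranked) * top_fraction)
    top_ids = {s["id"] for s in ranked[:cutoff]}
    return {
        s["id"]: "Good" if s["id"] in top_ids and s["years"] <= max_years else "Bad"
        for s in students
    }


# Hypothetical graduates of a single major.
students = [
    {"id": "s1", "gpa": 3.8, "years": 4},
    {"id": "s2", "gpa": 3.5, "years": 5},
    {"id": "s3", "gpa": 3.0, "years": 4},
    {"id": "s4", "gpa": 2.5, "years": 4},
    {"id": "s5", "gpa": 2.1, "years": 6},
]
labels = label_students(students)
print(labels)
```

Here s2 ranks in the top 40% but took 5 years, so it is labeled Bad: both conditions must hold.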


Advantages of the Major’s Classification Model

- Good precision (80%).
- The model predicts the best major to select even if the number of students differs from major to major.
- It proposes a set of possible majors, ordered by appropriateness score.
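Ranking majors by score can be sketched as follows. The slides do not define how the appropriateness score is computed, so this sketch assumes each per-major model returns an estimated chance of being a Good student in that major. The three models, their scores, and the major names are hypothetical stand-ins for trained decision trees.

```python
# Hypothetical per-major models: each maps a prepared student row to an
# assumed appropriateness score between 0 and 1.
def computer_model(row):
    return 0.85 if row.get("204111") == "High" else 0.30


def electrical_model(row):
    return 0.70 if row.get("403111") == "High" else 0.40


def civil_model(row):
    return 0.60 if row.get("208111") in ("High", "Medium") else 0.20


MAJOR_MODELS = {
    "Computer": computer_model,
    "Electrical": electrical_model,
    "Civil": civil_model,
}


def recommend_majors(row, models=MAJOR_MODELS):
    """Score the student with every per-major model, best score first."""
    scores = {major: model(row) for major, model in models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


student = {"204111": "High", "403111": "Low", "208111": "Medium"}
ranking = recommend_majors(student)
for major, score in ranking:
    print(f"{major}: {score:.2f}")
```

Because each major has its own binary model trained on its own students, a major with few students is judged on its own data rather than competing with larger departments inside one tree.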

Encountered problems

- Database size
- Other factors that could affect a student’s decision: Teacher Preference, etc.


Presentation of Discovered Knowledge


Applying association rule discovery for grade prediction

Basket analysis applied to education, with course grade levels as the items:

- 204111 = Medium
- 403111 = High
- 417167 = Medium
- 417168 = Medium
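Treating each student as a basket of (course, grade level) items, a grade for a coming course can be predicted from the strongest matching rule. The history of past students, the support threshold, and the function name below are hypothetical; the sketch only considers rules with a single item on the left-hand side, and the talk's actual mining procedure is not shown on the slides.

```python
from collections import Counter

# Hypothetical baskets of past students: course code -> grade level.
history = [
    {"204111": "Medium", "403111": "High", "417167": "Medium"},
    {"204111": "Medium", "403111": "High", "417167": "Medium"},
    {"204111": "Medium", "403111": "Low", "417167": "Medium"},
    {"204111": "High", "403111": "High", "417167": "High"},
    {"204111": "Low", "403111": "Low", "417167": "Low"},
]


def best_rule(history, known, target, min_support=0.3):
    """Find the highest-confidence rule (course=level) => (target=level)
    that matches the student's known grades and reaches min_support.

    Returns (lhs item, predicted level, confidence, support) or None.
    """
    total = len(history)
    best = None
    for course, level in known.items():
        matching = [h for h in history if h.get(course) == level]
        outcomes = Counter(h[target] for h in matching if target in h)
        for target_level, count in outcomes.items():
            support = count / total
            confidence = count / len(matching)
            if support >= min_support and (best is None or confidence > best[2]):
                best = ((course, level), target_level, confidence, support)
    return best


known_grades = {"204111": "Medium", "403111": "High"}
rule = best_rule(history, known_grades, target="417167")
print(rule)
```

In this made-up history, every student with 204111 = Medium also got Medium in 417167, so that rule wins with 100% confidence and 60% support.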


Grade Prediction for the Coming Term


Presentation of Discovered Knowledge


Conclusion & Future Work

Application of data mining in Education

Use data mining techniques for improving quality of engineering students

Apply data mining techniques to several other educational domains.