1
Improving quality of graduate students by data mining
Asst. Prof. Kitsana Waiyamai, Ph.D.
Dept. of Computer Engineering, Faculty of Engineering, Kasetsart University
Bangkok, Thailand
2
Content
PART I
- Introduction to data mining
- Data mining technique: association rule discovery
- Data mining technique: data classification
PART II
- Improving quality of graduate students by data mining
Conclusion
3
What Is Data Mining?
Knowledge Discovery from Data (KDD), also called data mining:
The process of nontrivial extraction of patterns from data. The patterns are:
•implicit,
•previously unknown, and
•potentially useful
Patterns must be comprehensible for human users.
4
Knowledge Discovery Process: Iterative & Interactive Process
- Mining objective
- Data sources: databases, flat files, complex data, data warehouses
- Preprocessing data: gathering, cleaning and selecting data
- Search for patterns (data mining): neural nets, machine learning, statistics and others
- Analyst reviews output
- Interpret results
- Report findings
- Take actions based on findings
5
What kind of data can be mined?
- Relational databases
- Data warehouses
- Transactional databases and flat files
- Advanced DB systems and information repositories:
  - Object-oriented and object-relational databases
  - Spatial databases
  - Time-series data and temporal data
  - Text databases, multimedia databases
  - Heterogeneous and legacy databases
  - World Wide Web
  - Bioinformatic data
6
Two modes of data mining
Predictive data mining
- Predict behavior based on historic data
- Use data with known results to build a model that can later be used to explicitly predict values for different data
- Methods: classification, prediction, etc.
Descriptive data mining
- Describe patterns in existing data that may be used to guide decisions
- Methods: association rule discovery, sequence pattern discovery, clustering, etc.
7
Data Mining Techniques
- Data clustering
- Association rule discovery
- Data classification
- Outlier detection
- Data regression
- Etc.
8
Association Rule Discovery

Customer transaction database
ID  Items bought
1   Juice, Coke, Beer
2   Juice, Coke, Wine
3   Juice, Gin
4   Coke, Beer
5   Juice, Coke, Beer, Wine
6   Gin

Items and the transactions in which they occur
Item   Transactions the item occurs in
Juice  1, 2, 3, 5
Coke   1, 2, 4, 5
Beer   1, 4, 5
Wine   2, 5
Gin    3, 6

Association rules with a minimum support of 50% (at least 3 of the 6 transactions)
Rule           Supporting transactions  Confidence
Juice => Coke  1, 2, 5                  75%
Coke => Juice  1, 2, 5                  75%
Coke => Beer   1, 4, 5                  75%
Beer => Coke   1, 4, 5                  100%
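The support and confidence figures for this example can be recomputed with a short script. This is a minimal sketch, not the tool used in the talk: it only considers rules with one item on each side, and the function names (supporting, mine_pair_rules) are made up for illustration. Support of a rule is the fraction of all transactions that contain both items; confidence is the fraction of the transactions containing the left-hand item that also contain the right-hand item.

```python
from itertools import permutations

# Transactions copied from the slide: ID -> items bought
transactions = {
    1: {"Juice", "Coke", "Beer"},
    2: {"Juice", "Coke", "Wine"},
    3: {"Juice", "Gin"},
    4: {"Coke", "Beer"},
    5: {"Juice", "Coke", "Beer", "Wine"},
    6: {"Gin"},
}

def supporting(itemset):
    """Return the sorted IDs of transactions that contain every item in itemset."""
    return sorted(tid for tid, items in transactions.items() if itemset <= items)

def mine_pair_rules(min_support=0.5):
    """Return {(lhs, rhs): (supporting_ids, confidence)} for single-item rules."""
    items = sorted(set().union(*transactions.values()))
    rules = {}
    for lhs, rhs in permutations(items, 2):
        both = supporting({lhs, rhs})
        support = len(both) / len(transactions)
        if support >= min_support:
            # Confidence: share of the lhs transactions that also contain rhs
            confidence = len(both) / len(supporting({lhs}))
            rules[(lhs, rhs)] = (both, confidence)
    return rules

rules = mine_pair_rules()
for (lhs, rhs), (ids, conf) in sorted(rules.items()):
    print(f"{lhs} => {rhs}  transactions {ids}  confidence {conf:.0%}")
```

With a 50% minimum support, only the Juice/Coke and Coke/Beer pairs qualify, which gives the four rules listed on the slide.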
9
Data Classification
Classification is the process of assigning new objects to predefined categories or classes:
- Given a set of labeled records
- Build a model
- Predict labels for future unlabeled records
Example: Age, educational background, annual income, current debts, housing location => making a decision
Degree=“Master” and Income=7500 => Credit=“Excellent”
10
Three-Step Process of Classification
1. Model construction: Training Data -> Classifier Model
2. Model evaluation: Testing Data -> Classifier Model
3. Classification: Unseen Data -> Classifier Model
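The three steps can be illustrated with a very small classifier. The sketch below is an assumption-laden illustration, not the method used in the talk: it uses a one-attribute rule learner (OneR) in place of a decision tree, and all records, income bands and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical labeled records: (features, credit rating)
training_data = [
    ({"degree": "Master", "income": "high"}, "Excellent"),
    ({"degree": "Master", "income": "medium"}, "Excellent"),
    ({"degree": "Bachelor", "income": "high"}, "Excellent"),
    ({"degree": "Bachelor", "income": "medium"}, "Fair"),
    ({"degree": "Bachelor", "income": "low"}, "Fair"),
    ({"degree": "Master", "income": "low"}, "Fair"),
    ({"degree": "Bachelor", "income": "medium"}, "Fair"),
]
testing_data = [
    ({"degree": "Master", "income": "high"}, "Excellent"),
    ({"degree": "Bachelor", "income": "low"}, "Fair"),
    ({"degree": "Master", "income": "medium"}, "Excellent"),
    ({"degree": "Bachelor", "income": "medium"}, "Fair"),
]

def build_model(records):
    """Step 1, model construction: pick the single attribute whose
    value -> majority-class rule makes the fewest training errors."""
    best = None
    for attr in records[0][0]:
        counts = defaultdict(Counter)
        for features, label in records:
            counts[features[attr]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(label != rule[features[attr]] for features, label in records)
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    # Fallback class for attribute values never seen during training
    default = Counter(label for _, label in records).most_common(1)[0][0]
    _, attr, rule = best
    return attr, rule, default

def classify(model, features):
    """Step 3, classification: predict the label of one record."""
    attr, rule, default = model
    return rule.get(features[attr], default)

def evaluate(model, records):
    """Step 2, model evaluation: accuracy on held-out labeled records."""
    correct = sum(classify(model, f) == label for f, label in records)
    return correct / len(records)

model = build_model(training_data)
accuracy = evaluate(model, testing_data)
prediction = classify(model, {"degree": "Master", "income": "high"})
print(model[0], accuracy, prediction)
```

Keeping the testing data separate from the training data is what makes the accuracy in step 2 an estimate of how the model behaves on unseen records.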
11
Data Mining Tools
- ANGOSS KnowledgeStudio
- IBM Intelligent Miner
- Megaputer PolyAnalyst
- SAS Enterprise Miner
- SGI MineSet
- SPSS Clementine
- Many others
More at http://www.kdnuggets.com/software
12
Data Mining Projects
Checklist:
- Start with well-defined questions
- Define measures of success and failure
Main difficulty: no automation
- Understanding the problem
- Data preparation
- Selection of the right mining methods
- Interpretation
13
Using Data Mining for Improving Quality of Engineering Graduates
Objective: Discover knowledge from large databases of engineering student records.
Discovered knowledge is useful in:
- Assisting in the development of new curricula
- Improving existing curricula
- Helping students select the appropriate major
14
Using a data mining technique to help students in selecting their majors
Motivation:
- Major selection is a very important factor in a student's success.
- Students lack experience and information on each major.
Solution:
- Find the profiles of good students for each major using the student profile database and the course enrollment student databases (10 years of data)
- Determine the most appropriate major for each student
15
A Data Mining based Approach for Improving Quality of Engineering Graduates
[Architecture diagram: the student profile database and the course enrollment student databases (DB2, SQL Server) feed the Data Mining Tool, which the User accesses through a Java Servlet]
16
Data for Data Mining

Student profile database
Stu_code  Sex   Address  Sch_GPA  .....  GPA
37058063  male  Bangkok  2.5      .....  2.3
37058167  male  Songkla  3.4      .....  3.2
........  ....  .......  ......   .....  ....

Course enrollment student databases
Stu_code  Sub_code  Term  Year  Grade
37058063  204111    1     2537  C+
37058063  403111    1     2537  D
37058063  208111    1     2537  B+
17
Data preparation for a classification model

Student profile database
Stu_code  Sex   Address  Sch_GPA  .....  GPA
37058063  male  Bangkok  2.5      .....  2.3
37058167  male  Songkla  3.4      .....  3.2
........  ....  .......  ......   .....  ....

+

Course enrollment student databases
Stu_code  Sub_code  Term  Year  Grade
37058063  204111    1     2537  C+
37058063  403111    1     2537  D
37058063  208111    1     2537  B+

Combined table used to build the model (one column per course, grades grouped into levels)
Stu_code  Sex   204111  403111  .....  GPA
37058063  male  Medium  Low     .....  2.3
37058167  male  High    High    .....  3.2
........  ....  ......  ......  .....  ....
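The preparation step can be sketched as a join of the two tables followed by a pivot of the course grades into one column per course. This is an illustration only: the grade-to-level mapping below is an assumption (the slide shows only that C+ becomes Medium and D becomes Low), and the function name build_training_rows is hypothetical.

```python
# Rows copied from the slide's two source tables.
profiles = [
    {"Stu_code": "37058063", "Sex": "male", "Address": "Bangkok", "Sch_GPA": 2.5, "GPA": 2.3},
    {"Stu_code": "37058167", "Sex": "male", "Address": "Songkla", "Sch_GPA": 3.4, "GPA": 3.2},
]
enrollments = [
    {"Stu_code": "37058063", "Sub_code": "204111", "Term": 1, "Year": 2537, "Grade": "C+"},
    {"Stu_code": "37058063", "Sub_code": "403111", "Term": 1, "Year": 2537, "Grade": "D"},
    {"Stu_code": "37058063", "Sub_code": "208111", "Term": 1, "Year": 2537, "Grade": "B+"},
]

# ASSUMED grade bands; only C+ -> Medium and D -> Low appear on the slide.
GRADE_LEVEL = {
    "A": "High", "B+": "High", "B": "High",
    "C+": "Medium", "C": "Medium",
    "D+": "Low", "D": "Low", "F": "Low",
}

def build_training_rows(profiles, enrollments, courses):
    """Join profiles with enrollments: one row per student, one column per course."""
    levels = {}
    for e in enrollments:
        levels.setdefault(e["Stu_code"], {})[e["Sub_code"]] = GRADE_LEVEL[e["Grade"]]
    rows = []
    for p in profiles:
        row = {"Stu_code": p["Stu_code"], "Sex": p["Sex"]}
        for course in courses:
            # None marks a course with no enrollment record for this student
            row[course] = levels.get(p["Stu_code"], {}).get(course)
        row["GPA"] = p["GPA"]
        rows.append(row)
    return rows

rows = build_training_rows(profiles, enrollments, ["204111", "403111"])
for row in rows:
    print(row)
```

The second student comes out with empty course columns here because the slide does not list that student's enrollment rows.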
18
Global Classification Model
A global decision tree determines which majors are appropriate for which students.
- Each internal node represents a test on the student’s profile.
- Each leaf node represents an appropriate major to be selected.
19
Drawbacks of Global Classification Model
- Low precision (~50%) due to the large number of majors
- The number of students differs between departments, so the model cannot correctly predict the best major to be selected.
- The model proposes a single major; a set of possible majors ordered by appropriateness score would be preferred.
20
Classification Model for Each Major
- A decision tree predicts whether a student is likely to be a good student in a given major.
- Good students are those who graduate within 4 years and rank in the top 40% of a given major.
- Leaf nodes represent two classes: Good and Bad
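The Good/Bad labels for one major's training set could be derived as in the sketch below. The slides do not say what the ranking is based on, so ranking by final GPA is an assumption here, as are the field names, the sample students and the function name label_students.

```python
def label_students(students, top_percent=40, max_years=4):
    """Label each student of ONE major as 'Good' or 'Bad'.

    Good = graduated within max_years AND ranked in the top top_percent
    of the major. Ranking by GPA is an assumption, not from the slides.
    """
    ranked = sorted(students, key=lambda s: s["gpa"], reverse=True)
    cutoff = len(ranked) * top_percent // 100  # number of top-ranked students
    top_ids = {s["id"] for s in ranked[:cutoff]}
    return {
        s["id"]: "Good" if s["id"] in top_ids and s["years"] <= max_years else "Bad"
        for s in students
    }

# Hypothetical graduates of a single major
students = [
    {"id": "s1", "gpa": 3.8, "years": 4},
    {"id": "s2", "gpa": 3.5, "years": 5},  # top 40% but took 5 years
    {"id": "s3", "gpa": 3.0, "years": 4},
    {"id": "s4", "gpa": 2.6, "years": 4},
    {"id": "s5", "gpa": 2.2, "years": 6},
]
labels = label_students(students)
print(labels)
```

Both conditions must hold, so a highly ranked student who needed more than 4 years is still labeled Bad.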
21
Advantages of Major’s Classification Model
- Good precision (80%)
- The model predicts the best major to be selected even if the number of students differs between majors
- It proposes a set of possible majors, ordered by appropriateness score.
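One way to turn the per-major models into an ordered list of majors is sketched below. The slides do not define how the appropriateness score is computed, so everything here is hypothetical: the stand-in scoring functions, the major names, the 0.5 threshold and the function name rank_majors. Each stand-in plays the role of a per-major model that returns a score between 0 and 1 for being a Good student in that major.

```python
def rank_majors(student, major_models, threshold=0.5):
    """Score the student with every per-major model and return the majors
    predicted as 'Good' (score >= threshold), best score first."""
    scored = [(major, model(student)) for major, model in major_models.items()]
    good = [(major, score) for major, score in scored if score >= threshold]
    return sorted(good, key=lambda pair: pair[1], reverse=True)

# Hypothetical stand-ins for the trained per-major decision trees
major_models = {
    "Computer Engineering": lambda s: 0.9 if s["204111"] == "High" else 0.3,
    "Electrical Engineering": lambda s: 0.7 if s["403111"] != "Low" else 0.2,
    "Civil Engineering": lambda s: 0.4,
}

student = {"204111": "High", "403111": "Medium"}
ranking = rank_majors(student, major_models)
print(ranking)
```

Because each major has its own model, a major with few students is scored independently of the larger ones.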
Encountered problems
- Database size
- Other factors that could affect student’s decision: Teacher Preference, etc.
22
Presentation of Discovered Knowledge
23
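Applying Association Rule Discovery for Grade Prediction
[Figure: basket analysis applied to education. A student's courses and grade levels play the role of the items in a basket, e.g. 204111 = Medium, 403111 = High, 417167 = Medium, 417168 = Medium]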
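Treating each student's (course, grade level) pairs as a basket, a grade level for an upcoming course can be predicted from the rule with the highest confidence. The sketch below is an illustration under assumptions: the past baskets are invented, the function name predict_level is hypothetical, and the talk's actual rule-mining procedure is not shown in the slides.

```python
from collections import Counter

# Hypothetical baskets of past students: sets of (course, grade level) items
history = [
    {("204111", "Medium"), ("403111", "High"), ("417167", "Medium")},
    {("204111", "Medium"), ("403111", "High"), ("417167", "Medium")},
    {("204111", "Medium"), ("403111", "High"), ("417167", "High")},
    {("204111", "Low"), ("403111", "Low"), ("417167", "Low")},
]

def predict_level(history, known, target):
    """Return (level, confidence) for the best rule  known => (target, level).

    Confidence = baskets containing known AND (target, level),
                 divided by baskets containing known.
    Returns None when no past basket matches the known items.
    """
    matching = [basket for basket in history if known <= basket]
    counts = Counter(
        level for basket in matching for course, level in basket if course == target
    )
    if not counts:
        return None
    level, hits = counts.most_common(1)[0]
    return level, hits / len(matching)

known = {("204111", "Medium"), ("403111", "High")}
result = predict_level(history, known, "417167")
print(result)
```

Here three past students share the known grades and two of them obtained Medium in 417167, so the predicted level is Medium with a confidence of 2/3.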
24
Grade Prediction for the Coming Term
25
Presentation of Discovered Knowledge
26
Conclusion & Future Work
- Application of data mining in education
- Data mining techniques were used for improving the quality of engineering students
- Future work: apply data mining techniques to several other educational domains