
Presentation Exclusive Raharja Ubl Attahiriyah


• START

• Introduction

• Problem

• Methods

• Discussion

• Evaluation & Validation

• Conclusion

• References

• END

COMPARISON C4.5, NEURAL NETWORK AND NAÏVE BAYES ALGORITHM FOR TIMELY PREDICTION OF GRADUATION

Presented at the International Conference on Computer Science and Information Technology (CSIT-2013), June 2013

By:

Asep Saefulloh Himawan

Arisantoso Moedjiono Nazori AZ


INTRODUCTION

Currently, timely graduation is predicted only from the GPA (grade point average) and the IMK (Cumulative Quality Index) of the previous semester.

Prediction is similar to classification and estimation, except that prediction is used to forecast specific values that will occur in the future.

Meanwhile, Raharja University has AO (Attendance Online) and SIS (Student Information Services) datasets that are not fully utilized. So far, the university's forecasters have assumed that on-time graduation can be predicted simply by looking at the previous GPA and IMK.


INTRODUCTION

To address these problems, we conducted this study, which applies classification data mining to the AO and SIS datasets already stored in the DMQ database in order to predict timely graduation.


INTRODUCTION

To predict on-time graduation, this study compares three data mining classification algorithms:

1. C4.5

2. Naive Bayes

3. Neural Network

The cleaned data from DMQ is processed with the Weka tool. The classification models are tested using cross-validation, a confusion matrix, and the ROC (Receiver Operating Characteristic) curve.
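As an illustration of the cross-validation step, the sketch below splits record indices into folds and yields one train/test pair per fold. The slides do not state how many folds were used; k = 10 is an assumption here, and the function names are hypothetical rather than anything taken from Weka.

```python
def k_fold_indices(n_records, k):
    """Split record indices 0..n_records-1 into k nearly equal, contiguous folds."""
    fold_sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validation_splits(n_records, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = k_fold_indices(n_records, k)
    for i, test in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 891 records as in the study; k = 10 is an assumed fold count.
splits = list(cross_validation_splits(891, 10))
```

Each record appears in exactly one test fold, so every record is used for testing once and for training k - 1 times.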


PROBLEM

Problem formulation:

1. Can the C4.5, Naive Bayes and Neural Network algorithms be applied to predict timely graduation?

2. Which algorithm is best at predicting timely graduation?

3. Can the chosen algorithm present the data mining classification results as a prediction of timely graduation?


RESEARCH METHODS

The study was designed using the CRISP-DM (Cross Industry Standard Process for Data Mining) model, which has six phases [7]. The research uses Weka (Waikato Environment for Knowledge Analysis) version 3.6.4, an open-source (GPL) data mining tool that runs on the Java engine.

Business/Research Understanding Phase: The data are secondary data obtained from the DMQ database stored on a server of the Higher Education Prog.

Data Understanding Phase: The DMQ database holds 5842 records. Seven attributes (variables) are used to predict timely graduation: NIM, Student Name, Study of Education, Department, GPA, IMK and Prediction. Of these seven, two are predictors (GPA and IMK) and one is the target attribute (graduating on time).

Data Preparation Phase: A query against the DMQ database returned 891 records to be processed with Weka.

Modeling Phase: Three algorithms are used in this study: C4.5, Naive Bayes and Neural Network.

Evaluation Phase: Evaluation and validation are performed using the confusion matrix and the ROC (Receiver Operating Characteristic) curve.

Deployment Phase: The rules of the most accurate model for predicting on-time graduation are applied, and can then be used to evaluate new data.


DISCUSSION

This study aims to compare the accuracy of three data mining models, namely the C4.5, Naive Bayes and Neural Network algorithms, in predicting timely graduation.

Algorithm C4.5/J48

The steps for building the C4.5 model from the 891 training records are:

a. Prepare the training data.

b. Calculate the entropy.

c. Calculate the gain for each attribute and select the highest gain value. For example, a gain value is obtained for the GPA attribute.
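The entropy and gain steps can be sketched as follows. The class counts (729 on time, 162 not on time) are the ones reported for the training data in the Naive Bayes slide; the split itself is hypothetical, since the actual gain table is not reproduced in the slides. Note that C4.5 proper normalizes gain into a gain ratio; the slides mention only gain, so this sketch stops there.

```python
import math


def entropy(class_counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)


def information_gain(parent_counts, partitions):
    """Entropy of the parent minus the size-weighted entropy of its partitions."""
    total = sum(parent_counts)
    weighted = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(parent_counts) - weighted


# Class counts reported in the slides: [on time, not on time].
parent = [729, 162]

# Hypothetical split: a GPA threshold that separates the two classes perfectly.
perfect_split = [[729, 0], [0, 162]]

base_entropy = entropy(parent)
gain = information_gain(parent, perfect_split)
```

With a split that separates the classes perfectly, the gain equals the parent entropy, which is the largest value the gain can take for this data.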


From the entropy and gain values in Table 1, we then determine the next node, node 1.1, and calculate the entropy and gain of each attribute under GPA.

The decision tree in Figure 2 yields the following rules:

a. IF GPA >= 3.7 THEN graduating on time

b. IF GPA >= 2.7 THEN graduating on time

c. IF GPA >= 2.0 THEN graduating on time

d. IF GPA <= 1.99 THEN graduation is not timely
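The four rules can be written as a small function. This is only a sketch of the rule set as listed, not the Weka model itself; the function name and the returned labels are hypothetical. The rules leave values between 1.99 and 2.0 undefined, so the sketch assumes that anything below 2.0 counts as not timely.

```python
def predict_graduation(gpa):
    """Apply the decision rules in order and return the predicted class."""
    if gpa >= 3.7:
        return "on time"
    if gpa >= 2.7:
        return "on time"
    if gpa >= 2.0:
        return "on time"
    # Assumption: any GPA below 2.0 falls under the "<= 1.99" rule.
    return "not on time"
```

Because the first three rules share the same outcome, the rule set reduces to a single threshold at GPA 2.0.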

Figure 2. Decision Tree Classifier Trees J48


Algorithm Naive Bayes

The Naive Bayes method uses the same 891 training records as the C4.5 method.

The training data contain 891 records, with 729 cases of graduating on time and 162 cases of not graduating on time. The prior probability of each class is determined from these counts.
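The formula itself is not preserved in the slides. Assuming the usual relative-frequency estimate, P(class) = (number of records in the class) / (total number of records), the priors for the counts above can be computed as in this sketch; the variable names are hypothetical.

```python
from fractions import Fraction

# Class counts reported in the slides.
counts = {"on time": 729, "not on time": 162}
total = sum(counts.values())  # 891 records

# Assumed estimate: prior = class count / total count, kept as exact fractions.
priors = {label: Fraction(n, total) for label, n in counts.items()}

for label, p in priors.items():
    print(f"P({label}) = {p} = {float(p):.4f}")
```

The two priors sum to 1, and the on-time class is roughly four and a half times as likely a priori as the other class.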


Algorithm Neural Network

The neural network uses the backpropagation algorithm in six steps, the first of which is to initialize the weights to values between -0.1 and 1.0 for the input layer, the hidden layer and the bias (threshold). The network is generated from the training data using Weka's MultilayerPerceptron.
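The initialization step can be illustrated with a minimal sketch that draws weights uniformly from the range stated in the slide and runs one forward pass. It does not reproduce Weka's MultilayerPerceptron: the layer sizes (two inputs for GPA and IMK, two hidden units), the sigmoid activation, the sample input and all names are assumptions, and the backpropagation weight update is omitted.

```python
import math
import random


def init_weights(n_inputs, n_hidden, low=-0.1, high=1.0, seed=0):
    """Draw initial weights uniformly from [low, high]; the last entry of each row is the bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(low, high) for _ in range(n_inputs + 1)] for _ in range(n_hidden)]
    output = [rng.uniform(low, high) for _ in range(n_hidden + 1)]
    return hidden, output


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden, output):
    """One forward pass through a single hidden layer to a single output unit."""
    hidden_out = [
        sigmoid(w[-1] + sum(wi * xi for wi, xi in zip(w, inputs)))
        for w in hidden
    ]
    return sigmoid(output[-1] + sum(wo * h for wo, h in zip(output, hidden_out)))


# Hypothetical network: inputs are (GPA, IMK), with two hidden units.
hidden_w, output_w = init_weights(n_inputs=2, n_hidden=2)
score = forward([3.5, 3.4], hidden_w, output_w)
```

Training would then compare the output with the target class and propagate the error backwards to adjust these initial weights.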

Figure 3. The Resulting MLP Neural Net


EVALUATION AND VALIDATION

Comparing the test results of the three algorithms, as shown in Table 3, the highest accuracy is obtained by Neural Network and the C4.5 algorithm, followed by Naive Bayes with the lowest. The measurements used are precision, recall and accuracy.
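The three measurements are derived from the confusion matrix as sketched below. The counts are hypothetical, since the actual confusion matrices are not shown in the slides; they were chosen so that a single record out of 891 is misclassified, which happens to give the same accuracy as the Naive Bayes figure reported in the conclusion.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from the four confusion-matrix cells."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical confusion matrix: 891 records, one on-time student misclassified.
metrics = classification_metrics(tp=728, fp=0, fn=1, tn=162)
accuracy_percent = round(metrics["accuracy"] * 100, 4)
```

Which cell the single error falls into is an assumption; it changes precision and recall but not accuracy.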


ROC Curve

In each test, Weka reports the ROC (Receiver Operating Characteristic) values directly.

Figure 4. Plot for AUC on Algorithm C4.5 with Class LTW

The Area Under the Curve (AUC) is 1 for the "graduated on time" class with the C4.5 algorithm. For the Neural Network, the Area Under the ROC Curve (AUC) is 1 for the "did not graduate on time" class. The AUC is calculated from the ROC curve.
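The AUC formula used by the authors is not preserved in the slides. One common way to obtain the value, shown here only as a sketch, is to sweep the decision threshold to get the ROC points and integrate them with the trapezoidal rule. The labels and scores are made-up illustrations, and the code assumes both classes are present.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate) for a threshold sweep."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Records with the same score are handled together as one threshold step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Illustrative data: 1 = positive class, scores are classifier confidences.
perfect_auc = auc(roc_points([1, 1, 1, 0, 0], [0.9, 0.8, 0.7, 0.3, 0.1]))
mixed_auc = auc(roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6]))
```

A classifier that ranks every positive record above every negative one gets an AUC of 1, which is the situation the slide reports for C4.5 and the Neural Network.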


COMPARATIVE ANALYSIS

Of the three models, the highest accuracy, precision, sensitivity, recall and AUC values are obtained by the C4.5 and Neural Network models, with equal results, and Naive Bayes comes last, as shown in Table 5.

For data mining classification, AUC values can be divided into several groups:

a. 0.90-1.00 = very good classification

b. 0.80-0.90 = good classification

c. 0.70-0.80 = fair classification

d. 0.60-0.70 = poor classification

e. 0.50-0.60 = failed classification

It can be concluded that the C4.5, Naive Bayes and Neural Network methods are all classified as very good, since their AUC values lie between 0.90 and 1.00.
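The AUC groups listed above can be sketched as a simple lookup. The ranges in the slide overlap at their boundaries (0.90 appears in two groups), so this sketch assumes a boundary value belongs to the higher group; the function name, the labels and the handling of values below 0.50 are likewise assumptions.

```python
def auc_category(auc_value):
    """Map an AUC value to the quality group it falls into."""
    if not 0.0 <= auc_value <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc_value >= 0.90:
        return "very good"
    if auc_value >= 0.80:
        return "good"
    if auc_value >= 0.70:
        return "fair"
    if auc_value >= 0.60:
        return "poor"
    if auc_value >= 0.50:
        return "failed"
    # The slide lists no group below 0.50.
    return "not listed"
```

For example, an AUC of 1.0 maps to "very good", the group in which the slides place all three algorithms.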


Figure 5. Application for Classifying Timely Graduation Predictions, Built with the Java Engine


CONCLUSION

1. The C4.5, Naive Bayes and Neural Network algorithms can all be used to predict timely graduation.

2. The best algorithms are those with the highest accuracy in the classification model, namely C4.5 and Neural Network with 100% accuracy, while Naive Bayes reaches 99.8878%. All three algorithms are classified as very good, with AUC (Area Under the Curve) values between 0.90 and 1.00, so they can be used for predictive applications.

3. The selected algorithm is used to display NIM, Student Name, GPA, IMK and the timely graduation prediction as the result of data mining classification using the Java engine.


REFERENCES


THANK YOU FOR YOUR ATTENTION