. . 2556 16-18 2556
Failure Analysis of a Ball Valve by Using Data Mining
Chantra Nakvachiratrakul1* Kasem Pipatpanyanukul2 Premkamol Preechaporn3 1,2,3Department of Industrial Engineering, Faculty of Engineering, Burapha University, Chonburi
E-mail: [email protected]*
Abstract
Performance records of a parts repair shop were adopted as a case study. They showed that about 40% of repair tasks consumed excessive time in analyzing the failure modes, and that about 30% of the repair jobs had to be repeated one or more times because the failure had been diagnosed incorrectly. This was caused by the employees' inadequate experience (less than four months) in ball valve repair. This study applied the Weka software to classify the failure modes using data mining, based on 214 shop records comprising the incoming-parameter inspection results together with the actual failure modes. These records formed the input to a classification process using the decision tree, a "divide-and-conquer" approach to the problem of learning from a set of independent instances. Despite the employees' lack of repair and analysis experience, the resulting tree enabled them to determine the ball valve repair method by themselves. Overall, the results revealed that the decision tree could classify the ball valve repair jobs with an accuracy of 76%, reducing the average time for ball valve failure-mode analysis from 44.09 minutes per task to 24.62 minutes per task, a reduction of 43.64%.
Keywords: ball valve, data mining, classification
1. Introduction
Performance records of the case-study repair shop indicated that about 40% of repair tasks consumed excessive time in analyzing failure modes, and that about 30% of repair jobs had to be repeated because the failure had been diagnosed incorrectly. The underlying cause was the employees' limited experience in ball valve repair (less than four months). This study therefore applied a data-mining approach to classify ball valve failure modes from the incoming inspection results, so that inexperienced employees could determine the repair method by themselves.
2. Related Theory
2.1 Data mining
Data mining is the process of extracting useful knowledge from large volumes of data by means of algorithms drawn from artificial intelligence (AI) and statistical methods [1], [2]. One of its principal tasks is classification, and the decision tree method is among the most widely used classification techniques. A tree that fits the training data too closely suffers from overfitting, which is commonly remedied by pruning.
2.1.1 Decision trees
The decision tree is a classification model widely used in machine learning. Well-known induction algorithms include Quinlan's ID3 and its successors C4.5, C5, and J48, together with CART. ID3 builds the tree top-down [3]; C4.5 and C5 improve on ID3 [4]. The resulting tree maps the values of the input attributes to an output class. J48 is the open-source Java implementation of C4.5 in the WEKA workbench, inheriting the top-down induction of ID3 while adding the improvements of C4.5 and C5 [5].
2.1.2 Building classification trees
A classification tree is grown by repeatedly splitting the training samples on a chosen test attribute [6]. The central question at each node is which attribute gives the best split; this is decided by an attribute selection measure, also called a goodness-of-split criterion [7]. A common impurity measure for attribute selection is the entropy: the expected information needed to classify a sample is computed for each candidate attribute, and the attribute's information gain is derived from the entropy.
Let S be a set of s training samples whose class label attribute has m distinct classes C_i (i = 1, ..., m), and let s_i be the number of samples of S in class C_i. The entropy, or expected information needed to classify a given sample, is [7]

I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2(p_i)    (1)

where p_i = s_i / s is the probability that a sample belongs to class C_i. If attribute A has v distinct values partitioning S into subsets S_1, ..., S_v, with s_{ij} the number of samples of class C_i in subset S_j, the expected information (entropy) of a split on A is

E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj})    (2)

and the information gain obtained by branching on A is

Gain(A) = I(s_1, s_2, \ldots, s_m) - E(A)    (3)

The attribute with the highest information gain is chosen as the test attribute at the node.
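As a numerical illustration of Eqs. (1)-(3), the functions below compute the entropy, the expected information of a split, and the information gain. The class counts and the three-way partition are hypothetical values chosen only for the example:

```python
import math

def entropy(counts):
    """Eq. (1): I(s1,...,sm) = -sum(p_i * log2(p_i)) over non-empty classes."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def expected_info(partitions):
    """Eq. (2): weighted entropy of the subsets produced by splitting on A."""
    s = sum(sum(p) for p in partitions)
    return sum((sum(p) / s) * entropy(p) for p in partitions)

def gain(counts, partitions):
    """Eq. (3): Gain(A) = I(s1,...,sm) - E(A)."""
    return entropy(counts) - expected_info(partitions)

# Hypothetical example: 14 samples, 9 in class C1 and 5 in class C2.
counts = [9, 5]
# An attribute A with three values splits them into [2,3], [4,0], [3,2].
partitions = [[2, 3], [4, 0], [3, 2]]

print(round(entropy(counts), 3))           # prints 0.94
print(round(gain(counts, partitions), 3))  # prints 0.247
```

An attribute whose split yields purer subsets (such as the [4, 0] branch above) contributes less expected information and therefore a larger gain.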
3. Methodology
The failure-mode classifier was built with the J48 algorithm in the following steps.
3.1 Procedure
3.1.1 Data collection
Repair records of 214 ball valves were collected. The observed failure modes were grouped into seven classes: six failure-mode classes (B1-B6) and one additional class (B7).
3.1.2 Attribute selection
Five attributes were defined for each record: the four incoming inspection results (Testleak, Frontleak, Backseat, Testshell) and the actual failure mode (DefectivePart), which serves as the class attribute. Of the 375 records originally collected, 214 complete records were retained for analysis.
3.1.3 Preparing the input for Weka
Weka accepts input in CSV or ARFF format; here the data were prepared as CSV and converted to ARFF, with all five attributes declared as nominal types.
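A sketch of such an ARFF input with the five nominal attributes is shown below; the value labels (pass/fail) and the data rows are assumptions for illustration, since the paper does not list the attribute values:

```
@relation ballvalve-repair

% Four inspection-result attributes and the failure-mode class.
% The value sets below are hypothetical; only the attribute names
% and the class labels B1-B7 come from the paper.
@attribute Testleak      {pass, fail}
@attribute Frontleak     {pass, fail}
@attribute Backseat      {pass, fail}
@attribute Testshell     {pass, fail}
@attribute DefectivePart {B1, B2, B3, B4, B5, B6, B7}

@data
fail,pass,pass,pass,B1
pass,fail,pass,fail,B3
```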
3.1.4 Separation into training and test sets
The classifier was evaluated by 10-fold cross-validation.
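In 10-fold cross-validation the 214 records are shuffled and partitioned into ten near-equal folds; each fold in turn is held out for testing while the other nine are used for training, and the ten accuracies are averaged. A minimal sketch of the index bookkeeping (not Weka's own implementation) is:

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle record indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = k_fold_indices(214, k=10)
for test_fold in folds:
    # Training set = every record not in the held-out fold.
    train = [j for f in folds if f is not test_fold for j in f]
    # Here one would train the classifier on `train` and score `test_fold`.
    assert len(train) + len(test_fold) == 214
```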
3.1.5 Model building
The classifier was built with the supervised decision-tree algorithm J48. The basic tree-induction procedure is given by the following pseudocode [7]:
(1) create a node N;
(2) if samples are all of the same class C, then
(3)   return N as a leaf node labeled with the class C;
(4) if attribute-list is empty, then
(5)   return N as a leaf node labeled with the most common class in samples; // majority voting
(6) select test-attribute, the attribute among attribute-list with the highest information gain;
(7) label node N with test-attribute;
(8) for each known value ai of test-attribute: // partition the samples
(9)   grow a branch from node N for the condition test-attribute = ai;
(10)  let si be the set of samples in samples for which test-attribute = ai; // a partition
(11)  if si is empty, then
(12)    attach a leaf labeled with the most common class in samples;
(13)  else attach the node returned by Generate_decision_tree(si, attribute-list - test-attribute);
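The pseudocode above translates almost line-for-line into a recursive function. The sketch below is an illustrative ID3-style learner that picks the test attribute by information gain; it is not Weka's J48 (which adds C4.5 refinements such as pruning), and the sample records and class labels are invented for the demonstration:

```python
import math
from collections import Counter

def info(labels):
    """Entropy of a list of class labels, as in Eq. (1)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(samples, labels, attributes):
    """Recursive decision-tree induction following the cited pseudocode."""
    # Steps (2)-(3): all samples in one class -> leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # Steps (4)-(5): no attributes left -> majority-vote leaf.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step (6): choose the attribute with the highest information gain.
    def gain(a):
        e = 0.0
        for v in set(s[a] for s in samples):
            sub = [l for s, l in zip(samples, labels) if s[a] == v]
            e += len(sub) / len(labels) * info(sub)
        return info(labels) - e
    best = max(attributes, key=gain)
    # Steps (7)-(13): branch on each observed value of the chosen attribute
    # (empty partitions cannot arise here, so steps (11)-(12) never trigger).
    rest = [a for a in attributes if a != best]
    tree = {}
    for v in set(s[best] for s in samples):
        sub_s = [s for s in samples if s[best] == v]
        sub_l = [l for s, l in zip(samples, labels) if s[best] == v]
        tree[v] = build_tree(sub_s, sub_l, rest)
    return (best, tree)

# Tiny hypothetical records: two inspection attributes, classes B1/B2.
samples = [{"Testleak": "fail", "Frontleak": "pass"},
           {"Testleak": "pass", "Frontleak": "fail"},
           {"Testleak": "pass", "Frontleak": "pass"}]
labels = ["B1", "B2", "B2"]
tree = build_tree(samples, labels, ["Testleak", "Frontleak"])
print(tree)
```

On these three records Testleak separates the classes perfectly, so it is selected as the root test attribute.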
4. Results
Running J48 on the 214 records produced a tree with the following summary statistics:
- Number of leaves: 49
- Size of the tree: 60
- Correctly classified instances: 164 (76.6355%)
- Incorrectly classified instances: 50 (23.3645%)
- Mean absolute error: 0.0083
- Root mean squared error: 0.0821
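The reported percentages follow directly from the instance counts:

```python
correct, total = 164, 214
print(round(100 * correct / total, 4))            # prints 76.6355
print(round(100 * (total - correct) / total, 4))  # prints 23.3645
```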
5. Conclusion
The decision tree classified the ball valve repair jobs with an accuracy of 76%. Using the tree reduced the average time for failure-mode analysis from 44.09 minutes per task to 24.62 minutes per task, a reduction of 43.64%.
References
[1] Usama M. Fayyad. Data mining and knowledge discovery: Making sense out of data. IEEE Expert: Intelligent Systems and Their Applications, 11(5):20-25, 1996.
[2] G. Della Riccia, R. Kruse, and H. Lenz. Computational Intelligence in Data Mining. Springer, New York, NY, USA, 2000.
[3] Witten, I.H., Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition. Morgan Kaufmann, San Francisco.
[4] Quinlan, J.R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA.
[5] Aman Kumar Sharma et al. A Comparative Study of Classification Algorithms for Spam Email Data Analysis. International Journal on Computer Science and Engineering (IJCSE), Vol. 3, No. 5, May 2011, pp. 1890-1895.
[6] A. Feelders. Classification trees. http://www.cs.uu.nl/docs/vakken/adm/trees.
[7] J. Han and M. Kamber (2001). Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers.