31
MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in fulfillment of the requirement for the award of the Doctor of Philosophy Faculty of Computer Science and Information Technology Universiti Tun Hussein Onn Malaysia MARCH 20 14

MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in

  • Upload
    dokhue

  • View
    231

  • Download
    0

Embed Size (px)

Citation preview

Page 1: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in

MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY

FOR EFFICIENT CATEGORICAL DATA CLUSTERING

RABIEI MAMAT

A thesis submitted in

fulfillment of the requirement for the award of the

Doctor of Philosophy

Faculty of Computer Science and Information Technology

Universiti Tun Hussein Onn Malaysia

MARCH 20 14

Page 2: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in

ABSTRACT

Clustering a set of categorical data into a homogenous class is a fundamental

operation in data mining. A number of clustering algorithms have been proposed and

have made an important contribution to the issues of clustering especially related to

the categorical data. Unfortunately, most of the clustering techniques are not

designed to address the issues of uncertainties inherent in the categorical data.

However, handling the data uncertainty is not an easy task. One method of handling

the data uncertainty in categorical data clustering is by identifying the partition

attribute in the information system. But, with this approach, the computational cost is

still a major issue and the resulting clusters is still dubious. Thus, in this thesis, the

concept of attribute relative which is based on the theory of soft-set is discussed and

consequently introduces an alternative technique to the partition attribute selection

approach for the used in the categorical data clustering. A technique which called

Maximum Total Attribute Relative (MTAR) is able to determine the partition

attribute of the categorical information system at the category level without

compromising the computational cost and at the same time enhance the legitimacy of

the resulting clusters. Experiments on sixteen (16) UCI-MLR benchmark datasets

demonstrate the potentials of MTAR to achieved lower computational time with the

improvements up to 90% as compared to TR, MMR, MDA and NSS. Experiments

also show the objects in the clusters produced by MTAR technique has obvious

similarities and the generated clusters also have better objects coverage

simultaneously increased the cluster validity up to 23% in term of entropy as

compared to MDA.

Page 3: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in

ABSTRAK

Pengklusteran satu set data berkategori ke dalam kelas homogen adalah operasi asas

dalam perlombongan data. Beberapa algoritma pengklusteran data telah dicadangkan

dan telah memberi sumbangan yang besar kepada isu-isu pengklusteran terutamanya

yang berkaitan dengan data berkategori. Malangnya, kebanyakan teknik

pengklusteran tidak direka untuk menangani isu-isu ketidakpastian yang wujud

daIam data berkategori. Walau bagaimanapun, pengendalian ketidakpastian data

bukanlah satu tugas yang mudah. Salah satu kaedah pengendalian ketidapastian data

dalam pengklusteran data berkategori ialah dengan mengenalpasti atribut partisi

dalarn sistem maklumat. Tetapi melalui pendekatan ini, kos komputasi masih

menjadi isu utama dan kluster-kluster yang dihasilkan masih diragui kesahihannya.

Justeru itu, di dalam tesis ini, konsep relatif atribut yang berasaskan kepada teori set-

lembut dibincangkan dan seterusnya memperkenalkan satu teknik alternatif kepada

pendekatan pemilihan atribut partisi untuk digunakan dalam pengklusteran data

berkategori. Teknik yang dipanggil sebagai Jumlah Atribut Relatif Maksimum

(MTAR) dapat menentukan atribut partisi sesebuah sistem maklumat berkategori

diperingkat kategori tanpa menjejaskan kos komputasi dan pada masa yang sama

meningkatkan kesahihan kluster-kluster yang dihasilkan. Uji kaji ke atas enam belas

(16) set data penanda aras UCI-MLR menunjukkan potensi MTAR untuk mencapai

masa komputasi yang lebih rendah dengan penambahbaikkan sehingga 90%

berbanding dengan TR, MMR, MDA dan NSS. Eksperimen juga menunjukkan objek

di dalam kluster-kluster yang dihasilkan oleh teknik MTAR mempunyai persamaan

yang jelas dan kluster-kluster yang dihasilkan juga mempunyai litupan objek yang

lebih baik dan pada masa yang sama meningkatkan kesahihan kluster sehingga 23%

berasaskan entropy berbanding dengan MDA.

Page 4: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 5: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 6: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 7: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 8: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 9: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 10: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 11: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 12: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 13: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 14: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 15: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 16: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 17: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 18: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 19: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 20: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 21: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 22: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 23: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 24: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 25: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 26: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 27: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 28: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 29: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 30: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in
Page 31: MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET … · MAXIMUM TOTAL ATTRIBUTE RELATIVE OF SOFT-SET THEORY FOR EFFICIENT CATEGORICAL DATA CLUSTERING RABIEI MAMAT A thesis submitted in

REFERENCES

Hwang, M. I. (1994). Decision making under time pressure: a model for

information systems research. Information & Management, 27(4), 197-203.

Neely, A. and Jarrar, Y. (2004), Extracting value from data - the performance

planning value chain. Business Process Management Journal, 10(5), 506-509

Lindley, D. V. (2006). Understanding Uncertainty. Wiley-Interscience, (1 lth

edition).

Zadeh, L. A. (1 965). Fuzzy sets. Information and control, 8(3), 338-353.

Pawlak, Z. (1982). Rough sets. International Journal of Computer &

Information Sciences, 11 (5) , 34 1-356.

Pawlak, Z. (1991). Rough sets-theoretical aspect of reasoning about data.

Proceedings of Kluwer Academic Publishers.

Atanassov, K. T. (1986). Intuitionistic fuzzy sets. Fuzzy sets and Systems,

20(1), 87-96.

Gorzalczany, M. B. (1987). A method of inference in approximate reasoning

based on interval-valued fuzzy sets. Fuzzy sets and systems, 21(1), 1-1 7.

Molodtsov, D. (1999). Soft set theory-first results. Computers &

Mathematics with Applications, 3 7(4), 19-3 1.

Maji, P. K., Biswas, R., & Roy, A. R. (2003). Soft set theory. Computers &

Mathematics with Applications, 45(4), 555-562.

Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern

Recognition Letters, 31 (8), 65 1-666.

Batiskakis, Y., Vazirgiannis, M., & Halkidi, M. (2001). On clustering

validation techniques. Journal of Intelligent Information System, 17(2-3):107-

145.

Han, J. & Kamber, M. (2006). Data Mining: Concept and Technique. Elsevier.