Prof Shaikh and Anis 2012 Pies PDF


Determine Data Mining Using Dynamic Data Base.

Mr. ABDUL JABBAR SHAIKH AZAD (A), Mr. ANIS MAYODDIN KURESHI (B),

Prof. DEELIP D. JOSHI (C), Dr. RAMESH R. MANZA (D) & Dr. ASHOK NARAYAN PATIL (E).

A & B: Department of Computer Science, P.S.G.V.P. Mandal's Arts, Commerce and Science College, Shahada.

C: Asst. Professor, Department of Computer Science, B.P. Arts, S.M.A. Science and K.K.C. Commerce College, Chalisgaon.

D: Asst. Professor, Department of Computer Science, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad.

E: Principal, Vasantrao Naik College, Shahada.

[email protected],[email protected]


Abstract:

In this paper, we present a critical review of some papers on dynamic databases. The goal is to achieve the best data cleaning, data selection and data transformation. Data mining is a step in the Knowledge Discovery Process (KDP) that is required for recognizing patterns and building models from a database. In some cases the problem is known and correct data is available, yet problems may still arise from duplicate, missing or incorrect records.

Several researchers have applied data mining to dynamic databases, such as Hebah H. O. Nasereddin, Vijay Raghavan, Alaaeldin Hafez, Fernando Crespo, Richard Weber, Yi Wang, Shi-Xia Liu and Jianhua Feng. They used various methods, such as fuzzy clustering, customer segmentation and the data mining process applied to dynamic data. Different methods have been implemented to improve accuracy when managing real data from database systems. These techniques have been used in research fields that have grown with the expansion of both computer hardware and software. Data mining helps organizations make full use of the data stored in their databases for decision making, and this is true in all fields.

Keywords: Customer segmentation, Fuzzy clustering, Knowledge Discovery Process (KDP).

INTRODUCTION

Data mining is the task of discovering interesting and hidden patterns from large amounts of data, where the data can be stored in databases, data warehouses, OLAP (Online Analytical Processing) systems or other information repositories. It is also referred to as Knowledge Discovery in Databases (KDD). Data mining is the main process of discovering meaningful patterns and relationships that lie hidden within very large databases. Data mining has also been defined as the analysis of observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. Strictly speaking, data mining is one part of a process called KDD. This process consists basically of steps that are performed before carrying out data mining, such as data selection, data cleaning, pre-processing and data transformation.
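As a rough illustration of the pre-processing steps just listed, the sketch below assumes that raw records are Python dictionaries with hypothetical fields (`customer`, `age`, `amount`, `note`). It performs data selection (keeping only the requested attributes), data cleaning (dropping records with missing values and exact duplicates) and a simple data transformation (min-max scaling of one numeric attribute). The function name `preprocess` and the choice of min-max scaling are assumptions for illustration, not steps prescribed by the reviewed papers.

```python
def preprocess(records, attributes, numeric):
    """Select, clean and transform raw records before mining (illustrative sketch)."""
    # Data selection: keep only the attributes relevant to the mining task.
    selected = [{a: r.get(a) for a in attributes} for r in records]

    # Data cleaning: drop records with missing values, then exact duplicates.
    cleaned, seen = [], set()
    for r in selected:
        if any(v is None for v in r.values()):
            continue
        key = tuple(r[a] for a in attributes)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(r)
    if not cleaned:
        return cleaned

    # Data transformation: min-max scale the numeric attribute to [0, 1].
    values = [r[numeric] for r in cleaned]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    for r in cleaned:
        r[numeric] = (r[numeric] - lo) / span
    return cleaned


raw = [
    {"customer": "c1", "age": 30, "amount": 10.0, "note": "x"},
    {"customer": "c1", "age": 30, "amount": 10.0, "note": "y"},  # duplicate once "note" is dropped
    {"customer": "c2", "age": None, "amount": 25.0},             # missing value
    {"customer": "c3", "age": 45, "amount": 30.0},
]
print(preprocess(raw, ["customer", "age", "amount"], "amount"))
```

Here the second `c1` record only becomes a duplicate after selection removes the irrelevant `note` field, which is one reason selection is usually done before cleaning.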

The major components of a data mining system are as follows: a database, data warehouse or other information repository; a server, which is responsible for fetching the relevant data based on the user's data mining request; a knowledge base, which is used to guide the search for information in the database; a data mining engine, which consists of a set of functional modules; a pattern evaluation module, which interacts with the data mining modules so as to focus the search towards interesting patterns; and a graphical user interface, which communicates between users and the data mining system, allowing the user to interact with the system. Data mining techniques can discover information that many traditional business analysis and statistical techniques fail to deliver. A management information system should provide advanced capabilities that give the user the power to ask more sophisticated and pertinent questions. It empowers the right people by providing the specific information they need.

Hebah H. O. Nasereddin and Vijay Raghavan propose an approach that dynamically updates knowledge obtained from the previous data mining process. Transactions over a long duration are divided into a set of consecutive episodes. In their approach, the information gained during the current episode depends on the current set of transactions and on the information discovered from the database in the last episode. They suggest that discovering current data mining rules in dynamic data mining, that is, discovering dependencies among the values of an attribute, is an important research area. The problem of association mining, also referred to as the market basket problem, is defined as follows.

Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of items and $S = \{s_1, s_2, \ldots, s_m\}$ be a set of transactions, where each transaction $s_i \in S$ is a set of items, that is, $s_i \subseteq I$. An association rule, denoted by $X \Rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$, describes the existence of a relationship between the two itemsets X and Y. Several measures have been introduced to define the strength of the relationship between itemsets X and Y, such as SUPPORT, CONFIDENCE and INTEREST [1,2,5,7]. The definitions of these measures, from a probabilistic viewpoint, are given below.

I. $\mathrm{SUPPORT}(X \Rightarrow Y) = P(X, Y)$, or the percentage of transactions in the database that contain both X and Y.

II. $\mathrm{CONFIDENCE}(X \Rightarrow Y) = P(X, Y) / P(X)$, or the percentage of transactions containing Y among those transactions containing X.

III. $\mathrm{INTEREST}(X \Rightarrow Y) = P(X, Y) / \big(P(X)\,P(Y)\big)$, which represents a test of statistical independence.
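To make the three measures concrete, the following sketch computes them on a small, made-up transaction set. Itemsets are Python sets, probabilities are estimated as relative frequencies over the transactions, and `Fraction` keeps the arithmetic exact. The function names are illustrative, and the code assumes P(X) and P(Y) are non-zero.

```python
from fractions import Fraction

def prob(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def support(x, y, transactions):
    # SUPPORT(X => Y) = P(X, Y): transactions containing all items of X and of Y
    return prob(x | y, transactions)

def confidence(x, y, transactions):
    # CONFIDENCE(X => Y) = P(X, Y) / P(X)
    return prob(x | y, transactions) / prob(x, transactions)

def interest(x, y, transactions):
    # INTEREST(X => Y) = P(X, Y) / (P(X) P(Y)); a value of 1 means independence
    return prob(x | y, transactions) / (prob(x, transactions) * prob(y, transactions))

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
X, Y = {"bread"}, {"milk"}
print(support(X, Y, transactions), confidence(X, Y, transactions), interest(X, Y, transactions))
# prints: 1/2 2/3 8/9
```

An INTEREST value below 1, as here, indicates that bread and milk co-occur slightly less often than they would if they were independent.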

SUPPORT for an itemset S is calculated as

$$\mathrm{SUPPORT}(S) = \frac{F(S)}{F}$$

where F(S) is the number of transactions having S, and F is the total number of transactions.

For a minimum SUPPORT value MINSUP, S is a large (or frequent) itemset if

$$\mathrm{SUPPORT}(S) \ge \mathrm{MINSUP}, \quad \text{or} \quad F(S) \ge F \cdot \mathrm{MINSUP}.$$
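The large-itemset test can be sketched directly from the count form $F(S) \ge F \cdot \mathrm{MINSUP}$. The brute-force enumeration below (function name hypothetical) checks every possible itemset, which is only feasible for a handful of items; algorithms such as the one in [2] prune this search. MINSUP is passed as a `Fraction` so that the boundary comparison is exact rather than subject to floating-point round-off.

```python
from fractions import Fraction
from itertools import combinations

def large_itemsets(transactions, minsup):
    """Return {S: F(S)} for every itemset S with F(S) >= F * MINSUP (brute force)."""
    items = sorted(set().union(*transactions))
    total = len(transactions)  # F
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = frozenset(combo)
            count = sum(1 for t in transactions if s <= t)  # F(S)
            if count >= total * minsup:
                result[s] = count
    return result

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for s, count in large_itemsets(transactions, Fraction(1, 2)).items():
    print(sorted(s), count)
```

With MINSUP = 1/2 and F = 4, an itemset must appear in at least two transactions, so {butter, milk} and the three-item set are reported as small.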

Suppose we have divided the transaction set T into two subsets T1 and T2, corresponding to two consecutive time intervals, where $F_1$ is the number of transactions in T1 and $F_2$ is the number of transactions in T2 ($F = F_1 + F_2$), and $F_1(S)$ is the number of transactions having S in T1 and $F_2(S)$ is the number of transactions having S in T2 ($F(S) = F_1(S) + F_2(S)$). By calculating the SUPPORT of S in each of the two subsets, we get

$$\mathrm{SUPPORT}_1(S) = \frac{F_1(S)}{F_1} \quad \text{and} \quad \mathrm{SUPPORT}_2(S) = \frac{F_2(S)}{F_2}$$

S is a large itemset if

$$\frac{F_1(S) + F_2(S)}{F_1 + F_2} \ge \mathrm{MINSUP}, \quad \text{or} \quad F_1(S) + F_2(S) \ge (F_1 + F_2) \cdot \mathrm{MINSUP}.$$
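A small numeric sketch, with invented counts, shows why the combined test works on raw counts: the SUPPORT of S over T weights each interval by its size and is not the plain average of $\mathrm{SUPPORT}_1(S)$ and $\mathrm{SUPPORT}_2(S)$. The function names are hypothetical.

```python
from fractions import Fraction

def combined_support(f1_s, f1, f2_s, f2):
    """SUPPORT of S over T = T1 u T2, i.e. (F1(S) + F2(S)) / (F1 + F2)."""
    return Fraction(f1_s + f2_s, f1 + f2)

def is_large_two_intervals(f1_s, f1, f2_s, f2, minsup):
    # Count form of the test: F1(S) + F2(S) >= (F1 + F2) * MINSUP
    return f1_s + f2_s >= (f1 + f2) * minsup

minsup = Fraction(18, 100)
print(Fraction(30, 100), Fraction(30, 300))              # per-interval supports: 3/10 1/10
print(combined_support(30, 100, 30, 300))                # 3/20
print(is_large_two_intervals(30, 100, 30, 300, minsup))  # False
```

The plain average of the two supports would be 1/5, which exceeds MINSUP = 0.18, but the size-weighted support over all 400 transactions is only 0.15, so S is small.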

In order to find out whether S is a large itemset or not, we consider four cases:

1. S is a large itemset in T1 and also a large itemset in T2, i.e., $F_1(S) \ge F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) \ge F_2 \cdot \mathrm{MINSUP}$.

2. S is a large itemset in T1 but a small itemset in T2, i.e., $F_1(S) \ge F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) < F_2 \cdot \mathrm{MINSUP}$.

3. S is a small itemset in T1 but a large itemset in T2, i.e., $F_1(S) < F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) \ge F_2 \cdot \mathrm{MINSUP}$.

4. S is a small itemset in T1 and also a small itemset in T2, i.e., $F_1(S) < F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) < F_2 \cdot \mathrm{MINSUP}$.
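The four cases can be checked mechanically. The sketch below (function name hypothetical, counts invented) classifies S from its counts in T1 and T2 and shows that two inputs falling into the same case 2 can lead to opposite overall results, depending on how large each interval is.

```python
from fractions import Fraction

def interval_case(f1_s, f1, f2_s, f2, minsup):
    """Return which of the four cases (1-4) S falls into for intervals T1 and T2."""
    large1 = f1_s >= f1 * minsup
    large2 = f2_s >= f2 * minsup
    if large1 and large2:
        return 1
    if large1:
        return 2
    if large2:
        return 3
    return 4

minsup = Fraction(1, 5)
# Both inputs are case 2 (large in T1, small in T2), yet the overall outcome differs.
for f1_s, f1, f2_s, f2 in [(30, 100, 10, 100), (30, 100, 10, 400)]:
    overall = f1_s + f2_s >= (f1 + f2) * minsup
    print(interval_case(f1_s, f1, f2_s, f2, minsup), overall)
# prints: 2 True
#         2 False
```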

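In the first and fourth cases, S is a large itemset and a small itemset in transaction set T, respectively, while in the second and third cases it is not clear whether S is a small itemset or a large itemset. Formally speaking, let $\mathrm{SUPPORT}(S) = \mathrm{MINSUP} + \delta$, where $\delta \ge 0$ if S is a large itemset and $\delta < 0$ if S is a small itemset. Defining $\delta_i$ in the same way for each subset ($\mathrm{SUPPORT}_i(S) = \mathrm{MINSUP} + \delta_i$), the above four cases have the following characteristics:

1. $\delta_1 \ge 0$ and $\delta_2 \ge 0$;
2. $\delta_1 \ge 0$ and $\delta_2 < 0$;
3. $\delta_1 < 0$ and $\delta_2 \ge 0$;
4. $\delta_1 < 0$ and $\delta_2 < 0$.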

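S is a large itemset if

$$\frac{F_1 (\mathrm{MINSUP} + \delta_1) + F_2 (\mathrm{MINSUP} + \delta_2)}{F_1 + F_2} \ge \mathrm{MINSUP}, \quad \text{or} \quad F_1 (\mathrm{MINSUP} + \delta_1) + F_2 (\mathrm{MINSUP} + \delta_2) \ge (F_1 + F_2) \cdot \mathrm{MINSUP}.$$

This can be written as

$$F_1 \delta_1 + F_2 \delta_2 \ge 0.$$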

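Generally, let the transaction set T be divided into n transaction subsets $T_i$, $1 \le i \le n$. S is a large itemset if

$$\sum_{i=1}^{n} F_i \, \delta_i \ge 0,$$

where $F_i$ is the number of transactions in $T_i$ and $\delta_i = \mathrm{SUPPORT}_i(S) - \mathrm{MINSUP}$, $1 \le i \le n$. Note that $-\mathrm{MINSUP} \le \delta_i \le 1 - \mathrm{MINSUP}$, $1 \le i \le n$.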
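The generalized condition can be checked with exact arithmetic. In the sketch below (function names hypothetical, counts invented) each term $F_i \delta_i$ equals $F_i(S) - F_i \cdot \mathrm{MINSUP}$, so the weighted sum is simply $F(S) - F \cdot \mathrm{MINSUP}$, and the test agrees with computing SUPPORT(S) over the whole of T.

```python
from fractions import Fraction

def weighted_delta_sum(counts, sizes, minsup):
    """Sum of F_i * delta_i, where delta_i = SUPPORT_i(S) - MINSUP = F_i(S)/F_i - MINSUP."""
    total = Fraction(0)
    for f_s, f in zip(counts, sizes):
        delta = Fraction(f_s, f) - minsup  # always in [-MINSUP, 1 - MINSUP]
        total += f * delta
    return total

def is_large(counts, sizes, minsup):
    # S is large over T = T_1 u ... u T_n  iff  sum_i F_i * delta_i >= 0
    return weighted_delta_sum(counts, sizes, minsup) >= 0

counts, sizes = [30, 10, 12], [100, 100, 50]  # F_i(S) and F_i for three subsets
minsup = Fraction(1, 5)
print(weighted_delta_sum(counts, sizes, minsup))    # 2
print(is_large(counts, sizes, minsup))              # True
print(Fraction(sum(counts), sum(sizes)) >= minsup)  # direct check, also True
```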

For those cases where $\sum_{i=1}^{n} F_i \, \delta_i < 0$, there are two options: either discard S as a large itemset (a small itemset with no history record maintained), or keep it for future calculations (a small itemset with a history record maintained). In the latter case, S is not reported as a large itemset, but its $\sum_{i=1}^{n} F_i \, \delta_i$ formula is maintained and checked through the future intervals.

In their paper, they introduce a Dynamic Data Mining approach. The proposed approach periodically performs the data mining process on data updates during the current episode and uses the knowledge captured in the previous episode to produce data mining rules.
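The episode-by-episode bookkeeping can be sketched as follows. The class `EpisodeMiner` and its method are hypothetical, candidate itemsets are supplied by the caller rather than generated, and an itemset seen for the first time starts with an empty history. It illustrates the $\sum_i F_i \delta_i$ rule and the two options for small itemsets; it is not Nasereddin and Raghavan's implementation.

```python
from fractions import Fraction

class EpisodeMiner:
    """Hypothetical episode-by-episode miner tracking sum_i F_i * delta_i per itemset."""

    def __init__(self, minsup, keep_history=True):
        self.minsup = minsup
        # True: keep small itemsets with a history record; False: discard them.
        self.keep_history = keep_history
        self.history = {}  # frozenset itemset -> running sum of F_i * delta_i

    def process_episode(self, transactions, candidates):
        """Fold one episode into the running sums and return the large itemsets."""
        f_i = len(transactions)
        # Update every tracked itemset plus any new candidates. A new candidate
        # starts from 0, so earlier episodes are not counted for it (a simplification).
        for s in set(candidates) | set(self.history):
            f_i_s = sum(1 for t in transactions if s <= t)
            # F_i * delta_i = F_i * (F_i(S) / F_i - MINSUP) = F_i(S) - F_i * MINSUP
            self.history[s] = self.history.get(s, 0) + f_i_s - f_i * self.minsup
        large = {s for s, total in self.history.items() if total >= 0}
        if not self.keep_history:
            self.history = {s: self.history[s] for s in large}
        return large


miner = EpisodeMiner(Fraction(1, 2))
a, b = frozenset({"a"}), frozenset({"b"})
episode1 = [{"a"}, {"a", "b"}, {"a"}, {"c"}]
episode2 = [{"b"}, {"b"}, {"a", "b"}, {"b"}]
print(miner.process_episode(episode1, [a, b]) == {a})     # True
print(miner.process_episode(episode2, [a, b]) == {a, b})  # True
```

In the first episode {b} is small (running sum -1) but, with `keep_history=True`, its record survives; the second episode adds +2 and it becomes large, while {a} drops from +1 to exactly 0 and is still reported.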

Fernando Crespo and Richard Weber show that dynamic data mining is increasingly attracting attention from the research community. At the same time, users of installed data mining systems are also interested in the related techniques, and will be even more so since most of these installations will need to be updated in the future. Data mining is part of an interactive process called KDD (Knowledge Discovery in Databases). This process consists basically of steps that are performed before doing data mining, such as selection, pre-processing and transformation of data. If future behaviour is very similar to past behaviour, using the initial data mining system could be justified; when behaviour changes, however, the initial system may have to be updated. This is where dynamic data mining comes in, a new research area concerned with updating such results. One option is that the user neglects changes in the environment and keeps on applying the initial system without any updating. Another option is that, every certain period, which depends on the particular application, a new system is developed using all the available data. A third option is that, based on the initial system and the new data, an update of the classifier is performed. Keeping the initial system unchanged does not require changes in subsequent processes, such as the design of marketing campaigns for customer segments; its disadvantage is that current tendencies cannot be detected.

Recent developments in dynamic data mining cover several areas of data mining, where various methods have been developed in order to find useful information in a set of data. Among the most important ones are decision trees, neural networks, association rules and clustering methods. For each of these methods, updating has different aspects, and several updating approaches have been proposed. Another method they use is dynamic data mining based on fuzzy clustering: they present a methodology for dynamic data mining using fuzzy clustering that assigns static objects to dynamic classes, that is, classes whose structure changes over time. It starts with a given classifier and a set of new objects, that is, objects that appeared after the creation of the classifier; the period between the creation of a classifier and its update is called a cycle. The length of such a cycle depends on the particular application: we may want to update the buying behaviour of customers in a supermarket once a year, whereas a system for dynamic machine monitoring should be updated every 5 minutes.
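The notion of an update cycle can be illustrated with the standard fuzzy c-means (FCM) formulas, assuming a fuzzifier m = 2 and Euclidean distance. The sketch covers only the simplest part of a cycle: computing the memberships of the new objects in the existing classes and moving each class centre towards the new objects, weighted by those memberships. A full dynamic methodology may also need to create or eliminate classes; that is not modelled here, and the function names `memberships` and `update_cycle` are hypothetical.

```python
import math

def memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one object in each class (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    if 0.0 in dists:  # the object sits exactly on a class centre
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exp for d_k in dists) for d_j in dists]

def update_cycle(centers, new_objects, m=2.0):
    """One hypothetical update cycle: classify new objects, then move each centre."""
    u = [memberships(x, centers, m) for x in new_objects]
    new_centers = []
    for j, old in enumerate(centers):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        if total == 0.0:
            new_centers.append(old)  # no evidence for this class in the cycle
            continue
        new_centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, new_objects)) / total
            for d in range(len(old))
        ))
    return u, new_centers


centers = [(0.0, 0.0), (10.0, 10.0)]  # classifier from the previous cycle
new_objects = [(1.0, 1.0), (9.0, 9.0), (11.0, 11.0)]
u, new_centers = update_cycle(centers, new_objects)
print(u[0])         # memberships of (1, 1): almost entirely in class 0
print(new_centers)
```

With these invented objects, the first centre moves close to (1, 1) and the second stays close to (10, 10), since (9, 9) and (11, 11) pull it symmetrically.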

Yi Wang, Shi-Xia Liu and their co-authors, in their paper "Mining naturally smooth evolution of clusters from dynamic data", note that many clustering algorithms have been proposed to partition a set of static data points into groups; they instead consider an evolutionary clustering problem where the input data points may move, disappear and emerge. These changes should result in a smooth evolution of the clusters. Mining this naturally smooth evolution is valuable for providing an aggregated view of numerous individual behaviours. They solve this novel and generalized form of the clustering problem by converting it into a Bayesian learning problem, analogous to the way the EM clustering algorithm converts the problem of clustering a static data set X into learning a Gaussian mixture model of X. By utilizing characteristics of evolutionary clustering problems, they derive a new unsupervised learning algorithm that is much more efficient than the algorithms used to learn traditional variable-duration HSMMs (hidden semi-Markov models). Because the HSMM models the probabilistic relationship between the dynamic data set and the corresponding evolving clusters, the learned parameters can be interpreted intuitively as the evolving clusters using Viterbi filtering techniques, since learning an HSMM is in fact learning an optimal Viterbi filter. They evaluate the effectiveness of this method with experiments on both synthetic and real data.
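Wang et al. use a hidden semi-Markov model with Gaussian-mixture outputs; the sketch below swaps in a much simpler stand-in, a plain HMM (no explicit state durations) with a single one-dimensional Gaussian per state, only to show how Viterbi decoding turns noisy observations into a smooth sequence of cluster labels. All parameter values and function names are invented for illustration, and all probabilities must be positive because the computation runs in log space.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(observations, start, trans, means, variances):
    """Most likely hidden-state path of an HMM with 1-D Gaussian emissions (log space)."""
    n = len(start)
    # score[j]: best log-probability of any state path ending in state j so far
    score = [math.log(start[j]) + gaussian_logpdf(observations[0], means[j], variances[j])
             for j in range(n)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for x in observations[1:]:
        prev_best, new_score = [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: score[i] + math.log(trans[i][j]))
            prev_best.append(i_best)
            new_score.append(score[i_best] + math.log(trans[i_best][j])
                             + gaussian_logpdf(x, means[j], variances[j]))
        back.append(prev_best)
        score = new_score
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for prev_best in reversed(back):
        state = prev_best[state]
        path.append(state)
    return path[::-1]


start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]  # "sticky" states favour smooth label sequences
means, variances = [0.0, 5.0], [1.0, 1.0]
print(viterbi([0.1, -0.2, 0.3, 4.8, 5.1, 5.3], start, trans, means, variances))  # [0, 0, 0, 1, 1, 1]
print(viterbi([0.0, 0.0, 2.6, 0.0, 0.0], start, trans, means, variances))        # [0, 0, 0, 0, 0]
```

In the second call the value 2.6 is slightly closer to state 1, but switching there and back would cost two unlikely transitions, so the decoded labels stay smooth.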

In this paper, they capture coherence over the time steps $1 \le t \le T$ by modelling the underlying stochastic process that generates X with a hidden semi-Markov model (HSMM). Analogous to the way the EM clustering algorithm clusters static data points by learning a Gaussian mixture model, their method mines the evolution of the clusters from dynamic data points by learning an HSMM. The output probability density function (pdf) of each hidden state of the HSMM is modelled by a Gaussian mixture model, which describes the clusters of an $X_t \in X$. By utilizing characteristics of the evolutionary clustering problem, they derive a new unsupervised learning algorithm that is much better suited to describing all the data from dynamic databases.
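For comparison with the static case that the method builds on, here is a minimal EM sketch for a one-dimensional Gaussian mixture (the "EM clustering" analogy above). Initial parameters are supplied by the caller, the data are made up, and the loop runs a fixed number of iterations with no convergence test. It clusters static points only; it does not implement Wang et al.'s HSMM learning algorithm, and the function name is hypothetical.

```python
import math

def em_gmm_1d(xs, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture model to static data with EM (fixed iteration count)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each point (log-sum-exp for stability).
        resp = []
        for x in xs:
            logs = [math.log(weights[j]) - 0.5 * math.log(2 * math.pi * variances[j])
                    - (x - means[j]) ** 2 / (2 * variances[j]) for j in range(k)]
            top = max(logs)
            p = [math.exp(lg - top) for lg in logs]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixing weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsed (zero) variance
    return means, variances, weights


data = [-0.2, 0.1, 0.0, 0.3, 4.9, 5.2, 5.0, 4.8]
means, variances, weights = em_gmm_1d(data, [1.0, 4.0], [1.0, 1.0], [0.5, 0.5])
print(means)    # close to [0.05, 4.975], the means of the two groups
print(weights)  # close to [0.5, 0.5]
```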

CONCLUSION

In this paper we have given a critical review of some work on dynamic databases. The researchers above have applied data mining to dynamic databases, and they certainly contributed a lot to the development of mining databases dynamically. However, the demerits of these studies on dynamic databases still need to be overcome. Our analysis finds that the approach of Hebah H. O. Nasereddin and Vijay Raghavan, which dynamically updates knowledge obtained from the previous data mining process, is good because it builds on the KDD technique. The limitation of their study, however, is that its suitability for dynamic databases in general still has to be established. As future work, they plan to test the approach with different datasets that cover a large spectrum of data mining applications, such as web site access analysis for improvements in e-commerce advertising, fraud detection, screening and investigation, retail site product analysis and customer segmentation.

After analysing the work of Fernando Crespo and Richard Weber, we conclude that they presented a methodology for dynamic data mining based on fuzzy clustering, which allows updates of the underlying classifier. Other fuzzy or possibilistic clustering techniques could be used as well.

When we consider the work of Yi Wang, Shi-Xia Liu and their co-authors, they proposed to solve a novel and interesting clustering problem, which is totally different from the dynamic clustering problems studied so far. They solve the problem by converting it into a Bayesian learning problem, and they also describe accompanying visualization methods that present the mined smooth evolution intuitively and comprehensively.

REFERENCES

[1] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between Sets of Items in Large Databases," Proc. of the ACM SIGMOD Int'l Conf. on Management of Data, May 1993.

[2] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. of the 20th VLDB Conference, Santiago, Chile, 1994.

[3] R. Agrawal and J. Shafer, "Parallel Mining of Association Rules," IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, Dec. 1996.

[4] C. Aggarwal and P. Yu, "Mining Large Itemsets for Association Rules," Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 1997.

[5] S. Brin, R. Motwani, et al., "Dynamic Itemset Counting and Implication Rules for Market Basket Data," SIGMOD Record (ACM Special Interest Group on Management of Data), 26(2), 1997.

[6] S. Chaudhuri, "Data Mining and Database Systems: Where is the Intersection," Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 1997.

[7] M. Chen, J. Han, and P. Yu, "Data Mining: An Overview from a Database Perspective," IEEE Trans. Knowledge and Data Engineering, 8, 1996.

[8] M. Chen, J. Park, and P. Yu, "Data Mining for Path Traversal Patterns in a Web Environment," Proc. 16th Intl. Conf. Distributed Computing Systems, May 1996.

[9] D. Cheung, J. Han, et al., "Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique," Proc. 12th Intl. Conf. on Data Engineering, New Orleans, Louisiana, 1996.

[10] U. Fayyad, G. Piatetsky-Shapiro, et al., "Advances in Knowledge Discovery and Data Mining," AAAI/MIT Press, 1996.

[11] A. Hafez, J. Deogun, and V. Raghavan, "The Item-Set Tree: A Data Structure for Data Mining," DaWaK'99 Conference, Firenze, Italy, Aug. 1999.

[12] C. Kurzke, M. Galle, and M. Bathelt, "WebAssist: a user profile specific information retrieval assistant," Seventh International World Wide Web Conference, Brisbane, Australia, April 1998.

[13] M. Langheinrich, A. Nakamura, et al., "Un-intrusive Customization Techniques for Web Advertising," Eighth International World Wide Web Conference, Toronto, Canada, May 1999.

[14] H. Mannila, H. Toivonen, and A. Verkamo, "Efficient Algorithms for Discovering Association Rules," AAAI Workshop on Knowledge Discovery in Databases (KDD-94), July 1994.

[15] M. Perkowitz and O. Etzioni, "Adaptive Sites: Automatically Learning from User Access Patterns," Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.

[16] J. Pitkow, "In Search of Reliable Usage Data on the WWW," Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.

[17] G. Rossi, D. Schwabe, and F. Lyardet, "Improving Web Information Systems with Navigational Patterns," Eighth International World Wide Web Conference, Toronto, Canada, May 1999.

[18] N. Serbedzija, "The Web Supercomputing Environment," Seventh International World Wide Web Conference, Brisbane, Australia, April 1998.

[19] T. Sullivan, "Reading Reader Reaction: A Proposal for Inferential Analysis of Web Server Log Files," Proc. 3rd Conf. Human Factors & The Web, Denver, Colorado, June 1997.

[20] C. Wills and M. Mikhailov, "Towards a Better Understanding of Web Resources and Server Responses for Improved Caching," Eighth International World Wide Web Conference, Toronto, Canada, May 1999.

[21] M. Zaki, S. Parthasarathy, et al., "New Algorithms for Fast Discovery of Association Rules," Proc. of the 3rd Int'l Conf. on Knowledge Discovery and Data Mining (KDD-97), AAAI Press, 1997.

[22] C.M. Antunes, A.L. Oliveira, "Temporal data mining: an overview," Workshop on Temporal Data Mining (KDD 2001), San Francisco, September 2001.

[23] M. Bastian, H. Kirschnk, R. Weber, "TRIP: automatic traffic state identification and prediction as basis for improved traffic management services," Proc. of the Second Workshop on Information Technology, Cooperative Research between Chile and Germany, 15-17 January 2001, Berlin, Germany.

[24] J.C. Bezdek, J. Keller, R. Krishnapuram, N.R. Pal, "Fuzzy Models and Algorithms for Pattern Recognition and Image Processing," Kluwer, Boston, London, Dordrecht, 1999.

[25] M. Black, R.J. Hickey, "Maintaining the performance of a learned classifier under concept drift," Intell. Data Anal. 3(6) (1999) 453-474.

[26] R.O. Duda, P.E. Hart, D.G. Stork, "Pattern Classification," 2nd Edition, Wiley, New York, Chichester, 2001.
