8/2/2019 Prof Shaikh and Anis 2012 Pies PDF
Determine Data Mining Using Dynamic Data Base.
Mr. ABDUL JABBAR SHAIKH AZAD (A), Mr. ANIS MAYODDIN KURESHI (B), Prof. DEELIP D. JOSHI (C), Dr. RAMESH R. MANZA (D) & Dr. ASHOK NARAYAN PATIL (E).
A & B: Department of Computer Science, P.S.G.V.P. Mandal's Arts, Commerce and Science College, Shahada.
C: Asst. Professor, Department of Computer Science, B.P. Arts, S.M.A. Science and K.K.C. Commerce College, Chalisgaon.
D: Asst. Professor, Department of Computer Science, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad.
E: Principal, Vasantrao Naik College, Shahada.
Abstract:
In this paper, we critically review several papers on dynamic databases. The goal is to achieve the best data cleaning, data selection and data transformation. The data mining process as a whole is one step in the Knowledge Discovery Process and is required for recognizing patterns and building models from the database. In some cases the problem is known and correct data is available, but problems may still occur because of duplicate, missing or incorrect files.
Several researchers have applied data mining to dynamic databases, among them Hebah H. O. Nasereddin, Vijay Raghavan, Alaaeldin Hafez, Fernando Crespo, Richard Weber, Yi Wang, Shi-Xia Liu and Jianhua Feng. They used various methods, such as fuzzy clustering and customer segmentation, within a dynamic data mining process. Different methods have been programmed to manage real data from database systems accurately. These techniques have been applied in research fields that have grown with the expansion of both computer hardware and software. Data mining helps organizations make full use of the data stored in their databases to support decision making, and this is true for all fields.
Keywords: Customer segmentation, Fuzzy clustering, Knowledge Discovery Process (KDP).
INTRODUCTION
Data mining is the task of discovering interesting and hidden patterns from large amounts of data, where the data can be stored in databases, data warehouses, OLAP (On-Line Analytical Processing) systems or other information repositories. It is also referred to as Knowledge Discovery in Databases (KDD). Data mining is the main process of discovering meaningful patterns and relationships that lie hidden within very large databases. It has also been defined as the analysis of observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. Data mining is part of a process called KDD. This process consists basically of steps that are performed before carrying out data mining, such as data selection, data cleaning, pre-processing and data transformation.
The major components of a data mining system are the following: a database, data warehouse or other information repository; a server, which is responsible for fetching the relevant data based on the user's data mining request; a knowledge base, which is used to guide the search for information in the database; a data mining engine, which consists of a set of functional modules; a pattern evaluation module, which interacts with the data mining modules so as to focus the search towards interesting patterns; and a graphical user interface, which connects users and the data mining system and allows the user to interact with the system. Data mining techniques can discover information that many traditional business analysis and statistical techniques fail to deliver. A management information system should provide advanced capabilities that give the user the power to ask more sophisticated and pertinent questions. It empowers the right people by providing the specific information they need.
Hebah H. O. Nasereddin and Vijay Raghavan propose an approach that dynamically updates knowledge obtained from the previous data mining process. Transactions over a long duration are divided into a set of consecutive episodes. In their approach, information gained during the current episode depends on the current set of transactions and on the information discovered during the last episode. They address the discovery of data mining rules in this dynamic setting; discovering dependencies among values of an attribute is an important research area. The problem of association mining, also referred to as the market basket problem, is defined as follows.
Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of items and $S = \{s_1, s_2, \ldots, s_m\}$ be a set of transactions, where each transaction $s_i \in S$ is a set of items, that is, $s_i \subseteq I$. An association rule, denoted by $X \Rightarrow Y$ with $X, Y \subset I$ and $X \cap Y = \emptyset$, describes the existence of a relationship between the two item sets $X$ and $Y$. Several measures have been introduced to define the strength of the relationship between item sets $X$ and $Y$, such as SUPPORT, CONFIDENCE and INTEREST [1,2,5,7]. The definitions of the measures, from a probabilistic viewpoint, are given below.
I. $\mathrm{SUPPORT}(X \Rightarrow Y) = P(X, Y)$, or the percentage of transactions in the database that contain both $X$ and $Y$.
II. $\mathrm{CONFIDENCE}(X \Rightarrow Y) = P(X, Y) / P(X)$, or the percentage of transactions containing $Y$ among those transactions containing $X$.
III. $\mathrm{INTEREST}(X \Rightarrow Y) = P(X, Y) / (P(X)\,P(Y))$, which represents a test of statistical independence.
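The three measures can be illustrated with a minimal Python sketch. It assumes that transactions are held in memory as a list of sets and that probabilities are estimated as relative frequencies; the function names and the toy data in the usage note are hypothetical and do not come from the reviewed papers.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]], x: Iterable[str], y: Iterable[str]) -> float:
    """Estimate of P(X, Y) / P(X) for the rule X => Y."""
    x, y = frozenset(x), frozenset(y)
    p_x = support(transactions, x)
    if p_x == 0:
        raise ValueError("X never occurs, so confidence is undefined")
    return support(transactions, x | y) / p_x


def interest(transactions: List[Set[str]], x: Iterable[str], y: Iterable[str]) -> float:
    """Estimate of P(X, Y) / (P(X) P(Y)); a value of 1 suggests independence."""
    x, y = frozenset(x), frozenset(y)
    p_x = support(transactions, x)
    p_y = support(transactions, y)
    if p_x == 0 or p_y == 0:
        raise ValueError("X or Y never occurs, so interest is undefined")
    return support(transactions, x | y) / (p_x * p_y)
```

For example, with the five toy baskets `[{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}, {"bread", "milk"}]`, the rule bread => milk has support 3/5, confidence 3/4 and interest just below 1, which indicates that the two items are close to independent in this toy data.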
SUPPORT for an item set $S$ is calculated as
$$\mathrm{SUPPORT}(S) = \frac{F(S)}{F}$$
where $F(S)$ is the number of transactions having $S$, and $F$ is the total number of transactions.
For a minimum SUPPORT value MINSUP, $S$ is a large (or frequent) item set if $\mathrm{SUPPORT}(S) \ge \mathrm{MINSUP}$, or $F(S) \ge F \cdot \mathrm{MINSUP}$.
Suppose we have divided the transaction set $T$ into two subsets $T_1$ and $T_2$, corresponding to two consecutive time intervals, where $F_1$ is the number of transactions in $T_1$ and $F_2$ is the number of transactions in $T_2$ ($F = F_1 + F_2$), and $F_1(S)$ is the number of transactions having $S$ in $T_1$ and $F_2(S)$ is the number of transactions having $S$ in $T_2$ ($F(S) = F_1(S) + F_2(S)$). By calculating the SUPPORT of $S$ in each of the two subsets, we get
$$\mathrm{SUPPORT}_1(S) = \frac{F_1(S)}{F_1} \quad \text{and} \quad \mathrm{SUPPORT}_2(S) = \frac{F_2(S)}{F_2}$$
$S$ is a large itemset if
$$\frac{F_1(S) + F_2(S)}{F_1 + F_2} \ge \mathrm{MINSUP},$$
or
$$F_1(S) + F_2(S) \ge (F_1 + F_2) \cdot \mathrm{MINSUP}$$
In order to find out if $S$ is a large itemset or not, we consider four cases:
1. $S$ is a large itemset in $T_1$ and also a large itemset in $T_2$, i.e., $F_1(S) \ge F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) \ge F_2 \cdot \mathrm{MINSUP}$.
2. $S$ is a large itemset in $T_1$ but a small itemset in $T_2$, i.e., $F_1(S) \ge F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) < F_2 \cdot \mathrm{MINSUP}$.
3. $S$ is a small itemset in $T_1$ but a large itemset in $T_2$, i.e., $F_1(S) < F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) \ge F_2 \cdot \mathrm{MINSUP}$.
4. $S$ is a small itemset in $T_1$ and also a small itemset in $T_2$, i.e., $F_1(S) < F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) < F_2 \cdot \mathrm{MINSUP}$.
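In the first and fourth cases, $S$ is a large itemset and a small itemset in transaction set $T$, respectively, while in the second and third cases it is not clear whether $S$ is a small itemset or a large itemset. Formally speaking, let $\mathrm{SUPPORT}(S) = \mathrm{MINSUP} + \delta$, where $\delta \ge 0$ if $S$ is a large itemset, and $\delta < 0$ if $S$ is a small itemset. With $\delta_1$ and $\delta_2$ defined in this way for $T_1$ and $T_2$, the above four cases have the following characteristics:
1. $\delta_1 \ge 0$ and $\delta_2 \ge 0$
2. $\delta_1 \ge 0$ and $\delta_2 < 0$
3. $\delta_1 < 0$ and $\delta_2 \ge 0$
4. $\delta_1 < 0$ and $\delta_2 < 0$
$S$ is a large itemset if
$$\frac{F_1 (\mathrm{MINSUP} + \delta_1) + F_2 (\mathrm{MINSUP} + \delta_2)}{F_1 + F_2} \ge \mathrm{MINSUP},$$
or
$$F_1 (\mathrm{MINSUP} + \delta_1) + F_2 (\mathrm{MINSUP} + \delta_2) \ge \mathrm{MINSUP} \cdot (F_1 + F_2)$$
This can be written as $F_1 \delta_1 + F_2 \delta_2 \ge 0$.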
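As a worked illustration of this condition, with $\delta_i = \mathrm{SUPPORT}_i(S) - \mathrm{MINSUP}$, consider the second case (large in $T_1$, small in $T_2$). The counts below are assumed purely for illustration and are not taken from the reviewed papers: $F_1 = 1000$, $F_2 = 500$, $F_1(S) = 130$, $F_2(S) = 40$ and $\mathrm{MINSUP} = 0.10$.

```latex
\begin{align*}
\delta_1 &= \frac{130}{1000} - 0.10 = 0.03, \qquad
\delta_2 = \frac{40}{500} - 0.10 = -0.02, \\
F_1 \delta_1 + F_2 \delta_2 &= 1000 \cdot 0.03 + 500 \cdot (-0.02) = 30 - 10 = 20 \ge 0, \\
\mathrm{SUPPORT}(S) &= \frac{130 + 40}{1000 + 500} \approx 0.113 \ge 0.10 .
\end{align*}
```

Under these assumed counts the surplus in the first interval outweighs the deficit in the second, so $S$ is a large itemset over the whole transaction set, and the direct SUPPORT calculation agrees.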
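Generally, let the transaction set $T$ be divided into $n$ transaction subsets $T_i$, $1 \le i \le n$. $S$ is a large itemset if
$$\sum_{i=1}^{n} F_i \delta_i \ge 0,$$
where $F_i$ is the number of transactions in $T_i$ and $\delta_i = \mathrm{SUPPORT}_i(S) - \mathrm{MINSUP}$, with $-\mathrm{MINSUP} \le \delta_i \le 1 - \mathrm{MINSUP}$, $1 \le i \le n$.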
For those cases where $\sum_{i=1}^{n} F_i \delta_i < 0$, there are two options: either discard $S$ as a large itemset (a small itemset with no history record maintained), or keep it for future calculations (a small itemset with history record maintained). In the latter case, $S$ is not reported as a large itemset, but its $\sum_{i=1}^{n} F_i \delta_i$ value is maintained and checked through the future intervals. In their paper, the authors introduce a Dynamic Data Mining approach. The proposed approach periodically performs the data mining process on data updates during the current episode and uses the knowledge captured in the previous episode to produce data mining rules.
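The history record described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the reviewed authors: the class name `ItemsetHistory` and its methods are hypothetical, and each term $F_i \delta_i$ is computed in the algebraically equivalent form $F_i(S) - F_i \cdot \mathrm{MINSUP}$.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ItemsetHistory:
    """History record for one itemset S across consecutive episodes."""

    minsup: float
    # One (F_i, F_i(S)) pair per episode: transactions seen, transactions containing S.
    episodes: List[Tuple[int, int]] = field(default_factory=list)

    def add_episode(self, n_transactions: int, n_containing: int) -> None:
        """Record the counts observed during one more episode."""
        if n_transactions <= 0 or not 0 <= n_containing <= n_transactions:
            raise ValueError("invalid episode counts")
        self.episodes.append((n_transactions, n_containing))

    def weighted_delta(self) -> float:
        """Sum of F_i * delta_i, computed as F_i(S) - F_i * MINSUP per episode."""
        return sum(f_s - f_i * self.minsup for f_i, f_s in self.episodes)

    def is_large(self) -> bool:
        """S is a large itemset over all recorded episodes if the sum is >= 0."""
        return self.weighted_delta() >= 0
```

For instance, with `minsup=0.10`, an episode of 1000 transactions of which 80 contain S leaves the itemset small (sum of -20); if the record is kept and a later episode of 500 transactions contains S 90 times, the sum rises to +20 and S becomes large. Discarding the record after the first episode would have lost that information, which is the trade-off between the two options.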
Fernando Crespo and Richard Weber show that dynamic data mining is attracting increasing attention from the research community. Users of installed data mining systems are also interested in the related techniques, and will be even more so since most of these installations will need to be updated in the future. Data mining is part of an interactive process called KDD (Knowledge Discovery in Databases). This process consists basically of steps that are performed before doing data mining, such as selection, pre-processing and transformation of data. If future behaviour is very similar to past behaviour, continuing to use the initial data mining system could be justified. Otherwise the system has to be updated, and this is where dynamic data mining, a new research area, comes in. Three strategies can be distinguished. First, the user neglects changes in the environment and keeps on applying the initial system without any updating. This does not require changes in subsequent processes, such as the design of marketing campaigns for customer segments, but its disadvantage is that current tendencies cannot be detected. Second, every certain period, which depends on the particular application, a new system is developed using all the available data. Third, based on the initial system and the new data, an update of the classifier is performed.
Recent developments in dynamic data mining touch several areas of data mining. Various methods have been developed in order to find useful information in a set of data; among the most important are decision trees, neural networks, association rules and clustering methods. For each of these methods, updating has different aspects, and several updating approaches have been proposed. Crespo and Weber present a methodology for dynamic data mining using fuzzy clustering that assigns static objects to dynamic classes, that is, classes whose structure changes over time. It starts with a given classifier and a set of new objects, that is, objects that appeared after the creation of the classifier. The period between the creation of a classifier and its update is called a cycle. The length of such a cycle depends on the particular application: we may want to update the buying behaviour of customers in a supermarket once a year, whereas a system for dynamic machine monitoring should be updated every 5 minutes.
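To make the idea of updating a fuzzy classifier with new objects concrete, the following Python sketch shows one update step of standard fuzzy c-means. It is an assumption-laden illustration only: the reviewed methodology also deals with changes in the class structure itself, which is not covered here, and the function names and the fuzzifier default `m=2.0` are hypothetical choices.

```python
import math
from typing import List, Sequence


def fuzzy_memberships(point: Sequence[float],
                      centers: List[Sequence[float]],
                      m: float = 2.0) -> List[float]:
    """Standard fuzzy c-means membership of one object in each class."""
    dists = [math.dist(point, c) for c in centers]
    zero = [i for i, d in enumerate(dists) if d == 0.0]
    if zero:
        # The object coincides with one or more centers: share full membership.
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** -exponent for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def update_centers(points: List[Sequence[float]],
                   centers: List[Sequence[float]],
                   m: float = 2.0) -> List[List[float]]:
    """One cycle: recompute each class center from membership-weighted objects."""
    memberships = [fuzzy_memberships(p, centers, m) for p in points]
    dim = len(centers[0])
    new_centers = []
    for j in range(len(centers)):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        if total == 0.0:
            # No object supports this class; keep its previous center.
            new_centers.append(list(centers[j]))
            continue
        new_centers.append([
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ])
    return new_centers
```

Calling `update_centers` once per cycle with the objects that arrived since the last update moves each class center towards the new data, while the membership values indicate how strongly each new object belongs to each existing class.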
Yi Wang, Shi-Xia Liu and their colleagues, in the paper "Mining Naturally Smooth Evolution of Clusters from Dynamic Data", note that many clustering algorithms have been proposed to partition a set of static data points into groups. They consider an evolutionary clustering problem in which the input data points may move, disappear and emerge. These changes should result in a smooth evolution of the clusters. Mining this naturally smooth evolution is valuable for providing an aggregated view of the numerous individual behaviours. They solve this novel and generalized form of the clustering problem by converting it into a Bayesian learning problem, analogous to the way the EM clustering algorithm converts the problem of clustering a static data set, say X, into learning a Gaussian mixture model of X. By utilizing characteristics of the evolutionary clustering problem, they derive a new unsupervised learning algorithm that is more efficient than the algorithms used to learn traditional variable-duration hidden semi-Markov models (HSMMs). Because the HSMM models the probabilistic relationship between the dynamic data set and the corresponding evolving clusters, the learned parameters can be interpreted intuitively as the evolving clusters using the Viterbi filtering technique, since learning an HSMM is in fact learning an optimal Viterbi filter. They evaluate the effectiveness of this method with experiments on both synthetic data and real data.
In this paper they capture the coherence of the data over $1 \le t \le T$ by modelling the underlying stochastic process that generates X with a hidden semi-Markov model (HSMM). Just as the EM clustering algorithm clusters static data points by learning a Gaussian mixture model, their method mines the evolution of the clusters from dynamic data points by learning an HSMM. They model the output probability density function (pdf) of each hidden state of the HSMM by a Gaussian mixture model, which describes the clusters of an $X_t \in X$. By utilizing characteristics of the evolutionary clustering problem, they derive a new unsupervised learning algorithm that is well suited to describing the data from dynamic databases.
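The static building block mentioned above, EM clustering by learning a Gaussian mixture model, can be sketched in a few lines of Python. This is only the static, one-dimensional case under simplifying assumptions (initial means supplied by the caller, unit initial variances, a fixed number of iterations); it does not implement the HSMM learning algorithm of the reviewed paper, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data: Sequence[float],
              init_means: Sequence[float],
              n_iter: int = 50,
              min_var: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, mu, v)
                 for w, mu, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pi / total for pi in p] if total > 0.0 else [1.0 / k] * k)
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            if n_j == 0.0:
                continue  # component received no mass; leave it unchanged
            weights[j] = n_j / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / n_j
            variances[j] = max(var, min_var)  # guard against collapse to zero
    return weights, means, variances
```

Each fitted component plays the role of one cluster: its mean locates the cluster and its weight gives the share of data points assigned to it. In the dynamic setting reviewed above, a mixture of this kind describes the clusters at a single time step, while the HSMM links the time steps together.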
CONCLUSION
In this paper we have critically reviewed work on dynamic databases. The researchers discussed above have applied data mining to dynamic databases, and they have certainly contributed a lot to the development of dynamic database techniques. However, there is still a need to overcome the demerits of this research. From our analysis, the paper of Hebah H. O. Nasereddin and Vijay Raghavan, which proposes an approach that dynamically updates knowledge obtained from the previous data mining process, is good because of its use of the KDD technique. A limitation of their study is that its suitability for dynamic databases in general remains to be established. As future work, the approach will be tested with different datasets that cover a large spectrum of data mining applications, such as web site access analysis for improvements in e-commerce advertising, fraud detection, screening and investigation, retail site product analysis and customer segmentation. From the analysis of Fernando Crespo and Richard Weber we conclude that they presented a methodology for dynamic data mining based on fuzzy clustering, which allows updates of the underlying classifier. Other fuzzy or possibilistic clustering techniques can be used as well. Considering the work of Yi Wang, Shi-Xia Liu and their colleagues, they proposed to solve a novel and interesting clustering problem. This problem is totally different from the dynamic clustering problems studied so far. They solve the problem by converting it into a Bayesian learning problem, and they also describe accompanying visualization methods to present the mined smooth evolution intuitively and comprehensively.
REFERENCES
[1] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between Sets of Items in Large Databases," Proc. of the ACM SIGMOD Int'l Conf. on Management of Data, May 1993.
[2] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. of the 20th VLDB Conference, Santiago, Chile, 1994.
[3] R. Agrawal and J. Shafer, "Parallel Mining of Association Rules," IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, Dec. 1996.
[4] C. Aggarwal and P. Yu, "Mining Large Itemsets for Association Rules," Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 1997.
[5] S. Brin, R. Motwani, et al., "Dynamic Itemset Counting and Implication Rules for Market Basket Data," SIGMOD Record (ACM Special Interest Group on Management of Data), 26(2), 1997.
[6] S. Chaudhuri, "Data Mining and Database Systems: Where is the Intersection," Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 1997.
[7] M. Chen, J. Han, and P. Yu, "Data Mining: An Overview from a Database Perspective," IEEE Trans. Knowledge and Data Engineering, 8, 1996.
[8] M. Chen, J. Park, and P. Yu, "Data Mining for Path Traversal Patterns in a Web Environment," Proc. 16th Intl. Conf. Distributed Computing Systems, May 1996.
[9] D. Cheung, J. Han, et al., "Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique," Proc. 12th Intl. Conf. on Data Engineering, New Orleans, Louisiana, 1996.
[10] U. Fayyad, G. Piatetsky-Shapiro, et al., "Advances in Knowledge Discovery and Data Mining," AAAI/MIT Press, 1996.
[11] A. Hafez, J. Deogun, and V. Raghavan, "The Item-Set Tree: A Data Structure for Data Mining," DaWaK '99 Conference, Firenze, Italy, Aug. 1999.
[12] C. Kurzke, M. Galle, and M. Bathelt, "WebAssist: A User Profile Specific Information Retrieval Assistant," Seventh International World Wide Web Conference, Brisbane, Australia, April 1998.
[13] M. Langheinrich, A. Nakamura, et al., "Un-intrusive Customization Techniques for Web Advertising," The Eighth International World Wide Web Conference, Toronto, Canada, May 1999.
[14] H. Mannila, H. Toivonen, and A. Verkamo, "Efficient Algorithms for Discovering Association Rules," AAAI Workshop on Knowledge Discovery in Databases (KDD-94), July 1994.
[15] M. Perkowitz and O. Etzioni, "Adaptive Sites: Automatically Learning from User Access Patterns," Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.
[16] J. Pitkow, "In Search of Reliable Usage Data on the WWW," Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.
[17] G. Rossi, D. Schwabe, and F. Lyardet, "Improving Web Information Systems with Navigational Patterns," The Eighth International World Wide Web Conference, Toronto, Canada, May 1999.
[18] N. Serbedzija, "The Web Supercomputing Environment," Seventh International World Wide Web Conference, Brisbane, Australia, April 1998.
[19] T. Sullivan, "Reading Reader Reaction: A Proposal for Inferential Analysis of Web Server Log Files," Proc. 3rd Conf. Human Factors & The Web, Denver, Colorado, June 1997.
[20] C. Wills and M. Mikhailov, "Towards a Better Understanding of Web Resources and Server Responses for Improved Caching," The Eighth International World Wide Web Conference, Toronto, Canada, May 1999.
[21] M. Zaki, S. Parthasarathy, et al., "New Algorithms for Fast Discovery of Association Rules," Proc. of the 3rd Int'l Conf. on Knowledge Discovery and Data Mining (KDD-97), AAAI Press, 1997.
[22] C.M. Antunes and A.L. Oliveira, "Temporal Data Mining: An Overview," Workshop on Temporal Data Mining (KDD2001), San Francisco, September 2001.
[23] M. Bastian, H. Kirschfink, and R. Weber, "TRIP: Automatic Traffic State Identification and Prediction as Basis for Improved Traffic Management Services," Proc. of the Second Workshop on Information Technology, Cooperative Research between Chile and Germany, 15-17 January 2001, Berlin, Germany.
[24] J.C. Bezdek, J. Keller, R. Krishnapuram, and N.R. Pal, "Fuzzy Models and Algorithms for Pattern Recognition and Image Processing," Kluwer, Boston, London, Dordrecht, 1999.
[25] M. Black and R.J. Hickey, "Maintaining the Performance of a Learned Classifier under Concept Drift," Intell. Data Anal. 3(6), 1999, pp. 453-474.
[26] R.O. Duda, P.E. Hart, and D.G. Stork, "Pattern Classification," 2nd Edition, Wiley, New York, Chichester, 2001.