Prof Shaikh and Anis 2012 Pies PDF


    Determine Data Mining Using Dynamic Data Base.

    Mr. ABDUL JABBAR SHAIKH AZAD (A), Mr. ANIS MAYODDIN KURESHI (B), Prof. DEELIP D. JOSHI (C), Dr. RAMESH R. MANZA (D) & Dr. ASHOK NARAYAN PATIL (E).

    A & B: Department of Computer Science, P.S.G.V.P. Mandal's Arts, Commerce and Science College, Shahada.
    C: Asst. Professor, Department of Computer Science, B.P. Arts, S.M.A. Science and K.K.C. Commerce College, Chalisgaon.
    D: Asst. Professor, Department of Computer Science, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad.
    E: Principal, Vasantrao Naik College, Shahada.


    Abstract:

    In this paper, we present a critical review of several papers on dynamic databases. The goal is to achieve the best data cleaning, data selection and data transformation. The overall data mining process is a step in the Knowledge Discovery Process and is required for recognizing patterns and building models from the database. In some cases the problem is known and correct data is available, yet problems may still arise from duplicate, missing or incorrect data.

    Several researchers have applied data mining to dynamic databases, among them Hebah H. O. Nasereddin, Vijay Raghavan, Alaaeldin Hafez, Fernando Crespo, Richard Weber, Yi Wang, Shi-Xia Liu and Jianhua Feng. They used methods such as fuzzy clustering, customer segmentation and dynamic data mining processes. Different methods have been implemented to manage real data from database systems accurately, and these techniques have spread to many research fields owing to the growth of both computer hardware and software. Data mining helps organizations make full use of the data stored in their databases for decision making, and this holds in every field.

    Keywords: Customer segmentation, Fuzzy clustering, Knowledge Discovery Process (KDP).

    INTRODUCTION

    Data mining is the task of discovering interesting and hidden patterns from large amounts of data, where the data can be stored in databases, data warehouses, OLAP (Online Analytical Processing) systems or other information repositories. It is sometimes also treated as a synonym for Knowledge Discovery in Databases (KDD). Data mining is the main process of discovering meaningful patterns and relationships that lie hidden within very large databases. It has also been defined as the analysis of observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. More precisely, data mining is one part of the process called KDD, which consists basically of steps performed before carrying out data mining, such as data selection, data cleaning, pre-processing and data transformation.
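The pre-processing steps that KDD performs before data mining (selection, cleaning, transformation) can be sketched in Python as follows. This is a minimal illustration under assumptions of our own: records are plain dictionaries with hypothetical `customer`, `age` and `spend` fields, "incorrect" values are defined by a caller-supplied rule, and the transformation is a simple min-max scaling. None of it is taken from the reviewed papers.

```python
from typing import Callable

Record = dict  # hypothetical record type: field name -> value


def select(records: list[Record], fields: list[str]) -> list[Record]:
    """Data selection: keep only the fields relevant to the mining task."""
    return [{f: r.get(f) for f in fields} for r in records]


def clean(records: list[Record], is_valid: Callable[[Record], bool]) -> list[Record]:
    """Data cleaning: drop records with missing values, invalid values or exact duplicates."""
    seen = set()
    cleaned = []
    for r in records:
        if any(v is None for v in r.values()):
            continue  # missing value
        if not is_valid(r):
            continue  # incorrect value, according to the caller-supplied rule
        key = tuple(sorted(r.items()))
        if key in seen:
            continue  # exact duplicate of a record already kept
        seen.add(key)
        cleaned.append(r)
    return cleaned


def transform(records: list[Record], field: str) -> list[Record]:
    """Data transformation: min-max scale one numeric field into [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]


raw = [
    {"customer": "c1", "age": 34, "spend": 120.0},
    {"customer": "c1", "age": 34, "spend": 120.0},   # duplicate
    {"customer": "c2", "age": None, "spend": 80.0},  # missing value
    {"customer": "c3", "age": -5, "spend": 60.0},    # incorrect value
    {"customer": "c4", "age": 51, "spend": 200.0},
]
selected = select(raw, ["customer", "age", "spend"])
prepared = transform(clean(selected, lambda r: r["age"] >= 0), "spend")
```

Only c1 (once) and c4 survive cleaning, and their `spend` values are rescaled to 0.0 and 1.0. The missing-value check runs before the validity rule so that the rule never sees a `None`.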

    The major components of a data mining system are as follows: a database, data warehouse or other information repository; a server, which is responsible for fetching the relevant data based on the user's data mining request; a knowledge base, which is used to guide the search for information in the database; a data mining engine, which consists of a set of functional modules; a pattern evaluation module, which interacts with the data mining modules so as to focus the search towards interesting patterns; and a graphical user interface, which mediates between users and the data mining system and allows the user to interact with the system. Data mining techniques can discover information that many traditional business analysis and statistical techniques fail to deliver. A management information system should provide advanced capabilities that give users the power to ask more sophisticated and pertinent questions; in this way it empowers the right people by providing the specific information they need.

    Hebah H. O. Nasereddin and Vijay Raghavan propose an approach that dynamically updates the knowledge obtained from the previous data mining process. Transactions over a long duration are divided into a set of consecutive episodes. In their approach, the information gained during the current episode depends on the current set of transactions and on the information discovered from the database in the last episode. They argue that discovering current data mining rules in dynamic data mining, in particular rules that capture dependencies among the values of an attribute, is an important research area. The problem of association mining, also referred to as the market basket problem, is defined as follows.

    Let $I = \{i_1, i_2, \dots, i_n\}$ be a set of items and $T = \{t_1, t_2, \dots, t_m\}$ be a set of transactions, where each transaction $t_j \in T$ is a set of items, that is, $t_j \subseteq I$. An association rule, denoted by $X \Rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$, describes the existence of a relationship between the two itemsets X and Y. Several measures have been introduced to define the strength of the relationship between itemsets X and Y, such as SUPPORT, CONFIDENCE and INTEREST [1,2,5,7]. The definitions of these measures, from a probabilistic viewpoint, are given below.

    I. $\mathrm{SUPPORT}(X \Rightarrow Y) = P(X, Y)$, or the percentage of transactions in the database that contain both X and Y.

    II. $\mathrm{CONFIDENCE}(X \Rightarrow Y) = P(X, Y) / P(X)$, or the percentage of transactions containing Y among those transactions containing X.

    III. $\mathrm{INTEREST}(X \Rightarrow Y) = P(X, Y) / \big(P(X)\,P(Y)\big)$, which represents a test of statistical independence.
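The three measures can be computed directly from their probabilistic definitions. In this minimal sketch (our own illustration, not code from the reviewed papers), each probability is estimated as the fraction of transactions containing an itemset, and the four-basket data set is hypothetical.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def prob(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def support(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    return prob(x | y, transactions)  # P(X, Y): transactions containing both X and Y


def confidence(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    # P(X, Y) / P(X); undefined when no transaction contains X
    return support(x, y, transactions) / prob(x, transactions)


def interest(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    # P(X, Y) / (P(X) P(Y)); a value of 1 means X and Y occur independently
    return support(x, y, transactions) / (prob(x, transactions) * prob(y, transactions))


baskets = [frozenset(t) for t in (
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"},
)]
bread, milk = frozenset({"bread"}), frozenset({"milk"})
```

For the rule bread ⇒ milk on these baskets, support is 0.5, confidence is 2/3 and interest is 8/9; an interest below 1 indicates that the two items occur together slightly less often than independence would predict.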

    SUPPORT for an itemset S is calculated as $\mathrm{SUPPORT}(S) = \frac{F(S)}{F}$, where $F(S)$ is the number of transactions having S, and $F$ is the total number of transactions.

    For a minimum SUPPORT value MINSUP, S is a large (or frequent) itemset if $\mathrm{SUPPORT}(S) \ge \mathrm{MINSUP}$, or $F(S) \ge F \cdot \mathrm{MINSUP}$.
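Finding every large itemset can be illustrated by brute force: count F(S) for each candidate S and keep it when F(S) ≥ F·MINSUP. This is a hypothetical sketch for small inputs only; it enumerates all combinations of up to `max_size` items, which is exponential in the number of items, whereas algorithms such as Apriori [2] prune candidates having a subset that is already known to be small. Passing MINSUP as a `Fraction` is our own choice, so that the comparison F(S) ≥ F·MINSUP is exact.

```python
from fractions import Fraction
from itertools import combinations


def count(itemset: frozenset, transactions: list) -> int:
    """F(S): number of transactions that contain every item of S."""
    return sum(1 for t in transactions if itemset <= t)


def large_itemsets(transactions: list, minsup: Fraction, max_size: int = 2) -> dict:
    """Brute-force search for itemsets S (up to max_size items) with F(S) >= F * MINSUP."""
    items = sorted(set().union(*transactions))
    f = len(transactions)
    large = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            f_s = count(s, transactions)
            if f_s >= f * minsup:  # exact rational comparison, no rounding error
                large[s] = f_s
    return large


baskets = [
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"},
]
result = large_itemsets(baskets, Fraction(1, 2))
```

With MINSUP = 1/2 on the four baskets, the threshold is F·MINSUP = 2, so {bread}, {milk}, {butter}, {bread, milk} and {bread, butter} are large, while {milk, butter} (one transaction) is small.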

    Suppose we have divided the transaction set T into two subsets $T_1$ and $T_2$, corresponding to two consecutive time intervals, where $F_1$ is the number of transactions in $T_1$ and $F_2$ is the number of transactions in $T_2$ ($F = F_1 + F_2$), and $F_1(S)$ is the number of transactions having S in $T_1$ and $F_2(S)$ is the number of transactions having S in $T_2$ ($F(S) = F_1(S) + F_2(S)$). By calculating the SUPPORT of S in each of the two subsets, we get

    $\mathrm{SUPPORT}_1(S) = \frac{F_1(S)}{F_1}$ and $\mathrm{SUPPORT}_2(S) = \frac{F_2(S)}{F_2}$.

    S is a large itemset if $\frac{F_1(S) + F_2(S)}{F_1 + F_2} \ge \mathrm{MINSUP}$, or $F_1(S) + F_2(S) \ge (F_1 + F_2) \cdot \mathrm{MINSUP}$.
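A small worked example shows why per-interval results alone do not settle the question. The counts below are hypothetical, chosen only for illustration: S is large in the first interval, small in the second, and small over the whole transaction set.

$$
\begin{aligned}
&F_1 = 100,\quad F_1(S) = 30,\quad F_2 = 400,\quad F_2(S) = 60,\quad \mathrm{MINSUP} = 0.2\\
&\mathrm{SUPPORT}_1(S) = 30/100 = 0.30 \ge 0.2 \quad (\text{large in } T_1)\\
&\mathrm{SUPPORT}_2(S) = 60/400 = 0.15 < 0.2 \quad (\text{small in } T_2)\\
&F_1(S) + F_2(S) = 90 < (F_1 + F_2) \cdot \mathrm{MINSUP} = 500 \cdot 0.2 = 100 \quad (\text{small in } T)
\end{aligned}
$$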

    In order to find out whether S is a large itemset or not, we consider four cases:

    I. S is a large itemset in $T_1$ and also a large itemset in $T_2$, i.e., $F_1(S) \ge F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) \ge F_2 \cdot \mathrm{MINSUP}$.

    II. S is a large itemset in $T_1$ but a small itemset in $T_2$, i.e., $F_1(S) \ge F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) < F_2 \cdot \mathrm{MINSUP}$.

    III. S is a small itemset in $T_1$ but a large itemset in $T_2$, i.e., $F_1(S) < F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) \ge F_2 \cdot \mathrm{MINSUP}$.

    IV. S is a small itemset in $T_1$ and also a small itemset in $T_2$, i.e., $F_1(S) < F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) < F_2 \cdot \mathrm{MINSUP}$.
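The four cases, and the fact that the two mixed cases can go either way, can be checked with a small function. It is a hypothetical helper written for this review, not code from Nasereddin and Raghavan: it returns the case number (1 to 4, in the order listed above) together with the decision made on the combined counts $F_1(S) + F_2(S)$ against $(F_1 + F_2) \cdot \mathrm{MINSUP}$.

```python
from fractions import Fraction


def classify(f1_s: int, f1: int, f2_s: int, f2: int, minsup: Fraction) -> tuple:
    """Return (case number 1-4, whether S is large in T = T1 + T2)."""
    large1 = f1_s >= f1 * minsup  # F1(S) >= F1 * MINSUP
    large2 = f2_s >= f2 * minsup  # F2(S) >= F2 * MINSUP
    case = {(True, True): 1, (True, False): 2,
            (False, True): 3, (False, False): 4}[(large1, large2)]
    combined = f1_s + f2_s >= (f1 + f2) * minsup  # decision over the whole set T
    return case, combined
```

For example, with MINSUP = 1/5, `classify(30, 100, 60, 400, Fraction(1, 5))` gives `(2, False)`, while `classify(50, 100, 60, 400, Fraction(1, 5))` gives `(2, True)`: both fall in the second case, and only the combined counts decide.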

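    In the first and fourth cases, S is a large itemset and a small itemset in the transaction set T, respectively, while in the second and third cases it is not clear whether S is a small or a large itemset. Formally speaking, let $\mathrm{SUPPORT}(S) = \mathrm{MINSUP} + \delta$, where $\delta \ge 0$ if S is a large itemset and $\delta < 0$ if S is a small itemset, and let $\delta_1$ and $\delta_2$ be the corresponding values for $\mathrm{SUPPORT}_1(S)$ and $\mathrm{SUPPORT}_2(S)$. The four cases then correspond, in order, to: $\delta_1 \ge 0$ and $\delta_2 \ge 0$; $\delta_1 \ge 0$ and $\delta_2 < 0$; $\delta_1 < 0$ and $\delta_2 \ge 0$; $\delta_1 < 0$ and $\delta_2 < 0$.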

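    Since $F_i(S) = F_i \cdot (\mathrm{MINSUP} + \delta_i)$, S is a large itemset if $\frac{F_1(\mathrm{MINSUP} + \delta_1) + F_2(\mathrm{MINSUP} + \delta_2)}{F_1 + F_2} \ge \mathrm{MINSUP}$, or $F_1(\mathrm{MINSUP} + \delta_1) + F_2(\mathrm{MINSUP} + \delta_2) \ge (F_1 + F_2) \cdot \mathrm{MINSUP}$. Expanding the left-hand side and cancelling $(F_1 + F_2) \cdot \mathrm{MINSUP}$ from both sides, this can be written as $F_1 \delta_1 + F_2 \delta_2 \ge 0$.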

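    Generally, let the transaction set T be divided into n transaction subsets $T_i$, $1 \le i \le n$. S is a large itemset if $\sum_{i=1}^{n} F_i \delta_i \ge 0$, where $F_i$ is the number of transactions in $T_i$ and $\delta_i = \mathrm{SUPPORT}_i(S) - \mathrm{MINSUP}$, $1 \le i \le n$. Since $0 \le \mathrm{SUPPORT}_i(S) \le 1$, it follows that $-\mathrm{MINSUP} \le \delta_i \le 1 - \mathrm{MINSUP}$, $1 \le i \le n$.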

    For those cases where $\sum_{i=1}^{n} F_i \delta_i < 0$, there are two options: either discard S as a large itemset (a small itemset with no history record maintained), or keep it for future calculations (a small itemset with history record maintained). In the latter case, S is not reported as a large itemset, but its value of $\sum_{i=1}^{n} F_i \delta_i$ is maintained and checked in future intervals. In this way, Nasereddin and Raghavan introduce a dynamic data mining approach that periodically performs the data mining process on the data updates of the current episode and uses the knowledge captured in the previous episode to produce data mining rules.
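The episode-by-episode bookkeeping described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: it counts every itemset of up to `max_size` items in each episode, adds $F_i(S) - F_i \cdot \mathrm{MINSUP}$ (which equals $F_i \delta_i$) to a running total per itemset, and reports S as large while the total is non-negative. The `keep_history` flag switches between the two options for small itemsets.

```python
from fractions import Fraction
from itertools import combinations


class DynamicItemsetMiner:
    """Hypothetical sketch: keeps, per itemset S, the running value of sum_i F_i * delta_i."""

    def __init__(self, minsup: Fraction, max_size: int = 2, keep_history: bool = True):
        self.minsup = minsup
        self.max_size = max_size
        self.keep_history = keep_history
        self.f_prev = 0        # number of transactions in all earlier episodes
        self.score: dict = {}  # itemset -> sum of F_i * delta_i so far

    def process_episode(self, transactions: list) -> set:
        f_i = len(transactions)
        counts: dict = {}
        for t in transactions:  # F_i(S) for every itemset of up to max_size items
            for k in range(1, self.max_size + 1):
                for combo in combinations(sorted(t), k):
                    s = frozenset(combo)
                    counts[s] = counts.get(s, 0) + 1
        for s in set(counts) | set(self.score):
            # An itemset without a record is treated as absent from all earlier
            # episodes: exact for never-seen itemsets, a lower bound for discarded ones.
            prev = self.score.get(s, -self.f_prev * self.minsup)
            # F_i * delta_i = F_i(S) - F_i * MINSUP
            self.score[s] = prev + counts.get(s, 0) - f_i * self.minsup
        self.f_prev += f_i
        if not self.keep_history:
            # Discard small itemsets: no history record is maintained for them.
            self.score = {s: v for s, v in self.score.items() if v >= 0}
        return {s for s, v in self.score.items() if v >= 0}
```

With MINSUP = 1/2, the episodes [{a, b}, {a}, {a, c}, {b}] and [{b, c}, {c}, {b, c}, {a}] make {c} large overall (4 of 8 transactions) when history is kept, but not when its record from the first episode was discarded, which illustrates the trade-off between the two options.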

    Fernando Crespo and Richard Weber show that dynamic data mining is increasingly attracting attention from the research community. At the same time, users of installed data mining systems are also interested in the related techniques, and will be even more so, since most of these installations will need to be updated in the future. Data mining is part of an interactive process called KDD (Knowledge Discovery in Databases), which consists basically of steps performed before the actual data mining, such as selection, pre-processing and transformation of data. If future behaviour is very similar to past behaviour, using the initial data mining system could be justified; when it is not, dynamic data mining, a new research area, comes in. Several alternatives are possible. The user may neglect changes in the environment and keep applying the initial system without any updating. Alternatively, after a certain period, which depends on the particular application, a new system is developed using all the available data. Finally, an update of the classifier can be performed based on the initial system and the new data. Keeping the initial system does not require changes in subsequent processes, such as the design of marketing campaigns for customer segments; its disadvantage is that current tendencies cannot be detected.

    As for recent developments, various methods have been developed in several areas of data mining to find useful information in a set of data; among the most important are decision trees, neural networks, association rules and clustering methods. For each of these methods, updating has different aspects, and several updating approaches have been proposed.

    Crespo and Weber also present a methodology for dynamic data mining using fuzzy clustering that assigns static objects to dynamic classes, that is, classes whose structure changes over time. It starts with a given classifier and a set of new objects, that is, objects that appeared after the classifier was created. The period between the creation of a classifier and its update is called a cycle. The length of such a cycle depends on the particular application: we may want to update the buying behaviour of customers in a supermarket once a year, whereas a system for dynamic machine monitoring should be updated every 5 minutes.
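One update cycle of a fuzzy-clustering classifier can be sketched with the standard fuzzy c-means membership and center formulas. This is a simplified illustration under our own assumptions: the classifier is represented only by its class centers, the fuzzifier `m = 2`, the iteration count and the data points are hypothetical, and the sketch does not reproduce the full methodology of Crespo and Weber, whose details are not covered in this review. It simply starts from the existing centers and re-estimates them on the old plus the new objects.

```python
import math


def memberships(points, centers, m=2.0):
    """Fuzzy c-means membership u[i][j] of object i in class j (fuzzifier m > 1)."""
    u = []
    for x in points:
        d = [math.dist(x, c) for c in centers]
        zeros = d.count(0.0)
        if zeros:
            # The object coincides with one or more centers: share membership among them.
            row = [1.0 / zeros if dj == 0.0 else 0.0 for dj in d]
        else:
            row = [1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d) for dj in d]
        u.append(row)
    return u


def update_centers(points, u, m=2.0):
    """Class centers as weighted means of the objects, with weights u[i][j] ** m."""
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)  # assumed > 0, i.e. every class has some positive membership
        centers.append([sum(wi * p[k] for wi, p in zip(w, points)) / total
                        for k in range(len(points[0]))])
    return centers


def update_cycle(centers, old_points, new_points, m=2.0, iterations=10):
    """Hypothetical update cycle: start from the existing classifier's centers
    and re-estimate them on the old plus the new objects."""
    points = list(old_points) + list(new_points)
    for _ in range(iterations):
        centers = update_centers(points, memberships(points, centers, m), m)
    return centers, memberships(new_points, centers, m)


old = [(0, 0), (1, 0), (0, 1), (10, 10), (10, 11), (11, 10)]
new = [(12, 12), (12, 13)]
centers, new_u = update_cycle([(0.0, 0.0), (10.0, 10.0)], old, new)
```

Here the new objects pull the second class center from (10, 10) towards them, and both new objects receive a membership close to 1 in that class.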

    Yi Wang, Shi-Xia Liu and their co-authors, in their paper "Mining Naturally Smooth Evolution of Clusters from Dynamic Data", observe that many clustering algorithms have been proposed to partition a set of static data points into groups, and they consider instead an evolutionary clustering problem in which the input data points may move, disappear and emerge. These changes should result in a smooth evolution of the clusters. Mining this naturally smooth evolution is valuable for providing an aggregated view of numerous individual behaviours. They solve this novel and generalized form of clustering problem by converting it into a Bayesian learning problem, just as the EM clustering algorithm converts the problem of clustering a static data set X into learning a Gaussian mixture model of X. By utilizing characteristics of evolutionary clustering problems, they derive a new unsupervised learning algorithm that is more efficient than the algorithms used to learn traditional variable-duration hidden semi-Markov models (HSMMs). Because the HSMM models the probabilistic relationship between the dynamic data set and the corresponding evolving clusters, the learned parameters can be interpreted intuitively as the evolving clusters using Viterbi filtering techniques, since learning an HSMM is in fact learning an optimal Viterbi filter. They evaluate the effectiveness of the method in experiments on both synthetic and real data.
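The Viterbi step mentioned above recovers the most likely sequence of hidden states behind a sequence of observations. The sketch below is a deliberate simplification of our own: it decodes an ordinary hidden Markov model, so it omits the explicit state durations that make an HSMM semi-Markov, and it uses discrete observations. The state names (`c1`, `c2`, read as two cluster configurations) and all probabilities are hypothetical.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence of an ordinary (not semi-Markov) HMM, in log space."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    score = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # best predecessor state r for state s at time t
            r = max(states, key=lambda q: score[t - 1][q] + log(trans_p[q][s]))
            score[t][s] = score[t - 1][r] + log(trans_p[r][s]) + log(emit_p[s][obs[t]])
            back[t][s] = r
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):  # follow the back-pointers to time 0
        path.append(back[t][path[-1]])
    return path[::-1]


states = ("c1", "c2")
start = {"c1": 0.6, "c2": 0.4}
trans = {"c1": {"c1": 0.8, "c2": 0.2}, "c2": {"c1": 0.2, "c2": 0.8}}
emit = {"c1": {"low": 0.9, "high": 0.1}, "c2": {"low": 0.2, "high": 0.8}}
path = viterbi(["low", "low", "high", "high"], states, start, trans, emit)
```

Because the transition probabilities favour staying in the same state, the decoded path switches from `c1` to `c2` only once, when the observations change; this persistence is what the semi-Markov duration model makes explicit.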

    In this paper they capture the coherence of the clusters over time steps $1 \le t \le T$ by modelling the underlying stochastic process that generates the dynamic data X with a hidden semi-Markov model (HSMM). Just as the EM clustering algorithm clusters static data points by learning a Gaussian mixture model, their method mines the evolution of the clusters from dynamic data points by learning an HSMM. The output probability density function (pdf) of each hidden state of the HSMM is modelled by a Gaussian mixture model, which describes the clusters of an $X_t \in X$. By utilizing characteristics of the evolutionary clustering problem, they derive a new unsupervised learning algorithm that is well suited to describing all the data from dynamic databases.
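The static building block referred to above, EM clustering with a Gaussian mixture model, can be sketched for one-dimensional data as follows. This is only the static analogue: it does not learn an HSMM or any of its temporal structure. The number of components, the initial parameters, the variance floor `min_var` and the data are assumptions of this sketch.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iterations=50, min_var=1e-6):
    """Fit a one-dimensional Gaussian mixture model by expectation-maximization."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: resp[i][j] = posterior probability that point i belongs to component j
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)  # assumed > 0 (no point lies far outside every component)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from its responsibility-weighted points
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor keeps a component from collapsing
    return means, variances, weights


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
```

On this data the two components settle near means 1.0 and 5.0 with equal weights; in the approach of Wang et al., a mixture of this kind describes the clusters attached to each hidden state.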

    CONCLUSION

    In this paper we have presented a critical review of work on data mining in dynamic databases. The researchers discussed above have applied data mining to dynamic databases and have certainly contributed a lot to the field, but the shortcomings of their work still need to be overcome. According to our analysis, the approach of Hebah H. O. Nasereddin and Vijay Raghavan, which dynamically updates the knowledge obtained from the previous data mining process, is good because it builds on the KDD process. The limitation of their study is that the approach still has to be made more suitable for dynamic databases. As future work, they plan to test it with different datasets covering a large spectrum of data mining applications, such as web site access analysis for improvements in e-commerce advertising, fraud detection, screening and investigation, retail site product analysis and customer segmentation.

    After analysing the work of Fernando Crespo and Richard Weber, we conclude that they presented a methodology for dynamic data mining based on fuzzy clustering which allows updates of the underlying classifier; other fuzzy or possibilistic clustering techniques could be used as well.

    Yi Wang, Shi-Xia Liu and their co-authors proposed a novel and interesting clustering problem, which is quite different from the dynamic clustering problems studied previously. They solve it by converting it into a Bayesian learning problem, and they also describe accompanying visualization methods that present the mined smooth evolution intuitively and comprehensively.

    REFERENCES

    [1] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between Sets of Items in Large Databases," Proc. of the ACM SIGMOD Int'l Conf. on Management of Data, May 1993.

    [2] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. of the 20th VLDB Conference, Santiago, Chile, 1994.

    [3] R. Agrawal and J. Shafer, "Parallel Mining of Association Rules," IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, Dec. 1996.

    [4] C. Aggarwal and P. Yu, "Mining Large Itemsets for Association Rules," Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 1997.

    [5] S. Brin, R. Motwani, et al., "Dynamic Itemset Counting and Implication Rules for Market Basket Data," SIGMOD Record (ACM Special Interest Group on Management of Data), 26(2), 1997.

    [6] S. Chaudhuri, "Data Mining and Database Systems: Where is the Intersection," Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 1997.

    [7] M. Chen, J. Han, and P. Yu, "Data Mining: An Overview from a Database Perspective," IEEE Trans. Knowledge and Data Engineering, 8, 1996.

    [8] M. Chen, J. Park, and P. Yu, "Data Mining for Path Traversal Patterns in a Web Environment," Proc. 16th Intl. Conf. Distributed Computing Systems, May 1996.

    [9] D. Cheung, J. Han, et al., "Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique," Proc. 12th Intl. Conf. on Data Engineering, New Orleans, Louisiana, 1996.

    [10] U. Fayyad, G. Piatetsky-Shapiro, et al., "Advances in Knowledge Discovery and Data Mining," AAAI/MIT Press, 1996.

    [11] A. Hafez, J. Deogun, and V. Raghavan, "The Item-Set Tree: A Data Structure for Data Mining," DaWaK '99 Conference, Firenze, Italy, Aug. 1999.

    [12] C. Kurzke, M. Galle, and M. Bathelt, "WebAssist: a user profile specific information retrieval assistant," Seventh International World Wide Web Conference, Brisbane, Australia, April 1998.

    [13] M. Langheinrich, A. Nakamura, et al., "Unintrusive Customization Techniques for Web Advertising," The Eighth International World Wide Web Conference, Toronto, Canada, May 1999.

    [14] H. Mannila, H. Toivonen, and A. Verkamo, "Efficient Algorithms for Discovering Association Rules," AAAI Workshop on Knowledge Discovery in Databases (KDD-94), July 1994.

    [15] M. Perkowitz and O. Etzioni, "Adaptive Sites: Automatically Learning from User Access Patterns," Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.

    [16] J. Pitkow, "In Search of Reliable Usage Data on the WWW," Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.

    [17] G. Rossi, D. Schwabe, and F. Lyardet, "Improving Web Information Systems with Navigational Patterns," The Eighth International World Wide Web Conference, Toronto, Canada, May 1999.

    [18] N. Serbedzija, "The Web Supercomputing Environment," Seventh International World Wide Web Conference, Brisbane, Australia, April 1998.

    [19] T. Sullivan, "Reading Reader Reaction: A Proposal for Inferential Analysis of Web Server Log Files," Proc. 3rd Conf. Human Factors & The Web, Denver, Colorado, June 1997.

    [20] C. Wills and M. Mikhailov, "Towards a Better Understanding of Web Resources and Server Responses for Improved Caching," The Eighth International World Wide Web Conference, Toronto, Canada, May 1999.

    [21] M. Zaki, S. Parthasarathy, et al., "New Algorithms for Fast Discovery of Association Rules," Proc. of the 3rd Int'l Conf. on Knowledge Discovery and Data Mining (KDD-97), AAAI Press, 1997.

    [22] C.M. Antunes and A.L. Oliveira, "Temporal data mining: an overview," Workshop on Temporal Data Mining (KDD 2001), San Francisco, September 2001.

    [23] M. Bastian, H. Kirschnk, and R. Weber, "TRIP: automatic traffic state identification and prediction as basis for improved traffic management services," Proc. of the Second Workshop on Information Technology, Cooperative Research between Chile and Germany, Berlin, Germany, 15-17 January 2001.

    [24] J.C. Bezdek, J. Keller, R. Krishnapuram, and N.R. Pal, "Fuzzy Models and Algorithms for Pattern Recognition and Image Processing," Kluwer, Boston, London, Dordrecht, 1999.

    [25] M. Black and R.J. Hickey, "Maintaining the performance of a learned classifier under concept drift," Intell. Data Anal., 3(6), 1999, pp. 453-474.

    [26] R.O. Duda, P.E. Hart, and D.G. Stork, "Pattern Classification," 2nd Edition, Wiley, New York, Chichester, 2001.