8
Effective Pattern Discovery For Text Mining Guide : Mr. S.Rajesh Presenting By: A.Priayanka (10UC1A0557) K.Akshitha (10UC1A0524) R.Deexith (10UC1A0543) A.Sai Prasad (10UC1A0502)

Effective pattern discovery for text mining

Embed Size (px)

DESCRIPTION

this is the part of effective pattern discovery for text mining till system specification this PPT is prepared by Deexith and presented by me and my team mates

Citation preview

Page 1: Effective pattern discovery for text mining

Effective Pattern Discovery For Text Mining

Guide : Mr. S.Rajesh

Presenting By:A.Priayanka (10UC1A0557) K.Akshitha (10UC1A0524)R.Deexith (10UC1A0543)A.Sai Prasad (10UC1A0502)

Page 2: Effective pattern discovery for text mining

Abstract

• Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopted term-based approaches, they all suffer from the problems of polysemy and synonymy.

• Over the years, people have often held the hypothesis that pattern (or phrase)-based approaches should perform better than the term-based Ones, but many experiments do not support this hypothesis.

• This presents an innovative and effective pattern discovery technique which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.

Page 3: Effective pattern discovery for text mining

EXISTING SYSTEM

Since most existing text mining methods adopted term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase)-based approaches should perform better than the term-based ones, but many experiments do not support this hypothesis.

Page 4: Effective pattern discovery for text mining

Problems in Existing System

1. Phrases have inferior statistical properties to terms, 2. They have low frequency of occurrence, and 3. There are large numbers of redundant and noisy phrases among them

Page 5: Effective pattern discovery for text mining

PROPOSED SYSTEM

• We provide an effective pattern discovery technique, which first calculates discovered specificities of patterns and then evaluates term weights according to the distribution of terms in the discovered patterns rather than the distribution in documents for solving the misinterpretation problem.

• The proposed approach can improve the accuracy of evaluating term weights because discovered patterns are more specific than whole documents.

• We also conduct numerous experiments on the latest data collection, Reuters Corpus Volume 1 (RCV1) and Text Retrieval Conference (TREC) filtering topics, to evaluate the proposed technique.

Page 6: Effective pattern discovery for text mining

System Specifications

Hardware Requirements: System : Pentium IV 2.4 GHz.Hard Disk : 40 GB.Floppy Drive : 1.44 Mb.Monitor : 14’ Colour Monitor.Mouse : Optical Mouse.RAM : 512 Mb.Keyboard : 101 Keyboard. Software Requirements: Operating system : Windows XP and IISCoding Language : ASP .NET with C# SP1Data Base : SQL Server 2005. 

Page 7: Effective pattern discovery for text mining

Conclusion

Many data mining techniques have been proposed in the last decade. These techniques include association rule mining, frequent item set mining, sequential pattern mining, maximum pattern mining, and closed pattern mining. However, using these discovered knowledge (or patterns) in the field of text mining is difficult and ineffective. The reason is that some useful long patterns with high specificity lack in support (i.e., the low-frequency problem). We argue that not all frequent short patterns are useful. Hence, misinterpretations of patterns derived from data mining techniques lead to the ineffective performance. In this research work, an effective pattern discovery technique has been proposed to overcome the low-frequency and misinterpretation problems for text mining. The proposed technique uses two processes, pattern deploying and pattern evolving, to refine the discovered patterns in text documents. The experimental results show that the proposed model outperforms not only other pure data mining-based methods and the concept based model, but also term-based state-of-the-art models, such as BM25 and SVM-based models.

Page 8: Effective pattern discovery for text mining