Data Mining With Big Data

Guide: Prof. Prashant G. Ahire

Presented by : Miss.Rupa Solapure

Roll no. 259

Agenda

Problem Definition

Objectives

Literature Survey

Architecture/Big Data mining algorithm

Existing System/Mathematical model

Advantages

Disadvantages/Limitations

Characteristics of Big Data

Big Data and its challenges

Big Data mining Tools

Applications of Big Data

References

Problem Definition:

Big Data consists of large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and data collection capacity, Big Data are now rapidly expanding in all science and engineering domains, including the physical, biological, and biomedical sciences. This paper elaborates a HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model from the data mining perspective.

Objective:

Mining Big Data requires carefully designed algorithms to analyze model correlations between distributed sites, and to fuse decisions from multiple sources so as to obtain the best model from the Big Data. Developing a safe and sound information-sharing protocol is a major challenge.

To support Big Data mining, high-performance computing platforms are required, which impose systematic designs to unleash the full power of the Big Data. Big Data is an emerging trend, and the need for Big Data mining is rising in all science and engineering domains.

Literature Survey

1. Title/Year: "Data Mining With Big Data", Jan 2014
Keywords: Big Data, data mining, heterogeneity, autonomous sources, complex and evolving associations.
Concept/Abstract: This paper presents a HACE theorem that characterizes the features of the Big Data revolution and a Big Data processing model from the data mining perspective.
Author: Xindong Wu, Fellow, IEEE; Xingquan Zhu, Senior Member, IEEE; Gong-Qing Wu; and Wei Ding

2. Title/Year: "The Survey of Data Mining Applications And Feature Scope", June 2012
Keywords: Data mining tasks, data mining life cycle, visualization of the data mining model, data mining methods, data mining applications.
Concept/Abstract: This paper presents a number of data mining applications and also focuses on the scope of data mining, which will be helpful in further research.
Author: Neelamadhab Padhy, Dr. Pragnyaban Mishra, and Rasmita Panigrahi

3. Title/Year: "Review on Data Mining with Big Data", Dec 2014
Keywords: Big Data, data mining, heterogeneity, autonomous sources, complex and evolving associations.
Concept/Abstract: This data-driven model involves demand-driven aggregation of information sources, mining and analysis, and security and privacy considerations.
Author: Savita Suryavanshi, Prof. Bharati Kale

4. Title/Year: "Survey on Big Data Mining Platforms, Algorithms and Challenges", Sep 2014
Keywords: Big data, big data mining platforms, big data mining algorithms, big data mining challenges, data mining.
Concept/Abstract: This paper reviews various big data mining platforms, algorithms, and challenges.
Author: SHERIN A, Dr S UMA, SARANYA K, SARANYA VANI M

Architecture:

Fig.: Big data Memory evolution

Data Mining Algorithm

Decision tree induction classification algorithms

Evolutionary based classification algorithms

Partitioning based clustering algorithms

Hierarchical based clustering algorithms

Model based clustering algorithms
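
As a concrete instance of the partitioning based clustering family listed above, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The function name `kmeans`, the use of the first k points as initial centroids, and the iteration cap are assumptions made for illustration; the slides do not prescribe an implementation.

```python
import math


def kmeans(points, k, iterations=20):
    """Partition points (equal-length numeric tuples) into k clusters.

    Assumption: the first k points serve as initial centroids. This keeps
    the sketch deterministic but is not a robust initialisation strategy.
    """
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels
```

Calling `kmeans(points, 2)` on a small list of 2-D tuples returns the two centroids and one cluster label per point. On real Big Data workloads the assignment step would typically be distributed across machines, which this single-process sketch does not attempt.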

Existing System:

Big Data applications have emerged in which data collection has grown tremendously and is beyond the ability of commonly used software tools to capture, manage, and process within a “tolerable elapsed time.”

The most fundamental challenge for Big Data applications is to explore the large volumes of data and extract useful information or knowledge for future actions.

In many situations, the knowledge extraction process has to be very efficient and close to real time because storing all observed data is nearly infeasible.

The unprecedented data volumes require an effective data analysis and prediction platform to achieve fast response and real-time classification for such Big Data.
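
The point that knowledge must be extracted without storing every observation can be illustrated with a one-pass summary. The sketch below uses Welford's online algorithm to maintain a running mean and variance in constant memory; the class name `RunningStats` is hypothetical, and this technique is offered only as an example of near-real-time processing, not as the method used in the presented paper.

```python
class RunningStats:
    """Welford's online algorithm: mean and variance in a single pass."""

    def __init__(self):
        self.n = 0          # number of observations seen so far
        self.mean = 0.0     # running mean
        self._m2 = 0.0      # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the summary; x is then discarded."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; undefined (NaN) for fewer than two observations."""
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")
```

Each call to `update` costs constant time and memory, so the summary can keep pace with a stream whose raw values are never retained.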

At the model level, each site mines its local data to produce local patterns.

By sharing these local patterns with the other sites, a single global pattern can be produced.

At the knowledge level, model correlation analysis investigates the relevance between models generated from various data sources to determine how closely the data sources are correlated with each other, and how to form accurate decisions based on models built from autonomous sources.
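
A minimal sketch of the two ideas above, under stated assumptions: model correlation is approximated by the fraction of shared samples on which two local models agree, and decisions from autonomous sources are fused by a (optionally weighted) majority vote. The function names `agreement` and `fuse_decisions`, the site names, and the weighting scheme are all hypothetical; the slides do not specify how correlation or fusion is computed.

```python
from collections import defaultdict


def agreement(predictions_a, predictions_b):
    """Fraction of samples on which two local models give the same label.

    Used here as a simple stand-in for how related two sources are.
    """
    if len(predictions_a) != len(predictions_b) or not predictions_a:
        raise ValueError("need two non-empty prediction lists of equal length")
    same = sum(a == b for a, b in zip(predictions_a, predictions_b))
    return same / len(predictions_a)


def fuse_decisions(local_predictions, weights=None):
    """Combine one label per site into a single decision by voting.

    local_predictions: dict mapping site name -> predicted label.
    weights: optional dict mapping site name -> vote weight
             (assumption: e.g. each site's validation accuracy).
    Ties are broken by label sort order so the result is deterministic.
    """
    totals = defaultdict(float)
    for site, label in local_predictions.items():
        totals[label] += 1.0 if weights is None else weights.get(site, 1.0)
    return max(sorted(totals), key=lambda label: totals[label])
```

Only predictions cross site boundaries in this sketch, which is in keeping with the constraint that raw data from autonomous sources is not pooled centrally.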

Big Data

Big Data is a comprehensive term for any collection of data sets so large and complex that it becomes difficult to process them using conventional data processing applications.

There are two types of Big Data: structured and unstructured.

Structured data

Structured data are numbers and words that can be easily categorized and analyzed. These data are generated by things like network sensors embedded in electronic devices, smart phones, and global positioning system (GPS) devices. Structured data also include things like sales figures, account balances, and transaction data.

Unstructured data

Unstructured data include more complex information, such as customer reviews from commercial websites, photos and other multimedia, and comments on social networking sites. These data cannot easily be separated into categories or analyzed numerically.

Big Data Characteristics (HACE Theorem)

Fig.: The blind men and the enormous elephant: the restricted view of each blind man leads to a biased conclusion.

The HACE theorem suggests that the key characteristics of Big Data are:

A. Huge data with heterogeneous and diverse data sources

B. Autonomous sources with distributed and decentralized control

C. Complex and evolving associations

Applications of Data Mining

Marketing: analysis of consumer behaviour; advertising campaigns; targeted mailings; segmentation of customers, stores, or products

Finance: creditworthiness of clients; performance analysis of finance investments; fraud detection

Manufacturing: optimization of resources; optimization of manufacturing processes; product design based on customer requirements

Health Care: discovering patterns in X-ray images; analyzing side effects of drugs; effectiveness of treatments

Big Data Mining Algorithm

Big Data applications gather information from many sources. The conventional way to mine such data would be to collect all of the distributed data at a centralized site, but this is prohibitive because of high data transmission costs and privacy concerns. Instead, mining is carried out at several levels, so that correlations and patterns can be discovered by combining a variety of sources. Global data mining is done through a two-step process:

Model level

Knowledge level

At the data level, each local site uses its local data to calculate data statistics and shares these statistics with the other sites in order to obtain a view of the global data distribution.
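
The data-level exchange described above can be sketched as follows: each site reduces its raw values to a compact histogram, and only those histograms are merged into a global distribution. The function names `local_statistics` and `global_distribution` and the fixed `bin_width` parameter are assumptions for illustration, not part of the presented model.

```python
from collections import Counter


def local_statistics(values, bin_width):
    """Summarise one site's numeric data as counts per bin index.

    Only these counts are shared; the raw values stay at the site.
    """
    return Counter(int(v // bin_width) for v in values)


def global_distribution(local_summaries):
    """Merge per-site bin counts into relative frequencies per bin."""
    total = Counter()
    for summary in local_summaries:
        total.update(summary)  # adds counts bin by bin
    n = sum(total.values())
    if n == 0:
        return {}
    return {bin_index: count / n for bin_index, count in sorted(total.items())}
```

For example, two sites can call `local_statistics` on their own data with the same `bin_width`, and a coordinator can pass the two resulting counters to `global_distribution`. The transmitted summaries grow with the number of bins rather than with the number of records, which addresses the transmission cost concern, although histograms alone do not guarantee privacy.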

Data Mining Challenges With Big Data

Fig.: A conceptual view of the Big Data processing framework

DISADVANTAGES OF EXISTING SYSTEM

To explore Big Data, we have analysed several challenges at the data, model, and system levels.

The challenges at Tier I focus on data accessing and arithmetic computing procedures. Because Big Data are often stored at different locations and data volumes may continuously grow, an effective computing platform will have to take distributed large-scale data storage into consideration for computing.

PROPOSED SYSTEM

We propose a HACE theorem to model Big Data characteristics. The characteristics of HACE make discovering useful knowledge from Big Data an extreme challenge.

ADVANTAGES OF PROPOSED SYSTEM

Provides the most relevant and accurate social sensing feedback to better understand our society in real time.

Characteristics of Big Data

Fig.: Five Vs of Big Data

Volume: the quantity of data

Variety: the categories of data

Velocity: the speed at which data are generated or processed

Variability: inconsistency in the data

Complexity: the difficulty of managing the data

BIG Data Mining Tools

Hadoop

Apache S4

Storm

Apache Mahout

MOA

Fig.: Big Data processing
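
Hadoop, the first tool listed above, implements the MapReduce programming model. The sketch below imitates the map, shuffle, and reduce phases in a single Python process using a word count; it does not use any Hadoop API, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))
```

In a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data over the network; here they run sequentially only to show the data flow.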

Conclusion:

Because the amount of data in fields such as genomics, meteorology, biology, and environmental research keeps increasing, it becomes difficult to handle the data, to find associations and patterns, and to analyze the large data sets.

As organizations collect more data at this scale, formalizing the process of Big Data analysis will become paramount. This paper describes the different algorithms used to handle such large data sets and gives an overview of the architecture and algorithms used for them.

References

McKinsey Global Institute, "Big Data: The next frontier for innovation, competition and productivity", May 2011.

Xindong Wu, Xingquan Zhu, Gong-Qing Wu, and Wei Ding, "Data Mining with Big Data", 2013.

Rezwan Ahmed and George Karypis, "Algorithms for mining the evolution of conserved relational states in dynamic networks", 2012.

IEEE, "Data Mining with Big Data", January 2014.

Oracle, "Unstructured Data Management with Oracle Database 12c", June 2013.