Big Data (4Vs, History, Concept, Algorithm): Analysis and Applications


Big Data
By: Yash Bheda (1524008), Janhavi Jaltare (1524011), Krisha Udani (), Binal Savla (1524003)

Table of Contents

Topics:
● History of Big Data
● Big Data
● Architecture for Network
● Network Analysis Algorithm
● Big Data Network Analysis
● Network Applications
● Summary

1.0: History of Big Data

Big data is a relative term describing data that an organization must store and manage in order to support timely decision making.

Time | Data generation | Processing
Initially | Employee-generated data | Single processor
Modern times | User-generated data | Parallel processing (multiple processors using servers)
Recently | System-generated data |

Contents

Big data generated by users and systems is mostly unstructured.

Traditional data | Big data
Documents | Photographs
Finances | Audio and videos
Stock records | 3D models
Personnel files | Simulations
 | Location data

Big Data

Big data represents the way this information is analysed to help uncover opportunities.

There is a strong need for a structure that parses the data, separates out what is unwanted, and finds the useful threads that reveal opportunities.

Input information -> New processing techniques -> Better results

Management approach

Traditionally: Data input -> Storing -> Analysing

Modern: Data input -> Analysing -> Storing
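The two orderings above can be contrasted in a minimal Python sketch. It assumes a stream of numeric readings and uses a running mean as a stand-in for "analysis". The names `traditional`, `modern`, and `RunningStats` are hypothetical and only illustrate the idea. A real system would use a stream-processing engine rather than a Python loop.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Count and mean updated incrementally as each record arrives."""
    count: int = 0
    mean: float = 0.0

    def update(self, value: float) -> None:
        self.count += 1
        # Incremental mean: no need to re-read previously stored data.
        self.mean += (value - self.mean) / self.count


def traditional(records):
    """Traditional order: input -> store -> analyse the stored batch."""
    stored = list(records)              # store everything first
    return sum(stored) / len(stored)    # then analyse


def modern(records, store):
    """Modern order: input -> analyse on arrival -> store."""
    stats = RunningStats()
    for value in records:
        stats.update(value)             # analyse first
        store.append(value)             # then store
    return stats.mean


readings = [4.0, 8.0, 6.0]
print(traditional(readings))   # 6.0
print(modern(readings, []))    # 6.0
```

Both orderings give the same answer here. The modern ordering, however, has a result available while the data is still arriving.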

4 V's of Big Data

Volume: the vast amounts of data generated every second.

Velocity: the speed at which new data is generated and moves around.

Variability: the messiness or trustworthiness of the data. It also refers to inconsistent data flow with periodic peaks.

Variety: the different types of data we can now use.

Variety of data

Big Data Classification

Why classify? Complex situations -> 4 Vs -> Results

From classifying big data to choosing a big data solution:
1. Defining a logical architecture
2. Understanding atomic patterns for big data solutions
3. Understanding composite patterns to use for big data solutions
4. Choosing a solution pattern for a big data solution
5. Determining the viability of a business problem for a big data solution
6. Selecting the right products to implement a big data solution

Parallel Processing

Mappers and Reducers

A MapReduce job = a Map function (input -> key-value pairs) + a Reduce function (key and list of values -> output).

Map(): a procedure that performs filtering and sorting.

Reduce(): a method that performs a summary operation.
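The job structure above can be illustrated with the classic word-count example. That example is not from these slides. This is a minimal, single-machine Python sketch: `map_words`, `reduce_counts`, and `map_reduce` are hypothetical names, and the grouping step stands in for the shuffle that a real framework such as Hadoop performs across machines.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map: one input line -> list of (word, 1) key-value pairs."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(word, counts):
    """Reduce: a key and its list of values -> one summary output."""
    return word, sum(counts)


def map_reduce(lines, mapper, reducer):
    # Map phase: apply the mapper to every input record.
    pairs = chain.from_iterable(mapper(line) for line in lines)
    # Shuffle phase: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce phase: one reducer call per key, in sorted key order.
    return [reducer(key, groups[key]) for key in sorted(groups)]


print(map_reduce(["big data", "Big networks"], map_words, reduce_counts))
# [('big', 2), ('data', 1), ('networks', 1)]
```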

Natural Join: Mapping

The join of R(A,B) with S(B,C) is the set of tuples (a,b,c) such that (a,b) is in R and (b,c) is in S.

Mappers need to send R(a,b) and S(b,c) to the same reducer so they can be joined there.

Mapper output: key = the B-value; value = the relation name and the other component (A or C).

Example: R(1,2) -> (2,(R,1)); S(2,3) -> (2,(S,3))

Mapping Tuples

Mapper for R(1,2) -> (2,(R,1))

Mapper for R(4,2) -> (2,(R,4))

Mapper for S(2,3) -> (2,(S,3))

Mapper for S(5,6) -> (5,(S,6))
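The mapping step above can be sketched in Python. This is a minimal sketch that assumes each input record is tagged with its relation name ("R" or "S"). `join_mapper` is a hypothetical name.

```python
def join_mapper(relation, tup):
    """Map one tuple of R(A,B) or S(B,C) to (B-value, (relation, other component))."""
    if relation == "R":
        a, b = tup
        return (b, ("R", a))
    if relation == "S":
        b, c = tup
        return (b, ("S", c))
    raise ValueError(f"unknown relation: {relation}")


tuples = [("R", (1, 2)), ("R", (4, 2)), ("S", (2, 3)), ("S", (5, 6))]
mapped = [join_mapper(rel, t) for rel, t in tuples]
print(mapped)
# [(2, ('R', 1)), (2, ('R', 4)), (2, ('S', 3)), (5, ('S', 6))]
```

Keying on the B-value is what guarantees that matching R and S tuples meet at the same reducer.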

Grouping Phase

There is one reducer for each key.

Every key-value pair generated by any mapper is sent to the reducer for its key.

Mapper for R(1,2) -> (2,(R,1)) -> Reducer for B=2

Mapper for R(4,2) -> (2,(R,4)) -> Reducer for B=2

Mapper for S(2,3) -> (2,(S,3)) -> Reducer for B=2

Mapper for S(5,6) -> (5,(S,6)) -> Reducer for B=5

Constructing the Value-List

The system organizes the input to each reducer into a pair:
- the key;
- the list of values associated with that key.

The value-list format:

(2,[(R,1), (R,4), (S,3)]) -> Reducer for B=2

(5,[(S,6)]) -> Reducer for B=5
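The grouping step and value-list construction described above can be sketched in a single-process Python function. `group_by_key` is a hypothetical name. In a real framework the order of values within a list is not guaranteed. This sketch happens to preserve the mapper output order.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Grouping phase: collect every value emitted for a key into one list."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # One (key, value-list) pair per reducer, i.e. one reducer per key.
    return sorted(groups.items())


mapped = [(2, ("R", 1)), (2, ("R", 4)), (2, ("S", 3)), (5, ("S", 6))]
print(group_by_key(mapped))
# [(2, [('R', 1), ('R', 4), ('S', 3)]), (5, [('S', 6)])]
```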

The Reduce Function for the Join

Given key b and a list of values that are either (R, a_i) or (S, c_j), output each triple (a_i, b, c_j).

- Thus, the number of outputs made by a reducer is the product of the number of R's on the list and the number of S's on the list.

Output of the Reducers

(2,[(R,1), (R,4), (S,3)]) -> Reducer for B=2 -> (1,2,3), (4,2,3)

(5,[(S,6)]) -> Reducer for B=5 -> no output (the list contains no R values)
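The reduce function described above can be sketched in Python under the same value format as the mapper, `(relation, component)`. `join_reducer` is a hypothetical name. `itertools.product` yields every (R, S) combination, so each reducer emits |R| × |S| triples.

```python
from itertools import product


def join_reducer(b, values):
    """Reduce for the natural join: pair every R value with every S value."""
    r_side = [a for rel, a in values if rel == "R"]
    s_side = [c for rel, c in values if rel == "S"]
    # |R| * |S| outputs for this key; none if either side is empty.
    return [(a, b, c) for a, c in product(r_side, s_side)]


print(join_reducer(2, [("R", 1), ("R", 4), ("S", 3)]))
# [(1, 2, 3), (4, 2, 3)]
print(join_reducer(5, [("S", 6)]))
# []
```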

 

Network Resources Related to Big Data

The network's capability to absorb and transfer big data traffic is made up of six elements:
1. Bandwidth
2. Network delay
3. Security
4. Data delivery accuracy
5. Availability
6. Resiliency

Network Monitoring of Big Data

● Most monitoring systems deal with major changes, failures, configuration data, and traffic reporting.

● The monitoring function itself is a producer of big data. Therefore, the network data needs to be analyzed with big data applications.

● Traffic trends, where applications are located, what caused the traffic, and what network resources are available to carry the traffic effectively are all part of the network big data information.
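Monitoring data is itself big data. As a result, a question such as "what caused the traffic" becomes an aggregation by key. The following is a minimal Python sketch that assumes hypothetical flow records of the form (application, source host, bytes). A real deployment would run the same grouping as a MapReduce or stream job over exported flow data.

```python
from collections import Counter

# Hypothetical flow records: (application, source host, bytes transferred).
flows = [
    ("video", "10.0.0.5", 1_200_000),
    ("backup", "10.0.0.9", 5_000_000),
    ("video", "10.0.0.7", 800_000),
]


def bytes_per_application(records):
    """Sum transferred bytes per application (a reduce by key)."""
    totals = Counter()
    for app, _host, nbytes in records:
        totals[app] += nbytes
    return totals


print(bytes_per_application(flows).most_common())
# [('backup', 5000000), ('video', 2000000)]
```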

Network Monitoring Strategies

● Ensure that your monitoring tools collect network information with enough granularity to produce detailed statistical representations.

● You will need a dashboard that continuously provides alerts and alarms when traffic changes fall outside acceptable limits.

● Create short-term reports rapidly so that traffic changes that could impair network operation are discovered as soon as possible.

● If a cloud service is employed, is the traffic data from the cloud delivered in real time so that you can make decisions before a problem worsens?
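The slides do not specify how a dashboard should decide what counts as "outside acceptable limits". The following Python sketch shows one possible rule, a rolling mean plus or minus k standard deviations. The function name `traffic_alerts` and the defaults `window=5`, `k=3.0` are hypothetical choices for illustration.

```python
from statistics import mean, stdev


def traffic_alerts(samples, window=5, k=3.0):
    """Yield (index, value) for each sample more than k standard deviations
    from the mean of the preceding `window` samples (window must be >= 2)."""
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(samples[i] - mu) > k * sigma:
            yield i, samples[i]


# Hypothetical traffic readings (e.g. Mbit/s) with one spike.
traffic = [100, 102, 98, 101, 99, 100, 400, 101]
print(list(traffic_alerts(traffic)))
# [(6, 400)]
```

One consequence of this design is that a spike inflates the deviation of the windows that follow it. As a result, a sustained anomaly may stop triggering alerts, and a production system might use fixed thresholds or exclude flagged samples from the baseline.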

Benefits of Big Data Network Monitoring

1. Load balancing
2. Data filtering
3. Real-time data analysis
4. Managing virtual resources

Big Data Impact

Network Applications

● Big data for network design
● Big data for network management
● Big data for network resource optimization
● Big data for network security and privacy
● Big data for network economics and pricing
● Big data for network performance evaluation
● Parallel and distributed algorithms for big data

Online services: Netflix compares different banners for its shows and gives each customer the ones that appeal to them.

Targeted marketing and advertising: using tracking cookies, Facebook can collect information about each website you visit.

It is possible to accurately predict a range of highly sensitive personal attributes simply by analysing a person's "Likes".

Network Security & Big Data

Software-Defined Networking (SDN)-based controllers combined with big data analytics within and about the data network can analyze network security attacks and potential risks immediately, which helps prevent security breaches.

Example: behavior analysis software to prevent the misuse of crucial data.

Implementation

Network partitioning is crucial in setting up big data environments:

● Heavy demands from big data applications do not impact other mission-critical workloads.

● Prepare now for big data scalability later.

Yahoo is running more than 42,000 nodes in its big data environment; in 2013, the average number of nodes in a big data cluster was just over 100.

Summary

● Big data enables better analysis and market prediction.

● It helps develop better logistics and accuracy in systems, and it reduces redundancy.

● The characteristic 4 V's support the management and utilization of massive data.
