Big Data
By: Yash Bheda (1524008), Janhavi Jaltare (1524011), Krisha Udani (), Binal Savla (1524003)
Table of Contents
Topics
History of Big Data
Big Data
Architecture for Network
Network Analysis Algorithm
Big Data network analysing
Network Application
Summary
1.0: History of Big Data
Big data is a relative term describing the data an organization has to store and manage to support timely decision making.
Time          Data Generation          Processing
Initially     Employee-generated data  Single processor
Modern times  User-generated data      Parallel processing (multiple processors using servers)
Recently      System-generated data
Contents
Big data generated by users and systems is mostly unstructured.
Traditional Data   Big Data
Documents          Photographs
Finances           Audio and Videos
Stock Recording    3D Models
Personnel Files    Simulation
                   Location Data
Big Data
Big Data represents the way this information is analysed to help open up opportunities.
There is a strong need for structure to parse the data, separate out what is unwanted, and find the useful threads that uncover opportunities.
Input information -> New processing techniques -> Better results
Management approach
Traditionally: Data input -> Storing -> Analysing
Modern: Data input -> Analysing -> Storing
4 V’s of Big Data
Volume: the vast amounts of data generated every second.
Velocity: the speed at which new data is generated and moves around.
Variability: inconsistent data flow with periodic peaks. It also covers the messiness or trustworthiness of the data (often termed veracity).
Variety: the different types of data we can now use.
Variety of data
Big Data Classification
Why classify?
Complex situations
4 Vs
Results
From classifying big data to choosing a big data solution
Defining a logical architecture
Understanding atomic patterns for big data solutions
Understanding composite patterns to use for big data solutions
Choosing a solution pattern for a big data solution
Determining the viability of a business problem for a big data solution
Selecting the right products to implement a big data solution
Parallel processing
Mappers and Reducers
Map-Reduce job =
- Map function (input -> key-value pairs) +
- Reduce function (key and list of values -> output).
Map(): a procedure (method) that performs filtering and sorting.
Reduce(): a method that performs a summary operation.
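The Map-Reduce job structure above can be sketched in a single process. This is only an illustration, not how a real cluster framework is implemented: the helper run_map_reduce, the functions map_bytes and sum_bytes, and the sample traffic records are all hypothetical names and data chosen for this sketch.

```python
from collections import defaultdict


def run_map_reduce(records, map_fn, reduce_fn):
    """Run one Map-Reduce job in a single process (hypothetical helper)."""
    groups = defaultdict(list)
    # Map phase: each input record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in map_fn(record):
            # Grouping: values that share a key end up in the same list.
            groups[key].append(value)
    # Reduce phase: each key and its list of values produce one output.
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


def map_bytes(record):
    # Filtering step: drop records that carry no traffic.
    host, n_bytes = record
    if n_bytes > 0:
        yield host, n_bytes


def sum_bytes(host, values):
    # Summary operation: total bytes seen for this host.
    return sum(values)


# Made-up (host, bytes) records, for illustration only.
traffic = [("a", 10), ("b", 5), ("a", 7), ("c", 0)]
totals = run_map_reduce(traffic, map_bytes, sum_bytes)
print(totals)  # {'a': 17, 'b': 5}
```

In a real deployment the map and reduce calls would run on many machines in parallel; the sketch only shows the data flow from input to key-value pairs to per-key output.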
NATURAL JOIN - MAPPING
The join of R(A,B) with S(B,C) is the set of tuples (a,b,c) such that (a,b) is in R and (b,c) is in S.
Mappers need to send R(a,b) and S(b,c) to the same reducer, so they can be joined there.
Mapper output: key = B-value, value = relation and other component (A or C).
- Example: R(1,2) -> (2,(R,1)); S(2,3) -> (2,(S,3))
Mapping Tuples
R(1,2) -> Mapper for R(1,2) -> (2,(R,1))
R(4,2) -> Mapper for R(4,2) -> (2,(R,4))
S(2,3) -> Mapper for S(2,3) -> (2,(S,3))
S(5,6) -> Mapper for S(5,6) -> (5,(S,6))
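A minimal sketch of the mapping step for the join, using the four tuples from the slide. The function names map_r and map_s are assumptions made for this illustration; the relation tags "R" and "S" follow the slide's notation.

```python
def map_r(a, b):
    # R(a, b) -> key is the B-value, value is the relation tag plus the A component.
    return (b, ("R", a))


def map_s(b, c):
    # S(b, c) -> key is the B-value, value is the relation tag plus the C component.
    return (b, ("S", c))


R = [(1, 2), (4, 2)]
S = [(2, 3), (5, 6)]

mapped = [map_r(a, b) for a, b in R] + [map_s(b, c) for b, c in S]
for pair in mapped:
    print(pair)
# (2, ('R', 1))
# (2, ('R', 4))
# (2, ('S', 3))
# (5, ('S', 6))
```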
Grouping Phase
There is a reducer for each key.
Every key-value pair generated by any mapper is sent to the reducer for its key.
Mapper for R(1,2) -> (2,(R,1)) -> Reducer for B=2
Mapper for R(4,2) -> (2,(R,4)) -> Reducer for B=2
Mapper for S(2,3) -> (2,(S,3)) -> Reducer for B=2
Mapper for S(5,6) -> (5,(S,6)) -> Reducer for B=5
Constructing Value-list
The input to each reducer is organized by the system into a pair:
- The key.
- The list of values associated with that key.
THE VALUE-LIST FORMAT
(2,[(R,1), (R,4), (S,3)]) -> Reducer for B=2
(5,[(S,6)]) -> Reducer for B=5
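The grouping that builds each reducer's (key, value-list) pair can be sketched as follows. In a real framework the system does this step for you; group_by_key is a hypothetical helper written only to show the resulting format.

```python
from collections import defaultdict


def group_by_key(pairs):
    # Collect every value emitted under the same key into one list.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # One (key, value-list) pair per reducer, ordered by key.
    return sorted(groups.items())


mapper_output = [(2, ("R", 1)), (2, ("R", 4)), (2, ("S", 3)), (5, ("S", 6))]
reducer_input = group_by_key(mapper_output)
for key, values in reducer_input:
    print(key, values)
# 2 [('R', 1), ('R', 4), ('S', 3)]
# 5 [('S', 6)]
```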
The Reduce Function for Join
Given key b and a list of values that are each either (R, a) or (S, c), output each triple (a, b, c).
- Thus, the number of outputs made by a reducer is the product of the number of R’s on the list and the number of S’s on the list.
OUTPUT OF THE REDUCERS
(2,[(R,1), (R,4), (S,3)]) -> Reducer for B=2 -> (1,2,3), (4,2,3)
(5,[(S,6)]) -> Reducer for B=5 -> no output (the list holds no R values)
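A short sketch of the reduce function for the join, applied to the two value lists from the slide. The name reduce_join is an assumption for this illustration; the output count per key is the number of R values times the number of S values.

```python
def reduce_join(b, values):
    # Split the value list by relation tag.
    r_side = [x for tag, x in values if tag == "R"]
    s_side = [x for tag, x in values if tag == "S"]
    # Pair every R component with every S component for this key,
    # which gives len(r_side) * len(s_side) output triples.
    return [(a, b, c) for a in r_side for c in s_side]


out_2 = reduce_join(2, [("R", 1), ("R", 4), ("S", 3)])
out_5 = reduce_join(5, [("S", 6)])
print(out_2)  # [(1, 2, 3), (4, 2, 3)]
print(out_5)  # []
```

The reducer for B=5 emits nothing because its list has one S value and zero R values, so the product is zero.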
Network Resources Related to Big Data
The network's capability to absorb and transfer big data traffic is made up of six elements:
1. Bandwidth
2. Network delay
3. Security
4. Data delivery accuracy
5. Availability
6. Resiliency
Network Monitoring of Big Data
● Most monitoring systems deal with major changes, failures, configuration data, and traffic reporting.
● The monitoring function itself is a producer of big data. Therefore, the network data needs to be analyzed with big data applications.
● Traffic trends, where applications are located, what caused the traffic, and what network resources are available to effectively carry the traffic are all part of the network big data information.
Network Monitoring Strategies
● Ensure that your monitoring tools collect the network information with enough granularity to produce detailed statistical representations.
● You will need a dashboard that continuously provides alerts and alarms when traffic changes occur that are outside acceptable limits.
● Create short-term reports rapidly so that traffic changes that could impair the network operation can be discovered as soon as possible.
● If a cloud service is employed, do you have the traffic data from the cloud delivered in real time so you can make decisions before a problem worsens?
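One possible way to flag traffic changes that fall outside an acceptable range is to compare each new sample with the recent average. This is only a sketch under stated assumptions: the function traffic_alerts, the window size, the threshold k, and the sample values in Mbps are all hypothetical and not taken from the slides.

```python
from statistics import mean, pstdev


def traffic_alerts(samples, window=5, k=3.0):
    """Return (index, value) for samples far from the recent average."""
    alerts = []
    for i in range(window, len(samples)):
        # Use only the preceding `window` samples as the baseline.
        recent = samples[i - window:i]
        mu, sigma = mean(recent), pstdev(recent)
        # Alert when the sample deviates by more than k standard deviations.
        if abs(samples[i] - mu) > k * sigma:
            alerts.append((i, samples[i]))
    return alerts


# Made-up throughput readings in Mbps, with one spike.
mbps = [100, 102, 98, 101, 99, 100, 250, 101]
print(traffic_alerts(mbps))  # [(6, 250)]
```

A smaller window reacts faster but produces more false alarms; the right values depend on how granular the collected data is.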
Benefits of Big Data Network Monitoring
1. Load balancing
2. Data filtering
3. Real-time data analysis
4. Managing virtual resources
Big Data Impact
Network Applications
● Big data for network design
● Big data for network management
● Big data for network resource optimization
● Big data for network security and privacy
● Big data for network economics and pricing
● Big data for network performance evaluation
● Parallel and distributed algorithms for Big Data
Online services
Netflix compares its show banners and shows each customer the ones that appeal to them.
Targeted marketing and advertising
Using 'tracking cookies', Facebook can collect information about each website you visit.
It is possible to accurately predict a range of highly sensitive personal attributes simply by analysing a user's ‘Likes’.
Network Security & Big Data
Software-Defined Networking (SDN)-based controllers and Big Data analytics within and about the data network.
Analyzes network security attacks and potential risks immediately, which helps prevent security breaches.
E.g.: behavior analysis software to prevent the misuse of crucial data.
Implementation
Network partitioning is crucial in setting up big data environments.
It ensures that heavy demands from applications do not impact other mission-critical workloads.
Prepare now for big data scalability later.
Yahoo is running more than 42,000 nodes in its big data environment, whereas in 2013 the average number of nodes in a big data cluster was just over 100.
Summary
Big data enables better analysis and market prediction.
It helps develop better logistics and accuracy in systems, and reduces redundancy.
The characteristic 4 V’s support the management and utilization of massive data.