Big Data Management
Bilwa Upadhye – FPM03
Chetna Chauhan – FPM04
Leon Dukkipati – PGP0686
Manzoor Ul Akram – FPM05
Soumya Soni – PGP06105
IIM Rohtak
05/02/2023 Big Data 2
What is Big Data?
• The exponential growth and availability of data, both structured and unstructured.
Structured Data
• Data that resides in a fixed field within a record or file is called structured data. This includes data contained in relational databases and spreadsheets.
Unstructured Data
• Text and multimedia content such as e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents. This data doesn't fit neatly into a database.
• 80–90% of the data in any organization is unstructured.
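As a minimal illustration of the distinction above, the sketch below reads a structured record by field name and then has to fall back on pattern matching to pull similar facts out of free text. The sample data, the field names and the regular expression are all made up for illustration.

```python
import csv
import io
import re

# Structured data: every record has the same fixed fields (hypothetical sample).
structured = io.StringIO("order_id,customer,amount\n1001,Asha,250\n1002,Ravi,400\n")
rows = list(csv.DictReader(structured))
total = sum(int(row["amount"]) for row in rows)  # fields are addressed by name

# Unstructured data: free text with no fixed fields (hypothetical e-mail body).
email = "Hi team, Asha paid Rs 250 yesterday and Ravi says he will pay Rs 400 soon."
amounts = [int(m) for m in re.findall(r"Rs (\d+)", email)]  # needs ad hoc parsing

print(total, amounts)
```

The structured rows can be queried directly, while the e-mail only yields numbers once a pattern has been guessed for it, which is one reason unstructured data does not fit neatly into a database.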
Data Being Generated Every Day
• eBay – 100 PB
• Google – 100 PB
• Facebook – 600 PB
• Twitter – 100 TB
• NSA – 29 TB
• 90% of the data in the world today has been created in the last two years alone
Examples:
• Sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, cell phone GPS signals, UID information, patient information, etc.
Source : http://wikibon.org
What is Big Data Management?
• Organization, administration and governance of large volumes of both structured and unstructured data
• Tools used: Hadoop, NoSQL, Platfora
• Big data management is important to business and society because more data may lead to more accurate analyses.
RDBMS vs. Big Data Management Technologies
RDBMS
• Structured data
• ER model perfectly defined
• Smaller amount of data
• Relational database management system
• Applications: IIM Rohtak
Big data management technologies
• Unstructured and semi-structured data
• No perfect ER model
• Large amount of data
• Node-based flat structure
• Applications: healthcare, retail, Google, IBM
Comparison of scalabilities
Hadoop
• Open-source software framework written in Java
• Fundamental assumption: hardware failures are common and should be handled automatically by the framework
• Storage part: HDFS (Hadoop Distributed File System)
• Processing part: MapReduce
• Working of Hadoop
What does Hadoop do?
1. MapReduce divides the application into blocks
2. HDFS creates multiple replicas of data blocks
3. HDFS places data blocks on different nodes around the cluster
4. MapReduce accesses the data
5. MapReduce processes the data
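The storage side of this flow (cutting data into blocks, replicating each block, and spreading the replicas over nodes) can be sketched in a few lines. This is a toy model, not Hadoop code: the function names, the 4-byte block size, the node names and the round-robin placement are all assumptions made for illustration, and real HDFS placement is rack-aware.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the input into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Round-robin is a simplification chosen for this sketch.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        placement[block_id] = [nodes[(start + r) % len(nodes)] for r in range(replication)]
    return placement


blocks = split_into_blocks(b"abcdefghij", block_size=4)  # toy 4-byte blocks
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)
for block_id, holders in placement.items():
    print(block_id, blocks[block_id], holders)
```

Because every block lives on three different nodes, losing any single node still leaves two copies of each block available for processing.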
NoSQL
• Non-SQL (non-relational) database
• Provides a mechanism for storage and retrieval of data
• Horizontal scaling
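Horizontal scaling means adding more machines and spreading the data across them, rather than buying one bigger machine. A minimal sketch of the idea, assuming simple hash partitioning: the `ShardedStore` class below is hypothetical, and each in-memory dict merely stands in for a separate server.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys over several shards (hypothetical)."""

    def __init__(self, num_shards: int):
        self.shards = [{} for _ in range(num_shards)]  # each dict stands in for a server

    def _shard_for(self, key: str) -> int:
        # A stable hash, so the same key always maps to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key: str, value) -> None:
        self.shards[self._shard_for(key)][key] = value

    def get(self, key: str):
        return self.shards[self._shard_for(key)].get(key)


store = ShardedStore(num_shards=3)
for user_id in range(100):
    store.put(f"user:{user_id}", {"id": user_id})

print(store.get("user:42"))
print([len(shard) for shard in store.shards])  # keys per shard
```

Plain modulo hashing is used here only because it is short; it moves most keys when the number of shards changes, so production systems commonly use schemes such as consistent hashing or range partitioning instead.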
Platfora
• Software that works with the open-source software framework Hadoop
• When a user queries the database, the software delivers the answer in real time
HDFS
• Highly fault-tolerant and designed to be deployed on low-cost hardware
• Provides high-throughput access to application data and is suitable for applications that have large data sets
• Relaxes a few POSIX requirements to enable streaming access to file system data
Properties of HDFS
• Large: thousands of server machines
• Replicated data blocks
• Failure is the norm
• Fast detection and recovery of faults
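Fault detection and recovery can be pictured as follows: in HDFS, DataNodes send periodic heartbeats to the NameNode, and blocks held by a node that has gone silent are copied to other nodes. The sketch below is a toy model of that idea, not HDFS code; the function names, the 30-second timeout, the clock values and the block names are all assumptions.

```python
def find_dead_nodes(last_heartbeat: dict[str, float], now: float, timeout: float) -> set[str]:
    """A node is presumed dead if its last heartbeat is older than `timeout` seconds."""
    return {node for node, seen in last_heartbeat.items() if now - seen > timeout}


def re_replicate(placement: dict[str, set[str]], dead: set[str], live: list[str], target: int) -> None:
    """Drop dead holders and copy each under-replicated block to other live nodes."""
    for block, holders in placement.items():
        holders -= dead  # in-place: forget replicas on dead nodes
        if not holders:
            continue  # every replica was lost; nothing left to copy from
        for node in live:
            if len(holders) >= target:
                break
            holders.add(node)  # no-op if the node already holds the block


last_heartbeat = {"node1": 100.0, "node2": 97.0, "node3": 60.0}  # made-up clock, in seconds
dead = find_dead_nodes(last_heartbeat, now=101.0, timeout=30.0)
live = sorted(set(last_heartbeat) - dead)
placement = {"blk_1": {"node1", "node3"}, "blk_2": {"node2", "node3"}}
re_replicate(placement, dead, live, target=2)
print(dead, placement)
```

Here node3 has been silent for 41 seconds, so it is treated as failed and both blocks are brought back to two replicas on the surviving nodes.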
MapReduce
• Programming model for processing large data sets
• Developed by Google for internal search applications
• Currently used by Yahoo, Amazon, IBM, etc.
• The runtime partitions the input and provides it to different Map instances
How does MapReduce work?
1. Partitioning the input
2. Mapping of instances: Map(key, value) → (key’, value’)
3. Collection of the (key’, value’) pairs
4. Distribution to reduce functions
5. Each reduce produces a single file output
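These steps can be sketched as a small single-process engine. It is only an illustration of the data flow: `run_mapreduce`, the sales records and the two user functions are hypothetical, the partitions run one after another rather than in parallel, and a dictionary entry stands in for each reduce call's output file.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn, num_partitions=2):
    # 1. Partition the input (round-robin slices stand in for input splits).
    partitions = [records[i::num_partitions] for i in range(num_partitions)]

    # 2. Each Map instance turns its (key, value) records into (key', value') pairs.
    mapped = []
    for partition in partitions:
        for key, value in partition:
            mapped.extend(map_fn(key, value))

    # 3. Collect the intermediate pairs, grouping the values by key'.
    groups = defaultdict(list)
    for out_key, out_value in mapped:
        groups[out_key].append(out_value)

    # 4-5. Hand each group to a reduce call; each call yields one output.
    return {out_key: reduce_fn(out_key, values) for out_key, values in sorted(groups.items())}


# Hypothetical input: (line number, "store,amount") sales records.
sales = [(0, "delhi,120"), (1, "rohtak,80"), (2, "delhi,30"), (3, "rohtak,20")]


def map_sale(_line_no, line):
    store, amount = line.split(",")
    return [(store, int(amount))]


def sum_amounts(_store, amounts):
    return sum(amounts)


totals = run_mapreduce(sales, map_sale, sum_amounts)
print(totals)
```

Only `map_sale` and `sum_amounts` are specific to the problem; the partitioning, grouping and distribution logic stays the same for any job.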
Map-Reduce Execution Engine (Example: Word Count)
Users only provide the “Map” and “Reduce” functions
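A minimal word count sketch, assuming plain Python in place of a real execution engine: `map_words` and `reduce_counts` are the two user-supplied functions, while `word_count` is a hypothetical stand-in for the grouping work the engine would do. The input documents are made up.

```python
from collections import defaultdict


def map_words(_doc_id, text):
    """Map: emit (word, 1) for every word in the document."""
    return [(word.lower(), 1) for word in text.split()]


def reduce_counts(_word, counts):
    """Reduce: add up the 1s collected for a single word."""
    return sum(counts)


def word_count(documents):
    """Stand-in for the engine: group the mapped pairs by word, then reduce each group."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for word, one in map_words(doc_id, text):
            grouped[word].append(one)
    return {word: reduce_counts(word, ones) for word, ones in grouped.items()}


docs = {"d1": "big data is big", "d2": "data is everywhere"}  # made-up input
counts = word_count(docs)
print(counts)
```

On a real cluster the map calls would run on the nodes holding each document's blocks, and the grouped pairs would be shipped over the network to the reducers.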
Potential Value of Big Data
• $300 billion potential annual value to US health care.
• $600 billion potential annual consumer surplus from using personal location data.
• 60% potential increase in retailers’ operating margins.
India – Big Data
• Gaining traction
• Huge market opportunities for IT services (82.9% of revenues) and analytics firms (17.1%)
• Current market size is $200 million; expected to reach $1 billion by 2015
• The opportunity for Indian service providers lies in offering services around Big Data implementation and analytics for global multinationals
Big Data Challenges
• Hard to quantify value to the enterprise
• Data scientist roles are difficult to fill
• Difficult to design effective visualization and reporting of new data sets
Big Data in Education
• Goal of improving retention and graduation rates
• Developing a more pro-active relationship with students to help them be more successful during and after graduation
• Approach:
1. Online applications for education
2. Forums
3. Help desk
4. Student demographic and operational information
Narendra Modi wins 2014 Lok Sabha Elections
THE FIRST PRIME MINISTER TO USE BIG DATA
What makes Modi’s use of big data so impressive?
• Volume of data: 814 million voters
• Variety of data: 12 different languages, 900,000 PDFs amounting to 25 million pages, heterogeneous and non-uniform data
For what purpose did he use big data?
• To drive donations, enroll volunteers, and improve the effectiveness of everything from door knocks to social media
• The BJP’s website planted cookies on all computers that visited it, for customised advertisements.
#IndiaVotes
Source : dataconomy.com
References
• http://dataconomy.com/narendra-modi-first-prime-minister-use-big-data-analytics/
• http://blog.pivotal.io/data-science-pivotal/case-studies/big-data-in-education-analyzing-student-clusters-to-influence-success-and-retention
Thank You