Big Data Management Bilwa Upadhye - FPM03 Chetna Chauhan – FPM04 Leon Dukkipati – PGP0686 Manzoor Ul Akram – FPM05 Soumya Soni – PGP06105 IIM Rohtak


Page 1: Big data management

Big Data Management

Bilwa Upadhye – FPM03
Chetna Chauhan – FPM04
Leon Dukkipati – PGP0686
Manzoor Ul Akram – FPM05
Soumya Soni – PGP06105

IIM Rohtak

Page 2: Big data management

05/02/2023 Big Data 2

What is Big Data?
• The exponential growth and availability of data, both structured and unstructured.

Structured Data
• Data that resides in a fixed field within a record or file is called structured data. This includes data contained in relational databases and spreadsheets.

Unstructured Data
• Text and multimedia content like e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents. This data doesn't fit neatly in a database.
• 80–90% of the data in any organization is unstructured

Page 3: Big data management

Data Being Generated Every Day
• eBay – 100 PB
• Google – 100 PB
• Facebook – 600 PB
• Twitter – 100 TB
• NSA – 29 TB
• 90% of the data in the world today has been created in the last two years alone

Examples:
• Sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, cell phone GPS signals, UID information, patient information etc.

Source: http://wikibon.org

Page 4: Big data management


Page 5: Big data management

What is Big Data Management?
• Organization, administration and governance of large volumes of both structured and unstructured data
• Tools used: Hadoop, NoSQL, Platfora
• Big data management is important to business, and society, because more data may lead to more accurate analyses.

Page 6: Big data management

RDBMS vs. Big Data Management Technologies

RDBMS
• Structured data
• ER model defined perfectly
• Smaller amounts of data
• Relational database management system
• Applications: IIM Rohtak

Big data management technologies
• Unstructured and semi-structured data
• No perfect ER model
• Large amounts of data
• Node-based flat structure
• Applications: healthcare, retail, Google, IBM

Page 7: Big data management

Comparison of scalabilities

Page 8: Big data management

Hadoop
• Open-source software framework written in Java
• Fundamental assumption
• Storage part: HDFS (Hadoop Distributed File System)
• Processing part: MapReduce
• Working of Hadoop

Page 9: Big data management

What does Hadoop do?
1. MapReduce divides the application into blocks
2. HDFS creates multiple replicas of data blocks
3. HDFS places data blocks on different nodes around the cluster
4. MapReduce accesses the data
5. MapReduce processes the data
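The splitting, replication and placement steps above can be sketched in a few lines of Python. This is only an illustration, not how HDFS is implemented: the function names, the tiny block size and the round-robin placement are assumptions made for the example, and real HDFS uses a more elaborate, rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the input into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (an assumed policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


# Hypothetical 10-byte "file", 4-byte blocks, 4 nodes, 3 replicas per block.
blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)

for block_id, holders in placement.items():
    print(block_id, blocks[block_id], holders)
```

Because every block lives on several nodes, a processing task can read it from whichever copy is closest, and losing one node does not lose the data.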

Page 10: Big data management

NoSQL
• Non-SQL database
• Provides a mechanism for storage and retrieval of data
• Horizontal scaling

Platfora
• Software that works with the open-source software framework Hadoop
• When a user queries the database, the software delivers the answer in real time
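Horizontal scaling, listed for NoSQL above, means spreading data over more machines instead of buying a bigger one. A minimal sketch of the idea, assuming simple hash-based sharding; `ShardedStore` and `shard_for` are hypothetical names, and each "shard" is just an in-memory dictionary standing in for a separate server:

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy key-value store that spreads keys over several shards."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=3)
for user_id in ["u1", "u2", "u3", "u4", "u5", "u6"]:
    store.put(user_id, {"id": user_id})

print(store.get("u4"))
print([len(shard) for shard in store.shards])
```

Plain modulo hashing moves most keys when the number of shards changes, which is why production systems often use consistent hashing or range partitioning instead.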

Page 11: Big data management

HDFS
• Highly fault-tolerant and designed to be deployed on low-cost hardware
• Provides high-throughput access to application data and is suitable for applications that have large data sets
• Relaxes a few POSIX requirements to enable streaming access to file system data

Page 12: Big data management

Properties of HDFS
• Large: thousands of server machines
• Replicated data blocks
• Failure is the norm
• Fast detection and recovery of faults
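Fault detection and recovery can be pictured as follows. In HDFS the DataNodes send periodic heartbeats to the NameNode; everything else in this sketch (the function names, the timeout value, and choosing the first available live node) is a simplifying assumption for illustration.

```python
def detect_dead_nodes(last_heartbeat: dict[str, float], now: float, timeout: float) -> set[str]:
    """A node is considered dead if its last heartbeat is older than `timeout` seconds."""
    return {node for node, seen in last_heartbeat.items() if now - seen > timeout}


def re_replicate(placement: dict[str, list[str]], dead: set[str],
                 live_nodes: list[str], replication: int) -> dict[str, list[str]]:
    """Drop dead holders and copy each block to other live nodes until it has enough replicas."""
    repaired = {}
    for block, holders in placement.items():
        survivors = [n for n in holders if n not in dead]
        candidates = [n for n in live_nodes if n not in survivors]
        needed = max(0, replication - len(survivors))
        repaired[block] = survivors + candidates[:needed]
    return repaired


# Hypothetical cluster state: node3 has been silent for 31 seconds.
last_heartbeat = {"node1": 100.0, "node2": 100.0, "node3": 70.0, "node4": 99.0}
dead = detect_dead_nodes(last_heartbeat, now=101.0, timeout=10.0)
live = sorted(set(last_heartbeat) - dead)

placement = {"blk_0": ["node1", "node2", "node3"], "blk_1": ["node2", "node3", "node4"]}
repaired = re_replicate(placement, dead, live, replication=3)
print(dead, repaired)
```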

Page 13: Big data management

MapReduce
• Programming model for processing large data sets
• Developed by Google for internal search applications
• Currently used by Yahoo, Amazon, IBM, etc.
• The runtime partitions the input and provides it to different Map instances

Page 14: Big data management

How does MapReduce work?
1. Partitioning the input
2. Mapping of instances: Map(key, value) → (key’, value’)
3. Collection of the (key’, value’) pairs
4. Distribution to reduce functions
5. Each reduce produces a single output file

Page 15: Big data management

Map-Reduce Execution Engine (Example: Word Count)

Users only provide the “Map” and “Reduce” functions
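A small single-process sketch of the word-count example shows that division of labour: `map_fn` and `reduce_fn` are what the user writes, while `run_mapreduce` stands in for the execution engine. All three names are hypothetical, and a real engine would run the map and reduce instances in parallel on different nodes instead of in one loop.

```python
from collections import defaultdict


def map_fn(_key, line):
    """User-provided Map: emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """User-provided Reduce: add up all counts collected for one word."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn, num_partitions=2):
    """Toy engine: partition the input, map, group by key, then reduce."""
    # 1. Partition the input records among the map instances.
    partitions = [records[i::num_partitions] for i in range(num_partitions)]
    # 2. Run the map function over every record of every partition.
    intermediate = []
    for partition in partitions:
        for key, value in partition:
            intermediate.extend(map_fn(key, value))
    # 3. Collect the intermediate pairs and group them by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # 4. Hand each key and its values to the reduce function.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


lines = ["big data is big", "data about data"]
records = list(enumerate(lines))  # (line number, line text) pairs
counts = run_mapreduce(records, map_fn, reduce_fn)
print(counts)
```

Swapping in a different pair of functions (for example, emitting a store ID with a purchase amount and summing per store) reuses the same engine unchanged.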

Page 16: Big data management

Potential Value of Big Data
• $300 billion potential annual value to US health care.
• $600 billion potential annual consumer surplus from using personal location data.
• 60% potential increase in retailers’ operating margins.

Page 17: Big data management

India – Big Data
• Gaining traction

• Huge market opportunities for IT services (82.9% of revenues) and analytics firms (17.1%)

• Current market size is $200 million; it is expected to reach $1 billion by 2015

• The opportunity for Indian service providers lies in offering services around Big Data implementation and analytics for global multinationals

Page 18: Big data management


Big Data Challenges

• Hard to quantify value to the enterprise

• Data scientist roles are difficult to fill

• Difficult to design effective visualization and reporting of new data sets

Page 19: Big data management

Big Data in Education

• Goal of improving retention and graduation rates

• Developing a more pro-active relationship with students to help them be more successful during and after graduation

• Approach:
1. Online Applications for Education
2. Forums
3. Help desk
4. Student Demographic and Operational Information

Page 20: Big data management
Page 21: Big data management


Narendra Modi wins 2014 Lok Sabha Elections

Page 22: Big data management

THE FIRST PRIME MINISTER TO USE BIG DATA

What makes Modi’s use of big data so impressive?
• Volume of data: 814 million voters
• Variety of data: 12 different languages, 900,000 PDFs amounting to 25 million pages, heterogeneous and non-uniform data

For what purpose did he use big data?
• To drive donations, enroll volunteers, and improve the effectiveness of everything from door knocks to social media
• The BJP’s website planted cookies on all computers that visited it, for customised advertisements.

#IndiaVotes

Source: dataconomy.com

Page 23: Big data management

References
• http://dataconomy.com/narendra-modi-first-prime-minister-use-big-data-analytics/
• http://blog.pivotal.io/data-science-pivotal/case-studies/big-data-in-education-analyzing-student-clusters-to-influence-success-and-retention

Page 24: Big data management


Thank You