2012 SNIA Analytics and Big Data Summit. © Insert Your Company Name. All Rights Reserved.

A Working Definition of Big Data

“Data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.” Wikipedia, 4/26/2011

A Better Definition of Big Data

“The intersection of scale-out data analysis tools with scale-out data storage.” Rob Peglar, May 2011

The As-Good-As-It-Gets Definition of Big Data

“I don’t want to have to run a file system check on the @#$% thing, ever.” All the Storage Admins I Know, June 2011

What is Big Data – in a Datacenter

File-Based Enterprise Apps

NetApp

File-based Enterprise IT Applications: Home Directories, Virtualization, File Archiving

Vertical Line-of-Business Markets: M&E, Life Sciences, R&D Engineering, Internet/Web 2.0, Gov’t, Oil & Gas, Higher Ed

File-Based Data & NAS Access & >100TB File System

By 2012, 80% of all storage capacity sold will be for file-based data

Source:

Big Data Companies – Coming out of the Woodwork

Cloudera, Mu Sigma, Infochimps, Riptano, Pervasive, IRI, Jive (bought Proximal Labs), Karmasphere, Infobright, nPario, Qlik, Datasift, MetrixLab, Alpine Data

EMC (bought Greenplum), IBM (bought Netezza), HP (bought Vertica), Teradata (bought Aster)

What’s the Big Deal about Big Data?

McKinsey calls it “the next frontier for innovation, competition and productivity” (McKinsey Global Institute, May 2011)

Fueled by an explosion of smart devices: human-oriented devices (handhelds, tablets, cameras) and non-human-oriented devices (sensors, embedded CPUs)

Social networking messages & data grow exponentially: Twitter feeds, Facebook updates, LinkedIn messages

Increasingly, business is conducted digitally – or digitized

Big Data is global – any source to any target

Social Media – Not as Easy as Some Think

What’s the Big Deal about Big Data?

Some research by McKinsey (McKinsey Global Institute, May 2011)

$6,000 worth of HDDs can store all recorded original music – but not all the copies of it!

5 billion mobile phones in use in 2010 – and growing. Moving to multiple devices per person; ~7 billion people now on earth

30 billion pieces of content shared by Facebook users per month in 2011

Digital data is growing globally at 40% per annum – compare to IT budgets, which are growing at 5% per annum

Estimate for 2012: 28 EB by enterprises and 36 EB by consumers

Total data stored as of 2011: 295 exabytes (accumulated over all of history)

1,300 exabytes/yr (1.3 ZB) of data transferred on the Internet by 2016
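
To see why a 40% versus 5% gap matters, the two rates can be compounded. The Python sketch below assumes both rates stay constant every year, which the slide does not claim; the function name is illustrative only.

```python
# Assumption: both growth rates stay constant year over year (a simplification).
DATA_GROWTH = 0.40    # digital data, per annum (figure from the slide)
BUDGET_GROWTH = 0.05  # IT budgets, per annum (figure from the slide)


def growth_multiples(years: int) -> tuple[float, float, float]:
    """Return (data multiple, budget multiple, data per budget dollar) after `years`."""
    data = (1 + DATA_GROWTH) ** years
    budget = (1 + BUDGET_GROWTH) ** years
    return data, budget, data / budget


for years in (1, 5, 10):
    data, budget, ratio = growth_multiples(years)
    print(f"{years:>2} yr: data x{data:.2f}, budget x{budget:.2f}, "
          f"data per budget dollar x{ratio:.2f}")
```

Under that assumption the ratio is (1.40/1.05)^n = (4/3)^n, so after ten years each budget dollar would have to cover about 17.8 times as much data.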

What’s the Big Deal about Big Data?

More research by McKinsey (McKinsey Global Institute, May 2011)

Estimated value of healthcare data is $300B in the US alone – e.g., CDC public health warnings, cancer genomics, drug design

Tapping into that value could reduce US healthcare spending by 8% – i.e., keep it within normal inflation instead of hyperinflation

$600B estimated commercial value of consumer location data – e.g., from smartphones, tablets, GPS devices

140,000 new data analyst/data scientist positions and 1.5 million more data managers are needed to tap into this value

Transactional data, positioning data, captured data – consumption meters, usage tracking, embedded devices creating

Big Data Applications and Management

Big data is nearly all file-based, not block-based. Hadoop is a software framework written to analyze big data – open source, Java-based

Big data can mean billions to trillions of files, each gigabytes to terabytes in size

Directed graph analysis, collaborative filtering, A/B testing, association rule learning, classification, natural language processing, data mining, pattern matching, sentiment analysis, comparative effectiveness, and clinical decision support are examples of big data techniques

This means petabytes to exabytes of data; enterprises will be ingesting more than 1 PB of data per day within 5 years

LCF-to-SLAC data transfer goal: 1 PB in eight hours over ESnet
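
The transfer goal and the ingestion forecast both translate into a sustained bandwidth. A quick Python check, assuming decimal units (1 PB = 10^15 bytes) and no protocol overhead or idle time; `required_rate` is a name made up for this sketch:

```python
# Assumption: decimal (SI) units, 1 PB = 10**15 bytes and 1 GB = 10**9 bytes.
PB = 10**15


def required_rate(total_bytes: float, seconds: float) -> tuple[float, float]:
    """Sustained rate needed to move total_bytes in `seconds`, as (GB/s, Gbit/s)."""
    bytes_per_sec = total_bytes / seconds
    return bytes_per_sec / 1e9, bytes_per_sec * 8 / 1e9


# LCF-to-SLAC goal: 1 PB in eight hours
gb_s, gbit_s = required_rate(PB, 8 * 3600)
print(f"1 PB in 8 h : {gb_s:.1f} GB/s = {gbit_s:.0f} Gbit/s")  # 34.7 GB/s = 278 Gbit/s

# Enterprise ingestion forecast: 1 PB per day
gb_s, gbit_s = required_rate(PB, 24 * 3600)
print(f"1 PB per day: {gb_s:.1f} GB/s = {gbit_s:.0f} Gbit/s")  # 11.6 GB/s = 93 Gbit/s
```

Real transfers carry protocol overhead and rarely hold peak rate for hours, so these averages are lower bounds on the network and storage bandwidth actually required.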

Big Data Applications and Management

Popular systems for Big Data and its analysis:

BigTable (Google, built on GFS – structured big data)

Cassandra – open-source DBMS for distributed data

Dynamo (Amazon, distributed data system)

Hadoop – the Big Data system of choice for many

MapReduce – software framework for parallel data reduction across a cluster

Pig – software for analysis of very large datasets

Stream processors – for real-time event data (sensors)

All these systems rely on massive collections of files, read and written sequentially by compute clusters
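
The map/reduce model behind Hadoop can be illustrated in a few lines. This is a single-process Python sketch of the model only (map, shuffle by key, reduce), using word count as the example job; it is not Hadoop's Java API, and the function names are made up for illustration. In a real cluster the map tasks run in parallel, each streaming through its own portion of the input files.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record: str) -> list[tuple[str, int]]:
    """Emit (word, 1) for every word in one input record (e.g. one line of a file)."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(map_reduce(["big data big files", "big clusters"]))
# {'big': 3, 'data': 1, 'files': 1, 'clusters': 1}
```

Because each reduce call sees only one key's values, the reduce step can also be spread across machines once the shuffle has partitioned keys among them.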

Social Networking Analysis – Courtesy of NSF Workshop on Social Modeling

The Internet in 60 Seconds – from GoGlobe.com

Big Data’s Impact on Business

Big data allows companies to experiment digitally: “What if” scenarios, simulations, extrapolations

Big data can allow companies to segment populations based on analysis of individuals’ contributed data

Financial services & insurance have huge potential: each client’s characteristics can be digitally analyzed

Consumer products & retail have huge potential: loyalty program data is growing exponentially

Security and management are top challenges

How do you manage and design for Big Data?

Big data necessitates a scale-out architecture

Must grow with ingestion rates & provide archive space

Big data must be protected on ingestion, but not necessarily backed up

Much big data is temporal – ingest, crunch, archive

Big data is optimally managed as a single filesystem

No links, no stubs, no multiple mount points, no cataloging

Typical/traditional RAID does not match big data

Big data is typically write-once and processed sequentially: GB/sec for data, IOPS for metadata, scaling linearly
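
The talk argues for protecting each file as it is ingested rather than relying on traditional RAID, and calls this file-level encoding. As a minimal sketch of the idea, the Python below splits one file into k data stripes plus a single XOR parity stripe and rebuilds any one lost stripe. Single XOR parity is the simplest such code, chosen here for illustration; production systems generally use stronger erasure codes (e.g. Reed-Solomon) that survive multiple failures. All function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length stripes."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_file(data: bytes, k: int) -> tuple[list[bytes], int]:
    """Split one file into k equal data stripes plus one XOR parity stripe.

    Returns the k + 1 stripes and the original length (needed to strip padding).
    """
    stripe_len = -(-len(data) // k)             # ceiling division
    padded = data.ljust(stripe_len * k, b"\0")  # zero-pad the last stripe
    stripes = [padded[i * stripe_len:(i + 1) * stripe_len] for i in range(k)]
    parity = reduce(xor_bytes, stripes)
    return stripes + [parity], len(data)


def recover(stripes: list, lost_index: int) -> bytes:
    """Rebuild one missing stripe by XOR-ing every surviving stripe (data and parity)."""
    survivors = [s for i, s in enumerate(stripes) if i != lost_index]
    return reduce(xor_bytes, survivors)


def decode_file(stripes: list[bytes], k: int, length: int) -> bytes:
    """Reassemble the file from its k data stripes and drop the padding."""
    return b"".join(stripes[:k])[:length]


# Usage: protect a file at ingestion, lose one stripe, rebuild it.
original = b"sensor readings: written once, read sequentially"
stripes, length = encode_file(original, k=4)
damaged = list(stripes)
damaged[2] = None                 # pretend the node holding stripe 2 failed
damaged[2] = recover(damaged, 2)
assert decode_file(damaged, 4, length) == original
```

Because the parity belongs to the file rather than to a disk volume, a rebuild only has to touch the stripes of files that actually lost data, and the protection level could in principle be chosen per file.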

The Conundrum

Petabytes are not the challenge – exabytes are the real challenge until around 2024, and zettabytes are the challenge from 2024 onward

1 TB systems in 2000; 1 PB in 2008; 1 EB in 2016

The architecture of systems for big data is key – Patterson, Gibson, Katz: RAID paper (1988)

4 TB drives coming early 2013; 6 and 8 TB in 2014

10 GB/sec/rack @ 1 PB; 100 GB/sec/rack @ 10 PB

HAMR promising; ~60 TB drives in 2016?

RAID + unstructured data is a bad match because of drive bit error rate (BER)

To meet the challenge we must do file-level encoding
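
Two of these points can be checked with arithmetic: the capacity timeline implies a constant growth rate, and drive bit error rate (BER) makes large RAID rebuilds risky. The Python sketch below assumes independent bit errors and uses 10^-14 and 10^-15 errors per bit read, figures commonly quoted on consumer and enterprise drive datasheets; the 8 × 4 TB RAID-5 group is a hypothetical example, not from the slides.

```python
import math


def annual_growth(start: float, end: float, years: float) -> float:
    """Compound annual growth factor that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years)


def p_read_error(bytes_read: float, ber: float) -> float:
    """Probability of at least one unrecoverable read error while reading bytes_read,
    assuming each bit fails independently with probability `ber`."""
    bits = bytes_read * 8
    # 1 - (1 - ber)**bits, computed stably for a tiny ber and a huge bit count
    return -math.expm1(bits * math.log1p(-ber))


# Timeline: 1 TB systems (2000) -> 1 PB (2008) is 1,000x in 8 years
g = annual_growth(1e12, 1e15, 8)
print(f"implied growth: {g:.2f}x per year")          # 2.37x per year
year_zb = 2008 + math.log(1e21 / 1e15, g)            # years from 1 PB to 1 ZB
print(f"1 ZB reached in about {year_zb:.0f}")        # 2024

# Hypothetical RAID-5 rebuild: 8 x 4 TB drives, one failed, 7 survivors read in full
rebuild_bytes = 7 * 4e12
for ber in (1e-14, 1e-15):  # assumed per-bit unrecoverable read error rates
    print(f"BER {ber:.0e}: {p_read_error(rebuild_bytes, ber):.0%} chance of a read error")
```

A 1,000× increase every eight years works out to about 2.37× per year, and continuing that trend from 1 PB in 2008 reaches 1 ZB in 2024, consistent with the dates above. On the rebuild side, the assumed 10^-14 rate gives roughly an 89% chance (about 20% at 10^-15) of hitting at least one unrecoverable read error while reading the 28 TB of surviving data, the kind of risk behind the claim that RAID and unstructured data are a bad match.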

THANK YOU