2012 SNIA Analytics and Big Data Summit. © Insert Your Company Name. All Rights Reserved.
A Working Definition of Big Data
“Data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.” Wikipedia, 4/26/2011
A Better Definition of Big Data
“The intersection of scale-out data analysis tools with scale-out data storage.” Rob Peglar, May 2011
As Good as it Gets Definition of Big Data
“I don’t want to have to run file system check on the @#$% thing, ever.” All the Storage Admins I Know, June 2011
What is Big Data – in a Datacenter
[Diagram, labelled NetApp: File-Based Enterprise Apps]
File-based enterprise IT applications: home directories, virtualization, file archiving
Vertical line-of-business markets: M&E, life sciences, R&D engineering, Internet/Web 2.0, government, oil & gas, higher education
File-based data: NAS access, >100 TB file system
By 2012, 80% of all storage capacity sold will be for file-based data
Source:
Big Data Companies – Coming out of the Woodwork
Cloudera, Mu Sigma, Infochimps, Riptano, Pervasive, IRI, Jive (bought Proximal Labs), Karmasphere, Infobright, nPario, Qlik, Datasift, MetrixLab, Alpine Data, EMC (bought Greenplum), IBM (bought Netezza), HP (bought Vertica), Teradata (bought Aster)
What’s the Big Deal about Big Data?
McKinsey calls it “the next frontier for innovation, competition and productivity”
  McKinsey Global Institute, May 2011
Fueled by an explosion of smart devices
  Human-oriented devices: handhelds, tablets, cameras
  Non-human-oriented devices: sensors, embedded CPUs
Social networking messages & data grow exponentially
  Twitter feeds, Facebook updates, LinkedIn messages
Increasingly, business is conducted digitally – or digitized
Big Data is global – any source to any target
Social Media – Not as Easy as Some Think
What’s the Big Deal about Big Data?
Some research by McKinsey - McKinsey Global Institute, May 2011
$6000 worth of HDD can store all recorded original music
  But not all the copies of it!
5 billion mobile phones in use in 2010 – and growing
  Moving to multiple devices per person; ~7 billion people now on earth
30 billion content pieces shared by Facebook users per month in 2011
Digital data is growing globally at 40% per annum
  Compare to IT budgets, which are growing at 5% per annum
The estimate for 2012 is 28 EB by enterprises and 36 EB by consumers
Total data stored in 2011 is 295 exabytes (accumulated in history)
1,300 exabytes/yr (1.3 ZB) of data transferred on the Internet by 2016
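To make the gap between 40% data growth and 5% budget growth concrete, here is a minimal compounding sketch. The two rates come from the slide; the function name, the normalized starting value of 1.0, and the chosen horizons are assumptions for illustration only.

```python
def compound(start, annual_rate, years):
    """Return `start` grown at `annual_rate` (e.g. 0.40 for 40%) for `years` years."""
    return start * (1.0 + annual_rate) ** years


DATA_GROWTH = 0.40    # digital data growth per annum (from the slide)
BUDGET_GROWTH = 0.05  # IT budget growth per annum (from the slide)

# Horizons are arbitrary illustration points, not figures from the talk.
for years in (1, 5, 10):
    data = compound(1.0, DATA_GROWTH, years)
    budget = compound(1.0, BUDGET_GROWTH, years)
    print(f"{years:2d} yr: data x{data:.2f}, budget x{budget:.2f}, "
          f"data per budget unit x{data / budget:.2f}")
```

If both rates held steady, each unit of budget would have to cover roughly four times as much data after five years.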
What’s the Big Deal about Big Data?
More research by McKinsey - McKinsey Global Institute, May 2011
Estimated value of healthcare data is $300B in the US alone
  E.g. CDC public health warnings, cancer genomics, drug design
Tapping into that value could reduce US healthcare spending by 8%
  I.e. stay within normal inflation instead of hyper-inflation
$600B est. commercial value of consumer location data
  E.g. from smartphones, tablets, GPS devices, etc.
140,000 new data analyst/data scientist positions and 1.5 million more data managers needed to tap into value
Transactional data, positioning data, captured data
  Consumption meters, usage tracking, embedded devices creating
Big Data Applications and Management
Big data is nearly all file-based, not block-based
Hadoop is an application written to analyze big data
  Open source, Java-based
Directed graph analysis, Collaborative Filtering, A/B testing, Associative Rule Learning, Classification, Natural Language processing, Data Mining, Pattern Matching, Sentiment Analysis, Comparative Effectiveness, Clinical Decision Support are examples of big data techniques
Big data can mean billions to trillions of files
  Each file can be gigabytes to terabytes in size
This means petabytes to exabytes of data
  Enterprises ingesting > 1 PB of data per day within 5 yrs
LCF to SLAC data transfer goal = 1 PB in eight hours over ESnet
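As a rough check of what these figures imply, the sketch below converts the 1 PB in eight hours transfer goal and the 1 PB per day ingest rate into sustained throughput. It assumes decimal units (1 PB = 10^15 bytes) and ignores protocol overhead and retries; the function and variable names are illustrative.

```python
PB = 10**15  # assumption: decimal petabyte, in bytes


def required_rate_gb_per_s(total_bytes, seconds):
    """Sustained rate in GB/s (10^9 bytes/s) needed to move total_bytes in `seconds`."""
    return total_bytes / seconds / 10**9


# 1 PB in eight hours (the transfer goal quoted above)
transfer = required_rate_gb_per_s(1 * PB, 8 * 3600)
# 1 PB per day (the enterprise ingest rate quoted above)
ingest = required_rate_gb_per_s(1 * PB, 24 * 3600)

print(f"1 PB in 8 h  needs about {transfer:.1f} GB/s ({transfer * 8:.0f} Gbit/s)")
print(f"1 PB per day needs about {ingest:.1f} GB/s ({ingest * 8:.0f} Gbit/s)")
```

Under these assumptions the eight-hour goal works out to roughly 35 GB/s sustained, and 1 PB per day to roughly 12 GB/s.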
Big Data Applications and Management
Popular systems for Big Data and its analysis:
BigTable (Google, built on GFS – structured big data)
Cassandra – open-source DBMS for distributed data
Dynamo (Amazon, distributed data system)
Hadoop – the Big Data system of choice for many
Map/Reduce – software framework for data reduction
Pig – software for analysis of very large datasets
Stream processors – for real-time event data (sensors)
All these systems rely on massive collections of files, read/written sequentially into compute clusters
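The map/reduce pattern named above can be illustrated with a minimal single-process sketch. This is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, map_reduce) and the sample records are hypothetical, and a real framework would run the map and reduce steps in parallel across a cluster, reading its input from files.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce sequentially over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input: each string stands in for one line read from a file.
records = ["big data is big", "data is file based"]
counts = map_reduce(records)
print(counts)
```

Each record is mapped independently and each key is reduced independently, which is what lets a framework spread the work over many nodes.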
Social Networking Analysis Courtesy of NSF Workshop on Social Modeling
The Internet in 60 seconds from GoGlobe.com
Big Data’s Impact on Business
Big data allows companies to experiment digitally
  “What if” scenarios – simulations – extrapolations
Big data can allow companies to segment populations
  Based on analysis of individuals’ contributed data
Financial services & insurance have huge potential
  Each client’s characteristics can be digitally analyzed
Consumer products & retail have huge potential
  Loyalty program data – growing exponentially
Security and management are top challenges
How do you manage and design for Big Data?
Big data necessitates a scale-out architecture
  Must grow with ingestion rates & provide archive space
Big data must be protected on ingestion
  But not necessarily backed up
Much big data is temporal – ingest, crunch, archive
Big data is optimally managed as a single filesystem
  No links, no stubs, no multiple mount points, no cataloging
Typical/traditional RAID does not match big data
Big data is typically write-once, processed sequentially
  GB/sec for data; IOPS for metadata; scale linearly
The Conundrum
Petabytes are not the challenge
Exabytes are the real challenge until around 2024
Zettabytes are the challenge 2024 and beyond
  1 TB systems in 2000; 1 PB in 2008; 1 EB in 2016
The architecture of systems for big data is key
  Patterson, Gibson, Katz: RAID paper (1988)
  4 TB drives coming early ’13; 6 & 8 TB in 2014
  10 GB/sec/rack @ 1 PB; 100 GB/sec/rack @ 10 PB
  HAMR promising; ~60 TB drives in 2016?
  RAID + unstructured a bad match – drive BER
To meet the challenge we must do file-level encoding
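One way to read the "RAID + unstructured a bad match – drive BER" point is to estimate how likely a RAID rebuild is to hit an unrecoverable read error as drives grow. The sketch below is a back-of-the-envelope model, not a figure from the talk: it assumes a bit error rate of 1 in 10^14 bits read, independent bit errors, and a hypothetical single-parity group of eight 4 TB drives, so a rebuild must read the seven surviving drives in full.

```python
import math

TB = 10**12  # assumption: decimal terabyte, in bytes


def p_unrecoverable_read(bytes_read, ber):
    """Probability of at least one bit error while reading bytes_read bytes.

    Computes 1 - (1 - ber)**bits with log1p/expm1 for numerical stability,
    assuming bit errors are independent.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ber))


ASSUMED_BER = 1e-14   # assumption: one unrecoverable error per 10^14 bits read
DRIVE_BYTES = 4 * TB  # 4 TB drive size, as mentioned in the slide
SURVIVING_DRIVES = 7  # assumption: 8-drive single-parity group with one failure

p = p_unrecoverable_read(SURVIVING_DRIVES * DRIVE_BYTES, ASSUMED_BER)
print(f"Chance of an unrecoverable read error during rebuild: {p:.0%}")
```

Under these assumptions the rebuild is more likely than not to encounter a read error, and the odds worsen as drive capacity grows.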
THANK YOU