Copyright © 2012, SAS Institute Inc. All rights reserved.
High Performance Analytics and the Challenges of Big Data
Toronto Area SAS Users Group, 12 Dec 2014
Charu Shankar, SAS Technical Training Specialist
What is Big Data
Thriving in the Big Data era
Our Perspective – the Analytics Gap
1.1 Volume
1.2 Variety
1.3 Velocity
2.1 Problem #1 Data Prep time part of problem
2.2 Problem #2 Shortage of talent
2.3 Problem #3 Our working ways don’t help
3.1 Some definitions
3.2 What can data mining models tell us?
3.3 How can HPA help?
Questions
Agenda
When volume, velocity and variety of data exceed an organization’s storage or compute capacity for accurate and timely decision-making
Big Data is RELATIVE not ABSOLUTE
What is BIG DATA
[Figure: DATA SIZE, TODAY vs. THE FUTURE; labels: VOLUME, VARIETY, VELOCITY, VALUE]
Thriving in the BIG DATA era
Most organizations:
• Can’t generate the information they need.
• Can’t generate information fast enough to act on it.
• Continue to incur huge costs due to uninformed decisions and misguided strategies.
The opportunities afforded by analytics have never been greater.
OUR PERSPECTIVE: THE ANALYTICS GAP
Data is a corporate asset, yet organizations are not leveraging it the way they leverage their labour and capital assets.
Does this look familiar?
Data is no longer in megabytes or gigabytes
We’re talking petabytes, and that is 10^15 bytes
1.1 VOLUME
• If the average MP3 encoding for mobile is around 1MB per minute, and the average song lasts about 4 minutes, then a petabyte of songs would last over 2,000 years playing continuously.
• If the average smartphone camera photo is 3MB in size and the average printed photo is 8.5 inches wide, then the assembled petabyte of photos placed side by side would be over 48,000 miles long - almost long enough to wrap around the equator twice.
• 1 petabyte is enough to store the DNA of the entire population of the US – and then clone them, twice.
Putting a Petabyte in perspective
Wes Biggs, chief technology officer at Adfonic
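The figures quoted above can be checked with a few lines of arithmetic. The sketch below is an illustration only, not part of the original talk. It assumes binary units (a petabyte of 2^50 bytes and a megabyte of 2^20 bytes), a 365.25-day year, and 63,360 inches per mile; none of these assumptions is stated in the slides. With the decimal definition (10^15 bytes) the results come out roughly 11% smaller.

```python
# Back-of-the-envelope check of the petabyte comparisons.
# Assumptions (not stated in the slides): binary units, 365.25-day year.

PETABYTE = 2 ** 50                   # bytes
MEGABYTE = 2 ** 20                   # bytes
MINUTES_PER_YEAR = 60 * 24 * 365.25
INCHES_PER_MILE = 63_360


def years_of_music(mb_per_minute: float = 1.0) -> float:
    """Years of continuous playback stored in one petabyte of MP3s."""
    minutes = PETABYTE / (mb_per_minute * MEGABYTE)
    return minutes / MINUTES_PER_YEAR


def miles_of_photos(mb_per_photo: float = 3.0, inches_wide: float = 8.5) -> float:
    """Length of one petabyte of printed photos laid side by side."""
    photos = PETABYTE / (mb_per_photo * MEGABYTE)
    return photos * inches_wide / INCHES_PER_MILE


if __name__ == "__main__":
    print(f"Music: about {years_of_music():,.0f} years")
    print(f"Photos: about {miles_of_photos():,.0f} miles")
```

Both quoted figures (over 2,000 years, over 48,000 miles) hold under these assumptions.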
Big data on social media
73% of online adults use a social networking site of some kind
684 million daily active users on Facebook
500 million tweets per day in 2013
The New
LinkedIn, Twitter, Instagram, Tumblr, Google+, Vine, ooVoo, Ask.fm, Yik Yak, WhatsApp, Whisper. YIKES!
The Old
Print media, television, radio
And it was only a one-way monologue
1.2 Variety – And this is a real life experience
1.3 Velocity. Big data is coming at high velocity. Are you ready?
THE ANALYTICS LIFECYCLE
Identify / formulate problem → Data exploration → Transform & select → Build model → Validate model → Deploy model → Evaluate / monitor results
Data is the number one challenge in the adoption or use of business analytics.
Companies continue to struggle with data accuracy, consistency, and even access.
Bloomberg BusinessWeek Survey 2011
DATA PREPARATION
• Consumes up to 80% of the project
• Specific to the data and the analysis
2.1 Problem #1: Data prep time is part of the problem
A single electronic medical record (EMR) system from one cancer center showed lab results for Albumin, a protein measured in cancer patients, in over 30 ways.
2.2 Problem #2 – Shortage of Talent
Who is a data scientist?
2.3 Problem #3 – Our working ways don’t help
1. HPA is the ability to rapidly perform complex analysis on big data, enabling you to solve problems that you thought were unsolvable. Look for HP on the front of a proc name.
2. HPA Server lifts data into memory. When it sees an HP PROC, it splits the work, such as sorting and summarizing data, across worker nodes that run in parallel.
3. SAS VA (Visual Analytics) provides a drag-and-drop web interface that lets you quickly explore huge amounts of data.
4. Hadoop: think of it as an infinitely expandable filing cabinet that can also help you summarize what is stored in it.
5. SAS LASR Server is part of HPAS (High-Performance Analytics Server). Its role is to push data into memory.
3.1 Some definitions
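The idea of splitting work across worker nodes and combining the results can be sketched in a few lines. This is only an illustration in Python, not SAS code and not how the HPA Server is implemented. The function names (`summarize_chunk`, `split`, `parallel_summary`), the worker count, and the use of threads in place of worker nodes are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def summarize_chunk(chunk: Sequence[float]) -> Tuple[int, float]:
    """Partial summary computed by one worker: (count, sum)."""
    return len(chunk), float(sum(chunk))


def split(data: Sequence[float], n_workers: int) -> List[Sequence[float]]:
    """Cut the data into at most n_workers roughly equal slices."""
    size = max(1, -(-len(data) // n_workers))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_summary(data: Sequence[float], n_workers: int = 4) -> Tuple[int, float, float]:
    """Summarize each slice on its own worker, then combine the partial results."""
    if not data:
        raise ValueError("data must not be empty")
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(summarize_chunk, split(data, n_workers)))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return count, total, total / count  # (count, sum, mean)


if __name__ == "__main__":
    values = [float(v) for v in range(1, 101)]
    print(parallel_summary(values))
```

Threads keep the example short. They stand in for worker nodes and will not speed up CPU-bound work in standard CPython; the point is the pattern of partial summaries that are cheap to combine.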
• Data Mining Models
  - Which products are customers likely to buy?
  - Which workers are likely to quit/resign/be fired?
• Text Models
  - What are people saying about my products and services? Can I detect emerging issues from customer feedback or service claims?
• Forecasting Models
  - How many products will be sold this year, next year?
  - How does this break down into each product over the next 3 months, 6 months?
• Operations Research
  - What is the optimal inventory and stock to be held of each of the products to minimize out of stock and overall holding costs?
  - What is the least cost route for transporting goods from warehouses to final destinations? (PRESCRIPTIVE)
3.2 What questions should we be asking?
Range penetration: salary level compared to peers
3.2 What can data mining models tell us?
TELCO: If wait time is important to customer satisfaction at a telco, I might take action to put my best customers at the head of the line. By understanding the underlying factors, I can take action to influence customer satisfaction and purchasing behaviour.
HEALTH: The next cure for cancer lies in big data. If we had a way to track, monitor, store and retrieve data on cancer patients’ way of life, we would be able to draw inferences that lead us toward a cure.
The value of harvesting big data in different industries
Example: HPA in unemployment statistics
Saskatchewan: 5%
Alberta: 4.5%
Ontario: 7.9%
Looks like labour doesn't move easily.
HPA value: another example
More labour economics, this time about your work: the data scientist
EMC survey: 65% of respondents expect demand for data scientists to outstrip availability over the next five years
3.3 How can HPA Help?
Key Takeaways of working with big data using HPA
• Work with the entire data set, no longer just a sample
• Leverage real-time data access
Thanks for attending
QUESTIONS?
Charu Shankar, SAS Institute Inc.
BLOG http://blogs.sas.com/content/sastraining/author/charushankar/
LINKEDIN http://ca.linkedin.com/in/charushankar
TWITTER https://twitter.com/CharuSAS
EMAIL [email protected]