Copyright © 2011, SAS Institute Inc. All rights reserved.
Shifting Paradigms with SAS
VanSUG User Group Meeting, 02NOV2011
Jim Metcalf, Senior Testing Director of Advanced Analytics
Transforming the World™
2
3
About us: SAS Research & Development
1000+ software developers in Cary, Pune, Beijing
Ph.D. specialists in statistics, data mining, optimization, applied math, numerical analysis, …
• Software used by statisticians, researchers, data miners
• Analytical components for business solutions
4
What’s Involved in Producing Statistical Software?
1. Listening to customers
2. Keeping up with advances in technical, visual and statistical methodologies
3. Monitoring market advances
4. Lots of discussions!
5. Designing, writing, testing code
6. Writing user documentation
7. Providing technical support and training
8. Consulting with customers
9. Presenting to customers
5
6
7
8
Early SAS History
1966 SAS structure and language conceived by Jim Barr
1968 Barr & Goodnight integrate multiple regression and ANOVA into SAS
1971 SAS use at universities begins.
1972 SAS ’72 is delivered to market.
1973 John Sall joins, adding econometrics, time series, and matrix algebra
1976 SAS Institute is incorporated by Jim, Jim, John and Jane.
Jim Barr Jim Goodnight John Sall Jane Helwig
Students ponder the meaning of JCL.
9
10
Worldwide Advanced Analytics Tools Revenue by Vendor, 2009-2010 (Top 3)
IDC, Worldwide Business Intelligence Tools 2010 Vendor Shares, Doc #228442, June 2011
Company      Revenue ($M)        Share (%)         2009-2010
             2009      2010      2009     2010     Growth (%)
SAS          529.0     582.5     34.7     35.2     10.1
IBM          236.2     268.5     15.5     16.2     13.7
Microsoft     27.3      32.2      1.8      1.9     17.8
11
Starting out Strong: SAS on the IBM mainframe ~$400MM (USD) annually in revenue for SAS
12
SAS® 6.09 Usability Experience…circa 1990
A Fetching and Unforgettable color palette
13
SAS has evolved considerably since then, however...
14
Sir Tim Berners-Lee
Marc Andreessen
MOSAIC
$173 Billion Market Cap
Paradigm Shifter!
15
Information Delivery Framework: SAS Audience Before
[Diagram: audience segments ranked from large % to small %: Information Consumers, Domain Experts, Power User, Business Analyst, Info Tech. Activities shown: Web Report Viewing, Web Reporting, Power Reporting, Analytic Reporting.]
16
17
SAS Shifting Paradigms!
18
SAS® Retail Space Management on the iPad, showing shelf layouts
SAS® Social Media Analytics on the iPad
19
General Purpose Mobile BI: Roambi ES for SAS!
Blackberry
iPad and iPhone
20
21
45 years of development and investment
Most-powerful 4th Generation Language in the world
Extract, manipulate, analyze, format, report using >1000 functions and >350 procedures
Platform agnostic—35 hosts
Total R&D investment since 1976 is $6.5B
The SAS Language and core SAS platform Still Relevant, Still Elegant, Still Stunningly Awesome!
SAS programs represent business, statistical and analytic processes and are a valuable tangible asset containing expert knowledge
22
Paradigm Shifter?
No, this is a
Ferrari Shifter
23
Alfred Wegener 1880-1930
Father of Plate Tectonics
Revolutionized geology
Tied everything together
Diversity of species
Earthquakes
Volcanoes
Rock cycle (deposition, deformation, metamorphism)
Paradigm Shifter!
Wegener in Fort Lauderdale with wife
Australia goes long while India fakes right
Hey, these fit together just like puzzle pieces!
24
Tectonic Plates and those who study them
Kathleen Hodgkinson
Seismologist
Unnamed Fanatic
Seismologist
25
Cross-section from the edge of the North American plate at the subduction front, across Vancouver Island to the Strait of Georgia. This section shows the Pacific Rim Terrane in brown and the Crescent Terrane in red. Cretaceous and Tertiary sedimentary rocks are shown in yellow.
26
27
Paradigm Shifter Dr. Herb Dragert
Identified a new crustal slip mechanism called Episodic Tremor and Slip in 2001. Along the Cascadia Subduction Zone it was predicted to occur every 13 to 16 months... and it does!
From Rogers, Mazzotti, Dragert and Kao, Geological Geoscience Centre “A Review of Episodic Tremor and Slip Observations in the Northern Cascadia Subduction Zone and Hazard Implications”
28
Crustal Deformation Measurement GPS Stations
GPS is accurate to a hundredth of a millimeter. The X, Y, Z position of each station is averaged daily.
29
Albert Head GPS Easting since 2004
Measured in mm and averaged weekly: 40 mm "East" since 2004
30
Albert Head Periodicity of Movement
Spectra computed using traditional autocorrelation in PROC TIMESERIES with DIF=(1)
57.8 weeks (14.5 months at 4 weeks per month; about 13.3 calendar months)
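The idea behind differencing a drifting series before looking for its dominant period can be sketched in a few lines of Python. This is only an illustration, not what PROC TIMESERIES does internally: the data are synthetic (a hypothetical linear drift plus a 58-week cycle), and the helper names `difference` and `dominant_period` are made up for this example.

```python
import cmath
import math

def difference(series):
    """First difference y[t] - y[t-1], the analogue of DIF=(1)."""
    return [b - a for a, b in zip(series, series[1:])]

def dominant_period(series):
    """Period (in samples) of the strongest peak in a plain periodogram."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k

# Synthetic weekly data (assumed values): steady drift plus a 58-week cycle.
synthetic = [0.11 * t + 2.0 * math.sin(2 * math.pi * t / 58) for t in range(349)]
period = dominant_period(difference(synthetic))
print(period)
```

Differencing turns the linear drift into a constant, which the mean removal then discards, so the cycle stands out in the spectrum instead of being swamped by the trend.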
31
Non-traditional: Singular Spectrum Analysis
Singular Spectrum Analysis (SSA) decomposes time series into principal components
Good for long time series where patterns are difficult to visualize and analyze
Has few model assumptions
Decomposes time series into spectral groupings via "Eigenspectra"
Spectral groupings can then be individually analyzed for signal
Part of SAS/ETS PROC TIMESERIES using the SSA option
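To make the decomposition concrete, here is a rough pure-Python sketch of the basic SSA steps for the leading component only: embed the series in a trajectory matrix, take the dominant eigenvector of the lag-covariance matrix, and average along anti-diagonals to get a series back. It is not the SAS/ETS implementation; the function names are hypothetical, and power iteration stands in for the full eigendecomposition that a real SSA routine would use to produce all components.

```python
import math

def trajectory_matrix(series, window):
    """Rows are lagged copies of the series: row i holds series[i : i + k]."""
    k = len(series) - window + 1
    return [series[i:i + k] for i in range(window)]

def leading_eigenpair(matrix, iterations=500):
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    value = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        value = math.sqrt(sum(v * v for v in nxt))
        if value == 0.0:
            break
        vec = [v / value for v in nxt]
    return value, vec

def first_ssa_component(series, window):
    """Reconstruct the part of the series tied to the largest eigenvalue."""
    x = trajectory_matrix(series, window)
    k = len(x[0])
    cov = [[sum(x[i][t] * x[j][t] for t in range(k)) for j in range(window)]
           for i in range(window)]
    _, u = leading_eigenpair(cov)
    # Project every column onto u; u[i] * weights[t] is the rank-1 matrix.
    weights = [sum(u[i] * x[i][t] for i in range(window)) for t in range(k)]
    # Diagonal averaging turns the rank-1 matrix back into a series.
    totals = [0.0] * len(series)
    counts = [0] * len(series)
    for i in range(window):
        for t in range(k):
            totals[i + t] += u[i] * weights[t]
            counts[i + t] += 1
    return [s / c for s, c in zip(totals, counts)]

geometric = [1.5 ** t for t in range(12)]
component = first_ssa_component(geometric, window=4)
```

A geometric series has a rank-one trajectory matrix, so its first component reproduces it; for real data the remaining components carry the other spectral groupings and the noise.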
32
Albert Head Periodicity of Movement
Eigenspectra plot identifies signal components and separates them from noise
33
Eigenspectra Display in PROC TIMESERIES Using SSA option
"Eigen" is German for "proper" or "characteristic"
Imagine a physics formula that describes a pendulum
Coefficient matrix "A" holds the coefficients of the pendulum formula
You can perturb the system: Ay = (Lambda)y
Lambda is the eigenvalue
For what value of Lambda is the matrix (A - Lambda*I) singular, where I is the identity matrix?
Set the determinant of that matrix of coefficients to zero and solve for Lambda
The Lambdas that fulfill this are called eigenvalues
Eigenvalues represent a family of stable solutions. Lower-valued eigenvalues are less stable for a family of differential equations.
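As a small worked example of the determinant condition, take a 2x2 symmetric matrix. The matrix is chosen purely for illustration and does not come from the talk or from the pendulum system.

```latex
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\det(A - \lambda I) = (2 - \lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = 0
\;\Longrightarrow\; \lambda_1 = 1,\quad \lambda_2 = 3 .
```

The corresponding eigenvectors are (1, -1) for lambda = 1 and (1, 1) for lambda = 3, which can be checked by multiplying each by A.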
34
Albert Head Periodicity of Movement Group 3 component
35
Albert Head Periodicity of Movement
Group 3 power spectrum with significantly more power at 14.5 months using SSA
57.8 weeks (14.5 months at 4 weeks per month; about 13.3 calendar months)
36
Borehole Strainmeters
Measure crustal deformation to 1 part in 1 billion with sampling every 10 minutes
Station B928
Station B009
37
Borehole Strainmeters near Victoria and Vancouver
B928
B009
38
39
40
Borehole Strainmeter alias of high frequency signals
Non-visible 6.4 Vancouver Island Earthquake with 10 minute sample rate at B928, PROC TIMESERIES with DIF=(1)
Note: The periodicity in the signal is crustal strain from the tides. But... it's too coarsely sampled to see the earthquake of 09SEP.
09SEP Vancouver Island EQ
41
Borehole Strainmeter with 1 second sample rate
Vancouver Island Earthquake with PROC TIMESERIES at B928
Zooming in and DIF=(1) DIF=(0)
Dominant EQ Period is ~3.5s
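The contrast between the 10-minute and 1-second strainmeter records follows from the Nyquist limit: a periodic signal needs at least two samples per cycle to be resolved. A minimal Python sketch of that check is below. The function names are hypothetical, the 3.5 s period is the approximate value quoted on this slide, and the 12.42-hour figure for the principal lunar tide is a standard value that is assumed here rather than taken from the talk.

```python
def shortest_resolvable_period(sample_interval_s):
    """Nyquist limit: at least two samples are needed per cycle."""
    return 2.0 * sample_interval_s

def is_aliased(signal_period_s, sample_interval_s):
    """True when the signal oscillates faster than the sampling can resolve."""
    return signal_period_s < shortest_resolvable_period(sample_interval_s)

TEN_MINUTES = 600.0                 # seconds between samples
EQ_PERIOD = 3.5                     # approximate dominant earthquake period (s)
TIDE_PERIOD = 12.42 * 3600.0        # assumed semidiurnal tide period (s)

print(is_aliased(EQ_PERIOD, TEN_MINUTES))    # earthquake at 10-minute sampling
print(is_aliased(EQ_PERIOD, 1.0))            # earthquake at 1-second sampling
print(is_aliased(TIDE_PERIOD, TEN_MINUTES))  # tide at 10-minute sampling
```

With 10-minute sampling the shortest resolvable period is 20 minutes, so the tidal strain is captured while a signal with a period of a few seconds is not.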
42
43
Japanese EQ of 11MAR at Station B928
Tsunami wave with sample rate of 10 minutes
This 9.0 EQ is so powerful and long-lasting that the signal is not aliased even with a 10 minute sample rate
DIF=(1) DIF=(0)
Note "ringing" for days afterward, superimposed on diurnal tidal variation
44
Japanese EQ 11MAR at Station B009, 1 s sample rate
Dominant EQ Period is ~11s from Tsunami Wave
45
Predicting Earthquakes in the Future: Ultra Low Frequency Magnetic event just hours prior to Loma Prieta EQ 1989
After Fraser-Smith et al, 1990
46
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Sunday, August 14, 2011 10:03 PM
To: Jim Metcalf
Subject: Re: ULF perturbation for 11MAR Japan EQ?
Dear Dr.Metcalf,
In response to yr inquiry,we have observed a definite ionospheric precursor on March 5 and 6 on the
VLF propagation path from the NLK(USA)-CHF(Chofu) path. This is submitted to an international
journal. As for ULF emissions we are now analysing the ULF data,but it seems to me that something exists in
possible asociation with this EQ.
REgards,
Masashi Hayakawa
47
Acknowledgements
Kathleen Hodgkinson, UNAVCO
Herb Dragert, Geological Survey of Canada
Christine Puskas, UNAVCO
Fran Boler, UNAVCO
John Langbein, United States Geological Survey
Stephen Earle, Malaspina University-College
"U N A V C O, a non-profit university-governed consortium,
facilitates geoscience research and education using geodesy."
48
Key SAS R&D Initiatives
01. High-Performance Computing
02. Process Automation
03. Business Visualization
04. Data Management
05. SaaS
49
Trends in Platform Architecture
From: Sutter, H. The Free Lunch Is Over. A Fundamental Turn Toward Concurrency in Software
Clock speeds leveled out in 2003
Where are the 20 GHz processors?
50
Massively Parallel Server Appliances
EMC Greenplum and Teradata analytic appliances
Provides
MPP database
MPP computing environment
Client-side operation from standard SAS session
Commoditization is coming!
51
Analytical Tiers and HPA Procedures
Tier                            Examples                                        Class                          SAS Procedures
Hindsight                       Descriptive statistics, summarization                                          HPSUMMARY, MEANS, RANK, UNIVARIATE
                                Cross-tabulation                                                               FREQ
                                Reporting                                                                      REPORT, TABULATE
Insight: descriptive modeling   Correlation analysis                            Relationships among variables  REG, CORR
                                Variable clustering                                                            VARCLUS
                                Factor analysis                                                                FACTOR
                                Principal component analysis                                                   PRINCOMP
                                                                                                               HPREG, HPREDUCE
Foresight: predictive modeling  Linear models                                   Linear elements                HPREG
                                Generalized linear models                                                      HPLOGISTIC
                                Nonlinear least-squares and maximum likelihood  Nonlinear elements             HPNLIN
                                Neural networks                                                                HPNEURAL
                                Linear mixed models                             Random effects                 HPLMIXED
                                Decision methods                                                               HPFOREST
Optimization                    Optimization                                                                   TBD
52
SAS Procedures: Then and Now
Then                                             Now
Single-threaded                                  Multi-threaded
Not aware of distributed computing environment   Aware of distributed computing environment
SAS/ACCESS for data read                         SAS/ACCESS for parsing support
Runs on client                                   Runs on client or DBMS appliance
Brings distributed data to client                Runs alongside distributed data source
Large I/O                                        In-Memory Analytics
Then:
proc logistic data=TD.mydata;
   class A B C;
   model y(event='1') = A B B*C;
run;
Now:
proc hplogistic data=TD.mydata;
   class A B C;
   model y(event='1') = A B B*C;
run;
53
HPLOGISTIC
Logistic Regression
Requested by banks
Needs to and does accommodate "big data"
1.1 billion observations, 7 regressors
High Performance environment runs in 52 seconds
Single threaded environment runs in 11 to 27 hours
Major features
Fits logistic and multinomial models
Does model selection
54
HPREG
Requested by banks
Needs to and does handle "big data"
Stepwise selection in 1 minute: 120 million observations
161 effects, 446 parameters
Features from:
REG - regression diagnostics
GLM - classification variables
GLMSELECT - model selection
55
High-Performance Markdown Optimization
Macy’s
Over 3 million products in over 700 stores
82 million product-locations in active markdown plans
Weekly optimization
100s of millions of pricing decisions in optimization each week
2 to 3 years of sales history are combined with the current week's data
Data partitioned in thousands of groups
Estimation and forecasting step independent for groups
Optimization step at higher level
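The structure described on this slide (partition the data, estimate each group independently, then run a higher-level step) can be sketched in Python as follows. This is an assumed toy version, not the SAS markdown optimization code: the per-group "estimation" is just a mean of past sales, the higher-level step is a simple total, and all names are hypothetical. Threads are used only to show the independent-per-group shape of the work.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def forecast_group(item):
    """Toy per-group estimation: forecast next week as the mean of past sales."""
    group, sales = item
    return group, sum(sales) / len(sales)

def run_pipeline(records, max_workers=4):
    """records: iterable of (group_key, weekly_sales) pairs."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    # Groups share no state, so each one can be estimated independently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        forecasts = dict(pool.map(forecast_group, groups.items()))
    # Higher-level step: here simply the total forecast across all groups.
    return forecasts, sum(forecasts.values())

forecasts, total = run_pipeline([("a", 10), ("a", 20), ("b", 5)])
```

Because no group depends on another, the estimation step scales out by adding workers; only the final step needs to see all groups at once.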
56
Customers
Visa Classic / Direct Mail
Visa Classic / Call Center
Visa Classic / Branch
Visa Gold / Direct Mail
Visa Gold / Call Center
Visa Gold / Branch
Home Equity Loan / Direct Mail
Home Equity Loan / Call Center
Home Equity Loan / Branch
MO is an Offer Assignment Problem
Tens of millions of customers Hundreds of offers
Billions of binary variables!
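The "billions" figure comes from having one yes/no decision per customer and offer pair. A minimal Python sketch of the count, using the customer and offer numbers quoted for the larger test case on the performance slide of this talk (the function name is hypothetical):

```python
def binary_variable_count(customers, offers):
    """One 0/1 decision variable per (customer, offer) pair."""
    return customers * offers

# 26 million customers and 910 offers, as quoted in the performance results.
large_case = binary_variable_count(26_000_000, 910)
print(f"{large_case:,} binary variables")
```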
57
Financial Services
Deciding the amount for credit line increases
Deciding the APR for balance transfer offers
Cross-sell and upsell in retail banking: savings accounts, home equity loans, credit cards, lines of credit, etc.
Telco
Targeted bundle offers: calling plans, text messaging, etc.
Customer retention through churn prevention offers
Retail
Personalized coupon offers
Other
Collections
Loyalty offers (Hotels, Casinos)
Applications
58
Initial Performance Results for HP MO
Step                        1 captain, 1 thread   128 captains, 4 threads   Speed-up
Pre-distribute data         1hr 24min             1hr 24min
Prepare proc input (est.)   1hr 1min              30sec                     122x
MO solver                   5hr 29min             6min 15sec                53x
Total optimization time     6hr 30min             6min 45sec                58x
26 million customers, 910 offers, 21 linking constraints, 152 million rows of contact history
Step                        1 captain, 1 thread   128 captains, 4 threads   Speed-up
Pre-distribute data         3min                  3min
Prepare proc input (est.)   1min 10sec            2sec                      35x
MO solver                   1hr 28min             2min 52sec                31x
Total optimization time     1hr 29min             2min 54sec                31x
925k customers, 674 offers, 674 linking constraints
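The speed-up column is the single-captain time divided by the multi-captain time. The short Python sketch below recomputes it from the quoted durations; the helper names are hypothetical, and rounding to the nearest integer is an assumption about how the slide figures were produced.

```python
import re

UNIT_SECONDS = {"hr": 3600, "min": 60, "sec": 1}

def to_seconds(text):
    """Convert strings such as '5hr 29min' or '6min 15sec' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u]
               for n, u in re.findall(r"(\d+)\s*(hr|min|sec)", text))

def speed_up(baseline, parallel):
    """Ratio of single-captain time to multi-captain time, rounded."""
    return round(to_seconds(baseline) / to_seconds(parallel))

print(speed_up("5hr 29min", "6min 15sec"))   # MO solver, larger case
print(speed_up("1hr 28min", "2min 52sec"))   # MO solver, smaller case
```

In both tables the total equals the prepare step plus the solver step, so the pre-distribution time is not included in the reported total.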
59
PROC OPTGRAPH Centrality Computation
Network related algorithm
Computationally expensive
Identify influencers in a network
Centrality measures: closeness, betweenness, pagerank, etc.
Challenge
Size of network
Centrality computation is very computationally expensive
# nodes    # links    1 captain, 1 thread   20 captains, 8 threads   Speed-up
113,074    300,001    1hr 18min             58sec                    80x
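To show why centrality is costly on large networks, here is a small Python sketch of one of the measures named above, closeness centrality, on an unweighted graph. It is not the PROC OPTGRAPH algorithm; the function name, the definition used (reachable nodes divided by the sum of shortest-path distances), and the four-node toy network are assumptions for illustration.

```python
from collections import deque

def closeness_centrality(graph, source):
    """Closeness of `source` in an unweighted graph {node: [neighbors]}.

    Defined here as (reachable nodes) / (sum of shortest-path distances).
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    total = sum(distance.values())
    return (len(distance) - 1) / total if total else 0.0

# Hypothetical 4-node path network: a - b - c - d
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = {node: closeness_centrality(toy, node) for node in toy}
```

Each node needs its own breadth-first search over the whole network, so the work grows with the number of nodes times the size of the graph, and the per-node searches are independent, which makes them natural to spread across threads.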
60
Analytical Product Releases in 2009 and 2010
• SAS/ETS 9.22
• SAS/OR 9.22
• SAS/STAT 9.22
Coupled with
SAS 9.2 M3
2009 1st Quarter 2010 2nd Quarter 2010 3rd Quarter 2010
• SAS/IML 9.22
• Enterprise Miner 6.2
4th Quarter 2010
• SAS 9.2
• Enterprise Miner
• We are committed to releasing features simultaneously as well as asynchronously with the platform
61
Analytical Products Releases in 2011
• Content Categorization
• Credit Scoring for Enterprise Miner
• Enterprise Miner
• Enterprise Miner for Desktop
• Forecast Server
• High-Performance Forecasting
• Model Manager
• Ontology Management
• Rapid Predictive Modeler
• SAS/ETS
• SAS/IML
• SAS/IML Studio
• SAS/INSIGHT
• SAS/LAB
• SAS/OR
• SAS/QC
• SAS/STAT
• Sentiment Analysis
• Simulation Studio
• Text Miner
• Text Miner for Desktop
• Visual Data Discovery
1st Quarter 2011 2nd Quarter 2011 3rd Quarter 2011 4th Quarter 2011
Innovative algorithms for complex problems
High performance computing for speed and scale
Workflow for operationalizing analytics
• High Performance Analytics
• High Performance Data Mining
• RMPO EA release for IDeaS – Hilton
• HPA EA/LA
SAS 9.3 Platform SAS 9.3M1
62
Analytical Product Releases in 2012 and beyond
• SAS/ETS
• SAS/IML
• SAS/IML Studio
• SAS/OR
• SAS/QC
• SAS/STAT
• Enterprise Miner
• Forecast Server
….
• HPETS
• HPOR
• HPSTAT
• HPDM
• High Performance
1st Quarter 2012 2nd Quarter 2012 3rd Quarter 2012 4th Quarter 2012
• SAS 9.4
1st Quarter 2013
• RMPO
Foundation
• SAS/ETS
• SAS/IML
• SAS/IML Studio
• SAS/OR
• SAS/QC
• SAS/STAT
• Enterprise Miner
• Forecast Server
….
• HPETS
• HPOR
• HPSTAT
• HPDM
• High Performance
63
More Information
What’s New in SAS 9.3 http://support.sas.com/rnd/app/video/index.html
From TS:
9.3 License Renewal on Windows: http://support.sas.com/kb/43/617.html
ODS Graphic Designer: http://support.sas.com/kb/43/735.html
Installing 9.3 on Windows from DVDs: http://support.sas.com/kb/43/079.html
Creating a SAS 9.3 depot on UNIX from DVDs: http://support.sas.com/kb/43/429.html
Installing 9.3 on Windows from ESD: http://support.sas.com/kb/43/400.html
Creating a SAS 9.3 depot on UNIX from ESD: http://support.sas.com/kb/43/430.html