
The Ultimate Toolkit – Equipping the Data Scientist


Page 1: The Ultimate Toolkit – Equipping the Data Scientist

Grab some coffee and enjoy the pre-show banter before the top of the hour!

Page 2

Hot Technologies of 2014

Page 3

HOST: Eric Kavanagh

Page 4

THIS YEAR is…

Page 5

Data Science

• Considered a highly specialized field

• Perceived as an expensive position to fill given the required skill set

• Typically involves, among other things, data preparation for advanced analytics

Page 6

THE LINE UP

ANALYST: John Myers, Research Director, Enterprise Management Associates

ANALYST: Robin Bloor, Chief Analyst, The Bloor Group

GUEST: Chuck Yarbrough, Director of Big Data Product Marketing, Pentaho

GUEST: Mark Kromer, Big Data Analytics Product Manager, Pentaho

Page 7

INTRODUCING

John Myers

Page 8

Today’s Presenters

John Myers, Research Director, EMA

John has over 10 years of experience in business analytics, in both professional services consulting and product development roles. He also helps organizations solve their business analytics problems, whether they relate to operational platforms, such as customer care or billing, or to applied analytical applications, such as revenue assurance or fraud management.

Slide 8 © 2014 Enterprise Management Associates, Inc.

Page 9

How are companies using Data Science?

Page 10

Data Science Defined

Data Science is the study of the generalizable extraction of business or domain knowledge from data. It incorporates elements of, and builds on techniques and theories from, many fields, including signal processing, mathematics, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition and learning, visualization, uncertainty modeling, data warehousing, and high performance computing. Data Science is not restricted to Big Data, although the increasing load, complexity, and structure of data make Big Data an important aspect of Data Science.

Page 11

Vision of a “Data Scientist”

Page 12

Few and far between…

Page 13

Who’s really performing Data Science…

Page 14

Many more Business Analysts…

Page 15

EMA Hybrid Data Ecosystem

Page 16

Empowering Data Scientists AND Business Analysts to perform Data Science

Page 17

INTRODUCING

Robin Bloor

Page 18

The Data Science Dance

Robin Bloor, Ph.D.

Page 19

Take Note!

You can know more about a business from its data than by any other means.

Page 20

The Driving Force of Insight

[Diagram labels: Hindsight, Oversight, Foresight, INSIGHT, and OPTIMIZATION?]

Page 21

What is a Data Scientist?

• Project manager
• Qualified statistician
• Domain business expert
• Experienced data architect
• Software engineer

(IT’S A TEAM)

Page 22

A Process, Not an Activity

• Data analytics is a multidisciplinary, end-to-end process
• Until recently it was a walled garden, but the walls were torn down by:
  • Data availability
  • Scalable technology
  • Open source tools

Page 23

The Impact of Machine Learning

Machine learning and processing power (parallelism) will CHANGE the data analysis process.

Machine learning AUTOMATES “data science” to some degree.

Page 24

The Data Analysis Budget

• Data analysis is BUSINESS R&D
• The focus is on business process
• The outcome of successful R&D is a CHANGED PROCESS
• Manufacturing offers a useful example

Page 25

INTRODUCING

Chuck Yarbrough & Mark Kromer

Page 26

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 26

Pentaho’s Hot Topic: DATA SCIENCE PACK

JUNE 18, 2014

CHUCK YARBROUGH, DIRECTOR, BIG DATA PRODUCT MARKETING, @CYARBROUGH

MARK KROMER, BIG DATA ANALYTICS PRODUCT MANAGER, @KROMERBIGDATA

Page 27

The strength of Pentaho lies in the power of combination

• Data integration + Business analytics
• The IT department + Lines of business
• Big data + Any data

Any data. Any environment. Any analytics.

Page 28

OUR VISION

The New Reality: Powerful yet simplified analytics for all users

ANY Analytics • Reports • Dashboards • Visualizations • Discovery • Predictive • Any role

ANY Environment • Data warehouses • Data marts • Stack vendors • Cloud • Embedded

ANY Data • Relational • Operational • Big Data • Data sources not yet anticipated

[Diagram layers: Analytics; Existing & New Data Infrastructure & Processes; example data sources: Billing, Location, Social Media, Customer, Web, Network]

Page 29

Pentaho 5.0: Architected for the Future

Simplified analytics experience for all users

• Simplified Analytics Experience
• Enterprise Big Data Integration
• Blended Big Data

Page 30

A Spectrum of Big Data Use Cases
WHAT THE MARKET IS DEPLOYING TODAY AND PLANNING FOR TOMORROW

[Chart axes: Use Case Complexity and Business Impact; stages: Entry, Transform, Advanced, Optimize]

• Data Warehouse Optimization
• Streamlined Data Refinery
• Big Data Exploration
• Customer 360 Degree View
• Harnessing Machine & Sensor Data
• Next Generation Applications
• Internal Big Data as a Service
• On-Demand Big Data Blending
• Big Data Predictive Analytics
• Monetize My Data

Page 31

Pentaho Data Science Pack
OPERATIONALIZE R AND WEKA, OFFLOAD DATA PREPARATION

• Allow Data Scientists to focus on analysis
• Use familiar tools (R, Weka)
• Leverage a graphical ETL tool to manage data preparation
• Blend Big Data sources easily
• Provide access to data with governance
• Operationalize the analytic workflow
• Enable IT to partner with Data Scientists
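To make “blend Big Data sources” and “manage data preparation” more concrete, the Python sketch below joins two hypothetical sources (billing and web activity) on a normalized customer ID and drops rows whose spend value cannot be parsed. It is not Pentaho’s API: in PDI these steps are assembled graphically, and the field names, sample records, and inner-join cleansing policy here are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical source records; in PDI these would arrive from input steps.
billing = [
    {"customer_id": "C1", "monthly_spend": "42.50"},
    {"customer_id": "C2", "monthly_spend": "n/a"},
    {"customer_id": "C3", "monthly_spend": "19.99"},
]
web = [
    {"customer_id": "c1", "visits_30d": 12},
    {"customer_id": "C3", "visits_30d": 3},
]


def parse_float(value) -> Optional[float]:
    """Return value as a float, or None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def blend(billing_rows: List[Dict], web_rows: List[Dict]) -> List[Dict]:
    """Join billing and web rows on an upper-cased customer ID.

    Rows with unparseable spend or no matching web activity are dropped
    (an inner join); this is one of several reasonable cleansing policies.
    """
    visits_by_id = {r["customer_id"].upper(): r["visits_30d"] for r in web_rows}
    blended = []
    for row in billing_rows:
        cid = row["customer_id"].upper()
        spend = parse_float(row["monthly_spend"])
        if spend is None or cid not in visits_by_id:
            continue
        blended.append(
            {"customer_id": cid, "monthly_spend": spend, "visits_30d": visits_by_id[cid]}
        )
    return blended


if __name__ == "__main__":
    for record in blend(billing, web):
        print(record)
```

In this sample, C2 is dropped because its spend is “n/a”, while the lowercase “c1” from the web source still matches C1 after normalization. A production flow might instead keep unmatched rows (a left join) and flag them for review rather than discarding them.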

Page 32

What’s in the Pack?

• R SCRIPT EXECUTOR: provides access to 5,500+ advanced algorithms
• WEKA FORECASTING: machine learning, time series analysis
• WEKA SCORING: calculates probability values for better predictions

TOOLS FAMILIAR TO THE DATA SCIENTIST

[Diagram labels: PDI, R/Weka, Analytic Data Flows]
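WEKA SCORING is described above as calculating probability values for predictions. As a rough sketch of what a scoring step inside a data flow does, the Python below applies a logistic model to each incoming row and appends a probability and a predicted label. This is not the Weka Scoring plugin or its API; the model form, the coefficients, the field names (`monthly_spend`, `support_calls`), and the 0.5 threshold are hypothetical stand-ins for a model trained elsewhere in R or Weka.

```python
import math
from typing import Dict, Iterable, Iterator

# Hypothetical coefficients for a logistic model trained elsewhere (for
# example in R or Weka) and exported; the values are illustrative only.
INTERCEPT = -1.0
WEIGHTS: Dict[str, float] = {"monthly_spend": -0.02, "support_calls": 0.8}


def sigmoid(z: float) -> float:
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score_rows(rows: Iterable[dict], threshold: float = 0.5) -> Iterator[dict]:
    """Append a churn probability and a predicted label to each row.

    Rows stream in, the model is applied, and each row is passed on with
    two extra fields; the original fields are left unchanged.
    """
    for row in rows:
        z = INTERCEPT + sum(w * row[field] for field, w in WEIGHTS.items())
        probability = sigmoid(z)
        yield {
            **row,
            "churn_probability": probability,
            "churn_predicted": probability >= threshold,
        }


if __name__ == "__main__":
    sample = [
        {"customer_id": "C1", "monthly_spend": 50.0, "support_calls": 0},
        {"customer_id": "C2", "monthly_spend": 10.0, "support_calls": 4},
    ]
    for scored in score_rows(sample):
        print(scored["customer_id"], round(scored["churn_probability"], 3), scored["churn_predicted"])
```

For example, the C2 row gets z = -1 - 0.2 + 3.2 = 2.0 and a probability of about 0.88, so it is flagged, while C1 gets z = -2.0 and about 0.12. Writing the step as a generator lets rows stream through without loading the whole dataset, which loosely mirrors how rows pass between steps in an ETL tool.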

Page 33

LEVERAGING THE DATA SCIENCE PACK
Providing a more complete view for customers

Ken Krooner, President at ESRG

“There was a gap in the market …. people like myself were piecing together solutions to help with the data preparation, cleansing and orchestration of analytic data sets. The Pentaho Data Science Pack fills that gap to operationalize the data integration process for advanced and predictive analytics.”

“…we are now helping clients blend a 360-degree view of all equipment data sources for early prediction of potential machinery failure.”

Page 34

“USING WEKA WITH PDI, WE ARE NOW HELPING CLIENTS HAVE A 360-DEGREE VIEW OF ALL EQUIPMENT DATA SOURCES TO ENABLE CAPABILITIES TO PREDICT EARLY PREDICTION OF POTENTIAL MACHINERY FAILURE.”

[Architecture diagram]

Data sources: Local Machine and Server Data • Fleet Data via Satellite • Cross Department Operations Data

Platform: PDI • Data Marts • Business Analytics Server • Data Governance

Users: Data Scientist (data mining and predictive) • End Users (dashboards and reports on machine performance) • Business User (COO) (reporting on operations and efficiency)

• Provide remote and onboard analytics for maritime fleets and ships
• Weka with PDI, to help clients blend a 360-degree view

Page 35

Predictive View of the Customer
LEVERAGE BLENDED BIG DATA & DATA SCIENCE TO SEIZE OPPORTUNITIES

Key Considerations
• Requires data scientists and PhDs (expensive resources)
• Data prep for predictive modeling can be labor-intensive
• Tech fit: various data stores, Distributed Weka, Enterprise R

What is it?
• Brings multi-source data together for an on-demand analytic view across customer touch points
• Applies predictive models to data as part of the integration process, to optimize customer-facing decisions

Why Do It?
• Recommend profitable decisions for front-line teams
• Automate and scale optimal customer interactions
• Boost upsell, reduce churn

Page 36

Thank You

JOIN THE CONVERSATION. YOU CAN FIND US ON:

• blog.pentaho.com
• @Pentaho
• Facebook.com/Pentaho
• Pentaho Business Analytics

Page 37

Page 38

The Archive Trifecta:
• Inside Analysis: www.insideanalysis.com
• SlideShare: www.slideshare.net/InsideAnalysis
• YouTube: www.youtube.com/user/BloorGroup

THANK YOU!