Analytics Education in the Era of Big Data

Gregory Piatetsky, KDnuggets

© KDnuggets 2013

Review of Analytics and Data Mining Education in the era of Big Data

Outline

• Analytics, Data Mining, Data Science - What do we call it?
• Big Data Trends
• Jobs and Skills
• Analytics Education overview

What do we call it?

Buzzwords and Trends

Useful for Marketing

“Analytics” has staying power

What do we call it?

• Statistics
• Data mining
• Knowledge Discovery in Data (KDD)
• Predictive Analytics
• Business Analytics
• Data Science
• Data Analytics
• …?

Same Core Idea: Finding Useful Patterns in Data

Different Emphasis

Pre-history (1900-2000): Statistics

From Google Ngram Viewer, English-language books. Search is case sensitive; the most popular spelling of each term was used.

“statistics” is the biggest term in the 20th century; “Analytics” is used increasingly through the 20th century; “data mining” appears in the late 1990s.

20th Century Analytics vs Data Mining

[Chart: Google Ngram curves for “Analytics”, “analytics”, “data mining”, and “Data Mining”]

Google N-grams search is case sensitive. Note: “data mining” > “Data Mining” in usage, while “analytics” < “Analytics”.

??

Recent History: 1980-2008

• “analytics” has been used since 1900, but started to rise in 2005
• “data mining” surges around 1995 (soon after the first KDD conference) but slowly declines after 2003 (TIA controversy, associated with government invasion of privacy)
• “Knowledge Discovery” appears in 1989, rises in 1996, and plateaus in 2000

(Google N-grams, smoothing = 1)

[Chart: Google Ngram curves for “analytics”, “Analytics”, “Knowledge Discovery”, and “data mining”]

Google Trends: After 2006, Analytics > Data Mining

Global – all regions

>50% of “Analytics” searches are for “Google Analytics”

Google Analytics introduced, Dec 2005

Google Trends observations (as of Jan 2013)

Competing on Analytics book, Tom Davenport, Apr 2007

Vacation drops

Decline in analytics in 2012?

data mining: 16
analytics -google: 54

Global View: searches for data mining, analytics -google

Google Trends

Google Trends: USA, 2012

For “analytics – google –web –adsense”

Google Trends: USA, Regional Interest

Google Trends: USA, Analytics-related terms

Analytics: Business > Data > Predictive > Text

Google Insights, Jan 2007 - Sep 2012, Global

Big Data > Data Mining > * Analytics > Data Science

“Big Data” Surge

Google Trends search, Jan 2007 - Dec 2012, USA

Big Data Trends

3 Vs of Big Data

• Volume – Gigabytes to Terabytes to Petabytes …
• Velocity – online streaming
• Variety – numbers, text, links, images, audio, video, …

Volume + Velocity => No consistency

• CAP Theorem (Eric Brewer, 2000): for highly scalable distributed systems, you can only have two of the following:
– 1) consistency,
– 2) high availability, and
– 3) (network) partition tolerance (network failure tolerance)

http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

Implication: Big data solutions must stop worrying about consistency if they want high availability
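The trade-off can be illustrated with a toy model. The sketch below is not a real database and every name in it (TwoReplicaStore, PartitionError, the "CP" and "AP" mode labels) is hypothetical: it assumes two replicas that normally replicate every write, and shows what each design choice does once the network link between them is cut.

```python
class PartitionError(RuntimeError):
    """Raised when a CP-mode store refuses a write during a partition."""


class TwoReplicaStore:
    """Toy single-value store with two replicas, 'a' and 'b' (hypothetical model)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"a": None, "b": None}
        self.partitioned = False  # True while the replicas cannot talk

    def write(self, replica, value):
        if not self.partitioned:
            # Healthy network: replicate synchronously to both replicas.
            self.values["a"] = self.values["b"] = value
        elif self.mode == "CP":
            # Keep consistency by giving up availability for writes.
            raise PartitionError("write rejected during partition")
        else:
            # Keep availability; the other replica is now stale.
            self.values[replica] = value

    def read(self, replica):
        return self.values[replica]


ap = TwoReplicaStore("AP")
ap.write("a", 1)
ap.partitioned = True
ap.write("a", 2)
print(ap.read("a"), ap.read("b"))  # replicas disagree: 2 1

cp = TwoReplicaStore("CP")
cp.write("a", 1)
cp.partitioned = True
try:
    cp.write("a", 2)
except PartitionError as err:
    print("CP store:", err)
print(cp.read("a"), cp.read("b"))  # replicas still agree: 1 1
```

In this simplified picture the "AP" store keeps accepting writes and its replicas diverge, while the "CP" store stays consistent only by refusing writes until the partition heals.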

Big Data

• 2nd Industrial Revolution

• Do old activities better

• Create new activities/businesses

Doing Old Things Better

“Classical” Analytics application areas
– Churn prediction
– Direct marketing/Customer modeling
– Recommendations
– Fraud detection
– Security/Intelligence
– …

• Competition will level companies

Limit to Predicting Human Behavior?

• There is randomness in human behavior, and once we find first-level effects, there are diminishing returns in prediction at the individual level

• Many examples: Netflix Prize, Customer modeling…

Gregory Piatetsky-Shapiro, Big Data Hype and Reality, Harvard Business Review blog, Oct 2012

Netflix Prize Progress

The most advanced algorithms were only a few percent better than basic algorithms

Many Customer Modeling Tasks have Similar Lift

See G. Piatetsky-Shapiro, B. Masand, Estimating Campaign Benefits and Modeling Lift, Proceedings of KDD-99 Conference

[Chart: Lift vs. 100*T% (0 to 25); series: Actual lift(T) and Est. lift(T), where Lift(T) ~ T^(-0.5) = sqrt(1/T)]
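The rule of thumb Lift(T) ~ sqrt(1/T) from the chart legend can be tabulated directly. This is a minimal sketch that assumes T is the targeted fraction of the ranked list (so 100*T% is the percentage on the horizontal axis); the function name estimated_lift is illustrative and does not come from the paper.

```python
import math


def estimated_lift(t):
    """Rule-of-thumb lift when targeting the top fraction t (0 < t <= 1)."""
    if not 0 < t <= 1:
        raise ValueError("t must be a fraction in (0, 1]")
    return math.sqrt(1.0 / t)


# Tabulate the estimate for a few targeting depths.
for pct in (1, 4, 10, 25):
    print(f"top {pct:>2}% -> estimated lift {estimated_lift(pct / 100):.2f}")
```

Under this approximation, targeting the top 1% gives a lift of about 10, and the lift shrinks toward 1 as the whole list is targeted, which matches the diminishing-returns point made on the preceding slides.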

Big Data Enables New Things !

– Google – first big success of big data
– Social networks (Facebook, Twitter, LinkedIn, …): success depends on network size, i.e. big data
– Location analytics
– Health-care
  • Personalized medicine
– Semantics and AI?
  • Imagine IBM Watson, Siri in 2020?
– Beware of loss of privacy

Largest Dataset Analyzed?

Big Data Miners – elite group

2012 median dataset size ~20-40 GB, vs 10-20 GB in 2011.

www.KDnuggets.com/polls/2012/largest-dataset-analyzed-data-mined.html

Digital Universe in 2020

40,000 exabytes by 2020

1 Exabyte = 10^18 bytes = 1,000,000 TB

Source: IDC study, The Digital Universe in 2020
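As a quick check of the units on this slide, the sketch below converts between decimal (SI) storage units. The constant and function names are illustrative; only the 40,000 exabyte figure comes from the slide.

```python
# Decimal (SI) storage units, expressed in bytes.
BYTES_PER_TB = 10**12
BYTES_PER_EB = 10**18
BYTES_PER_ZB = 10**21


def exabytes_to_terabytes(exabytes):
    """Convert a whole number of exabytes to terabytes."""
    return exabytes * BYTES_PER_EB // BYTES_PER_TB


digital_universe_eb = 40_000  # projected size for 2020, from the slide

print(exabytes_to_terabytes(1))                            # 1000000
print(digital_universe_eb * BYTES_PER_EB // BYTES_PER_ZB)  # 40
```

In other words, 40,000 exabytes is 40 zettabytes, or 4 x 10^22 bytes.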

Where in the World is Big Data?

Most of Big Data is outside the US

Opportunities for Big Data

• Social networks
• Surveillance video
• Embedded and medical devices
  – M2M data, logs
• Entertainment and social media
• Consumer images – image recognition, labeling, …

www.KDnuggets.com/polls/2012/where-applied-analytics-data-mining.html

Where did you apply Analytics/Data Mining?

Avg. Number of Industries 2.6

Most Popular:
- CRM/Consumer analytics
- Health care/HR
- Retail
- Banking
- Education

Highest growth in:
1. Advertising, 89.0%
2. Search / Web content mining, 55.1%
3. Retail, 40.6%
4. Other, 36.9%
5. Manufacturing, 35.7%

Untapped Big Data Gap

Big limitation is lack of Analytic Talent

JOBS AND SKILLS

Shortage of Skills

• McKinsey: shortage by 2018 in the US of
– 140,000-190,000 people with deep analytical skills
– 1.5 M managers/analysts with the know-how to use the analysis of big data to make effective decisions.

Source: www.mckinsey.com/mgi/publications/big_data/

Indeed.com fastest growing jobs

Top 10 skills:
• HTML5
• MongoDB
• iOS
• Android
• Mobile app
• Puppet
• Hadoop
• jQuery
• PaaS
• Social Media

[Chart: trend lines labeled Hadoop and MongoDB]

“Big Data” grows faster than MongoDB

[Chart: trend lines labeled Big Data, Hadoop, and MongoDB]

Data Mining >> Hadoop

Demand for Data Scientists surging

Data Scientist – sexiest job of the 21st Century (???), say Thomas H. Davenport and D.J. Patil (HBR, Oct 2012)

“Data Scientist”

Fastest growing term on www.kdnuggets.com/jobs

1% of jobs in 2010

4% of jobs in 2011

19% of jobs in 2012

What is a Data Scientist?

My definition: a combination of an MBA, a Statistician, and a Hacker

Drew Conway: http://www.drewconway.com/zia/?p=2378

Rebranding from “Data Mining” to “Big Data”

[Chart: trend lines labeled Data Mining, Big Data, and Data Scientist]

“Data mining” jobs are much more common, but “Big Data” jobs are surging much faster than “Data Scientist”

LinkedIn Analytics/Data Mining Skills

“Ground” analytics skills most common

“Cloud” analytics skills grow fastest

Text Analytics skills less common

Sentiment Analysis – fastest growing

Analytics Education

Analytics Education: USA/Canada

US: Northeast | US: South | US: Midwest | US: West | Canada

Northeast
• Connecticut: Central Connecticut State University (CCSU), exploring cutting-edge data mining techniques and applications. New Britain, CT.
• U. Conn. MS in Business Analytics and Project Management, designed to meet the growing demand for professionals who can harness advanced business analytics and project management skills. Hartford, CT.
• Maryland: U. of Maryland MS in Business for Marketing Analytics, will help you learn how to harness and process massive amounts of data to help design products, predict the effects of marketing campaigns, and better understand your customers. Fall 2013. College Park, MD.
• Massachusetts: Bentley Master of Science in Marketing Analytics, teaches students how to become more engaged with consumers, how to design and deliver robust statistical analysis, and how to effectively communicate the resulting insights. Waltham, MA.
• Harvard Masters of Science in Computational Science and Engineering, including a major focus on machine learning and analyzing and visualizing very large data sets. Cambridge, MA.
• New Jersey: Rutgers Master of Business and Science (MBS) in Analytics, prepares students for data-driven decision making; brings together fields of data management, statistics, machine learning and computation. New Brunswick, NJ.
• Stevens Institute of Technology Master of Science - Business Intelligence & Analytics, Hoboken, NJ.
• New York: Columbia MS in Computer Science, concentration in Machine Learning, New York, NY.
• NYU Master of Science in Business Analytics, starting May 2013, New York, NY.
• NYU MBA with specialization in Business Analytics, New York, NY.
• …

Full list at http://www.kdnuggets.com/education/usa-canada.html

Analytics Education: Online

• Big Data University, offering online classes on Hadoop and DB2.
• Caltech Learning from Data course, free, broadcast online Apr-May 2012.
• CMU Open Learning Initiative, including online courses in statistics, math & logic.
• Coursera, offering online classes from Stanford and other top universities. Check especially Statistics, Data Analysis, and Scientific Computing courses.
• Data Mining Tools Tutorials, covering Data Mining, Probability, Weka, R, and numerous commercial data mining tools.
• EMC Data Science and Big Data Analytics open course.
• LearnAnalytics India, delivering SAS and Advanced Analytics trainings online and offline.
• Northwestern University Online Master of Science in Predictive Analytics, skills for leadership in a growing field.
• Oxford Advanced Diploma in Data and Systems Analysis, a one-year online course.
• Stanford Center for Professional Education, offers certificate programs for managers and professionals in Data Mining and Applications and many related areas.
• Statistics.com, offering online short courses in statistics and data mining.
• UCI: U. of California, Irvine Extension, Predictive Analytics Certificate Program, a comprehensive online program.
• UC San Diego Data Mining Courses, part of Data Analysis study area.
• Udacity, online university founded by Sebastian Thrun, David Stavens, and Mike Sokolsky.
• Video lectures from conferences, workshops and scientific lectures in the areas of machine learning, data and text mining, and semantic web.

Full list at http://www.kdnuggets.com/education/online.html

Analytics Certificates

• Business Analytics Certificates from BeyeUniversity, sponsored by Sybase.
• Business Analytics Certificate from Statistics.com, become your company's expert on forecasting, customer segmentation, consumer behavior, and risk analysis.
• Central Michigan University Graduate certificate in Data Mining, with SAS. Mount Pleasant, MI, USA.
• Data Mining Certificate from Statistics.com, master the secrets of teasing powerful information from large data sets to predict customer behavior, identify likely high value customers, visualize high dimensional data, and convert text to minable data.
• EMC Data Science and Big Data Analytics open curriculum-based Education and Certification.
• Indiana U. Kelley School of Business Business Analytics Certificate Program, Bloomington, IN, USA.
• INFORMS Analytics Certification, coming April 2013.
• NJIT (New Jersey Institute of Technology) The Graduate Certificate in Data Mining, online and in class.
• Nova Southeastern University Graduate Certificate In Business Intelligence / Analytics, Fort Lauderdale, FL.
• Stanford Center for Professional Education, offers certificate programs for managers and professionals in Data Mining and Applications and Quantitative Methods in Finance and Risk Management.
• Statistics.com Institute for Statistics Education, offers certificates in Data Analytics, Biostatistics, Social Science, and Using R.
• UCI: U. of California, Irvine Extension, Predictive Analytics Certificate Program, a comprehensive online program.
• UCSD Data Mining certificate program, San Diego, CA.
• University of Delaware Certificate in Analytics: Optimizing Big Data, Newark, DE, USA.
• U. of Washington Certificate in Data Science, Seattle, WA, USA.
• SAS Certificate Programs

Full list at http://www.kdnuggets.com/education/analytics-data-mining-certificates.html

Analytics Education by Doing

• Competitions – learn by doing – Kaggle and more
• Kaggle beginner competitions
• Kaggle in class: free to instructors from any course dealing with data analysis

Online Education – Free Courses

Data Science pseudo degree from

• Lower-Division Courses
  – Data Science 101 – Statistics One
  – Data Science 102 – Computing for Data Analysis (R)
  – Data Science 103 – Data Analysis
  – Data Science 104 – Introduction to Data Science
• Upper-Division Courses
  – Data Science 201 – Machine Learning I
  – Data Science 202 – Machine Learning II
  – Data Science 203 – Neural Networks for Machine Learning
• Graduate Courses
  – Data Science 301 – Learning from Data (Caltech course CS101)
  – Data Science 302 – Machine Learning III (MIT course 6.867)

Analytics Education: Practical Knowledge > Degree Prestige

Paco Nathan - School prestige matters some to hiring managers. Several top schools are known to have excellent programs and track records: Stanford, CMU, U Washington, UC Berkeley, Harvard, Johns Hopkins, etc. However, keep in mind that that list is not entirely representative. For example, there are nearly a dozen relevant programs at Stanford, which has produced Google, Yahoo, etc., while the list only mentions one program.

More to the point, about half of my peers in this field have backgrounds in Physics or physical science/physical engineering -- and I tend to hire from those programs more so than from CS programs, because the grad students tend to have both the math/stats depth plus practice with real-world frameworks like R, Matlab, etc. Having a really solid background in applying statistics at scale, some knack for data visualization, plus good programming chops -- those skills will trump a PhD in Machine Learning.

http://www.linkedin.com/groups/Graduate-Programs-in-Big-Data-2013423.S.203868026?view=&srchtype=discussedNews&gid=2013423&item=203868026&type=member&trk=eml-anet_dig-b_pd-ttl-cn&ut=2Xc36Ur0k0zBA1

Analytics Education Boom

2012 was a peak year for starting Analytics Degree Programs in the US

Questions?

Analytics Education Overview: www.kdnuggets.com/education/

• Subscribe to KDnuggets News email at www.KDnuggets.com/subscribe.html

• Follow @kdnuggets on Twitter

• Email to [email protected]