TOPICS
• What is BIG DATA? • What is BIG DATA ANALYTICS? • Overview of MACHINE LEARNING • Tools & Technologies of BIG DATA ANALYTICS • Advanced Data Visualization Tools • DEMONSTRATION • MARKET TRENDS
• Volume: from terabytes to petabytes and up • Variety: an expanding universe of data types and sources • Velocity: accelerated data flow in all directions
Challenges of traditional data management techniques
Volume
Traditional analytics are often designed to analyze relatively small sample sizes
Data storage across multiple drives presents problems for traditional techniques
The cost of analyzing large data sets with traditional techniques is too high in terms of both processing time and memory
Challenges of traditional data management techniques
Velocity
Rapidly changing data sets require dynamic, real-time analysis that is not available with traditional techniques
Information Management processes need to intelligently decide in real time what data to save and what data to discard. This is not possible with traditional techniques
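One well-known way to decide on the fly what to keep from a stream of unknown length is reservoir sampling (Algorithm R). The sketch below illustrates that single technique; it is not a method taken from this deck. It keeps a fixed-size, uniformly random sample of everything seen so far. The function name `reservoir_sample`, its parameters, and the simulated sensor stream are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")

def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items unconditionally.
            reservoir.append(item)
        else:
            # Keep item i (0-based) with probability k / (i + 1),
            # overwriting a uniformly chosen slot; otherwise discard it.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# Hypothetical usage: a generator standing in for a high-velocity sensor feed.
readings = (i * 0.5 for i in range(100_000))
print(reservoir_sample(readings, k=5, seed=7))
```

Each incoming item is either stored or dropped immediately, so memory stays at k items no matter how long the stream runs. After n items, every item has the same probability k/n of being in the sample, even though n is never known in advance.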
Challenges of traditional data management techniques
Variety
The proliferation of data types creates compatibility issues with traditional tools
The increasing demand for data mash-ups and deep insights challenges traditional techniques that struggle with non-numeric data
Demystifying Machine Learning
Categories of Machine Learning Tasks
• Supervised Learning • Unsupervised Learning • Reinforcement Learning
Categories of Machine Learning Tasks
• Supervised Learning
Examples of Supervised Learning
• Given data about the size of houses on the real estate market, try to predict their price – Regression Problem
• Given data about the size of houses on the real estate market, ascertain whether the home(s) in question will sell for more or less than the asking price – Classification Problem
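The two examples above can be sketched with two standard models: ordinary least squares for the regression problem and logistic regression for the classification problem. Both are swapped-in illustrations because the deck does not name a method. The house data below is made up, and `fit_line`, `fit_logistic` and `sigmoid` are hypothetical helpers written with the standard library only.

```python
import math
from typing import List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x (regression)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs: List[float], labels: List[int], lr: float = 0.5, epochs: int = 2000):
    """Logistic regression P(y=1 | x) = sigmoid(w * (x - mean) + b), trained by batch gradient descent."""
    n = len(xs)
    mean_x = sum(xs) / n  # centring x speeds up gradient descent
    w = b = 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(w * (x - mean_x) + b) - y  # predicted probability minus true label
            grad_w += err * (x - mean_x)
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b, mean_x

# Hypothetical toy data: size in thousands of sq ft, price in $1,000s,
# and 1 if the home sold above its asking price, else 0.
sizes = [1.0, 1.2, 1.5, 1.8, 2.2, 2.5, 2.8, 3.0]
prices = [210, 230, 275, 320, 370, 420, 460, 500]
above_asking = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = fit_line(sizes, prices)
print(f"Predicted price for 2,000 sq ft: {intercept + slope * 2.0:.0f}k")

w, b, m = fit_logistic(sizes, above_asking)
print(f"P(sells above asking | 2,600 sq ft) = {sigmoid(w * (2.6 - m) + b):.2f}")
```

The difference between the two problems shows up in the output type: the regression returns a continuous price, while the classifier returns a probability that is thresholded (typically at 0.5) into "above" or "below" asking.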
Categories of Machine Learning Tasks
• Unsupervised Learning
Examples of Unsupervised Learning
• Automatically group a collection of 1,000 essays into a small number of groups whose members are similar or related by variables such as word frequency, sentence length, and page count – Clustering Problem
• Suppose a pediatrician, over years of experience, forms associations in his mind between patient characteristics and illnesses. When a new patient arrives, the doctor relates that patient’s characteristics (symptoms, physical attributes, mental outlook, etc.) to possible illnesses, based on what he has seen before in similar patients – Association Problem
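A minimal sketch of both unsupervised examples follows. It assumes each essay has already been reduced to two hypothetical numeric features (average words per sentence, page count) and each patient visit to a set of observed items. k-means is swapped in as one common clustering algorithm, and rule confidence, conf(A → C) = support(A ∪ C) / support(A), as the basic measure behind association rules. All data and function names are illustrative.

```python
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]

def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j])) for p in points]

def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[int], List[Point]]:
    # Farthest-point initialisation: deterministic and spreads the seeds apart.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(points, centroids)
        new: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Move each centroid to the mean of its members (keep it if the cluster is empty).
            new.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[j])
        if new == centroids:  # converged: assignments no longer change the centroids
            break
        centroids = new
    return assign(points, centroids), centroids

def confidence(records: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Fraction of records containing the antecedent that also contain the consequent."""
    with_a = [r for r in records if antecedent <= r]
    if not with_a:
        return 0.0
    return sum(1 for r in with_a if consequent <= r) / len(with_a)

# Hypothetical essay features: (average words per sentence, page count).
essays = [(12.0, 2.0), (13.5, 2.5), (11.0, 1.5), (28.0, 9.0), (30.5, 10.0), (27.0, 8.5)]
labels, centers = kmeans(essays, k=2)
print(labels, centers)

# Hypothetical past visits: observed characteristics plus the eventual diagnosis.
visits = [
    {"fever", "rash", "measles"},
    {"fever", "rash", "measles"},
    {"fever", "cough", "flu"},
    {"fever", "rash", "chickenpox"},
]
print(confidence(visits, {"fever", "rash"}, {"measles"}))
```

In practice features on very different scales should be standardized before running k-means; otherwise the feature with the largest range dominates the distance.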
Big Data Analytics - Technologies
• Data Mining • Hadoop • In-memory Analytics • Predictive Analytics • Text Mining • Data Visualization
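Hadoop's processing model, MapReduce, splits a job into a map step that emits key-value pairs, a shuffle that groups the pairs by key, and a reduce step that aggregates each group. The sketch below simulates those three phases in a single Python process for the classic word-count job. It does not use Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate the values for one key."""
    return key, sum(values)

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())

print(word_count(["Big data big insights", "data science"]))
```

On a real cluster the framework runs many map and reduce calls in parallel on different nodes and performs the shuffle over the network; the per-record logic stays this simple.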
Why Put Big Data and Analytics Together?
• Big Data provides gigantic statistical samples, which improve the results of analytics tools • Analytic tools and databases can now handle big data • The economics of analytics is now more favorable than ever • There is a lot to learn from messy data as long as it is big
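The first point can be quantified in the simplest setting. Assuming independent observations with standard deviation σ, the standard error of a sample mean falls with the square root of the sample size n, so 100 times more data gives an estimate about 10 times tighter. The formula describes random sampling error only, not bias in how the data was collected.

$$
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}},
\qquad
\frac{\mathrm{SE}(\bar{x})\big|_{n=10^{6}}}{\mathrm{SE}(\bar{x})\big|_{n=10^{4}}}
= \sqrt{\frac{10^{4}}{10^{6}}} = \frac{1}{10}
$$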
Options for Big Data Analytics Plotted by Potential Growth and Commitment
Advanced Data Visualization Tools
Key Capabilities of Advanced Data Visualization Tools
• Highly interactive graphics • Intuitive analytic capabilities • Easy report building • In-memory processing capabilities
Forrester Wave: Advanced Data Visualization Platforms
Key Capabilities Pillars: Tableau Software
• Enables fast, ad-hoc analysis of Big Data • Provides advanced in-memory analytics • Makes data mash-ups easy • Gives users powerful, self-service analytics
Walk-through of Visualizations
• Highlight Table • Tree Maps • Maps – Symbol Maps, Filled Maps
Evolving Business Analytics: From Descriptive to Prescriptive
Gartner’s Definition: Prescriptive Analytics
Optimization is at the heart of Prescriptive Analytics
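To make "optimization" concrete: a prescriptive question such as "how many units of each product should we stock?" becomes maximizing an objective subject to constraints. The sketch below uses a hypothetical two-product stocking problem and solves it by brute-force enumeration, which only works for tiny problems; a production tool would pass the same model to a linear or integer programming solver. All numbers and the function name `best_plan` are illustrative assumptions.

```python
from itertools import product
from typing import Optional, Tuple

def best_plan(
    profit: Tuple[int, int],  # profit per unit of product A and B
    cost: Tuple[int, int],    # purchase cost per unit of A and B
    budget: int,              # constraint: total purchase cost <= budget
    shelf_space: int,         # constraint: total units (one slot each) <= shelf_space
    max_units: int,           # search bound per product
) -> Optional[Tuple[int, int, int]]:
    """Return (total_profit, units_a, units_b) maximizing profit over all feasible integer plans."""
    best: Optional[Tuple[int, int, int]] = None
    for a, b in product(range(max_units + 1), repeat=2):
        feasible = cost[0] * a + cost[1] * b <= budget and a + b <= shelf_space
        if feasible:
            value = profit[0] * a + profit[1] * b
            if best is None or value > best[0]:
                best = (value, a, b)
    return best

# Hypothetical inputs: A earns 4 and costs 2, B earns 5 and costs 4,
# with a budget of 40 and room for 15 units on the shelf.
print(best_plan(profit=(4, 5), cost=(2, 4), budget=40, shelf_space=15, max_units=20))
```

With these inputs the function returns (65, 10, 5): ten units of A and five of B, which uses the whole budget and all shelf space. Stocking only the higher-margin product B would hit the budget at ten units and earn just 50, which is the kind of non-obvious trade-off prescriptive analytics is meant to surface.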
Big Data Predictions for 2017
• SQL is not going anywhere • Turn “data lakes” into insights • Combine Big Data with Machine Learning for real-time analytics
Business Analytics Trends for 2017
• Modern BI is the new normal • Analytics are everywhere, thanks to embedded BI • People will work with data in more natural ways • Advanced Analytics becomes more accessible • Collaborative analytics will take center stage
TOPICS
• What is DATA SCIENCE? • Skills of a Data Scientist • Applications of Data Science
What is Data Science?
Skills of a Data Scientist
• Business Acumen • Knowledge of Hadoop • Skills in SQL • Competent in R Programming and Statistical Analysis/Techniques • Skills in coding (e.g. Java, Python) • Adept at Data Visualization • Skills in Effective Communication
Skills of a Data Scientist
• Knowledge of Hadoop
Skills of a Data Scientist
• Adept at Data Visualization
Zuckerberg Test for Hiring Data Scientists
• They know the techniques of Data Science • They know the tools of Data Science • They can think at different altitudes • They are superb communicators • They can give and take criticism well • They are confident but not arrogant • They get things done • They make the people around them better • They are fun to be around
Types of Data Science Projects
• Tactical Optimization • Predictive Analytics • Nuanced Learning • Recommendation Engines • Automated Decision Engines
Industry-Specific Applications of Data Science - Finance
• Online Lending
Industry-Specific Applications of Data Science - Travel
• Truly Personalized Offers • Enhanced Customer Service • Safer Travel
Industry-Specific Applications of Data Science - Retail
• Customer Experience • Merchandising • Marketing • Supply Chain Logistics
THANK YOU
APPENDIX
Highlight Table
• Step 5: Drag the Profit measure into Rows
Visualization #2: Highlight Table Chart
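In Tableau a highlight table is built through GUI steps like the one above. As a rough code analogue of what such a view computes, the sketch below sums a measure for every row/column cell and assigns each cell a shade bucket. The records, the field names (`Region`, `Category`, `Profit`), the function name `highlight_table` and the linear three-bucket shading are hypothetical choices, not Tableau's own colour algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[str, str]

def highlight_table(
    records: List[dict], row_key: str, col_key: str, measure: str, n_bins: int = 3
) -> Tuple[Dict[Cell, float], Dict[Cell, int]]:
    """Sum `measure` per (row, column) cell and map each total to a shade bucket 0..n_bins-1."""
    totals: Dict[Cell, float] = defaultdict(float)
    for rec in records:
        totals[(rec[row_key], rec[col_key])] += rec[measure]
    lo, hi = min(totals.values()), max(totals.values())
    span = (hi - lo) or 1.0  # avoid division by zero when every cell is equal
    shades = {
        # Linear buckets: 0 = lightest (lowest total), n_bins - 1 = darkest (highest total).
        cell: min(int((value - lo) / span * n_bins), n_bins - 1)
        for cell, value in totals.items()
    }
    return dict(totals), shades

# Hypothetical order lines.
records = [
    {"Region": "East", "Category": "Furniture", "Profit": 120.0},
    {"Region": "East", "Category": "Technology", "Profit": 300.0},
    {"Region": "West", "Category": "Furniture", "Profit": -40.0},
    {"Region": "West", "Category": "Technology", "Profit": 410.0},
    {"Region": "East", "Category": "Technology", "Profit": 50.0},
]
totals, shades = highlight_table(records, "Region", "Category", "Profit")
for (region, category), value in sorted(totals.items()):
    print(f"{region:<6} {category:<11} {value:>8.1f}  shade={shades[(region, category)]}")
```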
Tree Maps
• Step 5: Drag the Profit measure onto the Label mark
Visualization #8: Tree Map Chart
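A tree map draws each item as a rectangle whose area is proportional to a measure, with nested rectangles for sub-groups. The source does not say how Tableau lays the rectangles out, so the sketch below swaps in the classic slice-and-dice layout, which alternates vertical and horizontal splits at each level of the hierarchy. The data and the function names `node_size` and `slice_and_dice` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Rect = Tuple[str, float, float, float, float]  # (label, x, y, width, height)

def node_size(node: Any) -> float:
    """A leaf's size is its value; a group's size is the sum of its children."""
    if isinstance(node, dict):
        return sum(node_size(child) for child in node.values())
    return float(node)

def slice_and_dice(
    tree: Dict[str, Any], x: float, y: float, w: float, h: float,
    vertical: bool = True, prefix: str = "",
) -> List[Rect]:
    """Split the rectangle among children in proportion to size, alternating direction per level."""
    total = node_size(tree)
    rects: List[Rect] = []
    offset = 0.0
    for label, child in tree.items():
        frac = node_size(child) / total
        if vertical:  # slice along the width
            cx, cy, cw, ch = x + offset * w, y, frac * w, h
        else:         # slice along the height
            cx, cy, cw, ch = x, y + offset * h, w, frac * h
        offset += frac
        name = prefix + label
        if isinstance(child, dict):
            rects.extend(slice_and_dice(child, cx, cy, cw, ch, not vertical, name + "/"))
        else:
            rects.append((name, cx, cy, cw, ch))
    return rects

# Hypothetical sales by category and sub-category.
sales = {"Furniture": {"Chairs": 30, "Tables": 10}, "Technology": {"Phones": 40, "Copiers": 20}}
for rect in slice_and_dice(sales, 0.0, 0.0, 100.0, 100.0):
    print(rect)
```

Area encodes size, so the sizing measure must be positive. A measure that can go negative, such as Profit, is better shown as a label or colour, as in Step 5, than used to size the rectangles.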
Symbol Maps
• Step 5: Under ‘Measures’, drag ‘Profit’ onto ‘Label’ in the ‘Marks’ area
Visualization #5: Symbol Maps
Filled Maps
• Step 4: Select ‘Filled Maps’ from the ‘Show Me’ dialog
Visualization #6: Filled Maps