Essential Data Science for Product Designers and Non-Scientists

Page 1: Essential Data Science for Product Designers and Non-Scientists

ESSENTIAL DATA SCIENCE FOR PRODUCT DESIGNERS AND NON-SCIENTISTS

Page 2: Essential Data Science for Product Designers and Non-Scientists

ESSENTIAL DATA SCIENCE FOR PRODUCT DESIGNERS

WHAT IS DATA SCIENCE?

Page 3: Essential Data Science for Product Designers and Non-Scientists

LOTS OF DEFINITIONS:

▸ The art and science of utilizing data to produce actionable insights.

▸ The utilization of science and mathematics to extract knowledge and insights.

▸ A branch of computer science that applies statistics to a dataset to make predictions and find patterns.

▸ KDD - Knowledge Discovery in Databases

WORKING WITH DATA TO OBTAIN INSIGHTS, MAKE PREDICTIONS AND PROVIDE ADVICE

Page 4: Essential Data Science for Product Designers and Non-Scientists

LOTS OF BRANCHES & DISCIPLINES

▸ Data Analytics

▸ Big Data

▸ Data Engineering

▸ Database Technology

▸ Statistics

▸ Data Visualization

▸ Machine Learning

▸ Neural Networks

▸ Artificial Intelligence

DATA SCIENCE

Page 5: Essential Data Science for Product Designers and Non-Scientists

LET’S NARROW OUR FOCUS TO THESE THREE AREAS OF DATA SCIENCE

▸ Data Analytics

▸ Machine Learning

▸ Data Engineering

Page 6: Essential Data Science for Product Designers and Non-Scientists

BUSINESS INTELLIGENCE VS DATA SCIENCE

BUSINESS INTELLIGENCE

▸ Retrospective - Looking at historical data to predict the future

▸ Pre-canned or predefined questions submitted to the system

▸ Data is largely siloed or warehoused

▸ Asks specific questions related to strategic business operations

DATA SCIENCE

▸ Prospective - Forecasting the future

▸ Discovery of questions to ask, or development of questions in the form of hypotheses

▸ Data is distributed in data lakes or in warehouses; can be real-time or near-real-time streams

▸ Can ask questions about strategy, but can be used in any domain

Page 7: Essential Data Science for Product Designers and Non-Scientists

WHAT DO PRODUCT DESIGNERS NEED TO KNOW?

KNOWING THE CONCEPTS AND CHARACTERISTICS OF DATA IS A GOOD START

Page 8: Essential Data Science for Product Designers and Non-Scientists

GOALS FOR PRODUCT DESIGNERS

▸ Enable users to make better decisions with new knowledge and insights

▸ Optimize and unlock the value of services and operational efficiencies

▸ Supply insights to make new, better products and services

▸ Convert data into stories that engage and empower users

▸ Employ summary reports or visualizations to help decision makers

Page 9: Essential Data Science for Product Designers and Non-Scientists

UNDERSTAND THE PROBLEM

WHAT IS THE PURPOSE? WHAT ARE THE QUESTIONS?

Page 10: Essential Data Science for Product Designers and Non-Scientists

REVIEW THE DATA

CHECK THE ORIGIN, TYPE, PROPERTIES & CLASSES OF YOUR DATA

Page 11: Essential Data Science for Product Designers and Non-Scientists

USING THE RESULTS

HOW ARE THE RESULTS REFLECTED IN YOUR PRODUCT?

Page 12: Essential Data Science for Product Designers and Non-Scientists

DATA IS THE FUEL

THE INPUTS FOR ANALYSIS, INSIGHTS AND PREDICTION

Page 13: Essential Data Science for Product Designers and Non-Scientists

BIG DATA

LOTS OF DATA POINTS TO EXTRACT SAMPLES

Page 14: Essential Data Science for Product Designers and Non-Scientists

DATA SOURCES

• IT Log Files

• Sensor Data (IOT)

• Website Clickstreams

• Social Media Feeds

• Machine Data

• Location Tracking Data

• Financial Transactions

• Commercial Feeds (3rd Party Vendor)

• Public Databases

• Academic Sources

WHERE TO OBTAIN DATA FROM?

Page 15: Essential Data Science for Product Designers and Non-Scientists

PROPERTIES OF DATA: THE 4 Vs

▸ Volume: How much data do you have?

▸ Velocity: How fast is the data coming in? How often?

▸ Variety: How heterogeneous or homogeneous is the data?

▸ Veracity: What is the quality of the data?

Page 16: Essential Data Science for Product Designers and Non-Scientists

VOLUME

Data volume is usually measured in gigabytes (GB) and terabytes (TB); sometimes hundreds of megabytes (MB) can be considered a healthy yield.

Page 17: Essential Data Science for Product Designers and Non-Scientists

VELOCITY

The rate at which data is delivered into the system. This can be streamed in real-time, near real-time or delivered in batches.

Page 18: Essential Data Science for Product Designers and Non-Scientists

VARIETY

The types and kinds of data you expect in your system. Is it homogeneous or heterogeneous? We cover this in the next few slides.

Page 19: Essential Data Science for Product Designers and Non-Scientists

VERACITY

Does the data accurately reflect what you are trying to accomplish with it? It is important to rate the source of the data. Do a cursory review to see if the data is what you would expect.
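
A cursory review of the kind described above can be sketched in a few lines of Python. This is only an illustration: the readings, the field names (`sensor_id`, `temp_c`) and the plausible range are hypothetical, and a real review would depend on what your source is expected to deliver.

```python
# Hypothetical IoT temperature readings; None marks a missing value.
readings = [
    {"sensor_id": "a1", "temp_c": 21.5},
    {"sensor_id": "a2", "temp_c": None},
    {"sensor_id": "a3", "temp_c": 250.0},  # implausible for a room sensor
    {"sensor_id": "a4", "temp_c": 19.8},
]


def cursory_review(rows, field, low, high):
    """Count missing and out-of-range values for one field."""
    missing = sum(1 for r in rows if r[field] is None)
    out_of_range = sum(
        1 for r in rows
        if r[field] is not None and not (low <= r[field] <= high)
    )
    return {"rows": len(rows), "missing": missing, "out_of_range": out_of_range}


# The range -40..60 is an assumed "what you would expect" for this sensor.
review = cursory_review(readings, "temp_c", low=-40.0, high=60.0)
print(review)
```

A high share of missing or implausible values is a hint to rate the source lower before relying on it.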

Page 20: Essential Data Science for Product Designers and Non-Scientists

LEVELS OF MEASUREMENT (ALSO TYPES OF DATA)

▸ Nominal - Distinct Categories (e.g. Gender)

▸ Ordinal - Ranking or Order (e.g. Service Ratings)

▸ Interval - Difference between two values is meaningful (e.g. Temperature in Celsius or Fahrenheit)

▸ Ratio - Like interval, but with a true zero point (e.g. Height)

QUALITATIVE OR QUANTITATIVE
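
One practical consequence of the level of measurement is which summaries make sense. The sketch below uses Python's standard `statistics` module on made-up values; the variable names and data are assumptions for illustration only.

```python
import statistics

# Hypothetical survey data, one list per level of measurement.
device = ["ios", "android", "ios", "web", "ios"]  # nominal
rating = [1, 3, 4, 4, 5]                          # ordinal (1 to 5 stars)
temp_c = [18.0, 20.0, 22.0]                       # interval
height_cm = [150.0, 165.0, 180.0]                 # ratio

# Nominal: categories have no order, so only counts and the mode make sense.
most_common_device = statistics.mode(device)

# Ordinal: order matters, so the median is meaningful.
median_rating = statistics.median(rating)

# Interval: differences are meaningful, so the mean is meaningful.
mean_temp = statistics.mean(temp_c)

# Ratio: a true zero makes statements like "1.2 times as tall" meaningful.
height_ratio = height_cm[2] / height_cm[0]

print(most_common_device, median_rating, mean_temp, height_ratio)
```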

Page 21: Essential Data Science for Product Designers and Non-Scientists

DATA TYPES

▸ Text Data

▸ Image Data

▸ Timestamps

▸ Video Data

▸ Audio

▸ Binary Data

▸ Counters

KINDS OF DATA YOU CAN EXPECT

Page 22: Essential Data Science for Product Designers and Non-Scientists

DATA STRUCTURE FORMS

• Structured - Typically in an RDBMS, very organized and labeled

• Unstructured - Unfiltered, unlabeled data like images, video, raw data from IoT

• Semi-Structured - A mix of labeled and unlabeled data

STRUCTURED, UNSTRUCTURED & SEMI-STRUCTURED

Page 23: Essential Data Science for Product Designers and Non-Scientists

RECORDS AND FIELDS

SEE DATA AS A MATRIX

COLUMNS: Variables, Attributes, Features

ROWS: Observations, Samples, Tuples

When you encounter these words in data analysis and machine learning, think of a spreadsheet: simply columns (fields) and rows (records), as in a relational database.
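
The same idea in code, as a minimal sketch: each dictionary below is one row, and its keys are the columns. The field names (`user_id`, `plan`, `sessions`) and values are hypothetical.

```python
# Hypothetical records: each dict is one row (observation, sample, tuple).
rows = [
    {"user_id": 1, "plan": "free", "sessions": 3},
    {"user_id": 2, "plan": "pro", "sessions": 12},
    {"user_id": 3, "plan": "free", "sessions": 5},
]

# The keys are the columns (variables, attributes, features).
columns = list(rows[0].keys())

# Reading "down" one column gives every observation's value for that feature.
sessions = [row["sessions"] for row in rows]

n_rows, n_cols = len(rows), len(columns)
print(n_rows, n_cols, sessions)
```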

Page 24: Essential Data Science for Product Designers and Non-Scientists

DATA QUALITY

๏ Complete

๏ Accurate

๏ Relevant

๏ Fresh or Outdated

๏ Distinct

๏ Accessible

WHAT IS THE LEVEL OF DATA QUALITY? IS IT…

Page 25: Essential Data Science for Product Designers and Non-Scientists

OFF THE MARK

๏ Accuracy pertains to the quality of data and the number of errors contained in a dataset.

๏ Precision refers to the level of measurement and exactness of description in a dataset. It is important to realize, however, that precise data, no matter how carefully measured, may be inaccurate.

SEEK BOTH ACCURACY AND PRECISION

This is also related to variance and bias in data…
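
A small numeric sketch can make the distinction concrete. The two "scales" and the reference weight below are invented; the offset from the true value stands in for accuracy and the spread of the readings for precision, which is one simple way to measure them rather than the only one.

```python
import statistics

true_weight_kg = 10.0  # hypothetical reference value

# Scale A: readings agree closely with each other but sit far from the truth.
scale_a = [10.52, 10.51, 10.53, 10.52]
# Scale B: readings scatter more, but center on the truth.
scale_b = [9.8, 10.3, 9.9, 10.0]


def summarize(readings, truth):
    """Return (offset from truth, spread) as rough accuracy and precision measures."""
    offset = statistics.mean(readings) - truth  # smaller magnitude = more accurate
    spread = statistics.stdev(readings)         # smaller = more precise
    return offset, spread


offset_a, spread_a = summarize(scale_a, true_weight_kg)
offset_b, spread_b = summarize(scale_b, true_weight_kg)
print(offset_a, spread_a)
print(offset_b, spread_b)
```

Scale A is precise but inaccurate; scale B is accurate but less precise.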

Page 26: Essential Data Science for Product Designers and Non-Scientists

ERRORS: VARIANCE

๏ Variance is the error from sensitivity to small fluctuations in the training set. High variance can cause overfitting: modeling the random noise in the training data rather than the intended outputs.

๏ Add more data to your dataset to mitigate high variance.

MEASUREMENT SENSITIVITY

Page 27: Essential Data Science for Product Designers and Non-Scientists

ERRORS: BIAS

๏ Bias measures how far off, on average, a model's predictions are from the correct value.

๏ Review your data; you may need to revise your dataset.

ERRONEOUS ASSUMPTIONS
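
Bias and variance can be estimated by retraining a model on many freshly drawn training sets and looking at its predictions for one fixed input. The sketch below is an illustrative setup, not a method taken from the slides: the "true" relationship, the noise level, the two toy models and all names are assumptions.

```python
import random
import statistics


def true_f(x):
    return x * x  # hypothetical "correct" relationship


def make_training_set(rng, n=20, noise_sd=0.2):
    """Draw n noisy (x, y) samples from the true relationship."""
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0.0, noise_sd)) for x in xs]


def predict_constant(train, x0):
    """Rigid model: ignores x0 and always predicts the mean of y."""
    return statistics.mean(y for _, y in train)


def predict_nearest(train, x0):
    """Flexible model: copies the y of the single closest training point."""
    return min(train, key=lambda p: abs(p[0] - x0))[1]


def bias_and_variance(predict, x0=0.9, trials=500, seed=0):
    """Retrain on many training sets and summarize the predictions at x0."""
    rng = random.Random(seed)
    preds = [predict(make_training_set(rng), x0) for _ in range(trials)]
    bias = statistics.mean(preds) - true_f(x0)  # average miss
    variance = statistics.pvariance(preds)      # sensitivity to the training set
    return bias, variance


bias_c, var_c = bias_and_variance(predict_constant)
bias_n, var_n = bias_and_variance(predict_nearest)
print("constant model:", bias_c, var_c)
print("nearest point :", bias_n, var_n)
```

In this setup the rigid model misses consistently (high bias, low variance), while the flexible model is right on average but jumps around with each new training set (low bias, high variance).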

Page 28: Essential Data Science for Product Designers and Non-Scientists

DATA PIPELINE

1. HARVESTING / STORING

2. MUNGING / CLEANING

3. PREPARATION / PROCESSING

4. ANALYSIS

5. MODELING

6. VISUALIZATION / REPORTING / ACTING

6 STAGES

Page 29: Essential Data Science for Product Designers and Non-Scientists

DATA PIPELINE

Harvesting > Wrangling > Processing > Analysis > Modeling > Visualization & Action

6 STAGES

DATA ENGINEERING / DATA SCIENCE AND ANALYSIS
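
The six stages can be pictured as functions that each hand their output to the next. The sketch below is deliberately tiny and entirely hypothetical: the function names, the raw values and the "predict the mean" model are placeholders, not a recommended implementation.

```python
import statistics


def harvest():
    """1. Harvesting: hypothetical raw values, as they might arrive from a log."""
    return ["12.5", "n/a", "7.0", "7.0", "30.5"]


def wrangle(raw):
    """2. Wrangling: drop values that cannot be parsed as numbers."""
    clean = []
    for value in raw:
        try:
            clean.append(float(value))
        except ValueError:
            pass  # discard unparseable entries such as "n/a"
    return clean


def process(values):
    """3. Processing: sort so later stages see ordered data."""
    return sorted(values)


def analyze(values):
    """4. Analysis: a descriptive summary."""
    return {"count": len(values), "mean": statistics.mean(values)}


def model(summary):
    """5. Modeling: a deliberately naive model that predicts the mean."""
    return summary["mean"]


def report(prediction):
    """6. Visualization / reporting: format the result for a person."""
    return f"Expected next value: {prediction:.2f}"


result = report(model(analyze(process(wrangle(harvest())))))
print(result)
```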

Page 30: Essential Data Science for Product Designers and Non-Scientists

1. HARVESTING

Data Yield

Data Sources (Data Lake, Data Warehouse)

Data Volume (GB -> TB)

Data Delivery

DATA COLLECTION & INGESTION

Page 31: Essential Data Science for Product Designers and Non-Scientists

RAW DATA

Data can be sourced from a data lake or data warehouse; value-to-data ratio: low

Page 32: Essential Data Science for Product Designers and Non-Scientists

2. DATA WRANGLING

DATA PROCESSING AND PREP

Null Values

Duplicates

Incomplete Values

Page 33: Essential Data Science for Product Designers and Non-Scientists

CLEAN DATA

SCRUB YOUR DATA FOR ANALYSIS

Option 1: Identify and discard records with null or wrong values

Option 2: Fill in a placeholder, such as a common value in the dataset
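
Both options are short to express in code. The sketch below uses a made-up list of ages, with `None` standing in for a null value, and assumes "common value" means the most frequent value (the mode); a mean or median would be an equally reasonable placeholder in other cases.

```python
import statistics

# Hypothetical records; None marks a null value.
ages = [34, None, 29, 41, None, 29]

# Option 1: identify and discard records with null values.
dropped = [a for a in ages if a is not None]

# Option 2: fill in a placeholder, here the most common value in the dataset.
placeholder = statistics.mode(dropped)
filled = [a if a is not None else placeholder for a in ages]

print(dropped)
print(filled)
```

Option 1 shrinks the dataset; option 2 keeps every record but introduces values that were never observed.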

Page 34: Essential Data Science for Product Designers and Non-Scientists

3. PROCESSING

‣ Ranking

‣ Scoring

‣ Sorting

‣ Grouping

‣ Manipulating

ARRANGING & MANIPULATING DATA
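
Sorting, ranking and grouping can be sketched with plain Python. The orders below (customer, region, amount) are hypothetical.

```python
from collections import defaultdict

# Hypothetical orders: (customer, region, amount).
orders = [
    ("ana", "west", 120.0),
    ("ben", "east", 80.0),
    ("cho", "west", 200.0),
    ("dev", "east", 50.0),
]

# Sorting: arrange orders from largest to smallest amount.
by_amount = sorted(orders, key=lambda o: o[2], reverse=True)

# Ranking: attach a 1-based position to each sorted order.
ranked = [(rank, name) for rank, (name, _, _) in enumerate(by_amount, start=1)]

# Grouping: total the amounts per region.
totals = defaultdict(float)
for _, region, amount in orders:
    totals[region] += amount

print(ranked)
print(dict(totals))
```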

Page 35: Essential Data Science for Product Designers and Non-Scientists

4. ANALYSIS

ARRANGING THE DATA

Descriptive Analytics

Inferential Analytics

Advanced Statistical Analytics

+

5. MODELING

STATISTICAL

Density

Linear Regression

Nearest Neighbor

…many more modeling techniques

Utilizing stochastic models to validate hypotheses and make predictions

Page 36: Essential Data Science for Product Designers and Non-Scientists

STATISTICAL MODELING

Applying statistical models to validate hypotheses, simulate scenarios and make predictions

Page 37: Essential Data Science for Product Designers and Non-Scientists

6. VISUALIZATION

REPORTING & INTERPRETATION

What does it show?

What does it say?

Exploratory or Explanatory?

Page 38: Essential Data Science for Product Designers and Non-Scientists

OBJECTIVES OF DATA ANALYTICS

๏ Find patterns to test hypotheses

๏ Refine an existing hypothesis

๏ Provide actionable insights

๏ Make predictions

CLEARLY DEFINED

Page 39: Essential Data Science for Product Designers and Non-Scientists

TYPES OF ANALYTICS

• Descriptive - What happened?

• Diagnostic - What went wrong?

• Prescriptive - What to do?

• Predictive - Can this happen?

EACH TYPE ANSWERS DIFFERENT QUESTIONS

Page 40: Essential Data Science for Product Designers and Non-Scientists

ANSWERS THE QUESTION: WHAT HAPPENED?

Descriptive analytics is based on historical and current data

DESCRIPTIVE ANALYTICS
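
In practice, "what happened?" often starts with a handful of summary numbers. The sketch below summarizes one week of made-up daily sign-ups with the standard `statistics` module; the data and the choice of summaries are illustrative assumptions.

```python
import statistics

# Hypothetical daily sign-ups for one week.
signups = [42, 38, 51, 47, 60, 22, 20]

summary = {
    "total": sum(signups),
    "mean": statistics.mean(signups),
    "median": statistics.median(signups),
    "min": min(signups),
    "max": max(signups),
}
print(summary)
```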

Page 41: Essential Data Science for Product Designers and Non-Scientists

ANSWERS THE QUESTION: WHY DID THIS HAPPEN? WHAT WENT WRONG?

Deduce and infer the reasons for the success or failure of a particular activity, initiative, campaign or program

DIAGNOSTIC ANALYTICS

Page 42: Essential Data Science for Product Designers and Non-Scientists

ANSWERS THE QUESTIONS: WHAT DO I DO? WHAT ACTIONS SHOULD I TAKE?

Based on generated predictions, the analysis provides informed actions a decision maker can or should take.

PRESCRIPTIVE ANALYTICS

Page 43: Essential Data Science for Product Designers and Non-Scientists

ANSWERS THE QUESTION: COULD THIS HAPPEN?

Applying stochastic and mathematical models to predict outcomes

PREDICTIVE ANALYTICS

Page 44: Essential Data Science for Product Designers and Non-Scientists

MACHINE LEARNING

LEARNING FROM THE DATA

SUPERVISED AND UNSUPERVISED LEARNING

Page 45: Essential Data Science for Product Designers and Non-Scientists

IDENTIFYING PATTERNS IN THE DATASET

Input data is unlabeled and there is no predefined desired output; inferential methods are used to find relationships, patterns and correlations.

UNSUPERVISED

Page 46: Essential Data Science for Product Designers and Non-Scientists

MAKING INFERENCES ON DATASETS

Data is labeled; each input is paired with a desired output; algorithms learn from the labeled data.

SUPERVISED

Page 47: Essential Data Science for Product Designers and Non-Scientists

COMMON MACHINE LEARNING MODELS

ALGORITHMS

Page 48: Essential Data Science for Product Designers and Non-Scientists

REGRESSION

Regressions are used to quantify the relationship between one variable and the other variables that are thought to explain it; regressions can also identify how close and well determined the relationship is.

LINEAR REGRESSION

LOGISTIC REGRESSION

Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
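
For a single explanatory variable, a linear regression can be sketched with ordinary least squares, and R-squared is one common way to express how close the relationship is. The functions `fit_line` and `r_squared` and the spend and sales figures are hypothetical names and data chosen for this illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y that the fitted line explains (0 to 1)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: ad spend (x) and sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(spend, sales)
fit_quality = r_squared(spend, sales, slope, intercept)
print(slope, intercept, fit_quality)
```

The slope quantifies how much sales change per unit of spend; an R-squared near 1 indicates the points lie close to the fitted line.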

Page 49: Essential Data Science for Product Designers and Non-Scientists

CLASSIFICATION

DECISION TREES

A decision tree builds a classification model in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while an associated decision tree is incrementally developed.

K NEAREST NEIGHBOR

K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions)
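
A minimal sketch of K nearest neighbors, assuming Euclidean distance as the similarity measure and a simple majority vote. The stored cases, the feature meanings and the labels are invented for illustration.

```python
import math
from collections import Counter


def knn_classify(stored, query, k=3):
    """Label `query` by majority vote among its k nearest stored cases."""
    nearest = sorted(stored, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical stored cases: ((sessions per week, minutes per session), label).
cases = [
    ((1.0, 2.0), "casual"),
    ((2.0, 1.0), "casual"),
    ((2.0, 3.0), "casual"),
    ((8.0, 9.0), "power"),
    ((9.0, 8.0), "power"),
    ((9.0, 10.0), "power"),
]

label_a = knn_classify(cases, (1.5, 1.5))
label_b = knn_classify(cases, (8.5, 9.0))
print(label_a, label_b)
```

Nothing is "trained" here: the algorithm simply stores the cases and compares each new case against them.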

Page 50: Essential Data Science for Product Designers and Non-Scientists

CLUSTERING

K-MEANS

K-Means clustering intends to partition n objects into k clusters in which each object belongs to the cluster with the nearest mean.
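
The partitioning described above can be sketched on one-dimensional data by alternating two steps: assign each point to the nearest mean, then recompute each mean. The starting means (the first k points), the fixed iteration count and the sample values are assumptions made to keep the example short.

```python
def k_means(points, k, iterations=10):
    """Partition 1-D points into k clusters by nearest mean."""
    means = list(points[:k])  # assumption: start from the first k points
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        assignments = [
            min(range(k), key=lambda c: abs(p - means[c])) for p in points
        ]
        # Update step: each mean moves to the average of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old mean if a cluster ends up empty
                means[c] = sum(members) / len(members)
    return means, assignments


# Hypothetical session lengths in minutes; no labels are provided.
session_minutes = [1.0, 1.5, 0.8, 8.0, 8.4, 7.9]

means, assignments = k_means(session_minutes, k=2)
print(means, assignments)
```

The result depends on the starting means, so practical implementations usually try several starts.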

Page 51: Essential Data Science for Product Designers and Non-Scientists

LOTS OF DIFFERENT MACHINE LEARNING ALGORITHMS

Page 52: Essential Data Science for Product Designers and Non-Scientists

• What datasets do you have now? What Sources? What parameters?

• Address 4Vs (Velocity, Variety, Volume, Veracity)

• Rate the data quality

• Identify the attributes, type and groups of data

• Do you need to enrich your current dataset?

REMEMBER: APPRAISE YOUR DATA

LOOK AT THE PROPERTIES OF YOUR DATA

Page 53: Essential Data Science for Product Designers and Non-Scientists

EXPECTED OUTCOMES AND YOUR OBJECTIVE

• How is your product going to use the data?

• How does your objective align with results?

• Does the current dataset allow you to make inferences or predictions?

• Get help from domain experts to determine if the attributes and data are sufficient

DO YOUR DATASET & EXPECTED RESULTS MATCH YOUR OBJECTIVE?

Page 54: Essential Data Science for Product Designers and Non-Scientists

APPLYING DATA SCIENCE TRADECRAFT TO BUILD

DATA PRODUCTS

Page 55: Essential Data Science for Product Designers and Non-Scientists

USING RESULTS TO IMPROVE PRODUCT EXPERIENCES

Use the findings and insights and apply them to enhance your product or service

DATA PRODUCT DEV

EXAMPLES

Sales Forecasts

Operational Predictions

Video Recommendations

Featured Content

Other stuff

Page 56: Essential Data Science for Product Designers and Non-Scientists

USE THE FINDINGS AND INSIGHTS AND APPLY THEM TO ENHANCE YOUR PRODUCT OR SERVICE

DATA ENRICHING PRODUCTS

Page 57: Essential Data Science for Product Designers and Non-Scientists

PROVIDING ANSWERS TO THE INQUIRIES

Utilize predictive and prescriptive analytics to forecast and give recommendations to decision-makers. Make it clear what actions to take and why.

USE IN PRODUCTS

Page 58: Essential Data Science for Product Designers and Non-Scientists

OUTPUTS FORMED FOR HUMAN CONSUMPTION

A key characteristic of data products is the use of data visualization elements such as charts, graphs, scoreboards and tables to communicate the story. Utilize graphics and interactivity to tell your story.

DATA VISUALIZATION

Page 59: Essential Data Science for Product Designers and Non-Scientists

• Show Relationships

• Make Comparisons

• Show Distribution

• Present Composition

• Make Predictions / Forecasts

USING DATA VISUALIZATIONS

Page 60: Essential Data Science for Product Designers and Non-Scientists

• Dashboards

• Data Filters

• Data Exploration Features

• Custom Inputs

• Graph Selection

DATA PRODUCT CUSTOMIZATIONS

Page 61: Essential Data Science for Product Designers and Non-Scientists

ECONOMICS OF WORKING WITH DATA

The cost of labor and resources to prepare, process and maintain data can be high. Consider using simpler models.

COST CONSIDERATIONS

Page 62: Essential Data Science for Product Designers and Non-Scientists

JCHRISA

THANK YOU!

https://www.linkedin.com/in/jchrisa

Creative Technologist. Product Designer
