ESSENTIAL DATA SCIENCE FOR PRODUCT DESIGNERS AND NON-SCIENTISTS
ESSENTIAL DATA SCIENCE FOR PRODUCT DESIGNERS
WHAT IS DATA SCIENCE?
ESSENTIALS OF DATA SCIENCE FOR PRODUCT DESIGNERS
LOTS OF DEFINITIONS:
▸ The art and science of utilizing data to produce actionable insights.
▸ The utilization of science and mathematics to extract knowledge and insights
▸ A branch of computer science that applies statistics on a dataset to make predictions and find patterns.
▸ KDD - Knowledge Discovery in Databases
WORKING WITH DATA TO OBTAIN INSIGHTS, MAKE PREDICTIONS AND PROVIDE ADVICE
LOTS OF BRANCHES & DISCIPLINES
▸ Data Analytics
▸ Big Data
▸ Data Engineering
▸ Database Technology
▸ Statistics
▸ Data Visualization
▸ Machine Learning
▸ Neural Networks
▸ Artificial Intelligence
DATA SCIENCE
LET’S FOCUS ON THESE TO ORIENT OURSELVES TO A NARROWER FOCUS IN DATA SCIENCE
▸ Data Analytics
▸ Machine Learning
▸ Data Engineering
BUSINESS INTELLIGENCE VS DATA SCIENCE
BUSINESS INTELLIGENCE
▸ Retrospective - Looking at historical data to predict the future
▸ Pre-canned or pre-defined questions to submit to the system
▸ Data is largely siloed or warehoused
▸ Ask specific questions related to strategic business operations
DATA SCIENCE
▸ Prospective - Forecasting the future
▸ Discovery of questions to ask, or development of questions in the form of hypotheses
▸ Data is distributed in data lakes or in warehouses; can be real-time or near-real-time streams
▸ Can ask questions about strategy, but can be used in any domain
WHAT DO PRODUCT DESIGNERS NEED TO KNOW?
KNOWING THE CONCEPTS AND CHARACTERISTICS OF DATA IS A GOOD START
GOALS FOR PRODUCT DESIGNERS
▸ Give users the ability to make better decisions with new knowledge and insights
▸ Optimize and unlock the value of service and operational efficiencies
▸ Supply insights to make new, better products and services
▸ Convert data into stories that engage and empower users
▸ Employ summary reports or visualizations to help decision makers
UNDERSTAND THE PROBLEM
WHAT IS THE PURPOSE? WHAT ARE THE QUESTIONS?
REVIEW THE DATA
CHECK THE ORIGIN, TYPE, PROPERTIES & CLASSES OF YOUR DATA
USING THE RESULTS
HOW ARE THE RESULTS REFLECTED IN YOUR PRODUCT?
DATA IS THE FUEL
THE INPUTS FOR ANALYSIS, INSIGHTS AND PREDICTION
BIG DATA
LOTS OF DATA POINTS TO EXTRACT SAMPLES
DATA SOURCES
• IT Log Files
• Sensor Data (IoT)
• Website Clickstreams
• Social Media Feeds
• Machine Data
• Location Tracking Data
• Financial Transactions
• Commercial Feeds (3rd Party Vendor)
• Public Databases
• Academic Sources
WHERE TO OBTAIN DATA FROM?
PROPERTIES OF DATA: 4Vs
▸ Volume: How much data do you have?
▸ Velocity: How fast is the data coming in? How often?
▸ Variety: How heterogeneous or homogeneous is the data?
▸ Veracity: What is the quality of the data?
VOLUME
Data volume is usually measured in gigabytes (GB) and terabytes (TB); sometimes hundreds of megabytes (MB) can be considered a healthy yield.
VELOCITY
The rate at which data is delivered into the system. This can be streamed in real time, in near real time, or delivered in batches.
VARIETY
The types and kinds of data you expect in your system. Is it homogeneous or heterogeneous? We cover this in the next few slides.
VERACITY
Does the data accurately reflect what you are trying to accomplish with it? It is important to rate the source of the data. Do a cursory review to see if the data is what you would expect.
LEVELS OF MEASUREMENT (ALSO TYPES OF DATA)
▸ Nominal - Distinct Categories (e.g. Gender)
▸ Ordinal - Ranking or Order (e.g. Service Ratings)
▸ Interval - difference between two values is meaningful (e.g. Temperature)
▸ Ratio - like interval, but has a clear definition of 0 (e.g. Height)
QUALITATIVE OR QUANTITATIVE
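The level of measurement decides which summaries make sense. The sketch below is one way to illustrate that, using Python's standard library. All of the values and variable names are invented for illustration.

```python
from statistics import median, mode

# Hypothetical survey data, invented for illustration.
plan = ["free", "pro", "free", "team", "free"]   # nominal: categories only
rating = [1, 3, 4, 4, 5]                         # ordinal: order matters, gaps may not be equal
temp_c = [18.0, 21.0, 24.0]                      # interval: differences are meaningful
height_cm = [150.0, 165.0, 180.0]                # ratio: true zero, so ratios are meaningful

most_common_plan = mode(plan)                # counting is the safe summary for nominal data
middle_rating = median(rating)               # the median respects order without assuming equal gaps
temp_change = temp_c[-1] - temp_c[0]         # "6 degrees warmer" is a valid statement
height_ratio = height_cm[-1] / height_cm[0]  # "1.2 times as tall" is a valid statement
```

Nominal and ordinal values are usually treated as qualitative; interval and ratio values are quantitative.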
DATA TYPES
▸ Text Data
▸ Image Data
▸ Timestamps
▸ Video Data
▸ Audio
▸ Binary Data
▸ Counters
KINDS OF DATA YOU CAN EXPECT
DATA STRUCTURE FORMS
• Structured - Typically in an RDBMS, very organized and labeled
• Unstructured - Unfiltered, unlabeled data like images, video, raw data from IoT
• Semi-Structured - A mix of labeled and unlabeled data
STRUCTURED, UNSTRUCTURED & SEMI-STRUCTURED
RECORDS AND FIELDS
SEE DATA AS A MATRIX
COLUMNS: Variables, Attributes, Features
ROWS: Observations, Samples, Tuples
When you encounter these words in data analysis and machine learning, think of a spreadsheet: simply columns (fields) and rows (records), as in a relational database.
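A minimal sketch of the matrix view, assuming a small in-memory dataset. The field names (`user_id`, `sessions`, `plan`) and values are hypothetical.

```python
# Hypothetical records; each dict is one row (observation / sample / tuple).
rows = [
    {"user_id": 1, "sessions": 12, "plan": "free"},
    {"user_id": 2, "sessions": 30, "plan": "pro"},
    {"user_id": 3, "sessions": 7, "plan": "free"},
]

# The keys are the columns (variables / attributes / features).
columns = list(rows[0].keys())

# Reading "down" a column collects one field across every record.
sessions = [row["sessions"] for row in rows]

# The shape of the matrix: number of rows by number of columns.
shape = (len(rows), len(columns))
```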
DATA QUALITY
๏ Complete
๏ Accurate
๏ Relevant
๏ Fresh or Outdated
๏ Distinct
๏ Accessible
WHAT IS THE LEVEL OF DATA QUALITY? IS IT…
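Some of these quality checks can be expressed as simple ratios. The sketch below scores completeness, distinctness and freshness. The records, the function names and the 90-day freshness window are all assumptions made for illustration, not a standard.

```python
from datetime import date

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "email": "a@example.com", "updated": date(2017, 3, 1)},
    {"id": 2, "email": None, "updated": date(2017, 3, 20)},
    {"id": 2, "email": None, "updated": date(2017, 3, 20)},            # duplicate
    {"id": 3, "email": "c@example.com", "updated": date(2015, 1, 5)},  # stale
]

def completeness(rows, field):
    """Share of rows where the field is present."""
    return sum(r[field] is not None for r in rows) / len(rows)

def distinctness(rows):
    """Share of rows that are unique."""
    unique = {tuple(sorted(r.items(), key=lambda kv: kv[0])) for r in rows}
    return len(unique) / len(rows)

def freshness(rows, field, today, max_age_days):
    """Share of rows updated within the allowed age."""
    return sum((today - r[field]).days <= max_age_days for r in rows) / len(rows)

today = date(2017, 4, 1)  # fixed date so the example is repeatable
report = {
    "email_complete": completeness(records, "email"),
    "distinct": distinctness(records),
    "fresh_90d": freshness(records, "updated", today, 90),
}
```

Accuracy, relevance and accessibility are harder to score automatically and usually need a domain expert's review.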
OFF THE MARK
๏ Accuracy is an issue pertaining to the quality of data and the number of errors contained in a dataset.
๏ Precision refers to the level of measurement and exactness of description in a dataset. It is important to realize, however, that precise data, no matter how carefully measured, may be inaccurate.
SEEK BOTH ACCURACY AND PRECISION
This is also related to variance and bias in data…
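One way to see the difference is to compare repeated measurements of a known reference value. In this sketch the two scales and their readings are hypothetical: the offset of the average stands in for accuracy, and the spread of the readings stands in for precision.

```python
from statistics import mean, stdev

TRUE_WEIGHT = 100.0  # hypothetical known reference value

# Two hypothetical scales weighing the same reference object five times.
scale_a = [104.9, 105.0, 105.1, 105.0, 105.0]  # tight readings, but off target
scale_b = [97.0, 103.0, 99.0, 101.0, 100.0]    # centered on target, but scattered

def summarize(readings, truth):
    offset = mean(readings) - truth  # accuracy: how far the average is from the truth
    spread = stdev(readings)         # precision: how tightly the readings agree
    return offset, spread

offset_a, spread_a = summarize(scale_a, TRUE_WEIGHT)
offset_b, spread_b = summarize(scale_b, TRUE_WEIGHT)
```

Scale A is precise but inaccurate; scale B is accurate on average but imprecise.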
ERRORS: VARIANCE
๏ Variance is error from sensitivity to small fluctuations in the training set. High variance can cause overfitting: modeling the random noise in the training data rather than the intended outputs.
๏ Add more data to your dataset to mitigate high variance
MEASUREMENT SENSITIVITY
ERRORS: BIAS
๏ Bias measures how far off, on average, a model's predictions are from the correct value.
๏ Review your data; high bias may require you to revise your dataset
ERRONEOUS ASSUMPTIONS
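Bias and variance can be estimated by retraining a model on many noisy samples and watching its prediction at one point. The sketch below is a small simulation under invented assumptions: the "true" relationship, the noise level and both toy models are hypothetical and chosen only to make the contrast visible.

```python
import random
from statistics import mean, pvariance

random.seed(0)  # fixed seed so the run is repeatable

def truth(x):
    return x * x  # hypothetical "real" relationship

XS = [i / 10 for i in range(11)]  # fixed inputs 0.0 .. 1.0
X0 = 0.9                          # the point we want to predict
NOISE = 0.1                       # standard deviation of the measurement noise

def sample_training_set():
    return [(x, truth(x) + random.gauss(0, NOISE)) for x in XS]

def constant_model(train, x):
    # Ignores x entirely: an erroneous assumption, so high bias.
    return mean(y for _, y in train)

def nearest_neighbor_model(train, x):
    # Copies the closest training point, noise included: high variance.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def bias_and_variance(model, runs=500):
    preds = [model(sample_training_set(), X0) for _ in range(runs)]
    return mean(preds) - truth(X0), pvariance(preds)

bias_const, var_const = bias_and_variance(constant_model)
bias_nn, var_nn = bias_and_variance(nearest_neighbor_model)
```

The constant model is consistently wrong in the same direction (bias), while the nearest-neighbor model is right on average but changes with every new sample (variance).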
DATA PIPELINE
1. HARVESTING / STORING
2. MUNGING / CLEANING
3. PREPARATION / PROCESSING
4. ANALYSIS
5. MODELING
6. VISUALIZATION / REPORTING / ACTING
6 STAGES
DATA PIPELINE
Harvesting > Wrangling > Processing > Analysis > Modeling > Visualization & Action
6 STAGES
DATA ENGINEERING DATA SCIENCE AND ANALYSIS
1. HARVESTING
Data Yield
Data Sources (Data Lake, Data Warehouse)
Data Volume (GB -> TB)
Data Delivery
DATA COLLECTION & INGESTION
RAW DATA
Data can be sourced from a data lake or data warehouse; value-to-data ratio: low
2. DATA WRANGLING
Null Values
Duplicates
Incomplete Values
DATA PROCESSING AND PREP
SCRUB YOUR DATA FOR ANALYSIS
CLEAN DATA
Option 1: Identify and discard records with null or wrong values
Option 2: Fill in a placeholder, such as a common value in the dataset
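Both cleaning options can be sketched in a few lines. The records and helper names below are hypothetical, and "common value" is interpreted here as the most frequent non-null value, which is one reasonable reading among several.

```python
from collections import Counter

# Hypothetical raw records; None marks a null value.
raw = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": None},
    {"id": 3, "country": "US"},
    {"id": 3, "country": "US"},  # exact duplicate
    {"id": 4, "country": "CA"},
]

def drop_duplicates(rows):
    seen, unique = set(), []
    for row in rows:
        key = (row["id"], row["country"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def drop_nulls(rows, field):
    # Option 1: discard records with a null in the field.
    return [r for r in rows if r[field] is not None]

def fill_with_most_common(rows, field):
    # Option 2: replace nulls with the most common value in the dataset.
    counts = Counter(r[field] for r in rows if r[field] is not None)
    fallback = counts.most_common(1)[0][0]
    return [{**r, field: fallback if r[field] is None else r[field]} for r in rows]

deduped = drop_duplicates(raw)
option_1 = drop_nulls(deduped, "country")
option_2 = fill_with_most_common(deduped, "country")
```

Option 1 loses a record; Option 2 keeps every record but introduces a guessed value, so the choice depends on how much data you can afford to discard.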
3. PROCESSING
‣ Ranking
‣ Scoring
‣ Sorting
‣ Grouping
‣ Manipulating
ARRANGING & MANIPULATING DATA
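The processing operations above can be sketched with plain Python. The ticket records and the choice of average rating as the score are hypothetical.

```python
from collections import defaultdict

# Hypothetical support tickets.
tickets = [
    {"team": "web", "minutes": 30, "rating": 4},
    {"team": "app", "minutes": 10, "rating": 5},
    {"team": "web", "minutes": 50, "rating": 2},
    {"team": "app", "minutes": 20, "rating": 3},
]

# Sorting: order records by a field.
by_speed = sorted(tickets, key=lambda t: t["minutes"])

# Grouping: bucket records by a shared value.
groups = defaultdict(list)
for t in tickets:
    groups[t["team"]].append(t)

# Scoring: reduce each group to one number (here, the average rating).
scores = {team: sum(t["rating"] for t in ts) / len(ts) for team, ts in groups.items()}

# Ranking: order the groups by score, best first.
ranking = sorted(scores, key=scores.get, reverse=True)
```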
4. ANALYSIS
ARRANGING THE DATA
Descriptive Analytics
Inferential Analytics
Advanced Statistical Analytics
5. MODELING
STATISTICAL
Density
Linear Regression
Nearest Neighbor
…many more modeling techniques
Utilizing stochastic models to validate hypotheses and make predictions
STATISTICAL MODELING
Applying statistical models to validate hypotheses, simulate scenarios and make predictions
6. VISUALIZATION
REPORTING & INTERPRETATION
What does it show?
What does it say?
Exploratory or Explanatory
OBJECTIVES OF DATA ANALYTICS
๏ Find patterns to test a hypothesis
๏ Refine an existing hypothesis
๏ Provide actionable insights
๏ Make predictions
CLEARLY DEFINED
TYPES OF ANALYTICS
• Descriptive - What happened?
• Diagnostic - What went wrong?
• Prescriptive - What to do?
• Predictive - Can this happen?
EACH TYPE ANSWERS DIFFERENT QUESTIONS
ANSWERS THE QUESTION: WHAT HAPPENED?
Descriptive analytics is based on historical and current data
DESCRIPTIVE ANALYTICS
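In practice, "what happened?" is often answered with a handful of summary statistics. A minimal sketch with the standard library follows; the daily sign-up figures are invented.

```python
from statistics import mean, median, stdev

# Hypothetical daily sign-ups for one week.
signups = [120, 135, 128, 160, 142, 90, 85]

summary = {
    "count": len(signups),
    "total": sum(signups),
    "mean": mean(signups),
    "median": median(signups),
    "min": min(signups),
    "max": max(signups),
    "stdev": stdev(signups),
}
```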
ANSWERS THE QUESTION: WHY DID THIS HAPPEN? WHAT WENT WRONG?
Deduce and infer the success and failure of a particular activity, initiative, campaign or program
DIAGNOSTIC ANALYTICS
ANSWERS THE QUESTIONS: WHAT DO I DO? WHAT ACTIONS SHOULD I TAKE?
Based on generated predictions, the analysis provides informed actions a decision maker can or should take.
PRESCRIPTIVE ANALYTICS
ANSWERS THE QUESTION: COULD THIS HAPPEN?
Applying stochastic and mathematical models to predict outcomes
PREDICTIVE ANALYTICS
LEARNING FROM THE DATA
MACHINE LEARNING
SUPERVISED AND UNSUPERVISED LEARNING
IDENTIFYING PATTERNS IN THE DATASET
Input data is unlabeled; the process is non-deterministic and uses inferential methods to find relationships, patterns and correlations
UNSUPERVISED
MAKING INFERENCES ON DATASETS
Data is labeled; deterministic process with an input and desired output; algorithms learn from labeled data
SUPERVISED
COMMON MACHINE LEARNING MODELS
ALGORITHMS
REGRESSION
LINEAR REGRESSION
Regressions are used to quantify the relationship between one variable and the other variables that are thought to explain it; regressions can also identify how close and well determined the relationship is.
LOGISTIC REGRESSION
Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
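As a sketch of how a regression quantifies a relationship and how well determined it is, the code below fits a straight line by ordinary least squares and reports R-squared. The spend and sales figures are hypothetical, and a single explanatory variable is assumed to keep the arithmetic visible.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y that the line accounts for."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: ad spend (x) against weekly sales (y).
spend = [1, 2, 3, 4, 5]
sales = [12, 14, 17, 18, 21]

slope, intercept = fit_line(spend, sales)
fit = r_squared(spend, sales, slope, intercept)
forecast = slope * 6 + intercept  # predicted sales at a spend of 6
```

The slope says how much sales move per unit of spend; an R-squared close to 1 suggests the line describes these points well, though it says nothing about causation.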
CLASSIFICATION
DECISION TREES
A decision tree builds a classification model in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed.
K NEAREST NEIGHBOR
K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions)
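A minimal sketch of k nearest neighbors, assuming Euclidean distance as the similarity measure and a majority vote among the k closest cases. The labeled cases and the `knn_classify` helper are hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical labeled cases: (sessions per week, minutes per session) -> label.
cases = [
    ((1, 5), "casual"), ((2, 4), "casual"), ((1, 6), "casual"),
    ((8, 30), "power"), ((9, 25), "power"), ((7, 35), "power"),
]

def knn_classify(point, labeled, k=3):
    # Sort the stored cases by Euclidean distance to the new point.
    nearest = sorted(labeled, key=lambda case: dist(case[0], point))[:k]
    # Majority vote among the k closest cases.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

prediction = knn_classify((6, 28), cases)
```

Because the labels are supplied up front, this is a supervised method. In real data the features usually need to be scaled first so that one large-valued field does not dominate the distance.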
CLUSTERING
K-MEANS
K-Means clustering aims to partition n objects into k clusters in which each object belongs to the cluster with the nearest mean.
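The partitioning can be sketched as two alternating steps: assign each object to the nearest mean, then recompute the means. The points and the hand-picked starting centers below are hypothetical; real implementations usually choose starting centers at random and stop when assignments no longer change.

```python
from math import dist

def k_means(points, centers, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical unlabeled points forming two loose groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
# Starting centers are picked by hand here (an assumption for repeatability).
centers, clusters = k_means(points, centers=[(0, 0), (10, 10)])
```

No labels are involved, which is what makes this an unsupervised method.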
LOTS OF DIFFERENT MACHINE LEARNING ALGORITHMS
• What datasets do you have now? What sources? What parameters?
• Address the 4Vs (Velocity, Variety, Volume, Veracity)
• Rate the data quality
• Identify the attributes, types and groups of data
• Do you need to enrich your current dataset?
REMEMBER: APPRAISE YOUR DATA
LOOK AT THE PROPERTIES OF YOUR DATA
EXPECTED OUTCOMES AND YOUR OBJECTIVE
• How is your product going to use the data?
• How does your objective align with results?
• Does the current dataset allow you to make inferences or predictions?
• Get help from domain experts to determine if the attributes and data are sufficient
DO YOUR DATASET & EXPECTED RESULTS MATCH YOUR OBJECTIVE?
APPLYING DATA SCIENCE TRADECRAFT TO BUILD
DATA PRODUCTS
USING RESULTS TO IMPROVE PRODUCT EXPERIENCES
Use the findings and insights and apply them to enhance your product or service
DATA PRODUCT DEV
EXAMPLES
Sales Forecasts
Operational Predictions
Video Recommendations
Featured Content
Other stuff
USE THE FINDINGS AND INSIGHTS AND APPLY THEM TO ENHANCE YOUR PRODUCT OR SERVICE
DATA ENRICHING PRODUCTS
PROVIDING ANSWERS TO THE INQUIRIES
Utilize predictive and prescriptive analytics to forecast and give recommendations to decision-makers. Make it clear what actions to take and why.
USE IN PRODUCTS
OUTPUTS FORMED FOR HUMAN CONSUMPTION
A key characteristic of data products is the use of data visualization elements such as charts, graphs, scoreboards and tables to communicate the story. Utilize graphics and interactivity to tell your story.
DATA VISUALIZATION
• Show Relationships
• Make Comparisons
• Show Distribution
• Present Composition
• Make Predictions / Forecasts
USING DATA VISUALIZATIONS
• Dashboards
• Data Filters
• Data Exploration Features
• Custom Inputs
• Graph Selection
DATA PRODUCT CUSTOMIZATIONS
ECONOMICS OF WORKING WITH DATA
The cost of labor and resources to prepare, process and maintain data can be high. Consider using simpler models.
COST CONSIDERATIONS
JCHRISA
THANK YOU!
https://www.linkedin.com/in/jchrisa
Creative Technologist. Product Designer