Introduction to Machine Learning & Data Analytics

Agenda – 2:00 pm – 2:45 pm

2:00 – 2:05 Introductions and Session Overview

2:05 – 2:10 Machine Learning & Predictive Analytics Background

2:10 – 2:30 Sample ML/PA Study & Findings

2:30 – 2:45 Expert Panel Q & A

Founded in 2004 as a public sector IT consulting firm, Infiniti has evolved into a public sector cloud services and consulting organization with a reputation for delivering results on time and on budget.

Infiniti - Who We Are…

With a deep commitment to state & local government, education, and healthcare, Infiniti aims to improve the lives of students through innovation and technology.

Infiniti - Where We Work

• Cloud

• Education

• Government Agencies

• Healthcare

• IV&V (Independent Verification & Validation)

• MSP (Managed Service Provider)

Machine Learning & Predictive Analytics

Deploying analytical IT tools is relatively easy.

Understanding how they might be used is much less clear.

Machine Learning & Predictive Analytics

> Typically start with sensing problems or potential opportunities, which may initially just be somebody’s hunch.

> Often move on to develop theories about the existence of a particular outcome or effect, generate hypotheses, identify relevant data, and conduct experiments.

> They are opportunities for discovery.

Focus more on the “I” and less on the “T” in IT

More like scientific research than traditional IT initiatives. Leads to specific targeted actions.

Our Predictive Analytics Process

The cycle of analyzing, transforming, and learning can be repeated many times.

Popular Machine Learning Use Cases

Fraud / Anomaly Detection

Targeted Citizen Outreach

Business / Operational Efficiency

Educational Outcome Predictions

Content Personalization

Document Classification

John Gray

Sample ML/PA Study & Findings

Problem Statement & Project Objective

Tasks

Tools and Environments

Deliverables

Roles

Schedule/Duration

Problem Statement & Project Objective

The California public sector client has survey responses from millions of people who apply online. A small percentage give negative feedback, which is entered as free-form text. The client wants to analyze this text to identify specific areas of the application process that need improvement.

The objective of this project is to perform text processing, analysis, and clustering to understand survey comments from dissatisfied users and determine which parts of the application process might need improvement.

(This is a “starter” ML/PA project; the client expects us to help with more complex, higher-benefit projects in the future.)

Tasks - Typical

1. Define the problem. Work with customers to get a good understanding of the specific questions they want answered.

2. Analyze existing customer data. If it is not sufficient, work with the customer to collect additional, relevant data.

3. Perform ETL (Extract, Transform & Load) and complete data integrity checks. Make sure there are no issues with the data (missing data, statistical anomalies, etc.).

4. Make predictions and test outcomes. Model development: feature engineering and predictive modeling.

5. Test predictions for accuracy and validity. Improve / refine until results are satisfactory.

6. Deploy in production.

7. Train / transition to the customer’s team (or continue to support if required).

8. Discover other potential opportunities. Provide suggestions on other questions that can be asked.
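The data integrity checks in step 3 can be sketched in plain Python. This is a hypothetical illustration, not the project's code: the `integrity_report` helper, the record layout (one dict per survey response), and the use of a z-score rule to stand in for "statistical anomalies" are all assumptions.

```python
from statistics import mean, pstdev


def integrity_report(records, required_fields, numeric_field, z_threshold=3.0):
    """Report missing values and z-score outliers in a list of dict records.

    Hypothetical helper: the field names and the z-score rule are
    illustrative assumptions, not the checks used in the project.
    """
    # Count records where each required field is absent, None, or blank text.
    missing = {field: 0 for field in required_fields}
    for record in records:
        for field in required_fields:
            value = record.get(field)
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1

    # Keep (record index, value) pairs for records with a numeric value.
    indexed = [(i, r[numeric_field]) for i, r in enumerate(records)
               if isinstance(r.get(numeric_field), (int, float))]
    values = [v for _, v in indexed]

    # Flag values more than z_threshold population std devs from the mean.
    outliers = []
    if len(values) > 1:
        mu, sigma = mean(values), pstdev(values)
        if sigma > 0:
            outliers = [i for i, v in indexed if abs(v - mu) / sigma > z_threshold]
    return {"missing": missing, "outliers": outliers}
```

A z-score rule is only one simple way to flag anomalies; skewed fields such as ratings may call for percentile or domain-specific bounds instead.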

Tasks – This Project

• Perform Sentiment Analysis - Provides insight into positive or negative emotions communicated in the textual data

• Generate Word Cloud - Visual representation of the key words communicated. Sizes indicate relative importance/frequency.

• Perform Clustering - An unsupervised machine learning technique that reveals underlying themes in the text source

• Temporal Analysis of Negative Sentiment - Examine changes in negative sentiment over time, which might correlate with an event at the client or in the world in general.

Sentiment Analysis – Steps and Results

• Evaluated two models:

  o Logistic Regression

  o Naïve Bayes

• Logistic regression performed better overall (higher accuracy, recall, and F-score; Naïve Bayes had slightly higher precision)

• Classification scores:

  o Logistic regression

    • Accuracy – 93.7%

    • Precision – 95.6%, Recall – 97.4%, F-score – 96.5%

  o Naïve Bayes

    • Accuracy – 90.9%

    • Precision – 96.7%, Recall – 93%, F-score – 94.8%

Sentiment correlates very well with the user-provided experience rating
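A comparison like the one above might be run with scikit-learn, which the project used. This is a minimal sketch under assumptions: the deck does not say how comments were vectorized or labeled, so TF-IDF features, the hypothetical `compare_models` helper, and treating label 1 as the class the precision and recall refer to are all illustrative choices. The `f_score` helper shows that each reported F-score is the harmonic mean of the precision and recall beside it; for logistic regression, 2 × 0.956 × 0.974 / (0.956 + 0.974) ≈ 0.965.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def f_score(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def compare_models(train_texts, train_labels, test_texts, test_labels):
    """Fit two text classifiers on TF-IDF features and score them on held-out data.

    Hypothetical helper; TF-IDF features and pos_label=1 are assumptions.
    """
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "naive_bayes": MultinomialNB(),
    }
    results = {}
    for name, classifier in models.items():
        # Vectorize and classify in one pipeline so test text reuses the training vocabulary.
        pipeline = make_pipeline(TfidfVectorizer(), classifier)
        pipeline.fit(train_texts, train_labels)
        predicted = pipeline.predict(test_texts)
        precision, recall, f1, _ = precision_recall_fscore_support(
            test_labels, predicted, average="binary", pos_label=1, zero_division=0)
        results[name] = {
            "accuracy": accuracy_score(test_labels, predicted),
            "precision": precision,
            "recall": recall,
            "f_score": f1,
        }
    return results
```

In practice the labeled comments would first be split into training and test sets, for example with `sklearn.model_selection.train_test_split`.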

Sentiment Analysis - Results

Words identified align well with sentiment

Generate Word Cloud - Tasks

• Generate the key words that dominate negative comments

• Survey comments were transformed using the following text pre-processing steps:

  o Stemming

  o Removing the most common words (“I”, “we”, “is”, etc.)

  o Spellcheck
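The pre-processing steps above can be sketched with NLTK's `PorterStemmer`, which the project's toolkit provides. This is a hypothetical sketch: the short stop-word list, the regular-expression tokenizer, and the `keyword_counts` and `cloud_weights` helpers are assumptions, and the spellcheck step is omitted. The resulting weights are the relative sizes a word-cloud renderer would use.

```python
import re
from collections import Counter

from nltk.stem import PorterStemmer

# Small illustrative stop-word list (an assumption); a real pipeline would use a
# fuller list such as NLTK's stopwords corpus, which needs a separate download.
STOP_WORDS = {"i", "we", "is", "the", "a", "an", "and", "to", "of", "it",
              "was", "my", "too", "so"}


def keyword_counts(comments):
    """Lower-case, tokenize, drop stop words, stem, then count term frequencies."""
    stemmer = PorterStemmer()
    counts = Counter()
    for comment in comments:
        for token in re.findall(r"[a-z']+", comment.lower()):
            if token in STOP_WORDS:
                continue
            counts[stemmer.stem(token)] += 1
    return counts


def cloud_weights(counts, top_n=50):
    """Scale the top_n counts to relative sizes in (0, 1]; the most frequent term is 1.0."""
    top = counts.most_common(top_n)
    if not top:
        return {}
    largest = top[0][1]
    return {term: count / largest for term, count in top}
```

Stemming merges forms such as “time” and “times” into one term, so their counts add up instead of splitting across two smaller words in the cloud.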

Word Cloud - Results

Clustering - Tasks

• Used the K-Means clustering model

• The process:

  o Text pre-processing

  o Convert text to numbers: Term Frequency – Inverse Document Frequency (TF-IDF) transformation

  o Run K-Means for the assigned number of clusters

  o Generate the top-n key words that best represent each cluster

  o Analyze the output to identify key insights

  o Tune and iterate: algorithm parameters, number of clusters, etc.

Clustering - Results

Cluster | Key Words | Theme
0 | college, times, just, apply, confusing, did, student, need, website, difficult | No clear theme
1 | time, kept, consuming, time consuming, waste, logging, waste time, kept logging, times, page | Potential Website Issues
2 | process, application process, long, times, college, student, just, class, students, online | No clear theme
3 | long, takes, way long, took, way, unnecessary, process, complicated, tedious, personal | Time Consuming
4 | school, personal, sexual, high, high school, orientation, sexual orientation, personal information, personal questions, college | Personal Information

The clustering model identified some general trends in potential sources of dissatisfaction

Temporal Analysis

2014 had a spike in the number of negative comments
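A spike like this can be found by counting negative comments per year and flagging years well above the typical level. This is a hypothetical sketch: the input shape (pairs of an ISO date string and a sentiment label), the helper names, and the “mean plus k standard deviations” rule are assumptions, not the project's method.

```python
from collections import Counter
from datetime import date
from statistics import mean, pstdev


def negative_counts_by_year(comments):
    """Count comments labeled "negative" per calendar year.

    `comments` is an iterable of (ISO date string, sentiment label) pairs;
    this input shape is a hypothetical assumption.
    """
    return Counter(date.fromisoformat(day).year
                   for day, label in comments if label == "negative")


def spike_years(counts, k=1.0):
    """Years whose count exceeds the mean by more than k population std devs."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    threshold = mean(values) + k * pstdev(values)
    return sorted(year for year, n in counts.items() if n > threshold)
```

A flagged year is only a lead; confirming a cause still requires checking it against client events, as the task description suggests.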

Tools / Environment

The environment can be on-premises, hybrid cloud, or cloud. NetApp storage provides excellent performance for this type of application.

Tools and Environment – This Project

Tools/Environment:

• Secure AWS environment with the following open-source tools:

  o Python / Natural Language Toolkit (NLTK) package

  o Open-source machine learning tools: Python / scikit-learn package

Roles, Schedule, and Next Steps

Roles on this project:

• Two Data Scientists (Harsha & Ananth)

• Project Manager (part time)

• AWS Solution Architect (to build environment)

Schedule:

• Less than three elapsed months from concept to completion

• Approximately three weeks of actual work

Next Steps:

• This client has at least half a dozen other ML projects

• Fraud, where to apply expert guidance, …

Thank You

Panel Discussion