Looking into the Future Using Google’s Prediction API Justin Grammens Recursive Awesome & IoT Weekly


Looking into the Future: Using Google’s Prediction API

Justin Grammens Recursive Awesome & IoT Weekly


What is Prediction?

• Defined by Wikipedia as: “A statement about an uncertain event.”

• Continues on to read… “It is often, but not always, based upon experience or knowledge.”

• In statistics, prediction is a part of Statistical Inference.


Statistical Inference

• Statistical inference is the process of deducing properties of an underlying distribution by analysis of data.

• Two major paradigms used for statistical inference

• Frequentist Inference

• Bayesian Inference


Frequentist Inference

• Data is a repeatable random sample with a specific probability

• Parameters and probabilities remain constant during the test

• Results are independent of results from prior tests

• Q: Will the sun rise tomorrow? What’s the probability of a sun dying, based on all the suns in the universe?


Bayesian Inference

• Take into account prior results and subjective beliefs

• Update probabilities of occurrence based on new data

• Tests are NOT run in isolation and affect one another

• Q: Will the sun rise tomorrow? Depends on how many times we have seen it rise in the past
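
The sunrise question can be made concrete with a small sketch. It assumes a uniform Beta(1, 1) prior and Laplace's rule of succession for the Bayesian side, and a plain relative frequency as a simplified stand-in for the frequentist side; neither choice comes from the talk, and the function names are illustrative.

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Posterior mean under a uniform Beta(1, 1) prior (an assumed prior)."""
    return Fraction(successes + 1, trials + 2)


def relative_frequency(successes: int, trials: int) -> Fraction:
    """Observed frequency only; no prior belief is mixed in."""
    return Fraction(successes, trials)


# The Bayesian estimate moves toward 1 as more sunrises are observed,
# while the raw frequency is already 1 after the first sunrise.
for days in (1, 10, 10_000):
    bayes = rule_of_succession(days, days)
    freq = relative_frequency(days, days)
    print(f"{days} sunrises seen: Bayesian {float(bayes):.4f}, frequency {float(freq):.4f}")
```

Each new observation updates the Bayesian estimate, which is the "tests affect one another" point from the slide.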


Predictions by Machines

• Could therefore define prediction as an “informed guess or opinion.”

• Software systems have to be trained before they can be effective.

source: reading.pppst.com


What is the Prediction API?

• Announced at Google I/O in 2011

• Provides pattern-matching and machine learning capabilities.

• Handles both numeric and text input

• Handles both classification and regression output

• Access from App Engine, client libs and command line

• Able to retrain the model on the fly - Bayesian?


What Are Some Usages?


What Do You Need?

• Google Account

• Google Cloud Platform Console project

• Google Prediction API Activated

• Google Cloud Storage API Activated


Steps Involved

• Define what you are trying to accomplish

• Find the training data and format to support your goal (hardest part)

• Upload training data to Google Cloud Storage

• Train the system against the data you provide

• Send queries to your model

• Upload additional data with new information gained.


Hosted Model

• The Prediction API hosts a gallery of user-submitted models

• Owners can charge for the use of the model

• Hosted models are versioned so they can be updated easily

• Models are submitted in PMML format

• XML-based language to define statistical & data models

• Appears to currently be a waitlist


How To Train

• 3 ways to create and train the correct type of model

• CSV File - Lives on Google Cloud Storage

• Training data embedded in request

• Limited to the size of an HTTP Request < 2MB

• Empty model created and trained with update calls
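
The three training routes might translate into request bodies roughly as follows. This is a sketch only: the field names (`id`, `storageDataLocation`, `trainingInstances`, `csvInstance`, `output`, `modelType`) are recalled from the v1.6 `trainedmodels` resource and should be treated as assumptions, the helper names are hypothetical, and nothing here makes a network call.

```python
import json

MAX_REQUEST_BYTES = 2 * 1024 * 1024  # the "< 2MB" HTTP request limit from the slide


def train_from_csv(model_id: str, bucket_path: str) -> dict:
    """Way 1: point the API at a CSV file already in Google Cloud Storage."""
    return {"id": model_id, "storageDataLocation": bucket_path}


def train_embedded(model_id: str, rows: list) -> dict:
    """Way 2: embed the examples in the request; answer is the first item of each row."""
    instances = [{"output": row[0], "csvInstance": list(row[1:])} for row in rows]
    body = {"id": model_id, "trainingInstances": instances}
    size = len(json.dumps(body).encode("utf-8"))
    if size >= MAX_REQUEST_BYTES:
        raise ValueError(f"request body is {size} bytes; use a CSV file instead")
    return body


def create_empty(model_id: str, model_type: str) -> dict:
    """Way 3, first half: create an empty model of a given type."""
    return {"id": model_id, "modelType": model_type}


def update_example(output, features: list) -> dict:
    """Way 3, second half: body for one update call that adds one example."""
    return {"output": output, "csvInstance": list(features)}


print(train_from_csv("spam-filter", "my-bucket/spam.csv"))
print(train_embedded("spam-filter", [["spam", "Lose weight now"]]))
```

The size check in `train_embedded` is the reason the embedded route only suits small data sets.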


CSV File Rules

• Maximum file size 2.5 GB

• No header row. Yes, to the system it’s irrelevant

• One example per line

• The first column indicates to the system the type of model.

• Ideally remove punctuation (other than apostrophes) from your data.


CSV File Rules

• Text Strings

• Double quotes around all text strings

• Text matching is case-sensitive

• Numeric Values

• Integer and decimals are supported

• Numbers: "1", "23", "999"

• Strings: "6 12", "colt 45"
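
A training file that follows these rules can be produced with Python's standard `csv` module. This is a minimal sketch: the sample rows and the "ham" label are made up for illustration, and `QUOTE_NONNUMERIC` is just one convenient way to get double quotes around every text string while leaving numbers bare.

```python
import csv
import io


def to_training_csv(rows: list) -> str:
    """Write one example per line, answer in the first column, no header row."""
    buffer = io.StringIO()
    # QUOTE_NONNUMERIC double-quotes every str and writes int/float unquoted.
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Classification-style rows: the label comes first, then the text feature.
rows = [
    ["spam", "Lose weight now"],
    ["ham", "Lunch at noon"],
]
print(to_training_csv(rows), end="")
```

Punctuation was left out of the sample text on purpose, in line with the earlier rule about removing it.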


Structuring Data

• Example Value

• “The Answer”

• Features

• No limit on the number of features

• The more features & examples, the better

• Training on 16 MB takes ~1 hour


What’s The Answer?


Regression Model

Example Data

• Define your data to support numbers and strings

• Query of “Seattle, 288, sunny”, might get back value of 62

• Don’t need to match any values in the dataset

• Fill model with all columns then query with first column missing


Classification Model

Example Data

• Query of “Lose weight now!” you would get result of “spam”

• Returns the category from the dataset
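
Both kinds of query send the feature columns without the answer column and read the answer back from the response. The sketch below assumes the v1.6 `predict` field names as I recall them (`input.csvInstance` in the request, `outputValue` for regression, `outputLabel` for classification); the sample responses are hand-written, not real API output.

```python
def predict_body(features: list) -> dict:
    """Request body for a query: every column except the first (the answer)."""
    return {"input": {"csvInstance": list(features)}}


def read_prediction(response: dict):
    """Return the number for a regression model or the label for a classifier."""
    if "outputValue" in response:
        return float(response["outputValue"])
    return response["outputLabel"]


# Regression query from the slide: city, day of year, weather.
print(predict_body(["Seattle", 288, "sunny"]))
print(read_prediction({"outputValue": 62}))        # hand-written sample response

# Classification query from the slide.
print(predict_body(["Lose weight now!"]))
print(read_prediction({"outputLabel": "spam"}))    # hand-written sample response
```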


Authorization

• You must use OAuth 2.0 to authorize requests

• Can share your model with others

• View: User can call Analyze, Get, List and Predict on the project and/or any model owned by the project.

• Edit: User has all the permissions of Can view, but can also Delete, Insert, and Update any models owned by the project.

• Is Owner: User has all the permissions of Can edit, but can also grant permissions to other users to access the project.


Tips & Tricks

• The more examples & features, the better the results

• However - Adding more features doesn’t always give better predictions

is_comedy is_drama is_action is_horror

Y N N N

VS

genre

Comedy
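
Collapsing the four yes/no columns into the single genre column could look like this. It is a sketch under the assumption that exactly one flag is "Y" per row; the function name is hypothetical.

```python
def flags_to_genre(row: dict, prefix: str = "is_") -> str:
    """Return the one genre whose flag is "Y"; reject rows with zero or several."""
    hits = [
        name[len(prefix):]
        for name, value in row.items()
        if name.startswith(prefix) and value == "Y"
    ]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one genre flag, got {hits}")
    return hits[0].capitalize()


row = {"is_comedy": "Y", "is_drama": "N", "is_action": "N", "is_horror": "N"}
print(flags_to_genre(row))
```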


Tips & Tricks

• Need to add a numeric aspect to the genre?

• Add additional genre columns and weight it based on count

genre genre genre genre genre

Drama Drama Drama Comedy Comedy
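
One way to generate those repeated columns is to repeat each genre once per count. This sketch assumes the counts have already been scaled so that every row ends up with the same number of genre columns; the function name is illustrative.

```python
def weighted_genre_columns(counts: dict) -> list:
    """Repeat each genre `count` times, heaviest genre first."""
    columns = []
    for genre, count in sorted(counts.items(), key=lambda item: -item[1]):
        columns.extend([genre] * count)
    return columns


print(weighted_genre_columns({"Drama": 3, "Comedy": 2}))
```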


Tips & Tricks

• Always put something into each feature

• Include all the features that you know about

• For Regression:

• Make sure you have the time to verify that the values are correct

• Conversely, if you have exact numbers use them

• Try to have at least a few hundred examples for each category


Tips & Tricks

• Can only compare against known relationships

• Can’t feed an untrained title and user to get rating

• Solution is to break the title into genre, director, actors

Rating  user_name  movie_title
9.5     Justin     Star Wars
2.2     Justin     Disaster Movie
5.0     Justin     Billy Madison
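
Breaking the title into descriptive features might look like the sketch below. The lookup table is a hypothetical stand-in for a real movie database, its values are illustrative, and actors are left out to keep it short.

```python
# Hypothetical metadata lookup; a real pipeline would query a movie database.
MOVIE_FEATURES = {
    "Star Wars": ("Sci-Fi", "George Lucas"),
    "Disaster Movie": ("Comedy", "Jason Friedberg"),
    "Billy Madison": ("Comedy", "Tamra Davis"),
}


def expand_row(rating: float, user_name: str, movie_title: str) -> list:
    """Swap the opaque title for features that an unseen movie can share."""
    genre, director = MOVIE_FEATURES[movie_title]
    return [rating, user_name, genre, director]


for rating, user, title in [
    (9.5, "Justin", "Star Wars"),
    (2.2, "Justin", "Disaster Movie"),
    (5.0, "Justin", "Billy Madison"),
]:
    print(expand_row(rating, user, title))
```

A new title can then be queried by its genre and director even though the model has never seen the title itself.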


Let’s Talk Data!

• Nice Ride

• Based on the starting station, predict the ending station

• New York Cab Rides

• Given a starting GPS coordinate, predict where the cab ride will end

• Sentiment Analysis

• Based on a State of the Union speech, determine the sentiment


Based on the starting station, can we predict the ending station?


Nice Ride Location Rides

• https://www.niceridemn.org/data/

• Offers a live XML stream to update along the way


Nice Ride Location Rides

Started with this:

Next: Ended with this:


Nice Ride Insert Data

ID & Location


Nice Ride Running Prediction

Status


Lessons Learned

• I forgot to put the values in quotes, so the API treated it as numerical regression.

• Verify how it’s interpreting your data with a “get” call.
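
Both lessons can be sketched in a few lines. The station IDs are made up, and the `modelInfo.modelType` field is how I recall the "get" response being laid out in v1.6, so treat that name as an assumption; the response dictionaries below are hand-written.

```python
def station_row(end_station: int, start_station: int) -> str:
    """Quote the answer column so the station ID reads as a label, not a quantity."""
    return f'"{end_station}",{start_station}'


def check_model_type(get_response: dict, expected: str) -> None:
    """Raise if the trained model is not of the type we meant to build."""
    actual = get_response.get("modelInfo", {}).get("modelType")
    if actual != expected:
        raise ValueError(f"model trained as {actual!r}, expected {expected!r}")


print(station_row(30005, 30012))  # hypothetical station IDs

# Hand-written sample of what a "get" call might return after training.
check_model_type({"modelInfo": {"modelType": "classification"}}, "classification")
```

Running a check like this right after training catches an accidental regression model before any queries are sent.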

Type


Nice Ride Location Rides

Show Scripts, API & Results


Can we predict the movement of NYC cabs?


Sample Data

Contains pickup & drop off latitude and longitude


There’s A Problem

• Asking for 2 inputs and 2 outputs!

• Not possible with Prediction API as it only supports one dependent variable. :(

• Change of plan…


Let’s predict the cost of a NYC cab ride instead!


Prediction Demo

• Features are distances (B)

• Examples are prices (A)

• Is this accurate?

• Different fares based on areas of the city


Ok, not really… Let's use location-based data instead


Prediction Demo

• Latitude / Longitude are the features (B, C, D, E)

• Price Is The Example (A)
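
Turning one trip record into a training row with the price in column A and the coordinates in B through E might look like this. The dictionary keys and the latitude/longitude ordering are assumptions about the data set, and the sample trip is invented.

```python
def taxi_training_row(trip: dict) -> list:
    """Fare first (the example value), then the four coordinate features."""
    return [
        float(trip["fare_amount"]),        # A: the value to predict
        float(trip["pickup_latitude"]),    # B
        float(trip["pickup_longitude"]),   # C
        float(trip["dropoff_latitude"]),   # D
        float(trip["dropoff_longitude"]),  # E
    ]


# Invented sample trip; values arrive as strings when read from a CSV file.
trip = {
    "fare_amount": "12.5",
    "pickup_latitude": "40.75",
    "pickup_longitude": "-73.99",
    "dropoff_latitude": "40.71",
    "dropoff_longitude": "-74.01",
}
print(taxi_training_row(trip))
```

Because the first column is an unquoted number, a row like this would train a regression model.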

• Examples


NYC Cab Ride Location

Show Scripts, API & Results


Sentiment Analysis of a Speech


Speech Sentiment

• Always Check Your Data!

• Website incorrectly claimed positive(4), negative(0) and neutral(2) sentiment.

• Data had groups of sentiment values.
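
A quick way to check a labeled data set is to count which values actually appear in the answer column before training. This sketch assumes the label is the first CSV column; the sample text is invented for illustration.

```python
import csv
import io
from collections import Counter


def label_counts(csv_text: str) -> Counter:
    """Count how often each value appears in the first (answer) column."""
    reader = csv.reader(io.StringIO(csv_text))
    return Counter(row[0] for row in reader if row)


# Invented sample: the documentation might promise 0, 2 and 4,
# but only the counts show which labels are really present.
sample = '"4","great day"\n"0","awful traffic"\n"4","love it"\n'
counts = label_counts(sample)
print(counts)
print("neutral label present:", "2" in counts)
```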

• Source


Speech Sentiment

Feature

Example Value

Training Examples


Sentiment Training


Sentiment Example

Show Scripts, API & Results

Obama State of the Union Speech - 1/16

Donald Trump Speech Des Moines, IA - 1/24


Smart Spreadsheets

Install Smart Autofill Add-on


Smart Spreadsheets

Prediction API used to fill in missing values


Smart Spreadsheets

Select columns to use for data training


Smart Spreadsheets

“Example Values” are populated


Final Thoughts - Overfitting

• Overfitting the model generally takes the form of making an overly complex model to explain idiosyncrasies in the data under study.

• Therefore, a model that has been overfit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data.

• Exact query should not return EXACT examples
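
A common way to spot overfitting is to hold back some examples and query the model only with rows it was never trained on. The hold-out split below is a generic technique rather than something the talk demonstrates; the 20% fraction and fixed seed are arbitrary choices.

```python
import random


def holdout_split(rows: list, test_fraction: float = 0.2, seed: int = 0):
    """Shuffle a copy of the rows and set aside a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


train_rows, test_rows = holdout_split(list(range(10)))
print(len(train_rows), "rows to train on,", len(test_rows), "rows held back")
```

If predictions on the held-back rows are much worse than on the training rows, the model is likely fitting noise.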


Thank You

Justin Grammens

[email protected] http://recursiveawesome.com

Check out my IoT Weekly Newsletter http://iotweeklynews.com