# Looking into the Future: Using Google's Prediction API


• Looking into the Future: Using Google's Prediction API

Justin Grammens, Recursive Awesome & IoT Weekly

• What is Prediction?

Defined by Wikipedia as: "A statement about an uncertain event."

It continues: "It is often, but not always, based upon experience or knowledge."

In statistics, prediction is a part of statistical inference.

• Statistical Inference

Statistical inference is the process of deducing properties of an underlying distribution by analysis of data.

Two major paradigms are used for statistical inference:

Frequentist Inference

Bayesian Inference

• Frequentist Inference

Data is a repeatable random sample with a specific probability

Parameters and probabilities remain constant during the test

Results are independent of results from prior tests

Q: Will the sun rise tomorrow? What's the probability of a sun dying, based on all the suns in the universe?

• Bayesian Inference

Takes into account prior results and subjective beliefs

Updates probabilities of occurrence based on new data

Tests are NOT run in isolation and affect one another

Q: Will the sun rise tomorrow? Depends on how many times we have seen it rise in the past
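
The sunrise question from the two slides above can be made concrete. The following is a minimal sketch, assuming a uniform prior for the Bayesian case (Laplace's rule of succession) and plain relative frequency for the frequentist case. The function names are illustrative and do not come from the talk.

```python
from fractions import Fraction


def frequentist_estimate(successes: int, trials: int) -> Fraction:
    """Relative frequency: the observed proportion of successes."""
    return Fraction(successes, trials)


def bayesian_estimate(successes: int, trials: int) -> Fraction:
    """Posterior mean under a uniform prior (Laplace's rule of succession)."""
    return Fraction(successes + 1, trials + 2)


# Each observed sunrise updates the Bayesian estimate; it moves toward 1
# as evidence accumulates, while the relative frequency is 1 throughout.
for days in (1, 10, 1000):
    print(days, frequentist_estimate(days, days), bayesian_estimate(days, days))
```

With no observations at all, the Bayesian estimate starts at 1/2, which is the "subjective belief" encoded by the uniform prior.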

• Predictions by Machines

We could therefore define prediction as an informed guess or opinion.

Software systems have to be trained before they can be effective.

source: reading.pppst.com

http://reading.pppst.com/prediction.html

• What is the Prediction API?

Announced at Google I/O in 2011

Provides pattern-matching and machine learning capabilities

Handles both numeric and text input

Handles both classification and regression output

Access from App Engine, client libraries and the command line

Able to retrain the model on the fly - Bayesian?

• What Are Some Usages?

• What Do You Need?

Google Account

Google Platform Console project

Google Prediction API Activated

Google Cloud Storage API Activated

• Steps Involved

Define what you are trying to accomplish

Find the training data and format it to support your goal (hardest part)

Upload training data to Google Cloud Storage

Train the system against the data you provide

Send queries to your model

Upload additional data with new information gained.

• Hosted Model

The Prediction API hosts a gallery of user-submitted models

Owners can charge for the use of the model

Hosted models are versioned so they can be updated easily

Models are submitted in PMML format

PMML is an XML-based language to define statistical & data models

There currently appears to be a waitlist

• How To Train

3 ways to create and train the correct type of model:

CSV file - lives on Google Cloud Storage

Training data embedded in the request

Limited to the size of an HTTP request (< 2 MB)

Empty model created and trained with update calls
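
The three training options can be pictured as three request bodies. This is a sketch only: the field names (`id`, `storageDataLocation`, `trainingInstances`, `csvInstance`, `output`) are recalled from the v1.6 REST reference and should be treated as assumptions, and the model ID is a placeholder. Nothing is sent over the network.

```python
import json

MODEL_ID = "my-model"  # placeholder model name
MAX_EMBEDDED_BYTES = 2 * 1024 * 1024  # the slide's "< 2 MB" request limit


def body_from_csv(bucket_path: str) -> dict:
    """Option 1: point the insert call at a CSV file in Cloud Storage."""
    return {"id": MODEL_ID, "storageDataLocation": bucket_path}


def body_with_embedded_data(rows: list) -> dict:
    """Option 2: embed training rows in the request (answer in column one)."""
    instances = [{"output": row[0], "csvInstance": list(row[1:])} for row in rows]
    return {"id": MODEL_ID, "trainingInstances": instances}


def body_empty_model() -> dict:
    """Option 3, first step: create a model with no training data."""
    return {"id": MODEL_ID}


def body_update(answer, features: list) -> dict:
    """Option 3, later steps: add one example per update call."""
    return {"output": answer, "csvInstance": list(features)}


def fits_in_request(body: dict) -> bool:
    """Check the serialized body against the embedded-data size limit."""
    return len(json.dumps(body).encode("utf-8")) < MAX_EMBEDDED_BYTES
```

Checking `fits_in_request` before choosing option 2 is one way to decide when the data has to go to Cloud Storage instead.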

• CSV File Rules

Maximum file size is 2.5 GB

No header row. Yes, to the system it's irrelevant

One example per line

The first column indicates to the system the type of model.

Ideally remove punctuation (other than apostrophes) from your data.

• CSV File Rules

Text Strings

Double quotes around all text strings

Text matching is case-sensitive

Numeric Values

Integers and decimals are supported

Numbers: "1", "23", "999"

Strings: "6 12", "colt 45"
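
The CSV rules above map onto Python's standard `csv` module fairly directly. The sketch below writes one example per line with no header row, double-quotes text fields, leaves numbers bare, and strips punctuation other than apostrophes. The helper names are illustrative; the sample rows reuse the spam and weather examples from later slides.

```python
import csv
import io
import re


def clean_text(text: str) -> str:
    """Drop punctuation except apostrophes, as the slide recommends."""
    return re.sub(r"[^\w\s']", "", text)


def to_training_csv(rows) -> str:
    """Write one example per line, no header, answer in the first column."""
    buffer = io.StringIO()
    # QUOTE_NONNUMERIC double-quotes every str field and leaves int/float bare.
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for row in rows:
        writer.writerow([clean_text(v) if isinstance(v, str) else v for v in row])
    return buffer.getvalue()


# A classification file and a regression file, one row each.
print(to_training_csv([["spam", "Lose weight now!"]]))
print(to_training_csv([[62, "Seattle", 288, "sunny"]]))
```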

• Structuring Data

Example Value

The Answer

Features

No limit on the number of features

The more features & examples the better

Training on 16 MB takes ~ 1 hour

• What's The Answer?

• Regression Model

Example Data

Define your data to support numbers and strings

A query of "Seattle", 288, "sunny" might get back a value of 62

The query doesn't need to match any values in the dataset

Train the model with all columns, then query with the first column missing

• Classification Model

Example Data

A query of "Lose weight now!" would get a result of "spam"

Returns the category from the dataset
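
For both model types, a query is a training row with the first column (the answer) left out. A minimal sketch of that idea follows; the `input` / `csvInstance` body shape is recalled from the v1.6 predict call and should be read as an assumption, and the helper name is hypothetical.

```python
def query_from_example(example_row: list) -> dict:
    """Drop the answer (first column) and wrap the remaining features."""
    features = list(example_row[1:])
    return {"input": {"csvInstance": features}}


# Regression: the model would answer with a number such as 62.
regression_query = query_from_example([62, "Seattle", 288, "sunny"])

# Classification: the model would answer with a category such as "spam".
classification_query = query_from_example(["spam", "Lose weight now!"])

print(regression_query)
print(classification_query)
```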

• Authorization

You must use OAuth 2.0 to authorize requests

You can share your model with others

View: User can call Analyze, Get, List and Predict on the project and/or any model owned by the project.

Edit: User has all the permissions of Can view, but can also Delete, Insert, and Update any models owned by the project.

Is Owner: User has all the permissions of Can edit, but can also grant permissions to other users to access the project.

• Tips & Tricks

The more examples & features, the better the results

However - adding more features doesn't always give better predictions

is_comedy is_drama is_action is_horror

Y N N N

VS

genre

Comedy

• Tips & Tricks

Need to add a numeric aspect to the genre?

Add additional genre columns and weight them based on count

genre genre genre genre genre

Drama Drama Drama Comedy Comedy
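
The two genre tips above can be sketched as small data transformations: collapsing the Y/N flag columns into a single genre value, and repeating a genre across several columns to give it weight. The function names are illustrative only.

```python
def collapse_flags(flags: dict) -> str:
    """Turn Y/N columns such as is_comedy into one genre value."""
    for name, value in flags.items():
        if value == "Y":
            return name.replace("is_", "", 1).capitalize()
    return "Unknown"


def weighted_genre_columns(genre_counts: dict) -> list:
    """Repeat each genre once per count, so frequent genres fill more columns."""
    columns = []
    for genre, count in genre_counts.items():
        columns.extend([genre] * count)
    return columns


print(collapse_flags({"is_comedy": "Y", "is_drama": "N", "is_action": "N", "is_horror": "N"}))
print(weighted_genre_columns({"Drama": 3, "Comedy": 2}))
```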

• Tips & Tricks

Always put something into each feature

Include all the features that you know about

For Regression:

Make sure you will have the time to ensure the values are correct

Conversely, if you have exact numbers, use them

Try to have at least a few hundred examples for each category

• Tips & Tricks

Can only compare against known relationships

Can't feed an untrained title and user to get a rating

Solution is to break the title into genre, director, actors

Rating user_name movie_title

9.5 Justin Star Wars

2.2 Justin Disaster Movie

5.0 Justin Billy Madison

• Let's Talk Data!

Nice Ride

Based on the starting station, predict the ending station

New York Cab Rides

Given a starting GPS coordinate, predict where the cab ride will end

Sentiment Analysis

Based on the State of the Union speech, determine the sentiment

• Based on the starting station, can we predict the ending station?

• Nice Ride Location Rides

https://www.niceridemn.org/data/

Offers a live XML stream to update along the way


• Nice Ride Location Rides

Started with this:

Next: Ended with this:

• Nice Ride Insert Data

ID & Location

• Nice Ride Running Prediction

Status

• Lessons Learned

I forgot to put the values in quotes, so the system treated it as numerical regression.

Verify how it's interpreting your data with a get call.

Type

• Nice Ride Location Rides

Show Scripts, API & Results

• Can we predict the movement of NYC cabs?

• NYC Cab Ride Data

Data Dictionary: http://www.nyc.gov/html/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf

Data Website: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

• Sample Data

Contains pickup & drop off latitude and longitude

• There's A Problem

Asking for 2 inputs and 2 outputs!

Not possible with Prediction API as it only supports one dependent variable. :(

Change of plan

• Let's predict the cost of a NYC cab ride instead!

• Prediction Demo

Features are distances (B)

Examples are prices (A)

Is this accurate?

Different fares based on areas of the city

• OK, not really

Let's use location-based data instead

• Prediction Demo

Latitude / Longitude are the features (B, C, D, E)

Price Is The Example (A)

Examples
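
The layout on this slide (price in column A, four coordinates in columns B to E) can be produced from raw trip records with a short conversion. The sketch below assumes the trip file has a header with the column names `fare_amount`, `pickup_longitude`, `pickup_latitude`, `dropoff_longitude` and `dropoff_latitude`; those names, the choice of fare column, and the coordinate order are assumptions, not details confirmed by the talk.

```python
import csv
import io

# Assumed column names; check them against the data dictionary for your file.
ANSWER = "fare_amount"
FEATURES = [
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
]


def trip_to_training_row(trip: dict) -> list:
    """Put the price first (A), then the four coordinates (B to E)."""
    return [float(trip[ANSWER])] + [float(trip[name]) for name in FEATURES]


def convert(trips_csv: str) -> list:
    """Read trip records with a header row and return training rows."""
    reader = csv.DictReader(io.StringIO(trips_csv))
    return [trip_to_training_row(trip) for trip in reader]
```

Because every column is numeric and unquoted, a file built this way would be treated as a regression problem, which matches the goal of predicting a price.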

• NYC Cab Ride Location

Show Scripts, API & Results

• Sentiment Analysis of a Speech

• Speech Sentiment

Always Check Your Data!

The website incorrectly claimed positive (4), negative (0) and neutral (2) sentiment.

Data had groups of sentiment values.

Source

http://help.sentiment140.com/for-students/
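
One quick way to check the data before training is to count which label values actually occur in the answer column. This is a minimal sketch; the helper name is hypothetical and it assumes the label sits in the first column of each row.

```python
from collections import Counter


def label_counts(rows) -> Counter:
    """Count how often each value appears in the first (answer) column."""
    return Counter(row[0] for row in rows)


def unexpected_labels(rows, expected) -> set:
    """Return labels present in the data but missing from the documented set."""
    return set(label_counts(rows)) - set(expected)
```

Comparing the counted labels with the values the documentation promises shows quickly whether a documented category is absent or an undocumented one is present.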

• Speech Sentiment

Feature

Example Value

Training Examples

• Sentiment Training

• Sentiment Example

Show Scripts, API & Results

Obama State of the Union Speech - 1/16

https://medium.com/@WhiteHouse/president-obama-s-2016-state-of-the-union-address-7c06300f9726#.ardf6wqm6

Donald Trump Speech Des Moines, IA - 1/24

http://www.p2016.org/photos15/summit/trump012415spt.html

• Smart Spreadsheets

Install Smart Autofill Add-on

• Smart Spreadsheets

Prediction API used to fill in missing values

• Smart Spreadsheets

Select columns to use for data training

• Smart Spreadsheets

Example Values are populated

• Final Thoughts - Overfitting

Overfitting the model generally takes the form of making an overly complex model to explain idiosyncrasies in the data under study.

Therefore, a model that has been overfit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data.

An exact query should not be expected to return the EXACT training example

https://groups.google.com/forum/#!topic/prediction-api-discuss/n64eHnv5iug

• Thank You

Justin Grammens

justin@recursiveawesome.com http://recursiveawesome.com

Check out my IoT Weekly Newsletter http://iotweeklynews.com

