© 2013 Sqor, Inc. Sqor Using R For Social Media and Sports Data Athletes Success Data Noah Gift: CTO @ Sqor

Using R for Social Media and Sports Analytics

Page 1: Using R for Social Media and Sports Analytics

© 2013 Sqor, Inc.

Sqor

Athletes

Traditional sponsorship, contracts/salary SqorFunding Increasing cash flow for athletes Success Athletes focus on what they do best

Using R For Social Media and Sports Data

Athletes Success Data

Noah Gift: CTO @ Sqor

Page 2: Using R for Social Media and Sports Analytics

What is Sqor?

•  Social Network hyper-focused on enhancing fan/athlete relationships. We only do Sports!: Now
•  Marketplace for athletes to build and market their digital brand: Now
•  Social Analytics and Prediction Engine as a Service: Q1 2015
•  Micro-endorsement platform: Q1 2015
•  Crowdfunding for athletes: Now
•  Game platform: First homegrown game featuring Brett Favre: Now
•  Cross-Social Network Publishing Platform: Facebook, Twitter, embeddable posts: Now
•  Website, Android App, and iOS App:

Page 3: Using R for Social Media and Sports Analytics

Key Aspects of Data Pipeline

•  Multiple languages involved: Python, R, Erlang, C#, SQL, and JavaScript.
•  Multiple persistence options: SQL Server (RDS), Riak (NoSQL), CSV files, Mnesia (distributed soft real-time DB).
•  RabbitMQ and Erlang handle messaging and job communication.
•  Easy to debug: daily and nightly scripts, intermediate CSV files, deep storage in a K/V store, and reports that live in RDS.
•  R is used exclusively for machine learning and statistics (although recommendation engine v1 was written in Python; we are going to replace it with R/Erlang code).
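The "easy to debug" point above can be illustrated with a small sketch in Python (one of the pipeline languages named on this slide). It is not Sqor's code: the file names, the `athlete` and `followers` columns, and the `clean` step are hypothetical. The idea it shows is only that each stage writes its output to an intermediate CSV file, so any step can be inspected or re-run on its own.

```python
import csv
import os
import tempfile


def write_stage(path, fieldnames, rows):
    """Persist one stage's output as a CSV file that can be inspected later."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_stage(path):
    """Load a previous stage's CSV output as a list of dicts (values are strings)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def clean(rows):
    """Hypothetical cleaning step: drop rows with no follower count, cast to int."""
    cleaned = []
    for row in rows:
        if row["followers"].strip():
            cleaned.append({"athlete": row["athlete"],
                            "followers": int(row["followers"])})
    return cleaned


# Made-up sample data; real input would come from the upstream collectors.
workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, "01_raw.csv")
clean_path = os.path.join(workdir, "02_clean.csv")

raw = [
    {"athlete": "A", "followers": "1200"},
    {"athlete": "B", "followers": ""},
    {"athlete": "C", "followers": "300"},
]
write_stage(raw_path, ["athlete", "followers"], raw)

# Each stage reads the previous file, so a failure can be replayed from disk.
write_stage(clean_path, ["athlete", "followers"], clean(read_stage(raw_path)))
result = read_stage(clean_path)
print(result)
```

Because every stage leaves a file behind, standard Unix tools (`head`, `wc`, `diff`) are enough to check what a nightly run actually produced.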

Page 4: Using R for Social Media and Sports Analytics

Things They Don’t Tell You About Building a Data Pipeline From Scratch (Or That You Should Have Paid Attention To)

•  Getting the data into the right format and making sure it is accurate is backbreaking work. It truly is horrible.
•  Keeping track of model prediction accuracy over time, both with new data and with new models, is really important.
•  Non-linear regression is non-trivial.
•  Automation and debuggability of every step are very important. Think Unix tools.
•  Expensive, exotic solutions (weird databases, etc.) sometimes aren’t worth it at first…or maybe ever.
•  Making predictions involving real money with limited data is scary and really hard. If you’re not scared about this, you should be.
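One way to keep track of prediction accuracy across both new data and new models is to append one row per run and per model version to a log. The sketch below is an assumption-laden illustration in Python, not the method used at Sqor: the function names, the `v1`/`v2` version labels, the dates, and the binary labels are all made up.

```python
import csv
import io
from datetime import date


def accuracy(predicted, actual):
    """Fraction of predictions that match the observed labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


def log_accuracy(out, run_date, model_version, predicted, actual):
    """Append one row (date, model, sample size, accuracy) to the log."""
    score = accuracy(predicted, actual)
    csv.writer(out).writerow(
        [run_date.isoformat(), model_version, len(actual), f"{score:.3f}"]
    )
    return score


# Hypothetical labels: 1 = athlete ended up in the top tier, 0 = did not.
observed = [1, 0, 0, 1]
log = io.StringIO()  # stand-in for a CSV file opened in append mode

s1 = log_accuracy(log, date(2014, 1, 6), "v1", [1, 0, 1, 1], observed)
s2 = log_accuracy(log, date(2014, 1, 6), "v2", [1, 0, 0, 1], observed)
print(log.getvalue())
```

Logging the sample size next to the score matters with limited data: an accuracy computed on four observations should not be read the same way as one computed on four thousand.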

Page 5: Using R for Social Media and Sports Analytics

Predicting Top Athletic Performers in Social Media

•  Sqor finds influential athletes and collaborates with them using our prediction algorithms

Page 6: Using R for Social Media and Sports Analytics

Our Prediction Algorithms Appear To Work

•  Or we got really lucky….

Page 7: Using R for Social Media and Sports Analytics

Clustering

•  We use R clustering packages for classification, visualization of patterns and diagnostics for predictions

Page 8: Using R for Social Media and Sports Analytics

Clustering

•  We use kNN clustering for NBA and MLB. We plan to expand this further in the near future.
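For readers unfamiliar with kNN, here is a minimal sketch of the nearest-neighbour idea in plain Python. The slides say the production work is done in R, so this is only an illustration: the two features (imagine scaled posting frequency and engagement), the "low"/"high" labels, and every data point are invented for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs; features are equal-length
    tuples of numbers that are assumed to be on comparable scales already.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up training data: (feature_1, feature_2) -> influence tier.
train = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 7.5), "high"),
    ((8.5, 9.0), "high"),
]

print(knn_predict(train, (8.2, 8.1)))
print(knn_predict(train, (1.2, 1.4)))
```

Euclidean distance is used here for simplicity; with real features on different scales, normalising them first is usually needed so that one feature does not dominate the distance.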

Page 9: Using R for Social Media and Sports Analytics

Erlang/R Bridge

•  Sqor is a heavy user of Erlang.
•  We like Erlang because it has unique concurrency abilities and high uptime (and also because I had a lot of bosses who told me I couldn’t use it).
•  ➜ ~ curl -v -X PUT -H 'content-type: application/json' http://127.0.0.1:8080/api/script/foo -d '{"script":"execute <- function (A) { A * 2 }", "docs":"this doubles stuff"}'
•  ➜ ~ curl -v http://127.0.0.1:8080/api/script/foo -X POST -H 'content-type: application/json' -d '[25]'
•  Returns: [50.0]
•  We plan on open sourcing this in the next 2 months: it runs scripts, runs jobs, and scales R.
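The two curl calls above can be mirrored from a script. The sketch below uses only the Python standard library and takes the URL, HTTP methods, and JSON payloads directly from the slide; the helper names (`register_request`, `call_request`, `send`) are hypothetical, and nothing beyond what the curl examples show is known about the bridge's API. The requests are built but not sent, since sending needs the bridge running locally.

```python
import json
import urllib.request

# Base URL taken from the curl examples on this slide.
BASE_URL = "http://127.0.0.1:8080/api/script"
JSON_HEADERS = {"Content-Type": "application/json"}


def register_request(name, script, docs):
    """Build the PUT request that stores an R script under the given name."""
    body = json.dumps({"script": script, "docs": docs}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{name}", data=body, method="PUT", headers=JSON_HEADERS
    )


def call_request(name, args):
    """Build the POST request that runs a stored script with a JSON argument list."""
    body = json.dumps(args).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{name}", data=body, method="POST", headers=JSON_HEADERS
    )


def send(request):
    """Send a prepared request and return the raw response text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


put = register_request("foo", "execute <- function (A) { A * 2 }",
                       "this doubles stuff")
post = call_request("foo", [25])

# With the bridge running locally, the calls would be:
#   send(put)
#   send(post)   # per the slide, the reply body is [50.0]
print(put.get_method(), put.full_url)
print(post.get_method(), post.full_url)
```

Separating request construction from sending keeps the payloads easy to inspect and test without a live server.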