Foursquare -CEPSR Presentation

  • 8/7/2019 Foursquare -CEPSR Presentation

    1/18

    Foursquare

    Adventures in Machine Learning, Statistics, and Big Data

    2/18

    What is Foursquare?

    Location-based startup; an application that helps you explore your city

    Visit places, check in, earn rewards, stay connected with your friends

    Game elements: single-player, multi-player

    3/18

    What is Foursquare? (cont.)

    4M+ users, 10M+ venues, 200M+ check-ins

    Large reach (most major countries, North Pole, Space)

    Native app for almost every smartphone, also available on SMS, web, mobile-web

    4/18

    Data Model

    [Diagram: Users, Venues, Check-ins, Tips, To-dos, Shouts]

    5/18

    Big Data

    Some problems are no longer solvable by simple/naive algorithms or simple crowd-sourcing

    Interesting problems can now be solved using statistical methods: prediction, classification, optimization

    6/18

    Example Problems

    Prediction: Recommending places to people, people to places, people to people, places from places, tips, events, checking-in, interestingness

    Optimization: Search, ranking, user experience

    Classification: Categorizing venues, removing junk, spam, duplicates

    The list goes on

    7/18

    Zooming in

    Recommending Places to People

    What do we have available?

      Check-ins (user history, venue history)

      Venue meta-data, user meta-data

      Friend graph

    What do we want to do?

      Social, fun (interesting >? optimal)

      Be smart, provide serendipity, hit the tail

    8/18

    Initial Thoughts

    Want a hybrid model, many features

    Need to be scalable, fast (web-scale; offline computations are OK as long as they scale linearly)

    Start with dumb, get smarter where possible. Iterate. Something is better than nothing. Data is key.

    9/18

    Start with Simple

    Popularity: user independent, works for cold-start, can be extended:

      Decay popularity: recently popular, new, long-term

      Break down by time of day, day of week

      Unique users vs. hits per user (must-see vs. hidden gem vs. generally popular)

    Bubble up interesting things: specials, to-dos, similar tips
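
The popularity extensions on this slide can be sketched as follows. This is a minimal illustration, not the scoring Foursquare used: the `CheckIn` record, the exponential half-life decay, and all names are assumptions introduced here.

```scala
object PopularitySketch {
  // Hypothetical check-in record; `day` is a timestamp measured in days.
  final case class CheckIn(venue: String, userId: Long, day: Double)

  // Decayed popularity (assumed form): each check-in loses half its
  // weight every `halfLifeDays`, so recent activity dominates.
  def decayedPopularity(checkIns: Seq[CheckIn],
                        today: Double,
                        halfLifeDays: Double): Map[String, Double] =
    checkIns.groupBy(_.venue).map { case (venue, cs) =>
      (venue, cs.map(c => math.pow(0.5, (today - c.day) / halfLifeDays)).sum)
    }

  // Unique visitors and average check-ins per visitor for each venue.
  // Many visitors with few repeats suggests a "must-see"; few visitors
  // with many repeats suggests a "hidden gem".
  def uniqueVsRepeat(checkIns: Seq[CheckIn]): Map[String, (Int, Double)] =
    checkIns.groupBy(_.venue).map { case (venue, cs) =>
      val users = cs.map(_.userId).distinct.size
      (venue, (users, cs.size.toDouble / users))
    }
}
```

A short half-life approximates "recently popular" and a long one approximates "long-term"; the same grouping could be keyed by hour of day or day of week instead of by venue alone.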

    10/18

    http://bitsybot.com/foursquare/trends.html

    Breakfast vs. Brunch

    Unstructured vs. Commuting

    11/18

    More Complex

    Add some social elements. Where do your friends go? Can we rate your friends?

    Good for users with small check-in history, large # of friends

    Can we determine friend quality from check-ins?

    Even if weak mathematically, social can triumph: "Jane, John, and 17 other friends went here"
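
One simple way to answer "where do your friends go?" is to count the distinct friends who have checked in at each venue. The sketch below assumes a hypothetical `CheckIn` record and a friend set given as user ids; it does not attempt the friend-quality rating the slide asks about.

```scala
object SocialSketch {
  // Hypothetical check-in record: who checked in where.
  final case class CheckIn(userId: Long, venue: String)

  // Number of distinct friends who have checked in at each venue.
  def friendVisitCounts(friends: Set[Long],
                        checkIns: Seq[CheckIn]): Map[String, Int] =
    checkIns
      .filter(c => friends.contains(c.userId))
      .groupBy(_.venue)
      .map { case (venue, cs) => (venue, cs.map(_.userId).distinct.size) }

  // Venues ordered by friend count, most friends first; ties by name.
  def rankByFriends(friends: Set[Long],
                    checkIns: Seq[CheckIn]): Seq[(String, Int)] =
    friendVisitCounts(friends, checkIns).toSeq
      .sortBy { case (venue, n) => (-n, venue) }
}
```

The count per venue is also the number needed to fill in a line such as "Jane, John, and 17 other friends went here".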

    12/18

    Hard: Check-in History

    Can we accurately predict where you want to go based on where you went? Seems good, but how to do it? How to scale it?

    Lots of research in this area lately, mostly because of Netflix and Amazon before them

      Collaborative filtering (venue-to-venue, user-to-user)

      Factorization, dimensionality reduction

      Clustering

      SVM

      Linear models

      Context Filtering/Search

    The branching factor for choosing a method increases dramatically at this stage. Although it provides the most value, it is the most difficult to do right.

    13/18

    Choosing a Venue Similarity Metric

    Correlation

    Cosine similarity

    How do we adjust for scale? How much?

    How to remove neighborhood effect?

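    Cosine similarity: $\frac{A \cdot B}{\|A\| \, \|B\|}$

    Correlation: $\frac{\operatorname{cov}(A, B)}{\operatorname{std}(A) \, \operatorname{std}(B)}$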
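
The two candidate metrics can be sketched in a few lines. The sketch assumes each venue is represented as a vector of per-user check-in counts (an assumption; the slides do not say how A and B are built), and all names are illustrative.

```scala
object SimilaritySketch {
  def dot(a: Seq[Double], b: Seq[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  def norm(a: Seq[Double]): Double = math.sqrt(dot(a, a))

  // Cosine similarity: A . B / (||A|| * ||B||).
  // Returns NaN if either vector is all zeros.
  def cosine(a: Seq[Double], b: Seq[Double]): Double =
    dot(a, b) / (norm(a) * norm(b))

  def mean(a: Seq[Double]): Double = a.sum / a.size

  // Pearson correlation: cov(A, B) / (std(A) * std(B)), computed as the
  // cosine of the mean-centred vectors (the 1/n factors cancel).
  def correlation(a: Seq[Double], b: Seq[Double]): Double = {
    val (ma, mb) = (mean(a), mean(b))
    cosine(a.map(_ - ma), b.map(_ - mb))
  }
}
```

Centring is what separates the two: correlation ignores a constant offset in check-in volume, while cosine similarity does not, which is one way to read the "adjust for scale" question.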

    14/18

    kNN Collaborative Filtering

    val result =
      for { historyVenue … 0)
            val modscore = vPair.r /
              (1 + math.exp(vPair.distance / -1000.0 + 1.75))
      } yield PairScore(venue, modscore, vPair.numCheckins)
    val kNN = result.sortBy(_.modscore).reverse.take(K)
    kNN.map(n => n.modScore * math.log(n.numCheckins + 1)).sum /

    WTF?
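
The snippet on this slide is incomplete as extracted: part of the `for` generator is missing and the final division has no divisor. The following is a runnable sketch of the same idea under stated assumptions, not a recovery of the original code. `VenuePair`, the `pairs` lookup, the `r > 0` filter (a guess at the missing guard), and `K = 2` are hypothetical; only the damping constants and the log weighting come from the slide. Because the divisor is unknown, the sketch returns the unnormalized sum.

```scala
object KnnSketch {
  // Hypothetical similarity record for a (history venue, candidate) pair.
  final case class VenuePair(r: Double, distance: Double, numCheckins: Int)
  final case class PairScore(venue: String, modScore: Double, numCheckins: Int)

  val K = 2 // assumed neighbourhood size

  // Slide formula: correlation damped by a logistic function of distance
  // (meters), so very close venues contribute less than distant ones.
  def dampedScore(r: Double, distance: Double): Double =
    r / (1 + math.exp(distance / -1000.0 + 1.75))

  // Score one candidate venue against a user's check-in history.
  def score(venue: String,
            userHistory: Seq[String],
            pairs: Map[(String, String), VenuePair]): Double = {
    val result =
      for {
        historyVenue <- userHistory
        vPair <- pairs.get((historyVenue, venue)).toSeq
        if vPair.r > 0 // assumed guard
      } yield PairScore(venue, dampedScore(vPair.r, vPair.distance), vPair.numCheckins)

    // Keep the K strongest neighbours, weight each by log(check-ins + 1).
    val kNN = result.sortBy(-_.modScore).take(K)
    kNN.map(n => n.modScore * math.log(n.numCheckins + 1)).sum
  }
}
```

With these constants the damping factor is 0.5 at 1750 m, about 0.15 at 0 m, and approaches 1 for distant pairs.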

    15/18

    Similarity Data: Correlation?

    [Figure: venue-by-venue similarity matrix (symmetric?). Annotations: "Too close, dense (lazy) neighborhood"; "Sweet spot, sparse"; "No data"; "Best source of data"; "Interesting!"; "Lazy?"]

    16/18

    Relatively close (300m), high correlation (.26)

    Relatively far (7.6km), low correlation (.04)

    What's the Difference?
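
As a rough worked check, suppose the distance damping from the kNN slide, r / (1 + exp(-d/1000 + 1.75)) with d in meters, is applied to the two pairs on this slide. The slides do not say that it was; this only illustrates what that formula would do to these numbers.

```latex
\begin{align*}
\text{close pair: } & \frac{0.26}{1 + e^{-300/1000 + 1.75}}
  = \frac{0.26}{1 + e^{1.45}} \approx \frac{0.26}{5.263} \approx 0.049 \\
\text{far pair: } & \frac{0.04}{1 + e^{-7600/1000 + 1.75}}
  = \frac{0.04}{1 + e^{-5.85}} \approx \frac{0.04}{1.003} \approx 0.040
\end{align*}
```

Under that assumption the 300 m pair's score shrinks by roughly a factor of five while the 7.6 km pair's is almost unchanged, so the two end up close to each other.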

    17/18

    We're Hiring!

    Looking for developers in the field of ML and/or Statistics, also hiring across the board

    We use cool tech: Scala, Lift, MongoDB

    Small company (35 people), flexible work environment, lots of big projects to work on

    18/18

    Seriously, Come Work for Us

    Lots of ex-finance employees (almost half our engineers!), lots of ex-*soft employees (more than half our engineers!); note the "ex-"

    Fast-growing company, lots of innovation

    Many very smart people with common goals