Machine Learning to Grow the World's Knowledge Xavier Amatriain (@xamat) 11/10/2015

H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge - Xavier Amatriain

Our Mission

“To share and grow the world’s knowledge”

• Millions of questions & answers

• Millions of users

• Thousands of topics

• ...

What we care about

• Demand

• Quality

• Relevance

Data@Quora

Lots of data relations

Complex network propagation effects

Importance of topics & semantics

Machine Learning@Quora

Ranking - Answer ranking

What is a good Quora answer?

• truthful

• reusable

• provides explanation

• well formatted

• ...

Ranking - Answer ranking

How are those dimensions translated into features?

• Features that relate to the text quality itself

• Interaction features (upvotes/downvotes, clicks, comments…)

• User features (e.g. expertise in topic)

Ranking - Feed

• Goal: Present most interesting stories for a user at a given time

• Interesting = topical relevance + social relevance + timeliness

• Stories = questions + answers

• ML: Personalized learning-to-rank approach

• Relevance-ordered vs time-ordered = big gains in engagement

• Challenges:

• potentially many candidate stories

• real-time ranking

• optimize for relevance

Feed dataset: impression logs

[Figure: example feed impressions labelled with logged actions: click, upvote, downvote, expand, share, answer, pass, follow]

What is relevance?

● Value of showing a story to a user, e.g. weighted sum of actions:

$v = \sum_a v_a \, \mathbb{1}\{y_a = 1\}$

● Goal: predict this value for new stories. 2 possible approaches:

○ predict value directly

$v_{\text{pred}} = f(x)$

■ pros: single regression model

■ cons: can be ambiguous, coupled

○ predict probabilities for each action, then compute expected value:

$v_{\text{pred}} = E[V \mid x] = \sum_a v_a \, p(a \mid x)$

■ pros: better use of supervised signal, decouples action models from action values

■ cons: more costly, one classifier per action
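The second approach (one probability per action, combined into an expected value) can be sketched as follows. This is only an illustration under stated assumptions, not Quora's implementation: the action names, the values in `ACTION_VALUES`, and the example probabilities are invented for the sketch.

```python
# Hypothetical action values v_a; the numbers are illustrative assumptions.
ACTION_VALUES = {"click": 1.0, "upvote": 5.0, "share": 10.0, "downvote": -8.0}


def observed_value(actions_taken):
    """v = sum_a v_a * 1{y_a = 1}: value of one logged impression."""
    return sum(ACTION_VALUES[a] for a in set(actions_taken))


def expected_value(action_probs):
    """v_pred = E[V | x] = sum_a v_a * p(a | x).

    action_probs maps each action to the probability predicted by that
    action's classifier for the (user, story) pair x.
    """
    return sum(ACTION_VALUES[a] * p for a, p in action_probs.items())


# Probabilities as they might come from four per-action classifiers.
probs = {"click": 0.30, "upvote": 0.05, "share": 0.01, "downvote": 0.02}
score = expected_value(probs)
```

Because the values v_a live outside the classifiers, they can be re-weighted without retraining any action model, which is the decoupling listed among the pros.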

Feature engineering

● Essential for getting good rankings

● Better if updated in real-time (more reactive)

● Main sets of features:

○ user (e.g. age, country, recent activity)

○ story (e.g. popularity, trendiness, quality)

○ interactions between the two (e.g. topic or author affinity)

Models

● Linear

○ simple, fast to train

○ manual, non-linear transforms for richer representation (buckets, ngrams)

● Decision trees

○ learn non-linear representations

● Tree ensembles

○ Random forests

○ Gradient boosted decision trees

● In-house C++ training code, third-party libraries for prototyping new models
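As an illustration of the manual non-linear transforms mentioned for linear models, the sketch below buckets a numeric feature into one-hot indicators, so that a linear model can assign a separate weight to each range. The bucket boundaries in `EDGES` and the feature they stand for are assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries for a count feature such as "number of upvotes".
EDGES = [1, 10, 100, 1000]


def bucketize(value, edges=EDGES):
    """Return a one-hot list with len(edges) + 1 entries.

    Bucket i covers edges[i - 1] <= value < edges[i]; the first bucket
    holds everything below edges[0], the last everything from edges[-1] up.
    """
    one_hot = [0] * (len(edges) + 1)
    one_hot[bisect_right(edges, value)] = 1
    return one_hot
```

Tree-based models learn such thresholds themselves, which is one reason the bucketing step is listed only under linear models.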

Scalability: feed backend system

[Diagram labels: Requests from Web (python); Aggregator 1, Aggregator 2, Aggregator 3, ...; Leaf 1, Leaf 2, Leaf 3, ...; user_id; object_id]

Recommendations - Topics

Goal: Recommend new topics for the user to follow

• Based on

• Other topics followed

• Users followed

• User interactions

• Topic-related features

• ...

Recommendations - Users

Goal: Recommend new users to follow

• Based on:

• Other users followed

• Topics followed

• User interactions

• User-related features

• ...

Related Questions

• Given interest in question A (source), what other questions will be interesting?

• Not only about similarity, but also “interestingness”

• Features such as:

• Textual

• Co-visit

• Topics

• …

• Important for logged-out use case

Duplicate Questions

• Important issue for Quora

• Want to make sure we don’t disperse knowledge about the same question across duplicates

• Solution: binary classifier trained with labelled data

• Features

• Textual vector space models

• Usage-based features

• ...
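One textual vector-space feature that such a classifier could consume is the cosine similarity between bag-of-words vectors of the two questions. The sketch below is an assumption about what such a feature might look like, not the feature set actually used; the tokenizer and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def textual_feature(question_1, question_2):
    """Similarity in [0, 1] for a candidate pair of duplicate questions."""
    return cosine_similarity(bag_of_words(question_1), bag_of_words(question_2))
```

In a binary classifier this number would be one input among several, alongside the usage-based features, with the labelled pairs supplying the target.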

User Trust/Expertise Inference

Goal: Infer user’s trustworthiness in relation to a given topic

• We take into account:

• Answers written on topic

• Upvotes/downvotes received

• Endorsements

• ...

• Trust/expertise propagates through the network

• Must be taken into account by other algorithms
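The slide does not say how trust propagates. As one possible reading, the sketch below uses a simple PageRank-style iteration in which each endorser passes a damped share of their own trust to the users they endorse. The function name, the `damping` parameter, and the even split among endorsed users are all assumptions made for illustration.

```python
def propagate_trust(base, endorsements, damping=0.5, iterations=20):
    """Spread per-topic trust along endorsement edges.

    base: dict user -> trust earned directly (e.g. from answers and votes).
    endorsements: dict endorser -> list of users they endorse.
    Each endorser passes `damping` times their current trust, split evenly
    among the users they endorse.
    """
    trust = dict(base)
    for _ in range(iterations):
        updated = dict(base)
        for endorser, endorsed in endorsements.items():
            if not endorsed:
                continue
            share = damping * trust.get(endorser, 0.0) / len(endorsed)
            for user in endorsed:
                updated[user] = updated.get(user, 0.0) + share
        trust = updated
    return trust


# Hypothetical users: alice endorses bob, bob endorses carol.
scores = propagate_trust(
    {"alice": 1.0, "bob": 0.0, "carol": 0.0},
    {"alice": ["bob"], "bob": ["carol"]},
)
```

In this toy chain, trust reaches carol only through bob, so users can gain standing on a topic indirectly.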

Trending Topics

Goal: Highlight current events that are interesting for the user

• We take into account:

• Global “Trendiness”

• Social “Trendiness”

• User’s interest

• ...

• Trending topics are a great discovery mechanism

Spam Detection/Moderation

• Very important for Quora to keep quality of content

• Pure manual approaches do not scale

• Hard to get algorithms 100% right

• ML algorithms detect content/user issues

• Output of the algorithms feeds manually curated moderation queues

Content Creation Prediction

• Quora’s algorithms do not optimize only for probability of reading

• Important to predict probability of a user answering a question

• Parts of our system rely completely on that prediction

• E.g. A2A (ask to answer) suggestions

Models

● Logistic Regression

● Elastic Nets

● Gradient Boosted Decision Trees

● Random Forests

● (Deep) Neural Networks

● LambdaMART

● Matrix Factorization

● LDA

● ...

Experimentation

⚫ Extensive A/B testing, data-driven decision-making

⚫ Separate, orthogonal “layers” for different parts of the system

⚫ Experiment framework showing comparisons for various metrics
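One common way to keep experiment layers orthogonal is to hash the user id together with the layer name, so that a user's assignment in one layer is independent of their assignment in another. The sketch below assumes this hashing scheme; the slides do not describe the actual framework, and the layer and arm names are hypothetical.

```python
import hashlib


def assign_arm(user_id, layer, arms):
    """Deterministically map a user to one arm within a layer.

    Salting the hash with the layer name means the same user can land in
    different arms in different layers, independently of each other.
    """
    digest = hashlib.sha256(f"{layer}:{user_id}".encode("utf-8")).hexdigest()
    return arms[int(digest, 16) % len(arms)]


# Hypothetical layers and arms for two independent parts of the system.
feed_arm = assign_arm(42, "feed_ranking", ["control", "treatment"])
email_arm = assign_arm(42, "email_digest", ["control", "treatment"])
```

The assignment is stable across requests without any stored state, since it depends only on the user id and the layer name.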

Conclusions

• At Quora we have not only big, but also “rich” data

• Our algorithms need to understand and optimize complex aspects such as quality, interestingness, or user expertise

• We believe ML will be one of the keys to our success

• We have many interesting problems, and many unsolved challenges
