Personalizing Java based Answers for Hundreds of Millions of Users

Anurag Gupta
Senior Architect, Yahoo Answers & Groups
anuragg@yahoo-inc.com

Page 2

Agenda

• Industry Gaps
• Vision
• Strategy
• Use Cases
• Architecture
• Next Steps

Page 3

2010: Resurgence of Q&A

[Figure: month-by-month timeline (Jan to Dec) titled "2010: A year of highlights…". Legend: Launch, Acquisition, Investment, Mobile play.]

2011: The story continues… Quora, location-based Q&A apps (Crowd Beacon, Hipster), Facebook Questions and Mahalo pivoting, Answers.com acquisition…

Yahoo! Answers is still #1 (twice the size of its nearest competitor)

Page 4

Why this activity?

Companies are entering the market to address deficiencies of social media.

• Meeting unmet needs:
  – Improving signal-to-noise ratio
  – Beyond real time: creating User Generated Content of lasting, evergreen value
  – Organising people's knowledge and opinion for mass consumption
  – Allowing people to connect and share based on common interests, locations, etc.
  – Providing platforms for people to become regarded as experts
• Identifying untapped monetisation opportunities:
  – Mining intent, interest, and information from participating users

Page 5

Industry Gaps

Gap areas: Personal Relevance, User Reputation, Content Quality

• No understanding or filtering of content by interest
• Lack of understanding of quality contributors / content (poor signals)
• Spam management
• No filtering of content by social circle or user reputation
• Persona vs. real identity
• No distinction between knowledge and conversational Q&A
• Almost no ability to post location-specific questions or filter content by location
• No topic-specific reputation (PeopleRank)
• No "memory": hard to surface previous questions around a topic
• Limited action, reaction, interaction loops: opportunity to improve engagement through notifications/follows
• No community tools for users to engage outside of Q&A

Page 6

Yahoo Answers is the place to share opinions, experience & knowledge around personal interests

Page 7

Y! Answers: Leading Site with over 2X next competitor

Unique Users (Comscore)

Site                          Jun-11    M/M    Y/Y
Reference                     745 M     -2%    11%
Wikimedia Foundation Sites    399 M     -3%     5%
Yahoo! Answers                245 M     -2%    17%
Baidu Answers                 109 M      4%    10%
eHow                           82 M     -8%    13%
Answers.com Sites              72 M    -19%     5%

% Reach (Comscore)

Site                          Jun-11    M/M    Y/Y
Wikimedia Foundation Sites    54%       -1%    -5%
Yahoo! Answers                33%        0%     5%
Baidu Answers                 15%        6%    -1%
eHow                          11%       -6%     1%
Answers.com Sites             10%      -17%    -6%

Page 8

Strengthen core and reach out

Personalization, User Interest Graph, User Reputation

Distribution

Ecosystem

Monetization

Page 9

Personalization & Relevance

[Figure: personalization data-flow diagram. Component labels: Insights, Users, Content, Ads, Yahoo, Publisher Partners, Partner Data, APIs, Connected Devices. Flow labels: user clicks; social graph; User Generated Content, tagging; ranked content, video, ads.]

Page 10

Personalization & Relevance

[Figure: content and ad relevance diagram. Labels recovered from the slide:]

• Content sources: Finance, Sports, News, 3rd party publisher, Ads, Feeds
• Content & Ad Server: in-memory user-content-relevance_score
• Users: user clicks, search terms; interactions: UGC, tags, Q&A
• Ranked content & ad
• Collaborative Filtering, social, geo, time
• User Segments
• Advertisers
• Publishers
• Social Graph "like"
• User Interest Graph
• Tag, Content-Tags
• Search
• Ad & Content
• Gaps drive acquisition of new relevant long-tail content
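The "in-memory user-content-relevance_score" box on this slide suggests a lookup keyed by user and content. The following is only a minimal sketch of that idea, assuming a plain nested dictionary; the class name `RelevanceStore` and its methods `put` and `top_n` are hypothetical and do not come from the slides, which do not describe the actual data structure.

```python
import heapq
from collections import defaultdict


class RelevanceStore:
    """Hypothetical in-memory map: user_id -> {content_id: relevance_score}."""

    def __init__(self):
        self._scores = defaultdict(dict)

    def put(self, user_id, content_id, score):
        # Overwrite any earlier score for the same (user, content) pair.
        self._scores[user_id][content_id] = score

    def top_n(self, user_id, n):
        # Return the n highest-scoring content ids for this user, best first.
        # Ties are broken by content id so the order is deterministic.
        items = self._scores.get(user_id, {}).items()
        best = heapq.nsmallest(n, items, key=lambda kv: (-kv[1], kv[0]))
        return [content_id for content_id, _ in best]


# Illustrative data only; the ids and scores are made up.
store = RelevanceStore()
store.put("u1", "q-finance", 0.9)
store.put("u1", "q-sports", 0.4)
store.put("u1", "q-news", 0.7)
ranked = store.top_n("u1", 2)
```

A server built this way could rank content for a user without touching a database on the request path, at the cost of holding the scores in memory.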

Page 11

Yahoo Answers Personalization Use Cases

• Learn about new users' interests (cold-start)
• Show relevant questions to a user who arrives via a search engine
• Show relevant questions to an Answerer on Y! Answers or a 3rd party site
• Use knowledge of user interests to increase user engagement, page views, reach, monetization

Page 12

Answers: Relevance & Content Quality

[Figure: signals that combine into a high quality, high relevance Q&A page. Legend: Green = Y! wide, Yellow = Answers specific.]

Signals shown:
• # Best Answers attributed to Answerer
• Useful Vote
• PeopleRank of Viewer who voted "useful"
• Answerers with High PeopleRank
• Viewer's interest
• Question Popularity
• Quality of Answers
• Like Vote
• User Interest Graph
• Answerability

Goals:
• Increase signal to noise ratio
• Reward content creators with relevant audience
• Help audience discover relevant high quality content
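One way to read the signals on this slide is that a "useful" vote counts for more when the voter has a high PeopleRank, and that viewer interest, question popularity, and answer quality are then blended into one page score. The sketch below illustrates that reading only. The function names, the default rank, and the weights are all assumptions made for illustration; the slides do not give any formula.

```python
def answer_quality(useful_voters, people_rank, default_rank=0.1):
    """Sum the PeopleRank of every viewer who voted the answer 'useful'.

    Voters without a known PeopleRank get default_rank (an assumed value).
    """
    return sum(people_rank.get(voter, default_rank) for voter in useful_voters)


def page_score(viewer_interest, question_popularity, best_answer_quality,
               weights=(0.5, 0.2, 0.3)):
    """Blend three signals into one score using assumed, illustrative weights."""
    w_interest, w_popularity, w_quality = weights
    return (w_interest * viewer_interest
            + w_popularity * question_popularity
            + w_quality * best_answer_quality)


# Made-up PeopleRank values for two known users.
people_rank = {"alice": 0.8, "bob": 0.2}

# Same number of votes, but answer A's voters carry more reputation.
quality_a = answer_quality(["alice", "bob"], people_rank)
quality_b = answer_quality(["bob", "carol"], people_rank)

score_a = page_score(viewer_interest=1.0, question_popularity=0.5,
                     best_answer_quality=quality_a)
score_b = page_score(viewer_interest=1.0, question_popularity=0.5,
                     best_answer_quality=quality_b)
```

With these inputs, the page whose answer was endorsed by higher-reputation viewers scores higher even though both answers received two votes.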

Page 13

Architecture for Online & Offline Computation

[Figure: architecture diagram. Labels recovered from the slide:]

• Answers serving: Front-End, Middle-tier, Cache, Oracle, NoSQL Long Tail
• User Profile Services: Tags, User interest
• Content: search terms, UGC
• 3rd party feeds, Feed Acquisition
• Notification
• Fast path
• New Offline on Hadoop Grid: PeopleRank, Question Popularity, Answerability, Quality of Answers, Collaborative Filtering, Thumbs-up, Tags, Relevance computation
• Output: userId, contentId, relevance_score
• New Online serving

Page 14

Offline Relevance Computation

[Figure: numbered data flow among Answers Data on Grid, User Interest Graph, PeopleRank, and Relevance Computation.]

Flow labels:
1. userID
2. viewer interests
3. viewer interests
3b. viewer interests
4. top answerers
4b. popular Qs
5. top answerers
6. Qs answered
7. userID-Q-relevance_score
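The numbered labels can be read as a pipeline: look up a user's interests, find top answerers and popular questions for those interests, then emit one relevance score per user and question. The sketch below follows that reading with in-memory dictionaries standing in for data on the grid. The function name, the lookup tables, and the two weights are hypothetical, and the mapping of steps to components is an assumption since the diagram's arrows did not survive extraction.

```python
def relevance_rows(user_id, interest_graph, top_answerers, questions_answered,
                   popular_questions, answerer_weight=1.0, popular_weight=0.5):
    """Return (user_id, question_id, relevance_score) rows, best first."""
    scores = {}
    # Steps 1-2: userID -> viewer interests (User Interest Graph).
    for topic in interest_graph.get(user_id, []):
        # Steps 3-4: viewer interests -> top answerers (PeopleRank).
        for answerer in top_answerers.get(topic, []):
            # Steps 5-6: top answerers -> questions they answered.
            for question in questions_answered.get(answerer, []):
                scores[question] = scores.get(question, 0.0) + answerer_weight
        # Steps 3b-4b: viewer interests -> popular questions.
        for question in popular_questions.get(topic, []):
            scores[question] = scores.get(question, 0.0) + popular_weight
    # Step 7: userID-Q-relevance_score rows, highest score first.
    rows = [(user_id, question, score) for question, score in scores.items()]
    return sorted(rows, key=lambda row: (-row[2], row[1]))


# Made-up sample data.
interest_graph = {"u1": ["java"]}
top_answerers = {"java": ["ann"]}
questions_answered = {"ann": ["q1", "q2"]}
popular_questions = {"java": ["q2", "q3"]}

rows = relevance_rows("u1", interest_graph, top_answerers,
                      questions_answered, popular_questions)
```

Here `q2` ranks first because it is both answered by a top answerer and popular for the user's interest.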

Page 15

Incremental Online Relevance Computation

[Figure: numbered data flow among Front End, Middle Tier, Answers Oracle Database, UPS, PeopleRank, and Relevance Computation.]

Flow labels:
1. click, search, UGC
2. (unlabeled)
3. userID, tags
4. viewer interests
5. viewer interests
5b. viewer interests
6. top answerers
6b. popular Qs
7. top answerers
8. Qs answered
9. relevant Qs
10. relevant Qs
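The online flow appears to update a user's interests as each click, search, or UGC event arrives and then recompute relevant questions right away. The sketch below illustrates that incremental pattern under stated assumptions: `UserProfileService` is a hypothetical stand-in for UPS, events are assumed to arrive already tagged, and the tag-to-question index is made up. None of these names or structures are taken from the slides.

```python
from collections import Counter, defaultdict


class UserProfileService:
    """Hypothetical stand-in for UPS: keeps per-user tag counts in memory."""

    def __init__(self):
        self._interests = defaultdict(Counter)

    def record_event(self, user_id, tags):
        # Steps 1-3: a click, search, or UGC event adds its tags to the profile.
        self._interests[user_id].update(tags)

    def interests(self, user_id):
        # Step 4: return the viewer's interests as {tag: weight}.
        return dict(self._interests.get(user_id, {}))


def relevant_questions(interests, questions_by_tag, limit=10):
    """Steps 5-10 collapsed: score questions by the weight of matching tags."""
    scores = Counter()
    for tag, weight in interests.items():
        for question in questions_by_tag.get(tag, []):
            scores[question] += weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [question for question, _ in ranked[:limit]]


# Made-up events and index.
ups = UserProfileService()
ups.record_event("u1", ["java", "hadoop"])
ups.record_event("u1", ["java"])

questions_by_tag = {"java": ["q1", "q2"], "hadoop": ["q2", "q3"]}
result = relevant_questions(ups.interests("u1"), questions_by_tag)
```

Because only the affected user's counters change per event, each request recomputes a small ranking instead of rerunning the full offline job.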

Page 16

Next Steps

• Move Oracle batch processing to Hadoop grid
• Get Answers data on Hadoop grid
• Annotation of source property for user interest
• Detect useful vs. interesting feedback
• User Interest Graph
• PeopleRank
• Tag computation
• Bucketing infrastructure
• Notification services