How Could We All Get Along on the Web 2.0?

The Power of Structured Data on the Web

Sihem Amer Yahia

Yahoo! Research

Outline

• Web search and web 2.0 search

• Why should we all get along?

• How could we all get along?

• Related work

• Conclusion

Web search

• Access to “heterogeneous”, distributed information

– Heterogeneous in creation

– Heterogeneous in motives

• Keyword search very effective in connecting people to information

[Diagram labels: Search, web pages]

Web search vs web 2.0 search?

[Diagram labels: Content creators, Content aggregators, Feeds, Content consumers (Anonymous, Subscribers), Web search, Web 2.0 search]

Web 2.0: a generation of internet-based services that

– let people form online communities

– in order to collaborate

– and share information

in previously unavailable ways

Online communities

• Subscribers join communities where they

– exchange content: emails, comments, tags

– rate content from other subscribers

– exhibit common behavior

• About 500M unique Y! visitors per month, about 200M subscribers (login visitors) to more than 130 Y! services

Web 2.0 search

Web 2.0

Connecting people to people

Flickr, Y! Answers, YouTube, Y! Groups

Web 2.0 search examples

• Mary is a professional photographer and is looking for aerial photos of the Hoggar desert

• She is also an amateur jazz dancer and wants to ask about dance schools with flexible schedules in SF

• She is also looking for the latest video on bird migration in Central Park, NY

• She has heart problems but loves biking and is interested in finding email discussions on biking trails in northern California

Outline

• Web search and web 2.0 search

• Why should we all get along?

• How could we all get along?

• Related work

• Conclusion

Improving users’ experience

• Keyword search should be maintained: simple and intuitive

• Keyword queries usually short

– only express a small fraction of the user's true intent

• Users' interactions within community-based systems can be used to infer a lot more about intent and return better answers

Why should we all get along?

• Contributed content is structured

– This is what DB community knows how to do best

• Relevance to query keywords is key

– This is what IR community knows how to do best

Searching online communities

Data table

id   author  date
001  s2      1/1/06
002  s4      1/8/06
003  s4      3/9/06

Community relationship table

sub  sub  trust
s1   s3   c13
s1   s4   c14
s2   s3   c23
s4   s6   c46

Tags, ratings, reviews table

id   sub  annotation
001  s2   1/1/06
002  s4   1/8/06
003  s4   3/9/06

Searching online communities

• Search for most relevant data on some topic

– Querying data: selection over data table

– Querying annotations: selection over annotation table + join w/data table

– Personalizing answers: join w/subscribers table

• Relevance: use data relevance + annotation table
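The three query shapes above can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions, not a description of any Yahoo! system: the column names `content`, `tag`, `sub1` and `sub2`, the numeric trust values, and all row contents are hypothetical, since the slides show only ids, subscribers, dates and symbolic trust values such as c13.

```python
import sqlite3

# Hypothetical schema loosely following the three tables on the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (id TEXT, author TEXT, date TEXT, content TEXT);
    CREATE TABLE annotation (id TEXT, sub TEXT, tag TEXT);
    CREATE TABLE community (sub1 TEXT, sub2 TEXT, trust REAL);
""")
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", [
    ("001", "s2", "1/1/06", "biking trails near the coast"),  # invented content
    ("002", "s4", "1/8/06", "jazz dance schools"),
    ("003", "s4", "3/9/06", "steep biking trails"),
])
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)", [
    ("001", "s3", "biking"),
    ("002", "s1", "dance"),
    ("003", "s3", "biking"),
])
conn.executemany("INSERT INTO community VALUES (?, ?, ?)", [
    ("s1", "s3", 0.2), ("s1", "s4", 0.9), ("s2", "s3", 0.5), ("s4", "s6", 0.7),
])

# Querying data: selection over the data table.
by_content = conn.execute(
    "SELECT id FROM data WHERE content LIKE ? ORDER BY id", ("%biking%",)
).fetchall()

# Querying annotations: selection over the annotation table, joined with data.
by_tag = conn.execute(
    """SELECT DISTINCT d.id FROM annotation a
       JOIN data d ON d.id = a.id
       WHERE a.tag = ? ORDER BY d.id""",
    ("biking",),
).fetchall()

# Personalizing answers: join with the relationship table so that only items
# authored by subscribers related to s1 are kept, strongest relationship first.
personalized = conn.execute(
    """SELECT d.id, c.trust FROM data d
       JOIN community c ON c.sub2 = d.author
       WHERE c.sub1 = ? AND d.content LIKE ?
       ORDER BY c.trust DESC, d.id""",
    ("s1", "%biking%"),
).fetchall()
```

With this toy data, the unpersonalized queries return items 001 and 003, while the personalized query for s1 keeps only item 003, whose author s4 is related to s1.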

Why should we all get along?

• Query interpretation depends on subscriber’s interest at the time of querying

• Data annotations are dynamic

– Precompute all (sub, sub, trust) for each topic?

• Need for dynamic query generation

DB and IR

• Shared interactions help focus search

– User-input, community-input, extraction

– Personalizing answers with community information

• Ranking as a combination of

– Relevance

– Relationship strengths between people in the same community
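One simple way to read "ranking as a combination" is a weighted sum of the two signals. The slides do not say how the combination is computed, so the linear form, the weight `alpha`, and the function names below are assumptions made purely for illustration.

```python
def combined_score(relevance: float, strength: float, alpha: float = 0.5) -> float:
    """Assumed linear mix of content relevance and relationship strength."""
    return alpha * relevance + (1.0 - alpha) * strength


def rank(candidates, strengths, alpha=0.5):
    """Rank (item_id, author, relevance) triples for one querying subscriber.

    strengths maps an author to the strength of that author's relationship
    with the querying subscriber; unknown authors count as 0.0.
    """
    scored = [
        (item_id, combined_score(rel, strengths.get(author, 0.0), alpha))
        for item_id, author, rel in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical relevance scores and relationship strengths.
candidates = [("001", "s2", 0.9), ("002", "s4", 0.6), ("003", "s3", 0.5)]
strengths = {"s4": 1.0, "s3": 0.2}

personalized_order = [item for item, _ in rank(candidates, strengths)]
relevance_only_order = [item for item, _ in rank(candidates, strengths, alpha=1.0)]
```

Setting `alpha=1.0` falls back to pure relevance ranking, which makes the effect of the relationship term easy to compare on the same candidates.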

Outline

• Web search and web 2.0 search

• Why should we all get along?

• How could we all get along?

– Applications

– Technical challenges

• Related work

• Conclusion

Applications

• Flickr enables sharing and tagging photos

• Y! Answers enables asking and answering questions in natural language

• YouTube enables sharing videos, rating videos, commenting on videos and subscribing to new videos from favorite users

• Y! Groups enables creating groups, joining existing groups, posting in a group

Flickr

• Acquired by Y! in 2005

• Tag search

• Photos grouped into categories.

• Set privacy levels on each photo

The new inputs to Flickr search

Users tag and rate photos

Users tagging same photos with similar tags form a community of interest

• Combine tag-based search with community knowledge

• Combine photo rating with relationship strength

[Diagram labels: Communities, Query, Subscriber, Search]
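As a rough illustration of how tagging behavior could yield a community of interest, the sketch below derives a relationship strength for each pair of subscribers from the overlap of their (photo, tag) pairs. The Jaccard measure, the function names, and the sample taggings are assumptions. The slides do not specify how communities are computed, and "similar tags" is simplified here to identical tags.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relationship_strengths(taggings):
    """Map each subscriber pair (si, sj) to an overlap-based strength cij > 0.

    taggings is an iterable of (subscriber, photo_id, tag) triples.
    """
    by_sub = {}
    for sub, photo, tag in taggings:
        by_sub.setdefault(sub, set()).add((photo, tag))
    strengths = {}
    for si, sj in combinations(sorted(by_sub), 2):
        cij = jaccard(by_sub[si], by_sub[sj])
        if cij > 0:
            strengths[(si, sj)] = cij
    return strengths


# Hypothetical taggings.
taggings = [
    ("s1", "p1", "desert"), ("s1", "p1", "aerial"),
    ("s2", "p1", "desert"), ("s2", "p2", "birds"),
    ("s3", "p2", "birds"),
]
strengths = relationship_strengths(taggings)
```

The resulting (si, sj, cij) triples have the same shape as the rows of the community relationship table, so they could feed a personalized ranking directly.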

Y! Answers

• Launched in second half of 2005

• Incentive system based on points and voting for best answers

• Questions grouped by category

• Some statistics:

– over 60 million users

– over 120 million answers, available in 18 countries and in 6 languages

The new inputs to Y!Answers search

Users provide questions and answers

Voting information reflects communities of interest

Combine community information with answer rating

[Diagram labels: Communities, Query, Subscriber, Search]

YouTube

• Founded in February 2005

• Tag search

• Videos grouped by category

• Some statistics:

– 100 million views/day

– 65,000 new videos/day

The new inputs to YouTube search

Users provide videos, tags, ratings, comments

Similar tags on same videos imply communities of interest

Combine community information with video rating

[Diagram labels: Communities, Query, Subscriber, Search]

Yahoo! Groups

• Yahoo! acquired eGroups in 2000

• Group moderators

• Groups belong to categories

• Public and private groups

• Some statistics:

– over 7M groups

– over 190M subscribers

– over 100K new subscribers/day

– over 12M emails/day

Alternative query interpretations

• Return all group postings relevant to a query.

• Return only postings by subscribers sharing the same interests: women with heart disease interested in steep slopes

The new inputs to Group search

Users participate in many groups

Group membership and postings imply communities of interest

Combine community information with postings relevance

[Diagram labels: Communities, Query, Subscriber, Search]

Outline

• Web search and web 2.0 search

• Why should we all get along?

• How could we all get along?

– Applications

– Technical challenges

• Related work

• Conclusion

So, how can we all get along?

• Augment keyword query with conditions on structure to focus and personalize search (DB)

– Flickr: tags

– Answers: points

– YouTube: reviews and ratings

– Groups: emails

• Combine it with relevance (IR)

Search architecture

[Diagram labels: Subscriber; search terms; Query tightening (find relevant community of interest); structured query; Query evaluation; Ranking (content relevance + relationship)]

Example

“biking trails northern california”

Query tightening

message contains “…” and from = “s1” or “s2”

Message structure: From, To, Date, Subject, Content

[Diagram labels: S1, S2, S3, S4, S5, S6, S7]

( si, sj, cij )

Many such relationships depending on subscriber’s interests
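The example above can be sketched as a two-step process: turn the keywords plus the subscriber's relationships into a structured query, then evaluate it over messages. Everything named here (`Message`, `tighten`, `evaluate`, the `min_strength` threshold, the symmetric treatment of relationships, and the sample data) is a hypothetical illustration rather than the system described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str   # corresponds to the From field
    content: str  # corresponds to the Content field


def tighten(keywords, subscriber, relationships, min_strength=0.5):
    """Build a structured query: the keywords plus a set of allowed senders.

    relationships is a list of (si, sj, cij) triples, treated as symmetric
    (an assumption). Senders are the subscribers related to `subscriber`
    with strength of at least min_strength.
    """
    senders = set()
    for si, sj, cij in relationships:
        if cij >= min_strength:
            if si == subscriber:
                senders.add(sj)
            elif sj == subscriber:
                senders.add(si)
    return list(keywords), senders


def evaluate(structured_query, messages):
    """Keep messages from allowed senders whose content contains every keyword."""
    keywords, senders = structured_query
    return [
        m for m in messages
        if m.sender in senders and all(k in m.content.lower() for k in keywords)
    ]


# Hypothetical relationships and messages.
relationships = [("s3", "s1", 0.8), ("s3", "s2", 0.6), ("s3", "s4", 0.1)]
messages = [
    Message("s1", "Biking trails in northern California"),
    Message("s4", "biking trails northern california tips"),
    Message("s2", "jazz dance schools"),
]
query = tighten(["biking", "trails", "northern", "california"], "s3", relationships)
results = evaluate(query, messages)
```

Because the allowed senders are computed at query time from the relationships, a different subscriber issuing the same keywords would get a different structured query.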

Can we really all get along?

• IR may think that user weights are enough to target communities of interest and personalize queries

• DB thinks expressiveness of query languages cannot all be captured by ranking functions

Query rewriting

[Diagram labels: Content-only; Content in context; Loose interpretation of context]

Query relaxation

• Primitive operations for dropping query predicates

• Answers to relaxed query contain answers to exact one

• Score of a relaxed answer is no higher than the score of an exact one
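The relaxation properties listed above can be illustrated with a toy model in which a query is a dictionary of named predicates and the only primitive is dropping one of them. The scoring rule (fraction of the original predicates satisfied) and all names and data are assumptions chosen to make the containment and score properties easy to see.

```python
def answers(items, predicates):
    """Items that satisfy every predicate of the query."""
    return [x for x in items if all(p(x) for p in predicates.values())]


def relax(predicates, name):
    """Primitive relaxation: drop one named predicate from the query."""
    return {k: p for k, p in predicates.items() if k != name}


def score(item, original):
    """Assumed score: fraction of the original predicates the item satisfies."""
    return sum(p(item) for p in original.values()) / len(original)


# Hypothetical messages and query.
messages = [
    {"id": 1, "sender": "s1", "text": "biking trails"},
    {"id": 2, "sender": "s5", "text": "biking trails"},
    {"id": 3, "sender": "s1", "text": "dance schools"},
]
exact_query = {
    "about_biking": lambda m: "biking" in m["text"],
    "from_s1": lambda m: m["sender"] == "s1",
}
relaxed_query = relax(exact_query, "from_s1")

exact = answers(messages, exact_query)
relaxed = answers(messages, relaxed_query)
```

Every exact answer is also a relaxed answer, and an item admitted only by the relaxed query scores lower than an exact answer under this scoring rule.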

Query tightening

• Primitive operations for adding query predicates

• Tighter answers are found but looser answers should be maintained

• Scores of tighter answers are no lower than scores of other answers
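Tightening can be sketched as the mirror image of relaxation: a primitive adds a predicate, answers to the looser query are kept, and answers that also satisfy the added predicate are ranked first. The function names, the scoring rule (number of tightened predicates satisfied), and the sample postings are assumptions for illustration only.

```python
def add_predicate(predicates, name, predicate):
    """Primitive tightening: return a new query with one more named predicate."""
    tightened = dict(predicates)
    tightened[name] = predicate
    return tightened


def ranked_answers(items, base, tightened):
    """Keep every answer to the base query, ranking tighter answers first.

    The assumed score is the number of predicates of the tightened query an
    item satisfies, so tighter answers never score below looser ones.
    """
    kept = [x for x in items if all(p(x) for p in base.values())]
    scored = [(x, sum(p(x) for p in tightened.values())) for x in kept]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical postings and queries.
postings = [
    {"id": 1, "sender": "s5", "text": "biking trails"},
    {"id": 2, "sender": "s1", "text": "biking trails"},
    {"id": 3, "sender": "s1", "text": "dance schools"},
]
base_query = {"about_biking": lambda m: "biking" in m["text"]}
tight_query = add_predicate(
    base_query, "from_community", lambda m: m["sender"] in {"s1", "s2"}
)
ranking = ranked_answers(postings, base_query, tight_query)
```

Posting 1 remains in the result even though it fails the added predicate, which matches the requirement that looser answers be maintained.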

More technical challenges

• Query tightening primitives to focus search

• Subscriber has a different profile/community of interest

• Top-k processing needs to enforce user profiles

Outline

• Web search and web 2.0 search

• Why should we all get along?

• How could we all get along?

– Applications

– Technical challenges

• Related work

• Conclusion

Related Work

• Language models: Ask Bruce Croft

• Web search personalization

– Search behavior

– HARD track at TREC

• Building relationship graphs:

– Collaborative filtering

– Clustering

– Unsupervised learning

Tempting conclusion

• A little information gathered on users could greatly improve new-generation search

• IR and DB views both needed

More technical challenges

• Subscriber belongs to different communities of interest

• Should subscriber turn off personalization?

• How is efficiency affected? (revisiting top-k processing)

• Back from community search to web search?

Beyond search in online communities

• Are online communities a way to build more accurate user profiles, or something more?

– Display relevant groups when a user asks a question on Y! Answers: mashups?

Danger of online communities

Are we discouraging diversity?

Thank you.

http://research.yahoo.com/~sihem