February 11, 2008 WSDM 1

Preferential Behavior in Online Groups

Lars Backstrom, Ravi Kumar, Cameron Marlow, Jasmine Novak, Andrew Tomkins

Power Users

Executive Summary of Preferential Treatment

Long term power-users are:

1) 20 times more likely to receive a response upon joining

2) Twice as likely to receive a response upon becoming heavily engaged

3) 9 times more likely to have early responses come from other power-users

Outline

• Introduction to online groups

• Experimental set-up

• Statistics of group “cores”

• Statistics of heavily engaged users

• Preferential treatment of engaged users

• Model: Predicting deep engagement

Online Groups

• A majority of internet users participate in some form of “online group” related to hobbies, beliefs or offline relationships (Pew 2001)

• Groups vary along a number of dimensions:
– Scale
– Online vs. offline relationships
– Broadcast, Q/A, and interaction
– etc.

• Examples
– Ithaca Rotary Club mailing list
– Palo Alto Parenting group
– Aerosmith fan club on MySpace

Yahoo Groups

• 100 million users, 6 million groups

• Can be created by any user. This user becomes the moderator of the group
– controls privacy settings, access, memberships, etc.

• Content includes information pages, multimedia content, and message boards.

• The majority of content resides in message boards.
– Members may post a message on a new topic, or respond to a message posted earlier
– Users may read content online, or receive it by email
– ~6 million groups, ~6 billion messages

• We used data from one year: May 2005 - May 2006

Privacy and Size

• Analysis performed on several categories of groups

• Size
– Small: fewer than 20 unique posters
– Medium: 20-99 unique posters
– Large: greater than 100 posters

• Privacy
– Public: open and listed or open and unlisted
– Semi-public: restricted and listed
– Private: closed and listed, closed and unlisted or restricted and unlisted
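
The size and privacy buckets above can be written down as a small lookup. This is only an illustrative sketch: the function names and the string labels ("open", "restricted", "closed", "listed", "unlisted") are assumptions, and treating a group with exactly 100 posters as large is also an assumption, since the thresholds as stated leave that case open.

```python
# Hypothetical helpers; names and string labels are illustrative assumptions.

def size_category(unique_posters: int) -> str:
    """Bucket a group by its number of unique posters."""
    if unique_posters < 20:
        return "small"
    if unique_posters < 100:
        return "medium"
    # Assumption: exactly 100 posters is counted as large.
    return "large"


# (access setting, directory listing) -> privacy category, as listed on the slide.
PRIVACY = {
    ("open", "listed"): "public",
    ("open", "unlisted"): "public",
    ("restricted", "listed"): "semi-public",
    ("closed", "listed"): "private",
    ("closed", "unlisted"): "private",
    ("restricted", "unlisted"): "private",
}


def privacy_category(access: str, listing: str) -> str:
    """Map a group's access and listing settings to a privacy category."""
    return PRIVACY[(access, listing)]
```

For example, `size_category(45)` falls in the medium bucket and `privacy_category("restricted", "listed")` is semi-public.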

Yahoo Groups

Engaged Users & Thriving Groups

• Various degrees of user engagement
– from lurkers to heavily engaged

• Our focus: users who are heavily engaged in the group, with a high level of posting activity

• What differentiates these engaged users? Are they treated differently? Do they behave differently?

• Look at “thriving” groups

Thriving Groups

Three requirements to be a “thriving” group:

1) Baseline users
– at least 10 users must post during the year

2) Baseline traffic
– at least two messages in every 30-day window

3) Dense period
– a two-month period during which every 7-day interval has at least 10 posts

New corpus: 44,473 groups, 1M users
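
The three requirements can be checked mechanically. The sketch below is one possible reading, not the authors' implementation: it assumes posts are given as (user, day offset) pairs over a 365-day year, that windows slide one day at a time, and that "two months" means 60 days. The function name and input layout are hypothetical.

```python
from itertools import accumulate


def is_thriving(posts, days_in_year=365):
    """posts: iterable of (user_id, day_offset), 0 <= day_offset < days_in_year."""
    posts = list(posts)

    # 1) Baseline users: at least 10 distinct posters during the year.
    if len({user for user, _ in posts}) < 10:
        return False

    # Posts per day and prefix sums, so any window count is O(1).
    per_day = [0] * days_in_year
    for _, day in posts:
        per_day[day] += 1
    prefix = [0] + list(accumulate(per_day))

    def window_counts(width):
        # Post counts for every window [s, s + width) that fits in the year.
        return [prefix[s + width] - prefix[s]
                for s in range(days_in_year - width + 1)]

    # 2) Baseline traffic: every 30-day window has at least two messages.
    if min(window_counts(30)) < 2:
        return False

    # 3) Dense period: some 60-day span (assumed length of "two months")
    #    in which every 7-day window has at least 10 posts.
    needed = 60 - 7 + 1  # number of consecutive 7-day windows inside 60 days
    run = 0
    for count in window_counts(7):
        run = run + 1 if count >= 10 else 0
        if run >= needed:
            return True
    return False
```

A group with ten posters and two posts every day passes all three checks, while the same group with one post per day fails the dense-period test, because no 7-day window reaches 10 posts.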

k-Cores

• We define the k-core of a group at time t as follows:

• For a two-week window around t, a user is in the k-core if:
– the user has replied to k distinct users in the group
– the user has been replied to by k distinct users

[Figure: example highlighting a 3-core user]
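
The membership test above can be sketched directly from the two conditions. Note that this follows the slide's degree-based definition, which is not the iteratively pruned k-core of graph theory. The input layout, the function name, the choice of a window of 7 days on either side of t, and ignoring self-replies are all assumptions made for illustration.

```python
from collections import defaultdict


def k_core_members(replies, t, k, half_window=7):
    """replies: iterable of (replier, replied_to, day).

    Returns the users who, within half_window days of t, have replied to at
    least k distinct users and been replied to by at least k distinct users.
    """
    replied_to = defaultdict(set)   # user -> users they replied to
    replied_by = defaultdict(set)   # user -> users who replied to them
    for src, dst, day in replies:
        # Assumption: the two-week window is t +/- 7 days; self-replies are ignored.
        if abs(day - t) <= half_window and src != dst:
            replied_to[src].add(dst)
            replied_by[dst].add(src)
    return {
        user for user in replied_to
        if len(replied_to[user]) >= k and len(replied_by.get(user, ())) >= k
    }
```

With three users who all reply to one another and a fourth who only replies to one of them, the first three form the 2-core and the fourth stays outside it.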

Core Size

48% of group/time pairs have a 2-core of at least 6 people

Fraction of Posters in Core

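[Figure: fraction of posters in the core (axis from 0 to 1) for Small, Medium and Large groups, broken down by Private, Semi-public and Public groups. Values are not recoverable from the extracted text.]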

Time Spent in Core

Small private groups: 20% in core less than 2 weeks

Large public groups: 94% in core less than 2 weeks

Small private groups: 48% in core for 200+ days

Half-life of Cores

Core Populations

• Light: briefly enters the conversation but does not enter the core

• Short Core: enters the core for less than 50 days

• Long Core: enters the core for 50 days or more

Light 774k

Short core 134k

Long core 90k
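
As a rough illustration of the three populations, the sketch below classifies a user by days spent in the core and computes each population's share from the rounded counts above. The function names are hypothetical, and treating "does not enter the core" as zero days in the core is an assumption.

```python
def core_population(days_in_core: int) -> str:
    """Classify a user by total days spent in the core."""
    if days_in_core <= 0:       # assumption: never in the core
        return "light"
    if days_in_core < 50:
        return "short core"
    return "long core"


# Rounded counts from the slide.
COUNTS = {"light": 774_000, "short core": 134_000, "long core": 90_000}


def population_shares(counts):
    """Fraction of all users falling in each population."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}
```

Applied to the counts above, `population_shares(COUNTS)` puts roughly 78% of users in the light population, 13% in short core and 9% in long core.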

Long Core Users Across Groups

Reading the plot: the point (6, 0.55) means that a user who was long-core in the first 6 groups joined has a 55% probability of being long-core in the 7th.
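
A point on such a curve is a conditional frequency, which could be estimated along the following lines. This is a sketch under assumed inputs, not the authors' procedure: each user is represented by a hypothetical list of booleans, one per group in the order joined, with True meaning the user became long-core in that group.

```python
def long_core_continuation(histories, n):
    """Estimate P(long-core in group n+1 | long-core in the first n groups).

    histories: list of per-user lists of booleans, ordered by join time.
    Returns None when no user satisfies the condition.
    """
    # Users who joined more than n groups and were long-core in all of the first n.
    eligible = [h for h in histories if len(h) > n and all(h[:n])]
    if not eligible:
        return None
    return sum(h[n] for h in eligible) / len(eligible)
```

For instance, if two users were long-core in their first two groups and only one of them was also long-core in the third, the estimate for n = 2 is 0.5.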

Multiple Memberships

[Figure: plot with a probability axis. Values are not recoverable from the extracted text.]

Preferential Treatment of Engaged Users

• Are engaged, or “long-core”, users treated differently within a group?

• Yes! We detail three key forms of preferential treatment given to heavily engaged users.

Response to Newcomer

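[Figure: response probability (axis from 0 to 0.5) for Light, Short-core and Long-core newcomers, with one series per group type: Private Small, Semi-public Small, Public Small, Private Medium, Semi-public Medium, Public Medium, Private Large, Semi-public Large, Public Large. Values are not recoverable from the extracted text.]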

Response to Core Members

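[Figure: response probability (axis from 0 to 0.35) for Long-core and Short-core members, broken down by Public, Semi-public and Private groups. Values are not recoverable from the extracted text.]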

Response Probability by Newcomer Type

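[Figure: probability (axis from 0 to 1) by newcomer type (Light, Short-core, Long-core). Legend entries: Long-core, "Column 2" (unlabeled in the source), Light. Values are not recoverable from the extracted text.]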

Long-core Response Probability

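[Figure: long-core response probability (axis from 0.28 to 0.34) at the time of joining the core and when in the core for 50+ days, broken down by Public, Semi-public and Private groups. Values are not recoverable from the extracted text.]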

Summary of Preferential Treatment

Heavily engaged “long-core” users are:

1) 20 times more likely to receive a response upon joining

2) Twice as likely to receive a response upon becoming heavily engaged

3) 9 times more likely to have early responses come from other long-core users

Note: Probability of receiving a response increases until joining the core, then begins to decline.

First Post Types

100 first posts of long-core users:

Friends: the newcomer has some prior relationship with another group member

Introduction: the new poster introduces herself to the group

No decision: no information to determine a relationship

No decision 57%

Introduction 37%

Friends 6%

Pregnancy-and-pups: Coccidiosis

I am new to this board and I have enjoyed reading the posts. I am hoping you can help me learn more about coccidiosis.

I have a litter of puppies whose stool is good. They have been on Albon for 10 days. One of the puppies went home and to the vet today and has coccidiosis. Why is that? What else can I use to be sure the puppies are free of cocci?

I do appreciate all your input and all your time in helping me!

skatefans: Appropos of Barbara Cook

Hi! This is my first "Skatefans" post. (I've been reading -- don't like the word "lurking" -- Skatefans since about the summer of 2000, but haven't had the time to post before.)

While everyone, of course, is entitled to their own likes and dislikes, I'd just like to add some thoughts about "Fosse" from someone who's been a very big fan of that program. I've only been following figure skating since 1997 so my frame of reference is obviously limited but, in terms of the exhibition programs that I've seen duing this period, I think "Fosse" is one of the "landmark" exhibition programs that I've seen (although I can also see some of the problems with it that people have been pointing out).

Modeling Long-Core Engagement

Factors at work creating long-core engagement:

• User factor: a user’s personality causes her to become long-core in every group she joins

• Group factor: a group is so welcoming, or its topic so engaging, that users are likely to become long-core

Model

Users Groups

Chance of being long-core in a group: 0.7

Chance of random member being a long-core user: 0.3

Model

• For each (u,g) pair, predict whether pair is in set of memberships H that are long-core.

• Pr[(u,g) in H] = 1 - (1-p(u))(1-p(g))

• Task is to choose the best p(u) and p(g) to reproduce H, the set of long-core memberships

• Evaluate quality by the likelihood of predicting H

• Consider three variants:
– Use only properties of users, p(g) = 0
– Use only properties of groups, p(u) = 0
– Allow both p(u) and p(g) to be arbitrary
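
The prediction formula and the likelihood used to score a choice of p(u) and p(g) can be sketched as follows. This only evaluates given parameters; how the best parameters are found is not described on the slide and is not attempted here. The function names, the dictionary inputs, and the small clamp `eps` that keeps the logarithm finite are assumptions.

```python
import math


def p_long_core(p_u: float, p_g: float) -> float:
    """Pr[(u,g) in H] = 1 - (1 - p(u)) * (1 - p(g))."""
    return 1.0 - (1.0 - p_u) * (1.0 - p_g)


def log_likelihood(pairs, long_core, p_user, p_group, eps=1e-12):
    """Log-likelihood of the observed long-core set under the model.

    pairs:     iterable of all (user, group) memberships
    long_core: set of (user, group) pairs that are long-core (the set H)
    p_user:    dict user -> p(u); missing users default to 0
    p_group:   dict group -> p(g); missing groups default to 0
    """
    total = 0.0
    for u, g in pairs:
        p = p_long_core(p_user.get(u, 0.0), p_group.get(g, 0.0))
        p = min(max(p, eps), 1.0 - eps)  # assumption: clamp to avoid log(0)
        total += math.log(p) if (u, g) in long_core else math.log(1.0 - p)
    return total
```

As a worked example, if the illustrative values 0.7 and 0.3 are read as p(u) and p(g) (an assumption about how they map onto the formula), then `p_long_core(0.7, 0.3)` gives 1 - 0.3 × 0.7 = 0.79. The user-only and group-only variants correspond to passing an empty `p_group` or an empty `p_user`.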

Analytical Results

Model                     % of correct edges
User-only (p(g) = 0)      94.9
Group-only (p(u) = 0)     85.6
Combined                  95.1

Improvement Using Group Factor

Fin

• Social analysis of one of the world’s largest collections of online communities

• Proposed a partitioning of the data to select for active communities of engaged users

• Examined several levels of engagement: “light”, “short-core”, and “long-core”

• Identified several striking ways in which heavily engaged users are given preferential treatment from other members of the group

• Proposed a model to study factors contributing to long-term engagement and showed that both user and group factors play a role.

Fin++

Special thanks to the Groups team: Di-fa Chang, Lee Clancy, David Kopp, Bobby Lee, Maria Saltz and Gordon Strause.

Ravi Kumar, Jasmine Novak & Andrew Tomkins

{ravikuma,jnovak,atomkins}@yahoo-inc.com

Lars Backstrom lars@cs.cornell.edu

Cameron Marlow cameron@facebook.com
