
Effects of Missing Data in Social Networks

Gueorgi Kossinets

Social Networks 2006

Presenter: Ke Zhang

Outline

→ Introduction
•  3 Sources of Missing Data in SNs
•  Dataset and statistics of interest
•  Results and Discussion
•  Conclusion

Introduction-I

•  Social network data is often incomplete
  – Nodes (actors)
  – Links (affiliations)
•  Missing data is introduced by
  – Boundary specification
  – Non-response in network surveys (questionnaires)
  – Study design (fixed-choice design)

Introduction-II

•  Missing data affects the topological properties of networks
  – Connection degree
  – Assortativity, a kind of mixing pattern (high-degree nodes vs. low-degree ones)
  – …
•  Investigate the effects of different sources of missing data

Outline

•  Introduction
→ 3 Sources of Missing Data in SNs
•  Dataset and statistics of interest
•  Results and Discussion

Models of Social Networks

•  Multicontextual model
  – Bipartite graph (two-mode)
  – Events or contexts
  – Actors (A, B, C, …)
  – From two-mode to one-mode
•  Simple model
  – Random graph
  – Poisson distribution
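The step "from two-mode to one-mode" can be illustrated with a minimal sketch. It assumes that two actors are linked in the one-mode network whenever they share at least one context; the function name and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def project_to_one_mode(memberships):
    """Project a two-mode (context -> actors) network onto its actors.

    `memberships` maps each context (event, paper, group) to the set of
    actors affiliated with it. Two actors are linked in the one-mode
    network if they share at least one context (assumption of this sketch).
    """
    adjacency = {}
    for actors in memberships.values():
        for actor in actors:
            adjacency.setdefault(actor, set())
        for a, b in combinations(sorted(actors), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

# Hypothetical toy data: three contexts and four actors.
contexts = {"p1": {"A", "B"}, "p2": {"B", "C", "D"}, "p3": {"A", "B"}}
network = project_to_one_mode(contexts)
```

Note that A and B share two contexts but still end up with a single link; that redundancy is lost in the one-mode view.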

The boundary specification problem-I

•  Specify inclusion rules on
  – The set of actors (nodes)
  – Which relations to consider
  – E.g. intraorganizational networks

Figure note: average degree drops 25% with the omission of D

The boundary specification problem-II

•  Omit interaction contexts/events
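As a rough illustration of omitting contexts, the sketch below drops selected contexts from a toy affiliation table and re-projects it onto actors. The helper names and data are hypothetical; the point is only that a link disappears when every context supporting it is omitted.

```python
from itertools import combinations

def project(memberships):
    """Link two actors if they share at least one context."""
    links = set()
    for actors in memberships.values():
        links.update(combinations(sorted(actors), 2))
    return links

def omit_contexts(memberships, omitted):
    """Return the memberships that remain after dropping some contexts."""
    return {c: actors for c, actors in memberships.items() if c not in omitted}

# Hypothetical toy data: A and B meet in two contexts, B and C in only one.
memberships = {"c1": {"A", "B"}, "c2": {"A", "B"}, "c3": {"B", "C"}}
full = project(memberships)
without_c1 = project(omit_contexts(memberships, {"c1"}))
without_c3 = project(omit_contexts(memberships, {"c3"}))
```

Dropping `c1` leaves the A-B link intact because `c2` is redundant with it, while dropping `c3` removes the B-C link entirely.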

Non-response effects

•  Network survey research (questionnaires)
  – Actors are asked to report the groups they belong to
  – Missing responses from some actors lead to missing network links
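A minimal sketch of non-response on the one-mode network follows. It assumes a link stays observable as long as at least one of its two endpoints responds, so only links between two non-respondents are lost; this rule is an assumption of the sketch, and the function name is hypothetical.

```python
def apply_non_response(adjacency, non_respondents):
    """Return the network observed when some actors do not respond.

    Assumption: a link is recorded if at least one endpoint responds,
    so it goes missing only when both endpoints are non-respondents.
    """
    observed = {actor: set() for actor in adjacency}
    for actor, neighbors in adjacency.items():
        for other in neighbors:
            if actor not in non_respondents or other not in non_respondents:
                observed[actor].add(other)
    return observed

# Hypothetical toy data: a triangle in which A and B do not respond.
triangle = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
seen = apply_non_response(triangle, {"A", "B"})
```

Here the A-B link is lost, while the links to the respondent C survive because C reports them.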

Fixed-choice Nomination-I

•  An actor nominates up to X persons from his x friends

Figure note: network after the nomination process

Fixed-choice Nomination-II

•  Reciprocated nominations
  – Links reported by both interactants
•  Non-reciprocated nominations
  – Links reported by only one partner
•  Lead to a non-random missing data pattern
  – Popular actors with more contacts are more likely to be nominated by their contacts
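The nomination process can be sketched as follows, assuming each actor picks uniformly at random among his contacts up to the cap. The function name, the `reciprocated` flag, and the toy star network are hypothetical.

```python
import random

def fixed_choice(adjacency, max_nominations, reciprocated, rng):
    """Simulate a fixed-choice survey on a one-mode network.

    Each actor nominates at most `max_nominations` of his contacts,
    chosen uniformly at random (assumption). With `reciprocated=True`
    a link is kept only if both partners nominate each other;
    otherwise one nomination is enough.
    """
    nominations = {}
    for actor, neighbors in adjacency.items():
        pool = sorted(neighbors)
        count = min(max_nominations, len(pool))
        nominations[actor] = set(rng.sample(pool, count))
    observed = {actor: set() for actor in adjacency}
    for a, chosen in nominations.items():
        for b in chosen:
            if not reciprocated or a in nominations[b]:
                observed[a].add(b)
                observed[b].add(a)
    return observed

# Hypothetical toy data: a popular actor H with four contacts.
star = {"H": {"a", "b", "c", "d"},
        "a": {"H"}, "b": {"H"}, "c": {"H"}, "d": {"H"}}
rng = random.Random(0)
either = fixed_choice(star, 1, reciprocated=False, rng=rng)
both = fixed_choice(star, 1, reciprocated=True, rng=rng)
```

With a cap of one nomination, H keeps all four links when one report suffices, because every contact nominates H, but only one link when reciprocation is required.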

Outline

•  Introduction
•  3 Sources of Missing Data in SNs
→ Dataset and Statistics of Interest
•  Results and Discussion

Topological Properties of Network-I

•  Mean vertex degree z
  – Average number of interactants (edges) per actor (vertex)
  – Reflects the connectivity of the network
•  Clustering C
  – The probability that any two vertices with a mutual neighbor are themselves connected
  – Computed from the number of triangles and the number of connected triples of vertices
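Both quantities can be computed directly from an adjacency table. The sketch below assumes clustering is three times the number of triangles divided by the number of connected triples, a common definition that is consistent with the two counts listed above; function names and data are hypothetical.

```python
def mean_degree(adjacency):
    """Average number of neighbors per vertex."""
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)

def clustering(adjacency):
    """Share of connected triples that are closed into triangles.

    Every triangle closes three triples (one per corner), so this ratio
    equals 3 * triangles / connected triples.
    """
    closed = 0
    triples = 0
    for center, nbrs in adjacency.items():
        ordered = sorted(nbrs)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                triples += 1
                if b in adjacency[a]:
                    closed += 1
    return closed / triples if triples else 0.0

# Hypothetical toy data: a triangle A-B-C with a pendant vertex D on C.
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
z = mean_degree(graph)
c = clustering(graph)
```

In this toy graph there are five connected triples, three of which are closed by the single triangle.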

Topological Properties of Network-II

•  Assortativity r
  – Similarity between two connected nodes in terms of their degrees
  – Nodes might tend to connect to others that are very similar or very different
  – r > 0: assortative mixing pattern
  – r < 0: disassortative mixing pattern
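One common way to quantify this, assumed here, is the Pearson correlation between the degrees found at the two ends of each link, with every link counted in both directions. The function name and the toy star network are hypothetical.

```python
def degree_assortativity(adjacency):
    """Pearson correlation of the degrees at the two ends of each link."""
    xs, ys = [], []
    for a, nbrs in adjacency.items():
        for b in nbrs:
            # Each undirected link appears twice, once per direction.
            xs.append(len(adjacency[a]))
            ys.append(len(adjacency[b]))
    n = len(xs)
    mean = sum(xs) / n  # xs and ys share the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var else float("nan")

# Hypothetical toy data: a star, where a high-degree hub links only to
# low-degree leaves (the most disassortative arrangement).
star = {"H": {"a", "b", "c"}, "a": {"H"}, "b": {"H"}, "c": {"H"}}
r = degree_assortativity(star)
```

For the star, every link joins a degree-3 vertex to a degree-1 vertex, so the correlation is perfectly negative.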

Topological Properties of Network-III

•  The average path length
  – Between all pairs of vertices in the largest component
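A minimal breadth-first-search sketch of this measurement follows; it also reports the fraction of vertices in the largest component. Function names and the toy data are hypothetical.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Shortest-path lengths from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def largest_component(adjacency):
    """Vertex set of the largest connected component."""
    seen, best = set(), set()
    for v in adjacency:
        if v not in seen:
            component = set(bfs_distances(adjacency, v))
            seen |= component
            if len(component) > len(best):
                best = component
    return best

def average_path_length(adjacency):
    """Mean distance over all vertex pairs in the largest component."""
    component = largest_component(adjacency)
    n = len(component)
    if n < 2:
        return 0.0
    total = sum(sum(bfs_distances(adjacency, v).values()) for v in component)
    return total / (n * (n - 1))

# Hypothetical toy data: a path A-B-C and a separate pair D-E.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}
length = average_path_length(graph)
share = len(largest_component(graph)) / len(graph)
```

Restricting the average to the largest component avoids infinite distances between vertices that are not connected at all.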

Dataset-I

(1) Scientific collaboration graph (real network)
  – Contains authors and papers from the Condensed Matter section ("cond-mat")
  – Provided by the Los Alamos E-print Archive (1995-1999)
(2) 100 random bipartite graphs
  – With the same numbers of vertices and edges
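One way to build such a comparison graph is sketched below, under the assumption that affiliations are placed uniformly at random while the numbers of actors, contexts, and edges are held fixed. The slides do not state the exact randomization procedure, so this is only an illustration; the function name and sizes are hypothetical.

```python
import random

def random_bipartite(n_actors, n_contexts, n_edges, rng):
    """Place `n_edges` distinct actor-context affiliations at random."""
    # Sample distinct cells of the actor x context table without replacement.
    cells = rng.sample(range(n_actors * n_contexts), n_edges)
    memberships = {context: set() for context in range(n_contexts)}
    for cell in cells:
        actor, context = divmod(cell, n_contexts)
        memberships[context].add(actor)
    return memberships

# Hypothetical sizes chosen only for illustration.
rng = random.Random(1)
sample = random_bipartite(n_actors=5, n_contexts=4, n_edges=7, rng=rng)
```

Repeating the call with different seeds would give an ensemble of random graphs of identical size.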

Dataset-II

•  The real network is greatly skewed compared to the random model

Quantities compared:
  – Number of authors per paper
  – Number of papers per author
  – Number of collaborators per author

Dataset-III

Dataset-IV

•  Real network vs. random network
  – Strongly non-random allocation of authors over papers
  – Assortative: authors with many collaborators tend to work with those of the same ilk

Algorithms to measure effects

(1) Take a real and a random social network
(2) Remove a fraction of entities to simulate different sources of missing data
(3) Measure network properties and compare them to the "complete" network
(4) Repeat (2)-(3) to obtain statistics
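The four steps above can be sketched as a small Monte Carlo loop. This example uses random actor removal and mean degree as the measured property; the function names, the seed, and the toy complete graph are hypothetical stand-ins for the real and random networks.

```python
import random
from statistics import mean

def remove_actors(adjacency, fraction, rng):
    """Step 2: drop a random fraction of actors together with their links."""
    actors = sorted(adjacency)
    dropped = set(rng.sample(actors, int(round(fraction * len(actors)))))
    return {a: nbrs - dropped for a, nbrs in adjacency.items()
            if a not in dropped}

def mean_degree(adjacency):
    """Step 3: the property measured on each damaged network."""
    if not adjacency:
        return 0.0
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)

def simulate(adjacency, fraction, trials, seed=0):
    """Step 4: repeat removal and measurement, then average the results."""
    rng = random.Random(seed)
    baseline = mean_degree(adjacency)
    samples = [mean_degree(remove_actors(adjacency, fraction, rng))
               for _ in range(trials)]
    return baseline, mean(samples)

# Step 1 (toy stand-in): a complete graph on ten actors.
nodes = range(10)
complete = {v: {w for w in nodes if w != v} for v in nodes}
baseline, observed = simulate(complete, fraction=0.2, trials=20)
```

Swapping in a different removal function or a different measured property would cover the other missing-data mechanisms and statistics in the same loop.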

Outline

•  Introduction
•  3 Sources of Missing Data in SNs
•  Dataset and statistics of interest
→ Results and Discussion
•  Conclusion

Comparison of Boundary Specification and non-response effects

Effects on Mean Vertex Degree: z

•  Non-random allocation of actors to groups leads to redundancy in links
•  High redundancy implies that actors are likely to link to actors they are already connected to

Figure legend: BSPC (context), BSPA (actor), NRE (non-response)

Clustering: C

•  Random omission of actors has no effect on clustering
•  Omission of contexts reduces triples while keeping triangles high
•  Non-response opens up triples faster than it destroys them

Figure legend: BSPC (context), BSPA (actor), NRE (non-response)

Assortativity: r

•  BSPC increases degree-to-degree correlation
•  NRE causes it to diminish to a disassortative mixing pattern
•  Are assortatively mixed networks more robust to the removal of vertices than disassortative ones?
  – Not exactly, as we can see. Social networks may have less assortativity than they appear to have (it is overestimated)

Figure legend: BSPA (actor), NRE (non-response), BSPC (context)

Size of the largest connected component: S

•  Omission of actors leads to a severe breakdown of network connectivity
•  Do assortative correlations make a graph robust to random damage? (Our simulation disagrees)
  – Assortativity alone does not necessarily imply network robustness

Figure legend: BSPA (actor), NRE (non-response), BSPC (context)

Fixed Choice Effect

•  Three cases to omit data

(1) Only record K interaction contexts/papers out of the average number for every actor
(2) Each actor nominates up to X out of an average of z interaction partners (non-reciprocated: the link is present if either or both report it)
(3) The same as (2), but reciprocated (the link is present only if both report it)
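Case (1) can be sketched on the two-mode data as below, assuming each actor's recorded contexts are chosen uniformly at random up to the cap K. The function name and the toy data are hypothetical.

```python
import random

def fixed_choice_contexts(memberships, k, rng):
    """Keep at most `k` randomly chosen contexts per actor (assumption)."""
    by_actor = {}
    for context, actors in memberships.items():
        for actor in actors:
            by_actor.setdefault(actor, []).append(context)
    observed = {context: set() for context in memberships}
    for actor, contexts in by_actor.items():
        kept = rng.sample(sorted(contexts), min(k, len(contexts)))
        for context in kept:
            observed[context].add(actor)
    return observed

# Hypothetical toy data: A appears in three papers, B and C in one each.
papers = {"p1": {"A", "B"}, "p2": {"A", "C"}, "p3": {"A"}}
rng = random.Random(0)
capped_1 = fixed_choice_contexts(papers, 1, rng)
capped_2 = fixed_choice_contexts(papers, 2, rng)
```

Only the prolific actor A loses affiliations under the cap, which hints at why a skewed real network reacts differently from a random one.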

Fixed Choice Effect (on Mean degree z)

•  The real network suffers a more severe effect than the random network
•  Fixed choice does not affect mean degree z when k > 3 or x > 3 (enough nominations)

Figure legend: dots: contexts (FCC); squares: actors, non-reciprocated (FCA); stars: only reciprocated (FCR)

Fixed Choice Effect (on other network properties)

•  The real network appears to be more sensitive to fixed-bound effects
  – Due to the joint effect of non-random (assortative) mixing and a skewed degree distribution
  – Little effect on the random graph because its degree variance is small
  – Eliminating most connections within the network core quickly breaks down the giant component

Conclusions

•  Discusses the effects of different sources of missing data on network topological properties
•  Boundary specification alters network properties greatly even if context redundancy is large
•  The effect of fixed choice depends on the vertex degree distribution and mixing pattern
•  Provides statistical guidance to researchers in the data analysis of SNs

Thank you