Effects of Missing Data in Social Networks
Gueorgi Kossinets
Social Networks 2006
Presenter: Ke Zhang
Outline
→ Introduction
• 3 Sources of Missing Data in SNs
• Dataset and statistics of interest
• Results and Discussion
• Conclusion
Introduction-I
• Social network data is often incomplete
– Nodes (Actors)
– Links (Affiliations)
• Missing data introduced by
– Boundary specification
– Non-response in network surveys (Questionnaire)
– Study design (Fixed choice design)
Introduction-II
• Missing data have effects on topological properties of networks
– Connection degree
– Assortativity, a kind of mixing pattern (high-degree nodes vs. low-degree ones)
– …
• Investigate effects from different sources of missing data
Outline
• Introduction
→ 3 Sources of Missing Data in SNs
• Dataset and statistics of interest
• Results and Discussion
Models of Social Networks
• Multicontextual Model
– Bipartite Graph (two-mode)
– Events or Contexts
– Actors (A, B, C …)
– From two-mode to one-mode
• Simple Model
– Random Graph
– Poisson Distribution
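The step from two-mode to one-mode can be sketched as follows. This is a minimal illustration, assuming that two actors become linked whenever they share at least one context; the function name `project_one_mode` and the toy data are hypothetical and not taken from the slides.

```python
from itertools import combinations


def project_one_mode(memberships):
    """Project a two-mode (actor -> contexts) structure onto actors.

    Two actors are linked in the one-mode graph if they share a context.
    """
    # Invert the mapping: context -> set of actors taking part in it.
    members = {}
    for actor, contexts in memberships.items():
        for ctx in contexts:
            members.setdefault(ctx, set()).add(actor)

    # Every actor appears in the result, even if it ends up isolated.
    adjacency = {actor: set() for actor in memberships}
    for actors in members.values():
        for a, b in combinations(sorted(actors), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Hypothetical toy data: actors A-D and contexts 1-3.
memberships = {"A": {1}, "B": {1, 2}, "C": {2, 3}, "D": {3}}
graph = project_one_mode(memberships)
```

In this toy case B shares context 1 with A and context 2 with C, so B ends up with two one-mode links.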
The boundary specification problem-I
• Specify inclusion rules on
– The set of actors (nodes)
– Which relation to consider
– E.g. intraorganizational networks
Average degree drops by 25% with the omission of D
The boundary specification problem-II
• Omit interaction contexts/events
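Both flavours of the boundary specification problem, leaving out actors and leaving out contexts, can be imitated on two-mode data. The sketch below is only an illustration under stated assumptions: omissions are drawn uniformly at random, actors are linked when they share a context, and all names (`mean_degree`, `omit_actors`, `omit_contexts`) and the toy data are hypothetical.

```python
import random
from itertools import combinations


def mean_degree(memberships):
    """Mean one-mode degree: actors are linked if they share a context."""
    members = {}
    for actor, contexts in memberships.items():
        for ctx in contexts:
            members.setdefault(ctx, set()).add(actor)
    edges = set()
    for actors in members.values():
        edges.update(combinations(sorted(actors), 2))
    return 2 * len(edges) / len(memberships) if memberships else 0.0


def omit_actors(memberships, fraction, rng):
    """Drop a random fraction of actors together with their memberships."""
    k = round(fraction * len(memberships))
    dropped = set(rng.sample(sorted(memberships), k))
    return {a: set(c) for a, c in memberships.items() if a not in dropped}


def omit_contexts(memberships, fraction, rng):
    """Drop a random fraction of contexts; every actor stays in the data."""
    contexts = sorted(set().union(*memberships.values()))
    dropped = set(rng.sample(contexts, round(fraction * len(contexts))))
    return {a: set(c) - dropped for a, c in memberships.items()}


# Hypothetical toy data: actors A-D and contexts 1-3.
memberships = {"A": {1}, "B": {1, 2}, "C": {2, 3}, "D": {3}}
rng = random.Random(1)
full = mean_degree(memberships)
fewer_actors = omit_actors(memberships, 0.25, rng)
no_contexts = omit_contexts(memberships, 1.0, rng)
```

Omitting actors shrinks the vertex set, whereas omitting contexts keeps all actors but removes the links that only those contexts supported.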
Non-response effects
• Network survey research (Questionnaire)
– Actors are asked to report groups they belong to
– Missing responses from some actors lead to missing network links
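One way to picture non-response on the one-mode graph is sketched below. It rests on an assumption that is not spelled out on the slide: a link is still observed when at least one of its two endpoints responds, so only links between two non-respondents are lost, and non-respondents remain in the network as vertices. The function name and toy network are hypothetical.

```python
def apply_non_response(adjacency, non_respondents):
    """Keep a link only if at least one endpoint responded.

    Non-respondents stay in the network as vertices (an assumption of
    this sketch); only links among them go unobserved.
    """
    silent = set(non_respondents)
    observed = {v: set() for v in adjacency}
    for v, neighbours in adjacency.items():
        for w in neighbours:
            if v in silent and w in silent:
                continue  # nobody reported this link
            observed[v].add(w)
    return observed


# Hypothetical toy network: a triangle A-B-C plus a pendant vertex D.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
observed = apply_non_response(adjacency, non_respondents={"A", "B"})
```

Here the link A-B disappears because neither A nor B responded, while every link that involves the respondent C survives.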
Fixed-choice Nomination-I
• An actor nominates up to X persons from his x friends
After nomination process
Fixed-choice Nomination-II
• Reciprocated nominations
– Links reported by both interactants
• Non-reciprocated nominations
– Links reported by only one partner
• Lead to a non-random missing data pattern
– Popular actors with more contacts are more likely to be nominated by their contacts
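The difference between the two nomination rules can be sketched in a few lines. This is an illustration only, assuming each actor picks its nominees uniformly at random among its contacts; `fixed_choice`, its parameters, and the star-shaped toy network are hypothetical.

```python
import random


def fixed_choice(adjacency, max_nominations, reciprocated, rng):
    """Let every actor nominate at most `max_nominations` of its contacts.

    reciprocated=False: a link is kept if either end nominated the other.
    reciprocated=True:  a link is kept only if both ends nominated each other.
    """
    nominations = {}
    for actor, contacts in adjacency.items():
        pool = sorted(contacts)
        k = min(max_nominations, len(pool))
        nominations[actor] = set(rng.sample(pool, k))

    observed = {actor: set() for actor in adjacency}
    for a, chosen in nominations.items():
        for b in chosen:
            mutual = a in nominations[b]
            if mutual or not reciprocated:
                observed[a].add(b)
                observed[b].add(a)
    return observed


# Hypothetical toy network: a popular hub H with five contacts.
leaves = ["a", "b", "c", "d", "e"]
adjacency = {"H": set(leaves), **{leaf: {"H"} for leaf in leaves}}

loose = fixed_choice(adjacency, 2, reciprocated=False, rng=random.Random(0))
strict = fixed_choice(adjacency, 2, reciprocated=True, rng=random.Random(0))
```

Every leaf nominates the hub, so the non-reciprocated rule recovers all five of the hub's links, while the reciprocated rule keeps only the two that the hub itself nominated.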
Outline
• Introduction
• 3 Sources of Missing Data in SNs
→ Dataset and Statistics of Interest
• Results and Discussion
Topological Properties of Network-I
• Mean vertex degree z
– Average number of interactants (edges) per actor (vertex)
– Reflects the connectivity of the network
• Clustering C
– The probability that any two vertices with a mutual neighbor are themselves connected
– Number of triangles
– Number of connected triples of vertices
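Both statistics are easy to compute on a small graph. The sketch below assumes the common definition of global clustering as three times the number of triangles divided by the number of connected triples; the symbols on the slide were lost in extraction, so treat this as an assumption rather than the presenter's exact formula. Function names and the toy network are hypothetical.

```python
from itertools import combinations


def mean_degree(adjacency):
    """Average number of links per vertex."""
    if not adjacency:
        return 0.0
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)


def clustering(adjacency):
    """Global clustering: 3 * triangles / connected triples."""
    triples = 0  # paths of length two, counted once per centre vertex
    closed = 0   # triples whose two end points are also linked
    for neighbours in adjacency.values():
        for a, b in combinations(sorted(neighbours), 2):
            triples += 1
            if b in adjacency[a]:
                closed += 1
    # Each triangle closes three triples, so closed == 3 * triangles.
    return closed / triples if triples else 0.0


# Hypothetical toy network: a triangle A-B-C plus a pendant vertex D.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
z = mean_degree(adjacency)
c = clustering(adjacency)
```

The toy graph has one triangle and five connected triples, which gives C = 3/5, and eight link ends over four vertices, which gives z = 2.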
Topological Properties of Network-II
• Assortativity r
– Similarity between two connected nodes in terms of their degrees
– Nodes might tend to connect to others that are very similar or very different
– r > 0: assortative mixing pattern
– r < 0: disassortative mixing pattern
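A minimal sketch of how r can be computed, assuming the usual definition as the Pearson correlation between the degrees found at the two ends of every link. The function name, the convention of returning 0.0 when the correlation is undefined, and both toy graphs are assumptions made for illustration.

```python
def assortativity(adjacency):
    """Pearson correlation of the degrees at the two ends of a link."""
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    # Use both orientations of every link so the measure is symmetric.
    pairs = [(degree[v], degree[w]) for v in adjacency for w in adjacency[v]]
    n = len(pairs)
    if n == 0:
        return 0.0
    mean = sum(x for x, _ in pairs) / n
    cov = sum((x - mean) * (y - mean) for x, y in pairs) / n
    var = sum((x - mean) ** 2 for x, _ in pairs) / n
    # Convention of this sketch: report 0.0 when all degrees are equal.
    return cov / var if var else 0.0


# Hypothetical star: a hub linked to three leaves (high meets low degree).
star = {"H": {"a", "b", "c"}, "a": {"H"}, "b": {"H"}, "c": {"H"}}

# Hypothetical like-with-like graph: a triangle and a separate single link.
alike = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
    "X": {"Y"}, "Y": {"X"},
}

r_star = assortativity(star)
r_alike = assortativity(alike)
```

The star is the extreme disassortative case (r = -1), while the second graph only ever joins vertices of equal degree (r = 1).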
Topological Properties of Network-III
• The average path length
– Between all pairs of vertices in the largest component
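The average path length can be obtained with breadth-first search restricted to the largest component. The following is a sketch for small unweighted graphs; the helper names and the toy network are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest-path lengths from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def largest_component(adjacency):
    """Vertex set of the largest connected component."""
    seen, best = set(), set()
    for v in adjacency:
        if v not in seen:
            component = set(bfs_distances(adjacency, v))
            seen |= component
            if len(component) > len(best):
                best = component
    return best


def average_path_length(adjacency):
    """Mean shortest-path length over all pairs in the largest component."""
    component = largest_component(adjacency)
    n = len(component)
    if n < 2:
        return 0.0
    total = sum(sum(bfs_distances(adjacency, v).values()) for v in component)
    # `total` covers every ordered pair once, hence n * (n - 1) pairs.
    return total / (n * (n - 1))


# Hypothetical toy network: a path A-B-C and a separate link X-Y.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "X": {"Y"}, "Y": {"X"}}
giant = largest_component(adjacency)
apl = average_path_length(adjacency)
```

The pair X-Y is ignored because it lies outside the largest component; within the path A-B-C the distances 1, 1, and 2 average to 4/3.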
Dataset-I
(1) Scientific collaboration graph (real network)
– Contains authors and papers from the Condensed Matter section ("cond-mat")
– Provided by the Los Alamos E-print Archive (1995-1999)
(2) 100 random bipartite graphs
– With the same number of vertices and edges
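A random bipartite graph with prescribed numbers of actors, contexts, and links can be generated as sketched below. This assumes that "random" means every distinct actor-context pair is equally likely to be linked; the slides do not state the exact construction, and `random_bipartite` and the sizes used are hypothetical.

```python
import random


def random_bipartite(n_actors, n_contexts, n_edges, rng):
    """Place n_edges distinct actor-context links uniformly at random."""
    if n_edges > n_actors * n_contexts:
        raise ValueError("more links requested than actor-context pairs")
    # Sample pair indices without replacement, then decode each index.
    chosen = rng.sample(range(n_actors * n_contexts), n_edges)
    memberships = {actor: set() for actor in range(n_actors)}
    for index in chosen:
        actor, context = divmod(index, n_contexts)
        memberships[actor].add(context)
    return memberships


# Hypothetical sizes, far smaller than a real collaboration network.
sample = random_bipartite(n_actors=20, n_contexts=15, n_edges=40,
                          rng=random.Random(0))
```

Repeating the call with different seeds gives an ensemble of graphs that match the chosen sizes but carry no built-in structure.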
Dataset-II
• Distributions in the real network are greatly skewed compared to the random model
– Number of authors per paper
– Number of papers per author
– Number of collaborators per author
Dataset-III
Dataset-IV
• Real network vs. Random network
– Strongly non-random allocation of authors over papers
– Assortative: authors with many collaborators tend to work with those of the same ilk
Algorithms to measure effects
(1) Take a real and a random social network
(2) Remove a fraction of entities to simulate different sources of missing data
(3) Measure network properties and compare to the "complete" network
(4) Repeat (2)-(3) to obtain statistics
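Steps (1) to (4) can be sketched as a small Monte Carlo loop. The example below is an illustration under assumptions: it uses random actor removal as the only missing-data mechanism, mean degree as the only statistic, and a ring as a stand-in network. All names are hypothetical.

```python
import random
import statistics


def mean_degree(adjacency):
    """Average number of links per vertex."""
    if not adjacency:
        return 0.0
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)


def remove_actors(adjacency, fraction, rng):
    """Step (2): drop a random fraction of vertices and their links."""
    k = round(fraction * len(adjacency))
    dropped = set(rng.sample(sorted(adjacency), k))
    return {v: nbrs - dropped
            for v, nbrs in adjacency.items() if v not in dropped}


def simulate(adjacency, fraction, measure, runs, seed=0):
    """Steps (2)-(4): damage, measure, repeat, and summarise."""
    rng = random.Random(seed)
    baseline = measure(adjacency)  # value on the "complete" network
    samples = [measure(remove_actors(adjacency, fraction, rng))
               for _ in range(runs)]
    return baseline, statistics.mean(samples), statistics.pstdev(samples)


# Step (1), hypothetical stand-in network: a ring of ten vertices.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
baseline, mean_after, spread = simulate(ring, 0.2, mean_degree, runs=50)
```

Swapping in another removal function or another statistic only requires changing the arguments passed to `simulate`.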
Outline
• Introduction
• 3 Sources of Missing Data in SNs
• Dataset and statistics of interest
→ Results and Discussion
• Conclusion
Comparison of Boundary Specification and non-response effects
Effects on Mean Vertex Degree: z
• Non-random allocation of actors to groups leads to redundancy in links
• High redundancy implies that actors are likely to link to already connected actors
Legend: BSPC (context), BSPA (actor), NRE (non-response)
Clustering: C
• Random omission of actors has no effect on clustering
• Omission of contexts reduces triples while keeping triangles high
• Non-response opens up triples faster than it destroys them
Legend: BSPC (context), BSPA (actor), NRE (non-response)
Assortativity: r
• BSPC increases degree-to-degree correlation
• NRE causes it to diminish to a disassortative mixing pattern
• Are assortatively mixed networks more robust to removal of vertices than disassortative ones?
– Not exactly, as we can see. Social networks may have less assortativity than they appear to have (overestimated)
Legend: BSPA (actor), NRE (non-response), BSPC (context)
Size of the largest connected component: S
• Omission of actors leads to a severe breakdown of network connectivity
• Do assortative correlations make a graph robust to random damage? (Our simulation disagrees)
– Assortativity alone does not necessarily imply network robustness
Legend: BSPA (actor), NRE (non-response), BSPC (context)
Fixed Choice Effect
• Three cases to omit data
(1) Only record K interaction contexts/papers for every actor, out of the average number per actor
(2) Each actor nominates up to X out of average z interaction partners (non-reciprocated: the link is present if either or both report it)
(3) The same as (2), but reciprocated (the link is present only if both report it)
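Case (1) acts on the two-mode data rather than on the links between actors. A minimal sketch, assuming each actor's recorded contexts are a uniform random subset of at most K of its true contexts; `cap_contexts` and the toy data are hypothetical.

```python
import random


def cap_contexts(memberships, k, rng):
    """Keep at most k randomly chosen contexts per actor."""
    capped = {}
    for actor, contexts in memberships.items():
        pool = sorted(contexts)
        capped[actor] = set(rng.sample(pool, min(k, len(pool))))
    return capped


# Hypothetical toy data: A is far more active than B and C.
memberships = {"A": {1, 2, 3, 4}, "B": {1}, "C": {2, 3}}
capped = cap_contexts(memberships, k=2, rng=random.Random(0))
```

Only actors with more than K contexts lose information, so the cap bites hardest on the most active actors.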
Fixed Choice Effect (on Mean degree z)
• The real network is affected more severely than the random network
• Fixed choice does not affect mean degree z when k > 3 or x > 3 (enough nominations)
Legend: dots = contexts (FCC); squares = actors, non-reciprocated (FCA); stars = only reciprocated (FCR)
Fixed Choice Effect (on other network properties)
• The real network appears to be more sensitive to fixed bound effects
– Due to the joint effect of non-random (assortative) mixing and skewed degree distribution
– Little effect on the random graph because its degree variance is small
– Eliminating most connections within the network core will quickly break down the giant component
Conclusions
• Discussed the effects of different sources of missing data on network topological properties
• Boundary specification alters network properties greatly even if context redundancy is large
• The effect of fixed choice depends on the vertex degree distribution and mixing pattern
• Provides statistical guidance to researchers in data analysis of SNs
Thank you