Graph Mining: Overview of different graph models
Graph Mining course, Winter Semester 2016
Davide Mottin, Konstantina Lazaridou
Hasso Plattner Institute


Lecture road

▪ Anomaly detection (previous lecture)
▪ Representatives of probabilistic (uncertain) graphs
▪ Introduction to signed networks


Graph models
▪ Graphs are everywhere!
▪ Various interesting models that we haven't analyzed in the lecture:
• graph streams
• evolving graphs
• attributed graphs
• probabilistic graphs
• signed graphs
• colored graphs
• ...


Definitions

▪ Graph stream
• a sequence of unordered pairs $e = \{u, v\}$ with $u, v \in [n]$, i.e., $S = (e_1, e_2, \ldots, e_m)$

▪ Time-evolving graph
• a sequence of static graphs $\{G_1, G_2, \ldots, G_n\}$, where $G_t = (V_t, E_t)$ is a snapshot of the evolving graph at timestamp $t$

▪ Attributed graph
• $G = (V, E, A)$, where $V$ is the vertex set, $E$ is the edge set, and $A$ is the attribute set, which contains a unary attribute $a_i$ linked to each node $n_i$ and a binary attribute $a_{ij}$ linked to each edge $e_k = (n_i, n_j) \in E$

▪ Colored graph
• $G = (V, E)$ in which each vertex is assigned a color

⁃ properly colored graph: the color assignment conforms to the coloring rules applied to the graph (for the usual vertex coloring: no two adjacent vertices share a color)
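As a rough illustration of two of these definitions, the sketch below treats a graph stream as a plain Python iterable of vertex pairs that is consumed in a single pass, and checks a vertex coloring against the usual proper-coloring rule (adjacent vertices get different colors). The function names and the choice of coloring rule are assumptions made for illustration, not part of the lecture.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

# Illustrative sketch; function names are hypothetical.
Edge = Tuple[int, int]


def degrees_from_stream(stream: Iterable[Edge]) -> Counter:
    """Read the stream S = (e_1, ..., e_m) once and count vertex degrees."""
    degree: Counter = Counter()
    for u, v in stream:  # each element is an unordered pair {u, v}
        degree[u] += 1
        degree[v] += 1
    return degree


def is_properly_colored(edges: Iterable[Edge], color: Dict[int, Hashable]) -> bool:
    """Assumed coloring rule: the two endpoints of every edge differ in color."""
    return all(color[u] != color[v] for u, v in edges)


triangle = [(1, 2), (2, 3), (1, 3)]
print(degrees_from_stream(iter(triangle)))
print(is_properly_colored(triangle, {1: "red", 2: "green", 3: "blue"}))
```

Because the stream is iterated only once, the same function also works on a generator that never holds all $m$ edges in memory.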


Probabilistic graphs - Outline

▪ Uncertainty in data
▪ Introduction to uncertain graphs
• Model definition
• Applications
• Problems

▪ Finding representatives in probabilistic graphs
• Problem definition
• Algorithms

GRAPH MINING WS 2016


Uncertainty in data

▪ Noise in generation
• sensors

▪ Noise in collection
• missing instances

▪ Biological data
• protein-protein interaction probabilities

▪ Nature of the problem
• risk, trust, influence, status

▪ Anonymized data
• privacy preservation of user-generated data


What is an uncertain graph?

▪ A graph in which each edge has an associated probability $p \in [0, 1]$ of existing

Figure 1: (left) An unweighted probabilistic graph $\mathcal{G}$; (right) $\mathcal{G}$ with the expected vertex degrees (in italics) associated with each node
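The expected degree shown in italics in Figure 1 is, by linearity of expectation, the sum of the probabilities of a vertex's incident edges. A minimal sketch, assuming the uncertain graph is stored as a dictionary from undirected edges (u, v) to $p_{uv}$; this representation, the name `expected_degrees`, and the toy probabilities are illustrative.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Illustrative sketch; the dict-of-edges representation is an assumption.
Edge = Tuple[str, str]


def expected_degrees(prob: Dict[Edge, float]) -> Dict[str, float]:
    """Expected degree of u = sum of p_uv over the edges incident to u."""
    exp_deg: Dict[str, float] = defaultdict(float)
    for (u, v), p in prob.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"edge {(u, v)} has probability {p} outside [0, 1]")
        exp_deg[u] += p
        exp_deg[v] += p
    return dict(exp_deg)


g = {("a", "b"): 0.5, ("b", "c"): 0.9, ("a", "c"): 0.2}  # hypothetical graph
print(expected_degrees(g))
```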


Possible applications and problems

▪ Modelling of probabilities in protein-protein interaction graphs
▪ Modelling relationships in social graphs

▪ Problems that apply to deterministic graphs
• algorithms need to be redesigned to incorporate uncertainty

▪ Data anonymization
• one of the possible worlds corresponds to the original data

▪ Frequent subgraph mining
• frequency is redefined using the edge probabilities

▪ Queries based on shortest paths
• ignoring the probabilities may return paths that exist only with very low probability
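To see why shortest-path queries can mislead, note that under the usual independence assumption a path exists only if all of its edges exist, so its probability is the product of its edge probabilities. The toy graph below is made up for illustration: the one-hop path from s to t is the shortest but rarely exists, while a two-hop detour is far more likely.

```python
from math import prod
from typing import Dict, FrozenSet, List

# Hypothetical toy graph: undirected edges keyed by frozenset({u, v}).
prob: Dict[FrozenSet[str], float] = {
    frozenset(("s", "t")): 0.05,
    frozenset(("s", "a")): 0.9,
    frozenset(("a", "t")): 0.9,
}


def path_probability(path: List[str], prob: Dict[FrozenSet[str], float]) -> float:
    """P[path exists] = product of its edge probabilities (independent edges)."""
    return prod(prob[frozenset((a, b))] for a, b in zip(path, path[1:]))


print(path_probability(["s", "t"], prob))       # shortest path, 1 hop
print(path_probability(["s", "a", "t"], prob))  # longer, but much more likely
```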


Graph model definition

▪ A probabilistic graph is represented as $\mathcal{G} = (V, E, W, p)$, where $V$ is the set of vertices and $E$ is the set of edges; for weighted graphs, $W: V \times V \to \mathbb{R}$ denotes the weights associated with every edge, and $p$ maps every pair of nodes to a real number in $[0, 1]$
▪ $p_{uv}$ represents the probability that edge $(u, v)$ exists in the uncertain network
▪ From a probabilistic graph $\mathcal{G}$, $2^{|E|}$ deterministic graphs can be generated (each edge is either present or absent)

• these graphs are called possible worlds
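The count $2^{|E|}$ follows because each edge is either present or absent in a world. A small sketch that enumerates the possible worlds of a made-up edge list as subsets of $E$; it is only feasible for tiny graphs, which is exactly why exact computation over all possible worlds is expensive. The function name is hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Illustrative sketch; possible_worlds is a hypothetical helper.
Edge = Tuple[str, str]


def possible_worlds(edges: List[Edge]) -> List[List[Edge]]:
    """All deterministic graphs on the same vertex set: one per subset of E."""
    worlds = []
    for keep in product((False, True), repeat=len(edges)):
        worlds.append([e for e, present in zip(edges, keep) if present])
    return worlds


edges = [("a", "b"), ("b", "c"), ("a", "c")]
worlds = possible_worlds(edges)
print(len(worlds))  # 2 ** 3 worlds, from the empty graph to the full triangle
```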


Possible world semantics [1]
▪ The literature often assumes that the edge probabilities are independent
• is this always the case?


[1] S. Abiteboul, P. Kanellakis, and G. Grahne. On the representation and querying of sets of possible worlds. SIGMOD 1987.

▪ For simplicity, some approaches treat the edge probabilities as weights

▪ Others only consider edges with probability $p > t$ for some threshold $t$
• these assumptions are not valid in many scenarios!


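Sampling

▪ Assuming independent edges, the probability that a deterministic graph $G = (V, E_G)$ is sampled from $\mathcal{G}$ is
• $\Pr[G] = \prod_{(u,v) \in E_G} p_{uv} \cdot \prod_{(u,v) \in (V \times V) \setminus E_G} (1 - p_{uv})$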
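The formula can be checked numerically: over all possible worlds, the probabilities must sum to 1. A minimal sketch, assuming independent edges and storing only the pairs with $p_{uv} > 0$ (an absent pair with $p_{uv} = 0$ contributes a factor of 1, so it can be skipped); the function name and the toy probabilities are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, Set

# Illustrative sketch; world_probability is a hypothetical helper.
Edge = FrozenSet[str]


def world_probability(world: Set[Edge], prob: Dict[Edge, float]) -> float:
    """Pr[G]: p_uv for every edge kept in G, (1 - p_uv) for every edge left out."""
    if not world.issubset(prob.keys()):
        return 0.0  # G uses a pair with p_uv = 0, so it can never be sampled
    result = 1.0
    for e, p in prob.items():
        result *= p if e in world else 1.0 - p
    return result


prob = {frozenset(("a", "b")): 0.5, frozenset(("b", "c")): 0.9, frozenset(("a", "c")): 0.2}
edges = list(prob)
total = sum(
    world_probability({e for e, kept in zip(edges, mask) if kept}, prob)
    for mask in product((False, True), repeat=len(edges))
)
print(total)  # equals 1 up to floating-point rounding
```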

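▪ Given $G$ and the expected vertex degrees, we can also compute the vertex discrepancies
• $\mathrm{dis}_u(G) = \mathrm{deg}_u(G) - \mathrm{deg}_u(\mathcal{G})$, where $u$ is a node of $G$ and $\mathrm{deg}_u(\mathcal{G}) = \sum_{v} p_{uv}$ is its expected degree
• the discrepancy of $G$ is the sum of the absolute node discrepancies: $\Delta(G) = \sum_{u \in V} |\mathrm{dis}_u(G)|$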


Figure 2: (left) $\mathcal{G}$ with the expected vertex degrees associated with each node; (right) a possible world $G$ of $\mathcal{G}$ with the vertex discrepancies

$G^* = \arg\min_{G \text{ possible world of } \mathcal{G}} \Delta(G)$
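For very small graphs the representative can be found by brute force: compute $\Delta(G)$ for every possible world and keep the minimum. The sketch below does exactly that. It is exponential in $|E|$ and only meant to make $\mathrm{dis}_u(G)$, $\Delta(G)$ and $G^*$ concrete, not to stand in for the algorithms that follow. Function names and toy probabilities are hypothetical.

```python
from itertools import product
from typing import Dict, Set, Tuple

# Illustrative brute-force sketch; function names are hypothetical.
Edge = Tuple[str, str]


def discrepancy(world: Set[Edge], prob: Dict[Edge, float]) -> float:
    """Delta(G) = sum over u of |deg_u(G) - expected degree of u|."""
    dis: Dict[str, float] = {}
    for (u, v), p in prob.items():  # start from minus the expected degree
        dis[u] = dis.get(u, 0.0) - p
        dis[v] = dis.get(v, 0.0) - p
    for u, v in world:  # each edge of G adds 1 to both endpoints
        dis[u] += 1
        dis[v] += 1
    return sum(abs(d) for d in dis.values())


def brute_force_representative(prob: Dict[Edge, float]) -> Tuple[Set[Edge], float]:
    """argmin of Delta(G) over all 2^|E| possible worlds (tiny graphs only)."""
    edges = list(prob)
    best: Set[Edge] = set()
    best_delta = float("inf")
    for mask in product((False, True), repeat=len(edges)):
        world = {e for e, kept in zip(edges, mask) if kept}
        delta = discrepancy(world, prob)
        if delta < best_delta:
            best, best_delta = world, delta
    return best, best_delta


g = {("a", "b"): 0.5, ("b", "c"): 0.9, ("a", "c"): 0.2}
print(brute_force_representative(g))
```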


What if we could work on a deterministic graph instead? How would we benefit?

▪ The computational complexity would be much lower!
▪ Traditional data mining algorithms could be applied
▪ Which characteristics should this deterministic graph keep from the uncertain one?
• the same number of vertices
• which edges should be included?


Outline - Probabilistic graphs

▪ Uncertainty in data
▪ Introduction to uncertain graphs
• Model definition
• Applications
• Problems

▪ Finding representatives in probabilistic graphs
• Problem definition
• Algorithms


Finding representatives in probabilistic graphs [2]
▪ A representative $G$ of a probabilistic graph $\mathcal{G}$ is a deterministic graph whose vertices exhibit the least possible discrepancy

▪ More formally
• Given an undirected uncertain graph $\mathcal{G} = (V, E, W, p)$, the representative is an exact instance $G$ of $\mathcal{G}$ (a possible world) such that the vertex degrees deviate as little as possible from their expected values


[2] P. Parchas et al. The pursuit of a good possible world: extracting representative instances of uncertain graphs. ACM SIGMOD 2014.


Introduced algorithms
▪ Baseline 1: Greedy probability
• each edge $e = (u, v)$ is added to $G$ if it decreases the total discrepancy

▪ Baseline 2: Most probable
• each edge $e = (u, v)$ belongs to $G$ if $p_e \geq 0.5$

▪ ADR (average degree rewiring)
• aims at preserving the expected average degree of $\mathcal{G}$

▪ ABM (approximate b-matching)
• aims at preserving the expected vertex degrees
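The two baselines are simple enough to sketch directly. The version below assumes no self-loops, and for Greedy probability it assumes edges are visited in decreasing order of probability starting from the empty graph; the slide does not fix the visiting order, so treat that as an assumption. An edge is kept only if it strictly lowers $|\mathrm{dis}_u| + |\mathrm{dis}_v|$, the only terms of $\Delta(G)$ it affects. Function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Illustrative sketch; the visiting order in greedy_probability is an assumption.
Edge = Tuple[str, str]


def most_probable(prob: Dict[Edge, float]) -> Set[Edge]:
    """Baseline 2: keep every edge with p_e >= 0.5."""
    return {e for e, p in prob.items() if p >= 0.5}


def greedy_probability(prob: Dict[Edge, float]) -> Set[Edge]:
    """Baseline 1: add an edge only if it decreases the total discrepancy."""
    dis: Dict[str, float] = {}
    for (u, v), p in prob.items():  # empty graph: dis_u = -expected degree
        dis[u] = dis.get(u, 0.0) - p
        dis[v] = dis.get(v, 0.0) - p
    chosen: Set[Edge] = set()
    for (u, v), _ in sorted(prob.items(), key=lambda item: item[1], reverse=True):
        before = abs(dis[u]) + abs(dis[v])
        after = abs(dis[u] + 1) + abs(dis[v] + 1)
        if after < before:
            chosen.add((u, v))
            dis[u] += 1
            dis[v] += 1
    return chosen


# Hypothetical star: no edge reaches 0.5, yet the expected degree of h is 0.9.
star = {("h", "x"): 0.3, ("h", "y"): 0.3, ("h", "z"): 0.3}
print(most_probable(star))            # empty set
print(len(greedy_probability(star)))  # one edge is added
```

The star shows the weakness of Most probable: with many low-probability edges it can drop every edge of a vertex whose expected degree is close to 1.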


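ADR: average degree rewiring

▪ What is the expected average degree?
• $\mathrm{deg}_{avg}(\mathcal{G}) = 2P / |V|$, where $P$ is the sum of all edge probabilities in $\mathcal{G}$

▪ To preserve it, $G$ should contain $P$ edges (approximately, since $P$ need not be an integer)
▪ Main steps of ADR

• Construct a seed set $E_0$ of edges of $\mathcal{G}$
• For a given number of times $k$
⁃ Swap edges in $E_0$ with edges in $E \setminus E_0$ so that the overall discrepancy of the representative decreases (each swap keeps $|E_0|$, and hence the average degree, unchanged)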


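Pseudocode

▪ Initialization: compute $P$; sort $E$ in decreasing order of edge probability
▪ For each $e$ in $E$
• draw a random $x \in [0, 1]$; if $x \leq p_e$: insert $e$ into $E_0$ and update $G$

▪ $C = E \setminus E_0$
▪ Repeat $k$ times
• For each node $u$ in $G$
⁃ $I$ = edges of $E_0$ incident to $u$
⁃ randomly choose $e_1 \in I$ and $e_2 \in C$ as swap candidates
⁃ compute the overall discrepancy before and after the potential swap
⁃ if it improves: swap $e_1$ and $e_2$ between $E_0$ and $C$, and update the discrepancies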
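The pseudocode translates fairly directly into Python. This is a sketch under several assumptions: the uncertain graph is a dictionary from undirected edges (u, v) to $p_{uv}$ with no self-loops, a seeded `random.Random` supplies the random draws, and the gain of a swap is computed from the degree changes of the touched endpoints (which equals $d_1 + d_2$ when the four endpoints are distinct). It mirrors the slide and is not the reference implementation of [2]; all function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Illustrative ADR sketch; helper names are hypothetical.
Edge = Tuple[str, str]


def expected_degrees(prob: Dict[Edge, float]) -> Dict[str, float]:
    exp_deg: Dict[str, float] = {}
    for (u, v), p in prob.items():
        exp_deg[u] = exp_deg.get(u, 0.0) + p
        exp_deg[v] = exp_deg.get(v, 0.0) + p
    return exp_deg


def total_discrepancy(edges: Iterable[Edge], prob: Dict[Edge, float]) -> float:
    """Delta(G): sum of |deg_u(G) - expected degree of u|."""
    dis = {u: -d for u, d in expected_degrees(prob).items()}
    for u, v in edges:
        dis[u] += 1
        dis[v] += 1
    return sum(abs(d) for d in dis.values())


def adr(prob: Dict[Edge, float], k: int, seed: int = 0) -> List[Edge]:
    rng = random.Random(seed)
    exp_deg = expected_degrees(prob)
    # Initialization: sort E by decreasing probability, keep e if x <= p_e.
    edges = sorted(prob, key=lambda e: prob[e], reverse=True)
    e0: List[Edge] = [e for e in edges if rng.random() <= prob[e]]
    in_e0 = set(e0)
    cand: List[Edge] = [e for e in edges if e not in in_e0]  # C = E \ E0
    dis = {u: -d for u, d in exp_deg.items()}
    for u, v in e0:
        dis[u] += 1
        dis[v] += 1
    for _ in range(k):
        for node in exp_deg:
            incident = [i for i, e in enumerate(e0) if node in e]  # I
            if not incident or not cand:
                continue
            i, j = rng.choice(incident), rng.randrange(len(cand))
            (u, v), (x, y) = e0[i], cand[j]
            change: Counter = Counter()  # net degree change of each touched endpoint
            change[u] -= 1
            change[v] -= 1
            change[x] += 1
            change[y] += 1
            gain = sum(abs(dis[w] + c) - abs(dis[w]) for w, c in change.items())
            if gain < 0:  # overall discrepancy decreases: perform the swap
                e0[i], cand[j] = cand[j], e0[i]
                for w, c in change.items():
                    dis[w] += c
    return e0


g = {("a", "b"): 0.5, ("b", "c"): 0.9, ("a", "c"): 0.2, ("c", "d"): 0.6, ("b", "d"): 0.3}
rep = adr(g, k=10, seed=1)
print(rep, total_discrepancy(rep, g))
```

Because every swap exchanges one edge of $E_0$ for one edge of $C$, the number of edges fixed by the seeding step never changes, while $\Delta(G)$ can only go down.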

ADR example (figures only): edge probabilities; a possible world and the discrepancies; first iterations

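$d_1 + d_2 < 0$ explanation
▪ Swap: replace edge $(u, v) \in E_0$ with edge $(x, y) \in C$; removing $(u, v)$ lowers $\mathrm{dis}_u$ and $\mathrm{dis}_v$ by 1, adding $(x, y)$ raises $\mathrm{dis}_x$ and $\mathrm{dis}_y$ by 1

▪ $\mathrm{Sum}_{uv}^{\mathrm{bef}} = |\mathrm{dis}_u(G)| + |\mathrm{dis}_v(G)|$
▪ $\mathrm{Sum}_{uv}^{\mathrm{after}} = |\mathrm{dis}_u(G) - 1| + |\mathrm{dis}_v(G) - 1|$
▪ $\mathrm{Sum}_{xy}^{\mathrm{bef}} = |\mathrm{dis}_x(G)| + |\mathrm{dis}_y(G)|$
▪ $\mathrm{Sum}_{xy}^{\mathrm{after}} = |\mathrm{dis}_x(G) + 1| + |\mathrm{dis}_y(G) + 1|$

▪ $d_1 = \mathrm{Sum}_{uv}^{\mathrm{after}} - \mathrm{Sum}_{uv}^{\mathrm{bef}} = |\mathrm{dis}_u(G) - 1| + |\mathrm{dis}_v(G) - 1| - (|\mathrm{dis}_u(G)| + |\mathrm{dis}_v(G)|)$
▪ $d_2 = \mathrm{Sum}_{xy}^{\mathrm{after}} - \mathrm{Sum}_{xy}^{\mathrm{bef}} = |\mathrm{dis}_x(G) + 1| + |\mathrm{dis}_y(G) + 1| - (|\mathrm{dis}_x(G)| + |\mathrm{dis}_y(G)|)$

▪ If $d_1$ and $d_2$ are both positive, then
• $\mathrm{Sum}_{uv}^{\mathrm{after}} > \mathrm{Sum}_{uv}^{\mathrm{bef}}$
• $\mathrm{Sum}_{xy}^{\mathrm{after}} > \mathrm{Sum}_{xy}^{\mathrm{bef}}$
⁃ none of the affected nodes benefits from the swap...

▪ When $u, v, x, y$ are distinct, $d_1 + d_2$ is exactly the change in $\Delta(G)$ caused by the swap, so the swap is performed only if $d_1 + d_2 < 0$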
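The bookkeeping above can be checked numerically. This sketch computes $d_1 + d_2$ from the current discrepancies and compares it with the change in $\sum_w |\mathrm{dis}_w|$ obtained by actually applying the swap, for made-up discrepancy values on four distinct nodes. Both function names are hypothetical.

```python
from typing import Dict, Tuple

# Illustrative check; function names and discrepancy values are hypothetical.
Edge = Tuple[str, str]


def d1_plus_d2(dis: Dict[str, float], removed: Edge, added: Edge) -> float:
    (u, v), (x, y) = removed, added
    d1 = abs(dis[u] - 1) + abs(dis[v] - 1) - (abs(dis[u]) + abs(dis[v]))
    d2 = abs(dis[x] + 1) + abs(dis[y] + 1) - (abs(dis[x]) + abs(dis[y]))
    return d1 + d2


def change_by_applying(dis: Dict[str, float], removed: Edge, added: Edge) -> float:
    """Apply the swap to a copy of dis and measure the change of sum |dis_w|."""
    after = dict(dis)
    for w in removed:
        after[w] -= 1
    for w in added:
        after[w] += 1
    return sum(abs(d) for d in after.values()) - sum(abs(d) for d in dis.values())


# u, v currently have too many edges; x, y have too few.
dis = {"u": 0.7, "v": 0.6, "x": -0.8, "y": -0.5}
print(d1_plus_d2(dis, ("u", "v"), ("x", "y")))          # negative: swap accepted
print(change_by_applying(dis, ("u", "v"), ("x", "y")))  # same value up to rounding
```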


References

▪ Uncertain data
• On the representation and querying of sets of possible worlds
• A survey of uncertain data algorithms and applications

▪ Uncertain graphs
• The pursuit of a good possible world: extracting representative instances of uncertain graphs
• Uncertain graph sparsification
• Uncertain graph processing through representative instances
• Triangle-based representative possible worlds of uncertain graphs
• Clustering large probabilistic graphs
• Algorithms for mining uncertain graph data
• K-nearest neighbors in uncertain graphs


Lecture road

▪ Anomaly detection
▪ Representatives of probabilistic (uncertain) graphs
▪ Introduction to signed networks


What is a signed network?

▪ A graph $G = (V, E)$ in which each edge is mapped to a sign
▪ A sign can be positive or negative
▪ The sign of a path is the product of the signs of its edges
▪ A signed network is typically denoted by

• $\Sigma = (V, E, \sigma)$, where $\sigma: E \to \{+, -\}$ is the signature of the graph
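As a small illustration of the signature and the sign of a path, the sketch encodes + and − as +1 and −1 (an encoding choice, not part of the definition) on a made-up triangle u, v, k, so that the sign of a path, or of a closed cycle, is the product along it. The function name is hypothetical.

```python
from math import prod
from typing import Dict, FrozenSet, List

# Hypothetical signature sigma: E -> {+1, -1} on the triangle u, v, k.
sigma: Dict[FrozenSet[str], int] = {
    frozenset(("u", "k")): +1,
    frozenset(("k", "v")): -1,
    frozenset(("u", "v")): -1,
}


def path_sign(path: List[str], sigma: Dict[FrozenSet[str], int]) -> int:
    """Sign of a path (or closed walk) = product of the signs of its edges."""
    return prod(sigma[frozenset((a, b))] for a, b in zip(path, path[1:]))


print(path_sign(["u", "k", "v"], sigma))       # (+1) * (-1) = -1
print(path_sign(["u", "k", "v", "u"], sigma))  # cycle: (+1) * (-1) * (-1) = +1
```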


(Figure: nodes u, v, k connected by edges, each labeled +/−)


What is balance?

▪ History
• Fritz Heider (psychologist) and Frank Harary (mathematician) laid the foundations of signed graphs and balance theory

• Original idea: the P-O-X model
⁃ how are social relations modeled? are they balanced?


“The enemy of my enemy is my friend”!

(Figure: a P-O-X triangle with two positive edges and one negative edge)


Example of the P-O-X model

▪ Imagine that you are person P and that O is someone you think highly of; now imagine that X is a presidential candidate you dislike, but X vehemently endorses O
▪ What would you expect to happen?


(Figure: P–O positive, O–X positive, P–X negative)

The situation is unbalanced...

P needs to agree with his friend O (and change his opinion of X), or needs to unfriend O!
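In balance theory a triangle is balanced exactly when the product of its three signs is positive. Encoding + and − as +1 and −1 (an illustrative choice), the sketch checks the P-O-X situation above and the two ways out of it; the function name is hypothetical.

```python
# Illustrative sketch; triangle_is_balanced is a hypothetical helper.
def triangle_is_balanced(s_po: int, s_ox: int, s_px: int) -> bool:
    """Balanced iff the product of the three edge signs (+1 / -1) is positive."""
    return s_po * s_ox * s_px > 0


print(triangle_is_balanced(+1, +1, -1))  # P likes O, X endorses O, P dislikes X
print(triangle_is_balanced(+1, +1, +1))  # P comes to agree with O about X
print(triangle_is_balanced(-1, +1, -1))  # P unfriends O
```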


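Balance theory

▪ A signed graph is balanced if every cycle is positive (i.e., contains an even number of negative edges)
▪ Theorem 1: $G$ is balanced if and only if, for every pair of vertices $u, v$, all paths between $u$ and $v$ have the same sign
▪ Theorem 2: A signed graph is balanced if and only if $V$ can be bipartitioned (one part possibly empty) such that each edge between the parts is negative and each edge within a part is positive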
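Theorem 2 suggests a linear-time test: try to place every vertex on one of two sides, putting the endpoints of a positive edge on the same side and the endpoints of a negative edge on opposite sides. A conflict means some cycle has an odd number of negative edges. The breadth-first sketch below is one standard way to do this; the function name and the (u, v, sign) edge format are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

# Illustrative sketch; balanced_bipartition is a hypothetical helper.
SignedEdge = Tuple[str, str, int]  # (u, v, +1 or -1)


def balanced_bipartition(nodes: Iterable[str], edges: List[SignedEdge]) -> Optional[Dict[str, int]]:
    """Return a side (0 or 1) for each node if the graph is balanced, else None."""
    nodes = list(nodes)
    adj: Dict[str, List[Tuple[str, int]]] = {u: [] for u in nodes}
    for u, v, s in edges:
        adj[u].append((v, s))
        adj[v].append((u, s))
    side: Dict[str, int] = {}
    for start in nodes:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                want = side[u] if s > 0 else 1 - side[u]  # + same side, - opposite
                if v not in side:
                    side[v] = want
                    queue.append(v)
                elif side[v] != want:
                    return None  # a cycle with an odd number of negative edges
    return side


pox = [("P", "O", +1), ("O", "X", +1), ("P", "X", -1)]
print(balanced_bipartition(["P", "O", "X"], pox))  # None: unbalanced
```

With only positive edges every vertex lands on side 0, which is the "one part possibly empty" case of the theorem.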


Status theory [3]

▪ In balance theory, the signs are interpreted as likes/dislikes
▪ Can they also indicate another relation?

• in the context of directed social networks, the intention of the user creating the link matters


(Figure: nodes P, O, X with directed signed links)

▪ a positive link P → O: “I think O has a higher status than I do”
▪ a negative link P → O: “I think O has a lower status than I do”

[3] J. Leskovec et al. Signed networks in social media. SIGCHI 2010.


Some possible applications

▪ Modelling interactions in chemical/biological networks
▪ Social network analysis
▪ Political and economic relations


Graph Algorithms, Applications and Implementations, Charles Phillips


References

▪ More material
• Signed graphs, Matthias Beck
• Graph Algorithms, Applications and Implementations, Charles Phillips
• Harary: On the notion of balance of a signed graph
• Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Chapter 5: Positive and Negative Relationships

▪ Research problems on signed graphs
• Signed graphs in social media
• Community Mining in Signed Social Networks – An Automated Approach
• Polarity Related Influence Maximization in Signed Social Networks
• Node Classification in Signed Social Networks
• Predicting Positive and Negative Links in Online Social Networks


In the next episodes …

▪ 3rd presentation date
▪ Course evaluation
▪ Exams and maybe more…!


Questions?
