Big data
Networks from data bases

Vladimir Batagelj

University of Ljubljana

Undicesima conferenza nazionale di statistica
Rome, February 20-21, 2013


Outline

1. Two-mode networks
2. Multiplication
3. Derived networks
4. Pajek


Example: Internet Movie Data Base

[Figure: network linking three films (Die Another Day, Casino Royale, Skyfall) to the people associated with them: Lee Tamahori, Martin Campbell, Paul Haggis, Sam Mendes, Neal Purvis, Robert Wade, John Logan, Ian Fleming, Pierce Brosnan, Daniel Craig, Judi Dench, Halle Berry, Javier Bardem, Ralph Fiennes, Eva Green, Mads Mikkelsen.]

On February 17, 2013 IMDB (Internet Movie Data Base) contained 2,262,638 titles and 4,745,392 names.

Other data bases of this kind: Web of Science, Scopus, Zentralblatt MATH, Google Scholar, DBLP, Amazon, etc.


Two mode networks from data bases

A simple data base $B$ is a set of records $B = \{R_k : k \in K\}$, where $K$ is the set of keys. A record has the form $R_k = (k, q_1(k), q_2(k), \ldots, q_r(k))$, where $q_i(k)$ is the description of the property (attribute) $q_i$ for the key $k$.

Suppose that the description $q(k)$ takes values in a finite set $Q$. A description can always be transformed into this form by partitioning the set $Q$ and recoding the values. Then we can assign to the property $q$ a two-mode network $K \times q = (K, Q, L, w)$, where $(k, v) \in L$ iff $v \in q(k)$, and $w(k, v)$ is the weight of the link $(k, v)$; often $w(k, v) = 1$.

Single-valued properties can be represented by a partition.

Examples:
(papers, authors, was written by),
(papers, keywords, is described by),
(parliamentarians, problems, positive vote),
(persons, journals, is reading),
(persons, societies, is member of, years of membership),
(buyers/consumers, goods, bought, quantity), etc.
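The construction of a two-mode network from records can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the paper identifiers and the author names are hypothetical, and every link gets the weight 1.

```python
# Hypothetical records R_k = (k, q(k)): the key k is a paper id,
# the description q(k) is the set of its authors.
records = {
    "p1": {"Ann", "Bob"},
    "p2": {"Bob"},
    "p3": {"Ann", "Bob", "Cid"},
}

def two_mode_network(records):
    """Return (K, Q, w) for the network K x q; w[(k, v)] = 1 iff v is in q(k)."""
    K = sorted(records)                                           # first mode: keys
    Q = sorted({v for values in records.values() for v in values})  # second mode: values
    w = {(k, v): 1 for k in K for v in records[k]}                # links with weight 1
    return K, Q, w

K, Q, w = two_mode_network(records)
print(len(K), len(Q), len(w))
```

Storing only the existing links (a dictionary keyed by pairs) keeps the memory proportional to the number of links rather than to $|K| \cdot |Q|$.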


Methods: degree distributions

In a network $(V, L)$ the degree $\deg(v)$ of a vertex $v \in V$ is equal to the number of links that have the vertex $v$ as an end-vertex. The indegree / outdegree is equal to the number of incoming / outgoing links.

Usually one of the first analyses of a network is to look at its degree distribution(s). Are there isolated nodes ($\deg(v) = 0$)? Which nodes have the largest degrees? What is the average degree? What is the shape of the degree distribution?
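The questions above can be answered with a short script. The following is a minimal sketch, assuming an undirected network without loops given as a list of vertex pairs; the vertex names and the function names are made up for the example.

```python
from collections import Counter

def degrees(vertices, links):
    """Degree of every vertex; each link (u, v) contributes 1 to both ends."""
    deg = {v: 0 for v in vertices}
    for u, v in links:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_summary(vertices, links):
    deg = degrees(vertices, links)
    top = max(deg.values())
    return {
        "isolated": sorted(v for v, d in deg.items() if d == 0),
        "max_degree_nodes": sorted(v for v, d in deg.items() if d == top),
        "average": sum(deg.values()) / len(deg),
        # degree -> number of vertices with that degree
        "distribution": dict(sorted(Counter(deg.values()).items())),
    }

vertices = ["a", "b", "c", "d", "e"]
links = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
summary = degree_summary(vertices, links)
print(summary)
```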


Methods: two-mode cores and 4-rings weights

The subset of vertices $C \subseteq V$ is a $(p, q)$-core in a two-mode network $N = (V_1, V_2; L)$, $V = V_1 \cup V_2$, iff

a. in the induced subnetwork $K = (C_1, C_2; L(C))$, $C_1 = C \cap V_1$, $C_2 = C \cap V_2$, it holds that $\forall v \in C_1 : \deg_K(v) \geq p$ and $\forall v \in C_2 : \deg_K(v) \geq q$;

b. $C$ is the maximal subset of $V$ satisfying condition a.
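One way to find the $(p, q)$-core is to repeatedly delete the vertices that violate condition a until none is left. The sketch below follows that idea; it is not necessarily the procedure used in Pajek. It assumes the network is given as pairs $(u, v)$ with $u \in V_1$ and $v \in V_2$, and the example data are hypothetical.

```python
def pq_core(links, p, q):
    """Return (C1, C2), the two vertex sets of the (p, q)-core."""
    adj1, adj2 = {}, {}
    for u, v in links:
        adj1.setdefault(u, set()).add(v)
        adj2.setdefault(v, set()).add(u)
    changed = True
    while changed:
        changed = False
        # remove first-mode vertices with fewer than p remaining neighbours
        for u in [u for u, nb in adj1.items() if len(nb) < p]:
            for v in adj1.pop(u):
                adj2[v].discard(u)
            changed = True
        # remove second-mode vertices with fewer than q remaining neighbours
        for v in [v for v, nb in adj2.items() if len(nb) < q]:
            for u in adj2.pop(v):
                adj1[u].discard(v)
            changed = True
    return set(adj1), set(adj2)

# K_{2,2} on {a, b} x {x, y} plus two pendant links
example_links = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "x"), ("a", "z")]
core_22 = pq_core(example_links, 2, 2)
core_32 = pq_core(example_links, 3, 2)
print(core_22, core_32)
```

Because a deletion can only lower the degrees of the remaining vertices, the set that survives does not depend on the order of deletions.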

A $k$-ring is a simple closed chain of length $k$. Using $k$-rings we can define a weight of edges as

$$w_k(e) = \text{number of different } k\text{-rings containing the edge } e \in E$$

In a two-mode network there are no 3-rings. The densest substructures are complete bipartite subgraphs $K_{p,q}$. They contain many 4-rings. Therefore these weights can be used to identify the dense parts of a network.
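For a two-mode network the weight $w_4$ can be counted directly: a 4-ring through the edge $(u, v)$ is fixed by a second neighbour $u'$ of $v$ and a second common neighbour $v'$ of $u$ and $u'$. The following sketch uses this observation; it assumes a simple network given as pairs $(u, v)$, and the function name is hypothetical.

```python
def four_ring_weights(links):
    """w4[(u, v)] = number of 4-rings u - v - u2 - v2 - u containing the edge (u, v)."""
    adj1, adj2 = {}, {}
    for u, v in links:
        adj1.setdefault(u, set()).add(v)
        adj2.setdefault(v, set()).add(u)
    w4 = {}
    for u, v in links:
        count = 0
        for u2 in adj2[v]:
            if u2 != u:
                # common neighbours of u and u2, excluding v itself
                count += len(adj1[u] & adj1[u2]) - 1
        w4[(u, v)] = count
    return w4

# complete bipartite K_{2,3} plus one pendant link (c, x)
k23 = [(u, v) for u in "ab" for v in "xyz"]
w4 = four_ring_weights(k23 + [("c", "x")])
print(w4)
```

In the example every edge of $K_{2,3}$ lies on two 4-rings, while the pendant link lies on none, so a threshold on $w_4$ separates the dense part from the rest.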


Example: (247,2)-core and (27,22)-core in IMDB

[Figure: the (247,2)-core. The two titles Royal Rumble and Survivor Series are linked to a long list of performers (node labels such as Dumas, Amy; Keibler, Stacy; Angle, Kurt; Hogan, Hulk; Zhukov, Boris (I)).]

[Figure: the (27,22)-core. Wrestling titles (Fully Loaded, Invasion, King of the Ring, No Way Out, Royal Rumble, Summerslam, Survivor Series, Wrestlemania 2000, Wrestlemania X-8, Wrestlemania X-Seven, WWE Armageddon, WWE Judgment Day, WWE No Mercy, WWE No Way Out, WWE SmackDown! Vs. Raw, WWE Unforgiven, WWE Vengeance, WWE Wrestlemania X-8, WWE Wrestlemania XX, WWF Backlash, WWF Insurrextion, WWF Judgment Day, WWF No Mercy, WWF No Way Out, WWF Rebellion, WWF Unforgiven, WWF Vengeance, 'Raw Is War', 'Sunday Night Heat', 'WWE Velocity', 'WWF Smackdown!') are linked to performers (node labels such as Dumas, Amy; Stratus, Trish; Angle, Kurt; Calaway, Mark; Wight, Paul).]

IMDB 2005: n1 = 428440, n2 = 896308, m = 3792390.


Example: Islands for $w_4$ / Charlie Brown and Adult

[Figure: the Charlie Brown island, drawn with Pajek. Titles (Be My Valentine, Charlie Brown; Boy Named Charlie Brown; Charlie Brown Christmas; It's the Great Pumpkin, Charlie Brown; Snoopy Come Home; Charlie Brown and Snoopy Show; and others) are linked to names such as Altieri, Ann; Dryer, Sally; Melendez, Bill; Robbins, Peter (I); Shea, Christopher (I).]

[Figure: the Adult island, drawn with Pajek. Node labels are names such as Boy, T.T.; Byron, Tom; Jeremy, Ron; North, Peter (I); West, Randy (I).]


Sparsity and Dunbar’s number

Networks obtained from data bases are usually large: tens of thousands or millions of nodes. Large networks are usually sparse: they have a small average degree.

In one-mode networks describing relations among people this can be related to Dunbar's number, with a value around 150. See Wikipedia: Dunbar's number.

In general, if the initiator of a link wants to keep the link, he or she has to spend (invest) a certain amount of his or her finite total "energy".


Multiplication of networks

To a simple two-mode network $N = (I, J, E, w)$, where $I$ and $J$ are sets of vertices, $E$ is a set of edges linking $I$ and $J$, and $w : E \to \mathbb{R}$ (or some other semiring) is a weight, we can assign a network matrix $W = [w_{i,j}]$ with elements $w_{i,j} = w(i, j)$ for $(i, j) \in E$ and $w_{i,j} = 0$ otherwise.

Given a pair of compatible networks $N_A = (I, K, E_A, w_A)$ and $N_B = (K, J, E_B, w_B)$ with corresponding matrices $A_{I \times K}$ and $B_{K \times J}$, we call the network $N_C = (I, J, E_C, w_C)$ a product of the networks $N_A$ and $N_B$, where $E_C = \{(i, j) : i \in I, j \in J, c_{i,j} \neq 0\}$ and $w_C(i, j) = c_{i,j}$ for $(i, j) \in E_C$. The product matrix $C = [c_{i,j}]_{I \times J} = A * B$ is defined in the standard way:

$$c_{i,j} = \sum_{k \in K} a_{i,k} \cdot b_{k,j}$$

In the case when $I = K = J$ we are dealing with ordinary one-mode networks (with square matrices).
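The product of two networks can be computed without building full matrices. The sketch below stores each network as a dictionary of rows that holds only the nonzero weights; the representation, the function name and the small example are assumptions made for illustration.

```python
from collections import defaultdict

def multiply(A, B):
    """A: {i: {k: a_ik}}, B: {k: {j: b_kj}}; returns C = A * B as {i: {j: c_ij}}."""
    C = {}
    for i, row in A.items():
        acc = defaultdict(int)
        for k, a_ik in row.items():
            # only the nonzero entries of row k of B contribute
            for j, b_kj in B.get(k, {}).items():
                acc[j] += a_ik * b_kj
        nonzero = {j: c for j, c in acc.items() if c != 0}
        if nonzero:
            C[i] = nonzero          # E_C contains only pairs with c_ij != 0
    return C

A = {"i1": {"k1": 1, "k2": 1}, "i2": {"k2": 1}}
B = {"k1": {"j1": 1}, "k2": {"j1": 1, "j2": 1}}
C = multiply(A, B)
print(C)
```

With all weights equal to 1, the entry for (i1, j1) is 2 because j1 can be reached from i1 through k1 and through k2.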


Multiplication of networks

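[Figure: the matrices $A$ (rows $I$, columns $K$) and $B$ (rows $K$, columns $J$); the entries $a_{i,k}$ and $b_{k,j}$ for a vertex $k \in K$ contribute to the entry $c_{i,j}$.]

$$c_{i,j} = \sum_{k \in K} a_{i,k} \cdot b_{k,j}$$

If all weights in the networks $N_A$ and $N_B$ are equal to 1, the value of $c_{i,j}$ counts the number of ways we can go from $i \in I$ to $j \in J$ passing through $K$.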


Multiplication of networks

The standard matrix multiplication has the complexity $O(|I| \cdot |K| \cdot |J|)$; it is too slow to be used for large networks. For sparse large networks we can multiply much faster by considering only the nonzero elements.

In general the multiplication of large sparse networks is a 'dangerous' operation, since the result can 'explode': it need not be sparse.

If, for the sparse networks $N_A$ and $N_B$, there are in $K$ only a few vertices with large degree, and none of them has large degree in both networks, then the resulting product network $N_C$ is also sparse.
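A cheap way to check for an 'explosion' before multiplying is to sum $\deg_A(k) \cdot \deg_B(k)$ over $k \in K$: this is the number of elementary products a sparse multiplication performs, and therefore an upper bound on the number of nonzero entries of $C$. The sketch below is an illustration with made-up data; the function name is hypothetical.

```python
from collections import Counter

def product_cost(A_links, B_links):
    """A_links: pairs (i, k); B_links: pairs (k, j). Returns sum_k deg_A(k) * deg_B(k)."""
    deg_a = Counter(k for _, k in A_links)   # degree of k in N_A
    deg_b = Counter(k for k, _ in B_links)   # degree of k in N_B
    return sum(deg_a[k] * deg_b[k] for k in deg_a)

n = 1000
# one hub with large degree in both networks: 2n links, n*n products
hub_cost = product_cost([(i, "hub") for i in range(n)],
                        [("hub", j) for j in range(n)])
# every k has degree 1 in both networks: 2n links, n products
flat_cost = product_cost([(i, i) for i in range(n)],
                         [(k, k) for k in range(n)])
print(hub_cost, flat_cost)
```

Both inputs have 2000 links, yet the first product has a million nonzero entries and the second only a thousand.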


Derived networks

From a bibliographical data base we get the two-mode networks $WA$ = Works × Authors and $WK$ = Works × Keywords. Since they have a common set Works, the networks $WA^T$ and $WK$ are compatible, and multiplying them we obtain a derived network

$$AK = WA^T * WK$$

The entry $ak_{it}$ = the number of times author $i$ used keyword $t$ in his/her works.
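For binary networks the derived network $AK$ can be obtained by walking over the works once. A minimal sketch, with hypothetical works, authors and keywords:

```python
from collections import defaultdict

def derived_network(WA, WK):
    """WA: {work: authors}, WK: {work: keywords}; returns AK[(author, keyword)]."""
    AK = defaultdict(int)
    for work, authors in WA.items():
        for author in authors:
            for keyword in WK.get(work, ()):
                AK[(author, keyword)] += 1   # one more work of this author with this keyword
    return dict(AK)

WA = {"w1": {"Ann", "Bob"}, "w2": {"Ann"}}
WK = {"w1": {"cores"}, "w2": {"cores", "islands"}}
AK = derived_network(WA, WK)
print(AK)
```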

The dataset of EU projects on simulation (January 2006) contains data about research groups. We obtain the networks $P$ = Groups × Projects, $C$ = Groups × Countries, and $U$ = Groups × Institutions. Sizes: |Groups| = 8869, |Projects| = 933, |Institutions| = 3438, |Countries| = 60.

In the derived network $W$ = Projects × Institutions = $P^T * U$ we determine link islands for $w_4$.


Analysis of Projects × Institutions

[Figure: islands in the Projects × Institutions network, drawn with Pajek. Node labels are project identifiers (e.g. 502909, IST-2000-30082, G4RD-CT-2000-00395, 7210-PR/142) and institution names (e.g. AIRBUS DEUTSCHLAND, AIRBUS FRANCE SAS, BAE SYSTEMS, DAIMLER CHRYSLER AG, THYSSENKRUPP STAHL A.G., VOLKSWAGEN AG).]


Collaboration networks

Let $WA$ be the works × authors two-mode network; $wa_{pi} \in \{0, 1\}$ describes the authorship of author $i$ of work $p$.

$$\sum_{i \in A} wa_{pi} = \deg(p) = \text{number of authors of work } p$$

Let $N$ be its normalized version, $\forall p \in W : \sum_{i \in A} n_{pi} = 1$, obtained from $WA$ by $n_{pi} = wa_{pi} / \deg(p)$, or by some other rule determining the author's contribution.

The first collaboration network is $Co = WA^T * WA$:

$$co_{ij} = \sum_{p \in W} wa_{pi} \, wa_{pj} = \sum_{p \in N(i) \cap N(j)} 1$$

$co_{ij}$ = the number of works that authors $i$ and $j$ wrote together.

Problem: the $Co$ network is composed of complete graphs on the sets of authors of individual works. Works with many authors produce large complete subgraphs.
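A small sketch of $Co$ and of the normalized network $N$, assuming each work is given as a list of distinct authors (the works and names are invented for the example):

```python
from collections import defaultdict
from itertools import product

works = {"p1": ["Ann", "Bob"], "p2": ["Ann"], "p3": ["Ann", "Bob", "Cid"]}

def first_collaboration(works):
    """co[(i, j)] = number of works that authors i and j wrote together."""
    co = defaultdict(int)
    for authors in works.values():
        # every work adds a complete graph (with loops) on its authors
        for i, j in product(authors, repeat=2):
            co[(i, j)] += 1
    return dict(co)

def normalize(works):
    """n[p][i] = wa_pi / deg(p): equal shares for the authors of work p."""
    return {p: {i: 1 / len(authors) for i in authors} for p, authors in works.items()}

co = first_collaboration(works)
n = normalize(works)
print(co[("Ann", "Bob")], co[("Ann", "Ann")], sum(co.values()))
```

A work with $d$ authors adds $d^2$ entries to $Co$, which is why works with many authors dominate the network.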


Cores of orders 10–21 in Computational Geometry

[Figure: cores of orders 10–21 in the Computational Geometry network. Node labels are authors, e.g. L.J.Guibas, M.Sharir, B.Chazelle, H.Edelsbrunner, P.K.Agarwal, J.S.B.Mitchell, J.O'Rourke, D.Eppstein, R.Tamassia.]


$p_S$-core at level 46 of Computational Geometry

[Figure: the $p_S$-core at level 46. Node labels are authors, e.g. L.Guibas, M.Sharir, B.Chazelle, J.Snoeyink, H.Edelsbrunner, P.Agarwal, M.Overmars, R.Tamassia, J.O'Rourke.]


Second collaboration network

The second collaboration network is $Cn = WA^T * N$:

$$cn_{ij} = \sum_{p \in W} wa_{pi} \, n_{pj} = \sum_{p \in N(i) \cap N(j)} n_{pj}$$

$cn_{ij}$ = the contribution of author $j$ to works that (s)he wrote together with the author $i$.

It holds

$$\sum_{i \in A} \sum_{j \in A} wa_{pi} \, n_{pj} = \deg(p) \quad \text{and} \quad \sum_{j \in A} cn_{ij} = \deg(i)$$

$cn_{ii} = \sum_{p \in N(i)} n_{pi}$ is the contribution of author $i$ to his/her works.

Self-sufficiency: $S_i = \dfrac{cn_{ii}}{\deg(i)}$

Collaborativeness (co-authorship index): $K_i = 1 - S_i$
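The indices can be computed directly from the list of works. The sketch below assumes equal shares $n_{pi} = 1 / \deg(p)$ and distinct authors within a work; the data and function names are hypothetical.

```python
from collections import defaultdict

works = {"p1": ["Ann", "Bob"], "p2": ["Ann"], "p3": ["Ann", "Bob", "Cid"]}

def second_collaboration(works):
    """cn[(i, j)] = sum over joint works p of n_pj, with n_pj = 1 / deg(p)."""
    cn = defaultdict(float)
    for authors in works.values():
        share = 1 / len(authors)
        for i in authors:
            for j in authors:
                cn[(i, j)] += share
    return dict(cn)

def self_sufficiency(works):
    """S_i = cn_ii / deg(i), where deg(i) is the number of works of author i."""
    cn = second_collaboration(works)
    deg = defaultdict(int)
    for authors in works.values():
        for i in authors:
            deg[i] += 1
    return {i: cn[(i, i)] / deg[i] for i in deg}

cn = second_collaboration(works)
S = self_sufficiency(works)
K = {i: 1 - s for i, s in S.items()}   # collaborativeness
print(S, K)
```

For Ann, $cn_{ii} = 1/2 + 1 + 1/3 = 11/6$ over 3 works, so $S = 11/18$; an author who only writes alone would have $S = 1$ and $K = 0$.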


The "best" authors in Statistics

Columns: contrib = $cn_{ii}$, pap = number of papers $\deg(i)$, self = $S_i$ = contrib / pap, collab = $K_i = 1 - S_i$.

      name         contrib     pap   self       collab
  1.  Burt R       83.716667   96    0.872049   0.127951
  2.  Newman M     59.533333   87    0.684291   0.315709
  3.  Doreian P    59.070408   75    0.787605   0.212395
  4.  Bonacich P   45.416667   59    0.769774   0.230226
  5.  Marsden P    41.000000   50    0.820000   0.180000
  6.  White H      39.986111   51    0.784041   0.215959
  7.  Wellman B    38.754762   57    0.679908   0.320092
  8.  Friedkin N   36.333333   40    0.908333   0.091667
  9.  Leydesdo L   34.533333   47    0.734752   0.265248
 10.  Borgatti S   30.469048   57    0.534545   0.465455
 11.  Freeman L    30.250000   36    0.840278   0.159722
 12.  Everett M    27.450000   45    0.610000   0.390000
 13.  Litwin H     26.166667   32    0.817708   0.182292
 14.  Snijders T   23.920408   42    0.569534   0.430466
 15.  Skvoretz J   23.691667   39    0.607479   0.392521
 16.  Breiger R    23.520408   30    0.784014   0.215986
 17.  Krackhar D   22.031519   35    0.629472   0.370528
 18.  Valente T    21.616667   44    0.491288   0.508712
 19.  Barabasi A   18.755159   42    0.446551   0.553449
 20.  Mizruchi M   18.333333   25    0.733333   0.266667
 21.  Carley K     17.616667   35    0.503333   0.496667
 22.  Cohen C      17.111111   32    0.534722   0.465278
 23.  Moody J      16.916667   22    0.768939   0.231061
 24.  Rothenbe R   16.492063   40    0.412302   0.587698
 25.  Pattison P   16.483333   34    0.484804   0.515196
 26.  Batagelj V   16.353741   29    0.563922   0.436078
 27.  Lazega E     16.000000   20    0.800000   0.200000
 28.  Latkin C     15.896032   49    0.324409   0.675591
 29.  Wasserma S   15.803741   33    0.478901   0.521099
 30.  Berkman L    15.767857   36    0.437996   0.562004


Third collaboration network

The third collaboration network is $Ct = N^T * N$; $ct_{ij}$ = the total contribution of the collaboration of authors $i$ and $j$ to works.

It holds $ct_{ij} = ct_{ji}$, $\sum_{i \in A} \sum_{j \in A} ct_{ij} = |W|$ and, for each work $p$, $\sum_{i \in A} \sum_{j \in A} n_{pi} \, n_{pj} = 1$: the total contribution of a complete subgraph corresponding to the authors of a work is 1.

$\sum_{j \in A} ct_{ij} = \sum_{p \in W} n_{pi}$ is the total contribution of author $i$ to works from $W$.
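The stated properties of $Ct$ are easy to check numerically. A minimal sketch, again assuming equal shares $n_{pi} = 1 / \deg(p)$ and hypothetical data:

```python
from collections import defaultdict

works = {"p1": ["Ann", "Bob"], "p2": ["Ann"], "p3": ["Ann", "Bob", "Cid"]}

def third_collaboration(works):
    """ct[(i, j)] = sum over works p of n_pi * n_pj, with n_pi = 1 / deg(p)."""
    ct = defaultdict(float)
    for authors in works.values():
        share = 1 / len(authors)
        for i in authors:
            for j in authors:
                ct[(i, j)] += share * share   # each work distributes a total of 1
    return dict(ct)

ct = third_collaboration(works)
total = sum(ct.values())                                       # should equal |W|
ann_total = sum(v for (i, j), v in ct.items() if i == "Ann")   # sum_p n_p,Ann
print(total, ann_total)
```

Unlike $Co$, a work with many authors contributes a total weight of only 1 to $Ct$, spread over its complete subgraph.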


Components in SN5 cut at level 0.5

Network SN5 (2008): for "social network*" + most frequent references + around 100 social networkers; $|W| = 193376$, $|C| = 7950$, $|A| = 75930$, $|J| = 14651$, $|K| = 29267$

[Figure: components in SN5 cut at level 0.5, drawn with Pajek. Node labels are authors, e.g. Borgatti_S, Cross_R, Carley_K, Wasserma_S, Newman_M, Barabasi_A, Batagelj_V, Doreian_P, Ferligoj_A.]


Authors’ citations network

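[Figure: the product of the matrices $WA^T$ (rows $A$, columns $W$), $Ci$ (rows $W$, columns $W$) and $WA$ (rows $W$, columns $A$); the entries $wa_{s,i}$, $ci_{s,t}$ and $wa_{t,j}$ for works $s$ and $t$ contribute to the entry for authors $i$ and $j$.]

$Ca = WA^T * Ci * WA$ is a network of citations between authors. The weight $w(i, j)$ counts the number of times a work authored by $i$ cites a work authored by $j$.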
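Instead of two matrix products, $Ca$ can be accumulated by walking over the citation pairs. A sketch under the assumption of binary networks, with invented works and citations:

```python
from collections import defaultdict

def author_citations(works, cites):
    """works: {work: authors}; cites: pairs (s, t) meaning work s cites work t.

    Returns ca[(i, j)] = number of times a work by i cites a work by j.
    """
    ca = defaultdict(int)
    for s, t in cites:
        for i in works.get(s, ()):
            for j in works.get(t, ()):
                ca[(i, j)] += 1
    return dict(ca)

works = {"p1": ["Ann", "Bob"], "p2": ["Ann"], "p3": ["Cid"]}
cites = [("p1", "p3"), ("p2", "p3"), ("p2", "p1")]
ca = author_citations(works, cites)
print(ca)
```

The entry for (Ann, Ann) is a self-citation: Ann's work p2 cites p1, which she co-authored.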


Islands in SN5 authors citation network

[Figure: islands in the SN5 (2008) authors' citation network, drawn with Pajek. Node labels are authors, e.g. BORGATTI_S, BURT_R, FREEMAN_L, WASSERMA_S, NEWMAN_M, BARABASI_A, GRANOVET_M, WATTS_D, BATAGELJ_V; one node is labelled UNKNOWN.]


ESNA Pajek

Pajek, a program for the analysis and visualization of large networks, is freely available for noncommercial use at its web site:

http://pajek.imfm.si/

An introduction to social network analysis with Pajek is available in the book ESNA (de Nooy, Mrvar, Batagelj 2005). The second, extended edition appeared in September 2011.

ESNA in Japanese was published by Tokyo Denki University Press in 2010; the Chinese edition followed in November 2012.

Pajek 2.* → Pajek 3.*


References

Batagelj, V.: Social Network Analysis, Large-Scale. In: R.A. Meyers, ed., Encyclopedia of Complexity and Systems Science, Springer, 2009: 8245-8265.

Batagelj, V., Cerinsek, M.: On bibliographic networks. Scientometrics (2013). DOI: 10.1007/s11192-012-0940-1.

Batagelj, V., Mrvar, A.: Analysis of Kinship Relations With Pajek. Social Science Computer Review 26(2), 224-246, 2008.

The work was supported in part by the ARRS, Slovenia, grant P1-0294, as well as by grant N1-0011 within the EUROCORES Programme EUROGIGA (project GReGAS) of the European Science Foundation.

http://pajek.imfm.si/lib/exe/fetch.php?media=pub:cns11.pdf
