
High-Performance Computing Challenge - April 21, 2008

The Network Data Structure in Computing

Marko A. Rodriguez

Los Alamos National Laboratory

Vrije Universiteit Brussel


http://cnls.lanl.gov/~marko


About me.

• Marko Antonio Rodriguez.

• Bachelor of Science in Cognitive Science from U.C. San Diego.

• Minor in the Arts in Computer Music from U.C. San Diego.

• Master of Science in Computer Science from U.C. Santa Cruz.

• Visiting Researcher at the Center for Evolution, Complexity, and Cognition at the Free University of Brussels.

• Ph.D. in Computer Science from U.C. Santa Cruz.

• Researcher at the Los Alamos National Laboratory since 2005.


Research projects.

• MESUR: Metrics from Scholarly Usage of Resources. (http://www.mesur.org)

• Neno/Fhat: A Semantic Network Programming Language and Virtual Machine Architecture. (http://neno.lanl.gov)

• CDMS: Collective Decision Making Systems. (http://cdms.lanl.gov)


What is a network?

• A network is a data structure that connects vertices (nodes, dots) by means of edges (links, lines).

• Networks are everywhere.
o Social: friendship, trust, communication, collaboration.
o Technological: web pages, communication, software dependencies, circuits.
o Scholarly: journals, authors, articles, institutions.
o Natural: protein interaction, neural, food web.


The undirected network.

• First, there is the undirected network of common knowledge.
o Sometimes called an undirected single-relational network.
o For example, vertex i and vertex j are “related”.

• The edge semantic determines the network type.
o For example, a friendship network, a collaboration network, etc.

Figure: vertices i and j joined by an undirected edge.
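One common way to store an undirected single-relational network is an adjacency map in which every edge is recorded under both of its endpoints, so that "i is related to j" and "j is related to i" are the same fact. A minimal sketch, assuming vertices are plain strings; the function names and the extra j-k edge are hypothetical:

```python
from collections import defaultdict


def undirected_network(edges):
    """Build an undirected adjacency map: each edge {i, j} is stored under both endpoints."""
    adj = defaultdict(set)
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)  # symmetry: i related to j implies j related to i
    return adj


def degree(adj, v):
    """Number of vertices adjacent to v (0 for an unknown vertex)."""
    return len(adj.get(v, ()))


# Hypothetical edges: i-j (as in the figure) plus j-k.
net = undirected_network([("i", "j"), ("j", "k")])
print(sorted(net["j"]))                   # ['i', 'k']
print("i" in net["j"], "j" in net["i"])   # True True
print(degree(net, "j"))                   # 2
```

Using sets means a repeated edge is stored only once, which matches the single-relational view where two vertices are either related or not.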


Example undirected network.

Figure: an undirected network whose vertices are the people Herbert, Marko, Aric, Ed, Zhiwu, Alberto, Jen, Johan, Luda, Stephan, and Whenzong.


The directed network.

• Then there is the directed network of common knowledge.
o Sometimes called a directed single-relational network.
o For example, vertex i is related to vertex j, but j is not necessarily related to i.

Figure: a directed edge from vertex i to vertex j.
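In a directed network an edge from i to j says nothing about an edge from j to i, so only the stated direction is recorded. A minimal sketch, with a hypothetical `DirectedNetwork` class that keeps out- and in-adjacency separately so that both out-degree and in-degree are cheap to read:

```python
from collections import defaultdict


class DirectedNetwork:
    """Directed single-relational network stored as out- and in-adjacency sets."""

    def __init__(self):
        self.out_edges = defaultdict(set)
        self.in_edges = defaultdict(set)

    def add_edge(self, i, j):
        # Record i -> j only; no reverse edge j -> i is implied.
        self.out_edges[i].add(j)
        self.in_edges[j].add(i)

    def has_edge(self, i, j):
        return j in self.out_edges.get(i, ())

    def out_degree(self, v):
        return len(self.out_edges.get(v, ()))

    def in_degree(self, v):
        return len(self.in_edges.get(v, ()))


net = DirectedNetwork()
net.add_edge("i", "j")
print(net.has_edge("i", "j"), net.has_edge("j", "i"))  # True False
```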


Example directed network.

Figure: a directed network whose vertices are the animals Muskrat, Bear, Fish, Fox, Meerkat, Lion, Human, Wolf, Deer, Beetle, and Hyena.


The semantic network.

• Finally, there is the semantic network.
o Sometimes called a directed multi-relational network.
o For example, vertex i is related to vertex j by the semantic s, but j is not necessarily related to i by the semantic s.

Figure: a directed edge labeled s from vertex i to vertex j.
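A semantic network can be sketched as a set of (subject, predicate, object) triples, where the predicate plays the role of the edge semantic s. The triples below borrow vertex and edge labels from the example semantic network that follows, but the specific connections are illustrative assumptions, not a transcription of the figure; the helper names are hypothetical:

```python
# A semantic (directed, multi-relational) network as a set of labeled edges.
# These triples are illustrative assumptions, not an exact copy of any figure.
triples = {
    ("Marko", "livesIn", "SantaFe"),
    ("SantaFe", "cityOf", "NewMexico"),
    ("Marko", "worksWith", "Ryan"),
}


def objects(triples, subject, predicate):
    """All j such that (subject, predicate, j) is in the network."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def related(triples, i, s, j):
    """True if i is related to j by the semantic s (direction matters)."""
    return (i, s, j) in triples


print(objects(triples, "Marko", "livesIn"))            # {'SantaFe'}
print(related(triples, "Marko", "worksWith", "Ryan"))  # True
print(related(triples, "Ryan", "worksWith", "Marko"))  # False
```

The last line shows the directed, labeled nature of the edge: the triple Marko worksWith Ryan does not imply Ryan worksWith Marko unless it is stated separately.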


Example semantic network.

Figure: a semantic network over the vertices Marko, Ryan, Arnold, SantaFe, NewMexico, California, Oregon, UnitedStates, LANL, Cells, and Atoms, with edges labeled livesIn, worksWith, cityOf, originallyFrom, stateOf, locatedIn, hasLab, madeOf, researches, southOf, northOf, hasResident, and governorOf.


Google’s PageRank.

• PageRank
o Used to rank web pages that are connected by citation (hyperlinks).

Figure: a PageRank illustration. Note: this image was taken from an uncredited source on the web.


The steps to calculate a stationary probability distribution.

• Take a single “random walker”.

• Place the random walker on a randomly chosen vertex in the network.

• At every time step, the random walker transitions from its current node to an adjacent node in the network (i.e., it takes a random outgoing edge from its current node).

• Each time the random walker is at a node, increment that node’s “times visited” counter by 1.

• Let this algorithm run for an “infinite” amount of time.

• Normalize the “times visited” counters.
o That is your centrality vector.

Figure: a vertex labeled a, shown with a visit count of 1 and a normalized value of 0.0123.
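The procedure above can be sketched as a Monte Carlo simulation. Since the walk cannot run for an infinite time, this sketch runs a fixed, assumed number of steps, and it assumes every vertex has at least one outgoing edge. It computes the plain random-walk distribution the slide describes; PageRank proper also adds a damping (teleportation) factor, which is left out here. The function name and example graph are hypothetical:

```python
import random
from collections import Counter


def random_walk_centrality(out_edges, steps=100_000, seed=0):
    """Estimate the stationary distribution of a random walk by counting visits.

    out_edges maps each vertex to a list of its out-neighbors; every vertex is
    assumed to have at least one out-neighbor (no dead ends).
    """
    rng = random.Random(seed)
    current = rng.choice(sorted(out_edges))  # start on a random vertex
    visits = Counter()
    for _ in range(steps):
        visits[current] += 1                      # "times visited" counter
        current = rng.choice(out_edges[current])  # take a random outgoing edge
    total = sum(visits.values())
    return {v: visits[v] / total for v in out_edges}  # normalize


# Hypothetical example: the path a - b - c - d, each edge usable in both directions.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
ranks = random_walk_centrality(path, steps=200_000)
print({v: round(p, 3) for v, p in ranks.items()})
```

On this undirected path the long-run visit frequencies should come out close to 1/6, 1/3, 1/3, 1/6 (proportional to vertex degree); the exact printed digits depend on the seed and the number of steps.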

Random walker example.

Figure: a random walker moving over a network with the four vertices a, b, c, and d. The four “times visited” counters after each step, in the order the slides list them:

Step 0: 0, 0, 0, 0
Step 1: 1, 0, 0, 0
Step 2: 1, 0, 1, 0
Step 3: 1, 0, 1, 1
Step 4: 1, 1, 1, 1
Step 5: 1, 1, 2, 1
Step 6: 1, 2, 2, 1
Step 7: 2, 2, 2, 1
Step 8: 2, 2, 3, 1
Step 9: 2, 2, 3, 2
Step 10: 2, 3, 3, 2
Step 11: 2, 3, 4, 2
After a long run: 66785, 133310, 133321, 66784
Normalized: 0.167, 0.332, 0.332, 0.167
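Running the walker for a long time approximates the stationary distribution; the same vector can also be computed directly by power iteration, which repeatedly pushes each vertex's probability mass along its out-edges. A sketch under the same no-dead-ends assumption (every edge target must also be a key of `out_edges`). The `damping` parameter is an addition showing where PageRank's teleportation enters; with `damping=1.0` it reduces to the plain walk, which can fail to converge on periodic graphs. The example graph is hypothetical, since the slides' edges cannot be recovered:

```python
def stationary_distribution(out_edges, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for a random walk with optional teleportation (PageRank-style).

    out_edges: dict vertex -> list of out-neighbors; every vertex needs at least one.
    """
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}  # start from the uniform distribution
    for _ in range(max_iter):
        # Teleportation: with probability 1 - damping, jump to a uniformly random vertex.
        new_rank = {v: (1.0 - damping) / n for v in vertices}
        for v in vertices:
            share = damping * rank[v] / len(out_edges[v])
            for w in out_edges[v]:
                new_rank[w] += share  # follow each out-edge with equal probability
        delta = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical graph: a and b both point to c, and c points back to a.
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = stationary_distribution(graph)
print({v: round(r, 4) for v, r in ranks.items()})  # {'a': 0.4635, 'b': 0.05, 'c': 0.4865}
```

Vertex b receives no in-links, so its rank is exactly the teleportation share (1 - 0.85) / 3 = 0.05, while c, cited by both a and b, ranks highest.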


Breather.



What is the Semantic Web?

• The figurehead of the Semantic Web initiative, Tim Berners-Lee, describes the Semantic Web as

o “... an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

• Perhaps not the best definition. It implies a particular application space, namely the “web metadata and intelligent agents” space.

• My definition is that the Semantic Web is
o “a distributed, standardized semantic network data model: a URG (Uniform Resource Graph). It’s a uniform way of graphing resources.”


What is a resource?

• Resource = Anything.
o Anything that can be identified.

• The Uniform Resource Identifier (URI):
o <scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]

- http://www.lanl.gov

- urn:uuid:550e8400-e29b-41d4-a716-446655440000

- urn:issn:0892-3310

- http://www.lanl.gov#MarkoRodriguez, which can be abbreviated with a prefix to make it easier on the eyes: lanl:MarkoRodriguez

• The Semantic Web
o “first identify it, then relate it!”
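The prefix notation is just a lookup table from short prefixes to namespace URIs. A minimal sketch, assuming a hypothetical mapping of lanl to http://www.lanl.gov# (chosen to match the example above), with Python's standard `urllib.parse.urlsplit` used to pull out the URI components named in the syntax; `expand` and `compact` are illustrative helpers, not part of any RDF library:

```python
from urllib.parse import urlsplit

# Hypothetical prefix table; "lanl" -> "http://www.lanl.gov#" mirrors the example above.
PREFIXES = {"lanl": "http://www.lanl.gov#"}


def expand(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'lanl:MarkoRodriguez' into a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix}")
    return prefixes[prefix] + local


def compact(uri, prefixes=PREFIXES):
    """Inverse of expand: shorten a URI if it starts with a known namespace."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri


uri = expand("lanl:MarkoRodriguez")
print(uri)                                          # http://www.lanl.gov#MarkoRodriguez
parts = urlsplit(uri)
print(parts.scheme, parts.netloc, parts.fragment)   # http www.lanl.gov MarkoRodriguez
print(urlsplit("urn:issn:0892-3310").scheme)        # urn
```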


The technologies of the Semantic Web.

• Resource Description Framework (RDF): The foundation technology of the Semantic Web. RDF is a highly distributed, semantic network data model. In RDF, URIs and literals (e.g. ints, doubles, strings) are related to one another in subject-predicate-object triples, where a literal can appear only as the object.

o <lanl:marko> <lanl:worksWith> <lanl:jhw>
o <lanl:jhw> <lanl:wrote> <lanl:LAUR-07-2028>
o <lanl:LAUR-07-2028> <lanl:hasTitle> “Web-Based Collective Decision Making Systems”^^<xsd:string>

• RDF Schema (RDFS): The ontology is to the Semantic Web as the schema is to the relational database.

o “Anything of rdf:type lanl:Human can lanl:drive anything of rdf:type lanl:Car.”
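The three example triples can be held as plain tuples, and a basic query is pattern matching with wildcards. This is a hypothetical in-memory sketch for illustration, not the API of a real RDF store; prefixed names stay as strings and the typed literal is kept as a (value, datatype) pair:

```python
# Triples from the slide, with prefixed names kept as plain strings.
TRIPLES = [
    ("lanl:marko", "lanl:worksWith", "lanl:jhw"),
    ("lanl:jhw", "lanl:wrote", "lanl:LAUR-07-2028"),
    ("lanl:LAUR-07-2028", "lanl:hasTitle",
     ("Web-Based Collective Decision Making Systems", "xsd:string")),  # typed literal
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


# What did lanl:jhw write, and what is its title?
paper = match(TRIPLES, s="lanl:jhw", p="lanl:wrote")[0][2]
title, datatype = match(TRIPLES, s=paper, p="lanl:hasTitle")[0][2]
print(paper, title, datatype)
# lanl:LAUR-07-2028 Web-Based Collective Decision Making Systems xsd:string
```

Chaining two patterns, where the object of the first becomes the subject of the second, is the basic move behind walking a semantic network.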


RDF and RDFS.

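Figure: an ontology layer above an instance layer.
o Ontology: lanl:isEating rdfs:domain lanl:Human; lanl:isEating rdfs:range lanl:Food.
o Instance: lanl:marko lanl:isEating lanl:cookie; lanl:marko rdf:type lanl:Human; lanl:cookie rdf:type lanl:Food.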

RDF is not a syntax; it is a data model. Various syntaxes exist to encode RDF, including RDF/XML, N-Triples, TriX, N3, etc.
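The ontology/instance split in the figure can be made concrete with the RDFS domain and range rules: if a property has rdfs:domain D, any subject using it is inferred to be of rdf:type D, and if it has rdfs:range R, any object is inferred to be of rdf:type R. In the figure the rdf:type edges are drawn explicitly; this sketch shows they could also be derived. It applies only those two rules (a real RDFS reasoner handles many more), and `infer_types` is a hypothetical name:

```python
RDF_TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

ontology = {
    ("lanl:isEating", DOMAIN, "lanl:Human"),
    ("lanl:isEating", RANGE, "lanl:Food"),
}
instance = {("lanl:marko", "lanl:isEating", "lanl:cookie")}


def infer_types(triples):
    """Apply only the RDFS domain/range rules and return the newly inferred triples."""
    domains = {(p, c) for p, rel, c in triples if rel == DOMAIN}
    ranges = {(p, c) for p, rel, c in triples if rel == RANGE}
    inferred = set()
    for s, p, o in triples:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))  # subject of p belongs to p's domain
        for prop, cls in ranges:
            if p == prop:
                inferred.add((o, RDF_TYPE, cls))  # object of p belongs to p's range
    return inferred - triples


for t in sorted(infer_types(ontology | instance)):
    print(*t)
# lanl:cookie rdf:type lanl:Food
# lanl:marko rdf:type lanl:Human
```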


PageRank in a semantic network?

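Figure: lanl:marko and lanl:johan each lanl:wrote lanl:p1, which is rdf:type lanl:Article; lanl:marko, lanl:johan, and lanl:chuck are rdf:type lanl:Human; lanl:chuck is connected by a lanl:hasFriend edge. Question marks stand in for the unknown rank values.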


Components of a grammar-based walker.

• A walker.
o A discrete element.

• A grammar.
o An abstract representation of the legal paths for the walker to take.

- e.g. “you can traverse a lanl:friendOf edge from a lanl:Human to another lanl:Human.”

- It also includes rules such as “increment a counter” and “don’t ever return to this vertex.”

• A data set that respects the ontological “expectations” of the grammar.
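One way to picture such a grammar in code is as a list of steps, each naming an edge label, a direction, the rdf:type the next vertex must have, and whether arriving there increments a counter. This representation, the names `Step` and `legal_moves`, and the lanl:rover / lanl:Dog data are hypothetical illustrations, not the encoding used in the cited papers:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One rule of a hypothetical walker grammar."""
    predicate: str      # edge label the walker may traverse
    direction: str      # "out" follows s -> o, "in" follows o -> s
    required_type: str  # rdf:type the destination must have
    increment: bool     # whether arriving increments a counter


def has_type(triples, v, cls):
    return (v, "rdf:type", cls) in triples


def legal_moves(triples, vertex, step):
    """Vertices one move away from `vertex` that the grammar step permits.

    The current vertex's own type is assumed to have been checked by the previous step.
    """
    if step.direction == "out":
        candidates = {o for s, p, o in triples if s == vertex and p == step.predicate}
    else:
        candidates = {s for s, p, o in triples if o == vertex and p == step.predicate}
    return {v for v in candidates if has_type(triples, v, step.required_type)}


# Hypothetical data for "traverse a lanl:friendOf edge from a lanl:Human to another lanl:Human".
triples = {
    ("lanl:marko", "lanl:friendOf", "lanl:johan"),
    ("lanl:marko", "lanl:friendOf", "lanl:rover"),
    ("lanl:marko", "rdf:type", "lanl:Human"),
    ("lanl:johan", "rdf:type", "lanl:Human"),
    ("lanl:rover", "rdf:type", "lanl:Dog"),
}
friend_step = Step("lanl:friendOf", "out", "lanl:Human", increment=True)
print(legal_moves(triples, "lanl:marko", friend_step))  # {'lanl:johan'}
```

The data set has to respect the grammar's ontological expectations: if lanl:johan were not typed as a lanl:Human, the walker would have no legal move.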

Grammar-based PageRank example.

Figure: lanl:marko and lanl:johan each lanl:wrote lanl:p1, which is rdf:type lanl:Article; lanl:marko, lanl:johan, and lanl:chuck are rdf:type lanl:Human; lanl:chuck is connected by a lanl:hasFriend edge. Each lanl:Human carries a counter.

“Take only a lanl:wrote out-edge to a resource of rdf:type lanl:Article. Then take a lanl:wrote in-edge to a resource of rdf:type lanl:Human. Increment only lanl:Humans. Make sure that the lanl:Human seen before is not the same as the lanl:Human seen currently. Repeat infinitely.”

The three lanl:Human counters after each step, in the order the slides list them:

Step 0: 0, 0, 0
Step 1: 1, 0, 0
Step 2: 1, 0, 0
Step 3: 1, 0, 1
Step 4: 1, 0, 1
Step 5: 2, 0, 1
Step 6: 2, 0, 1

The counters do not change on even steps, consistent with the walker then being on the lanl:Article, which the grammar does not increment.
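The grammar quoted above can be sketched as a walker that alternates between its two rules, counts only lanl:Human visits, and never steps back to the human it just left. The triples are a hypothetical reading of the figure; in particular, the target of the lanl:hasFriend edge is assumed to be lanl:marko. The walk runs for a fixed number of iterations instead of infinitely, starts on a caller-chosen vertex (the grammar does not say where to start), and each loop iteration covers two edge traversals (human, article, human):

```python
import random
from collections import Counter

# Hypothetical reading of the figure; the lanl:hasFriend target is an assumption.
TRIPLES = {
    ("lanl:marko", "lanl:wrote", "lanl:p1"),
    ("lanl:johan", "lanl:wrote", "lanl:p1"),
    ("lanl:chuck", "lanl:hasFriend", "lanl:marko"),
    ("lanl:p1", "rdf:type", "lanl:Article"),
    ("lanl:marko", "rdf:type", "lanl:Human"),
    ("lanl:johan", "rdf:type", "lanl:Human"),
    ("lanl:chuck", "rdf:type", "lanl:Human"),
}


def is_a(v, cls):
    return (v, "rdf:type", cls) in TRIPLES


def coauthor_walk(start, steps, seed=0):
    """Walk Human -wrote-> Article <-wrote- Human, counting only lanl:Human visits."""
    rng = random.Random(seed)
    counts = Counter({v: 0 for v in ("lanl:marko", "lanl:johan", "lanl:chuck")})
    human = start
    counts[human] += 1
    for _ in range(steps):
        # Rule 1: take a lanl:wrote out-edge to a lanl:Article.
        articles = sorted(o for s, p, o in TRIPLES
                          if s == human and p == "lanl:wrote" and is_a(o, "lanl:Article"))
        if not articles:
            break  # no legal move: the grammar offers no way forward
        article = rng.choice(articles)
        # Rule 2: take a lanl:wrote in-edge to a lanl:Human other than the one just left.
        authors = sorted(s for s, p, o in TRIPLES
                         if o == article and p == "lanl:wrote"
                         and is_a(s, "lanl:Human") and s != human)
        if not authors:
            break
        human = rng.choice(authors)
        counts[human] += 1  # increment only lanl:Humans
    return counts


print(dict(coauthor_walk("lanl:marko", steps=10)))
# {'lanl:marko': 6, 'lanl:johan': 5, 'lanl:chuck': 0}
```

Because lanl:chuck has no lanl:wrote edge, the grammar never reaches him, so his counter stays at 0; the lanl:hasFriend edge is simply ignored by this grammar.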


Grammars create implicit relationships.

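Figure: the network of lanl:marko, lanl:johan, lanl:chuck, and lanl:p1, where the grammar’s path (a lanl:Human, a lanl:wrote out-edge to a lanl:Article, a lanl:wrote in-edge to a different lanl:Human) yields an implicit lanl:hasCoauthor relationship between lanl:marko and lanl:johan.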
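The implicit relationship can be materialized by collecting every pair of distinct lanl:Humans that the grammar's two-step path connects. A sketch over a hypothetical reading of the figure's triples (the lanl:hasFriend edge is left out because the grammar never uses it); `infer_coauthors` is an illustrative name, and writing the derived triples out explicitly is one option among others, such as computing them on demand:

```python
from itertools import permutations

# Hypothetical reading of the figure; lanl:hasFriend omitted since the grammar ignores it.
TRIPLES = {
    ("lanl:marko", "lanl:wrote", "lanl:p1"),
    ("lanl:johan", "lanl:wrote", "lanl:p1"),
    ("lanl:p1", "rdf:type", "lanl:Article"),
    ("lanl:marko", "rdf:type", "lanl:Human"),
    ("lanl:johan", "rdf:type", "lanl:Human"),
    ("lanl:chuck", "rdf:type", "lanl:Human"),
}


def infer_coauthors(triples):
    """Derive (a, lanl:hasCoauthor, b) for distinct Humans who wrote the same Article."""
    def is_a(v, cls):
        return (v, "rdf:type", cls) in triples

    authors_of = {}
    for s, p, o in triples:
        if p == "lanl:wrote" and is_a(s, "lanl:Human") and is_a(o, "lanl:Article"):
            authors_of.setdefault(o, set()).add(s)
    return {
        (a, "lanl:hasCoauthor", b)
        for authors in authors_of.values()
        for a, b in permutations(sorted(authors), 2)  # ordered pairs with a != b
    }


for t in sorted(infer_coauthors(TRIPLES)):
    print(*t)
# lanl:johan lanl:hasCoauthor lanl:marko
# lanl:marko lanl:hasCoauthor lanl:johan
```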


Conclusions.

• Many systems can be represented as a network.

• The semantic network is a more expressive, though less studied, data model.

• The grammar technique can be used to port many of the common network analysis algorithms to the semantic network domain.


Related publications.

• Rodriguez, M.A., Watkins, J.H., Bollen, J., Gershenson, C., “Using RDF to Model the Structure and Process of Systems”, International Conference on Complex Systems, Boston, Massachusetts, LA-UR-07-5720, October 2007.

• Rodriguez, M.A., Bollen, J., Van de Sompel, H., “A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage”, 2007 ACM/IEEE Joint Conference on Digital Libraries, pages 278-287, Vancouver, Canada, ACM/IEEE Computing, doi:10.1145/1255175.1255229, LA-UR-07-0665, June 2007.

• Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based Particle Swarms”, 2007 Hawaii International Conference on System Sciences (HICSS), pages 39-49, Waikoloa, Hawaii, IEEE Computer Society, ISSN: 1530-1605, doi:10.1109/HICSS.2007.487, LA-UR-06-2139, January 2007.

• Rodriguez, M.A., “A Multi-Relational Network to Support the Scholarly Communication Process”, International Journal of Public Information Systems, volume 2007, issue 1, pages 13-29, ISSN: 1653-4360, LA-UR-06-2416, March 2007.

• Rodriguez, M.A., “Mapping Semantic Networks to Undirected Networks”, LA-UR-07-5287, August 2007.

• Rodriguez, M.A., Watkins, J.H., “Grammar-Based Geodesics in Semantic Networks”, LA-UR-07-4042, June 2007.

• Rodriguez, M.A., Bollen, J., “Modeling Computations in a Semantic Network”, LA-UR-07-3678, May 2007.

• Rodriguez, M.A., “General-Purpose Computing on a Semantic Network Substrate”, LA-UR-07-2885, April 2007.

• Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks”, Knowledge-Based Systems, Elsevier, LA-UR-06-7791, in press, 2007.