
The Graph Traversal Pattern


NoSQL Presentation of "The Graph Traversal Pattern" paper from Marko A. Rodriguez and Peter Neubauer.


The Graph Traversal Pattern (Marko A. Rodriguez, Peter Neubauer)

Igor Bogicevic ([email protected])

September 5, 2010

Outline

- Introduction
- The Realization of Graphs
  - Brief Introduction
  - The Indices of Relational Tables
  - The Graph as an Index
- Graph Traversals
  - Definition of Graph Traversals
  - Traversing for Recommendation
  - Content-Based Recommendation
  - Collaborative Filtering-Based Recommendation
  - Traversing Endogenous Indices
- Conclusion

Introduction

- In the most common sense of the term, a graph is an ordered pair G = (V, E) comprising a set V of vertices (or nodes) together with a set E of edges (or lines), which are 2-element subsets of V. That is, each edge relates two vertices, and the relation is represented as an unordered pair of those vertices. To avoid ambiguity, this type of graph may be described precisely as undirected and simple.

- For directed graphs, E ⊆ (V × V), and for undirected graphs, E ⊆ {V × V}. That is, E is a set of ordered pairs of vertices in the directed case and a set of unordered pairs in the undirected case.

- In different contexts it may be useful to define the term graph with different degrees of generality. Whenever a strict distinction is needed, more specific terms are used. Most commonly, in modern texts on graph theory, "graph" means "undirected simple finite graph" unless stated otherwise.

- Other types of graphs include mixed graphs, multigraphs, and weighted graphs.
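The difference between the two edge-set definitions can be made concrete with a small sketch. The vertex labels and edges below are hypothetical, chosen only for illustration: ordered pairs are modelled as tuples and unordered pairs as frozensets.

```python
# Hypothetical vertex set; the labels 1..4 are chosen only for illustration.
V = {1, 2, 3, 4}

# Directed graph: E is a subset of V x V, so each edge is an ordered pair.
E_directed = {(1, 2), (2, 1), (1, 3)}

# Undirected simple graph: each edge is a 2-element subset of V,
# so (1, 2) and (2, 1) collapse into the same edge {1, 2}.
E_undirected = {frozenset(pair) for pair in E_directed}


def is_directed_edge_set(vertices, edges):
    """Check that every edge is an ordered pair drawn from V x V."""
    return all(len(e) == 2 and e[0] in vertices and e[1] in vertices for e in edges)


def is_undirected_simple_edge_set(vertices, edges):
    """Check that every edge is a 2-element subset of V (so no loops)."""
    return all(len(e) == 2 and e <= vertices for e in edges)


print(len(E_directed))    # 3 ordered pairs
print(len(E_undirected))  # 2 unordered pairs
```

Using frozensets makes the "unordered pair" property automatic: the two directed edges between vertices 1 and 2 become a single undirected edge.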


Brief Introduction

- Relational databases have been around since the late 1960s (Edgar F. Codd, "A Relational Model of Data for Large Shared Data Banks") and are today's most predominant data management tool. Relational databases maintain a collection of tables. Each table can be defined by a set of rows and a set of columns. Semantically, rows denote objects and columns denote properties/attributes.

- In contrast, graph databases do not store data in disparate tables. Instead there is a single data structure: the graph. Moreover, there is no concept of a "join" operation, as every vertex and edge has a direct reference to its adjacent vertices or edges. The data structure is already "joined" by the edges that are defined.

- Main drawback: it is hard to shard a graph.

- Main benefit: constant-time cost for retrieving an adjacent vertex or edge. In other words, regardless of the size of the graph as a whole, the cost of a local read operation at a vertex or edge remains constant.


The Indices of Relational Tables

- Without an index, data access is inefficient: finding a record among n elements requires a linear scan, which runs in O(n).

- A (balanced) binary search tree index dramatically reduces this overhead: searching such an index takes O(log₂ n).

- In effect, the join operation forms a graph that is dynamically constructed as one table is linked to another. While this has the benefit of constructing graphs dynamically, the limitation is that the graph is not explicit in the relational structure and must instead be inferred through a series of index-intensive operations.
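The cost difference between a scan and an indexed lookup can be sketched as follows. This is a minimal illustration, not how a database engine implements it: a sorted list searched with Python's `bisect` module stands in for the binary search tree, and the rows and placeholder names are hypothetical.

```python
import bisect

# Hypothetical person rows (identifier, name), stored in no particular order.
people = [(3, "Person 3"), (1, "Alberto Pepe"), (4, "Person 4"), (2, "Person 2")]


def linear_scan(rows, name):
    """O(n): inspect rows one by one until the name matches."""
    for identifier, row_name in rows:
        if row_name == name:
            return identifier
    return None


# Sorted keys searched by bisection stand in for a tree index on person.name.
name_index = sorted((name, identifier) for identifier, name in people)
index_keys = [name for name, _ in name_index]


def index_lookup(name):
    """O(log n): binary search over the sorted index keys."""
    pos = bisect.bisect_left(index_keys, name)
    if pos < len(index_keys) and index_keys[pos] == name:
        return name_index[pos][1]
    return None


print(linear_scan(people, "Alberto Pepe"))  # 1
print(index_lookup("Alberto Pepe"))         # 1
```

Both functions return the same identifier; only the number of comparisons differs as the table grows.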


The Indices of Relational Tables

person table (indexed by the person.identifier index and the person.name index):

identifier | name
1          | Alberto Pepe
2          | ...
3          | ...
4          | ...

friend table (indexed by the friend.person_a index):

person_a | person_b
1        | 2
1        | 3
1        | 4
...      | ...

Figure: A table representation of people and their friends.
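To see why a relational join is described as index-intensive, the sketch below finds the names of a person's friends using the three indices from the figure. It is an illustration under assumptions: the helper names `build_index`, `lookup`, and `friend_names` are hypothetical, sorted lists searched with `bisect` stand in for tree indices, and the names "Person 2" to "Person 4" are placeholders for the names elided in the figure.

```python
import bisect

# Rows follow the figure; names other than "Alberto Pepe" are placeholders.
person = [(1, "Alberto Pepe"), (2, "Person 2"), (3, "Person 3"), (4, "Person 4")]
friend = [(1, 2), (1, 3), (1, 4)]


def build_index(rows, col):
    """Sorted (key, row position) pairs; a stand-in for a tree index."""
    return sorted((row[col], pos) for pos, row in enumerate(rows))


def lookup(index, rows, key):
    """Binary-search the index, then collect every row with a matching key."""
    start = bisect.bisect_left(index, (key,))
    matches = []
    while start < len(index) and index[start][0] == key:
        matches.append(rows[index[start][1]])
        start += 1
    return matches


person_identifier_index = build_index(person, 0)
person_name_index = build_index(person, 1)
friend_person_a_index = build_index(friend, 0)


def friend_names(name):
    """Join person -> friend -> person; each hop is a separate index lookup."""
    names = []
    for identifier, _ in lookup(person_name_index, person, name):
        for _, person_b in lookup(friend_person_a_index, friend, identifier):
            for _, friend_name in lookup(person_identifier_index, person, person_b):
                names.append(friend_name)
    return names


print(friend_names("Alberto Pepe"))  # ['Person 2', 'Person 3', 'Person 4']
```

Every step of the join goes back through an index whose lookup cost grows with the table size, which is the overhead the graph representation avoids.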

The Graph as an Index

- A single-relational graph maintains a set of edges, where all the edges are homogeneous in meaning. This means that separate relationships (e.g. friendship, kinship, etc.) are stored as separate graphs.

- Formally, a property graph can be defined as G = (V, E, λ, µ), where edges are directed (i.e. E ⊆ (V × V)), edges are labeled (i.e. λ : E → Σ), and properties are a map from elements and keys to values (i.e. µ : (V ∪ E) × R → S).

- In the property graph model, it is common for the properties of the vertices (and sometimes edges) to be indexed using a tree structure analogous, in many ways, to those used by relational databases. This index can be maintained by an external indexing system or be endogenous to the graph as an embedded tree.

- The domain model defines how the elements of the problem space are related.

- The domain model partitions elements using semantics defined by the domain modeler. Thus, in many ways, a graph can be seen as an indexing structure.

- In a graph database, there is no explicit join operation because vertices maintain direct references to their adjacent edges. In many ways, the edges of the graph serve as explicit, "hard-wired" join structures rather than joins computed at run time.

- The act of traversing over an edge is the act of joining. What makes this more efficient in a graph database is that traversing from one vertex to an adjacent one is a constant-time operation. Thus, traversal time is determined solely by the number of elements touched by the traversal.
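One way to picture "hard-wired" joins is an in-memory property graph in which each vertex holds direct references to its incident edges. The sketch below is an assumption-laden illustration, not the storage layout of any particular graph database: the `Vertex`, `Edge`, and `add_edge` names are hypothetical, and the placeholder names stand in for values elided in the slides.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Vertex:
    properties: dict = field(default_factory=dict)  # the map mu for this vertex
    out_edges: list = field(default_factory=list)   # direct references, no index
    in_edges: list = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    tail: Vertex   # vertex the edge leaves
    head: Vertex   # vertex the edge points to
    label: str     # the value of lambda for this edge
    properties: dict = field(default_factory=dict)


def add_edge(tail, head, label, **properties):
    """Create a labeled, directed edge and wire it into both endpoints."""
    edge = Edge(tail, head, label, dict(properties))
    tail.out_edges.append(edge)
    head.in_edges.append(edge)
    return edge


alberto = Vertex({"name": "Alberto Pepe"})
others = [Vertex({"name": f"Person {n}"}) for n in (2, 3, 4)]
for other in others:
    add_edge(alberto, other, "friend")

# Traversal follows references only; its cost depends on the elements touched,
# not on how many other vertices and edges exist in the graph.
friends = [e.head.properties["name"] for e in alberto.out_edges if e.label == "friend"]
print(friends)  # ['Person 2', 'Person 3', 'Person 4']
```

Compared with a table-based join, no lookup structure is consulted after the starting vertex has been found.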


The Graph as an Index

[Figure: vertices 1 (name=Alberto Pepe), 2 (name=...), 3 (name=...), and 4 (name=...), with further vertices elided, connected by edges labeled "friend", together with a vertex.name index.]

Figure: A graph representation of people and their friends. Given the tree nature of the vertex.name index, it is possible, and often useful, to model the index endogenous to the graph.

Definition of Graph Traversals

- Graph traversal refers to visiting elements (i.e. vertices and edges) in a graph in some algorithmic fashion.

- The most primitive read-based operation on a graph is a single step traversal from element $i$ to element $j$, where $i, j \in (V \cup E)$.

- These operations are defined over power multiset domains and ranges. The power multiset of $A$, denoted $\hat{\mathcal{P}}(A)$, is the infinite set of all subsets of multisets of $A$.

- The single step traversals are:
  - $e_{\mathrm{out}} : \hat{\mathcal{P}}(V) \to \hat{\mathcal{P}}(E)$: traverse to the outgoing edges of the vertices.
  - $e_{\mathrm{in}} : \hat{\mathcal{P}}(V) \to \hat{\mathcal{P}}(E)$: traverse to the incoming edges of the vertices.
  - $v_{\mathrm{out}} : \hat{\mathcal{P}}(E) \to \hat{\mathcal{P}}(V)$: traverse to the outgoing (i.e. tail) vertices of the edges.
  - $v_{\mathrm{in}} : \hat{\mathcal{P}}(E) \to \hat{\mathcal{P}}(V)$: traverse to the incoming (i.e. head) vertices of the edges.
  - $\epsilon : \hat{\mathcal{P}}(V \cup E) \times R \to \hat{\mathcal{P}}(S)$: get the element property values for key $r \in R$.
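The five single step traversals listed above can be sketched in a few lines. This is a minimal illustration under assumptions: Python lists play the role of multisets (duplicates are kept), the edge ids, edges, and placeholder names are hypothetical, and the function names simply mirror the symbols.

```python
# Hypothetical property graph: each edge id maps to (tail, label, head).
edges = {"e1": (1, "friend", 2), "e2": (1, "friend", 3), "e3": (2, "friend", 1)}
# Properties for elements of V and E, keyed by element id.
props = {1: {"name": "Alberto Pepe"}, 2: {"name": "Person 2"}, 3: {"name": "Person 3"}}

# Adjacency lists built once, so each step is a direct lookup per element.
out_index, in_index = {}, {}
for edge_id, (tail, _label, head) in edges.items():
    out_index.setdefault(tail, []).append(edge_id)
    in_index.setdefault(head, []).append(edge_id)


def e_out(vertices):
    """Outgoing edges of each vertex in the multiset."""
    return [e for v in vertices for e in out_index.get(v, [])]


def e_in(vertices):
    """Incoming edges of each vertex in the multiset."""
    return [e for v in vertices for e in in_index.get(v, [])]


def v_out(edge_ids):
    """Tail (outgoing) vertex of each edge."""
    return [edges[e][0] for e in edge_ids]


def v_in(edge_ids):
    """Head (incoming) vertex of each edge."""
    return [edges[e][2] for e in edge_ids]


def epsilon(elements, key):
    """Property values for the key, skipping elements that lack it."""
    return [props[x][key] for x in elements if key in props.get(x, {})]


print(v_in(e_out([1])))     # [2, 3]
print(v_in(e_out([1, 1])))  # [2, 3, 2, 3]
```

The second call shows why multisets matter: starting from the same vertex twice yields each neighbor twice, which later traversals can use as a count or weight.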


Page 27: The Graph Traversal Pattern

OutlineIntroduction

The Realization of GraphsGraph Traversals

Conclusion

Definition of Graph TraversalsTraversing for RecommendationContent-Based RecommendationCollaborative Filtering-Based RecommendationTraversing Endogenous Indices

I When edges are labeled and elements have properties, it is desirable to constrain the traversal to edges of a particular label or elements with particular properties. These operations are known as filters and are abstractly defined as follows:
    I $e_{\text{lab}\pm} : \hat{P}(E) \times \Sigma \to \hat{P}(E)$: allow (or filter) all edges with the label $\sigma \in \Sigma$.
    I $\epsilon_{p\pm} : \hat{P}(V \cup E) \times R \times S \to \hat{P}(V \cup E)$: allow (or filter) all elements with the property $s \in S$ for key $r \in R$.
    I $\epsilon_{\epsilon\pm} : \hat{P}(V \cup E) \times (V \cup E) \to \hat{P}(V \cup E)$: allow (or filter) all elements that are the provided element.

I Through function composition, we can define graph traversals of arbitrary length, e.g. one that returns the names of Alberto Pepe's friends. Let $f : \hat{P}(V) \to \hat{P}(S)$, where

\[ f(i) = \epsilon\left(v_{\text{in}}\left(e_{\text{lab}+}\left(e_{\text{out}}(i), \text{friend}\right)\right), \text{name}\right). \]

If $i$ is the vertex representing Alberto Pepe, then $f(i)$ returns the names of his friends. Through function currying and composition, the previous definition can be represented more clearly with the following function rule:

\[ f(i) = \left(\epsilon^{\text{name}} \circ v_{\text{in}} \circ e^{\text{friend}}_{\text{lab}+} \circ e_{\text{out}}\right)(i). \]
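The curried, composed form of $f$ maps naturally onto higher-order functions. The sketch below is a minimal illustration, not the authors' implementation: the `compose` helper, the list-based multisets, and the toy data are assumptions, and the friends' names are placeholders because the figure elides them.

```python
from collections import namedtuple
from functools import reduce

Edge = namedtuple("Edge", ["tail", "label", "head"])

# Hypothetical toy data: vertex 1 has three friends and one liked resource.
EDGES = [
    Edge(1, "friend", 2),
    Edge(1, "friend", 3),
    Edge(1, "friend", 4),
    Edge(1, "likes", 5),
]
# Only "Alberto Pepe" comes from the slides; the other names are placeholders.
PROPS = {
    1: {"name": "Alberto Pepe"},
    2: {"name": "Person 2"},
    3: {"name": "Person 3"},
    4: {"name": "Person 4"},
}


def e_out(vertices):
    return [e for v in vertices for e in EDGES if e.tail == v]


def v_in(edges):
    return [e.head for e in edges]


def e_lab_plus(label):
    """Curried label filter: keep only edges carrying `label`."""
    return lambda edges: [e for e in edges if e.label == label]


def eps(key):
    """Curried property getter for `key`."""
    return lambda elements: [PROPS[x][key] for x in elements if key in PROPS.get(x, {})]


def compose(*steps):
    """compose(h, g, f)(x) == h(g(f(x))), mirroring the right-to-left circle notation."""
    return lambda x: reduce(lambda acc, step: step(acc), reversed(steps), x)


# f = eps^name . v_in . e^friend_lab+ . e_out
f = compose(eps("name"), v_in, e_lab_plus("friend"), e_out)
friend_names = f([1])
```

Reading the `compose` call from right to left reproduces the order in which the traverser moves: out along edges, keep the friend-labeled ones, step to the head vertices, then read their names.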


Figure: A single path along the $f$ traversal. Vertex 1 (name=Alberto Pepe) is connected by friend-labeled edges to vertices 2, 3, and 4; the path applies $e_{\text{out}}$, $e^{\text{friend}}_{\text{lab}+}$, $v_{\text{in}}$, and $\epsilon^{\text{name}}$ in turn.


Traversing for Recommendation

I Recommendation systems are designed to help people deal with the problem of information overload by filtering out information in the system that doesn't pertain to the person.

I Content-based recommendation deals with recommending resources that share characteristics (i.e. content) with a set of resources.

I By contrast, collaborative filtering-based recommendation is concerned with determining the similarity of resources based upon the similarity of the taste of the people modeled within the system.

I Graph databases provide a good paradigm for dealing with both recommendation techniques.

I Furthermore, using the graph traversal pattern, there exists a single graph data structure that can be traversed in different ways to expose different types of recommendations (generally, different types of relationships between vertices). Being able to mix and match the types of traversals executed alters the semantics of the final rankings and conveniently allows for hybrid recommendation algorithms to emerge.

I The following figure presents a toy graph data set containing a set of people, resources, and features related to each other by likes- and feature-labeled edges.


Figure: A graph data structure containing people (p), their liked resources (r), and each resource's features (f). People are connected to resources by likes-labeled edges, and resources are connected to features by feature-labeled edges.


Content-Based Recommendation

I To identify resources that are similar in features to a given resource (i.e. content-based recommendation), traverse to all resources that share the same features. This is accomplished with the following function, $f : \hat{P}(V) \to \hat{P}(V)$, where

\[ f(i) = \left(\epsilon^{i}_{\epsilon-} \circ v_{\text{out}} \circ e^{\text{feature}}_{\text{lab}+} \circ e_{\text{in}} \circ v_{\text{in}} \circ e^{\text{feature}}_{\text{lab}+} \circ e_{\text{out}}\right)(i). \]

I Assuming $i = 3$, function $f$ states: traverse to the outgoing edges of resource vertex 3, only allow feature-labeled edges, and then traverse to the incoming vertices of those feature-labeled edges.

I At this point, the traverser is at feature vertex 8. Next, traverse to the incoming edges of feature vertex 8, only allow feature-labeled edges, and then traverse to the outgoing vertices of these feature-labeled edges. At that point, the traverser is at resource vertices 3 and 2.

I However, since we are trying to identify those resources similar in content to vertex 3, we need to filter out vertex 3. This is accomplished by the last stage of the function composition. Thus, given the toy graph data set, vertex 2 is similar to vertex 3 in content.

I The following diagram illustrates this traversal:


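Figure: A traversal that identifies resources that are similar in content to a set of resources based upon shared features. The path runs from resource vertex 3 through feature vertex 8 to resource vertex 2, applying $e_{\text{out}}$, $e^{\text{feature}}_{\text{lab}+}$, $v_{\text{in}}$, $e_{\text{in}}$, $e^{\text{feature}}_{\text{lab}+}$, $v_{\text{out}}$, and $\epsilon^{i}_{\epsilon-}$ in turn.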
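The content-based traversal for $i = 3$ can be traced step by step in code. The sketch below is an assumption-laden illustration: only the two feature edges from resources 3 and 2 to feature 8 are implied by the walkthrough, while the remaining edges, the `Edge` tuple, and the helper names are hypothetical.

```python
from collections import namedtuple
from functools import reduce

Edge = namedtuple("Edge", ["tail", "label", "head"])

EDGES = [
    Edge(3, "feature", 8),  # implied by the walkthrough
    Edge(2, "feature", 8),  # implied by the walkthrough
    Edge(1, "feature", 5),  # hypothetical extra resource and feature
    Edge(7, "likes", 3),    # hypothetical likes edge
]


def e_out(vertices):
    return [e for v in vertices for e in EDGES if e.tail == v]


def e_in(vertices):
    return [e for v in vertices for e in EDGES if e.head == v]


def v_out(edges):
    return [e.tail for e in edges]


def v_in(edges):
    return [e.head for e in edges]


def e_lab_plus(label):
    """Keep only edges carrying `label`."""
    return lambda edges: [e for e in edges if e.label == label]


def exclude(element):
    """The negative element filter: drop every occurrence of `element`."""
    return lambda elements: [x for x in elements if x != element]


def compose(*steps):
    """compose(h, g, f)(x) == h(g(f(x)))."""
    return lambda x: reduce(lambda acc, step: step(acc), reversed(steps), x)


def similar_by_features(i):
    """Resources that share at least one feature with resource `i`, excluding `i` itself."""
    traversal = compose(
        exclude(i), v_out, e_lab_plus("feature"), e_in,
        v_in, e_lab_plus("feature"), e_out,
    )
    return traversal([i])
```

With this toy data, `similar_by_features(3)` passes through feature vertex 8, reaches resources 3 and 2, and drops 3 in the final filter stage.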


I It's simple to extend content-based recommendation to problems such as: "Given what person $i$ likes, what other resources have similar features?"

I Such a problem is solved by combining the function $f$ defined above with a new composition that finds all the resources that person $i$ likes. Thus, if $g : \hat{P}(V) \to \hat{P}(V)$, where

\[ g(i) = \left(v_{\text{in}} \circ e^{\text{likes}}_{\text{lab}+} \circ e_{\text{out}}\right)(i), \]

then to determine those resources similar in features to the resources that person vertex 7 likes, compose functions $f$ and $g$: $(f \circ g)(7)$.

I Those resources that share more features in common will be returned more often by $f \circ g$.

I With this approach, the function can return the same resource more than once, so, if necessary, deduplication has to be handled by a separate mechanism.
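The repeated results of $f \circ g$ can double as a ranking signal. The sketch below is a hedged illustration: which resources person 7 likes and which features the resources carry are hypothetical, and counting duplicates with `collections.Counter` is just one possible "separate mechanism", not one prescribed by the paper.

```python
from collections import Counter, namedtuple

Edge = namedtuple("Edge", ["tail", "label", "head"])

# Hypothetical toy data: person 7 likes resources 3 and 4.
EDGES = [
    Edge(7, "likes", 3),
    Edge(7, "likes", 4),
    Edge(3, "feature", 8),
    Edge(2, "feature", 8),
    Edge(4, "feature", 5),
    Edge(2, "feature", 5),
    Edge(1, "feature", 5),
]


def out_neighbors(vertices, label):
    """v_in . e_lab+ . e_out: follow `label` edges forward."""
    return [e.head for v in vertices for e in EDGES if e.tail == v and e.label == label]


def in_neighbors(vertices, label):
    """v_out . e_lab+ . e_in: follow `label` edges backward."""
    return [e.tail for v in vertices for e in EDGES if e.head == v and e.label == label]


def f(resource):
    """Resources sharing a feature with `resource`, excluding `resource` itself."""
    features = out_neighbors([resource], "feature")
    return [r for r in in_neighbors(features, "feature") if r != resource]


def g(person):
    """Resources liked by `person`."""
    return out_neighbors([person], "likes")


def f_after_g(person):
    """(f . g): apply f to every liked resource and keep duplicates."""
    return [similar for liked in g(person) for similar in f(liked)]


# Duplicates become scores; the keys of the counter are the deduplicated results.
ranking = Counter(f_after_g(7)).most_common()
```

Here resource 2 is reached twice (once via each liked resource), so it ranks above resource 1, which is reached once.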


Collaborative Filtering-Based Recommendation

I With collaborative filtering, the objective is to identify a set of resources that have a high probability of being liked by a person, based upon identifying other people in the system who have similar likes.

I For example, if person a and person b share 90% of their liked resources, then the remaining 10% they don't share are candidates for recommendation.

I For simplicity, this traversal is broken into two components, $f$ and $g$.


I The first component is $f : \hat{P}(V) \to \hat{P}(V)$, where

\[ f(i) = \left(\epsilon^{i}_{\epsilon-} \circ v_{\text{out}} \circ e^{\text{likes}}_{\text{lab}+} \circ e_{\text{in}} \circ v_{\text{in}} \circ e^{\text{likes}}_{\text{lab}+} \circ e_{\text{out}}\right)(i). \]

Function $f$ traverses to all those people vertices that like the same resources as person vertex $i$ and who themselves are not vertex $i$ (a person is trivially similar to themselves and thus contributes nothing to the computation). The more liked resources a person shares with $i$, the more traversers will be located at that person's vertex. In other words, if person $i$ and person $j$ share 10 liked resources, then $f(i)$ will return person $j$ 10 times.

I Next, function $g$ is defined as

\[ g(j) = \left(v_{\text{in}} \circ e^{\text{likes}}_{\text{lab}+} \circ e_{\text{out}}\right)(j). \]

Function $g$ traverses to all the resources liked by vertex $j$. In composition, $(g \circ f)(i)$ determines all those resources that are liked by those people that have similar tastes to vertex $i$. If person $j$ likes 10 resources in common with person $i$, then the resources that person $j$ likes will be returned at least 10 times by $g \circ f$ (perhaps more if a path exists to those resources from another person vertex as well).


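- The following figure diagrams a function path starting from vertex 7. Only one legal path is presented for the sake of diagram clarity.

Figure: A traversal that identifies resources liked by people whose tastes are similar to those of vertex 7, based upon shared liked resources. The diagrammed path runs over person (p) and resource (r) vertices connected by likes edges: f applies $e_{\mathrm{out}}$, $e^{\mathrm{likes}}_{\mathrm{lab}+}$, $v_{\mathrm{in}}$, $e_{\mathrm{in}}$, $e^{\mathrm{likes}}_{\mathrm{lab}+}$, $v_{\mathrm{out}}$, and the element filter that excludes the start vertex; g then applies $e_{\mathrm{out}}$, $e^{\mathrm{likes}}_{\mathrm{lab}+}$, and $v_{\mathrm{in}}$.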


Traversing Endogenous Indices

- A graph is a general-purpose data structure. A graph can be used to model lists, maps, trees, etc. As such, a graph can model an index.

- It was previously assumed that a graph database makes use of an external indexing system to index the properties of its vertices and edges, on the assumption that specialized indexing systems are better suited for special-purpose queries such as those involving full-text search.

- In many cases, there is nothing that prevents the representation of an index within the graph itself: vertices and edges can be indexed by other vertices and edges. Given how vertices and edges directly reference each other in a graph database, index look-up speeds are comparable to those of an external index.

- Endogenous indices afford graph databases great flexibility in modeling a domain. Not only can objects and their relationships be modeled (e.g. people and their friendships), but also the indices that partition the objects into meaningful subsets (e.g. people within a 2D region of space).

- The domain of spatial analysis makes use of advanced indexing structures such as the quadtree.


- Quadtrees partition a two-dimensional plane into rectangular boxes based upon the spatial density of the points being indexed. The following figure diagrams how space is partitioned as the density of points increases within a region of the index.

Figure: A quadtree partition of a plane. This figure is an adaptation of a public domain image provided courtesy of David Eppstein.
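The density-driven partitioning can be sketched as a minimal point quadtree in Python. The class name `QuadNode`, the `capacity` threshold, and the `depth_left` guard are assumptions made for illustration and do not come from the slides; a box is split into four quadrants once it holds more than `capacity` points.

```python
class QuadNode:
    """A rectangular region that splits into four quadrants once it holds too many points."""

    def __init__(self, x0, y0, x1, y1, capacity=2, depth_left=8):
        self.box = (x0, y0, x1, y1)
        self.capacity = capacity
        self.depth_left = depth_left  # guard against endless splitting on duplicate points
        self.points = []
        self.children = []            # empty for a leaf, four quadrants otherwise

    def insert(self, x, y):
        if self.children:
            self._child_for(x, y).insert(x, y)
            return
        self.points.append((x, y))
        if len(self.points) > self.capacity and self.depth_left > 0:
            self._split()

    def _split(self):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth_left - 1)
        self.children = [
            QuadNode(x0, y0, mx, my, *args),  # bottom-left
            QuadNode(mx, y0, x1, my, *args),  # bottom-right
            QuadNode(x0, my, mx, y1, *args),  # top-left
            QuadNode(mx, my, x1, y1, *args),  # top-right
        ]
        points, self.points = self.points, []
        for x, y in points:
            self._child_for(x, y).insert(x, y)

    def _child_for(self, x, y):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def depth(self):
        return 1 + max((c.depth() for c in self.children), default=0)


root = QuadNode(0, 0, 100, 100)
for point in [(10, 10), (80, 80), (60, 30), (55, 28), (52, 26)]:
    root.insert(*point)
```

With these sample points, the sparse bottom-left quadrant stays a single box, while the cluster near (55, 28) forces the bottom-right quadrant to subdivide repeatedly.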


- In order to demonstrate how a quadtree index can be represented and traversed, a toy graph data set is presented. This data set is diagrammed in the following figure.

Figure: A quadtree index of a space that contains points of interest. The index is composed of the vertices 1-9 and the points of interest are the vertices a-i. While not diagrammed for the sake of clarity, all edges are labeled sub (meaning subsumes) and each point of interest vertex has an associated bottom-left (bl) property, top-right (tr) property, and a type property which is equal to "poi." The plane spans [0,0] to [100,100]. The quadtree vertices (type=quad) carry the bounding boxes bl=[0,0] tr=[100,100]; bl=[0,0] tr=[50,100]; bl=[50,0] tr=[100,100]; bl=[0,50] tr=[50,100]; bl=[50,0] tr=[100,50]; bl=[0,0] tr=[50,50]; bl=[50,50] tr=[100,100]; bl=[50,25] tr=[75,50]; and bl=[50,25] tr=[62,37]. The shaded query rectangle is bl=[25,20] tr=[90,45].


- The top half of the figure represents a quadtree index (vertices 1-9). This quadtree index partitions the "points of interest" (vertices a-i) located within the diagrammed plane.

- All vertices maintain three properties: bottom-left (bl), top-right (tr), and type. For a quadtree vertex, these properties identify the two corner points defining a rectangular bounding box (i.e. the region that the quadtree vertex is indexing) and the vertex type, which is equal to "quad".

- For a point of interest vertex, these properties denote the region of space that the point of interest exists within and the vertex type, which is equal to "poi."

- Quadtree vertex 1 denotes the entire region of space being indexed. This region is defined by its bottom-left (bl) and top-right (tr) corner points, namely [0, 0] and [100, 100], where $bl_x = 0$, $bl_y = 0$, $tr_x = 100$, and $tr_y = 100$.

- Within the region defined by vertex 1, there are 8 other defined regions that partition that space into smaller spaces (vertices 2-9).

- When one vertex is connected to another by a directed edge labeled sub (i.e. subsumes), the outgoing (i.e. tail) vertex subsumes the space that is defined by the incoming (i.e. head) vertex.

- Given these properties and edges, identifying point of interest vertices within a region of space is simply a matter of traversing the quadtree index in a directed/algorithmic fashion.
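The property and edge conventions above can be sketched as plain Python data. The vertex ids, coordinates, and helper names here are hypothetical and are not taken from the figure; the containment check assumes that every subsumed box lies inside the box of its tail vertex.

```python
# Hypothetical fragment of a quadtree index stored as a property graph.
VERTICES = {
    1: {"type": "quad", "bl": (0, 0), "tr": (100, 100)},
    2: {"type": "quad", "bl": (0, 0), "tr": (50, 50)},
    3: {"type": "quad", "bl": (50, 0), "tr": (100, 50)},
    "a": {"type": "poi", "bl": (10, 10), "tr": (12, 12)},
    "b": {"type": "poi", "bl": (60, 30), "tr": (62, 32)},
}
# Directed (tail, label, head) edges: the tail subsumes the head.
EDGES = [(1, "sub", 2), (1, "sub", 3), (2, "sub", "a"), (3, "sub", "b")]

def contains(outer, inner):
    """True if the bounding box of `inner` lies inside the bounding box of `outer`."""
    return (outer["bl"][0] <= inner["bl"][0] and outer["bl"][1] <= inner["bl"][1]
            and inner["tr"][0] <= outer["tr"][0] and inner["tr"][1] <= outer["tr"][1])

def subsumed_by(v):
    """Head vertices of the outgoing sub edges of v."""
    return [head for tail, label, head in EDGES if tail == v and label == "sub"]

# Every sub edge should point from a larger box to a box inside it.
violations = [(tail, head) for tail, label, head in EDGES
              if label == "sub" and not contains(VERTICES[tail], VERTICES[head])]
```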


- In the figure, the shaded region represents the spatial query: "Which points of interest are within the rectangular region defined by the corner points bl = [25, 20] and tr = [90, 45]?"

- In order to locate all the points of interest in this region, iteratively execute the following traversal starting from the root of the quadtree index (i.e. vertex 1). The function is defined as $f : \hat{\mathcal{P}}(V) \to \hat{\mathcal{P}}(V)$, where

$$f(i) = \left(\epsilon^{tr_y \geq 20}_{p+} \circ \epsilon^{tr_x \geq 25}_{p+} \circ \epsilon^{bl_y \leq 45}_{p+} \circ \epsilon^{bl_x \leq 90}_{p+} \circ v_{\mathrm{in}} \circ e^{\mathrm{sub}}_{\mathrm{lab}+} \circ e_{\mathrm{out}}\right)(i).$$

- The defining aspect of f is the set of 4 $\epsilon_{p+}$ filters that determine whether the current vertex is overlapping or within the query rectangle. Those vertices not overlapping or within the query rectangle are not traversed to. Thus, as the traversal iterates, fewer and fewer paths are examined and the resulting point of interest vertices within the query rectangle are converged upon.

- A summary of the legal vertices traversed to at each iteration is enumerated below.

1. 2, 3, 4
2. 6, 9, 8
3. c, d, h
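The iterated traversal can be sketched in Python as follows. The toy index, its vertex ids, and its coordinates are hypothetical and do not reproduce the figure, so the vertices visited at each iteration differ from the summary above; only the four overlap tests mirror the filters in f.

```python
# Hypothetical toy index; ids and bounding boxes are illustrative, not the figure's.
VERTICES = {
    "q1": {"type": "quad", "bl": (0, 0), "tr": (100, 100)},
    "q2": {"type": "quad", "bl": (0, 0), "tr": (50, 100)},
    "q3": {"type": "quad", "bl": (50, 0), "tr": (100, 100)},
    "q4": {"type": "quad", "bl": (50, 50), "tr": (100, 100)},
    "q5": {"type": "quad", "bl": (50, 0), "tr": (100, 50)},
    "a": {"type": "poi", "bl": (30, 30), "tr": (32, 32)},
    "b": {"type": "poi", "bl": (10, 80), "tr": (12, 82)},
    "c": {"type": "poi", "bl": (70, 70), "tr": (72, 72)},
    "d": {"type": "poi", "bl": (60, 40), "tr": (62, 42)},
}
# Outgoing sub edges, keyed by tail vertex.
SUB = {"q1": ["q2", "q3"], "q2": ["a", "b"], "q3": ["q4", "q5"],
       "q4": ["c"], "q5": ["d"]}

QUERY_BL, QUERY_TR = (25, 20), (90, 45)

def overlaps_query(v):
    """The four property filters: tr_y >= 20, tr_x >= 25, bl_y <= 45, bl_x <= 90."""
    p = VERTICES[v]
    return (p["tr"][1] >= QUERY_BL[1] and p["tr"][0] >= QUERY_BL[0]
            and p["bl"][1] <= QUERY_TR[1] and p["bl"][0] <= QUERY_TR[0])

def f(vertices):
    """Follow sub edges to head vertices, keeping those that overlap the query."""
    heads = [h for v in vertices for h in SUB.get(v, [])]
    return [h for h in heads if overlaps_query(h)]

# Apply f repeatedly from the root until no vertex survives the filters.
frontier, iterations = f(["q1"]), []
while frontier:
    iterations.append(frontier)
    frontier = f(frontier)

found = [v for step in iterations for v in step if VERTICES[v]["type"] == "poi"]
```

In this toy index, quadtree vertex q4 and the points of interest b and c fall outside the query rectangle, so the paths through them are dropped as soon as they are reached.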


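- There is a more efficient traversal that can be evaluated.

$$f(i) = \left(\epsilon^{tr_y \geq 20}_{p+} \circ \epsilon^{tr_x \geq 25}_{p+} \circ \epsilon^{bl_y \leq 45}_{p+} \circ \epsilon^{bl_x \leq 90}_{p+} \circ \epsilon^{\mathrm{type}=\mathrm{quad}}_{p+} \circ v_{\mathrm{in}} \circ e^{\mathrm{sub}}_{\mathrm{lab}+} \circ e_{\mathrm{out}}\right)(i)$$

$$g(i) = \left(\epsilon^{tr_y \leq 45}_{p+} \circ \epsilon^{tr_x \leq 90}_{p+} \circ \epsilon^{bl_y \geq 20}_{p+} \circ \epsilon^{bl_x \geq 25}_{p+}\right)(i)$$

$$h(i) = \left(\epsilon^{\mathrm{type}=\mathrm{quad}}_{p+} \circ v_{\mathrm{in}} \circ e^{\mathrm{sub}}_{\mathrm{lab}+} \circ e_{\mathrm{out}}\right)(i)$$

$$s(i) = \left(\epsilon^{tr_y \geq 20}_{p+} \circ \epsilon^{tr_x \geq 25}_{p+} \circ \epsilon^{bl_y \leq 45}_{p+} \circ \epsilon^{bl_x \leq 90}_{p+} \circ \epsilon^{\mathrm{type}=\mathrm{poi}}_{p+} \circ v_{\mathrm{in}} \circ e^{\mathrm{sub}}_{\mathrm{lab}+} \circ e_{\mathrm{out}}\right)(i)$$

$$r(i) = \left(\epsilon^{\mathrm{type}=\mathrm{poi}}_{p+} \circ v_{\mathrm{in}} \circ e^{\mathrm{sub}}_{\mathrm{lab}+} \circ e_{\mathrm{out}}\right)(i)$$

- Function f traverses to those quadtree vertices that overlap or are within the query rectangle. Function g allows only those quadtree vertices that are completely within the query rectangle. Function h traverses to subsumed quadtree vertices. Function s traverses to point of interest vertices that are overlapping or within the query rectangle.

- Finally, function r traverses to subsumed point of interest vertices. Note that functions h and r do not check the bounding box properties of their domain vertices: once g has established that a quadtree vertex lies completely within the query rectangle, everything that vertex subsumes lies within the rectangle as well. As a quadtree becomes large, this becomes a more efficient solution to finding all points of interest within a query rectangle.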
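One way the five functions might be combined is sketched below in Python. The slides define the functions but not a driver loop, so the `query` function, the toy index, and all vertex ids and coordinates are assumptions made for illustration; the sketch also assumes that every subsumed box lies inside the box of the vertex that subsumes it.

```python
# Hypothetical toy index; ids and bounding boxes are illustrative only.
VERTICES = {
    "q1": {"type": "quad", "bl": (0, 0), "tr": (100, 100)},
    "q2": {"type": "quad", "bl": (0, 0), "tr": (50, 50)},
    "q3": {"type": "quad", "bl": (50, 0), "tr": (100, 50)},
    "q5": {"type": "quad", "bl": (50, 50), "tr": (100, 100)},
    "q6": {"type": "quad", "bl": (50, 25), "tr": (75, 50)},
    "q7": {"type": "quad", "bl": (50, 25), "tr": (62, 37)},
    "a": {"type": "poi", "bl": (30, 30), "tr": (32, 32)},
    "b": {"type": "poi", "bl": (10, 10), "tr": (12, 12)},
    "c": {"type": "poi", "bl": (70, 48), "tr": (71, 49)},
    "d": {"type": "poi", "bl": (52, 27), "tr": (53, 28)},
    "e": {"type": "poi", "bl": (70, 70), "tr": (72, 72)},
    "k": {"type": "poi", "bl": (60, 35), "tr": (61, 36)},
}
SUB = {"q1": ["q2", "q3", "q5"], "q2": ["a", "b"], "q3": ["q6"],
       "q5": ["e"], "q6": ["q7", "c"], "q7": ["d", "k"]}
BL, TR = (25, 20), (90, 45)  # query rectangle

def sub_heads(vs):
    return [x for v in vs for x in SUB.get(v, [])]

def is_type(v, t):
    return VERTICES[v]["type"] == t

def overlaps(v):
    p = VERTICES[v]
    return (p["tr"][1] >= BL[1] and p["tr"][0] >= BL[0]
            and p["bl"][1] <= TR[1] and p["bl"][0] <= TR[0])

def within(v):
    p = VERTICES[v]
    return (p["tr"][1] <= TR[1] and p["tr"][0] <= TR[0]
            and p["bl"][1] >= BL[1] and p["bl"][0] >= BL[0])

def f(vs): return [x for x in sub_heads(vs) if is_type(x, "quad") and overlaps(x)]
def g(vs): return [x for x in vs if within(x)]
def h(vs): return [x for x in sub_heads(vs) if is_type(x, "quad")]
def s(vs): return [x for x in sub_heads(vs) if is_type(x, "poi") and overlaps(x)]
def r(vs): return [x for x in sub_heads(vs) if is_type(x, "poi")]

def query(root):
    """Assumed driver: box checks only below quads that are not fully inside the query."""
    found, partial, inside = [], [root], []
    while partial or inside:
        found += s(partial) + r(inside)       # r skips the bounding-box filters
        candidates = f(partial)
        newly_inside = g(candidates)
        inside = h(inside) + newly_inside     # h skips the bounding-box filters
        partial = [q for q in candidates if q not in newly_inside]
    return found

result = query("q1")
```

Here q7 is the only quadtree vertex completely within the query rectangle, so its points of interest d and k are collected by r without any further bounding-box comparison.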


I The ability to model an index endogenous to a graph allows the domain modeler to represent not only objects and their relations (e.g. people and their friendships), but also “meta-objects” and their relationships (e.g. index nodes and their subsumptions).

I In this way, the domain modeler can organize their model according to partitions that make sense to how the model will be used to solve problems.

I Moreover, by combining the traversal of an index with the traversal of a domain, there exists a single unified means by which problems are solved within a graph database: the graph traversal pattern.
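A minimal Python sketch of this idea, under stated assumptions: the `PropertyGraph` class, the `step` method, and all vertex names below are hypothetical and chosen only for illustration. Index vertices ("cell:...") and domain vertices (people) live in the same edge store, so one traversal primitive walks from the index into the domain.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny labeled-edge graph; illustrative only."""

    def __init__(self):
        self.out = defaultdict(list)  # tail vertex -> [(label, head vertex)]

    def add_edge(self, tail, label, head):
        self.out[tail].append((label, head))

    def step(self, vertices, label):
        # One traversal step: follow outgoing edges carrying the given label.
        return {head for v in vertices
                for (edge_label, head) in self.out[v]
                if edge_label == label}


graph = PropertyGraph()

# Index part: "meta-objects" and their subsumptions.
graph.add_edge("cell:root", "sub", "cell:north")
graph.add_edge("cell:root", "sub", "cell:south")
graph.add_edge("cell:north", "sub", "alice")
graph.add_edge("cell:north", "sub", "bob")
graph.add_edge("cell:south", "sub", "carol")

# Domain part: people and their friendships.
graph.add_edge("alice", "friend", "carol")
graph.add_edge("bob", "friend", "alice")

# Index step followed by a domain step, using the same primitive.
people_in_north = graph.step({"cell:north"}, "sub")
friends_of_north = graph.step(people_in_north, "friend")
```

Here the query "friends of everyone indexed under the north cell" needs no separate index lookup; it is the composition of a `sub` step and a `friend` step over the same graph.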

Conclusion

I Graphs are a flexible modeling construct that can be used to model a domain and the indices that partition that domain into an efficient, searchable space.

I When the relations between the objects of the domain are seen as vertex partitions, then a graph is simply an index that relates vertices to vertices by edges. The way in which these vertices relate to each other determines which graph traversals are most efficient to execute and which problems can be solved by the graph data structure.

I Graph databases and the graph traversal pattern do not require a global analysis of data. For many problems, only local subsets of the graph need to be traversed to yield a solution.

I By structuring the graph in such a way as to minimize traversal steps, limit the use of external indices, and reduce the number of set-based operations, modelers gain great efficiency that is difficult to accomplish with other data management solutions.
