
GraphREL: A Relational Graph Query Processor


1. GraphREL: A Decomposition-Based and Selectivity-Aware Relational Framework for Processing Sub-graph Queries
Sherif Sakr
School of Computer Science and Engineering, University of New South Wales
http://www.cse.unsw.edu.au/ssakr/
BIT Seminars 09 - Free University of Bolzano, Italy, 16 November 2009
S. Sakr (CSE, UNSW) BIT Seminars09 16 November 2009 1 / 40

2. Outline
Previous Work: Pathfinder - Relational XQuery Compiler.
Current Work: GraphREL - General Graph Query Processor.
Future Work: Scalable Graph Query Processing for New Generation of Database Applications.

4. Pathfinder: A Relational XQuery Processor
[Figure: architecture diagram. Labels: XQuery Expression; Pathfinder; Relational Algebra; MIL Code Generator; SQL Code Generator; MIL Scripts; SQL Scripts; Monet DBMS; Conventional RDBMS.]
http://pathfinder-xquery.org/

5. Pathfinder: A Relational XQuery Processor
[Figure: extended architecture diagram. Labels, in extraction order: XML Document; XQuery Expression; Pathfinder; Relational Algebra + Special Properties [VLDB04]; Estimation Rules; Translation Templates [VLDB08]; XPath Accelerator; Cardinality Properties; Encoding Tuples; XQuery Estimator; SQL Generator + Statistical Guide [SIGMOD07]; Statistical Guide [IJWIS09]; Cardinality Properties Aware [JDM09]; Statistical Histograms; SQL Scripts; Relational Results; XML; XML Serializer; Conventional RDBMS; Statistical Histograms; System Administrator.]

7. GraphREL: Motivations
Graphs are among the most complicated and general forms of data structures. Recently, they have been widely used to model many kinds of complex structured and schemaless data such as social networks, chemical compounds, biological pathways, spatial databases, semantic web and business process models.
Retrieving the graphs that contain a query graph from a large graph database is a key performance issue in all of these graph-based applications. The success of any graph database application depends directly on the efficiency of its graph indexing and query processing mechanisms. RDBMSs have repeatedly shown that they are very efficient, scalable and successful in hosting different kinds of data.

8. Preliminaries: Graph Data Model
In labelled graphs, vertices and edges represent the entities and the relationships between them, respectively. The attributes associated with these entities and relationships are called labels. A graph database D is a collection of member graphs D = {g1, g2, ..., gn} where each member graph gi is denoted as (V, E, Lv, Le):
V is the set of vertices.
E ⊆ V × V is the set of edges joining two distinct vertices.
Lv is the set of vertex labels.
Le is the set of edge labels.
Labelled graphs are classified according to the direction of their edges into two main classes:
(1) Directed-labelled graphs such as XML, RDF and traffic networks.
(2) Undirected-labelled graphs such as social networks and chemical compounds.

9. Preliminaries: Graph Queries
In principle, queries in graph databases can be broadly classified into the following main categories:
Subgraph queries: this category searches for a specific pattern in the graph database. The pattern can be either a small graph or a graph where some parts of it are uncertain, e.g., vertices with wildcard labels.
Supergraph queries: this category searches for the graph database members whose whole structures are contained in the input query.
Similarity (Approximate Matching) queries: this category finds graphs which are similar, but not necessarily isomorphic, to a given query graph.

10. Preliminaries: Subgraph Search Queries
Given a graph database D = {g1, g2, ..., gn} and a graph query q, a sub-graph search query returns the answer set A = {gi | q ⊆ gi, gi ∈ D}.
A graph q is described as a sub-graph of another graph database member gi if the vertices and edges of q form a subset of the vertices and edges of gi.
Formally, g1(V1, E1, Lv1, Le1) is defined as a sub-graph of g2(V2, E2, Lv2, Le2) if and only if:
(1) For every distinct vertex x ∈ V1 with a label vl ∈ Lv1, there is a distinct vertex y ∈ V2 with a label vl ∈ Lv2.
(2) For every distinct edge ab ∈ E1 with a label el ∈ Le1, there is a distinct edge ab ∈ E2 with a label el ∈ Le2.

11. Preliminaries: Subgraph Search Queries
[Figure: An example graph database and graph query. (a) Sample graph database with member graphs g1, g2 and g3. (b) Graph query q.]

12. Our Approach: GraphREL
Relational encoding of graph data.
SQL translation of sub-graph search queries: filtering phase and optional verification phase.
Partitioned B-tree Indexes.
Statistical Summaries.
Decomposition-Based and Selectivity-Aware SQL Translation.

13. Relational Encoding of Graph Data
The starting point of our relational framework is to find an efficient and suitable encoding for each graph member gi in the graph database D. We use the Vertex-Edge mapping scheme for storing directed labelled graphs with the following structure:
Vertices(graphID, vertexID, vertexLabel)
Edges(graphID, sVertex, dVertex, edgeLabel)

14. Relational Encoding of Graph Data
[Figure: two example graphs, g1 and g2, shown next to their relational encoding.]

Vertices Table
graphID  vertexID  vLabel
1        1         A
1        2         A
1        3         D
1        4         A
1        5         C
1        6         B
2        1         A
2        2         C
2        3         D
2        4         C
2        5         B

Edges Table
graphID  sVertex  dVertex  eLabel
1        1        2        n
1        1        3        m
1        2        3        n
1        4        3        x
1        5        4        x
1        6        5        y
1        5        2        z
1        1        6        m
2        1        2        e
2        2        3        m
2        4        3        m
2        4        2        n
2        5        4        x
2        1        5        f

15. SQL Translation of Graph Queries
Filtering Phase: a sub-graph query q that consists of a set of vertices QV of size m and a set of edges QE of size n is evaluated using the following SQL translation template, in which each predicate marked "for i = a..b" is repeated, joined by AND, for every index in that range:
SELECT DISTINCT V1.graphID, Vi.vertexID
FROM Vertices as V1, ..., Vertices as Vm, Edges as E1, ..., Edges as En
WHERE (V1.graphID = Vi.graphID), for i = 2..m
AND (V1.graphID = Ej.graphID), for j = 1..n
AND (Vi.vertexLabel = QVi.vertexLabel), for i = 1..m
AND (Ej.edgeLabel = QEj.edgeLabel), for j = 1..n
AND (Ej.sVertex = Vf.vertexID AND Ej.dVertex = Vf.vertexID), for j = 1..n;
Verification Phase: an optional phase which is used to verify that each vertex in the set of filtered vertices for each candidate graph is distinct. It is applied only if more than one vertex in the set of query vertices QV has the same label. This can be easily achieved using the vertex IDs.

16. Partitioned B-tree Indexes
Partitioned B-tree indexing is a slight variant of the B-tree indexing structure. The main idea is the use of low-selectivity leading columns to maintain partitions within the associated B-tree. In labelled graphs, it is generally the case that the number of distinct vertex and edge labels is far smaller than the number of vertices and edges, respectively. For example, an index defined on the columns (vertexLabel, graphID) can reduce the access cost of a sub-graph query with only one label to one disk page. In contrast, an index defined on the two columns (graphID, vertexLabel) requires scanning a large number of disk pages.
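The encoding, the filtering template and the label-leading index order described above can be tried out with a minimal sketch in Python's built-in sqlite3 module. Several things here are assumptions rather than part of the talk: the sample rows are the g1 and g2 tables as best they can be read from the slide, the helper names (`build_db`, `filtering_sql`, `run_query`) are hypothetical, SQLite's ordinary composite indexes only stand in for partitioned B-trees, the query returns graph IDs only, and the verification phase is folded into the same query as extra inequality predicates.

```python
import sqlite3

# Rows as read from the example tables for g1 and g2 (an assumption of this sketch).
VERTICES = [
    (1, 1, "A"), (1, 2, "A"), (1, 3, "D"), (1, 4, "A"), (1, 5, "C"), (1, 6, "B"),
    (2, 1, "A"), (2, 2, "C"), (2, 3, "D"), (2, 4, "C"), (2, 5, "B"),
]
EDGES = [
    (1, 1, 2, "n"), (1, 1, 3, "m"), (1, 2, 3, "n"), (1, 4, 3, "x"),
    (1, 5, 4, "x"), (1, 6, 5, "y"), (1, 5, 2, "z"), (1, 1, 6, "m"),
    (2, 1, 2, "e"), (2, 2, 3, "m"), (2, 4, 3, "m"), (2, 4, 2, "n"),
    (2, 5, 4, "x"), (2, 1, 5, "f"),
]


def build_db():
    """Create the Vertex-Edge encoding in memory, with label-leading indexes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Vertices(graphID INT, vertexID INT, vertexLabel TEXT);
        CREATE TABLE Edges(graphID INT, sVertex INT, dVertex INT, edgeLabel TEXT);
        -- Label first, graphID second: the column order favoured in the text.
        CREATE INDEX v_label_first ON Vertices(vertexLabel, graphID);
        CREATE INDEX e_label_first ON Edges(edgeLabel, graphID);
    """)
    conn.executemany("INSERT INTO Vertices VALUES (?, ?, ?)", VERTICES)
    conn.executemany("INSERT INTO Edges VALUES (?, ?, ?, ?)", EDGES)
    return conn


def filtering_sql(q_vertices, q_edges):
    """Instantiate the template for a query graph.

    q_vertices: list of vertex labels (query vertex i is the i-th entry, 1-based).
    q_edges: list of (source index, destination index, edge label).
    """
    m, n = len(q_vertices), len(q_edges)
    tables = [f"Vertices AS V{i}" for i in range(1, m + 1)]
    tables += [f"Edges AS E{j}" for j in range(1, n + 1)]
    preds, params = [], []
    for i in range(2, m + 1):                      # all vertices in one graph
        preds.append(f"V1.graphID = V{i}.graphID")
    for j in range(1, n + 1):                      # all edges in that graph
        preds.append(f"V1.graphID = E{j}.graphID")
    for i, label in enumerate(q_vertices, 1):      # vertex label predicates
        preds.append(f"V{i}.vertexLabel = ?")
        params.append(label)
    for j, (src, dst, label) in enumerate(q_edges, 1):
        preds.append(f"E{j}.edgeLabel = ?")        # edge label predicate
        params.append(label)
        preds.append(f"E{j}.sVertex = V{src}.vertexID")
        preds.append(f"E{j}.dVertex = V{dst}.vertexID")
    # Verification folded in: same-label query vertices must map to distinct IDs.
    for i in range(1, m + 1):
        for k in range(i + 1, m + 1):
            if q_vertices[i - 1] == q_vertices[k - 1]:
                preds.append(f"V{i}.vertexID <> V{k}.vertexID")
    sql = ("SELECT DISTINCT V1.graphID FROM " + ", ".join(tables)
           + " WHERE " + " AND ".join(preds) + " ORDER BY V1.graphID")
    return sql, params


def run_query(conn, q_vertices, q_edges):
    """Return the IDs of graphs that contain the query graph."""
    sql, params = filtering_sql(q_vertices, q_edges)
    return [row[0] for row in conn.execute(sql, params)]
```

For instance, `run_query(build_db(), ["A", "A", "D"], [(1, 2, "n"), (2, 3, "n")])` looks for a path A -n-> A -n-> D. Even this two-edge query joins five table instances, which illustrates how quickly the number of joins and predicates grows with the size of the query graph.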
Having partitioned B-tree indexes on the high-selectivity attributes achieves fixed execution times which no longer depend on the size of the whole graph database.

17. Limitations of SQL-Based Translation Approach
An obvious problem of the SQL translation template is that it involves a large number of conjunctive SQL predicates and join operations between the encoding tables.
Most relational query engines will certainly fail to execute the SQL translations of medium-sized or large sub-graph queries because they are too long and too complex (this does not necessarily mean that they are too expensive).
Therefore, we need a decomposition mechanism to divide this large and complex SQL translation query into a sequence of intermediate queries.
Applying this decomposition mechanism blindly may lead to inefficient execution plans with very large, unnecessary and expensive intermediate results.
We use statistical summary information to achieve an efficient decomposition process.

18. Statistical Summaries
In general, one of the most effective techniques for optimizing the execution times of SQL queries is to select the relational execution plan based on accurate selectivity information about the query predicates.
We construct three Markov tables to store information about the frequency of occurrence of the distinct labels of vertices, the distinct labels of edges and the connections between pairs of vertices (edges).

Markov table summary of vertex labels
Vertex Label  Frequency
A             100
B             200
C             38
D             4
E             50
L             6
M             10
N             250
O             3
P             40
R             55

Markov table summary of edge labels
Edge Label  Frequency
a           40
c           5
e           28
l           54
m           140
n           3
o           20
p           15
x           8
y           60
z           15

Markov table summary of pair-wise edge connections
Edge Label Connection  Frequency
ab                     3
ac                     15
ae                     45
ec                     14
em                     103
la                     5
pc                     18
px                     45
xy                     25
xz                     2
za                     1

19. Decomposition-Based and Selectivity-Aware SQL Translation
Identify
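As a rough illustration of how label-frequency tables like those on the Statistical Summaries slide could guide evaluation order, the sketch below ranks the edges of a query graph so that the ones involving the rarest labels come first. This is not the decomposition algorithm of the talk, whose description is cut off in this transcript. The scoring rule (the smallest frequency among source label, edge label and destination label) and the function names are hypothetical, and the frequencies are transcribed from the vertex and edge label tables.

```python
# Frequencies transcribed from the vertex and edge label summary tables.
VERTEX_FREQ = {"A": 100, "B": 200, "C": 38, "D": 4, "E": 50, "L": 6,
               "M": 10, "N": 250, "O": 3, "P": 40, "R": 55}
EDGE_FREQ = {"a": 40, "c": 5, "e": 28, "l": 54, "m": 140, "n": 3,
             "o": 20, "p": 15, "x": 8, "y": 60, "z": 15}


def rarity_score(edge):
    """Heuristic score for a query edge (source label, edge label, destination label).

    A lower score means a rarer label is involved, so the edge is assumed to be
    more selective. A label missing from the tables counts as frequency 0.
    """
    src, label, dst = edge
    return min(VERTEX_FREQ.get(src, 0),
               EDGE_FREQ.get(label, 0),
               VERTEX_FREQ.get(dst, 0))


def order_by_selectivity(edges):
    """Return the query edges sorted from most to least selective (stable sort)."""
    return sorted(edges, key=rarity_score)


query_edges = [("A", "m", "B"), ("D", "x", "A"), ("C", "n", "N")]
ordered = order_by_selectivity(query_edges)
# ordered == [("C", "n", "N"), ("D", "x", "A"), ("A", "m", "B")]
```

Evaluating the rare (C, n, N) and (D, x, A) patterns before the common (A, m, B) pattern keeps the earliest intermediate results small, which is the concern the limitations slide raises about blind decomposition.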
