
Graph Databases: Efficient storage and Rapid retrieval

Page 1: Graph Databases: Efficient storage    and Rapid retrieval

Graph Databases: Efficient storage and Rapid retrieval

Robert Levinson

Machine Intelligence Laboratory

University of California

Santa Cruz

Page 2: Graph Databases: Efficient storage    and Rapid retrieval

THE CG MARS LANDER: High level architecture

[Architecture diagram. Components: ADB; ADB Processor; CG Parser & Processor; Query Processor & Matcher; CG Creator/Translator with Type Hierarchy; English Translator, Source reference, & GUI. Inputs: English Discourse, English Queries. Answer: more specific CGs in DB. English-CG-English Translation.]

Santa Cruz: The CG Mars Lander

Page 3: Graph Databases: Efficient storage    and Rapid retrieval

THE CG MARS LANDER

[Diagram. Labels: queries, replies, CGs, TH, document, English]

Page 4: Graph Databases: Efficient storage    and Rapid retrieval

SUBGRAPH-ISOMORPHISM

  • NP-COMPLETE
  • 2 Main Methods:
      A. Backtracking Search
      B. Refinement, O(n^2) on avg.
    (both exploit candidate binding lists, modulo type hierarchy)
  • Key Idea: Amortize Cost Over
      » Millions of Operations
      » Mega-graph storage
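As an illustration of method A, the following is a minimal sketch of backtracking search over candidate binding lists, where a target node is a candidate for a query node only if its type specializes the query node's type. The toy hierarchy `PARENT`, the graph encoding (a dict of node types plus labelled edge triples), and the names `is_subtype` and `find_projection` are all assumptions made for this example; they are not taken from the system described in the slides.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (relation, source node, target node)

# Hypothetical toy type hierarchy: child type -> parent type.
PARENT = {"cat": "animal", "dog": "animal", "animal": "entity", "mat": "entity"}


def is_subtype(t: str, ancestor: str) -> bool:
    """True if t equals ancestor or lies below it in PARENT."""
    while t != ancestor and t in PARENT:
        t = PARENT[t]
    return t == ancestor


def find_projection(
    q_types: Dict[str, str],
    q_edges: List[Edge],
    g_types: Dict[str, str],
    g_edges: List[Edge],
) -> Optional[Dict[str, str]]:
    """Return one injective mapping query node -> target node, or None."""
    target_edges = set(g_edges)
    # Candidate binding lists: target nodes whose type specializes the query type.
    candidates = {
        q: [g for g, gt in g_types.items() if is_subtype(gt, qt)]
        for q, qt in q_types.items()
    }
    # Bind the most constrained query nodes first.
    order = sorted(q_types, key=lambda q: len(candidates[q]))

    def consistent(binding: Dict[str, str]) -> bool:
        # Every query edge with both endpoints bound must exist in the target.
        return all(
            (rel, binding[a], binding[b]) in target_edges
            for rel, a, b in q_edges
            if a in binding and b in binding
        )

    def extend(i: int, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(order):
            return dict(binding)
        q = order[i]
        for g in candidates[q]:
            if g in binding.values():
                continue  # keep the mapping injective
            binding[q] = g
            if consistent(binding):
                found = extend(i + 1, binding)
                if found is not None:
                    return found
            del binding[q]  # backtrack
        return None

    return extend(0, {})


query_types = {"x": "animal", "y": "entity"}
query_edges = [("ON", "x", "y")]
graph_types = {"n1": "cat", "n2": "mat", "n3": "dog"}
graph_edges = [("ON", "n1", "n2")]
result = find_projection(query_types, query_edges, graph_types, graph_edges)
print(result)
```

The worst case stays exponential, as expected for an NP-complete problem; the binding lists only shrink the search space before backtracking starts.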

Page 5: Graph Databases: Efficient storage    and Rapid retrieval

Exploit Symmetry !!

“Invariant with respect to transformation.”

“Shared information between objects or systems or their representations.”

AB+AC = A(B+C).

Page 6: Graph Databases: Efficient storage    and Rapid retrieval

Symmetry Synonyms

  • similarity
  • commonality
  • structure
  • mutual information
  • relationship
  • redundancy

Page 7: Graph Databases: Efficient storage    and Rapid retrieval

Total Information = Diversity + Symmetry

  • Diversity corresponds to Comp Sci “Complexity” = resources required.
  • Diversity can often only be resolved with Combinatorial Search

Page 8: Graph Databases: Efficient storage    and Rapid retrieval

Conceptual Graph Processing

  • Concept Types: “a cat is an animal”
  • Relation Types or Graph Types: “mother-of” is “parent-of”
  • Transitivity of Projection (subgraph-isomorphism)
  • Redundant Substructures
  • Redundant Literals
  • Redundant Pointers

Page 9: Graph Databases: Efficient storage    and Rapid retrieval

6 Retrieval Methods:

  • Method I: Flat Ordering
  • Method II: 2-Levels: Indexes, Graphs
  • Method III: Full Partial Order Hierarchy
  • Method IV: Multi-Level Hierarchical Retrieval
  • Method V: Remember Node Bindings
  • Method VI: UDS: The Universal Data Structure
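The difference between Method I and Method III can be sketched as follows. This is only an illustration of the pruning idea, under loud assumptions: each "graph" is a frozenset of symbols, the subset test stands in for the subgraph test, and the hierarchy is built by a naive cubic loop rather than incrementally. The names `flat_search`, `build_children`, and `hierarchy_search` are hypothetical. The search returns the stored generalizations of a query; the same ordering could in principle be walked to find specializations.

```python
from typing import Dict, FrozenSet, List, Tuple

Graph = FrozenSet[str]  # stand-in: a graph is a set of symbols


def more_general(a: Graph, b: Graph) -> bool:
    """Stand-in for the subgraph test: a is more general than (or equal to) b."""
    return a <= b


def flat_search(db: List[Graph], query: Graph) -> Tuple[List[Graph], int]:
    """Method I flavour: compare the query with every stored graph."""
    hits = [g for g in db if more_general(g, query)]
    return hits, len(db)


def build_children(db: List[Graph]) -> Dict[Graph, List[Graph]]:
    """Link each graph to the stored graphs immediately more specific than it."""
    children: Dict[Graph, List[Graph]] = {g: [] for g in db}
    for a in db:
        for b in db:
            if a < b and not any(a < c < b for c in db):
                children[a].append(b)
    return children


def hierarchy_search(
    db: List[Graph], children: Dict[Graph, List[Graph]], query: Graph
) -> Tuple[List[Graph], int]:
    """Method III flavour: walk the partial order top-down, pruning failed branches."""
    roots = [g for g in db if not any(h < g for h in db)]
    hits, seen, comparisons = [], set(), 0
    stack = list(roots)
    while stack:
        g = stack.pop()
        if g in seen:
            continue
        seen.add(g)
        comparisons += 1
        if more_general(g, query):
            hits.append(g)
            stack.extend(children[g])  # only matching nodes are expanded
    return hits, comparisons


db = [frozenset(s) for s in ("a", "ab", "abc", "d", "de", "def")]
children = build_children(db)
query = frozenset("abx")
flat_hits, flat_count = flat_search(db, query)
tree_hits, tree_count = hierarchy_search(db, children, query)
print(flat_count, tree_count, sorted("".join(sorted(g)) for g in tree_hits))
```

Pruning is sound here because the ordering is transitive: if a stored graph is not contained in the query, nothing more specific than it can be either.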

Page 10: Graph Databases: Efficient storage    and Rapid retrieval

THE CG MARS LANDER

Exploit Tuple-Based Linear CGs !

(a conceptual graph syntax that supports rapid retrieval and question-answering).

Page 11: Graph Databases: Efficient storage    and Rapid retrieval

@CG000: {
  AGNT (government, BE) }.
@CG001: {
  AGNT (Hungarian_American_Enterprise_Fund, invest),
  OBJ (invest, Dollars | 1000000 ),
  IN (Dollars | 1000000, first_business)
}.
@CG002 : {
  AGNT (@CG000, manage),
  OBJ (manage, @CG001) }.
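To show why this tuple-based syntax is convenient to process, here is a minimal parsing sketch. It assumes every relation is binary and that graph bodies contain no nested braces, which holds for the three graphs on this slide but is not stated as a rule of the notation. The regular expressions and the name `parse_linear_cgs` are made up for this example.

```python
import re
from typing import Dict, List, Tuple

Relation = Tuple[str, str, str]  # (relation name, first argument, second argument)

# "@NAME : { ... }." with optional whitespace around the colon.
GRAPH_RE = re.compile(r"@(\w+)\s*:\s*\{(.*?)\}\s*\.", re.DOTALL)
# "REL ( arg1 , arg2 )" with no nested parentheses.
TUPLE_RE = re.compile(r"(\w+)\s*\(([^()]*)\)")

SAMPLE = """
@CG000: {
  AGNT (government, BE) }.
@CG001: {
  AGNT (Hungarian_American_Enterprise_Fund, invest),
  OBJ (invest, Dollars | 1000000 ),
  IN (Dollars | 1000000, first_business)
}.
@CG002 : {
  AGNT (@CG000, manage),
  OBJ (manage, @CG001) }.
"""


def parse_linear_cgs(text: str) -> Dict[str, List[Relation]]:
    """Map each graph name to its list of (relation, arg1, arg2) tuples."""
    graphs: Dict[str, List[Relation]] = {}
    for name, body in GRAPH_RE.findall(text):
        relations: List[Relation] = []
        for rel, args in TUPLE_RE.findall(body):
            # Assumes exactly two comma-separated arguments per relation.
            first, second = [" ".join(a.split()) for a in args.split(",")]
            relations.append((rel, first, second))
        graphs[name] = relations
    return graphs


graphs = parse_linear_cgs(SAMPLE)
for name, relations in graphs.items():
    print(name, relations)
```

Arguments such as `@CG000` are kept as plain strings, so a nested context shows up simply as a reference to another graph's name.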

Page 12: Graph Databases: Efficient storage    and Rapid retrieval

THE CG MARS LANDER

A query:
/* Q2: Does anybody own the rag newspaper New York Post ? */

Query::@bob_202 : {
  ISA ( New_York_Post , newspaper [ n34861 ] ) ,
  CHRC ( newspaper [ n34861 ] , rag [ n9 ] ) ,
  AGNT ( own [ v9125 ] , ????? ) ,
}.

Page 13: Graph Databases: Efficient storage    and Rapid retrieval

THE CG MARS LANDER

Answer:
/* A2: Rupert Murdoch once owned the troubled tabloid newspaper New York Post. */

@CG1684_3 : {
  ISA ( New_York_Post , newspaper [ n34861 ] ) ,
  CHRC ( newspaper [ n34861 ] , tabloid [ n27111 ] ) ,
  CHRC ( newspaper [ n34861 ] , trouble [ n25320 ] ) ,
  AGNT ( own [ v9125 ] , Rupert Murdoch) ,
  CHRC ( own [ v9125 ] , once )
}.
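The way query Q2 lines up with this answer can be sketched tuple by tuple. The example below is a simplification under stated assumptions: the bracketed node identifiers are dropped, each query tuple is matched independently (so consistency of node identity across tuples is not checked), and the hierarchy fragment making `tabloid` a specialization of `rag` is assumed rather than taken from the slides. The names `specializes` and `answer_query` are hypothetical.

```python
from typing import List, Optional, Tuple

Relation = Tuple[str, str, str]  # (relation, first concept, second concept)
WILDCARD = "?????"

# Assumed type-hierarchy fragment (child -> parent); not given on the slides.
PARENT = {"tabloid": "rag"}

QUERY: List[Relation] = [
    ("ISA", "New_York_Post", "newspaper"),
    ("CHRC", "newspaper", "rag"),
    ("AGNT", "own", WILDCARD),
]
ANSWER: List[Relation] = [
    ("ISA", "New_York_Post", "newspaper"),
    ("CHRC", "newspaper", "tabloid"),
    ("CHRC", "newspaper", "trouble"),
    ("AGNT", "own", "Rupert Murdoch"),
    ("CHRC", "own", "once"),
]


def specializes(concept: str, query_concept: str) -> bool:
    """True if concept equals query_concept or lies below it in PARENT."""
    while concept != query_concept and concept in PARENT:
        concept = PARENT[concept]
    return concept == query_concept


def answer_query(query: List[Relation], graph: List[Relation]) -> Optional[List[str]]:
    """Return the values filling the wildcards, or None if some tuple fails."""
    answers: List[str] = []
    for rel, q1, q2 in query:
        for g_rel, g1, g2 in graph:
            pairs = [(q1, g1), (q2, g2)]
            if g_rel == rel and all(
                q == WILDCARD or specializes(g, q) for q, g in pairs
            ):
                answers.extend(g for q, g in pairs if q == WILDCARD)
                break
        else:
            return None  # this query tuple has no counterpart in the stored graph
    return answers


print(answer_query(QUERY, ANSWER))
```

The stored graph carries extra tuples (`trouble`, `once`) that the query never mentions, which is what makes it a more specific graph than the query.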

Page 14: Graph Databases: Efficient storage    and Rapid retrieval

THE CG MARS LANDER

Capabilities & timings:

Inputs:
  – CGs (tens of thousands)
  – pre-processed parts of speech
  – Type Hierarchy (150,000 WORDNET augmented English words)
  – natural language queries

Outputs:
  – CG (save & restore) DB
  – replies to queries
  – specializations and maximal specializations

Page 15: Graph Databases: Efficient storage    and Rapid retrieval

THE CG MARS LANDER

Capabilities & timings:
  – Benchmark machine: Sun Ultra Enterprise 4000 (4 UltraSPARC 167 MHz CPUs with 512 KB external cache, and 256 MB of main memory).
  – Read, process, and store an 18,000 CG input file in 1 hour and 46 minutes.
  – Reloading of the above DB takes on the order of seconds.
  – A 150,000 word ontology is processed in 16 seconds.
  – Each query is handled in at most 5.5 seconds.
  – For smaller databases (hundreds of CGs only), the time to handle a single query can be as low as 0.2 seconds.

Page 16: Graph Databases: Efficient storage    and Rapid retrieval

THE CG MARS LANDER

Cost/benefit analysis: assume N CGs and Q queries

Method I Cost: $N Q$

Method III Cost: $N \log_{10}^{2}(N/2) + Q \log_{10}^{2} N$
  • N insertions: $N \log_{10}^{2}(N/2)$
  • Q queries: $Q \log_{10}^{2} N$

Page 17: Graph Databases: Efficient storage    and Rapid retrieval

Cost/benefit table

       N         Q    Method I Cost    Method III Cost
      10         1               10                5.0
      10        10              100               14.9
      10       100            1,000              104.8
     100         1              100              296.6
     100        10            1,000              328.6
     100       100           10,000              688.6
   1,000         1            1,000            7,293.4
   1,000        10           10,000            7,374.4
   1,000       100          100,000            8,184.4
   1,000     1,000        1,000,000           16,284.4
  10,000     1,000       10,000,000          152,823.8
  10,000    10,000      100,000,000          296,823.8
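The comparison in the table can be recomputed with a few lines of code. This sketch assumes the Method I cost is N·Q and reads the Method III cost as N·log10²(N/2) + Q·log10²(N); that reading is an interpretation of the slide's formula which agrees with most table rows to within rounding, not a confirmed definition. The function names are made up for this example.

```python
import math


def method_i_cost(n: int, q: int) -> int:
    """Flat ordering: every query is compared with every stored graph."""
    return n * q


def method_iii_cost(n: int, q: int) -> float:
    """Assumed reading of the slide: N log10^2(N/2) + Q log10^2(N)."""
    insert_cost = n * math.log10(n / 2) ** 2
    query_cost = q * math.log10(n) ** 2
    return insert_cost + query_cost


for n, q in [(10, 10), (1_000, 10), (1_000, 1_000), (10_000, 10_000)]:
    print(
        f"N={n:>6} Q={q:>6}  "
        f"I={method_i_cost(n, q):>12,}  III={method_iii_cost(n, q):>12,.1f}"
    )
```

Under this reading the insertion term dominates when Q is small, so the hierarchy only pays for itself once enough queries are amortized over it, which is the pattern the table shows for N = 100 and N = 1,000 with a single query.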

Page 18: Graph Databases: Efficient storage    and Rapid retrieval

THE CG MARS LANDER

6 UDS DESIGN PRINCIPLES:

1. Every primitive data object, label or symbol should be stored only once with pointers used to denote the actual uses of the object.

2. Every compound object should be stored with the minimum information required to represent the combination of its parts.

Page 19: Graph Databases: Efficient storage    and Rapid retrieval

THE CG MARS LANDER

3. Given no loss of accuracy, objects should be processed at the highest level of abstraction possible.

4. If one were to implement a conceptual graph based on the diagrammatic representation, the costs associated with storage and matching would be much higher than they need to be.

Page 20: Graph Databases: Efficient storage    and Rapid retrieval

THE CG MARS LANDER

5. The same abstraction mechanism that goes from labels to graphs can be taken one step further to facilitate the storage and retrieval of nested context graphs.

6. A graph is itself the best descriptor of its nodes.
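Principles 1 and 2 can be illustrated with a small interning store. This is a sketch of the general idea only, not the UDS itself: the class `SymbolStore` and the helper `store_tuple` are hypothetical, labels and relation tuples share one table, and integer ids play the role of pointers.

```python
from typing import Dict, Hashable, List


class SymbolStore:
    """Stores each distinct item once and hands out integer ids as pointers."""

    def __init__(self) -> None:
        self._ids: Dict[Hashable, int] = {}
        self._items: List[Hashable] = []

    def intern(self, item: Hashable) -> int:
        """Return the id of item, storing it only on first use."""
        if item not in self._ids:
            self._ids[item] = len(self._items)
            self._items.append(item)
        return self._ids[item]

    def lookup(self, ident: int) -> Hashable:
        return self._items[ident]

    def __len__(self) -> int:
        return len(self._items)


store = SymbolStore()


def store_tuple(rel: str, first: str, second: str) -> int:
    """A compound object is stored as a triple of ids of its parts."""
    return store.intern((store.intern(rel), store.intern(first), store.intern(second)))


graph1 = [
    store_tuple("ISA", "New_York_Post", "newspaper"),
    store_tuple("CHRC", "newspaper", "rag"),
]
graph2 = [
    store_tuple("ISA", "New_York_Post", "newspaper"),
    store_tuple("CHRC", "newspaper", "tabloid"),
]
print(graph1, graph2, len(store))
```

Both graphs end up pointing at the same stored `ISA` tuple, so the shared substructure costs no extra space and can be compared by id rather than by content.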

Page 21: Graph Databases: Efficient storage    and Rapid retrieval

CONCLUDING THOUGHTS

  • The key to efficient implementation of CGs is the exploitation of symmetry or structure.

  • CG operations can be executed efficiently in real-time applications.

  • At the implementation or machine level, knowledge representation formalisms are often nearly the same.

Page 22: Graph Databases: Efficient storage    and Rapid retrieval

THE CG MARS LANDER

References

[1] C. Colin and R. Levinson, ``Partial order maintenance,'' Special Interest Group on Information Retrieval Forum, vol. 23, no. 3,4, pp. 34-59, 1988.

[2] G. Ellis, R. A. Levinson, and P. Robinson, ``Managing complex objects in PEIRCE,'' Special Issue on Object-Oriented Approaches in Artificial Intelligence and Human-Computer Interaction (IJMMS), vol. 41, pp. 109-148, 1994.

[3] R. Hughey, R. Levinson, and J. D. Roberts, eds., Issues in Parallel Hardware for Graph Retrieval, 1993.

Page 23: Graph Databases: Efficient storage    and Rapid retrieval

More references…

[4] R. Levinson, ``A self-organizing retrieval system for graphs,'' in AAAI-84, pp. 203-206, Morgan Kaufmann, 1984.

[5] R. Levinson, ``Pattern associativity and the retrieval of semantic networks,'' Computers and Mathematics with Applications, vol. 23, no. 6-9, pp. 573-600, 1992. Part 2 of Special Issue on Semantic Networks in Artificial Intelligence, Fritz Lehmann, editor. Also reprinted on pages 573-600 of the book, Semantic Networks in Artificial Intelligence, Fritz Lehmann, editor, Pergamon Press, 1992.

Page 24: Graph Databases: Efficient storage    and Rapid retrieval

THE CG MARS LANDER

References

[6] R. Levinson and G. Ellis, ``Multilevel hierarchical retrieval,'' Knowledge-Based Systems, vol. 5, pp. 233-244, September 1992. Special Issue on Conceptual Graphs.

[7] R. Levinson and G. Fuchs, ``A pattern-weight formulation of search knowledge,'' Tech. Rep. UCSC-CRL-91-15, University of California Santa Cruz, 2001. Revision to appear in Computational Intelligence.

[8] R. A. Levinson, ``UDS: A universal data structure,'' in Proc. 2nd International Conference on Conceptual Structures, (College Park, Maryland USA), pp. 230-250, 1991.