Structured Querying of Web Text
A Technical Challenge
Kulsawasd Jitkajornwanich
University of Texas at Arlington
kulsawasdj@hotmail.com
CSE6339 Web Mining | April 16, 2009 | 9:30 am
by Cafarella, Re’, Suciu, Etzioni & Banko
Introduction
What is a structured query?
There are 2 types of query: structured query and unstructured query.
1. Structured query
- Has a "condition" in the query
- Can express a complicated query
- ex. "SQL query": list employees whose name starts with 'David' and whose salary > 5000
    SELECT E.name, E.salary
    FROM Employee E
    WHERE E.name LIKE 'David%' AND E.salary > 5000
Introduction
What is a structured query?
2. Unstructured query
- ex. "Keyword search"
- No "condition" in the query
- Simply does "string matching"
Introduction
We have just talked about the types of query. What about the types of data?
There are 2 types of data:
1. Structured data, ex. relational tables
2. Unstructured data, ex. web documents
Introduction
Objective of the paper:
To propose a tool called ExDB that makes a structured query on web documents (unstructured data).
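[Figure: query type vs. data type. Structured data (relational database) is queried with a structured query (SQL query). Unstructured data (web text) is queried with an unstructured query (keyword search) through a search engine, or with a structured query (a complicated query like an SQL query) through ExDB.]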
How it works: Big Picture of ExDB
[Figure: Collection of web documents → ExDB Extractor → Fact Table, Type Table, Constraint Table (stored in an RDBMS database). The user submits a query such as q(?x,?y):- invented(?x,?y) → ExDB Compiler → Resulting Table.]
Outline
1st Component: ExDB Extractor
- What does it do, and how, in more detail?
2nd Component: ExDB Compiler
- What does it do, and how, in more detail?
Test your understanding!
Working on tasks
Comparing results: ExDB & Google
Conclusion
How ExDB Works
1st Component: ExDB Extractor
What does it do?
- Extracts data from the web documents and puts it into the tables
How ExDB Works
2nd Component: ExDB Compiler
What does it do?
- Processes the user's structured query on the tables from the 1st component (ExDB Extractor) and gives the resulting table back to the user
- ex. q(?x, ?y):- invented(?x, ?y) <we will study this query syntax later on>
How it works: Big Picture of ExDB
[Figure: 1st Component, ExDB Extractor: a collection of web documents (e.g. "…was surprising. In 1877, Edison invented the light bulb. Although he …") → ExDB Extractor → Fact Table, Type Table, Constraint Table, stored in an RDBMS database. 2nd Component, ExDB Compiler: the user makes a query using ExDB syntax → ExDB Compiler.]
1st Component: ExDB Extractor
What does it do?
- Extracts data from the web documents and puts it into the tables
- There are 3 tables:
  1. Fact Table
  2. Type Table
  3. Constraint Table
- Additional column: stores the tuple probability
  - Discussion: Why do we need this column?
  - $0 < p < 1$, $\sum_i p_i = 1$
  - One way to assign probability: counting occurrence frequency
  - Assume independence among tuples
1.1 Fact Table
- Stores fact information, ex. "Edison invented light bulb"
- Uses TextRunner to extract
- What does it look like?

Predicate | Object 1 | Object 2   | Probability
invented  | Edison   | Light bulb | 0.75
died-in   | Edison   | 1877       | 0.55
…         | …        | …          | …
Fact Table

Probability = no. of occurrences / no. of predicate occurrences
Example 1: how to get the Fact Table

Web documents (input to TextRunner):
- "…was surprising. In 1877, Edison invented the light bulb. Although he …"
- "It was a big news when Edison invented the light bulb. …"
- "…We all know that Edison invented light bulb."
- "…not only that Edison also invented the phonograph."

TextRunner extracts the predicate and its two objects from each sentence:

Predicate | Object 1 | Object 2   | Probability
invented  | Edison   | Light bulb | 0.75
invented  | Edison   | Phonograph | 0.25
…         | …        | …          | …
Fact Table

Probability = no. of occurrences / no. of predicate occurrences
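The counting rule in Example 1 can be sketched in a few lines of Python. This is only an illustration of the formula on the slide, not TextRunner itself: the `extractions` list and the `build_fact_table` helper are hypothetical, and they assume the extractor has already produced one (predicate, object 1, object 2) triple per matching sentence.

```python
from collections import Counter

# Hypothetical extractor output: one triple per sentence in which the
# fact was found (three light-bulb sentences, one phonograph sentence).
extractions = [
    ("invented", "Edison", "light bulb"),
    ("invented", "Edison", "light bulb"),
    ("invented", "Edison", "light bulb"),
    ("invented", "Edison", "phonograph"),
]

def build_fact_table(triples):
    """Return {triple: probability}, where
    probability = occurrences of the triple / occurrences of its predicate."""
    triple_counts = Counter(triples)
    predicate_counts = Counter(pred for pred, _, _ in triples)
    return {
        triple: count / predicate_counts[triple[0]]
        for triple, count in triple_counts.items()
    }

fact_table = build_fact_table(extractions)
for (pred, obj1, obj2), prob in fact_table.items():
    print(pred, obj1, obj2, prob)
```

With the four sentences above, the light bulb tuple gets 3/4 and the phonograph tuple gets 1/4, matching the table.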
Discussion: What do you think might be a problem with this design of the fact table?
- It cannot support ternary predicates, ex. "David donates books to Child Organization."

Predicate | Object 1 | Object 2   | Probability
invented  | Edison   | Light bulb | 0.75
died-in   | Edison   | 1877       | 0.55
…         | …        | …          | …
Fact Table
1.2 Type Table
- Stores object type information, ex. "Edison is a scientist."
- Uses KnowItAll to extract
- What does it look like?

Type      | Object | Probability
Scientist | Edison | 0.73
City      | Boston | 0.36
…         | …      | …
Type Table

Probability = no. of occurrences / no. of type occurrences
Example 2: how to get the Type Table

Web documents (input to KnowItAll):
- "…As we know, Edison is a scientist. Although he …"
- "… there are many world-famous scientists such as Edison, …"
- "…However, some claim that Benjamin is also a scientist."
- "…scientists such as Edison, …"

KnowItAll extracts each object and its type:

Type      | Object   | Probability
scientist | Edison   | 0.75
scientist | Benjamin | 0.25
…         | …        | …
Type Table

Probability = no. of occurrences / no. of type occurrences
1.3 Constraint Table
- Stores constraint information about objects or predicates
- There are 2 types of constraints discussed in this paper: Synonym and Inclusion Dependency
- Uses DIRT to extract
1. Synonym
- example for predicate: did-invented = invented
- example for object: Edison T. = Edison
2. Inclusion Dependency
- example for predicate: be-guardian ⊇ be-parent
- example for object: relative ⊇ sister
Example: how DIRT works for the Synonym constraint
[Figure: Collection of web documents (e.g. "…was surprising. In 1877, Edison invented the light bulb. Although he …") → DIRT → synonyms: Thomas E., Edison T., Thomas Edison.]
Example: how DIRT works for the Inclusion Dependency constraint
[Figure: Collection of web documents (e.g. "…was surprising. In 1877, Edison invented the light bulb. Although he …") → DIRT → Be-parent, Be-guardian, Be-babysitter.]
1.3 Constraint Table
What does it look like?

Constraint           | Object 1  | Object 2    | Probability
Synonym              | Edison    | T. Edison   | 0.75
Inclusion Dependency | Be-parent | Be-guardian | 0.55
…                    | …         | …           | …
Constraint Table

[Figure labels on the Inclusion Dependency row: Superset, Subset]
Key point summary of 1st component (ExDB Extractor):
1. ExDB Extractor uses several existing extractors: TextRunner, KnowItAll and DIRT.
2. A probability column is used to indicate the degree of correctness and to deal with the uncertainty problem.
3. Drawback of the fact table: only binary predicates are allowed.
2nd Component: ExDB Compiler
What does it do?
- Processes the user's structured query on the tables from the 1st component (ExDB Extractor)
- The result is in table format, ranked by highest probability value.
- ex. q(?x, ?y):- invented(?x, ?y)
- However, users are not expected to know the table schema.
ExDB syntax:
- ?x = variable x
- w = constant value w
- q(?x,?y):- = defines the resulting table q, consisting of columns x and y
- invented(?x,?y) = returns the list of objects x and y related by the predicate "invented"
- invented(<scientists> ?x,?y) = returns the list of objects x whose type is <scientists>, and y, related by the predicate "invented"
This syntax is called "Datalog-like notation".
Let's try some examples!
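To make the notation concrete, here is a minimal Python sketch that splits a query into its head and body atoms. It is not ExDB's actual parser: `parse_query` and `parse_arg` are hypothetical helpers, the regular expressions cover only the forms listed above, and comparison conditions such as (?z > 1900) are deliberately ignored.

```python
import re

# One atom, e.g. invented(<scientists> ?x, ?y): a name followed by arguments.
ATOM = re.compile(r"([\w-]+)\(([^()]*)\)")
# One argument: optional <type> prefix, optional "?" marking a variable.
ARG = re.compile(r"^(?:<([\w-]+)>\s*)?(\?)?(.+)$")

def parse_arg(text):
    """Parse one argument into its type, variable flag and name."""
    type_name, is_var, name = ARG.match(text.strip()).groups()
    return {"type": type_name, "variable": bool(is_var), "name": name.strip()}

def parse_query(query):
    """Split 'head :- body' into (head, [body atoms]).
    Comparison conditions in the body are not handled by this sketch."""
    head_text, body_text = query.split(":-", 1)
    head_name, head_args = ATOM.match(head_text.strip()).groups()
    head = (head_name, [parse_arg(a) for a in head_args.split(",")])
    body = [
        (pred, [parse_arg(a) for a in args.split(",")])
        for pred, args in ATOM.findall(body_text)
    ]
    return head, body

head, body = parse_query("q(?x, ?y):- invented(<scientists> ?x, ?y)")
print(head)
print(body)
```

Here `head` names the result table q with columns x and y, and the single body atom records that x is a variable restricted to the type scientists.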
Make a Query: Example 4

Predicate | Object 1 | Object 2   | Probability
invented  | Edison   | Light bulb | 0.55
invented  | Edison   | Telescope  | 0.14
invented  | Edison   | Phonograph | 0.14
invented  | Jason    | Cell phone | 0.14
died-in   | Mary T.  | 1877       | 0.05
Fact Table

Example 4: list all inventions invented by Edison
answer: q(?i):- invented(Edison, ?i)

i          | Probability
Light bulb | 0.55
Telescope  | 0.14
Phonograph | 0.14
q Table
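Example 4 amounts to a selection on the Fact Table followed by ranking. The Python sketch below reproduces it on the rows from the slide; the list-of-tuples layout and the `inventions_of` helper are assumptions made for illustration, not how ExDB stores or evaluates its tables.

```python
# Fact Table rows from Example 4: (predicate, object 1, object 2, probability).
fact_table = [
    ("invented", "Edison", "Light bulb", 0.55),
    ("invented", "Edison", "Telescope", 0.14),
    ("invented", "Edison", "Phonograph", 0.14),
    ("invented", "Jason", "Cell phone", 0.14),
    ("died-in", "Mary T.", "1877", 0.05),
]

def inventions_of(inventor, facts):
    """Evaluate q(?i):- invented(inventor, ?i), ranked by probability."""
    rows = [
        (obj2, prob)
        for pred, obj1, obj2, prob in facts
        if pred == "invented" and obj1 == inventor
    ]
    # sorted() is stable, so tuples with equal probability keep table order.
    return sorted(rows, key=lambda row: row[1], reverse=True)

q_table = inventions_of("Edison", fact_table)
for invention, prob in q_table:
    print(invention, prob)
```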
Make a Query: Example 5

Predicate | Object 1 | Object 2   | Prob
invented  | Edison   | Light bulb | 0.70
died-in   | Edison   | 1955       | 0.40
invented  | David A. | Guitar     | 0.30
died-in   | Peter    | 1955       | 0.20
died-in   | Mary T.  | 1800       | 0.05
Fact Table

Type      | Object   | Prob
scientist | Edison   | 0.50
scientist | Peter    | 0.15
scientist | Mary T.  | 0.15
scientist | David A. | 0.10
city      | Boston   | 0.05
Type Table

Example 5: list all scientists who died in 1955
answer: q(?i):- died-in(<scientist> ?i, 1955)
Make a Query: Example 5 (result)
Example 5: list all scientists who died in 1955
answer: q(?i):- died-in(<scientist> ?i, 1955)

Joining the Type Table with the Fact Table gives:

Type      | Object | Predicate | Object | Prob
scientist | Edison | died-in   | 1955   | 0.20
scientist | Peter  | died-in   | 1955   | 0.03
Joining Table

0.20 = 0.50 x 0.40 because we assume independence among tuples, i.e., P(t1, t2) = P(t1) * P(t2).

?i     | Prob
Edison | 0.20
Peter  | 0.03
q Table
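The join in Example 5 can be sketched as a nested loop that multiplies the probabilities of the two matching tuples, which is what the independence assumption licenses. The table layout and the `scientists_died_in` helper below are hypothetical; results are rounded to two decimals only to match the slide.

```python
# Rows from Example 5.
fact_table = [
    ("invented", "Edison", "Light bulb", 0.70),
    ("died-in", "Edison", "1955", 0.40),
    ("invented", "David A.", "Guitar", 0.30),
    ("died-in", "Peter", "1955", 0.20),
    ("died-in", "Mary T.", "1800", 0.05),
]
type_table = [
    ("scientist", "Edison", 0.50),
    ("scientist", "Peter", 0.15),
    ("scientist", "Mary T.", 0.15),
    ("scientist", "David A.", 0.10),
    ("city", "Boston", 0.05),
]

def scientists_died_in(year, facts, types):
    """Evaluate q(?i):- died-in(<scientist> ?i, year).
    Joined probability = P(type tuple) * P(fact tuple), assuming independence."""
    rows = []
    for type_name, obj, type_prob in types:
        if type_name != "scientist":
            continue
        for pred, obj1, obj2, fact_prob in facts:
            if pred == "died-in" and obj1 == obj and obj2 == year:
                rows.append((obj, round(type_prob * fact_prob, 2)))
    return sorted(rows, key=lambda row: row[1], reverse=True)

q_table = scientists_died_in("1955", fact_table, type_table)
print(q_table)
```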
Make a Query: Example 6

Predicate | Object 1 | Object 2   | Prob
invented  | Edison   | Light bulb | 0.70
died-in   | Edison   | 1955       | 0.40
invented  | David A. | Guitar     | 0.30
died-in   | Peter    | 1955       | 0.20
died-in   | Mary T.  | 1800       | 0.05
Fact Table

Type      | Object   | Prob
scientist | Edison   | 0.50
scientist | Peter    | 0.15
scientist | Mary T.  | 0.15
scientist | David A. | 0.10
city      | Boston   | 0.05
Type Table

Example 6: list all scientists who died after 1900, their inventions and the year they died
answer: q(?x, ?y, ?z):- invented(?x, ?y), died-in(<scientist> ?x, ?z), (?z > 1900)
Make a Query: Example 6 (result)
Example 6: list all scientists who died after 1900, their inventions and the year they died
answer: q(?x, ?y, ?z):- invented(?x, ?y), died-in(<scientist> ?x, ?z), (?z > 1900)

Joining the Type Table with both Fact Table predicates gives:

Type      | Predicate | Predicate | Object | Object | Object     | Prob
scientist | died-in   | invented  | Edison | 1955   | light bulb | 0.14
Joining Table

0.14 = 0.50 x 0.40 x 0.70

?x     | ?y         | ?z   | Prob
Edison | Light bulb | 1955 | 0.14
q Table
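Example 6 adds a second predicate and a comparison to the join. A minimal Python sketch follows, assuming (for illustration only) that each predicate is held in its own lookup structure; `died_in`, `invented`, `scientist` and `example6` are hypothetical names, and the product of three probabilities again relies on the independence assumption.

```python
# Rows from Example 6, split by predicate for easy lookup.
died_in = {"Edison": (1955, 0.40), "Peter": (1955, 0.20), "Mary T.": (1800, 0.05)}
invented = [("Edison", "Light bulb", 0.70), ("David A.", "Guitar", 0.30)]
scientist = {"Edison": 0.50, "Peter": 0.15, "Mary T.": 0.15, "David A.": 0.10}

def example6(min_year=1900):
    """Evaluate q(?x, ?y, ?z):- invented(?x, ?y),
    died-in(<scientist> ?x, ?z), (?z > min_year)."""
    rows = []
    for x, y, p_invented in invented:
        # ?x must also appear in died-in and be typed as a scientist.
        if x not in died_in or x not in scientist:
            continue
        z, p_died = died_in[x]
        if z > min_year:
            prob = scientist[x] * p_died * p_invented
            rows.append((x, y, z, round(prob, 2)))
    return sorted(rows, key=lambda row: row[3], reverse=True)

q_table = example6()
print(q_table)
```

David A. drops out because the Fact Table has no died-in tuple for him, and Peter drops out because he has no invented tuple, so only Edison remains.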
Test Your Understanding!

Predicate | Object 1 | Object 2   | Prob
invented  | Edison   | Light bulb | 0.70
play      | John     | Guitar     | 0.40
invented  | David A. | Guitar     | 0.30
play      | Jackson  | Piano      | 0.20
play      | Jackson  | Guitar     | 0.05
born-in   | John     | 1990       | 0.05
born-in   | Jackson  | 1980       | 0.05
born-in   | Bobby    | 1980       | 0.05
Fact Table

Type       | Object  | Prob
singer     | John    | 0.50
instrument | Guitar  | 0.15
instrument | Piano   | 0.15
singer     | Jackson | 0.10
singer     | Bobby   | 0.05
Type Table

Problem 1: list all singers who were born in 1980, and their instruments
answer: q(?x, ?y):- play(<singer> ?x, <instrument> ?y), born-in(<singer> ?x, 1980)
Test Your Understanding!

Predicate      | Object 1 | Object 2 | Prob
being-producer | Matt     | Bobby    | 0.70
being-producer | Matt     | Jackson  | 0.40
has-income     | Bobby    | 2500     | 0.30
has-income     | Jackson  | 3000     | 0.20
has-income     | Matt     | 2000     | 0.05
being-producer | Matt     | John     | 0.05
has-income     | John     | 1000     | 0.05
Fact Table

Type     | Object  | Prob
singer   | John    | 0.50
producer | Matt    | 0.15
producer | David   | 0.15
singer   | Jackson | 0.10
singer   | Bobby   | 0.05
Type Table

Problem 2: list all singers who have a higher income than their producer
answer: q(?x):- has-income(<singer> ?x, ?y), has-income(<producer> ?m, ?n), being-producer(?m, ?x), (?y > ?n)
Make a Query: Example 7

Predicate | Obj. 1 | Obj. 2     | Prob
invented  | Edison | Light bulb | 0.55
invented  | Edison | Telescope  | 0.14
invented  | Edison | Phonograph | 0.14
invented  | Jason  | Cell phone | 0.14
died-in   | Mary   | 1877       | 0.05
Fact Table

Example 7: list all inventions discovered by Edison
answer: q(?i):- discovered(Edison, ?i)

Discussion: The Fact Table has no "discovered" predicate. In this case, what can we do to answer this query?

Const | Obj. 1   | Obj. 2     | Prob
ID    | invented | discovered | 0.50
Syn   | Edison   | Edison T   | 0.15
Syn   | Edison   | Thomas E   | 0.10
Constraint Table (ID = Inclusion Dependency, Syn = Synonym)

i          | Probability
Light bulb | 0.55 x 0.5
Telescope  | 0.14 x 0.5
Phonograph | 0.14 x 0.5
q Table
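One way to read Example 7 is as a query rewrite: when the requested predicate is missing from the Fact Table, look for an inclusion dependency that relates it to a stored predicate, and scale each answer by the constraint's probability. The Python sketch below implements only that reading; `answer_with_constraints` is a hypothetical helper, and synonym constraints are left out for brevity.

```python
# Rows from Example 7.
fact_table = [
    ("invented", "Edison", "Light bulb", 0.55),
    ("invented", "Edison", "Telescope", 0.14),
    ("invented", "Edison", "Phonograph", 0.14),
    ("invented", "Jason", "Cell phone", 0.14),
    ("died-in", "Mary", "1877", 0.05),
]
constraint_table = [
    ("ID", "invented", "discovered", 0.50),
    ("Syn", "Edison", "Edison T", 0.15),
    ("Syn", "Edison", "Thomas E", 0.10),
]

def answer_with_constraints(predicate, subject, facts, constraints):
    """Evaluate q(?i):- predicate(subject, ?i), falling back to inclusion
    dependencies when the predicate is not stored in the Fact Table."""
    stored = {pred for pred, _, _, _ in facts}
    if predicate in stored:
        rewrites = [(predicate, 1.0)]
    else:
        rewrites = [
            (obj1, prob)
            for kind, obj1, obj2, prob in constraints
            if kind == "ID" and obj2 == predicate
        ]
    rows = [
        (obj2, fact_prob * weight)
        for stored_pred, weight in rewrites
        for pred, obj1, obj2, fact_prob in facts
        if pred == stored_pred and obj1 == subject
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)

q_table = answer_with_constraints("discovered", "Edison", fact_table, constraint_table)
for invention, prob in q_table:
    print(invention, prob)
```

Each probability is the Fact Table value multiplied by 0.5, as in the q Table on the slide.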
Make a Query: Problem Scenario
Example 8 (this example involves PROJECTION): list all names who invented something
answer: q(?x):- invented(?x, ?y)

?x     | ?y         | Prob
Edison | light bulb | 0.34
Edison | telescope  | 0.13
Edison | phonograph | 0.13
Tree   | Table      | 0.09
Tree   | Pen        | 0.09
Tree   | Paper      | 0.09
Tree   | Fruit      | 0.09
Tree   | Forest     | 0.09
Tree   | Eraser     | 0.09
Tree   | Ruler      | 0.09
Joining Table

?x     | Prob
Tree   | 0.63
Edison | 0.60
q Table

0.63 = 0.09 x 7
0.60 = 0.34 + 0.13 + 0.13

Discussion:
- Can you see something wrong in the resulting table?
- This problem scenario is caused by the projection operation.
Solving the Problem Scenario by using the "Panel of Experts" technique
Conventional way: $\mathrm{newProb} = \sum_i \mathrm{duplicateProb}_i$
New way: using the "Panel of Experts" technique. Principle:
1. Define the number n of duplicate outputs to keep, ex. n=5 (meaning that if there are 10 duplicate outputs in total, we consider only 5 and eliminate the other 5), to eliminate low-quality output.
2. newProb is calculated by selecting the max value among those n duplicate outputs:
   $\mathrm{newProb} = \max_{1 \le i \le n} \mathrm{duplicateProb}_i$
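The two ways of combining duplicates can be compared directly on the Example 8 joining table. The sketch below follows the two steps as stated on the slide (keep the n best duplicates, then take the max); `project` and `panel_of_experts` are hypothetical helper names, and the exact combination rule used in the paper may differ from this reading.

```python
from collections import defaultdict

# Joining table from Example 8: (?x, ?y, probability).
joining_table = [
    ("Edison", "light bulb", 0.34),
    ("Edison", "telescope", 0.13),
    ("Edison", "phonograph", 0.13),
    ("Tree", "Table", 0.09),
    ("Tree", "Pen", 0.09),
    ("Tree", "Paper", 0.09),
    ("Tree", "Fruit", 0.09),
    ("Tree", "Forest", 0.09),
    ("Tree", "Eraser", 0.09),
    ("Tree", "Ruler", 0.09),
]

def project(rows, combine):
    """Project onto ?x, merging duplicate ?x values with `combine`."""
    groups = defaultdict(list)
    for x, _y, prob in rows:
        groups[x].append(prob)
    result = [(x, combine(probs)) for x, probs in groups.items()]
    return sorted(result, key=lambda row: row[1], reverse=True)

def panel_of_experts(probs, n=5):
    """Keep the n highest-probability duplicates, then take their max."""
    return max(sorted(probs, reverse=True)[:n])

conventional = project(joining_table, sum)   # Tree outranks Edison
panel = project(joining_table, panel_of_experts)
print(conventional)
print(panel)
```

Summing rewards Tree for having many low-probability duplicates, while the max keeps Edison first. Note that with a plain max the cutoff n does not change the result; it would only matter if the n kept values were combined some other way.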
Make a Query: Problem Scenario (solved)
Example 8 (problem caused by projection operation): list all names who invented something
answer: q(?x):- invented(?x, ?y)

?x     | ?y         | Prob
Edison | light bulb | 0.34
Edison | telescope  | 0.13
Edison | phonograph | 0.13
Tree   | WrongInfo1 | 0.09
Tree   | WrongInfo2 | 0.09
Tree   | WrongInfo3 | 0.09
Tree   | WrongInfo4 | 0.09
Tree   | WrongInfo5 | 0.09
Tree   | WrongInfo6 | 0.09
Tree   | WrongInfo7 | 0.09
Joining Table

?x     | Prob
Tree   | 0.63
Edison | 0.60
q Table (conventional way; 0.63 = 0.09 x 7)

?x     | Prob
Edison | 0.34
Tree   | 0.09
q Table (solved by the "Panel of Experts" technique)
Key points summary of 2nd Component (ExDB Compiler):
1. ExDB has its own syntax.
2. The result is in table format.
3. The last column is the probability value, and rows are ranked in decreasing order of probability. The assumption is that the higher the probability, the more accurate the answer.
4. Top K can be implemented to reduce time complexity (increase performance).
5. In case of a JOIN, the resulting probability is the product of the probabilities of the joined tuples.
6. In case of PROJECTION, use the Panel of Experts technique to solve the problem.
7. In case the user's query contains a relation which does not exist in the Fact Table, we can use the Constraint Table to answer such a query.
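The slides do not say how top K is implemented. As a rough illustration of the ranking step only, the sketch below picks the K most probable rows of an already computed result with Python's `heapq.nlargest`; `top_k` is a hypothetical helper, and a real system would aim to avoid computing the low-probability rows in the first place.

```python
import heapq

def top_k(rows, k):
    """Return the k rows with the highest probability (last field of each row)."""
    return heapq.nlargest(k, rows, key=lambda row: row[-1])

# q Table from Example 5.
q_table = [("Peter", 0.03), ("Edison", 0.20)]
print(top_k(q_table, 1))
```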
Working On Task #1
Synthetic Table
- An additional feature to combine the results of several queries q together
- example:
[Figure: Synthetic Table generated by MERGING answers from died-in(?x,?y), invented(?x,?y), published(?x,?y), taught(?x,?y)]
Working On Task #2
Implementing with Google Search Engine
[Figure: search text box with a GO button]
Request: list all scientists who died before 1955, and their inventions
ExDB query: q(?x, ?y):- invented(<scientists> ?x, ?y), died-in(?x, ?z), (?z < 1955)
Comparing results: ExDB & Google
Test query: list all scientists who created something
[Figure: Output from ExDB]
[Figure: Output from Google]
Comments:
- ExDB performs much better than Google.
- For the Google result, after investigating all the links, only 1 document comes close to the answer.
- For ExDB, although there is some redundancy, the answer is still better.
Conclusion
- Only binary predicates are allowed.
- The result is in table format (different from the Google search engine).
- How ExDB gets its answer makes more sense, since it integrates all data together before we make a query on it.
- The extractor has to run beforehand, before the user is allowed to make a query.
- The IE systems involved in this paper are TextRunner, KnowItAll and DIRT.
- The user is not expected to know the schema of the tables; instead, the system itself tries to match as much as it can to answer the query (using synonym and inclusion dependency constraints).