The Palm-tree Index: Indexing with the Crowd
Ahmed R. Mahmood*, Walid G. Aref*, Eduard Dragut*, Saleh Basalamah**
*Purdue University  **Umm AlQura University
Outline
• Motivation
• Taxonomy for Crowd-based Indexing
• Problem Definition
• The Palm-tree Index Structure
• Traversal Algorithms
• Preliminary Experimental Results
• Conclusions and Future Work
Problem Definition
• Let S be a set of N keys (e.g., images or videos) and q be a query
• A B+-tree-like index is constructed over S
• Study how to use human workers to search the index
• Workers perform subjective comparisons between the query image and tree keys, and make subjective decisions, e.g.,
– Less than, greater than, almost the same
– Better, worse, almost the same
– Cheaper, more expensive, almost the same
Index Structure
• Why a B+-tree?
• What are the tree order and height?
• How to construct the tree?
• What are the performance metrics?
Index Structure
• Why a B+-tree?
– To obtain a predictable query cost
– Cost is reduced with more keys per node
• How are the tree order and height determined?
– Set by the workers' ability to process a specific number of keys at once
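As a rough illustration of how order and height trade off for a fixed number of keys, the sketch below computes the height of a full tree. It assumes every node has exactly `order` children and that height counts the levels a worker descends; the function name `tree_height` is hypothetical and not from the slides.

```python
def tree_height(num_keys: int, order: int) -> int:
    """Smallest h such that order**h >= num_keys.

    Assumes a full tree where each node has `order` children.
    Integer arithmetic avoids floating-point logarithm rounding.
    """
    if num_keys < 1 or order < 2:
        raise ValueError("need num_keys >= 1 and order >= 2")
    height, capacity = 0, 1
    while capacity < num_keys:
        capacity *= order
        height += 1
    return height


# With 16 keys, doubling the order from 2 to 4 halves the height.
heights = {order: tree_height(16, order) for order in (2, 4, 16)}
```

Under these assumptions, a higher order means fewer levels (fewer tasks per descent) but more keys for a worker to compare in each task.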
[Figures: error vs. index height (fixed order); error vs. index order (fixed height); error as order increases and height decreases (fixed dataset size)]
Index Construction: How to grow a palm tree?
• Keys associated with some “Quantitative Value”
– Keys have a subjective property and an associated quantitative value
– Index constructed based on the quantitative value
– Example: damaged-car images with repair cost
• Key: car image
• Subjective property: car damage
• Quantitative value: repair cost
[Figure: repair costs 500, 1000, 200, 100 shown unsorted, then sorted as 100, 200, 500, 1000]
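Construction from a quantitative value can be sketched as a bottom-up bulk load: sort the keys by value, pack them into leaves, and reuse the largest entry of each node as its separator in the parent. This is only one plausible way to do it; the slides do not specify the procedure, and `bulk_load` and the `car_*` key names are hypothetical.

```python
def bulk_load(items, fanout):
    """Build tree levels (root first) from (key, value) pairs.

    Assumption: keys are ordered by their quantitative value, and each
    parent entry is the largest (key, value) pair of the child below it.
    """
    ordered = sorted(items, key=lambda kv: kv[1])
    level = [ordered[i:i + fanout] for i in range(0, len(ordered), fanout)]
    levels = [level]
    while len(level) > 1:
        separators = [node[-1] for node in level]
        level = [separators[i:i + fanout]
                 for i in range(0, len(separators), fanout)]
        levels.append(level)
    levels.reverse()  # root level first, leaf level last
    return levels


# Repair costs from the slide; the image names are made up.
cars = [("car_a", 500), ("car_b", 1000), ("car_c", 200), ("car_d", 100)]
levels = bulk_load(cars, fanout=2)
leaf_values = [[value for _, value in node] for node in levels[-1]]
```

Here `leaf_values` holds the repair costs in leaf order, and `levels[0]` is the single root node.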
Index Construction: How to grow a palm tree? (Cont’d)
• Keys associated with some “Qualitative Property”
– Keys have a subjective property only
– Index constructed by successive insertions
– E.g., images of butterflies to be ordered based on beauty
Performance Metrics
• What are the performance metrics?
– Error: distance between the ground truth and the selected result
– Cost: total number of tasks needed to complete a job
[Figure: cost vs. error]
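The two metrics can be made concrete with a small sketch. It assumes that "distance" means the difference in position between the true key and the selected key in the sorted leaf order; the slides do not define the distance, and the function names are hypothetical.

```python
def rank_error(ordered_keys, ground_truth, selected):
    """Error as rank distance between the true and the selected key.

    Assumption: distance is measured in positions within the leaf order.
    """
    return abs(ordered_keys.index(ground_truth) - ordered_keys.index(selected))


def job_cost(tasks_per_worker):
    """Cost as the total number of tasks issued for one query."""
    return sum(tasks_per_worker)


leaves = list(range(1, 17))          # leaf keys 1..16
error = rank_error(leaves, ground_truth=11, selected=16)
cost = job_cost([4, 4, 4])           # three workers, four tasks each
```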
Traversal Algorithms
• How to descend the tree?
– Leaf-only aggregation
– All-level aggregation
– All-level aggregation with backtracking
Leaf-Only Aggregation
[Figure: binary tree over leaves 1 to 16; levels from the leaves up: 2 4 6 8 10 12 14 16 / 3 7 11 15 / 5 13 / root 9; workers w1, w2, w3]
• Even budget distribution
– Number of workers = Budget / Tree Height
• Budget: 12; tasks per worker: 4, 4, 4
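Leaf-only aggregation can be sketched as follows: each worker descends from the root to a leaf independently, and only the final leaves are combined. The sketch models a worker as a function that answers "is the query at or left of this separator?" and combines leaves with the low median; both choices are assumptions, since the slides do not state the aggregation rule.

```python
from statistics import median_low


def descend(num_leaves, go_left):
    """One worker's root-to-leaf descent; returns (leaf, tasks used)."""
    lo, hi, tasks = 1, num_leaves, 0
    while lo < hi:
        mid = (lo + hi) // 2
        tasks += 1                      # one comparison task per level
        if go_left(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo, tasks


def leaf_only_search(num_leaves, workers):
    """Aggregate only at the leaves (median is an assumed rule)."""
    results = [descend(num_leaves, worker) for worker in workers]
    leaves = [leaf for leaf, _ in results]
    cost = sum(tasks for _, tasks in results)
    return median_low(leaves), cost


query = 11
accurate = lambda separator: query <= separator
always_right = lambda separator: False  # a worker who always errs right

# Budget 12, tree height 4, so 12 // 4 = 3 workers.
leaf, cost = leaf_only_search(16, [accurate, accurate, always_right])
```

With two accurate workers and one biased worker, the individual results are 11, 11, and 16, and the median recovers leaf 11 at a cost of 12 tasks.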
All-Levels Aggregation
[Figure: binary tree over leaves 1 to 16; levels from the leaves up: 2 4 6 8 10 12 14 16 / 3 7 11 15 / 5 13 / root 9; workers w1, w2, w3 at each level]
• Even budget distribution
– Replication per level = Budget / Tree Height
• Budget: 12; tasks per level: 3, 3, 3, 3
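In all-levels aggregation, the answers are combined at every level before descending, so a single wrong answer does not send the search down the wrong subtree. The sketch below assumes majority voting as the per-level rule and models a worker as a function answering "is the query at or left of this separator?"; neither detail is given on the slides.

```python
def all_levels_search(num_leaves, workers):
    """Descend once, taking a majority vote of all workers at each level.

    Assumption: majority voting; ties (even worker counts) go right.
    Returns (selected leaf, total tasks).
    """
    lo, hi, cost = 1, num_leaves, 0
    while lo < hi:
        mid = (lo + hi) // 2
        votes = [worker(mid) for worker in workers]
        cost += len(votes)              # replication per level
        if 2 * sum(votes) > len(votes):
            hi = mid
        else:
            lo = mid + 1
    return lo, cost


query = 11
accurate = lambda separator: query <= separator
always_right = lambda separator: False  # a worker who always errs right

# Budget 12, tree height 4, so 12 // 4 = 3 replicas per level.
leaf, cost = all_levels_search(16, [accurate, accurate, always_right])
```

Under these assumptions the biased worker is outvoted at every level, and the search reaches leaf 11 with 3 tasks at each of the 4 levels.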
All-Levels Aggregation
[Figure: binary tree over leaves 1 to 16; levels from the leaves up: 2 4 6 8 10 12 14 16 / 3 7 11 15 / 5 13 / root 9]
• Uneven budget distribution based on
– Probability of a distance-d error at level l: P_dl
– Expected Distance Error per level: EDE
• Budget: 12; EDE per level: 3, 1.5, 1, 0.5; tasks per level: 6, 3, 2, 1
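The numbers on this slide are consistent with splitting the budget in proportion to each level's EDE. The sketch below assumes that rule, and also assumes EDE at a level is the sum over distances d of d times P_dl; the slides name these quantities but do not give the formulas, so both are illustrative.

```python
def expected_distance_error(prob_by_distance):
    """EDE for one level, assumed to be sum over d of d * P_dl."""
    return sum(d * p for d, p in prob_by_distance.items())


def allocate_budget(budget, ede_per_level):
    """Split an integer budget across levels in proportion to EDE.

    Uses largest-remainder rounding so the allocation sums to the budget.
    """
    total = sum(ede_per_level)
    raw = [budget * ede / total for ede in ede_per_level]
    alloc = [int(share) for share in raw]
    leftover = budget - sum(alloc)
    by_remainder = sorted(range(len(raw)),
                          key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


# EDE values and budget from the slide.
tasks_per_level = allocate_budget(12, [3, 1.5, 1, 0.5])
```

With the slide's EDE values, this rule gives 6, 3, 2, and 1 tasks, matching the tasks per level shown for a budget of 12.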
Algorithms: Crowd-Search Backtracking All-Levels Aggregation
[Figure: binary tree over leaves 1 to 16; levels from the leaves up: 2 4 6 8 10 12 14 16 / 3 7 11 15 / 5 13 / root 9; Node A, Node B, Node C, and Node D labeled]
Preliminary Experimental Results
Experimental Setup
• Squares dataset
– Generated 200 images of squares with different sizes
• Cars dataset
– 1,300 images of used cars associated with desired selling prices
– Collected using a custom crawler from the Craigslist website
• Crowd
– Students in the DB Group at Purdue (and their spouses)
– (IRB approval)
Preliminary Experimental Results
• Higher error on cars dataset
• Error increases as fanout increases
• Error decreases as the number of replications increases
• All-levels aggregation has less error than leaf-only aggregation
• Mean Error while changing the tree fanout and the number of workers (replications)
Preliminary Experimental Results
• Mean Cost while changing the tree fanout and the number of workers (replications)
• The taller the tree, the higher the cost
• Higher cost on the cars dataset (has more keys)
• More replications involve higher cost
Conclusions and Future Work
• Conclusions
– The Palm-tree index allows employing humans to perform index operations on keys that cannot be indexed by a computer
• Future Work
– More extensive experimental evaluation
– Mathematical analysis
– Multi-dimensional indexing