Exotic Functional Data Structures: Hitchhiker Trees David Greenberg 9/17/16 Strange Loop

Hitchhiker Trees - Strangeloop 2016


Page 1: Hitchhiker Trees - Strangeloop 2016

Exotic Functional Data Structures: Hitchhiker Trees
David Greenberg
9/17/16
Strange Loop

Page 2: Hitchhiker Trees - Strangeloop 2016

Who am I?

Page 3: Hitchhiker Trees - Strangeloop 2016

The Basics

Functional Data Structures
What are they, anyway?

Page 4: Hitchhiker Trees - Strangeloop 2016

Functional Data Structures

Immutable
7 + 1 = 8
But 7 is still 7

Page 5: Hitchhiker Trees - Strangeloop 2016

Functional Data Structures

x = [1, 2, 3]
y = x              # y is a second name for the same list
y += [4]           # in-place extend: x changes too
if x == y:
    print("I'm a sad panda")

Page 6: Hitchhiker Trees - Strangeloop 2016

How to fix this?

x = [1, 2, 3]
y = x[:]           # copy the list instead of aliasing it
y += [4]           # only the copy changes
if x != y:
    print("I'm a happy panda")
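The two snippets above differ only in whether `y` is a second name for the same list or a separate copy. A minimal sketch of that distinction follows; the names `alias`, `copy`, `frozen`, and `bigger` are illustrative, and the tuple at the end is an added example of an immutable value rather than something shown on the slides.

```python
x = [1, 2, 3]
alias = x            # a second name for the same list object
copy = x[:]          # a new list holding the same elements

alias += [4]         # in-place extend: the change is visible through x
copy += [5]          # only the copy changes

print(alias is x)    # identity check: both names refer to one object
print(x, copy)

# A tuple cannot be changed in place, so "adding" builds a new value
# and leaves the original alone (like 7 + 1 = 8 leaving 7 as 7).
frozen = (1, 2, 3)
bigger = frozen + (4,)
print(frozen, bigger)
```

With an immutable value there is nothing to copy defensively, which is the property the rest of the talk builds on.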

Page 7: Hitchhiker Trees - Strangeloop 2016

A List of Fruit

Page 8: Hitchhiker Trees - Strangeloop 2016

Mutation in an Immutable World

Page 9: Hitchhiker Trees - Strangeloop 2016

REFERENCES

What are pointers? (besides hard)

Page 10: Hitchhiker Trees - Strangeloop 2016

Pointers!

Page 11: Hitchhiker Trees - Strangeloop 2016

Pointers and Sharing

Page 12: Hitchhiker Trees - Strangeloop 2016

Doing Better with Pointers

Page 13: Hitchhiker Trees - Strangeloop 2016

Linked List

Page 14: Hitchhiker Trees - Strangeloop 2016

Editing the Linked List

Page 15: Hitchhiker Trees - Strangeloop 2016

Worst Case Performance

Page 16: Hitchhiker Trees - Strangeloop 2016

Philosophy of Identity

Q: When isn’t an apple an apple?
A: When “an apple points to an orange points to a banana” isn’t “an apple points to an orange points to a mango.”
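The linked-list slides can be sketched as an immutable cons list. This is a minimal illustration, assuming hypothetical names (`Cell`, `prepend`, `replace_last`, `to_list`) that do not come from the talk: prepending shares the whole existing list, while editing the last element has to copy every cell before it, which is the worst case.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    """One immutable link: a value plus a pointer to the rest of the list."""
    value: str
    rest: Optional["Cell"]


def prepend(value, lst):
    # O(1): the new cell points at the existing list, which is shared as-is.
    return Cell(value, lst)


def replace_last(lst, value):
    # O(n): every cell on the way to the end must be copied.
    if lst.rest is None:
        return Cell(value, None)
    return Cell(lst.value, replace_last(lst.rest, value))


def to_list(lst):
    out = []
    while lst is not None:
        out.append(lst.value)
        lst = lst.rest
    return out


fruit = Cell("apple", Cell("orange", Cell("banana", None)))
more = prepend("kiwi", fruit)           # shares all of fruit
edited = replace_last(fruit, "mango")   # shares nothing with fruit

print(to_list(fruit), to_list(more), to_list(edited))
print(more.rest is fruit)               # the tail is the very same object
print(edited.value == fruit.value, edited == fruit)
```

Both `fruit` and `edited` start with "apple", yet they are different lists, because a cell's identity includes everything it points to.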

Page 17: Hitchhiker Trees - Strangeloop 2016

Trees

Page 18: Hitchhiker Trees - Strangeloop 2016

Binary Search Trees

Page 19: Hitchhiker Trees - Strangeloop 2016

Lookups are log_2(n)

Elements per level:
1 = 2^0
2 = 2^1
4 = 2^2

Page 20: Hitchhiker Trees - Strangeloop 2016

Big O AnalysisWe Care About the Dominating Factor

Page 21: Hitchhiker Trees - Strangeloop 2016

Performance Analysis/Algebra

We have L levels
Lookups cost L
Only the last level matters
There are 2^L - 1 elements
Thus: n = 2^L - 1

log_2(n) ≈ L
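Counting elements level by level makes the algebra above concrete. The sketch below assumes a completely full binary tree; `full_tree_sizes` is a hypothetical helper written for this illustration.

```python
import math


def full_tree_sizes(levels):
    """Elements per level and in total for a full binary tree."""
    per_level = [2 ** k for k in range(levels)]   # 1, 2, 4, ...
    return per_level, sum(per_level)


per_level, n = full_tree_sizes(10)
print(per_level[:3], per_level[-1], n)

# The last level alone holds about half of all elements, so it dominates,
# and the number of levels is recovered from n with a base-2 logarithm.
levels_needed = math.ceil(math.log2(n + 1))
print(levels_needed)
```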

Page 22: Hitchhiker Trees - Strangeloop 2016

Functional updates
Path Copying

Page 23: Hitchhiker Trees - Strangeloop 2016

Path Copying

Updates still log_2(n)
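Path copying can be sketched on a plain (unbalanced) binary search tree. This is a minimal illustration, assuming hypothetical names `Node`, `insert`, and `contains`: an insert rebuilds only the nodes on the path from the root to the new key, and every subtree off that path is shared with the old version.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node, key):
    """Return a new tree containing key; the old tree is left untouched."""
    if node is None:
        return Node(key)
    if key < node.key:
        return node._replace(left=insert(node.left, key))    # copy this node
    if key > node.key:
        return node._replace(right=insert(node.right, key))  # copy this node
    return node                                              # already present


def contains(node, key):
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


old = None
for k in [8, 4, 12, 2, 6, 10, 14]:
    old = insert(old, k)

new = insert(old, 5)   # copies only the path 8 -> 4 -> 6

print(contains(old, 5), contains(new, 5))
print(new.right is old.right)   # the untouched right subtree is shared
```

Only one node per level is copied, so the update costs the same number of steps as a lookup, and the old root remains a valid snapshot.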

Page 24: Hitchhiker Trees - Strangeloop 2016

Properties of Trees
Balanced

How do we maintain this?

How to order the values
Sort them
Trie

Page 25: Hitchhiker Trees - Strangeloop 2016

I/O
Changing Our Cost Model
Where did the 2 come from in log_2(n)?

Page 26: Hitchhiker Trees - Strangeloop 2016

IDEA
More children
Fat nodes with ~B children

Page 27: Hitchhiker Trees - Strangeloop 2016

Going Wide

Page 28: Hitchhiker Trees - Strangeloop 2016

B Trees are Optimal for Reads

Lower Bound of log_B(n) for sorted lookups

Controlling the base of the logarithm is awesome

log_2(1000) = 9.97
log_5(1000) = 4.29
log_100(1000) = 1.5

Going wide gives big constant speedups for free

Under our I/O cost model
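The effect of the base can be checked directly. The sketch below assumes the cost model from the slides, where each level visited costs one I/O; `lookup_cost` is a hypothetical helper name.

```python
import math


def lookup_cost(n, fanout):
    """Approximate levels visited to find one of n sorted keys."""
    return math.log(n, fanout)


n = 1000
for fanout in (2, 5, 100):
    print(f"fan-out {fanout:>3}: about {lookup_cost(n, fanout):.2f} levels")
```

Widening the nodes does not change the asymptotic shape of the curve, only the base of the logarithm, which is why the slides call it a constant-factor speedup.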

Page 29: Hitchhiker Trees - Strangeloop 2016

B Tree Bookkeeping
Not as simple as a Binary Search Tree

Page 30: Hitchhiker Trees - Strangeloop 2016

Separate Node Types
Index & Data Nodes

Page 31: Hitchhiker Trees - Strangeloop 2016

B+ Tree

Reduce B to fit more levels on screen

Page 32: Hitchhiker Trees - Strangeloop 2016

Introducing Fractal Trees

Page 33: Hitchhiker Trees - Strangeloop 2016

Fractal Trees

Page 34: Hitchhiker Trees - Strangeloop 2016

BRIEF ASIDE

We can insert faster
log_B(n) is only for sorted lookups

Page 35: Hitchhiker Trees - Strangeloop 2016

Appending to a Log
Constant time to append
Already know the next index where we need to insert

A B C D E

Page 36: Hitchhiker Trees - Strangeloop 2016

Fractal Trees

Page 37: Hitchhiker Trees - Strangeloop 2016

Fractal Insertion
Inserting 0

Page 38: Hitchhiker Trees - Strangeloop 2016

Walking Through Insertions

Inserting -1

Page 39: Hitchhiker Trees - Strangeloop 2016

Walking Through Insertions

Inserting 28

Page 40: Hitchhiker Trees - Strangeloop 2016

Walking Through Insertions

Inserting 29

Page 41: Hitchhiker Trees - Strangeloop 2016

Walking Through Insertions

Inserting -2

Page 42: Hitchhiker Trees - Strangeloop 2016

Walking Through Insertions

Inserting 11.5

Page 43: Hitchhiker Trees - Strangeloop 2016

Walking Through Insertions

Inserting 100

Page 44: Hitchhiker Trees - Strangeloop 2016

What about Reads?

Page 45: Hitchhiker Trees - Strangeloop 2016

Looking up 20

Page 46: Hitchhiker Trees - Strangeloop 2016

Find the Path

Page 47: Hitchhiker Trees - Strangeloop 2016

Project Pending Operations

Page 48: Hitchhiker Trees - Strangeloop 2016

Broken for Scans

Page 49: Hitchhiker Trees - Strangeloop 2016

Only Project Values Within Range
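The insertion and read slides can be approximated with a deliberately small model. This is a toy sketch under stated assumptions, not the talk's implementation: it has only two levels (one root with a buffer above fixed sorted leaves), it mutates in place, it never splits nodes, and the class name `BufferedTree`, the starting keys, and `buffer_size=3` are all made up for illustration.

```python
import bisect


class BufferedTree:
    """Two-level toy: a root with a write buffer above sorted leaves."""

    def __init__(self, pivots, leaves, buffer_size=3):
        self.pivots = pivots          # pivots[i] separates leaves[i] and leaves[i+1]
        self.leaves = leaves          # list of sorted lists
        self.buffer = []              # pending inserts, in arrival order
        self.buffer_size = buffer_size
        self.flushes = 0

    def _leaf_index(self, key):
        return bisect.bisect_right(self.pivots, key)

    def insert(self, key):
        self.buffer.append(key)       # cheap append, like writing to a log
        if len(self.buffer) > self.buffer_size:
            self._flush()             # overflow: push pending keys down

    def _flush(self):
        for key in self.buffer:
            leaf = self.leaves[self._leaf_index(key)]
            if key not in leaf:
                bisect.insort(leaf, key)
        self.buffer = []
        self.flushes += 1

    def contains(self, key):
        # Find the path, then project pending operations onto the result.
        if key in self.buffer:
            return True
        leaf = self.leaves[self._leaf_index(key)]
        i = bisect.bisect_left(leaf, key)
        return i < len(leaf) and leaf[i] == key

    def scan(self, lo, hi):
        # Only pending values inside [lo, hi] are projected into the scan.
        stored = [k for leaf in self.leaves for k in leaf if lo <= k <= hi]
        pending = [k for k in self.buffer if lo <= k <= hi]
        return sorted(set(stored) | set(pending))


tree = BufferedTree(pivots=[10, 20], leaves=[[1, 5], [10, 15], [20, 25]])
for key in [0, -1, 28, 29, -2, 11.5, 100]:   # the keys from the walk-through
    tree.insert(key)

print(tree.flushes, tree.buffer)
print(tree.contains(20), tree.contains(11.5))
print(tree.scan(11, 30))
```

The point of the model is that most inserts touch only the buffer, while reads stay correct because they merge the buffered keys with what is already stored below.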

Page 50: Hitchhiker Trees - Strangeloop 2016

Hitchhiker vs Fractal

Page 51: Hitchhiker Trees - Strangeloop 2016

Path Copying or Not!

Fractal Trees update in-place

Page 52: Hitchhiker Trees - Strangeloop 2016

Path Copying or Not!

Hitchhiker Trees use path-copying

Page 53: Hitchhiker Trees - Strangeloop 2016

Flush Control

                  Total I/O   I/O per Flush   Avg I/O per Insert
B+ Tree           21          3               3
Fractal Tree      12          1 to 4          1.7
Hitchhiker Tree   5           5               0.7
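The averages in this comparison are consistent with dividing each total by seven inserts. That count is an assumption inferred from the seven keys inserted in the walk-through (0, -1, 28, 29, -2, 11.5, 100); the slide itself does not state it.

```python
# Total I/O figures as given on the slide.
total_io = {"B+ Tree": 21, "Fractal Tree": 12, "Hitchhiker Tree": 5}

inserts = 7   # assumption: the seven insertions from the walk-through

avg_io = {name: round(total / inserts, 1) for name, total in total_io.items()}
for name, avg in avg_io.items():
    print(f"{name:<16} {avg}")
```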

Page 54: Hitchhiker Trees - Strangeloop 2016

Real Branching Factors
B+ Trees have fan out of 1000-2000
Hitchhiker Trees have fan out of 100-200
But Hitchhiker Tree buffers hold 900-1000 elements!

Page 55: Hitchhiker Trees - Strangeloop 2016

I want to try it!
On GitHub

Page 56: Hitchhiker Trees - Strangeloop 2016
Page 57: Hitchhiker Trees - Strangeloop 2016
Page 58: Hitchhiker Trees - Strangeloop 2016

Datacrypt is Pluggable
Backend Storage
I/O Management
Serialization
Sorting Algorithm

Page 59: Hitchhiker Trees - Strangeloop 2016

Works with Redis
Called the Outboard API

Page 60: Hitchhiker Trees - Strangeloop 2016

Outboard
Looks like a hash map
Data stored off-heap in Redis
Functional data structures mean free snapshots
After a VM restart, just reconnect to Redis
Lifetime of in-memory data doesn’t need to be tied to lifetime of runtime memory

Page 61: Hitchhiker Trees - Strangeloop 2016

What’ll we build next?
Q&A

Thanks to:
Andy Chambers for JDBC Backend & GC Improvements
Casey Marshall for S3 Backend

Page 62: Hitchhiker Trees - Strangeloop 2016

(Prefix) Tries

Page 63: Hitchhiker Trees - Strangeloop 2016

(Hash) Array Mapped Tries

We add the fat node trick from B trees
We hash keys first for even distribution
No need to store full hash: prefix is enough
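How a hash selects a child at each level can be sketched in a few lines. The choices here are assumptions for illustration: 5 bits per level (32-way nodes), a hash truncated to 32 bits, and chunks consumed from the low-order end; `slot_path` is a hypothetical helper, and real implementations differ in these details.

```python
BITS = 5                 # assumption: 5 bits per level
WIDTH = 1 << BITS        # 32 child slots per fat node


def slot_path(key, levels=3):
    """Child slot chosen at each of the first few levels for this key."""
    h = hash(key) & 0xFFFFFFFF                 # keep 32 bits of the hash
    return [(h >> (BITS * level)) & (WIDTH - 1) for level in range(levels)]


# Small integers hash to themselves in CPython, which makes the
# bit slicing easy to follow by hand.
for key in (1, 33, 1025):
    print(key, slot_path(key))
```

Each level needs only its own 5-bit chunk to pick a slot, so a key's position is settled as soon as its leading chunks differ from every other key's, without comparing or storing the rest of the hash.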