DESIGNING A DATABASE LIKE AN ARCHAEOLOGIST Yoav Rubin @yoavrubin

CR17 - Designing a database like an archaeologist

Page 1: CR17 - Designing a database like an archaeologist

DESIGNING A DATABASE LIKE AN ARCHAEOLOGIST

Yoav Rubin @yoavrubin

Page 2

The background story

■ How I got to it
■ 500 Lines or Less
■ Datomic

Page 3

“…My own strategy is to find a car, or the nearest equivalent, which looks as if it knows where it’s going and follow it.”
The Long Dark Tea-Time of the Soul, Douglas Adams

Page 4

My understanding of Datomic

Page 5

Mental model - entities

■ Built of entities
  – Slices are entities
■ Entities have attributes
  – E.g., shape, color, quantity
■ Attributes have values
■ Key insight: things can “change” only by adding layers

[Figure: entities A, B, C, D in a single layer at timestamp 0]

Pages 6-8

Mental model - layers

■ Each change creates a new layer
  – Update entity
  – Delete entity
  – Add entity
■ Each layer has its timestamp

[Figure, built up over three slides: layers stacked at timestamps 0-2, then 0-3, then 0-4, over entities A, B, C, D; later layers add entity E and mark D with an X]

Page 9

Mental model - Datom

■ The basic data building block
■ Composed of a value of an attribute of an entity at a specific time
  – E.g., (E, A, V, T):
    ■ A, count, 3, 0
    ■ A, color, blue, 0
    ■ A, shape, rectangle, 0
■ A datom points to its previous version
■ A datom may represent a relationship between entities
  – An entity may point to another entity

[Figure: the layer stack at timestamps 0-4 over entities A-E]
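To make the (E, A, V, T) tuple concrete, here is a minimal sketch that keeps datoms as plain `[e a v t]` vectors in one flat collection. This is not the talk's representation (which stores attributes inside per-layer entity records); `datoms` and `value-at` are hypothetical names. It illustrates the key insight: a change is a new datom with a later timestamp, and reading "as of" a time picks the newest datom at or before it.

```clojure
;; Hypothetical flat representation: each datom is an [e a v t] vector
(def datoms
  [[:A :count 3 0]
   [:A :color :blue 0]
   [:A :shape :rectangle 0]
   [:A :count 5 2]])   ; a later "change": a new datom, nothing is overwritten

(defn value-at
  "Value of attribute a of entity e as of time t: the newest matching datom with ts <= t."
  [datoms e a t]
  (let [matching (filter (fn [[de da _ dt]] (and (= de e) (= da a) (<= dt t)))
                         datoms)]
    (when (seq matching)
      ;; max-key picks the datom with the largest timestamp
      (nth (apply max-key #(nth % 3) matching) 2))))

(value-at datoms :A :count 1)   ; => 3 (reads the older layer)
(value-at datoms :A :count 2)   ; => 5 (reads the newer layer)
```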

Page 10

Why archaeology

■ It’s like an archaeological excavation site
■ The excavation site is a database
■ Each artifact is an entity
  – With its corresponding ID
■ Each entity has a set of attributes
  – Which may change over time
■ Each attribute has a specific value at a specific time
■ When you go deeper, you go back in time
■ A change is a new layer
  – That hides the previous value

[Figure: the layer stack at timestamps 0-4 over entities A-E]

Page 11

Design approach: Bottom up

Page 12

[Roadmap diagram with "Build" and "Read" labels, covering: Data model, Indexes, Datom lifecycle (Add / Update / Remove), Transactions, What-if, Evolution, Graph queries, Datalog queries]

Page 13

Data model - constructs

(defrecord Database [layers top-id curr-time])
(defrecord Layer [storage VAET AVET VEAT EAVT])
(defrecord Entity [id attrs])
(defrecord Attr [name value ts prev-ts])

(defprotocol Storage
  (get-entity [storage e-id])
  (write-entity [storage entity])
  (drop-entity [storage entity]))

Page 14

Data model - basic creators

(defn make-attr
  ([name value type                                             ; these ones are required
    & {:keys [cardinality] :or {cardinality :db/single}}]       ; defaults
   {:pre [(contains? #{:db/single :db/multiple} cardinality)]}  ; DbC preconditions
   (with-meta (Attr. name value -1 -1)                          ; creation
     {:type type :cardinality cardinality})))                   ; metadata

(defn make-entity
  ([] (make-entity :db/no-id-yet))
  ([id] (Entity. id {})))

(defn add-attr [ent attr]
  (let [attr-id (keyword (:name attr))]
    (assoc-in ent [:attrs attr-id] attr)))
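A small usage sketch of the creators above, assuming only the `Attr` and `Entity` records from the constructs slide. The attribute names (`:color`, `:tags`) and the `:string` / `:number` types are illustrative. It shows that type and cardinality travel as metadata rather than record fields, and that the `:pre` condition rejects an unknown cardinality.

```clojure
(defrecord Entity [id attrs])
(defrecord Attr [name value ts prev-ts])

(defn make-attr
  ([name value type & {:keys [cardinality] :or {cardinality :db/single}}]
   {:pre [(contains? #{:db/single :db/multiple} cardinality)]}
   ;; ts and prev-ts start as -1, the "no timestamp yet" sentinel
   (with-meta (Attr. name value -1 -1) {:type type :cardinality cardinality})))

(defn make-entity
  ([] (make-entity :db/no-id-yet))
  ([id] (Entity. id {})))

(defn add-attr [ent attr]
  (let [attr-id (keyword (:name attr))]
    (assoc-in ent [:attrs attr-id] attr)))

;; Illustrative attributes; type and cardinality live in metadata
(def color (make-attr :color :blue :string))
(def tags (make-attr :tags #{"new"} :string :cardinality :db/multiple))

(def ent (-> (make-entity :e1) (add-attr color) (add-attr tags)))

(meta (get-in ent [:attrs :tags]))   ; type :string, cardinality :db/multiple

;; The precondition rejects an unknown cardinality
(try (make-attr :x 1 :number :cardinality :db/many)
     (catch AssertionError _ :rejected))
```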

Page 15

Data model - accessors

(defn entity-at
  ([db ent-id] (entity-at db (:curr-time db) ent-id))
  ([db ts ent-id] (get-entity (get-in db [:layers ts :storage]) ent-id)))

(defn attr-at
  ([db ent-id attr-name] (attr-at db ent-id attr-name (:curr-time db)))
  ([db ent-id attr-name ts] (get-in (entity-at db ts ent-id) [:attrs attr-name])))

(defn value-of-at
  ([db ent-id attr-name] (:value (attr-at db ent-id attr-name)))
  ([db ent-id attr-name ts] (:value (attr-at db ent-id attr-name ts))))

(defn indx-at
  ([db kind] (indx-at db kind (:curr-time db)))
  ([db kind ts] (kind ((:layers db) ts))))
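The constructs and accessors can be exercised together in a small sketch. The slides do not show `fdb.storage.InMemory`, so this assumes a hypothetical map-backed `InMemory` record implementing `Storage`; everything else is copied from the slides. Note the argument order: `entity-at` takes the timestamp before the entity id, while `attr-at` and `value-of-at` take it last.

```clojure
;; Data model constructs, as on the slides
(defrecord Database [layers top-id curr-time])
(defrecord Layer [storage VAET AVET VEAT EAVT])
(defrecord Entity [id attrs])
(defrecord Attr [name value ts prev-ts])

(defprotocol Storage
  (get-entity [storage e-id])
  (write-entity [storage entity])
  (drop-entity [storage entity]))

;; Hypothetical stand-in for fdb.storage.InMemory (not shown on the slides):
;; a field-less record used as a persistent map from entity id to entity
(defrecord InMemory []
  Storage
  (get-entity [storage e-id] (get storage e-id))
  (write-entity [storage entity] (assoc storage (:id entity) entity))
  (drop-entity [storage entity] (dissoc storage (:id entity))))

;; Accessors, as on the slides
(defn entity-at
  ([db ent-id] (entity-at db (:curr-time db) ent-id))
  ([db ts ent-id] (get-entity (get-in db [:layers ts :storage]) ent-id)))

(defn attr-at
  ([db ent-id attr-name] (attr-at db ent-id attr-name (:curr-time db)))
  ([db ent-id attr-name ts] (get-in (entity-at db ts ent-id) [:attrs attr-name])))

(defn value-of-at
  ([db ent-id attr-name] (:value (attr-at db ent-id attr-name)))
  ([db ent-id attr-name ts] (:value (attr-at db ent-id attr-name ts))))

;; A one-layer database holding a single entity; the indexes are left empty
(def e1 (Entity. :e1 {:color (Attr. :color :blue 0 -1)}))
(def db (Database. [(Layer. (write-entity (InMemory.) e1) {} {} {} {})] 0 0))

(value-of-at db :e1 :color)   ; => :blue
```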

Page 17

Indexing - why

■ The database accumulates facts
  – Many of them
■ It needs to provide mechanisms to ask questions about these facts
  – Graph query APIs
  – Datalog query language APIs
■ That insight-extraction mechanism must be efficient
■ This is what indexes are all about

Page 18

Indexing - what

■ A fact can be identified by the triplet entityId, attributeName and value, at a specific time
  – This is a datom
■ Datoms are indexed

Page 19

Indexing - how

■ An index is a three-level structure:
  – First level: a map from a key to a second-level map
  – Second level: a map from a key to a third-level set
  – Third level: a set
■ A datom is represented in the index structure
  – Each level holds a different kind of item: either entityId, attributeName or value

Page 20

Indexing - how

■ The name of the index is derived from the kind of items found in each level
  – EAVT: {entityId {attributeName #{value}}}
  – VEAT: {value {entityId #{attributeName}}}
  – AVET: {attributeName {value #{entityId}}}
  – VAET: {value {attributeName #{entityId}}}
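A minimal sketch of the three-level shape, assuming the triplet has already been rearranged into the index's order. `add-to-index` is a hypothetical helper name and the sample facts are illustrative; the point is that the same insertion code builds any of the four indexes, and only the ordering of the triplet differs.

```clojure
(defn add-to-index
  "Adds a triplet [l1 l2 l3], already in index order, to a {l1 {l2 #{l3}}} index."
  [index [l1 l2 l3]]
  ;; fnil starts the third-level set as #{} the first time (l1, l2) is seen
  (update-in index [l1 l2] (fnil conj #{}) l3))

;; Illustrative facts in canonical EAV order
(def facts [[:A :count 3] [:A :color :blue] [:B :color :blue]])

;; EAVT keeps the canonical order; AVET rearranges each fact to [a v e]
(def eavt (reduce add-to-index {} facts))
(def avet (reduce add-to-index {} (map (fn [[e a v]] [a v e]) facts)))

(get-in eavt [:A :color])      ; => #{:blue}, the values of :color for :A
(get-in avet [:color :blue])   ; the set of entities whose :color is :blue
```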

Page 21

EAVT

Page 22

AVET

Pages 23-25

(defn make-index [from-eav to-eav usage-pred]
  (with-meta {}
    {:from-eav from-eav :to-eav to-eav :usage-pred usage-pred}))

■ from-eav: takes a triplet in the canonical EAV order and rearranges it into the index order
■ to-eav: takes an index triplet and rearranges it into the canonical EAV order
■ usage-pred: decides, for a given datom, whether it should be indexed in this index
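One way the three functions could be used when a datom is indexed; a sketch, not the talk's code. `index-datom` is a hypothetical helper, and the attribute is modelled as a plain map with a `:type` key so the predicate has something to inspect. Because the index itself is an ordinary map, the functions ride along as metadata, which `update-in` (through `assoc`) preserves.

```clojure
(defn make-index [from-eav to-eav usage-pred]
  (with-meta {} {:from-eav from-eav :to-eav to-eav :usage-pred usage-pred}))

(defn index-datom
  "Hypothetical helper: adds datom (e, a, v) to index if its usage-pred accepts attr."
  [index e a v attr]
  (let [{:keys [from-eav usage-pred]} (meta index)]
    (if (usage-pred attr)
      (let [[l1 l2 l3] (from-eav e a v)]             ; rearrange into index order
        (update-in index [l1 l2] (fnil conj #{}) l3))
      index)))                                       ; not for this index: unchanged

;; VAET only indexes reference attributes; AVET indexes everything
(def vaet (make-index #(vector %3 %2 %1) #(vector %3 %2 %1) #(= :db/ref (:type %))))
(def avet (make-index #(vector %2 %3 %1) #(vector %3 %1 %2) (constantly true)))

;; Illustrative attributes
(def name-attr {:type :string})
(def friend-attr {:type :db/ref})

(def indexed-vaet (-> vaet
                      (index-datom :e1 :name "Ann" name-attr)
                      (index-datom :e1 :friend :e2 friend-attr)))
(def indexed-avet (index-datom avet :e1 :name "Ann" name-attr))

;; to-eav recovers the canonical order from a stored index path
((:to-eav (meta indexed-avet)) :name "Ann" :e1)   ; => [:e1 :name "Ann"]
```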

Page 26

(defn make-db []
  (atom
    (Database.
      [(Layer.
         (fdb.storage.InMemory.)                                        ; storage
         (make-index #(vector %3 %2 %1) #(vector %3 %2 %1) #(ref? %))   ; VAET
         (make-index #(vector %2 %3 %1) #(vector %3 %1 %2) always)      ; AVET
         (make-index #(vector %3 %1 %2) #(vector %2 %3 %1) always)      ; VEAT
         (make-index #(vector %1 %2 %3) #(vector %1 %2 %3) always))]    ; EAVT
      0 0)))
;; ref? and always are helper predicates not shown on the slides
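Each index pairs a `from-eav` with a `to-eav`, and for the index to be usable they must undo each other. The quick check below copies the four pairs from `make-db` and verifies that property; `index-orders`, `round-trips?` and `layouts` are hypothetical names, and nothing here depends on `ref?`, `always` or the storage.

```clojure
;; [from-eav to-eav] pairs, copied from make-db
(def index-orders
  {:VAET [#(vector %3 %2 %1) #(vector %3 %2 %1)]
   :AVET [#(vector %2 %3 %1) #(vector %3 %1 %2)]
   :VEAT [#(vector %3 %1 %2) #(vector %2 %3 %1)]
   :EAVT [#(vector %1 %2 %3) #(vector %1 %2 %3)]})

(defn round-trips?
  "True when to-eav undoes from-eav for a sample triplet."
  [[from-eav to-eav]]
  (= [:e :a :v] (apply to-eav (from-eav :e :a :v))))

;; The order in which each index stores a datom; it spells the index name
(def layouts
  (into {} (for [[k [from-eav _]] index-orders] [k (from-eav :e :a :v)])))

(every? round-trips? (vals index-orders))   ; => true
```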

Page 28

(defn add-entity [db ent] …)
(defn remove-entity [db ent-id] …)
(defn update-entity
  ([db ent-id attr-name new-val] …)
  ([db ent-id attr-name new-val operation] …))

[Diagram: extract top layer → operate on storage → operate on indexes → new layer → add layer → return a database]
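The slide elides the function bodies. Under simplified, hypothetical shapes (a layer as a plain map with `:storage` and a single `:EAVT` index, attribute values stored directly rather than as `Attr` records), the add path could look like the sketch below. It follows the diagram's steps, but it is not the talk's implementation.

```clojure
;; Hypothetical simplified shapes (not the talk's records):
;;   db    {:layers [layer ...] :top-id n :curr-time t}
;;   layer {:storage {id entity} :EAVT {e {a #{v}}}}
;;   ent   {:attrs {attr-name value}}
(defn add-entity [db ent]
  (let [top-layer (last (:layers db))                      ; extract top layer
        id        (inc (:top-id db))                       ; fresh entity id
        ent       (assoc ent :id id)
        stored    (assoc-in top-layer [:storage id] ent)   ; operate on storage
        new-layer (reduce (fn [layer [a v]]                ; operate on indexes
                            (update-in layer [:EAVT id a] (fnil conj #{}) v))
                          stored
                          (:attrs ent))]
    ;; add the new layer on top and return a database; curr-time is left
    ;; unchanged here (on the transaction slides, transact-on-db sets it)
    (assoc db
           :layers (conj (:layers db) new-layer)
           :top-id id)))

(def db0 {:layers [{:storage {} :EAVT {}}] :top-id 0 :curr-time 0})
(def db1 (add-entity db0 {:attrs {:color :blue}}))
```

Because every step uses persistent data structures, `db0` is untouched and still has its single layer.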

Page 30

What’s in a transaction

■ A database
■ A set of operations to be performed in an ACI manner (atomic, consistent, isolated; no D for durability, since the database is in memory)
■ The desired API:

(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))

Not function calls!! The forms inside transact are handed to a macro, which turns them into data.

Page 31

The process

(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))

■ Should be transformed into a collection of valid operations
■ Each operation adds a layer
  – On top of the layer that the previous operation added
■ Problem: several layers may be added during a transaction
■ Solution: put only the latest layer on top of the initial DB’s layers
  – Then set the time of the new layer
  – Use the top-id of the database produced by the last operation
■ Update the Atom that holds the DB
  – Or not, in the case of what-if

Pages 32-35

(defn transact-on-db [initial-db ops]
  (loop [[op & rst-ops] ops
         transacted initial-db]
    (if op
      (recur rst-ops (apply (first op) transacted (rest op)))
      (let [initial-layer (:layers initial-db)
            new-layer (last (:layers transacted))]
        (assoc initial-db
               :layers (conj initial-layer new-layer)
               :curr-time (next-ts initial-db)
               :top-id (:top-id transacted))))))

■ Operations come in the form: [[op param1 param2…] [op param1 param2…]]
■ The loop goes over the operations in the transaction
■ In each iteration it builds and executes the add / update / remove call, adding a single layer on top of the previously added layer
■ No more operations: construct the output of the transaction, a fully updated db
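To see the re-layering at work, the sketch below runs `transact-on-db` (copied from the slide) against a toy database. `next-ts` is not shown on the slides, so it is assumed here to increment `curr-time`; `put-fact` is a hypothetical operation that pushes a copy of the top layer with one extra key, standing in for add / update / remove.

```clojure
;; Assumption: next-ts is not on the slides; here it just increments curr-time
(defn next-ts [db] (inc (:curr-time db)))

(defn transact-on-db [initial-db ops]
  (loop [[op & rst-ops] ops
         transacted initial-db]
    (if op
      ;; each op is [f & args]: call (f db args...), which adds one layer
      (recur rst-ops (apply (first op) transacted (rest op)))
      ;; keep only the final layer, placed on top of the initial layers
      (let [initial-layer (:layers initial-db)
            new-layer (last (:layers transacted))]
        (assoc initial-db
               :layers (conj initial-layer new-layer)
               :curr-time (next-ts initial-db)
               :top-id (:top-id transacted))))))

;; Hypothetical operation: copy the top layer, set k to v, bump top-id
(defn put-fact [db k v]
  (let [top (last (:layers db))]
    (assoc db
           :layers (conj (:layers db) (assoc top k v))
           :top-id (inc (:top-id db)))))

(def db0 {:layers [{}] :top-id 0 :curr-time 0})
(def db1 (transact-on-db db0 [[put-fact :a 1] [put-fact :b 2]]))

(:layers db1)   ; => [{} {:a 1, :b 2}], two operations but only one new layer
```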

Pages 36-42

The process

(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))

(defmacro _transact [db op & txs]
  (when txs
    (loop [[frst-tx# & rst-tx#] txs
           res# [op db `transact-on-db]
           accum-txs# []]
      (if frst-tx#
        (recur rst-tx# res# (conj accum-txs# (vec frst-tx#)))
        (list* (conj res# accum-txs#))))))

■ _transact is a macro: it turns each transaction form into a vector and expands into a call to the operation it received (swap! or _what-if), passing the db, transact-on-db and the vector of transactions

(defmacro transact [db-conn & txs] `(_transact ~db-conn swap! ~@txs))

(defmacro what-if [db & txs] `(_transact ~db _what-if ~@txs))

(defn- _what-if [db f txs] (f db txs))

Page 43

Transaction vs What-if process

Transaction:
  (transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))
  (_transact db-conn swap! (add-entity e1) (update-entity e2 atr2 val2 :db/add))
  (swap! db-conn transact-on-db [[add-entity e1] [update-entity e2 atr2 val2 :db/add]])
  (transact-on-db @db-conn [[add-entity e1] [update-entity e2 atr2 val2 :db/add]])
  (add-entity @db-conn e1), then (update-entity db' e2 atr2 val2 :db/add)
  Result: the given db-conn (an Atom) points to a new db

What-if:
  (what-if db (add-entity e1) (update-entity e2 atr2 val2 :db/add))
  (_transact db _what-if (add-entity e1) (update-entity e2 atr2 val2 :db/add))
  (_what-if db transact-on-db [[add-entity e1] [update-entity e2 atr2 val2 :db/add]])
  (transact-on-db db [[add-entity e1] [update-entity e2 atr2 val2 :db/add]])
  (add-entity db e1), then (update-entity db' e2 atr2 val2 :db/add)
  Result: a new db is returned

(db' is the database returned by the previous step)
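A runnable sketch of both paths, assuming everything lives in one namespace so the syntax-quoted symbols resolve. The macros are as on the slides (with the backquote in `transact`); `next-ts` and the toy `put-fact` operation are hypothetical stand-ins. The thing to notice is that `what-if` leaves the atom untouched while `transact` swaps in the new db.

```clojure
;; Hypothetical stand-ins: next-ts and put-fact are not from the slides
(defn next-ts [db] (inc (:curr-time db)))

(defn put-fact [db k v]
  (let [top (last (:layers db))]
    (assoc db :layers (conj (:layers db) (assoc top k v))
              :top-id (inc (:top-id db)))))

;; transact-on-db, as on the slides (the let is inlined)
(defn transact-on-db [initial-db ops]
  (loop [[op & rst-ops] ops transacted initial-db]
    (if op
      (recur rst-ops (apply (first op) transacted (rest op)))
      (assoc initial-db
             :layers (conj (:layers initial-db) (last (:layers transacted)))
             :curr-time (next-ts initial-db)
             :top-id (:top-id transacted)))))

;; Macros, as on the slides: each (f args...) form becomes the data [f args...]
(defmacro _transact [db op & txs]
  (when txs
    (loop [[frst-tx# & rst-tx#] txs
           res# [op db `transact-on-db]
           accum-txs# []]
      (if frst-tx#
        (recur rst-tx# res# (conj accum-txs# (vec frst-tx#)))
        (list* (conj res# accum-txs#))))))

(defn- _what-if [db f txs] (f db txs))

(defmacro what-if [db & txs] `(_transact ~db _what-if ~@txs))
(defmacro transact [db-conn & txs] `(_transact ~db-conn swap! ~@txs))

(def conn (atom {:layers [{}] :top-id 0 :curr-time 0}))

;; what-if: computes a new db, the atom keeps its old value
(def speculative (what-if @conn (put-fact :a 1)))

;; transact: swap! points the atom at the new db
(transact conn (put-fact :a 1) (put-fact :b 2))
```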

Pages 45-46

Evolutionary queries

■ Seeing how an entity’s attribute evolved throughout time
■ Each attribute has a prev-ts property
■ We can use it to look back and see what was before

(defn evolution-of [db ent-id attr-name]
  (loop [res [] ts (:curr-time db)]
    (if (= -1 ts)
      (reverse res)
      (let [attr (attr-at db ent-id attr-name ts)]
        (recur (conj res {(:ts attr) (:value attr)}) (:prev-ts attr))))))

Ends up with a sequence of evolutionary steps, each a map {<time> <value>}, oldest first
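A sketch that exercises `evolution-of` on a hand-built database. To stay self-contained it swaps in a hypothetical `attr-at` that reads plain maps (`{:layers [{ent-id {:attrs ...}} ...]}`, where the layer index is the timestamp) instead of going through the `Storage` protocol; the attribute maps carry the same `:value`, `:ts` and `:prev-ts` keys as the `Attr` record.

```clojure
;; Hypothetical attr-at over plain maps: layer index = timestamp
(defn attr-at [db ent-id attr-name ts]
  (get-in db [:layers ts ent-id :attrs attr-name]))

(defn evolution-of [db ent-id attr-name]
  (loop [res [] ts (:curr-time db)]
    (if (= -1 ts)
      (reverse res)                  ; steps were collected newest-first
      (let [attr (attr-at db ent-id attr-name ts)]
        ;; record this step, then jump straight to the previous version
        (recur (conj res {(:ts attr) (:value attr)}) (:prev-ts attr))))))

;; :color is set at 0, carried unchanged into layer 1, changed at 2
(def red  {:value :red  :ts 0 :prev-ts -1})
(def blue {:value :blue :ts 2 :prev-ts 0})

(def db {:curr-time 2
         :layers [{:e1 {:attrs {:color red}}}
                  {:e1 {:attrs {:color red}}}
                  {:e1 {:attrs {:color blue}}}]})

(evolution-of db :e1 :color)   ; => ({0 :red} {2 :blue})
```

Following `prev-ts` skips layer 1 entirely, so the walk costs one step per actual change rather than one per layer.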

Page 48

Graph queries

■ Treating the database as a graph
■ Each entity models a node
■ An entity may have attributes whose type is :db/ref
  – The value of such an attribute is the ID of another entity
■ Each such attribute models a link
  – The link’s label is the attribute name
  – The link’s target is the attribute value

Page 49

Completing the graph story

■ For each link we know
  – Source: the containing entity
  – Target: the value
■ Each node also needs to know its links
  – Outgoing: by extracting from the entity the attributes whose type is :db/ref
  – Incoming: using the VAET index
    ■ V: the current node’s ID
    ■ E: the set of entities pointing to this node
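A sketch of both directions, under hypothetical shapes: an entity is `{:attrs {name attr}}` where each attr is a map whose type sits in metadata (mirroring `make-attr`), and VAET is the `{value {attributeName #{entityId}}}` map from the indexing slides. `outgoing-links` and `incoming-links` are illustrative names; both return `[label node]` pairs so the link's label survives.

```clojure
(defn outgoing-links
  "[attr-name target-id] for every :db/ref attribute of ent."
  [ent]
  (set (for [[attr-name attr] (:attrs ent)
             :when (= :db/ref (:type (meta attr)))]
         [attr-name (:value attr)])))

(defn incoming-links
  "[attr-name source-id] for every entity pointing at node-id, read from VAET."
  [vaet node-id]
  (set (for [[attr-name sources] (get vaet node-id)   ; V level: this node's id
             source sources]                           ; E level: who points here
         [attr-name source])))

;; Illustrative data
(def e1 {:attrs {:name   (with-meta {:value "Ann"} {:type :string})
                 :friend (with-meta {:value :e2} {:type :db/ref})}})

(def vaet {:e2 {:friend #{:e1 :e3}
                :boss   #{:e4}}})

(outgoing-links e1)        ; => #{[:friend :e2]}
(incoming-links vaet :e2)  ; the :friend links from :e1 and :e3, the :boss link from :e4
```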

Page 51

Summary

■ We have an in-memory functional DB with
  – Transactions
  – What-if
  – Graph queries
  – Evolution queries
  – Simple Datalog queries
■ 488 lines, of which
  – 73 blank
  – 55 docstrings
  – Remaining: 360 lines of code

Page 52

Summary - what made it possible

■ Design approach: bottom up
  – With occasional top-down
■ Clojure’s magic
  – Persistent data structures
  – Macros
  – Data literals
  – HOFs
  – Destructuring
■ Clojure approach
  – Everything is a library
  – Design data structures and write data structure transformation code
  – The rest will follow

Page 53

“… I may not have gone where I intended to go, but I think I have ended up where I needed to be”
The Long Dark Tea-Time of the Soul, Douglas Adams