DESIGNING A DATABASE LIKE
AN ARCHAEOLOGIST
Yoav Rubin (@yoavrubin)
The background story
■ Me
■ How I got to it
The background story
■ 500 Lines or Less
■ Datomic
My understanding of Datomic
Mental model - entities
■ Built of entities
■ Slices are entities
■ Entities have attributes
  – E.g., shape, color, quantity
■ Attributes have values
■ Update to an entity adds a new layer
■ Add / edit / delete
[Figure: entities A, B, C and D]
Mental model - layers
■ Each change creates a new layer
■ Update entity
■ Delete entity
■ Add entity
■ Each layer has its timestamp
[Figure: layers at timestamps 0 to 2 holding entities A, B, C and D]
Mental model - Datom
■ A value of an attribute of an entity at a specific time is a datom
■ E.g., A@0:
  – Count: 3
  – Color: blue
  – Shape: rectangle
■ A datom points to its previous version
■ An entity may point to another entity
  – Represents a relationship between entities
■ Modeled as yet another datom
[Figure: layers at timestamps 0 to 4 holding entities A, B, C, D, E and X]
Why archaeology
■ It’s like an archaeological excavation site.
■ The excavation site is a database.
■ Each artifact is an entity with a corresponding ID.
■ Each entity has a set of attributes, which may change over time.
■ Each attribute has a specific value at a specific time.
■ When you go deeper, you go back in time.
■ An update just hides the previous value.
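The excavation metaphor can be tried out with a few lines of Clojure. This is only a toy sketch, not the talk's implementation: the `layers` vector and the `value-at` helper are hypothetical, and the data is the A@0 example from the datom slide with an assumed later color change.

```clojure
;; Hypothetical toy model: a database is a vector of layers,
;; and each layer maps an entity id to its attribute map.
(def layers
  [{:A {:count 3 :color "blue" :shape "rectangle"}}   ; timestamp 0
   {:A {:count 3 :color "red"  :shape "rectangle"}}]) ; timestamp 1, color updated

(defn value-at
  "Value of attr-name of ent-id in the layer at timestamp ts."
  [layers ts ent-id attr-name]
  (get-in layers [ts ent-id attr-name]))

(value-at layers 1 :A :color) ; => "red", the top layer shows the update
(value-at layers 0 :A :color) ; => "blue", digging deeper reveals the hidden value
```

Nothing is overwritten: the older value stays reachable by asking for an earlier timestamp.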
Design approach: Bottom up
[Diagram: the building blocks of the design, grouped into Build and Read: Data model, Indexes, Datom life cycle (Add / Update / Remove), Transactions and What-if, Evolution and Graph queries, Datalog queries]
Data model - constructs
(defrecord Database [layers top-id curr-time])
(defrecord Layer [storage VAET AVET VEAT EAVT])
(defrecord Entity [id attrs])
(defrecord Attr [name value ts prev-ts])

(defprotocol Storage
  (get-entity [storage e-id])
  (write-entity [storage entity])
  (drop-entity [storage entity]))
Data model – basic creators
(defn make-attr
  ([name value type ; these ones are required
    & {:keys [cardinality] :or {cardinality :db/single}}] ; defaults
   {:pre [(contains? #{:db/single :db/multiple} cardinality)]} ; DbC preconditions
   (with-meta (Attr. name value -1 -1) ; creation
     {:type type :cardinality cardinality}))) ; metadata

(defn make-entity
  ([] (make-entity :db/no-id-yet))
  ([id] (Entity. id {})))

(defn add-attr [ent attr]
  (let [attr-id (keyword (:name attr))]
    (assoc-in ent [:attrs attr-id] attr)))
Data model - accessors
(defn entity-at
  ([db ent-id] (entity-at db (:curr-time db) ent-id))
  ([db ts ent-id] (stored-entity (get-in db [:layers ts :storage]) ent-id)))

(defn attr-at
  ([db ent-id attr-name] (attr-at db ent-id attr-name (:curr-time db)))
  ([db ent-id attr-name ts] (get-in (entity-at db ts ent-id) [:attrs attr-name])))

(defn value-of-at
  ([db ent-id attr-name] (:value (attr-at db ent-id attr-name)))
  ([db ent-id attr-name ts] (:value (attr-at db ent-id attr-name ts))))

(defn indx-at
  ([db kind] (indx-at db kind (:curr-time db)))
  ([db kind ts] (kind ((:layers db) ts))))
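To see the creators in action, here is a small self-contained sketch. The two records and three creator functions are repeated from the slides (slightly condensed) so the snippet runs on its own; the `pizza-lover` entity and its values are hypothetical example data.

```clojure
;; Records and creators repeated from the slides so the snippet is self-contained.
(defrecord Entity [id attrs])
(defrecord Attr [name value ts prev-ts])

(defn make-attr
  [name value type & {:keys [cardinality] :or {cardinality :db/single}}]
  {:pre [(contains? #{:db/single :db/multiple} cardinality)]}
  (with-meta (Attr. name value -1 -1)
    {:type type :cardinality cardinality}))

(defn make-entity
  ([] (make-entity :db/no-id-yet))
  ([id] (Entity. id {})))

(defn add-attr [ent attr]
  (assoc-in ent [:attrs (keyword (:name attr))] attr))

;; Hypothetical example data.
(def pizza-lover
  (-> (make-entity)
      (add-attr (make-attr :name "USA" :string))
      (add-attr (make-attr :likes "Pizza" :string))))

(:id pizza-lover)                                         ; => :db/no-id-yet
(get-in pizza-lover [:attrs :likes :value])               ; => "Pizza"
(:cardinality (meta (get-in pizza-lover [:attrs :likes]))) ; => :db/single
```

The type and cardinality travel as metadata, so they do not take part in equality checks on the attribute itself.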
Indexes
Indexing
■ The database accumulates facts, called datoms
■ A datom is a triplet composed of three items: [entityId attributeName value]
■ Need to index datoms
■ An index is a three-level structure:
  – First level: map from a key to a second-level map
  – Second level: map from a key to a third-level set
  – Third level: a set
■ Each level holds a different kind of item (entityId, attributeName or value)
Indexing
■ The name of the index is derived from the order of the items in the levels
  – EAVT: {entityId {attributeName #{value}}}
  – VEAT: {value {entityId #{attributeName}}}
  – AVET: {attributeName {value #{entityId}}}
  – VAET: {value {attributeName #{entityId}}}
[Figure: the EAVT and AVET indexes]
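The three-level shape is easy to reproduce with plain maps and sets. The sketch below is an illustration only: `index-datom`, `eav-order` and `ave-order` are hypothetical helpers, and the datoms are taken from the example data used later in the talk.

```clojure
;; Hypothetical helper: add one datom to a three-level index.
;; `order` arranges entity, attribute and value into the index's level order.
(defn index-datom [index order [e a v]]
  (let [[k1 k2 leaf] (order e a v)]
    (update-in index [k1 k2] (fnil conj #{}) leaf)))

(defn eav-order [e a v] [e a v]) ; EAVT: entity, then attribute, then value
(defn ave-order [e a v] [a v e]) ; AVET: attribute, then value, then entity

(def datoms
  [[1 :likes "Pizza"] [1 :speak "English"] [3 :speak "English"]])

(def eavt (reduce #(index-datom %1 eav-order %2) {} datoms))
(def avet (reduce #(index-datom %1 ave-order %2) {} datoms))

eavt ; => {1 {:likes #{"Pizza"}, :speak #{"English"}}, 3 {:speak #{"English"}}}
avet ; => {:likes {"Pizza" #{1}}, :speak {"English" #{1 3}}}
```

The same facts answer different questions depending on the order: EAVT starts from an entity, AVET starts from an attribute and a value and ends in the set of entities that have them.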
(defn make-index [from-eav to-eav usage-pred]
  (with-meta {}
    {:from-eav from-eav :to-eav to-eav :usage-pred usage-pred}))

(defn make-db []
  (atom (Database.
          [(Layer.
             (fdb.storage.InMemory.) ; storage
             (make-index #(vector %3 %2 %1) #(vector %3 %2 %1) #(ref? %)) ; VAET
             (make-index #(vector %2 %3 %1) #(vector %3 %1 %2) always)    ; AVET
             (make-index #(vector %3 %1 %2) #(vector %2 %3 %1) always)    ; VEAT
             (make-index #(vector %1 %2 %3) #(vector %1 %2 %3) always))]  ; EAVT
          0 0)))
Datom life cycle
(defn add-entity [db ent] …)
(defn remove-entity [db ent-id] …)
(defn update-entity
  ([db ent-id attr-name new-val] …)
  ([db ent-id attr-name new-val operation] …))
(defn add-entities [db ents-seq] (reduce add-entity db ents-seq))
[Diagram: steps of each operation: extract top layer, operate on storage, operate on indexes, new layer, add layer, return a database]
■ No update to the DB’s curr-time!!
Transactions
What’s in a transaction
■ A database
■ Set of operations to be performed in an ACI manner
■ The desired API:
(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))
Not function calls!!
The process
(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))
■ Should be transformed into a collection of valid operations
■ Each operation adds a layer
  – On top of the layer that the previous operation added
■ Problem: several layers may be added during a transaction
■ Solution: re-layer the initial DB with the latest layer
  – Then increase the time
  – Use the top-id from the last layer
■ Update the Atom that holds the DB
The process
(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))
■ Should be transformed into a collection of valid operations
■ _transact is a macro that builds a call to the function (op) it received as an argument

(defmacro _transact [db op & txs]
  (when txs
    (loop [[frst-tx# & rst-tx#] txs
           res# [op db `transact-on-db]
           accum-txs# []]
      (if frst-tx#
        (recur rst-tx# res# (conj accum-txs# (vec frst-tx#)))
        (list* (conj res# accum-txs#))))))

(defmacro transact [db-conn & txs]
  `(_transact ~db-conn swap! ~@txs))

(defmacro what-if [db & txs]
  `(_transact ~db _what-if ~@txs))

(defn- _what-if [db f txs]
  (f db txs))
The process
(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))
■ Should be transformed into a collection of valid operations
■ Execute each operation in the transaction, adding layers
■ Re-layer the initial DB with the latest layer and update its time
■ Create a new instance of DB
■ Update the Atom to hold the new instance
  – Or not, in case of what-if

;; Operations come in the form of: [[op param1 param2…] [op param1 param2…]]
(defn transact-on-db [initial-db ops]
  (loop [[op & rst-ops] ops ; going over the operations in the transaction
         transacted initial-db]
    (if op
      ;; build and execute the add / update / remove call; each iteration
      ;; adds a single layer on top of the previously added layer
      (recur rst-ops (apply (first op) transacted (rest op)))
      ;; no more operations: construct the output of the transaction,
      ;; a fully updated db
      (let [initial-layer (:layers initial-db)
            new-layer (last (:layers transacted))]
        (assoc initial-db
               :layers (conj initial-layer new-layer)
               :curr-time (next-ts initial-db)
               :top-id (:top-id transacted))))))
Transaction vs What-if process

Transaction
(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))
(_transact db-conn swap! (add-entity e1) (update-entity e2 atr2 val2 :db/add))
(swap! db-conn transact-on-db [[add-entity e1] [update-entity e2 atr2 val2 :db/add]])
(transact-on-db db [[add-entity e1] [update-entity e2 atr2 val2 :db/add]])
(add-entity db e1) (update-entity db e2 atr2 val2 :db/add)
Updates the state held by the given db-conn (an atom)

What-if
(what-if db (add-entity a3) (remove-entity a4))
(_transact db _what-if (add-entity a3) (remove-entity a4))
(_what-if db transact-on-db [[add-entity a3] [remove-entity a4]])
(transact-on-db db [[add-entity a3] [remove-entity a4]])
(add-entity db a3) (remove-entity db a4)
Returns a new db
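The re-layering idea and the difference between the two entry points can be replayed on a toy database. This is a simplified sketch under stated assumptions, not the talk's code: here a layer is a plain map from entity id to value, and `add-entity`, `remove-entity` and `transact-on-db` are stripped-down stand-ins for the real functions.

```clojure
;; Toy stand-ins: every operation adds one layer on top of the last one.
(defn add-entity [db id v]
  (update db :layers conj (assoc (last (:layers db)) id v)))

(defn remove-entity [db id]
  (update db :layers conj (dissoc (last (:layers db)) id)))

(defn transact-on-db [initial-db ops]
  (let [transacted (reduce (fn [db [f & args]] (apply f db args))
                           initial-db
                           ops)]
    ;; re-layer: keep only the last of the layers the operations added
    (-> initial-db
        (update :layers conj (last (:layers transacted)))
        (update :curr-time inc))))

(def db-conn (atom {:layers [{}] :curr-time 0}))

;; what-if style: compute a new database value, the atom is not touched
(def speculative
  (transact-on-db @db-conn [[add-entity :e1 "a"] [add-entity :e2 "b"]]))

;; transact style: swap! stores the new value in the atom
(swap! db-conn transact-on-db [[add-entity :e1 "a"] [remove-entity :e1]])
```

Two operations ran in each case, yet each result holds exactly one new layer and its time advanced by one, which is what makes the whole transaction look like a single step.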
Evolutionary queries
■ Seeing how an entity’s attribute evolved throughout time
■ Each attribute has a prev-ts property
■ We can use it to look back and see what was before

(defn evolution-of [db ent-id attr-name]
  (loop [res []
         ts (:curr-time db)]
    (if (= -1 ts)
      (reverse res)
      (let [attr (attr-at db ent-id attr-name ts)]
        (recur (conj res {(:ts attr) (:value attr)}) (:prev-ts attr))))))

Ends up with a sequence showing the evolutionary steps, each of the form {<time> <value>}
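Following the prev-ts chain can be shown on a stripped-down history. In this sketch the `color-history` map (timestamp to the attribute as seen in that layer) and the `evolution` function are hypothetical simplifications; only the :value, :ts and :prev-ts fields and the -1 end marker come from the slides.

```clojure
;; Hypothetical toy history of one attribute, keyed by layer timestamp.
(def color-history
  {0 {:value "blue" :ts 0 :prev-ts -1}
   1 {:value "blue" :ts 0 :prev-ts -1}  ; untouched in layer 1
   2 {:value "red"  :ts 2 :prev-ts 0}}) ; updated in layer 2

(defn evolution
  "Walks the :prev-ts chain from curr-time back to the first version."
  [history curr-time]
  (loop [res () ; conj on a list prepends, so the oldest step ends up first
         ts curr-time]
    (if (= -1 ts)
      res
      (let [attr (history ts)]
        (recur (conj res {(:ts attr) (:value attr)}) (:prev-ts attr))))))

(evolution color-history 2) ; => ({0 "blue"} {2 "red"})
```

Layer 1 is skipped entirely, because the walk jumps straight from a version to the timestamp of the version it replaced.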
Graph queries
■ Treating the database as a graph
■ Each entity is a node
■ An entity may have attributes whose type is :db/ref
  – The value of such an attribute is the ID of another entity
■ Each such attribute is a link
  – The link’s label is the attribute name
  – The link’s target is the attribute value
[Figure: layers at timestamps 0 to 4 holding entities A, B, C, D, E and X; E points to A]
Completing the graph story
■ For each link we know
  – Source: the containing entity
  – Target: the value
■ Each node needs to know its links
  – Outgoing
  – Incoming
  – (at a given time)

(defn incoming-refs [db ts ent-id & ref-names]
  (let [vaet (indx-at db :VAET ts)
        all-attr-map (vaet ent-id)
        filtered-map (if ref-names ; we may want only part of the links
                       (select-keys all-attr-map ref-names)
                       all-attr-map)]
    (reduce into #{} (vals filtered-map))))

(defn outgoing-refs [db ts ent-id & ref-names]
  (let [val-filter-fn (if ref-names ; we may want only part of the links
                        #(vals (select-keys % ref-names))
                        vals)]
    (if-not ent-id
      []
      (->> (entity-at db ts ent-id) ; the entity at that timestamp
           (:attrs)                 ; take the attributes
           (val-filter-fn)          ; filter them according to the given ref-names
           (filter ref?)            ; take from it only the ones that are links
           (mapcat :value)))))      ; take all the targets
Example: BFS or DFS over the incoming or outgoing links
(defn- traverse [pendings explored exploring-fn ent-at structure-fn]
  (let [cleaned-pendings (remove-explored pendings explored structure-fn)
        item (first cleaned-pendings)
        all-next-items (exploring-fn item)
        next-pends (reduce conj (structure-fn (rest cleaned-pendings)) all-next-items)]
    (when item
      (cons (ent-at item)
            (lazy-seq (traverse next-pends (conj explored item) exploring-fn ent-at structure-fn))))))

(defn traverse-db
  ([start-ent-id db algo direction]
   (traverse-db start-ent-id db algo direction (:curr-time db)))
  ([start-ent-id db algo direction ts]
   ;; preparations
   (let [structure-fn (if (= :graph/bfs algo) vec list*)
         explore-fn (if (= :graph/outgoing direction) outgoing-refs incoming-refs)]
     (traverse [start-ent-id] #{} (partial explore-fn db ts) (partial entity-at db ts) structure-fn))))
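The only difference between BFS and DFS in the code above is the collection that holds the pending items: `conj` appends to a vector (a queue) and prepends to a list (a stack). The sketch below shows that effect on a hypothetical in-memory `graph`; the eager `traverse` here is a simplified stand-in, not the lazy version from the slides.

```clojure
;; Hypothetical graph: node -> outgoing neighbours.
(def graph {:a [:b :c] :b [:d] :c [:e] :d [] :e []})

(defn traverse
  "Returns the nodes reachable from start in visiting order."
  [start algo]
  (let [structure-fn (if (= :graph/bfs algo) vec list*)]
    (loop [pendings (structure-fn [start])
           explored []]
      (if-let [item (first pendings)]
        (if (some #{item} explored)
          (recur (structure-fn (rest pendings)) explored)
          ;; conj appends to a vector (BFS) and prepends to a list (DFS)
          (recur (reduce conj (structure-fn (rest pendings)) (graph item))
                 (conj explored item)))
        explored))))

(traverse :a :graph/bfs) ; => [:a :b :c :d :e]
(traverse :a :graph/dfs) ; => [:a :c :e :b :d]
```

Swapping one function is enough to switch strategies, which is why `traverse-db` only needs to pick `vec` or `list*` up front.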
Simple Datalog queries
■ The database accumulates facts
  – A fact is a triplet structured like this: [EntityId AttributeName Value]
■ Need a query language that can operate on facts
■ A Datalog query has two main components
  – Output structure
  – List of conditions (query clauses)
■ A condition is structured the same way a fact is
The anatomy of a query
{:find [?nm ?bd]
 :where [[?e :likes "pizza"]
         [?e :name ?nm]
         [?e :speak "English"]
         [?e :birthday (birthday-this-month? ?bd)]]}
■ :find is a vector describing the output structure (think of SELECT in SQL)
  – It lists the variables to be used in the output
■ :where is a vector of query clauses
  – The operator between them is ‘AND’
  – Each clause is built of 3 terms
■ The first term operates on the entity id part of a datom
  – Here: a variable; same symbol => same value
■ The second term operates on the attribute part of a datom
  – Here: a simple value, which means exact match
■ The third term operates on the value part of a datom
  – "pizza", "English": “equals” predicate to apply on the value
  – (birthday-this-month? ?bd): user-provided predicate to act on a value
What are the names and birthdays of entities who like pizza, speak English, and who have a birthday this month?
How does it work?
■ Need to transform the query clauses into predicate clauses
  – Each term is transformed into a predicate (a function returning true or false)
■ Need to execute the predicates on the data to find the facts that apply for each clause
  – Then AND them
■ Need to extract the output from these facts
  – Based on the user’s request
What’s in a query clause
■ A query clause is built of 3 terms
■ Each term can be one of the following:
  – Variable: starts with ‘?’
  – Don’t care symbol: ‘_’
  – Single value: interpreted as equals
  – Unary operator with variable: (negative? ?num)
  – Binary operator with variable as the first operand: (> ?num 5)
  – Binary operator with variable as the second operand: (> 5 ?num)
■ Each of these should be transformed into an executable predicate
  – If there was a variable, need to remember its symbol
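The term kinds listed above can be illustrated with an ordinary function instead of a macro. This is a simplified sketch, not the talk's implementation: `term->pred`, this one-argument `variable?` and the `ops` lookup table are hypothetical, and terms are passed in quoted.

```clojure
(def ops {'> > '< < 'negative? neg?}) ; hypothetical operator table

(defn variable? [x]
  (and (symbol? x) (.startsWith (name x) "?")))

(defn term->pred
  "Turns one quoted clause term into a predicate of one argument."
  [term]
  (cond
    (= '_ term)        (constantly true) ; don't care
    (variable? term)   (constantly true) ; variable: anything matches
    (not (coll? term)) #(= % term)       ; single value: equals
    :else
    (let [[op x y] term
          f (ops op)]
      (cond
        (= 2 (count term)) f             ; unary, e.g. (negative? ?num)
        (variable? x)      #(f % y)      ; binary, variable first: (> ?num 5)
        :else              #(f x %)))))  ; binary, variable second: (> 5 ?num)

((term->pred '?e) 42)               ; => true
((term->pred "pizza") "pasta")      ; => false
((term->pred '(> ?num 5)) 7)        ; => true
((term->pred '(> 5 ?num)) 7)        ; => false
((term->pred '(negative? ?num)) -1) ; => true
```

A macro version does the same dispatch at compile time, which lets the query be written without quoting and lets any function in scope serve as an operator.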
;; Term becomes an executable form
(defmacro clause-term-expr [clause-term]
  (cond
    (variable? (str clause-term)) ; variable
    #(= % %)
    (not (coll? clause-term)) ; constant
    `#(= % ~clause-term)
    (= 2 (count clause-term)) ; unary operator
    `#(~(first clause-term) %)
    (variable? (str (second clause-term))) ; binary operator, 1st operand is variable
    `#(~(first clause-term) % ~(last clause-term))
    (variable? (str (last clause-term))) ; binary operator, 2nd operand is variable
    `#(~(first clause-term) ~(second clause-term) %)))

;; Extracting the name of the variable used in the term
(defmacro clause-term-meta [clause-term]
  (cond
    (coll? clause-term) ; unary or binary operator
    (first (filter #(variable? % false) (map str clause-term)))
    (variable? (str clause-term) false) ; variable, excluding the don't care symbol
    (str clause-term)
    :no-variable-in-clause ; constant or don't care
    nil))
;; Going over the terms in a clause (a triplet)
(defmacro pred-clause [clause]
  (loop [[trm# & rst-trm#] clause
         exprs# []
         metas# []]
    (if trm#
      (recur rst-trm#
             (conj exprs# `(clause-term-expr ~trm#))
             (conj metas# `(clause-term-meta ~trm#)))
      (with-meta exprs# {:db/variable metas#}))))

;; Going over the conditions in a query
(defmacro q-clauses-to-pred-clauses [clauses]
  (loop [[frst# & rst#] clauses
         preds-vecs# []]
    (if-not frst#
      preds-vecs#
      (recur rst# `(conj ~preds-vecs# (pred-clause ~frst#))))))
:where [ [?e :likes "pizza"] [?e :name ?nm] [?e :speak "English"] [?e :birthday (birthday-this-month? ?bd)]]

Query Clause                                 Predicate Clause                                         Meta Clause
[?e :likes "pizza"]                          [#(= % %) #(= % :likes) #(= % "pizza")]                  ["?e" nil nil]
[?e :name ?nm]                               [#(= % %) #(= % :name) #(= % %)]                         ["?e" nil "?nm"]
[?e :speak "English"]                        [#(= % %) #(= % :speak) #(= % "English")]                ["?e" nil nil]
[?e :birthday (birthday-this-month? ?bd)]    [#(= % %) #(= % :birthday) #(birthday-this-month? %)]    ["?e" nil "?bd"]
Executing the query
■ Need to build a query plan
  – The query itself gets executed on an index
  – Not on the data!!
  – Need to decide which index to use
■ Executing the query means applying each of the clauses on an index
  – Each of the terms on the right level of the index
  – Remember, an index is:
    ■ Top level: map
    ■ Second level: map
    ■ Third level: set
■ There may be a need to restructure the clauses to the index structure
  – The query clause is ordered as E->A->V
  – The index: not necessarily
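Restructuring a clause to an index's level order is a simple permutation. The two functions below are the from-eav and to-eav pair that make-db passes for the AVET index; the names `avet-from-eav` and `avet-to-eav` are only labels chosen for this sketch.

```clojure
;; The permutations make-db gives to the AVET index.
(def avet-from-eav #(vector %2 %3 %1)) ; [e a v] -> [a v e]
(def avet-to-eav   #(vector %3 %1 %2)) ; [a v e] -> [e a v]

(def clause '[?e :likes "pizza"])

(def in-index-order (apply avet-from-eav clause))

in-index-order                     ; => [:likes "pizza" ?e]
(apply avet-to-eav in-index-order) ; => [?e :likes "pizza"]
```

After the permutation, the first term lines up with the first level of the index, the second term with the second level, and the third term with the leaf set.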
Building a query plan
■ There’s actually only one predefined query plan
  – Operates on a single index
  – Only one variable can be used in different clauses: the joining variable
■ Needs to receive the index to operate on to be fully operable
  – That index is decided based on the joining variable
■ The joining variable is matched on the third level of that index
;; Finding the index of the joining variable
(defn index-of-joining-variable [query-clauses]
  (let [metas-seq (map #(:db/variable (meta %)) query-clauses) ; extracting the meta clauses
        collapsing-fn (fn [accV v] (map #(when (= %1 %2) %1) accV v)) ; clause collapsing fn
        collapsed (reduce collapsing-fn metas-seq)] ; reducing query to one triplet, with one variable
    (first (keep-indexed #(when (variable? %2 false) %1) collapsed)))) ; taking the index of the variable

;; Deciding which index to use in the query
(defn build-query-plan [query]
  (let [term-ind (index-of-joining-variable query)
        ind-to-use (case term-ind 0 :AVET 1 :VEAT 2 :EAVT)]
    (partial single-index-query-plan query ind-to-use)))

;; Constructing the plan
(defn single-index-query-plan [query indx db]
  (let [q-res (query-index (indx-at db indx) query)]
    (bind-variables-to-query q-res (indx-at db indx))))
Executing the plan
■ Apply each clause on the index
  – Each such application returns a result clause
    ■ All the paths in the index that passed all the predicates
■ Collecting all the results and ‘AND’ing them
  – By looking at the values of the joining variable
    ■ The third-level items that are found in all of the result clauses
[Figure: an index with its three levels; applying one predicate clause and then another selects paths through levels 1 to 3; the joining variable needs to see which items are found in all of the sets]
(defn query-index [index pred-clauses]
  (let [result-clauses (filter-index index pred-clauses)
        relevant-items (items-that-answer-all-conditions (map last result-clauses) (count pred-clauses))
        cleaned-result-clauses (map (partial mask-path-leaf-with-items relevant-items) result-clauses)]
    (filter #(not-empty (last %)) cleaned-result-clauses)))

(defn filter-index [index predicate-clauses]
  (for [pred-clause predicate-clauses
        ;; adapting to the index’s structure
        :let [[lvl1-prd lvl2-prd lvl3-prd] (apply (from-eav index) pred-clause)]
        [k1 l2map] index ; keys and values of the first level
        :when (try (lvl1-prd k1) (catch Exception e false))
        [k2 l3-set] l2map ; keys and values of the second level
        :when (try (lvl2-prd k2) (catch Exception e false))
        :let [res (set (filter lvl3-prd l3-set))]]
    (with-meta [k1 k2 res] (meta pred-clause))))
[ [?e :likes "pizza"] [?e :name ?nm] [?e :speak "English"] [?e :birthday (birthday-this-month? ?bd)]]
Use the AVET index

Entity ID   Attribute Name   Attribute Value
1           :name            USA
            :likes           Pizza
            :speak           English
            :birthday        July 4, 1776
2           :name            France
            :likes           Red wine
            :speak           French
            :birthday        July 14, 1789
3           :name            Canada
            :likes           Snow
            :speak           English
            :birthday        July 1, 1867

Result Clause                        Result Meta
[:likes Pizza #{1}]                  ["?e" nil nil]
[:name USA #{1}]                     ["?e" nil "?nm"]
[:speak "English" #{1, 3}]           ["?e" nil nil]
[:birthday "July 4, 1776" #{1}]      ["?e" nil "?bd"]
[:name France #{2}]                  ["?e" nil "?nm"]
[:birthday "July 14, 1789" #{2}]     ["?e" nil "?bd"]
[:name Canada #{3}]                  ["?e" nil "?nm"]
[:birthday "July 1, 1867" #{3}]      ["?e" nil "?bd"]

Notice that the result clauses have the triplet structure of the index (AVE)
(defn items-that-answer-all-conditions [items-seq num-of-conditions]
  (->> items-seq                                   ; take the items-seq
       (map vec)                                   ; make each collection (actually a set) into a vector
       (reduce into [])                            ; reduce all the vectors into one vector
       (frequencies)                               ; count for each item in how many collections (sets) it was in
       (filter #(<= num-of-conditions (last %)))   ; items that answered all conditions
       (map first)                                 ; take from the duos the items themselves
       (set)))

(defn query-index [index pred-clauses]
  (let [result-clauses (filter-index index pred-clauses)
        relevant-items (items-that-answer-all-conditions (map last result-clauses)
                                                         (count pred-clauses))
        cleaned-result-clauses (map (partial mask-path-leaf-with-items relevant-items)
                                    result-clauses)]
    (filter #(not-empty (last %)) cleaned-result-clauses)))

; CS is assumed here to be an alias for clojure.set
(defn mask-path-leaf-with-items [relevant-items path]
  (update-in path [2] CS/intersection relevant-items))
ANDing the results: items-that-answer-all-conditions keeps only the items that were counted in at least as many result sets as there are predicate clauses.
Filtering the ANDed results: mask-path-leaf-with-items intersects each result clause's entity set with those items, and query-index drops the clauses whose set ends up empty.
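A worked run of the ANDing step on the example data may help. The sketch below copies items-that-answer-all-conditions from the slide so that it is self-contained; entity-sets is a hypothetical name for the entity-ID sets found in the last position of the eight result clauses, and cs is an alias chosen here for clojure.set.

```clojure
(require '[clojure.set :as cs])

;; Copied from the slide so the example runs on its own.
(defn items-that-answer-all-conditions [items-seq num-of-conditions]
  (->> items-seq
       (map vec)
       (reduce into [])
       (frequencies)
       (filter #(<= num-of-conditions (last %)))
       (map first)
       (set)))

;; Entity-ID sets taken from the eight result clauses of the example.
(def entity-sets [#{1} #{1} #{1 3} #{1} #{2} #{2} #{3} #{3}])

;; Entity 1 appears 4 times, entity 3 appears 3 times, entity 2 twice.
(frequencies (reduce into [] (map vec entity-sets)))

;; The query has 4 clauses, so only entity 1 answers all of them.
(items-that-answer-all-conditions entity-sets 4)
;; => #{1}

;; Masking a result clause with the surviving items, as
;; mask-path-leaf-with-items does on the slide.
(update-in [:speak "English" #{1 3}] [2] cs/intersection #{1})
;; => [:speak "English" #{1}]
```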
Result clauses after ANDing and filtering (only entity 1 remains):

Result Clause                       Result Meta
[:likes Pizza #{1}]                 ["?e" nil nil]
[:name USA #{1}]                    ["?e" nil "?nm"]
[:birthday "July 4, 1776" #{1}]     ["?e" nil "?bd"]
[:speak "English" #{1}]             ["?e" nil nil]
How does it work?
■ Need to transform the query clauses into predicate clauses
– Each term is transformed into a predicate (a function returning true or false)
■ Need to execute the predicates on the data to find the facts that apply for each clause
– Then AND them
■ Need to extract the output from these facts
– Based on the user’s request
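The first step, turning each term of a query clause into a predicate, can be sketched as follows. This is only an illustration under stated assumptions: variable?, term->pred, and july? are hypothetical names, and the talk's own code does this transformation inside macros rather than with a plain function.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: a symbol whose name starts with ? is a query variable.
(defn variable? [x]
  (and (symbol? x) (= \? (first (name x)))))

;; Hypothetical term->pred:
;;   a variable     -> predicate that accepts anything
;;   a list (f ?x)  -> predicate that applies f to the value
;;   anything else  -> predicate that tests equality with the literal
(defn term->pred [term]
  (cond
    (variable? term) (constantly true)
    (seq? term)      (let [f (first term)] (fn [x] (boolean (f x))))
    :else            (fn [x] (= x term))))

;; Stand-in for birthday-this-month? (an assumption for this example).
(defn july? [s] (str/starts-with? s "July"))

;; One query clause, with the function already resolved to a value.
(def clause ['?e :birthday (list july? '?bd)])
(def preds (mapv term->pred clause))

((nth preds 0) 42)               ;; => true  (variable matches anything)
((nth preds 1) :birthday)        ;; => true  (literal attribute matches)
((nth preds 1) :name)            ;; => false
((nth preds 2) "July 4, 1776")   ;; => true
((nth preds 2) "May 1, 1900")    ;; => false
```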
Reporting the results
■ Transform the result clauses into a binding pairs structure
– A structure that follows an index structure (map->map->set)
■ At each level there is now a pair: the match from the result clause and the corresponding item from the meta clause
■ Extract from the binding pairs the variables that the user requested
{[1 "?e"] {[:likes nil]    ["Pizza" nil]
           [:name nil]     ["USA" "?nm"]
           [:speak nil]    ["English" nil]
           [:birthday nil] ["July 4, 1776" "?bd"]}}
Binding pairs structure
(defn single-index-query-plan [query indx db]
  (let [q-res (query-index (indx-at db indx) query)]
    (bind-variables-to-query q-res (indx-at db indx))))

(defn bind-variables-to-query [q-res index]
  (let [seq-res-path (mapcat (partial combine-path-and-meta (from-eav index)) q-res)
        res-path (map #(->> %1 (partition 2) (apply (to-eav index))) seq-res-path)]
    (reduce #(assoc-in %1 (butlast %2) (last %2)) {} res-path)))
(defn combine-path-and-meta [from-eav-fn path]
  (let [expanded-path [(repeat (first path)) (repeat (second path)) (last path)] ; path’s set is cut to items
        meta-of-path (apply from-eav-fn (map repeat (:db/variable (meta path)))) ; meta in index order
        combined-data-and-meta-path (interleave meta-of-path expanded-path)]     ; interleaving all
    (apply (partial map vector) combined-data-and-meta-path)))
■ combine-path-and-meta produces a 6-item vector: the result and its meta
■ bind-variables-to-query (partition 2, then to-eav) restructures the 6-item vector into 3 pairs in EAV order
■ bind-variables-to-query (reduce with assoc-in) builds the 3 pairs into the binding pairs structure
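The last of these steps, folding the pairs into a nested map, uses the same reduce with assoc-in that appears in bind-variables-to-query. The sketch below applies it to hand-written input; res-paths and binding-pairs are hypothetical names, and the pairs are written by hand in the [value variable] order shown on the slide rather than produced by combine-path-and-meta.

```clojure
;; Each path holds three pairs in EAV order:
;; [entity-pair attribute-pair value-pair].
(def res-paths
  [[[1 "?e"] [:likes nil]    ["Pizza" nil]]
   [[1 "?e"] [:name nil]     ["USA" "?nm"]]
   [[1 "?e"] [:speak nil]    ["English" nil]]
   [[1 "?e"] [:birthday nil] ["July 4, 1776" "?bd"]]])

;; The first two pairs of a path become the nested keys,
;; the last pair becomes the value stored under them.
(def binding-pairs
  (reduce #(assoc-in %1 (butlast %2) (last %2)) {} res-paths))

(get-in binding-pairs [[1 "?e"] [:name nil]])
;; => ["USA" "?nm"]
```

Because all four paths share the entity pair [1 "?e"], they collapse into a single top-level key whose value maps each attribute pair to its value pair.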
Reporting
■ We have a superset of the answer
■ Need to take only the variables the user requested
:find [?nm ?bd]
(defn unify [binded-res-col needed-vars]
  (map (partial locate-vars-in-query-res needed-vars) binded-res-col))

(defn locate-vars-in-query-res [vars-set q-res]
  (let [[e-pair av-map] q-res
        e-res (resultify-bind-pair vars-set [] e-pair)]
    (map (partial resultify-av-pair vars-set e-res) av-map)))

(defn resultify-bind-pair [vars-set accum pair]
  (let [[var-name _] pair]
    (if (contains? vars-set var-name)
      (conj accum pair)
      accum)))

(defn resultify-av-pair [vars-set accum-res av-pair]
  (reduce (partial resultify-bind-pair vars-set) accum-res av-pair))
Entity pair => result: resultify-bind-pair keeps the entity pair if its variable was requested.
Attribute value pair => result: resultify-av-pair does the same for the attribute and value pairs, accumulating onto the entity result.
For the binding pairs structure above and :find [?nm ?bd], the result is:
[("?nm" "USA") ("?bd" "July 4, 1776")]
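The idea of cutting the superset down to the requested variables can also be shown with a simplified stand-in. This is not the talk's unify: pick-vars is a hypothetical helper, it assumes pairs in the [value variable] order used in the binding pairs structure on the slide, and it returns one map per entity instead of a sequence of pairs.

```clojure
(def binding-pairs
  {[1 "?e"] {[:likes nil]    ["Pizza" nil]
             [:name nil]     ["USA" "?nm"]
             [:speak nil]    ["English" nil]
             [:birthday nil] ["July 4, 1776" "?bd"]}})

;; Hypothetical helper: for each entity, walk the entity pair and all
;; attribute and value pairs, keeping those whose variable was requested.
(defn pick-vars [wanted binding-pairs]
  (for [[e-pair av-map] binding-pairs]
    (into {}
          (for [[value var] (cons e-pair (mapcat identity av-map))
                :when (contains? wanted var)]
            [var value]))))

(pick-vars #{"?nm" "?bd"} binding-pairs)
;; => ({"?nm" "USA", "?bd" "July 4, 1776"})
```

Pairs whose variable is nil (such as the :likes and :speak entries) are skipped, since nil is never in the requested set.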
…
Running the show
(defmacro q [db query]
  `(let [pred-clauses# (q-clauses-to-pred-clauses ~(:where query))
         needed-vars# (symbol-col-to-set ~(:find query))
         query-plan# (build-query-plan pred-clauses#)
         query-internal-res# (query-plan# ~db)]
     (unify query-internal-res# needed-vars#)))
Each of these steps is a data structure transformation!
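One reason the entry point is a macro is that query variables such as ?e are bare symbols, which would fail to resolve if they were evaluated as ordinary function arguments. The sketch below illustrates that single point. vars->strings is a hypothetical macro, not part of the talk's code, and it only handles flat clauses without nested function calls.

```clojure
;; Hypothetical macro: receives the clause unevaluated and replaces every
;; ?-prefixed symbol with its string name at macro-expansion time.
(defmacro vars->strings [clause]
  (mapv #(if (and (symbol? %) (= \? (first (name %))))
           (str %)
           %)
        clause))

(vars->strings [?e :name ?nm])
;; => ["?e" :name "?nm"]
```

The expansion is plain data (a vector of strings and keywords), which matches the way variables appear as strings such as "?e" and "?nm" in the result meta on the earlier slides.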
Summary
■ We have an in-memory functional DB with
– Transactions
– What-if
– Graph queries
– Evolution queries
– Simple datalog queries
■ 488 lines, of which
– 73 blank
– 55 docstrings
– Total: 360 lines of code
Summary - what made it possible
■ Priorities
– Normal project: correct > optimized > readable > short
– This project: correct > readable > short > optimized
■ Ignored by design
– Networking
– Durability
– Nothing besides clojure.core
■ (Almost succeeded)
■ Design approach: bottom up
– With occasional top-down
■ Clojure’s magic
– Persistent data structures
– Macros
– Data literals
– HOFs
– Destructuring
■ Clojure approach
– Everything is a library
– Design data structures and write data structure transformation code
– The rest will follow
Thank You