Data Storage for Extreme Use Cases: The Lay of the Land and a Peek at ODC
Ben Stopford, RBS
How fast is a HashMap lookup?
~20 ns
That's how long it takes light to travel across a room.
How fast is a database lookup?
~20 ms
That's how long it takes light to go to Australia and back. 3 times.
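As an illustration (not from the deck), here is a rough Java micro-benchmark of the in-process lookup cost. A real measurement needs a harness like JMH; this sketch just shows the shape of the comparison.

import java.util.HashMap;
import java.util.Map;

public class LookupTiming {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 1_000_000; i++) map.put(i, "value-" + i);

        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) {
            if (map.get(i) != null) hits++;   // in-process, single address space
        }
        long perLookupNs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(hits + " lookups, roughly " + perLookupNs + " ns each");
        // A database lookup adds a network round trip and possibly disk,
        // which is why it lands near ~20 ms rather than ~20 ns.
    }
}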
Computers really are very fast.
The problem is we're quite good at writing software that slows them down.
Question: Is it fair to compare the performance of a Database with a HashMap?
Of course not…
• Physical Diversity: a database call involves both Network and Disk.
• Functional Diversity: databases provide a wealth of additional features, including persistence, transactions, consistency etc.
[Latency scale diagram, from ps to ms: L1 cache ref, L2 cache ref, main memory ref, 1MB from main memory, Ethernet ping, RDMA over InfiniBand, cross-continental round trip, 1MB from disk/Ethernet]
An L1 ref is about 2 clock cycles, or 0.7 ns. That is the time it takes light to travel 20 cm.
Mechanical Sympathy
Key Point 1
Simple computer programs, operating in a single address space, are extremely fast.
Why are there so many types of database these days? …because we need different architectures for different jobs.
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example, IBM's System R).
"Because RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark, then there is no market where they are competitive. As such, they should be considered as legacy technology, more than a quarter of a century in age, for which a complete redesign and re-architecting is the appropriate next step."
Michael Stonebraker (creator of Ingres and Postgres)
The Traditional Architecture
• Data lives on disk.
• Users have an allocated user space where intermediary results are calculated.
• The database brings data, normally via indexes, into memory and performs filters, joins, reordering and aggregation operations.
• The result is sent to the user.
[Diagram: the architecture spectrum, from Traditional through Shared Disk, Shared Nothing and In Memory to Distributed In Memory, moving towards a simpler contract]
Key Point 2
Different architectural decisions about how we store and access data are needed in different environments. Our 'Context' has changed.
Simplifying the Contract
How big is the internet?
5 exabytes
(which is 5,000 petabytes, or 5,000,000 terabytes)
How big is an average enterprise database?
80% are < 1TB (in 2009)
The context of our problem has changed.
Simplifying the Contract
• For some use cases, ACID transactions are overkill.
• Implementing ACID in a distributed architecture has a significant effect on performance.
• This is where the NoSQL movement came from.
Databases have huge operational overheads.
Research with Shore DB indicates only 6.8% of instructions contribute to 'useful work'.
Taken from "OLTP Through the Looking Glass, and What We Found There", Harizopoulos et al.
Avoid that overhead with a simpler contract and by avoiding IO.
Key Point 3
For the very top-end data volumes, a simpler contract is mandatory. ACID is simply not possible.
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows it.
Options for scaling out the traditional architecture
1. The Shared Disk Architecture
[Diagram: Shared Disk]
• More 'grunt'.
• Popular for mid-range data sets.
• Multiple machines must contend for ownership (distributed disk/lock contention).
2. The Shared Nothing Architecture
• Massive storage potential.
• Massive scalability of processing.
• Popular for high-level storage solutions.
• Commodity hardware.
• Around since the 80s, but only really popular since the Big Data era.
• Limited by cross-partition joins.
Each machine is responsible for a subset of the records. Each record exists on only one machine.
[Diagram: key ranges 1, 2, 3…, 97, 98, 99…, 169, 170…, 244, 245…, 333, 334…, 765, 769… spread across machines, each record on exactly one machine, with a client connecting to all]
3. The In Memory Database (single address space)
Databases must normally cache subsets of the data in memory.
[Diagram: cache sitting in front of the disk]
Not knowing what you don't know
Most queries still go to disk to "see what they missed".
[Diagram: 90% in cache, the rest of the data on disk]
If you can fit it ALL in memory, you know everything.
The architecture of an in-memory database
• All data is at your fingertips.
• Query plans become less important, as there is no IO.
• Intermediary results are just pointers.
Memory is at least 100x faster than disk
[Latency scale diagram, from ps to ms: L1 cache ref, L2 cache ref, main memory ref, 1MB from main memory, cross-network round trip, cross-continental round trip, 1MB from disk/network]
An L1 ref is about 2 clock cycles, or 0.7 ns. That is the time it takes light to travel 20 cm.
Random vs Sequential Access: memory allows random access; disk only works well for sequential reads.
This makes in-memory databases very fast.
The proof is in the stats. TPC-H benchmarks on a 1TB data set:
• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH
NB: TPC-H is a decision-support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's Sparc Supercluster.
So why haven't in-memory databases taken off?
Address spaces are relatively small and of a finite, fixed size.
• What happens when your data grows beyond your available memory? The 'one more bit' problem.
Durability: what happens when you pull the plug?
One solution is distribution.
Distributed In Memory (Shared Nothing)
Again we spread our data, but this time only using RAM.
[Diagram: the same key ranges spread across machines, but now held in RAM, with a client connecting to all]
Distribution solves our two problems:
• Solve the 'one more bit' problem by adding more hardware.
• Solve the durability problem with backups on another machine.
We get massive amounts of parallel processing, but at the cost of losing the single address space.
[Diagram: the architecture spectrum again, from Traditional through Shared Disk, Shared Nothing and In Memory to Distributed In Memory, moving towards a simpler contract]
Key Point 4: There are three key forces
Distribution: gain scalability through a distributed architecture.
Simplify the contract: improve scalability by picking appropriate ACID properties.
No disk: all data is held in RAM.
These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.
ODC
ODC represents a balance between throughput and latency.
What is latency? Latency is a measure of response time.
What is throughput? Throughput is a measure of the consumption of work/messages in a prescribed amount of time.
Which is best for latency? [Diagram: the traditional database compared with the shared-nothing (distributed) in-memory database]
Which is best for throughput? [Diagram: the same two architectures compared on throughput]
So why do we use distributed in-memory?
[Diagram: In Memory + plentiful hardware => both low latency and high throughput]
ODC: Distributed, Shared Nothing, In Memory, Semi-Normalised, Realtime Graph DB
450 processes
2TB of RAM
Messaging (topic based) as a system of record (persistence)
The Layers
[Diagram: an Access Layer (Java client APIs) on top of a Query Layer, on top of a Data Layer (Transactions, Cashflows, MTMs), on top of a Persistence Layer]
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools?
Replication puts data everywhere.
Wherever you go, the data will be there.
But your storage is limited by the memory on a node.
Partitioning scales. [Diagram: keys Aa-Ap assigned to one node]
Scalable storage, bandwidth and processing.
But associating data in different partitions implies moving it.
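To make the trade-off concrete, here is a minimal Java sketch (illustrative, not ODC code): partitioning routes each key to exactly one node, while replication copies the whole data set onto every node.

import java.util.*;

class Grid {
    static final int NODES = 4;

    // Partitioning: each key lives on exactly one node, so storage and
    // processing scale with the node count, but related data on different
    // nodes must be moved to be joined.
    static int partitionFor(Object key) {
        return Math.floorMod(key.hashCode(), NODES);
    }

    // Replication: the same map is held in full on every node, so any node
    // can read it locally, but capacity is capped by a single node's memory.
    static List<Map<String, String>> replicate(Map<String, String> dims) {
        List<Map<String, String>> copies = new ArrayList<>();
        for (int n = 0; n < NODES; n++) copies.add(new HashMap<>(dims));
        return copies;
    }
}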
So we have some data. Our data is bound together in a model.
[Domain model diagram: Trade linked to Party and Trader, with sub-entities such as Desk and Name]
Which we save.
[Diagram: the Trade, Party and Trader entities spread across the machines of the grid]
Binding them back together involves a "distributed join" => lots of network hops.
[Diagram: Trade, Party and Trader fetched from different machines]
The hops have to be spread over time.
[Diagram: network hops laid out along a time axis]
Lots of network hops makes it slow.
OK, what if we held it all together, "denormalised"?
Hence denormalisation is FAST (for reads).
Denormalisation implies the duplication of some sub-entities…
…and that means managing consistency over lots of copies…
…and all the duplication means you run out of space really quickly.
Space issues are exacerbated further when data is versioned.
[Diagram: four full denormalised copies of Trade/Party/Trader, one per version, Version 1 through Version 4]
…and you need versioning to do MVCC.
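As an aside, a minimal sketch of what versioned storage looks like (hypothetical names, not ODC's API): MVCC writes never overwrite in place; each write adds a new (id, version) entry, so earlier versions remain readable, and in a denormalised store every such entry is a full copy.

import java.util.*;

class VersionedStore<V> {
    record Key(long id, int version) {}

    private final Map<Key, V> store = new HashMap<>();
    private final Map<Long, Integer> latest = new HashMap<>();

    void write(long id, V value) {
        int v = latest.merge(id, 1, Integer::sum); // next version number
        store.put(new Key(id, v), value);          // old versions stay intact
    }

    V readLatest(long id) {
        Integer v = latest.get(id);
        return v == null ? null : store.get(new Key(id, v));
    }

    V readAsOf(long id, int version) { // read a previous time slice
        return store.get(new Key(id, version));
    }
}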
And reconstituting a previous time slice becomes very difficult.
[Diagram: overlapping versioned copies of Trade, Party and Trader that must be picked apart to rebuild a point-in-time view]
So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage.
Remember: this means the object graph will be split across multiple machines.
[Diagram: Trade, Party and Trader held on separate machines; each entity is independently versioned and each datum is a singleton]
But binding them back together involves a "distributed join" => lots of network hops.
[Diagram: Trade, Party and Trader fetched from different machines]
Whereas in the denormalised model, the join is already done.
So what we want is the advantages of a normalised store, at the speed of a denormalised one.
This is what using Snowflake Schemas and the Connected Replication pattern is all about.
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys.
We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate.
[Diagram: entities related by common keys vs entities related by crosscutting keys]
We tackle this problem with a hybrid model:
[Diagram: Trade partitioned; Party and Trader replicated]
We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions.
Everything starts from a Core Fact (Trades, for us).
Facts are big; dimensions are small.
Facts have one key that relates them all (used to partition).
Dimensions have many keys (which crosscut the partitioning key).
Looking at the data:
Facts => big, common keys
Dimensions => small, crosscutting keys
We remember we are a grid. We should avoid the distributed join…
…so we only want to 'join' data that is in the same process.
[Diagram: Trades and MTMs share a common key]
Use a key assignment policy (e.g. KeyAssociation in Coherence), as sketched below.
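A minimal sketch of such a policy, assuming Oracle Coherence's KeyAssociation interface (the key class here is hypothetical): Coherence hashes on the value returned by getAssociatedKey() rather than on the key itself, so facts sharing a trade id land in the same partition and can be joined in-process.

import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Hypothetical key class for MTM facts. Returning the parent trade id from
// getAssociatedKey() collocates every MTM with the Trade that owns it.
public class MtmKey implements KeyAssociation, Serializable {
    private final long mtmId;
    private final long tradeId;  // the partitioning key shared by all facts

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;  // hash on the parent Trade's key, not this key
    }

    // equals() and hashCode() over mtmId omitted for brevity
}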
So we prescribe different physical storage for Facts and Dimensions:
[Diagram: Trade partitioned; Party and Trader replicated]
Facts are partitioned; dimensions are replicated.
[Diagram: Transactions, Cashflows and MTMs held in partitioned Fact storage (distribute/partition) in the Data Layer; dimensions held in replicated caches in the Query Layer]
Facts are partitioned; dimensions are replicated.
The data volumes back this up as a sensible hypothesis:
Facts => big => distribute
Dimensions => small => replicate
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.
So how does this help us to run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern?
[Diagram: a sequence of network hops spread over time: Get Cost Centres, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, …]
But by balancing Replication and Partitioning, we don't need all those hops.
Stage 1: Focus on the where clause (Where Cost Centre = 'CC1'). Join Dimensions in the Query Layer to get the right keys to query the Facts.
[Diagram: Transactions, Cashflows and MTMs in partitioned storage]
Stage 2: Cluster join to get the Facts. Join Facts across the cluster, efficiently, as we know they are collocated.
[Diagram: Transactions, Cashflows and MTMs in partitioned storage]
Stage 3: Augment the raw Facts with the relevant Dimensions. Bind the relevant dimensions to the result, joining Dimensions in the Query Layer. The Java sketch below walks through all three stages.
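A hypothetical end-to-end sketch of the three stages (illustrative names, not the ODC API). The cost-centre index stands in for the locally replicated dimensions; the fact maps stand in for the partitioned store, with MTMs collocated with their Trades. In the real grid, stage 2 runs server-side inside each partition.

import java.util.*;

class ThreeStageQuery {
    Map<String, Set<Long>> tradeIdsByCostCentre = new HashMap<>(); // replicated dimension index
    Map<Long, String> transactionsById = new HashMap<>();          // facts (partitioned)
    Map<Long, String> mtmsByTradeId = new HashMap<>();             // facts, collocated by trade id
    Map<Long, String> referenceDataByTradeId = new HashMap<>();    // replicated dimensions

    List<String> select(String costCentre) {
        // Stage 1: resolve the where clause locally, against replicated
        // dimensions, to get the partitioning keys (trade ids).
        Set<Long> keys = tradeIdsByCostCentre.getOrDefault(costCentre, Set.of());

        List<String> results = new ArrayList<>();
        for (long tradeId : keys) {
            // Stage 2: fetch and join the facts. Because they share the trade
            // id they sit in the same partition: one hop, no data shuffle.
            String row = transactionsById.get(tradeId) + "|" + mtmsByTradeId.get(tradeId);

            // Stage 3: bind the relevant replicated dimensions to the result.
            results.add(row + "|" + referenceDataByTradeId.get(tradeId));
        }
        return results;
    }
}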
Bringing it together
[Diagram: Java client API over Replicated Dimensions and Partitioned Facts]
We never have to do a distributed join.
So all the big stuff is held partitioned, and we can join without shipping keys around and without intermediate results.
We get to do this… [normalised entities held across machines]
…and this… [independent versioning, Version 1 through Version 4]
…and this… [reconstituting a previous time slice]
…without the problems of this… [managing consistency over many copies]
…or this… [running out of space through duplication]
…all at the speed of this… (well, almost).
But there is a fly in the ointment…
I lied earlier. These aren't all Facts. [Diagram: Facts vs Dimensions]
This is a dimension:
• It has a different key to the Facts.
• And it's BIG.
We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication pattern.
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large. But Connected Dimension data is tiny by comparison.
One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' or 'Used' dimensions.
As data is written to the data store, we keep our 'Connected Caches' up to date.
[Diagram: the Data Layer holds Fact storage (partitioned): Transactions, Cashflows, MTMs; the Processing Layer holds Dimension caches (replicated)]
As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches.
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.
Saving a trade causes all its first-level references to be triggered.
[Diagram: Save Trade hits the partitioned cache; a cache-store trigger fires for the Trade's references: Party (Alias), Source, Book, Ccy. Data Layer: all normalised. Query Layer: with connected dimension caches]
This updates the connected caches.
[Diagram: Party, Alias, Source, Book and Ccy copied from the Data Layer (all normalised) into the Query Layer's connected dimension caches]
The process recurses through the object graph.
[Diagram: the recursion continues from Party to LedgerBook and on through further dimensions]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated. A sketch follows.
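A minimal sketch of the trigger, under assumed types (Entity and its references() method are hypothetical stand-ins for the domain model's foreign-key arcs). When a fact is saved, we walk its references recursively and copy any dimension not yet present into the replicated 'connected' cache.

import java.util.*;

class ConnectedReplicator {
    interface Entity {
        Object key();
        List<Entity> references(); // the arcs of the domain model
    }

    final Map<Object, Entity> connectedCache = new HashMap<>(); // replicated layer

    void onFactSaved(Entity fact) {
        for (Entity dimension : fact.references()) {
            replicateConnected(dimension);
        }
    }

    private void replicateConnected(Entity dimension) {
        // Stop if already connected: this bounds the recursion, so each
        // dimension is copied at most once however many facts reference it.
        if (connectedCache.putIfAbsent(dimension.key(), dimension) != null) return;
        for (Entity next : dimension.references()) {
            replicateConnected(next); // recurse through the object graph
        }
    }
}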
With 'Connected Replication', only 1/10th of the data needs to be replicated (on average).
Limitations of this approach
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between 'Facts' that can share a partitioning key. (But any dimension join can be supported.)
Conclusion
• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example, ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step against partitioned storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
How fast is a HashMap lookup
~20 ns
Thatrsquos how long it takes light to travel a room
How fast is a database lookup
~20 ms
Thatrsquos how long it takes light to go to Australia and
back
3 times
Computers really are very fast
The problem is wersquore quite good at writing software that
slows them down
Question
Is it fair to compare the performance of a Database with a HashMap
Of course nothellipbull Physical Diversity A database call
involves both Network and Diskbull Functional Diversity Databases provide a
wealth of additional features including persistence transactions consistency etc
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Thatrsquos how long it takes light to travel a room
How fast is a database lookup
~20 ms
Thatrsquos how long it takes light to go to Australia and
back
3 times
Computers really are very fast
The problem is wersquore quite good at writing software that
slows them down
Question
Is it fair to compare the performance of a Database with a HashMap
Of course nothellipbull Physical Diversity A database call
involves both Network and Diskbull Functional Diversity Databases provide a
wealth of additional features including persistence transactions consistency etc
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
How fast is a database lookup
~20 ms
Thatrsquos how long it takes light to go to Australia and
back
3 times
Computers really are very fast
The problem is wersquore quite good at writing software that
slows them down
Question
Is it fair to compare the performance of a Database with a HashMap
Of course nothellipbull Physical Diversity A database call
involves both Network and Diskbull Functional Diversity Databases provide a
wealth of additional features including persistence transactions consistency etc
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid: we should avoid the distributed join…
…so we only want to 'join' data that is in the same process.
(diagram: Trades and MTMs collocated via their common key)
Use a Key Assignment Policy (e.g. KeyAssociation in Coherence).
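As a minimal sketch of what that looks like with Coherence's KeyAssociation (the MtmKey shape and its fields are illustrative assumptions, not ODC's actual classes): the key class declares which key it is associated with, and Coherence routes the entry to the same partition as that key.

    import com.tangosol.net.cache.KeyAssociation;

    // Sketch: an MTM cache key that declares association with its parent trade.
    // Coherence routes this entry to the partition owning the trade id, so a
    // Trade-MTM 'join' never leaves the process.
    public class MtmKey implements KeyAssociation, java.io.Serializable {
        private final String mtmId;
        private final String tradeId; // the common (partitioning) key

        public MtmKey(String mtmId, String tradeId) {
            this.mtmId = mtmId;
            this.tradeId = tradeId;
        }

        @Override
        public Object getAssociatedKey() {
            return tradeId; // collocate with the Trade entry keyed by tradeId
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MtmKey
                    && mtmId.equals(((MtmKey) o).mtmId)
                    && tradeId.equals(((MtmKey) o).tradeId);
        }

        @Override
        public int hashCode() {
            return 31 * mtmId.hashCode() + tradeId.hashCode();
        }
    }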
So we prescribe different physical storage for Facts and Dimensions:
(diagram: Trade partitioned; Party and Trader replicated)
Facts are partitioned; dimensions are replicated.
(diagram: replicated Dimension caches in the Query Layer sit above partitioned Fact storage for Transactions, Cashflows and MTMs in the Data Layer – Facts: distribute/partition; Dimensions: replicate)
The data volumes back this up as a sensible hypothesis:
Facts => big => distribute.
Dimensions => small => replicate.
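In Coherence terms the split lives in the cache configuration, so client code simply asks for a cache by name; a hedged sketch (the cache names, and their mapping to replicated vs distributed schemes, are assumptions for illustration):

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;

    // Sketch: which caches are replicated and which are partitioned is decided
    // in the cache configuration, not in code. Names here are illustrative.
    public class OdcCaches {
        // Facts: big, related via the partitioning key
        // => mapped to a distributed (partitioned) scheme.
        public static NamedCache trades() {
            return CacheFactory.getCache("trades");
        }

        // Dimensions: small, crosscutting keys
        // => mapped to a replicated scheme, so every node holds a full copy
        //    and dimension 'joins' are purely local reads.
        public static NamedCache parties() {
            return CacheFactory.getCache("parties");
        }
    }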
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate the small stuff whose keys can't map to our partitioning key.
So how do these help us to run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where CostCentre = 'CC1'
What would this look like without this pattern?
(diagram: a long sequence of network hops over time – get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs…)
But by balancing Replication and Partitioning we don't need all those hops.
Stage 1: Focus on the where clause (Where CostCentre = 'CC1').
Join the Dimensions in the Query Layer to get the right keys to query the Facts.
(diagram: the dimension walk happens locally, above the partitioned storage of Transactions, Cashflows and MTMs)
Stage 2: Cluster-join to get the Facts.
Join the facts together efficiently, as we know they are collocated.
(diagram: the fact join runs in place, inside each partition of the partitioned storage)
Stage 3: Augment the raw Facts with the relevant Dimensions.
Bind the relevant dimensions to the result, again in the Query Layer.
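The three stages, as a hedged end-to-end sketch; every type and method name below is hypothetical, standing in for ODC's real API rather than reproducing it:

    import java.util.*;

    // Illustrative sketch of the three-stage query; every type and method
    // name here is hypothetical, standing in for ODC's real API.
    class ThreeStageQuery {
        interface ReplicatedDims {                      // local replicated dimension caches
            Set<String> tradeIdsFor(String costCentre); // Stage 1: where clause -> fact keys
            Map<String, Object> lookup(Set<String> dimensionKeys);
        }
        interface PartitionedFacts {                    // partitioned fact storage
            Map<String, FactGroup> getAll(Set<String> tradeIds);
        }
        interface FactGroup { Set<String> dimensionKeys(); }

        private final ReplicatedDims dims;
        private final PartitionedFacts facts;

        ThreeStageQuery(ReplicatedDims dims, PartitionedFacts facts) {
            this.dims = dims;
            this.facts = facts;
        }

        List<Map.Entry<FactGroup, Map<String, Object>>> run(String costCentre) {
            // Stage 1: resolve the where clause against local replicated dimensions.
            Set<String> tradeIds = dims.tradeIdsFor(costCentre);

            // Stage 2: one cluster hop; collocated facts join inside each partition.
            Map<String, FactGroup> groups = facts.getAll(tradeIds);

            // Stage 3: bind the relevant dimensions to each result, again locally.
            List<Map.Entry<FactGroup, Map<String, Object>>> out = new ArrayList<>();
            for (FactGroup g : groups.values()) {
                out.add(Map.entry(g, dims.lookup(g.dimensionKeys())));
            }
            return out;
        }
    }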
Bringing it together:
(diagram: the Java client API queries replicated Dimensions and partitioned Facts)
We never have to do a distributed join. All the big stuff is held partitioned, and we can join without shipping keys around and without intermediate results.
We get to do this… (diagram: entities held normalised, as singletons)
…and this… (diagram: independent versioning, Versions 1 to 4)
…and this (diagram: reconstituting a previous time slice)
…without the problems of this… (diagram: duplicated sub-entities) …or this (diagram: space blow-up from versioned copies)
…all at the speed of this (diagram: the denormalised, pre-joined block)… well, almost.
But there is a fly in the ointment…
I lied earlier: these aren't all Facts.
(diagram: part of the "fact" data is really Dimensions)
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication Pattern.
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large. But Connected Dimension data is tiny by comparison.
One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' (or 'Used') dimensions.
As data is written to the data store, we keep our 'Connected Caches' up to date.
(diagram: replicated Dimension Caches in the Processing Layer above partitioned Fact Storage for Transactions, Cashflows and MTMs in the Data Layer)
As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches.
The replicated layer is updated by recursing through the arcs on the domain model when facts change.
Saving a trade causes all its 1st-level references to be triggered.
(diagram: a Save Trade hits the partitioned cache in the Data Layer (all normalised); a cache-store trigger pushes the referenced Party, Alias, Source, Book and Ccy towards the Query Layer's connected dimension caches)
This updates the connected caches.
(diagram: the referenced dimensions now also sit in the Query Layer's connected dimension caches)
The process recurses through the object graph.
(diagram: 2nd-level references, such as the Party's LedgerBook, are pulled into the connected caches in turn)
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
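A hedged sketch of that recursion (the Entity/Store/Cache shapes are assumptions; ODC's actual trigger mechanics differ): on a fact write, walk the foreign keys and push each newly seen dimension into the replicated caches, stopping wherever a dimension is already connected.

    import java.util.List;

    // Hedged sketch of Connected Replication: on a fact write, recurse through
    // the foreign keys of the domain model and push each referenced dimension
    // into the replicated 'connected' caches. All shapes here are assumptions.
    class ConnectedReplicator {
        interface Entity {
            Object key();
            List<Object> foreignKeys();  // arcs to referenced dimensions
        }
        interface NormalisedStore { Entity load(Object key); }
        interface ConnectedCache  { boolean putIfAbsent(Entity e); } // true if newly added

        private final NormalisedStore store;
        private final ConnectedCache replicated;

        ConnectedReplicator(NormalisedStore store, ConnectedCache replicated) {
            this.store = store;
            this.replicated = replicated;
        }

        // Triggered when a fact (e.g. a Trade) is saved.
        void onFactWrite(Entity fact) {
            for (Object fk : fact.foreignKeys()) replicate(fk);
        }

        private void replicate(Object dimensionKey) {
            Entity dim = store.load(dimensionKey);
            // Only recurse the first time we see a dimension: already-connected
            // entries (the vast majority of writes) stop the recursion at once.
            if (dim != null && replicated.putIfAbsent(dim)) {
                for (Object fk : dim.foreignKeys()) replicate(fk);
            }
        }
    }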
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
Thatrsquos how long it takes light to go to Australia and
back
3 times
Computers really are very fast
The problem is wersquore quite good at writing software that
slows them down
Question
Is it fair to compare the performance of a Database with a HashMap
Of course nothellipbull Physical Diversity A database call
involves both Network and Diskbull Functional Diversity Databases provide a
wealth of additional features including persistence transactions consistency etc
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
3 times
Computers really are very fast
The problem is wersquore quite good at writing software that
slows them down
Question
Is it fair to compare the performance of a Database with a HashMap
Of course nothellipbull Physical Diversity A database call
involves both Network and Diskbull Functional Diversity Databases provide a
wealth of additional features including persistence transactions consistency etc
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Computers really are very fast
The problem is wersquore quite good at writing software that
slows them down
Question
Is it fair to compare the performance of a Database with a HashMap
Of course nothellipbull Physical Diversity A database call
involves both Network and Diskbull Functional Diversity Databases provide a
wealth of additional features including persistence transactions consistency etc
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
The problem is wersquore quite good at writing software that
slows them down
Question
Is it fair to compare the performance of a Database with a HashMap
Of course nothellipbull Physical Diversity A database call
involves both Network and Diskbull Functional Diversity Databases provide a
wealth of additional features including persistence transactions consistency etc
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats. TPC-H benchmarks on a 1TB data set:
• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH
NB – TPC-H is a decision-support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's SPARC SuperCluster.
So why haven't in-memory databases taken off?
Address spaces are relatively small and of a finite, fixed size.
• What happens when your data grows beyond your available memory? The 'one more bit' problem.
Durability: what happens when you pull the plug?
One solution is distribution
Distributed In Memory (Shared Nothing): again we spread our data, but this time using only RAM.
[Diagram: the same record ranges, now partitioned across nodes in RAM, with a Client talking to the cluster]
Distribution solves our two problems:
• Solve the 'one more bit' problem by adding more hardware
• Solve the durability problem with backups on another machine
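As a sketch of that second bullet (the cluster size and placement rule here are made up for illustration, not any product's algorithm), each partition can be given a backup owner on a different node, so pulling the plug on one machine loses nothing:

```java
// A toy placement scheme: every partition gets a primary and a backup
// owner on a different node, so one machine failure never loses data.
public class BackupPlacement {
    public static void main(String[] args) {
        int nodes = 4, partitions = 8;          // made-up cluster size
        for (int p = 0; p < partitions; p++) {
            int primary = p % nodes;
            int backup = (primary + 1) % nodes; // guaranteed to differ
            System.out.printf("partition %d -> primary node %d, backup node %d%n",
                    p, primary, backup);
        }
    }
}
```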
We get massive amounts of parallel processing, but at the cost of losing the single address space.
[Diagram: the architecture landscape again – Traditional, Shared Disk, Shared Nothing, In Memory, Distributed In Memory, Simpler Contract]
Key Point 4: there are three key forces:
• Distribution – gain scalability through a distributed architecture
• Simplify the contract – improve scalability by picking appropriate ACID properties
• No disk – all data is held in RAM
These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.
ODC
ODC represents a balance between throughput and latency.
What is latency? Latency is a measure of response time.
What is throughput? Throughput is a measure of the consumption of work/messages in a prescribed amount of time.
Which is best for latency? [Diagram: latency compared across the Traditional Database and the Shared Nothing (Distributed) In-Memory Database]
Which is best for throughput? [Diagram: throughput compared across the same two architectures]
So why do we use distributed in-memory? [Diagram: In Memory plus plentiful hardware delivers on both Latency and Throughput]
ODC – distributed, shared-nothing, in-memory, semi-normalised, realtime graph DB:
• 450 processes
• 2TB of RAM
• Messaging (topic based) as a system of record (persistence)
The Layers: [Diagram: an Access Layer of Java client APIs, over a Query Layer, over a Data Layer holding Transactions, Cashflows and MTMs, over a Persistence Layer]
Three Tools of Distributed Data Architecture:
• Indexing
• Replication
• Partitioning
How should we use these tools?
Replication puts data everywhere: wherever you go, the data will be there. But your storage is limited by the memory on a node.
Partitioning scales. [Diagram: keys Aa–Ap assigned to one node]
Scalable storage, bandwidth and processing. But associating data in different partitions implies moving it.
So we have some data. Our data is bound together in a model:
[Diagram: a Trade linked to a Party and a Trader, which link on to further sub-entities (Desk, Name, etc.)]
Which we save…
[Diagram: the Trade, Party and Trader entities scattered across different nodes]
Binding them back together involves a "distributed join" => lots of network hops.
[Diagram: a Trade on one node reaching across the network to its Party and Trader on others]
The hops have to be spread over time. [Diagram: the hops laid out along Network and Time axes]
Lots of network hops makes it slow.
OK – what if we held it all together, "denormalised"?
Hence denormalisation is FAST (for reads).
Denormalisation implies the duplication of some sub-entities…
…and that means managing consistency over lots of copies…
…and all the duplication means you run out of space really quickly.
Space issues are exaggerated further when data is versioned.
[Diagram: four full copies of the Trade–Party–Trader graph, one per version]
…and you need versioning to do MVCC.
And reconstituting a previous time slice becomes very difficult.
[Diagram: versioned Trade, Party and Trader entities that must be matched back up across versions]
So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage. Remember, this means the object graph will be split across multiple machines.
[Diagram: the normalised Trade, Party and Trader entities held on separate nodes]
Independently versioned. Data is singleton.
Binding them back together involves a "distributed join" => lots of network hops.
[Diagram: the normalised entities reaching across the network again]
Whereas in the denormalised model, the join is already done.
So what we want are the advantages of a normalised store at the speed of a denormalised one. This is what using Snowflake Schemas and the Connected Replication pattern is all about.
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys.
We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate.
[Diagram: common keys vs crosscutting keys]
We tackle this problem with a hybrid model:
[Diagram: Trade – partitioned; Party and Trader – replicated]
We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions.
Everything starts from a Core Fact (Trades for us)
Facts are big; dimensions are small.
Facts have one key that relates them all (used to partition).
Dimensions have many keys (which crosscut the partitioning key).
Looking at the data:
Facts => big, common keys.
Dimensions => small, crosscutting keys.
We remember we are a grid. We should avoid the distributed join… so we only want to 'join' data that is in the same process.
Trades and MTMs share a common key. Use a key-assignment policy (e.g. KeyAssociation in Coherence) so that facts sharing a key are stored in the same process.
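As a minimal sketch of such a policy in Coherence (the MtmKey class and its fields are hypothetical; only the KeyAssociation interface is Coherence's): returning the trade id from getAssociatedKey() tells the grid to store each MTM in the same partition as its parent Trade, making the Trade-to-MTM join process-local.

```java
import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Hypothetical cache key for an MTM fact.
public class MtmKey implements KeyAssociation, Serializable {
    private final long mtmId;
    private final long tradeId; // the common (partitioning) key

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId; // collocate with the Trade that owns this MTM
    }

    // A real cache key also needs equals() and hashCode().
    @Override
    public boolean equals(Object o) {
        return o instanceof MtmKey
                && ((MtmKey) o).mtmId == mtmId
                && ((MtmKey) o).tradeId == tradeId;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(mtmId) * 31 + Long.hashCode(tradeId);
    }
}
```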
So we prescribe different physical storage for Facts and Dimensions:
[Diagram: Trade – partitioned; Party and Trader – replicated]
Facts are partitioned; dimensions are replicated.
[Diagram: the Data and Query Layers – Transactions, Cashflows and MTMs in partitioned Fact storage, with Dimensions replicated alongside. Facts: distribute/partition. Dimensions: replicate.]
The data volumes back this up as a sensible hypothesis:
Facts => big => distribute.
Dimensions => small => replicate.
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.
[Diagram: Replicate vs Distribute]
So how does this help us to run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern?
[Diagram: a chain of sequential lookups spread over Network and Time – get Cost Centres, get Ledger Books, get Source Books, get Transactions, get MTMs, get Legs]
But by balancing Replication and Partitioning, we don't need all those hops.
Stage 1: Focus on the where clause (Where Cost Centre = 'CC1') and get the right keys to query the Facts, by joining the Dimensions in the Query Layer.
[Diagram: Transactions, Cashflows and MTMs sitting in partitioned storage; the query: Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1']
Stage 2: Cluster-join to get the Facts. The facts can be joined together efficiently, as we know they are collocated.
[Diagram: the partition-local join across Transactions, Cashflows and MTMs]
Stage 3: Augment the raw Facts with the relevant Dimensions, binding them to the result in the Query Layer.
[Diagram: the result enriched from the replicated dimension caches]
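The three stages can be miniaturised in plain Java (all names and data below are made up; Maps stand in for the grid's replicated and partitioned caches, and in ODC stage 2 runs partition-locally in parallel):

```java
import java.util.List;
import java.util.Map;

// A self-contained toy of the three query stages.
public class StagedQuery {
    public static void main(String[] args) {
        // Replicated dimension index: cost centre -> partitioning keys of its facts
        Map<String, List<Long>> costCentreIndex = Map.of("CC1", List.of(1L, 2L));
        // Partitioned facts, both keyed by the common (partitioning) key
        Map<Long, String> transactions = Map.of(1L, "txn-1", 2L, "txn-2", 3L, "txn-3");
        Map<Long, String> mtms = Map.of(1L, "mtm-1", 2L, "mtm-2", 3L, "mtm-3");

        // Stage 1: evaluate the where clause against replicated dimensions,
        // yielding the fact keys to fetch (no network hops needed).
        List<Long> keys = costCentreIndex.get("CC1");

        // Stage 2: join the facts; this is local because Transactions and
        // MTMs that share a key live in the same partition.
        // Stage 3: bind replicated dimension data onto each result row.
        for (Long k : keys) {
            System.out.println(transactions.get(k) + " + " + mtms.get(k)
                    + " + [reference data from replicated dimensions]");
        }
    }
}
```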
Bringing it together: [Diagram: the Java client API sitting over replicated Dimensions and partitioned Facts]
We never have to do a distributed join
So all the big stuff is held partitioned, and we can join without shipping keys around and without building intermediate results.
We get to do this… [the normalised graph split across machines]
…and this… [independent versions without duplicating whole graphs]
…and this… [reconstituting a previous time slice]
…without the problems of this… [managing consistency over many copies]
…or this… [running out of space]
…all at the speed of this… well, almost. [the denormalised, pre-joined model]
But there is a fly in the ointment…
I lied earlier. These aren't all Facts.
[Diagram: the entities split into Facts and Dimensions]
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication pattern.
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldman's in the data store, then a Trade query will never need the Goldman's Counterparty.
Looking at the Dimension data, some are quite large. But connected Dimension data is tiny by comparison.
One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'connected' (or 'used') dimensions.
As data is written to the data store, we keep our 'connected caches' up to date.
[Diagram: the Data Layer holding Transactions, Cashflows and MTMs in partitioned Fact storage, with replicated Dimension caches in the Processing Layer]
As new Facts are added, the relevant Dimensions they reference are moved to the processing-layer caches. The replicated layer is updated by recursing through the arcs of the domain model when facts change.
Saving a trade causes all its first-level references to be triggered:
[Diagram: Save Trade hits the partitioned cache in the Data Layer (all normalised); a cache-store trigger fires for the Trade's references – Party, Alias, Source Book, Ccy – feeding the Query Layer's connected dimension caches]
This updates the connected caches.
[Diagram: Party, Alias, Source Book and Ccy copied into the Query Layer's caches]
The process recurses through the object graph:
[Diagram: the recursion continuing from Party and Source Book on to LedgerBook and further Party references]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'connected' dimensions are replicated.
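A sketch of the recursion, assuming a hypothetical Entity abstraction over the domain model's foreign keys (in ODC the real trigger fires from the cache store when a fact is written):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a domain object and its foreign-key references.
interface Entity {
    Object key();
    List<Entity> references(); // first-level foreign-key references
}

public class ConnectedReplicator {
    private final Map<Object, Entity> replicatedDimensions = new HashMap<>();

    // Called when a fact (e.g. a Trade) is saved to the partitioned store.
    public void onFactSaved(Entity fact) {
        fact.references().forEach(this::replicateIfAbsent);
    }

    private void replicateIfAbsent(Entity dimension) {
        // Recurse only the first time we see a dimension, so already
        // 'connected' data is neither copied nor walked twice.
        if (replicatedDimensions.putIfAbsent(dimension.key(), dimension) == null) {
            dimension.references().forEach(this::replicateIfAbsent);
        }
    }
}
```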
With 'Connected Replication', only 1/10th of the data needs to be replicated (on average).
Limitations of this approach
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between Facts that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very-low-latency or very-high-throughput applications.
Conclusion
At one end of the scale are the huge shared-nothing architectures. These favour scalability.
Conclusion
At the other end are in-memory architectures, ideally using a single address space.
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning, we can do any join in a single step.
Conclusion
With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added, the Dimensions that they reference are moved into the processing-layer caches. The Replicated Layer is updated by recursing through the arcs on the domain model whenever facts change: saving a trade causes all its first-level references to be triggered.
Save Trade: the write lands in the Partitioned Cache and fires a trigger in the Cache Store. The Trade's first-level references (Party, Alias, Source, Book, Ccy) are read from the Data Layer (all normalised) and pushed towards the Query Layer's connected dimension caches.
This updates the connected caches with those first-level entities.
The process recurses through the object graph: second-level references (here a further Party and a LedgerBook) are pulled into the connected caches in turn.
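A minimal sketch of that recursion, with hypothetical types: a trigger fired on save walks the foreign-key arcs and copies each newly 'connected' dimension into the replicated cache:

```java
import java.util.*;

// Connected Replication sketch (illustrative types): when a fact is saved,
// recurse through its foreign keys and push every dimension it references
// into the replicated caches, so only "connected" dimensions are replicated.
public class ConnectedReplication {

    // A dimension knows which other dimensions it references (its "arcs").
    record Dimension(String key, List<String> references) {}

    // Normalised store of all dimensions, keyed by id (the data layer).
    static final Map<String, Dimension> DATA_LAYER = new HashMap<>();
    // Replicated connected-dimension cache (the query layer's copy).
    static final Map<String, Dimension> CONNECTED_CACHE = new HashMap<>();

    // Trigger fired by the cache store when a saved fact references `key`.
    static void replicateConnected(String key) {
        Dimension d = DATA_LAYER.get(key);
        if (d == null || CONNECTED_CACHE.containsKey(key)) return; // already replicated
        CONNECTED_CACHE.put(key, d);
        for (String ref : d.references()) replicateConnected(ref); // recurse the arcs
    }

    public static void main(String[] args) {
        DATA_LAYER.put("Book:B1", new Dimension("Book:B1", List.of("Party:P2", "LedgerBook:L1")));
        DATA_LAYER.put("Party:P1", new Dimension("Party:P1", List.of()));
        DATA_LAYER.put("Party:P2", new Dimension("Party:P2", List.of()));
        DATA_LAYER.put("LedgerBook:L1", new Dimension("LedgerBook:L1", List.of()));
        DATA_LAYER.put("Party:Unused", new Dimension("Party:Unused", List.of()));

        // Saving a trade that references P1 and B1 triggers its first-level refs...
        replicateConnected("Party:P1");
        replicateConnected("Book:B1"); // ...which recurses to P2 and L1.

        // Party:Unused is never replicated because nothing connects to it.
        System.out.println(CONNECTED_CACHE.keySet());
    }
}
```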
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring that only 'Connected' dimensions are replicated. With Connected Replication only 1/10th of the data needs to be replicated (on average).
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step against partitioned storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach:
• Data set size: the size of connected Dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any Dimension join can be supported; see the key sketch below)
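The partitioning-key requirement is met with the Key Assignment Policy the deck mentions (KeyAssociation in Coherence). Below is a sketch of a composite key for a Cashflow fact that collocates it with its Trade; the CashflowKey class is hypothetical, though the KeyAssociation interface and its getAssociatedKey() method are real Coherence API.

```java
import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Hypothetical composite key for a Cashflow fact. Only the KeyAssociation
// interface is real Coherence API; everything else is illustrative.
public class CashflowKey implements KeyAssociation, Serializable {

    private final long cashflowId;
    private final long tradeId; // the partitioning key shared with Trade facts

    public CashflowKey(long cashflowId, long tradeId) {
        this.cashflowId = cashflowId;
        this.tradeId = tradeId;
    }

    // Coherence collocates entries whose keys return equal associated keys,
    // so every Cashflow lands in the same partition as its Trade.
    @Override
    public Object getAssociatedKey() {
        return tradeId;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CashflowKey k
                && k.cashflowId == cashflowId
                && k.tradeId == tradeId;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(cashflowId) + Long.hashCode(tradeId);
    }
}
```

Any fact type whose key returns the same associated key (here, the trade id) lands in the same partition, which is exactly what makes the Stage 2 cluster join local.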
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared-nothing architectures. These favour scalability.
Conclusion
At the other end are in-memory architectures, ideally using a single address space.
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning, so that any join can be done in a single step
Partitioned Storage
Conclusion
With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
The End
• Further details online: http://www.benstopford.com
• Questions?
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades and MTMs share a common key, so we use a key assignment policy (e.g. KeyAssociation in Coherence) to keep them in the same partition.
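A minimal sketch of such a policy using Coherence's KeyAssociation interface (the MtmKey class and its fields are hypothetical; only the interface and its getAssociatedKey() method belong to Coherence):

    import com.tangosol.net.cache.KeyAssociation;
    import java.io.Serializable;

    // Hypothetical cache key for an MTM. Returning the trade id as the
    // associated key asks Coherence to place this entry in the same
    // partition as its parent Trade, so Trade-MTM joins stay in-process.
    public class MtmKey implements KeyAssociation, Serializable {
        private final long mtmId;
        private final long tradeId;

        public MtmKey(long mtmId, long tradeId) {
            this.mtmId = mtmId;
            this.tradeId = tradeId;
        }

        @Override
        public Object getAssociatedKey() {
            return tradeId; // collocate with the Trade's key
        }

        // A real cache key would also need equals() and hashCode() over
        // both ids; omitted here for brevity.
    }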
So we prescribe different physical storage for Facts and Dimensions: Facts (Trade) are partitioned; Dimensions (Party, Trader) are replicated.
(Diagram: in the Data and Query Layers, Transactions, Cashflows and MTMs sit in partitioned Fact Storage, with replicated Dimension caches beside them. Facts => distribute/partition; Dimensions => replicate.)
The data volumes back this up as a sensible hypothesis: Facts => big => distribute. Dimensions => small => replicate.
Key Point: we use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate the small stuff whose keys can't map to our partitioning key (replicate the small, distribute the big).
So how does this help us to run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where CostCentre = 'CC1'
What would this look like without this pattern? (Diagram: a chain of sequential network hops over time: get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs.)
But by balancing Replication and Partitioning we don't need all those hops.
Stage 1: focus on the where clause: Where CostCentre = 'CC1'. Get the right keys to query the Facts: join the Dimensions in the Query Layer, then hit the partitioned storage (Transactions, Cashflows, MTMs) with the resulting keys.
Stage 2: cluster join to get the Facts. Join the Facts across the cluster; this is efficient because we know they are collocated in partitioned storage.
Stage 3: augment the raw Facts with the relevant Dimensions, binding them to the result back in the Query Layer.
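A hedged sketch of the three stages using plain Java collections in place of the grid's caches (all names are illustrative, not ODC's API):

    import java.util.*;
    import java.util.stream.Collectors;

    // Replicated dimensions and partitioned facts modelled as local maps,
    // purely to show the shape of the three query stages.
    public class ThreeStageQuery {
        Map<String, Set<Long>> tradeIdsByCostCentre = new HashMap<>(); // dimension (replicated)
        Map<Long, List<String>> factsByTradeId = new HashMap<>();      // facts (partitioned)
        Map<Long, String> referenceDataByTradeId = new HashMap<>();    // dimension (replicated)

        public List<String> query(String costCentre) {
            // Stage 1: resolve the where clause against replicated
            // dimensions to get the fact (partitioning) keys.
            Set<Long> tradeIds =
                    tradeIdsByCostCentre.getOrDefault(costCentre, Set.of());

            // Stage 2: fetch the facts; on the grid this join is
            // partition-local because the facts share a key.
            // Stage 3: bind replicated dimension data to each result.
            return tradeIds.stream()
                    .flatMap(id -> factsByTradeId.getOrDefault(id, List.of()).stream()
                            .map(fact -> fact + " / " + referenceDataByTradeId.get(id)))
                    .collect(Collectors.toList());
        }
    }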
Bringing it together: a Java client API over Replicated Dimensions and Partitioned Facts. We never have to do a distributed join.
So all the big stuff is held partitioned, and we can join without shipping keys around or building intermediate results.
We get to do this… (hold Trade, Party and Trader normalised) …and this… (version them independently, Version 1 through 4) …and this (reconstitute a previous time slice) …without the problems of duplicated copies and space blow-up… all at the speed of the denormalised model… well, almost.
But there is a fly in the ointment… I lied earlier: these aren't all Facts. Some are Dimensions, and this one is a Dimension that:
• has a different key to the Facts
• is BIG
We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication Pattern.
Whilst there are lots of these big dimensions, a large majority are never used: they are not all "connected". If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large, but connected Dimension data is tiny by comparison. One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' (used) dimensions.
As data is written to the data store, we keep our 'Connected Caches' up to date. (Diagram: the Data Layer holds partitioned Fact Storage for Transactions, Cashflows and MTMs; the Processing Layer holds the replicated Dimension Caches.)
As new Facts are added, the relevant Dimensions they reference are moved to the processing layer caches. The replicated layer is updated by recursing through the arcs of the domain model whenever facts change.
Saving a trade causes all its first-level references to be triggered. (Diagram: 'Save Trade' hits the partitioned cache in the Data Layer (all normalised); a cache store trigger pushes the referenced Party, Alias, Source, Book and Ccy out to the Query Layer's connected dimension caches.)
This updates the connected caches.
The process recurses through the object graph (from the Trade to Party, Alias, Source, Book and Ccy; then on from Party to LedgerBook, and so on).
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'connected' dimensions are replicated.
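A minimal sketch of that recursion, assuming each entity can enumerate its outgoing foreign-key references (the Entity interface and all method names are hypothetical):

    import java.util.*;

    // When a fact is written, walk its foreign-key arcs and copy any
    // dimension not yet present into the replicated 'connected' caches.
    public class ConnectedReplicator {
        public interface Entity {
            Object key();
            List<Entity> references(); // outgoing foreign-key arcs
        }

        private final Set<Object> connectedKeys = new HashSet<>();

        public void onFactWrite(Entity fact) {
            fact.references().forEach(this::replicate);
        }

        private void replicate(Entity dimension) {
            // add() returning false means this dimension is already
            // connected, which bounds the recursion.
            if (connectedKeys.add(dimension.key())) {
                pushToReplicatedCache(dimension);
                dimension.references().forEach(this::replicate);
            }
        }

        private void pushToReplicatedCache(Entity dimension) {
            // Placeholder: in the real system this would write the
            // dimension into a replicated cache in the query layer.
        }
    }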
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared nothing architectures; these favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step against partitioned storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example, ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so that any join can be done in a single step against Partitioned Storage.
• …with a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Durability
What happens when you pull the plug
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
We get massive amounts of parallel processing. But at the cost of losing the single address space.
[Diagram: the architecture spectrum – Traditional, Shared Disk, Shared Nothing, In Memory, Distributed In Memory – ordered by increasingly simple contract]
Key Point 4: There are three key forces:
• Distribution: gain scalability through a distributed architecture
• Simplify the contract: improve scalability by picking appropriate ACID properties
• No disk: all data is held in RAM
These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.
ODC represents a balance between throughput and latency.
What is latency? Latency is a measure of response time.
What is throughput? Throughput is a measure of the amount of work (or messages) processed in a prescribed period of time.
Which is best for latency? [Chart comparing the latency of a Traditional Database and a Shared Nothing (Distributed) In-Memory Database]
Which is best for throughput? [Chart comparing the throughput of a Traditional Database and a Shared Nothing (Distributed) In-Memory Database]
So why do we use distributed in-memory? [Diagram: In Memory plus plentiful hardware serves both Latency and Throughput]
ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised:
• Real-time graph DB
• 450 processes
• 2TB of RAM
• Messaging (topic-based) as the system of record (persistence)
The Layers: [Diagram: an Access Layer (Java client APIs) over a Query Layer, over a Data Layer holding Transactions, Cashflows and MTMs, over a Persistence Layer]
Three Tools of Distributed Data Architecture:
• Indexing
• Replication
• Partitioning
How should we use these tools?
Replication puts data everywhere: wherever you go, the data will be there. But your storage is limited by the memory on a node.
Partitioning scales. [Diagram: keys Aa–Ap assigned to one partition] It gives scalable storage, bandwidth and processing, but associating data in different partitions implies moving it, as the sketch below illustrates.
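To make the trade-off concrete, here is a minimal, hypothetical sketch (plain Java maps standing in for cluster nodes) of hash partitioning: storage and processing scale with the node count, but entries whose keys hash to different partitions can only be associated by crossing the network.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionedStore {
    private final List<Map<String, Object>> nodes = new ArrayList<>();

    public PartitionedStore(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) nodes.add(new HashMap<>());
    }

    // Each key deterministically owns one partition; the slide's "Keys Aa-Ap"
    // ranges become hash buckets here.
    private int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    public void put(String key, Object value) {
        nodes.get(partitionFor(key)).put(key, value);
    }

    public Object get(String key) {
        return nodes.get(partitionFor(key)).get(key);
    }

    public static void main(String[] args) {
        PartitionedStore store = new PartitionedStore(4);
        store.put("trade:T1", "IRS 10y");
        store.put("party:P1", "ACME");
        // "trade:T1" and "party:P1" will usually land on different partitions,
        // so relating them means a network hop between nodes.
        System.out.println(store.get("trade:T1") + " / " + store.get("party:P1"));
    }
}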
So we have some data. Our data is bound together in a model. [Domain model diagram: a Trade linked to its Party and Trader, with sub-entities such as Desk and Name]
Which we save. [Diagram: the saved Trade, Party and Trader entities end up spread across the cluster]
Binding them back together involves a "distributed join" => lots of network hops. [Diagram: Trade, Party and Trader fetched from different nodes]
The hops have to be spread over time. [Diagram: network hops laid out along a time axis] Lots of network hops make it slow.
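A hedged illustration of why this hurts: with the entities normalised and spread out, rebuilding one object graph costs a sequential network round trip per foreign key. The types and store below are hypothetical stand-ins, with local maps playing the role of remote caches.

import java.util.Map;

public class DistributedJoinDemo {
    record Trade(String id, String partyId, String traderId) {}
    record Party(String id, String name) {}
    record Trader(String id, String name) {}

    // In a real grid each lookup below is a remote call; here plain maps stand in.
    public static void main(String[] args) {
        Map<String, Trade> trades = Map.of("T1", new Trade("T1", "P1", "TR1"));
        Map<String, Party> parties = Map.of("P1", new Party("P1", "ACME"));
        Map<String, Trader> traders = Map.of("TR1", new Trader("TR1", "Alice"));

        Trade t = trades.get("T1");            // hop 1: fetch the trade
        Party p = parties.get(t.partyId());    // hop 2: must wait for hop 1's result
        Trader tr = traders.get(t.traderId()); // hop 3: another round trip
        System.out.println(t + " " + p + " " + tr);
    }
}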
OK – what if we held it all together, "denormalised"? Hence denormalisation is FAST (for reads).
Denormalisation implies the duplication of some sub-entities… and that means managing consistency over lots of copies… and all the duplication means you run out of space really quickly.
Space issues are exacerbated further when data is versioned. [Diagram: Versions 1–4, each a full denormalised copy of Trade, Party and Trader]
…and you need versioning to do MVCC (multi-version concurrency control).
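As a hedged sketch of what that versioning buys (this is illustrative, not ODC code): if every write appends under a new version number, a reader pinned to a snapshot can pick the greatest version at or below it, which is the core of an MVCC read.

import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class VersionedStore {
    // id -> (version -> value); each write appends, nothing is overwritten
    private final Map<String, NavigableMap<Long, Object>> data = new HashMap<>();

    public void put(String id, long version, Object value) {
        data.computeIfAbsent(id, k -> new TreeMap<>()).put(version, value);
    }

    // A snapshot at version v sees the greatest version <= v,
    // so old time slices stay reconstructible.
    public Object readAt(String id, long snapshotVersion) {
        NavigableMap<Long, Object> versions = data.get(id);
        if (versions == null) return null;
        Map.Entry<Long, Object> e = versions.floorEntry(snapshotVersion);
        return e == null ? null : e.getValue();
    }
}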
And reconstituting a previous time slice becomes very difficult. [Diagram: mismatched copies of Trade, Party and Trader across versions]
So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage.
Remember: this means the object graph will be split across multiple machines. [Diagram: Trade, Party and Trader on different nodes – each independently versioned, each datum a singleton]
Binding them back together involves a "distributed join" => lots of network hops. Whereas in the denormalised model, the join is already done.
So what we want is the advantages of a normalised store at the speed of a denormalised one. This is what using Snowflake Schemas and the Connected Replication pattern is all about.
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together? It's all about the keys.
We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate. [Diagram: common keys vs crosscutting keys]
We tackle this problem with a hybrid model: [Diagram: Trade partitioned; Party and Trader replicated]
We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions:
• Everything starts from a core Fact (Trades, for us)
• Facts are big; Dimensions are small
• Facts have one key that relates them all (used to partition)
• Dimensions have many keys (which crosscut the partitioning key)
Looking at the data: Facts => big, common keys; Dimensions => small, crosscutting keys.
We remember we are a grid: we should avoid the distributed join… so we only want to 'join' data that is in the same process.
[Diagram: Trades and their MTMs collocated via a common key] Use a key assignment policy (e.g. KeyAssociation in Coherence), as sketched below.
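A minimal sketch of such a policy using Coherence's KeyAssociation interface (the MtmKey class itself is hypothetical): the MTM's cache key reports its parent trade id as the associated key, so Coherence assigns the MTM entry to the same partition as its Trade and the join stays in-process.

import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;
import java.util.Objects;

// The key class is illustrative; KeyAssociation is the real Coherence interface.
public class MtmKey implements KeyAssociation, Serializable {
    private final String mtmId;
    private final String tradeId; // the common key shared with the Trade fact

    public MtmKey(String mtmId, String tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        // Coherence partitions this entry by the trade id, not the MTM id,
        // so an MTM always lives in the same partition as its Trade.
        return tradeId;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MtmKey k
                && mtmId.equals(k.mtmId) && tradeId.equals(k.tradeId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(mtmId, tradeId);
    }
}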
So we prescribe different physical storage for Facts and Dimensions: [Diagram: Trade partitioned; Party and Trader replicated]
Facts are partitioned; dimensions are replicated. [Diagram: a Query Layer over a Data Layer whose Transactions, Cashflows and MTMs sit in partitioned fact storage]
[Diagram: Dimensions (replicated) above Fact Storage (partitioned) – the Facts (Transactions, Cashflows, MTMs) are distributed across partitions]
The data volumes back this up as a sensible hypothesis: Facts => big => distribute; Dimensions => small => replicate.
Key Point: We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key. [Diagram: Replicate vs Distribute]
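A hedged sketch of what that prescription looks like with Coherence named caches (the cache names here are illustrative): in Coherence, the partitioned-versus-replicated choice is made by mapping each cache name to a distributed-scheme or a replicated-scheme in the cache configuration file, so the code only picks the cache.

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class StorageTopology {
    public static void main(String[] args) {
        // Facts: big, related by the partitioning key => a partitioned
        // (distributed-scheme) cache spreads them across the cluster.
        NamedCache trades = CacheFactory.getCache("trades");

        // Dimensions: small, crosscutting keys => a replicated-scheme cache
        // puts a full copy on every node, so dimension lookups are local.
        NamedCache parties = CacheFactory.getCache("parties");

        trades.put("T1", "trade payload");  // lands in exactly one partition
        parties.put("P1", "party payload"); // copied to every node
    }
}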
So how does this help us to run queries without distributed joins?
This query involves: • joins between Dimensions • joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern? [Diagram: a chain of network hops spread over time – get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs, get Cost Centres]
But by balancing Replication and Partitioning, we don't need all those hops.
Stage 1: Focus on the where clause (Where Cost Centre = 'CC1'). Get the right keys to query the Facts by joining Dimensions in the Query Layer. [Diagram: partitioned storage of Transactions, Cashflows and MTMs]
Stage 2: Cluster join to get the Facts. Join the Facts together efficiently across the cluster – we know they are collocated. [Diagram: partitioned storage of Transactions, Cashflows and MTMs]
Stage 3: Augment the raw Facts with the relevant Dimensions, binding them to the result in the Query Layer. A sketch of the three stages follows below.
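Putting the three stages together as a self-contained simulation (all names are hypothetical; plain maps stand in for the replicated dimension cache and the partitioned fact storage): stage 1 resolves the where clause against local dimension replicas, stage 2 joins facts that share the trade id inside one partition, and stage 3 binds dimension data to the result.

import java.util.*;
import java.util.stream.*;

public class ThreeStageQuery {
    record Transaction(String tradeId, String costCentreId, double amount) {}

    public static void main(String[] args) {
        // Replicated dimension: cost centre id -> name (a full copy on every node)
        Map<String, String> costCentres = Map.of("CC1", "Rates Desk", "CC2", "FX Desk");

        // Partitioned facts, keyed (and hence collocated) by trade id
        Map<String, Transaction> transactions = Map.of(
            "T1", new Transaction("T1", "CC1", 100.0),
            "T2", new Transaction("T2", "CC2", 250.0));
        Map<String, Double> mtms = Map.of("T1", 1.5, "T2", -0.7);

        // Stage 1: resolve the where clause locally against the replicated dimension
        Set<String> wantedCcs = costCentres.keySet().stream()
            .filter("CC1"::equals)
            .collect(Collectors.toSet());

        // Stage 2: partition-local join of facts sharing the trade id
        // Stage 3: bind the dimension name to each result row
        transactions.values().stream()
            .filter(t -> wantedCcs.contains(t.costCentreId()))
            .forEach(t -> System.out.printf("%s mtm=%s cc=%s%n",
                t.tradeId(), mtms.get(t.tradeId()), costCentres.get(t.costCentreId())));
    }
}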
Bringing it together: [Diagram: a Java client API over replicated Dimensions and partitioned Facts]
We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and holding intermediate results.
We get to do this… [normalised entities held once] …and this… [independently versioned entities] …and this… [consistent time slices] …without the problems of duplication or running out of space… all at the speed of a denormalised store… well, almost.
But there is a fly in the ointment… I lied earlier: these aren't all Facts. [Diagram: Facts vs Dimensions] This is a dimension: • it has a different key to the Facts • and it's BIG.
We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication pattern.
Whilst there are lots of these big dimensions, a large majority are never used: they are not all "connected". If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large, but connected Dimension data is tiny by comparison. One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'connected' (or 'used') dimensions.
As data is written to the data store, we keep our 'connected caches' up to date. [Diagram: a Processing Layer holding replicated Dimension Caches over a Data Layer with partitioned Fact Storage for Transactions, Cashflows and MTMs]
As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches. The replicated layer is updated by recursing through the arcs on the domain model when facts change: saving a trade causes all its first-level references to be triggered.
[Diagram: Save Trade hits the partitioned cache; a cache-store trigger walks Trade → Party, Alias, Source, Book, Ccy from the normalised Data Layer into the Query Layer's connected dimension caches]
This updates the connected caches. [Diagram: the Trade's Party, Alias, Source, Book and Ccy references now present in the Query Layer caches]
The process recurses through the object graph. [Diagram: second-level references – Party → LedgerBook and a further Party – pulled into the connected caches]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'connected' dimensions are replicated. A sketch follows below.
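A minimal sketch of that recursion (types and method names are hypothetical, not ODC code): the cache-store trigger hands over the saved fact's first-level references, and the walk copies each not-yet-connected dimension into the replicated cache before following its own foreign keys.

import java.util.*;

public class ConnectedReplication {
    record Dimension(String key, List<String> foreignKeys) {}

    private final Map<String, Dimension> normalisedStore;                  // data layer, partitioned
    private final Map<String, Dimension> replicatedCache = new HashMap<>(); // query layer

    public ConnectedReplication(Map<String, Dimension> normalisedStore) {
        this.normalisedStore = normalisedStore;
    }

    // Called by the cache-store trigger when a fact (e.g. a Trade) is written.
    public void onFactSaved(List<String> firstLevelRefs) {
        for (String key : firstLevelRefs) replicate(key);
    }

    private void replicate(String key) {
        if (replicatedCache.containsKey(key)) return; // already connected: stop recursing
        Dimension dim = normalisedStore.get(key);
        if (dim == null) return;
        replicatedCache.put(key, dim);                // this dimension is now 'connected'
        dim.foreignKeys().forEach(this::replicate);   // recurse through the arcs
    }
}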
With Connected Replication, only 1/10th of the data needs to be replicated (on average).
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion:
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures; these favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example, ODC).
• ODC attacks the distributed join problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step against partitioned storage.
• It adds a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions, the large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison.
One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' or 'Used' dimensions.
As data is written to the data store we keep our 'Connected Caches' up to date.
[Diagram: a Processing Layer holding replicated Dimension Caches above a Data Layer with partitioned Fact Storage: Transactions, Cashflows, MTMs]
As new Facts are added, the relevant Dimensions that they reference are moved into the processing-layer caches.
The Replicated Layer is updated by recursing through the arcs of the domain model when facts change. Saving a trade causes all its first-level references to be triggered.
[Diagram: 'Save Trade' writes to a partitioned cache in the Data Layer (all normalised); a cache-store trigger then walks the Trade's references (Party, Alias, Source, Book, Ccy) towards the Query Layer's connected dimension caches]
This updates the connected caches
[Diagram: the referenced dimensions (Party, Alias, Source, Book, Ccy) now appear in the Query Layer's connected dimension caches]
The process recurses through the object graph
[Diagram: the recursion continues, from Alias to Party and from Book to LedgerBook, pulling those into the connected caches too]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
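A rough sketch of that trigger logic, assuming a hypothetical Entity abstraction over the domain model and a stand-in for the actual write to the replicated caches:

```java
import java.util.*;

// Hypothetical sketch of the Connected Replication trigger: when a
// fact is saved, recurse through its foreign keys and replicate any
// dimension not already present in the connected caches.
public class ConnectedReplicator {

    /** A node in the domain model exposing its outgoing references. */
    interface Entity {
        Object key();
        List<Entity> references();   // the "arcs" of the domain model
    }

    private final Set<Object> replicated = new HashSet<>();

    /** Called by the cache-store trigger whenever a fact is written. */
    public void onFactSaved(Entity fact) {
        for (Entity dim : fact.references()) {
            replicate(dim);
        }
    }

    private void replicate(Entity dim) {
        if (!replicated.add(dim.key())) {
            return;                  // already connected: stop recursing
        }
        pushToReplicatedCache(dim);
        for (Entity next : dim.references()) {
            replicate(next);         // recurse through the object graph
        }
    }

    private void pushToReplicatedCache(Entity dim) {
        // Stand-in for the real write to the replicated dimension caches.
        System.out.println("replicating dimension " + dim.key());
    }
}
```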
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier. These aren't all Facts.
[Diagram: the object model split into Facts and Dimensions]
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication Pattern.
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large.
But Connected Dimension Data is tiny by comparison.
One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' or 'Used' dimensions.
As data is written to the data store, we keep our 'Connected Caches' up to date.
[Diagram: Data Layer with partitioned Fact Storage (Transactions, Cashflows, MTMs); Processing Layer with replicated Dimension Caches]
As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches.
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its first-level references to be triggered.
[Diagram: 'Save Trade' writes to the Partitioned Cache in the Data Layer (All Normalised); a Cache Store trigger fires for the Trade's references: Party, Alias, Source Book, Ccy]
This updates the connected caches.
[Diagram: the same references now mirrored in the Query Layer's connected dimension caches]
The process recurses through the object graph.
[Diagram: the recursion continues from Party to LedgerBook in the Query Layer's connected dimension caches]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
Limitations of this approach
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, we can do any join in a single step over Partitioned Storage…
• …with a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do this… (diagram: a Trade joined to its Party and Trader) …and this… (diagram: Trade, Party and Trader held at Versions 1 through 4) …and this… (diagram: many Trades sharing the same Parties and Traders) …without the problems of this… or this… all at the speed of this… well, almost.
But there is a fly in the ointment… I lied earlier: these aren't all Facts. This is a dimension: it has a different key to the Facts, and it's BIG. We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication pattern.
Whilst there are lots of these big dimensions, a large majority are never used; they are not all 'connected'. If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty. Looking at the Dimension data, some are quite large, but Connected Dimension data is tiny by comparison. One recent independent study from the database community showed that 80% of data remains unused. So we only replicate 'Connected' or 'Used' dimensions.
As data is written to the data store we keep our 'Connected Caches' up to date. (Diagram: the Data Layer feeds replicated Dimension Caches in the Processing Layer; the fact storage of Transactions, Cashflows and MTMs stays partitioned.) As new Facts are added, the relevant Dimensions that they reference are moved to the processing layer caches. The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.
Saving a trade causes all its first-level references to be triggered. (Diagram: a Save Trade call hits the partitioned cache; a trigger on the cache store walks the Trade's references, Party, Alias, Source, Book and Ccy, from the fully normalised Data Layer into the Query Layer's connected dimension caches.)
This updates the connected caches. (Diagram: the referenced dimensions now sit in the Query Layer's connected dimension caches.)
The process recurses through the object graph. (Diagram: the recursion continues outward, for example from Party on to LedgerBook.)
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'connected' dimensions are replicated. A sketch follows.
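A sketch of that recursion under stated assumptions (Entity, references() and push() are hypothetical stand-ins for the domain model's foreign-key arcs and for the write into the replicated caches):

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class ConnectedReplicator {

    // Hypothetical view of a domain object and its foreign-key arcs.
    public interface Entity {
        Object key();
        Collection<Entity> references();
    }

    private final Set<Object> replicated = new HashSet<Object>();

    // Invoked when a fact (e.g. a Trade) is saved: walk its references
    // so that only dimensions reachable from a stored fact are copied
    // into the replicated caches; unconnected dimensions never move.
    public void onFactSaved(Entity fact) {
        for (Entity dimension : fact.references()) {
            replicateConnected(dimension);
        }
    }

    private void replicateConnected(Entity dimension) {
        if (!replicated.add(dimension.key())) {
            return; // already connected: stop the recursion here
        }
        push(dimension);
        for (Entity next : dimension.references()) {
            replicateConnected(next);
        }
    }

    private void push(Entity dimension) {
        // Stand-in for the real write into a replicated dimension cache.
        System.out.println("replicating " + dimension.key());
    }
}
```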
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between 'Facts' that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step against partitioned storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Facts are partitioned; dimensions are replicated.

[Diagram: Facts (Transactions, Cashflows, MTMs) distributed across Fact Storage (Partitioned); Dimensions replicated.]
The data volumes back this up as a sensible hypothesis:
Facts => Big => Distribute
Dimensions => Small => Replicate
Key Point
We use a variant on a Snowflake Schema: we partition the big entities, which can be related via a partitioning key, and we replicate the small stuff whose keys can't map to our partitioning key.

[Diagram: Replicate (small dimensions) / Distribute (large facts).]
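To make the split concrete, here is a minimal sketch in plain Java, with hypothetical types (Trade, Counterparty, Node are illustrative only, not the ODC API), of how a node might hold a shard of the facts alongside a full replica of the dimensions:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical types for illustration only, not the ODC API.
// Facts are big and share a partitioning key; Dimensions are small
// and fully replicated on every node.
record Trade(long partitionKey, long tradeId, long counterpartyId) {} // Fact
record Counterparty(long id, String name) {}                          // Dimension

final class Node {
    // This node's shard of the facts, keyed by tradeId. Which node owns
    // a trade is decided by its partitionKey, so related facts collocate.
    final Map<Long, Trade> factShard = new ConcurrentHashMap<>();

    // A full replica of the dimension data: every node holds all of it,
    // so dimension joins never leave the node.
    final Map<Long, Counterparty> counterparties = new ConcurrentHashMap<>();

    // Simple ownership function: the partitioning key, not the trade id,
    // decides placement.
    int ownerOf(Trade t, int clusterSize) {
        return Math.floorMod(Long.hashCode(t.partitionKey()), clusterSize);
    }
}
```

On this layout, a join between two facts that share a partitionKey, or between a fact and any dimension, can always be satisfied on a single node.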
So how does this help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern?

[Diagram: a ladder of sequential network hops, each adding network time: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers.]
But by balancing Replication and Partitioning we don't need all those hops.
Stage 1: Focus on the where clause (Where Cost Centre = 'CC1') and get the right keys to query the Facts, by joining Dimensions in the Query Layer.
[Diagram: the Query Layer resolves 'CC1' against the replicated dimensions, yielding keys into the partitioned storage of Transactions, Cashflows and MTMs.]

Stage 2: Cluster Join to get the Facts, joining Facts across the cluster. The facts join together efficiently, as we know they are collocated.
[Diagram: Transactions, Cashflows and MTMs joined within each partition.]

Stage 3: Augment the raw Facts with the relevant Dimensions, joining Dimensions in the Query Layer to bind the relevant dimensions to the result.
Bringing it together: a Java client API over Replicated Dimensions and Partitioned Facts.

We never have to do a distributed join: all the big stuff is held partitioned, and we can join without shipping keys around and building intermediate results.
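A minimal sketch of the three-stage plan, assuming hypothetical types and plain Java maps in place of the real caches: Stage 1 is a local lookup against the replicated dimensions, Stage 2 is the only clustered step and runs once per partition, and Stage 3 decorates the results back in the query layer.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch of the three-stage plan; maps stand in for the caches.
record Transaction(long key, String costCentre, double amount) {} // Fact
record Mtm(long key, double value) {}                             // Fact, same key space
record Result(Transaction tx, Mtm mtm, String referenceData) {}

final class ThreeStageQuery {
    // Stage 1: resolve the where clause against the replicated dimensions
    // (a local lookup, no network hop) to get the partitioning keys.
    static Set<Long> stage1(Map<String, Set<Long>> costCentreIndex, String cc) {
        return costCentreIndex.getOrDefault(cc, Set.of());
    }

    // Stage 2: run on each partition in parallel. Transactions and MTMs with
    // the same key are collocated, so this join never crosses the network.
    static List<Result> stage2(Map<Long, Transaction> txShard,
                               Map<Long, Mtm> mtmShard, Set<Long> keys) {
        return keys.stream()
                   .filter(txShard::containsKey)
                   .map(k -> new Result(txShard.get(k), mtmShard.get(k), null))
                   .collect(Collectors.toList());
    }

    // Stage 3: back in the query layer, bind the replicated reference data
    // onto each fact row.
    static List<Result> stage3(List<Result> rows, Map<Long, String> refData) {
        return rows.stream()
                   .map(r -> new Result(r.tx(), r.mtm(), refData.get(r.tx().key())))
                   .collect(Collectors.toList());
    }
}
```

Only Stage 2 touches the cluster, and it does so with one parallel request per partition rather than the chain of sequential hops shown earlier.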
We get to do this…
[Diagram: a Trade joined to its Party and Trader.]
…and this…
[Diagram: the Trade, Party, Trader graph held as Version 1 through Version 4.]
…and this…
[Diagram: many Trades, Parties and Traders joined across the cluster.]
…without the problems of this… …or this… all at the speed of this… (well, almost).
But there is a fly in the ointment… I lied earlier: these aren't all Facts. Some of them are Dimensions.

This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication Pattern.
Whilst there are lots of these big dimensions, a large majority are never used: they are not all "connected". If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.
Looking at the Dimension data, some dimensions are quite large, but the Connected Dimension Data is tiny by comparison. One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' or 'Used' dimensions.
As data is written to the data store, we keep our 'Connected Caches' up to date.

[Diagram: Data Layer with Fact Storage (Partitioned) holding Transactions, Cashflows and MTMs; Processing Layer with Dimension Caches (Replicated).]
As new Facts are added, the relevant Dimensions that they reference are moved into the processing-layer caches. The Replicated Layer is updated by recursing through the arcs on the domain model when facts change: saving a trade causes all of its first-level references to be triggered.
[Diagram: 'Save Trade' writes into the Partitioned Cache; a Cache Store trigger fires for the Trade's first-level references: Party, Alias, Source, Book, Ccy. Data Layer (all normalised); Query Layer (with connected dimension caches).]

This updates the connected caches.
[Diagram: Party, Alias, Source, Book and Ccy land in the connected dimension caches of the Query Layer.]

The process recurses through the object graph.
[Diagram: the recursion follows Party onward to LedgerBook and a further Party.]
'Connected Replication' is a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated. With Connected Replication, only 1/10th of the data needs to be replicated (on average).
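A minimal sketch of such a trigger, assuming a hypothetical Entity interface rather than the production cache-store API: saving a fact walks its foreign-key arcs, and each dimension seen for the first time is copied into the replicated cache and then recursed into.

```java
import java.util.*;

// Hypothetical types, not the production API. Saving a fact triggers a walk
// over the foreign-key arcs of the domain model; every dimension reached is
// copied into the replicated cache, so unconnected dimensions are never
// replicated.
interface Entity {
    Object key();
    List<Entity> references();   // the outgoing foreign-key arcs
}

final class ConnectedReplicator {
    private final Map<Object, Entity> replicatedDimensionCache = new HashMap<>();

    // Called by the cache-store trigger when a fact (e.g. a Trade) is saved.
    void onFactSaved(Entity fact) {
        for (Entity dim : fact.references()) {
            replicate(dim);
        }
    }

    private void replicate(Entity dim) {
        // putIfAbsent returns null only the first time we see this key.
        if (replicatedDimensionCache.putIfAbsent(dim.key(), dim) == null) {
            // Newly connected dimension: follow its own arcs.
            for (Entity next : dim.references()) {
                replicate(next);
            }
        }
    }
}
```

Because the walk stops at any dimension that is already replicated, it terminates even when the object graph contains cycles.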
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between 'Facts' that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures; these favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example, ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
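In code, the pattern amounts to a small recursive trigger. A minimal sketch, assuming helper methods foreignKeysOf(..), isDimension(..) and keyOf(..) plus a replicatedCache (a ConcurrentMap); these names are assumptions, not the real ODC implementation:

    // Fired whenever an entity is written to the store.
    void onSave(Object entity) {
        for (Object ref : foreignKeysOf(entity)) {   // the arcs of the domain model
            if (isDimension(ref)
                    && replicatedCache.putIfAbsent(keyOf(ref), ref) == null) {
                // A newly 'connected' dimension: replicate it, then recurse,
                // because it may itself reference further dimensions
                // (e.g. Party -> LedgerBook).
                onSave(ref);
            }
        }
    }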
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key, but any dimension join can be supported (a sketch follows).
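The partitioning-key constraint is easiest to see with an example, using the deck's Trade and Cashflow facts (the method itself is illustrative, not real ODC code): both derive their partition key from the trade id, so related rows co-locate and can be joined inside a single partition.

    // Facts that can be joined must resolve to the same partitioning key.
    long partitionKeyOf(Object fact) {
        if (fact instanceof Trade trade)       return trade.tradeId();
        if (fact instanceof Cashflow cashflow) return cashflow.tradeId(); // same key, so co-located with its Trade
        throw new IllegalArgumentException("Not a partitioned Fact: " + fact);
    }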
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example, ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions