An introductory presentation on NOSQL technology for SAI (2010-04-20)
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
N-O-SQL: new database technologies on the rise
http://www.flickr.com/photos/wolfgangstaudt/2215246206/
Who am I
» Steven Noels - [email protected]
» Outerthought: scalable content applications
» makers of Daisy and Lily open source CMS
Agenda
» raison d’être: what brought us here
» concepts: required theory readings
» market overview: trees & the forest
» experiences and (h)in(d)sights
Raison d’être
History
hierarchical databases
IMS
OODBMS
XMLDB RDBMS
1. standardization
2. simplification
Inconsistency through slave lag
John Quinn (Digg)
Scaling writes (1)
John Quinn (Digg)
Scaling writes (2)
John Quinn (Digg)
Issues with partitioning
» lose the ability to make arbitrary queries
» have to predict data access patterns when formulating partitioning strategy
» complex and fragile systems
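The first two issues can be made concrete with a small sketch. It assumes a hypothetical `ShardedUsers` store partitioned by user id (the class, the `city` field and the CRC32-based placement are illustrative choices, not taken from any product in this deck): a lookup on the partition key touches one shard, while a query on any other attribute has to visit all of them.

```python
import zlib


class ShardedUsers:
    """Toy user store, partitioned by user id (hypothetical example)."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, user_id):
        # Stable hash of the partition key picks the shard.
        index = zlib.crc32(user_id.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, user_id, record):
        self._shard_for(user_id)[user_id] = record

    def get(self, user_id):
        # Key lookup: exactly one shard is consulted.
        return self._shard_for(user_id).get(user_id), 1

    def find_by_city(self, city):
        # Not the partition key: every shard must be scanned (scatter-gather).
        hits = [uid for shard in self.shards
                for uid, rec in shard.items() if rec["city"] == city]
        return sorted(hits), len(self.shards)


users = ShardedUsers(shard_count=4)
users.put("u1", {"city": "Gent"})
users.put("u2", {"city": "Brussel"})
users.put("u3", {"city": "Gent"})

record, shards_touched = users.get("u1")
ids, shards_scanned = users.find_by_city("Gent")
```

Choosing user id as the partition key is exactly the kind of access-pattern prediction the slide warns about: if most queries turn out to be by city, every one of them pays the full fan-out.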
Replication complexity
Scaling relational systems
» When scaling relational systems you lose their advantages but retain their overhead
History
RDBMS NOSQL
caching, denormalisation, sharding, replication ...
3. scaling
4. rethinking the problem
Moore vs Kryder
» seek time is constant (network latency as well?)
» transfer rate ! spindles !
» as a principle, writes are hard to scale
Cambrian Explosion
Buzz-oriented development?
Cambrian Explosion
N-O-SQL
The Perspective of Cost
Common themes
» SCALE SCALE SCALE
» new datamodels
» devops
» N-O-SQL
» The Cloud: technology is of no interest anymore
Numbers of scale
http://qos.doubleclick.net/counters/
Types of scaling
» scaling for usage
» volume of users
» volume of data
availability, replication
» scaling types of ops
» concurrent read
» concurrent write
partitioning, consistency
distribution
Distributed systems are hard !
8 fallacies of distributed computing
» The network is reliable.
» Latency is zero.
» Bandwidth is infinite.
» The network is secure.
» Topology doesn't change.
» There is one administrator.
» Transport cost is zero.
» The network is homogeneous.
Peter Deutsch and James Gosling
New Data
» sparse structures
» weak schemas
» graphs
» semi-structured
» document-oriented
N-O-SQL = not only SQL !
The NOSQL footprint
[Diagram; labels recovered from the slide:]
» SQL, NOSQL
» ACID, simple operational constraints
» referential integrity, typed data
» free-structured or sparse data
» highly scalable and available (complexity)
» HBase, Cassandra, CouchDB, MongoDB, neo4j
NOSQL, if you need ...
» horizontal scaling (out rather than up)
» unusually common data (aka free-structured)
» speed (especially for writes)
» the bleeding edge
SQL/RDBMS, if you need ...
» SQL
» ACID
» normalisation
» a defined liability
Theory
Robust systems
Academic background
» Amazon Dynamo
» Google BigTable
» Eric Brewer CAP theorem
Amazon Dynamo
» popularized the term ‘eventual consistency’
» consistent hashing
Consistent hashing
http://horicky.blogspot.com/2009/11/nosql-patterns.html
Consistent hashing
- node C, + node D
http://www.lexemetech.com/2007/11/consistent-hashing.html
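The "remove node C, add node D" scenario can be tried out with a small sketch. This is a minimal illustration of consistent hashing with virtual nodes, not Dynamo's actual implementation; the `HashRing` class, the MD5-based placement and the choice of 64 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(key):
    # Any stable, well-spread hash will do; MD5 is used here for convenience only.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed on the ring at several pseudo-random positions.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def lookup(self, key):
        # Walk clockwise from the key's position to the next node, wrapping around.
        index = bisect.bisect(self._ring, (ring_position(key), ""))
        return self._ring[index % len(self._ring)][1]


keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["A", "B", "C"])
before = {k: ring.lookup(k) for k in keys}

ring.remove("C")
ring.add("D")
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed owner")
```

Keys that lived on A or B either stay where they were or move to the new node D; only the keys of the removed node C are redistributed. With a plain `hash(key) % node_count` scheme, most keys would change owner instead.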
Google BigTable
» multi-dimensional column-oriented database
» on top of GoogleFileSystem
» object versioning
CAP theorem
strong consistency
high availability
partition-tolerance
CAP
» Strong Consistency: all clients see the same view, even in the presence of updates
» High Availability: all clients can find some replica of the data, even in the presence of failures
» Partition-tolerance: the system properties hold even when the system is partitioned
Consistency
» Where is my data I just updated?
» Ideal world :
The result of every write-operation is reflected by subsequent read-operations.
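Why the ideal does not hold automatically can be shown with a toy model of asynchronous master/slave replication. The `LaggingCluster` class below is a hypothetical stand-in, not any real database's replication protocol: writes land on the master at once and reach the slave only when `flush()` runs.

```python
class LaggingCluster:
    """One master, one slave; the slave catches up only when flush() is called."""

    def __init__(self):
        self.master = {}
        self.slave = {}
        self._pending = []  # replication log not yet applied on the slave

    def write(self, key, value):
        self.master[key] = value
        self._pending.append((key, value))

    def read_from_slave(self, key):
        return self.slave.get(key)

    def flush(self):
        # Simulates the moment the slave has replayed the replication log.
        for key, value in self._pending:
            self.slave[key] = value
        self._pending.clear()


cluster = LaggingCluster()
cluster.write("user:1", "new-email")

stale = cluster.read_from_slave("user:1")   # read arrives before replication
cluster.flush()
fresh = cluster.read_from_slave("user:1")   # read arrives after replication
```

Between the write and the flush, a client reading from the slave does not see its own update; this is the slave-lag inconsistency from the earlier slide.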
Consistency
Sunny-day scenario
Network partitioning
Culture Clash
» Classic distributed systems: focus on ACID
» atomic
» consistent
» isolated
» durable
» Modern internet systems: focus on BASE
» basically available
» soft-state (or scalable)
» eventually consistent
Culture Clash
ACID
» highest priority: strong consistency for transactions
» availability less important
» pessimistic
» rigorous analysis
» complex mechanisms
BASE
» availability and scaling highest priorities
» weak consistency
» optimistic
» best effort
» simple and fast
spectrum
Building for failure
» defensive programming
» creating replicas
» disk flushing
» watch out for failure of utility infrastructure
» conscious sync/async decisions
Possible storage failures
» Application errors
» Repeatable DB failures
» Unrepeatable DB failures
» OS errors
» Local cluster HW failure
» Local cluster network partitioning
» Disaster
» WAN network failure between remote clusters
Michael Stonebraker
Availability ≠ total async !
The Enterprise Service Bus
✘ bus = congestion
Bus systems
» objects don’t fit in a pipe
» object ➙ message
» serialization / de-serialization cost
» message size
» queuing = cost
Use a mixture of both
» async + sync
stuff which matters !
Numbers of scale
http://qos.doubleclick.net/counters/
Processing large datasets: Map/Reduce
Smart Data
» sparse as a feature
» weak schemas
» ad-hoc indexing
» organic analytics
» near-data processing
» live(ly) datawarehouse
» distribution ➙ parallelization ➙ performance
Hadoop: HDFS + MapReduce
» single filesystem + single execution-space
MapReduce example: WordCount
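The slide's figure is not preserved in this transcript, so here is a rough single-process sketch of the WordCount idea. It is plain Python rather than the Hadoop API, and the function names `map_phase`, `shuffle` and `reduce_phase` are made up for the example; on a real cluster the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict


def map_phase(document):
    # Map: emit (word, 1) for every word in one input record.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all emitted values by key before reducing.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse the values of one key into a single result.
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["the quick brown fox", "the lazy dog", "the fox"])
```

Each `map_phase` call looks at a single document and each `reduce_phase` call at a single key, which is what lets the framework distribute the work.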
MapReduce
MapReduce and HDFS
© lars george
Physical architecture
Processing large datasets with MR
» Benefit from parallelisation
» Less modelling upfront (ad-hoc processing)
» Compartmentalized approach reduces operational risks
» AsterData et al. have SQL/MR hybrids for huge-scale BI
Market overview
Categories
» key-value stores
» column stores
» document stores
» graph databases
Key-value stores
» Redis
» Voldemort
» Tokyo Cabinet
Redis
» REmote DIctionary Server
» http://code.google.com/p/redis/
» vmware
Redis Features
» persisted memcache, ‘awesome’
» RAM-based + persistable
» key ➙ values: string, list, set
» higher-level ops
» e.g. push/pop and sort for lists
» fast (very)
» configurable durability
» client-managed sharding
Voldemort
» http://project-voldemort.com/
Voldemort
» persistent
» distributed
» fault-tolerant
» hash table
Voldemort
API: GET, PUT, DELETE
Voldemort
routing logic moving up the stack, smaller latency
Voldemort data format
» key+values = arrays of bytes
» So how do we map objects ⬌ bytes ?
» json
» string
» java-serialization
» protobuf
» identity
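The idea of a byte-array store with pluggable serializers can be sketched as follows. This is an in-memory illustration only, not Voldemort's client API: the `ByteStore` class and the `SERIALIZERS` table are hypothetical, and only three of the formats named above (json, string, identity) are modelled.

```python
import json

# Each entry is a (to_bytes, from_bytes) pair; names mirror the slide's list.
SERIALIZERS = {
    "json": (lambda obj: json.dumps(obj).encode("utf-8"),
             lambda raw: json.loads(raw.decode("utf-8"))),
    "string": (lambda text: text.encode("utf-8"),
               lambda raw: raw.decode("utf-8")),
    "identity": (lambda raw: raw, lambda raw: raw),
}


class ByteStore:
    """GET/PUT/DELETE over a bytes -> bytes map, with per-store serializers."""

    def __init__(self, key_format="string", value_format="json"):
        self._key_to_bytes = SERIALIZERS[key_format][0]
        self._value_to_bytes, self._value_from_bytes = SERIALIZERS[value_format]
        self._data = {}  # what the storage engine sees: bytes only

    def put(self, key, value):
        self._data[self._key_to_bytes(key)] = self._value_to_bytes(value)

    def get(self, key):
        raw = self._data.get(self._key_to_bytes(key))
        return None if raw is None else self._value_from_bytes(raw)

    def delete(self, key):
        return self._data.pop(self._key_to_bytes(key), None) is not None


store = ByteStore(key_format="string", value_format="json")
store.put("user:42", {"name": "example", "tags": ["nosql"]})
loaded = store.get("user:42")
```

Because the store itself only ever handles bytes, the choice of format is a client-side concern and can differ per store.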
Tokyo Cabinet
» http://1978th.net/tokyocabinet/
» mixi.jp (the ‘Facebook of Japan’)
Product Family
Tokyo Cabinet
» memory or filesystem
» hash, b-tree, fixed-length, table
Column stores
» BigTable
» HBase
» Cassandra
BigTable
» http://labs.google.com/papers/bigtable.html
» layered on top of GFS
HBase
» http://hadoop.apache.org/hbase/
» StumbleUpon / Adobe / Cloudera
HBase
» sorted
» distributed
» column-oriented
» multi-dimensional
» highly-available
» high-performance
» persisted
» storage system
» adds random access reads and writes atop HDFS
HBase data model
» Distributed multi-dimensional sparse map
» Multi-dimensional keys: (table, row, family:column, timestamp) → value
» Keys are arbitrary strings
» Access to row data is atomic
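The multi-dimensional key can be pictured as nested maps. The sketch below models one table in memory and is in no way the HBase client API; `SparseTable` and its methods are hypothetical names, and distribution, persistence and atomicity are left out entirely.

```python
from collections import defaultdict


class SparseTable:
    """One table: row -> "family:column" -> timestamp -> value."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        # Newest version by default, or the newest one at or before `timestamp`.
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, stop_row):
        # Rows are kept in key order, so a range scan is a sorted walk.
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row, {col: cells[max(cells)]
                            for col, cells in self._rows[row].items()}


table = SparseTable()
table.put("com.example/index", "content:html", 1, "<html>v1</html>")
table.put("com.example/index", "content:html", 2, "<html>v2</html>")
table.put("com.example/about", "meta:lang", 1, "en")

latest = table.get("com.example/index", "content:html")
older = table.get("com.example/index", "content:html", timestamp=1)
rows = [row for row, _ in table.scan("com.example/", "com.example0")]
```

Rows only store the columns they actually have, which is what "sparse" means here, and the sorted row keys are why keyspace design matters so much for scans.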
Storage architecture
© lars george
Cassandra
» http://cassandra.apache.org/
» Rackspace / Facebook
Cassandra
» Key-value store (with added structure)
» Reliability (identical nodes)
» Eventually consistent
» Distributed
» Tunable
» Partitioning
» Replication
CAP
Cassandra write pattern
Cassandra applicability
FIT
» Scalable reliability (through identical nodes)
» Linear scaling
» Write throughput
» Large Data Sets
NO FIT
» Flexible indexing
» Only PK-based querying
» Big Binary Data
» 1 Row must fit in RAM entirely
Document stores
» CouchDB
» MongoDB
» Riak
» MarkLogic
CouchDB
» http://couchdb.apache.org/
» couch.io
CouchDB
» fault-tolerant
» schema-free
» document-oriented
» accessible via a RESTful HTTP/JSON API
CouchDB documents
{ "_id": "BCCD12CBB", "_rev": "AB764C", "type": "person", "name": "Darth Vader", "age": 63, "headware": ["Helmet", "Sombrero"], "dark_side": true }
CouchDB REST API
» HTTP
» PUT /db/docid
» GET /db/docid
» POST /db/docid
» DELETE /db/docid
CouchDB Views
» MapReduce-based
» Filter, Collate, Aggregate
» Javascript
map: function (doc) { for(var i in doc.tags) emit(doc.tags[i], 1); }
reduce: function (Key, Values) { var sum = 0; for(var i in Values) sum += Values[i]; return sum; }
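To see what this pair of functions computes, the view can be emulated over a few sample documents. The sketch below is a plain-Python stand-in for the view engine, not CouchDB itself; the sample documents and the `run_view` helper are invented for illustration, and keys are collated in sorted order as a simplification.

```python
from collections import defaultdict


def map_doc(doc):
    # Mirrors the map function: one (tag, 1) row per tag in the document.
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_values(key, values):
    # Mirrors the reduce function: sum the values emitted for one key.
    return sum(values)


def run_view(docs):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    # Collate by key, then aggregate each group.
    return {key: reduce_values(key, values) for key, values in sorted(grouped.items())}


docs = [
    {"_id": "a", "tags": ["nosql", "couchdb"]},
    {"_id": "b", "tags": ["nosql"]},
    {"_id": "c"},  # schema-free: a document without tags emits nothing
]
tag_counts = run_view(docs)
```

The result is a tag cloud: each tag mapped to the number of documents carrying it.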
CouchDB
» be careful on semantics
» replication ≠ partitioning/sharding !
» distributed database = distributable database
» sharded / distributed deployment requires proxy layer
MongoDB
» http://www.mongodb.org/
» 10gen
MongoDB
» cf. CouchDB, really
» except for:
» C++
» performance focus
» runtime queries (mapreduce still available)
» native drivers (no REST/HTTP layering)
» no MVCC: update-in-place
» auto sharding (alpha)
Riak
» http://riak.basho.com/
» Basho Technologies
Riak
» buckets/keys, links
» values/content = bucket + metadata
» pluggable storage engines (fs, (D)ETS, InnoDB)
» HTTP/REST API
» automatic distribution
» mapreduce using Javascript
Jackrabbit
» http://jackrabbit.apache.org/
» Day Software
Jackrabbit
» reference implementation for JSR 170 & 283
» remoting: WebDAV & RMI
» persistence: RDBMS, fs, memory
Jackrabbit
» Java-centric (duh)
» complex repository model (nodes+properties)
» mixins, inheritance
» workspaces
» query language
» no partitioning/sharding
JCR API levels
Graph databases
» Neo4j
» AllegroGraph (RDF)
Neo4j
» http://neo4j.org/
» Neo Technology
Neo4j
» data = nodes + relationships + key/value properties
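That data model can be illustrated with a few lines of plain Python. This is a hedged sketch of a property graph in general, not the Neo4j API: `Node`, `Relationship`, `relate` and `traverse` are names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality: two nodes are never "equal by value"
class Node:
    properties: dict = field(default_factory=dict)
    outgoing: list = field(default_factory=list)


@dataclass(eq=False)
class Relationship:
    kind: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


def relate(start, kind, end, **properties):
    rel = Relationship(kind, start, end, properties)
    start.outgoing.append(rel)
    return rel


def traverse(start, kind, depth):
    """Nodes reachable from `start` over `kind` relationships within `depth` hops."""
    seen, frontier = [], [start]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for rel in node.outgoing:
                if rel.kind == kind and rel.end is not start and rel.end not in seen:
                    seen.append(rel.end)
                    next_frontier.append(rel.end)
        frontier = next_frontier
    return seen


alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})
carol = Node({"name": "Carol"})
relate(alice, "KNOWS", bob, since=2008)   # relationships carry properties too
relate(bob, "KNOWS", carol)
relate(carol, "KNOWS", alice)

friends_of_friends = [n.properties["name"] for n in traverse(alice, "KNOWS", 2)]
```

A two-hop traversal follows references from node to node; in a relational schema the same question would need a self-join per hop.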
Neo4j
» many language bindings, little remoting
» ‘whiteboard’ friendly
» scaling to complexity (rather than volume?)
» lots of focus on domain modelling
» SPARQL/SAIL impl for triple geeks
» mostly RAM centric (with disk swapping & persistence)
Experiences & (h)in(d)sights
NOSQL applicability
» Horizontal scaling
» Multi-Master
» Data representation
» search for simplicity
» data that doesn’t fit the E-R model (graphs, trees, versions)
» Speed
Tools for the trade
» non-relational data: Couch, Mongo, Riak
» massive quantities: Cassandra, HBase
» persistent caching: Redis, Voldemort
» graphs: neo4j
Tool selection
» be careful with the marketese: beware of smoke and mirrors!
» monitor dev list, IRC, Twitter, blogs
» monitor project ‘sponsors’
» mix-and-match
» DON’T NOSQL WITHOUT INTERNAL SYS ARCHS & DEV(OP)S !
[Diagram; labels recovered from the slide: aptness, complexity; internet, enterprise, corporate, community; NOSQL, SQL]
Our NOSQL-based project: Lily
» (open source)
» scalable store (Apache HBase)
» and search (Apache SOLR)
» content repository
» alpha due mid 2010
» www.lilycms.org or @outerthought
Lily architecture
[Architecture diagram; labels recovered from the slide:]
» Lily clients (REST)
» Lily store nodes: query, indexer, update; WAL / MQ; M/R
» distributed process coordination and configuration (ZooKeeper)
» Lily Store Server: documents, 2ary indexes
» HBase Region Server
» Hadoop DFS (replicas)
» SOLR: inverted index (index replicas)
When combining store and search, make sure your (search) index doesn’t become the store.
Key lessons learned
» importance of keyspace design
» secondary indexing
» data de-normalization
» schema vs. code flexibility?
» distribution is everywhere and you shouldn’t forget about it
Reading material
» Amazon Dynamo, Google BigTable, CAP
» http://nosql.mypopescu.com/
» http://nosql-database.org/
» http://twitter.com/nosqlupdate
» http://highscalability.com/
Questions?
http://www.flickr.com/photos/leehaywood/4237636853/
» @stevenn
Thanks for your attention !