N-O-SQL, new database technologies on the rise

Preview:

DESCRIPTION

An introductory presentation on NOSQL technology for SAI (2010-04-20)

Citation preview

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

N-O-SQLnew database technologies on the rise

http://www.flickr.com/photos/wolfgangstaudt/2215246206/

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Who am I

» Steven Noels - stevenn@outerthought.org

»Outerthought : scalable content applications

»makers of Daisy and Lily open source CMS

2

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Agenda

3

» raison d’être: what brought us here

» concepts: required theory readings

»market overview: trees & the forest

» experiences and (h)in(d)sights

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Raison d’être

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

History

5

hierarchical databases

IMS

OODBMS

XMLDB RDBMS

1. standardization

2. simplification

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Inconsistency through slave lag

6

John

Qui

nn (

Dig

g)

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Scaling writes (1)

7

John

Qui

nn (

Dig

g)

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Scaling writes (2)

8

John

Qui

nn (

Dig

g)

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Issues with partitioning

» lose the ability to make arbitrary queries

» have to predict data access patterns when formulating partitioning strategy

» complex and fragile systems

9

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Replication complexity

10

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Scaling relational systems

11

»When scaling relational systems you loose their advantages but retain their overhead

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

History

12

RDBMS NOSQL

cachingdenormalisationshardingreplication ...

3. scaling

4. rethinkingthe problem

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Moore vs Kryder

» seek time isconstant (networklatency as well?)

» transfer rate ! spindles !

» as a principle, writes arehard to scale

13

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Cambrian Explosion

14

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 15

?Buzz-oriented development

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Cambrian Explosion

16

N-O-SQL

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 17

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 18

The Perspective of Cost

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Common themes

19

» SCALE SCALE SCALE

» new datamodels

» devops

»N-O-SQL

»The Cloud :technology is of no interest anymore

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Numbers of scale

20

http://qos.doubleclick.net/counters/

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Types of scaling

21

» scaling for usage» volume of users

» volume of data

availabilityreplication

» scaling types of ops» concurrent read

» concurrent write

partioningconsistency

distribution

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Distributed systems are hard !

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

8 fallacies of distributed computing» The network is reliable.

» Latency is zero.

» Bandwidth is infinite.

» The network is secure.

» Topology doesn't change.

» There is one administrator.

» Transport cost is zero.

» The network is homogeneous.

23

Pete

r D

euts

ch a

nd Ja

mes

Gos

ling

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

New Data

» sparse structures

»weak schemas

» graphs

» semi-structured

» document-oriented

24

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

N-O-SQL =not only SQL !

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

The NOSQL footprint

26

AC

ID,

sim

ple

oper

atio

nal

const

rain

ts

free-structured or sparse data

SQL

NOSQL

referential integrity,typed data

high

ly scalable an

davailab

le (com

plex

ity)

HBase

Cassandra

CouchDB

MongoDB

neo4j

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

NOSQL, if you need ...

» horizontal scaling (out rather than up)

» unusually common data (aka free-structured)

» speed (especially for writes)

» the bleeding edge

27

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

SQL/RDBMS, if you need ...

» SQL

»ACID

» normalisation

» a defined liability

28

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Theory

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Robust systems

30

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Academic background

»Amazon Dynamo

»Google BigTable

» Eric Brewer CAP theorem

31

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Amazon Dynamo

32

» coined the term ‘eventual consistency’

» consistent hashing

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Consistent hashing

33

http://horicky.blogspot.com/2009/11/nosql-patterns.html

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Consistent hashing

34

- node C+ node D

http://www.lexemetech.com/2007/11/consistent-hashing.html

»multi-dimensional column-oriented database

» on top of GoogleFileSystem

» object versioning

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Google BigTable

35

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CAP theorem

36

strong consistency

highavailability

partition-tolerance

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CAP

»Strong Consistency: all clients see the same view, even in the presence of updates

»High Availability: all clients can find some replica of the data, even in the presence of failures

»Partition-tolerance: the system properties hold even when the system is partitioned

37

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Consistency

38

»Where is my data I just updated?

» Ideal world :

The result of every write-operation is reflected by subsequent read-operations.

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Consistency

39

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Sunny-day scenario

40

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Network partioning

41

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Culture Clash

42

»Classic distributed systems: focus on ACID

» atomic

» consistent

» isolated

» durable

»Modern internet systems: focus on BASE

» basically available

» soft-state (or scalable)

» eventually consistent

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Culture Clash

43

»ACID» highest priority: strong

consistency for transactions

» availability less important

» pessimistic

» rigorous analysis

» complex mechanisms

» BASE» availability and scaling

highest priorities

» weak consistency

» optimistic

» best effort

» simple and fast

spectrum

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Building for failure

» defensive programming

» creating replicas

» disk flushing

»watch out for failure of utility infrastructure

» conscious sync/async decisions

44

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Possible storage failures

45

» Application errors

» Repeatable DB failures

» Unrepeatable DB failures

» OS errors

» Local cluster HW failure

» Local cluster network partitioning

» Disaster

» WAN network failure between remote clusters Mic

hael

Sto

nebr

eake

r

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Availability ≠ total async !

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

The Enterprise Service Bus

47

✘bus =

congestion

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Bus systems

48

» objects don’t fit in a pipe

» object ➙ message

» serialization / de-serialization cost

»message size

» queuing = cost

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Use a mixture of both

»async + sync

49

stuff which matters !

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Numbers of scale

50

http://qos.doubleclick.net/counters/

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Processing large datasets :

Map/Reduce

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Smart Data

» sparse as a feature

»weak schemas

» ad-hoc indexing

» organic analytics

» near-data processing

» live(ly) datawarehouse

» distribution ➙ parallellization ➙ performance

52

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Hadoop: HDFS + MapReduce» single filesystem + single execution-space

53

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

MapReduce example: WordCount

54

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

MapReduce

55

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

MapReduce and HDFS

56

© lars george

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Physical architecture

57

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Processing large datasets with MR

58

»Benefit from parallellisation

» Less modelling upfront (ad-hoc processing)

»Compartmentalized approach reduces operational risks

»AsterData et al. have SQL/MR hybrids for huge-scale BI

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Market overview

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Categories

» key-value stores

» column stores

» document stores

» graph databases

60

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Key-value stores

»Redis

»Voldemort

»Tokyo Cabinet

61

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Redis

»REmote DIctionary Server

» http://code.google.com/p/redis/

» vmware

62

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Redis Features» persisted memcache, ‘awesome’

» RAM-based + persistable

» key ➙ values: string, list, set

» higher-level ops

» i.e. push/pop and sort for lists

» fast (very)

» configurable durability

» client-managed sharding

63

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Voldemort

» http://project-voldemort.com/

» LinkedIn

64

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Voldemort

» persistent

» distributed

» fault-tolerant

» hash table

65

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Voldemort

66

API: GET, PUT,DELETE

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Voldemort

67

routing logic moving up the stack,smaller latency

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Voldemort data format

68

» key+values = arrays of bytes

» So how do we objects ⬌ bytes ?

» json

» string

» java-serialization

» protobuf

» identity

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Tokyo Cabinet

» http://1978th.net/tokyocabinet/

»mixi.jp (i.e. Facebook Japan)

69

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Product Family

70

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Tokyo Cabinet

71

»memory or filesystem

» hash, b-tree, fixed-length, table

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Column stores

»BigTable

»HBase

»Cassandra

72

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

BigTable

» http://labs.google.com/papers/bigtable.html

»Google

» layered on top of GFS

73

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

HBase

» http://hadoop.apache.org/hbase/

» StumbleUpon / Adobe / Cloudera

74

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

HBase

» sorted» distributed» column-oriented»multi-dimensional» highly-available» high-performance

» persisted» storage system

» adds random access reads and writes atop HDFS

75

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

HBase data model

76

»Distributed multi-dimensional sparse map

»Multi-dimensional keys:(table, row, family:column, timestamp) → value

»Keys are arbitrary strings

»Access to row data is atomic

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Storage architecture

77

© lars george

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Cassandra

» http://cassandra.apache.org/

»Rackspace / Facebook

78

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Cassandra

»Key-value store (with added structure)

»Reliability (identical nodes)

» Eventual consistent

»Distributed

»Tunable

» Partitioning

» Replication

79

CA

P

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Cassandra write pattern

80

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Cassandra applicability

81

FIT

» Scalable reliability (through identical nodes)» Linear scaling»Write throughput» Large Data Sets

NO FIT

» Flexible indexing»Only PK-based

querying»Big Binary Data» 1 Row must fit in

RAM entirely

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Document stores

»CouchDB

»MongoDB

»Riak

»MarkLogic

82

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CouchDB

» http://couchdb.apache.org/

» couch.io

83

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CouchDB

» fault-tolerant

» schema-free

» document-oriented

» accessible via a RESTful HTTP/JSON API

84

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CouchDB documents

{ “_id”: ”BCCD12CBB”, “_rev”: ”AB764C”, “type”: ”person”, “name”: ”Darth Vader”, “age”: 63, “headware”: [“Helmet”, “Sombrero”], “dark_side”: true }

85

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CouchDB REST API

»HTTP

» PUT /db/docid

»GET /db/docid

» POST /db/docid

»DELETE /db/docid

86

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CouchDB Views»MapReduce-based

» Filter, Collate, Aggregate

» Javascript

87

function (Key, Values) { var sum = 0; for(var i in Values) sum += Values[i]; return sum; }

function (doc) { for(var i in doc.tags) emit(doc.tags[i], 1); }

map reduce

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CouchDB

» be careful on semantics

» replication ≠ partioning/sharding !

» distributed database = distributable database

» sharded / distributed deploymentrequires proxy layer

88

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

MongoDB

» http://www.mongodb.org/

» 10gen

89

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

MongoDB

» cfr. CouchDB, really

» except for:

»C++

» performance focus

» runtime queries (mapreduce still available)

» native drivers (no REST/HTTP layering)

» no MVCC: update-in-place

» auto sharding (alpha)

90

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Riak

» http://riak.basho.com/

»Basho Technologies

91

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Riak

» buckets/keys, links

» values/content = bucket + metadata

» pluggable storage engines (fs, (D)ETS, InnoDB)

»HTTP/REST API

» automatic distribution

»mapreduce using Javascript

92

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Jackrabbit

» http://jackrabbit.apache.org/

»Day Software

93

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Jackrabbit

» reference implementation for JSR 170 & 283

» remoting: WebDAV & RMI

» persistence: RDBMS, fs, memory

94

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Jackrabbit

» Java-centric (duh)

» complex repository model (nodes+properties)

»mixins, inheritance

»workspaces

» query language

» no partioning/sharding

95

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

JCR API levels

96

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Graph databases

»Neo4j

»AllegroGraph (RDF)

97

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Neo4j

» http://neo4j.org/

»Neo Technology

98

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Neo4j» data = nodes + relationships + key/value properties

99

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Neo4j

»many language bindings, little remoting

» ‘whiteboard’ friendly

» scaling to complexity (rather than volume?)

» lots of focus on domain modelling

» SPARQL/SAIL impl for triple geeks

»mostly RAM centric (with disk swapping & persistence)

100

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Experiences & (h)in(d)sights

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

NOSQL applicability

»Horizontal scaling

»Multi-Master

»Data representation

» search of simplicity

» data that doesn’t fit the E-R model(graphs, trees, versions)

» Speed

102

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Tools for the trade

» non-relational data: Couch, Mongo, Riak

»massive quantities: Cassandra, HBase

» persistent caching: Redis, Voldemort

» graphs: neo4j

103

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Tool selection

» be careful on the marketeese:smoke and mirrors beware!

»monitor dev list, IRC, Twitter, blogs

»monitor project ‘sponsors’

»mix-and-match

»DON’T NOSQL WITHOUT INTERNAL SYS ARCHS & DEV(OP)S !

104

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

aptness

complexity

inte

rnet

ente

rpri

seco

rpor

ate

com

mun

ity

NOSQL}S

QL}

105

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Our NOSQL-based project: Lily

» (open source)

» scalable store (Apache HBase)

» and search (Apache SOLR)

» content repository

»α due mid 2010

»www.lilycms.org or @outerthought

106

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Lily architecture

107

Lily client

client

client

Lily store node

store node

store node

distributed process coordination

and configuration (ZooKeeper)

query indexerupdate

WAL M/RMQ

documents2ary

indexes

WAL /

MQ

}

}

}

Lily Store Server

HBase Region Server

Hadoop DFS

index replica

replica replica

} SOLR

inverted index

REST

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

When combining store and search, make sure your (search) index doesn’t become the store.

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Key lessons learned

109

» importance of keyspace design

» secondary indexing

» data de-normalization

» schema vs. code flexibility?

» distribution is everywhereand you shouldn’t forget about it

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Reading material

»Amazon Dynamo, Google BigTable, CAP

» http://nosql.mypopescu.com/

» http://nosql-database.org/

» http://twitter.com/nosqlupdate

» http://highscalability.com/

110

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Questions?

111

http://www.flickr.com/photos/leehaywood/4237636853/

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 112

» stevenn@outerthought.org

» @stevenn

Thanks for your attention !