IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org N-O-SQL new database technologies on the rise http://www.flickr.com/photos/wolfgangstaudt/2215246206/

N-O-SQL, new database technologies on the rise

DESCRIPTION

An introductory presentation on NOSQL technology for SAI (2010-04-20)

Page 1: N-O-SQL, new database technologies on the rise

N-O-SQL: new database technologies on the rise

http://www.flickr.com/photos/wolfgangstaudt/2215246206/

Page 2: N-O-SQL, new database technologies on the rise

Who am I

» Steven Noels - [email protected]

» Outerthought: scalable content applications

» makers of the Daisy and Lily open source CMS

Page 3: N-O-SQL, new database technologies on the rise

Agenda

» raison d’être: what brought us here

» concepts: required theory readings

» market overview: trees & the forest

» experiences and (h)in(d)sights

Page 4: N-O-SQL, new database technologies on the rise

Raison d’être

Page 5: N-O-SQL, new database technologies on the rise

History

[Timeline diagram: hierarchical databases (IMS), OODBMS, XMLDB, RDBMS]

1. standardization

2. simplification

Page 6: N-O-SQL, new database technologies on the rise

Inconsistency through slave lag

[Diagram; credit: John Quinn (Digg)]

Page 7: N-O-SQL, new database technologies on the rise

Scaling writes (1)

[Diagram; credit: John Quinn (Digg)]

Page 8: N-O-SQL, new database technologies on the rise

Scaling writes (2)

[Diagram; credit: John Quinn (Digg)]

Page 9: N-O-SQL, new database technologies on the rise

Issues with partitioning

» lose the ability to make arbitrary queries

» have to predict data access patterns when formulating partitioning strategy

» complex and fragile systems
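A minimal sketch of the first two issues, assuming rows are hash-partitioned on a hypothetical `user_id` key across in-memory shards (the shard count, function names and row layout are all illustrative, not taken from any product). Lookups by the partition key touch one shard; any other query has to visit every shard.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen for illustration

# Each shard is just a dict here; in practice it would be a separate server.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id: str) -> int:
    """Pick a shard from a stable hash of the partition key."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def put(user_id: str, row: dict) -> None:
    shards[shard_for(user_id)][user_id] = row


def get_by_key(user_id: str):
    # The predicted access pattern: exactly one shard is consulted.
    return shards[shard_for(user_id)].get(user_id)


def find_by_city(city: str) -> list:
    # An "arbitrary" query: scatter to every shard, then gather the results.
    return sorted(
        uid
        for shard in shards
        for uid, row in shard.items()
        if row.get("city") == city
    )
```

The partitioning function bakes the expected access pattern into the data layout, which is why it has to be predicted up front.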

Page 10: N-O-SQL, new database technologies on the rise

Replication complexity

Page 11: N-O-SQL, new database technologies on the rise

Scaling relational systems

» When scaling relational systems, you lose their advantages but retain their overhead

Page 12: N-O-SQL, new database technologies on the rise

History

[Timeline diagram, continued: RDBMS, NOSQL]

caching, denormalisation, sharding, replication ...

3. scaling

4. rethinking the problem

Page 13: N-O-SQL, new database technologies on the rise

Moore vs Kryder

» seek time is constant (network latency as well?)

» transfer rate ! spindles !

» as a principle, writes are hard to scale

Page 14: N-O-SQL, new database technologies on the rise

Cambrian Explosion

Page 15: N-O-SQL, new database technologies on the rise

Buzz-oriented development?

Page 16: N-O-SQL, new database technologies on the rise

Cambrian Explosion

N-O-SQL

Page 17: N-O-SQL, new database technologies on the rise

Page 18: N-O-SQL, new database technologies on the rise

The Perspective of Cost

Page 19: N-O-SQL, new database technologies on the rise

Common themes

» SCALE SCALE SCALE

» new datamodels

» devops

» N-O-SQL

» The Cloud: technology is of no interest anymore

Page 20: N-O-SQL, new database technologies on the rise

Numbers of scale

http://qos.doubleclick.net/counters/

Page 21: N-O-SQL, new database technologies on the rise

Types of scaling

» scaling for usage

» volume of users

» volume of data

» scaling types of ops

» concurrent read

» concurrent write

[Diagram labels: availability, replication, partitioning, consistency, distribution]

Page 22: N-O-SQL, new database technologies on the rise

Distributed systems are hard !

Page 23: N-O-SQL, new database technologies on the rise

8 fallacies of distributed computing

» The network is reliable.

» Latency is zero.

» Bandwidth is infinite.

» The network is secure.

» Topology doesn't change.

» There is one administrator.

» Transport cost is zero.

» The network is homogeneous.

(Credit: Peter Deutsch and James Gosling)

Page 24: N-O-SQL, new database technologies on the rise

New Data

» sparse structures

» weak schemas

» graphs

» semi-structured

» document-oriented

Page 25: N-O-SQL, new database technologies on the rise

N-O-SQL = not only SQL!

Page 26: N-O-SQL, new database technologies on the rise

The NOSQL footprint

[Diagram: the SQL and NOSQL footprints. Axis labels: ACID, simple operational constraints; referential integrity, typed data; free-structured or sparse data; highly scalable and available (complexity). Products shown: HBase, Cassandra, CouchDB, MongoDB, neo4j]

Page 27: N-O-SQL, new database technologies on the rise

NOSQL, if you need ...

» horizontal scaling (out rather than up)

» unusually common data (aka free-structured)

» speed (especially for writes)

» the bleeding edge

Page 28: N-O-SQL, new database technologies on the rise

SQL/RDBMS, if you need ...

» SQL

» ACID

» normalisation

» a defined liability

Page 29: N-O-SQL, new database technologies on the rise

Theory

Page 30: N-O-SQL, new database technologies on the rise

Robust systems

Page 31: N-O-SQL, new database technologies on the rise

Academic background

» Amazon Dynamo

» Google BigTable

» Eric Brewer's CAP theorem

Page 32: N-O-SQL, new database technologies on the rise

Amazon Dynamo

» popularised the term ‘eventual consistency’

» consistent hashing

Page 33: N-O-SQL, new database technologies on the rise

Consistent hashing

http://horicky.blogspot.com/2009/11/nosql-patterns.html

Page 34: N-O-SQL, new database technologies on the rise

Consistent hashing

[Diagram: hash ring with node C removed (- node C) and node D added (+ node D)]

http://www.lexemetech.com/2007/11/consistent-hashing.html
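A minimal sketch of a consistent-hash ring, assuming MD5 as the hash function and a fixed number of virtual points per node (the `HashRing` class and its method names are hypothetical, not Dynamo's or any library's API). It illustrates the property the diagrams above are about: removing or adding a node only moves the keys adjacent to that node's points on the ring.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point after the key, wrapping around.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

With a plain `hash(key) % number_of_nodes` scheme, by contrast, changing the node count would remap most keys.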

Page 35: N-O-SQL, new database technologies on the rise

Google BigTable

» multi-dimensional column-oriented database

» on top of the Google File System (GFS)

» object versioning

Page 36: N-O-SQL, new database technologies on the rise

CAP theorem

strong consistency

high availability

partition-tolerance

Page 37: N-O-SQL, new database technologies on the rise

CAP

» Strong Consistency: all clients see the same view, even in the presence of updates

» High Availability: all clients can find some replica of the data, even in the presence of failures

» Partition-tolerance: the system properties hold even when the system is partitioned

Page 38: N-O-SQL, new database technologies on the rise

Consistency

» Where is the data I just updated?

» Ideal world: the result of every write operation is reflected by subsequent read operations.
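A toy illustration of how the ideal breaks down under asynchronous replication, assuming a single primary that ships a write log to a lagging replica (the `Primary` and `Replica` classes are hypothetical and purely in-memory). A read served by the replica before it catches up does not reflect the preceding write.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # replication log: ordered (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def read(self, key):
        return self.data.get(key)

    def catch_up(self):
        # Apply every log entry the replica has not seen yet.
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)
```

Between `write` on the primary and `catch_up` on the replica, the two disagree; that window is the replication lag.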

Page 39: N-O-SQL, new database technologies on the rise

Consistency

Page 40: N-O-SQL, new database technologies on the rise

Sunny-day scenario

Page 41: N-O-SQL, new database technologies on the rise

Network partitioning

Page 42: N-O-SQL, new database technologies on the rise

Culture Clash

» Classic distributed systems: focus on ACID

» atomic

» consistent

» isolated

» durable

» Modern internet systems: focus on BASE

» basically available

» soft-state (or scalable)

» eventually consistent

Page 43: N-O-SQL, new database technologies on the rise

Culture Clash

» ACID

» highest priority: strong consistency for transactions

» availability less important

» pessimistic

» rigorous analysis

» complex mechanisms

» BASE

» availability and scaling highest priorities

» weak consistency

» optimistic

» best effort

» simple and fast

[Diagram label: spectrum, with ACID and BASE at either end]

Page 44: N-O-SQL, new database technologies on the rise

Building for failure

» defensive programming

» creating replicas

» disk flushing

» watch out for failure of utility infrastructure

» conscious sync/async decisions

Page 45: N-O-SQL, new database technologies on the rise

Possible storage failures

» Application errors

» Repeatable DB failures

» Unrepeatable DB failures

» OS errors

» Local cluster HW failure

» Local cluster network partitioning

» Disaster

» WAN network failure between remote clusters

(Credit: Michael Stonebraker)

Page 46: N-O-SQL, new database technologies on the rise

Availability ≠ total async !

Page 47: N-O-SQL, new database technologies on the rise

The Enterprise Service Bus

✘ bus = congestion

Page 48: N-O-SQL, new database technologies on the rise

Bus systems

» objects don’t fit in a pipe

» object ➙ message

» serialization / de-serialization cost

» message size

» queuing = cost

Page 49: N-O-SQL, new database technologies on the rise

Use a mixture of both

» async + sync

[Diagram label: stuff which matters!]

Page 50: N-O-SQL, new database technologies on the rise

Numbers of scale

http://qos.doubleclick.net/counters/

Page 51: N-O-SQL, new database technologies on the rise

Processing large datasets: Map/Reduce

Page 52: N-O-SQL, new database technologies on the rise

Smart Data

» sparse as a feature

» weak schemas

» ad-hoc indexing

» organic analytics

» near-data processing

» live(ly) datawarehouse

» distribution ➙ parallelization ➙ performance

Page 53: N-O-SQL, new database technologies on the rise

Hadoop: HDFS + MapReduce

» single filesystem + single execution-space

Page 54: N-O-SQL, new database technologies on the rise

MapReduce example: WordCount
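The slide's figure did not survive extraction. As a stand-in, here is a single-process sketch of the WordCount pattern in plain Python; it is not Hadoop code, and the function names are illustrative assumptions. It shows the three stages: map emits `(word, 1)` pairs, the shuffle groups pairs by key, and reduce sums each group.

```python
from collections import defaultdict


def map_phase(document: str):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: list):
    """Collapse all values for one key into a single result."""
    return word, sum(counts)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())
```

In a real cluster the map and reduce calls run in parallel on different machines; only the shuffle moves data between them.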

Page 55: N-O-SQL, new database technologies on the rise

MapReduce

Page 56: N-O-SQL, new database technologies on the rise

MapReduce and HDFS

© Lars George

Page 57: N-O-SQL, new database technologies on the rise

Physical architecture

Page 58: N-O-SQL, new database technologies on the rise

Processing large datasets with MR

» Benefit from parallelisation

» Less modelling upfront (ad-hoc processing)

» Compartmentalized approach reduces operational risks

» AsterData et al. have SQL/MR hybrids for huge-scale BI

Page 59: N-O-SQL, new database technologies on the rise

Market overview

Page 60: N-O-SQL, new database technologies on the rise

Categories

» key-value stores

» column stores

» document stores

» graph databases

Page 61: N-O-SQL, new database technologies on the rise

Key-value stores

» Redis

» Voldemort

» Tokyo Cabinet

Page 62: N-O-SQL, new database technologies on the rise

Redis

» REmote DIctionary Server

» http://code.google.com/p/redis/

» vmware

Page 63: N-O-SQL, new database technologies on the rise

Redis Features

» persisted memcache, ‘awesome’

» RAM-based + persistable

» key ➙ values: string, list, set

» higher-level ops

» e.g. push/pop and sort for lists

» fast (very)

» configurable durability

» client-managed sharding

Page 64: N-O-SQL, new database technologies on the rise

Voldemort

» http://project-voldemort.com/

» LinkedIn

Page 65: N-O-SQL, new database technologies on the rise

Voldemort

» persistent

» distributed

» fault-tolerant

» hash table

Page 66: N-O-SQL, new database technologies on the rise

Voldemort

API: GET, PUT, DELETE

Page 67: N-O-SQL, new database technologies on the rise

Voldemort

routing logic moving up the stack, smaller latency

Page 68: N-O-SQL, new database technologies on the rise

Voldemort data format

» key+values = arrays of bytes

» So how do we convert objects ⬌ bytes?

» json

» string

» java-serialization

» protobuf

» identity
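A minimal sketch of the idea, assuming JSON as the serializer and an in-memory dict standing in for the store (the `ByteStore` class and helper names are hypothetical; this is not the Voldemort client API). The store only ever sees byte arrays; turning objects into bytes and back is the client's job.

```python
import json


class ByteStore:
    """In-memory stand-in for a key-value store that only handles bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes):
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


def to_bytes(obj) -> bytes:
    # Serialize on the way in; sort_keys keeps the encoding deterministic.
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def from_bytes(raw: bytes):
    # De-serialize on the way out.
    return json.loads(raw.decode("utf-8"))
```

Swapping JSON for another format from the list above would change only `to_bytes` and `from_bytes`, not the store.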

Page 69: N-O-SQL, new database technologies on the rise

Tokyo Cabinet

» http://1978th.net/tokyocabinet/

» mixi.jp (i.e. Facebook Japan)

Page 70: N-O-SQL, new database technologies on the rise

Product Family

Page 71: N-O-SQL, new database technologies on the rise

Tokyo Cabinet

» memory or filesystem

» hash, b-tree, fixed-length, table

Page 72: N-O-SQL, new database technologies on the rise

Column stores

» BigTable

» HBase

» Cassandra

Page 73: N-O-SQL, new database technologies on the rise

BigTable

» http://labs.google.com/papers/bigtable.html

» Google

» layered on top of GFS

Page 74: N-O-SQL, new database technologies on the rise

HBase

» http://hadoop.apache.org/hbase/

» StumbleUpon / Adobe / Cloudera

Page 75: N-O-SQL, new database technologies on the rise

HBase

» sorted

» distributed

» column-oriented

» multi-dimensional

» highly-available

» high-performance

» persisted

» storage system

» adds random access reads and writes atop HDFS

Page 76: N-O-SQL, new database technologies on the rise

HBase data model

» Distributed multi-dimensional sparse map

» Multi-dimensional keys: (table, row, family:column, timestamp) → value

» Keys are arbitrary strings

» Access to row data is atomic
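A single-table, in-memory sketch of this data model, assuming nested dictionaries keyed by row, then `family:column`, then timestamp (the `SparseTable` class is hypothetical and is not the HBase client API). It shows the map's sparseness, its versioned cells, and why sorted row keys make range scans natural.

```python
import time
from collections import defaultdict


class SparseTable:
    def __init__(self):
        # row key -> "family:column" -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp=None):
        ts = timestamp if timestamp is not None else time.time_ns()
        self._rows[row][column][ts] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp` (or the newest overall)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, stop_row):
        """Yield row keys in sorted order within [start_row, stop_row)."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row
```

Cells that were never written take no space at all, which is what "sparse" means here.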

Page 77: N-O-SQL, new database technologies on the rise

Storage architecture

© Lars George

Page 78: N-O-SQL, new database technologies on the rise

Cassandra

» http://cassandra.apache.org/

» Rackspace / Facebook

Page 79: N-O-SQL, new database technologies on the rise

Cassandra

» Key-value store (with added structure)

» Reliability (identical nodes)

» Eventually consistent

» Distributed

» Tunable

» Partitioning

» Replication

[Diagram label: CAP]

Page 80: N-O-SQL, new database technologies on the rise

Cassandra write pattern

Page 81: N-O-SQL, new database technologies on the rise

Cassandra applicability

FIT

» Scalable reliability (through identical nodes)

» Linear scaling

» Write throughput

» Large data sets

NO FIT

» Flexible indexing (only PK-based querying)

» Big binary data (1 row must fit in RAM entirely)

Page 82: N-O-SQL, new database technologies on the rise

Document stores

» CouchDB

» MongoDB

» Riak

» MarkLogic

Page 83: N-O-SQL, new database technologies on the rise

CouchDB

» http://couchdb.apache.org/

» couch.io

Page 84: N-O-SQL, new database technologies on the rise

CouchDB

» fault-tolerant

» schema-free

» document-oriented

» accessible via a RESTful HTTP/JSON API

Page 85: N-O-SQL, new database technologies on the rise

CouchDB documents

{
  "_id": "BCCD12CBB",
  "_rev": "AB764C",
  "type": "person",
  "name": "Darth Vader",
  "age": 63,
  "headware": ["Helmet", "Sombrero"],
  "dark_side": true
}

Page 86: N-O-SQL, new database technologies on the rise

CouchDB REST API

» HTTP

» PUT /db/docid

» GET /db/docid

» POST /db/docid

» DELETE /db/docid

Page 87: N-O-SQL, new database technologies on the rise

CouchDB Views

» MapReduce-based

» Filter, Collate, Aggregate

» JavaScript

map:

function (doc) { for (var i in doc.tags) emit(doc.tags[i], 1); }

reduce:

function (Key, Values) { var sum = 0; for (var i in Values) sum += Values[i]; return sum; }
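To make the data flow of the tag-counting view concrete, here is a rough Python emulation, assuming documents are plain dicts with an optional `tags` list (the function names and the `run_view` driver are hypothetical; CouchDB itself runs the JavaScript functions inside its view server). The map step emits one `(tag, 1)` row per tag, rows are grouped by key, and the reduce step sums each group.

```python
from collections import defaultdict


def map_doc(doc: dict):
    """Counterpart of the map function: emit (tag, 1) for each tag."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_values(key, values):
    """Counterpart of the reduce function: sum the emitted values."""
    return sum(values)


def run_view(docs):
    """Group emitted rows by key, then reduce each group (results sorted by key)."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(key, values) for key, values in sorted(grouped.items())}
```

Documents without a `tags` field simply emit nothing, which is how a schema-free store filters by shape.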

Page 88: N-O-SQL, new database technologies on the rise

CouchDB

» be careful with semantics

» replication ≠ partitioning/sharding !

» distributed database = distributable database

» sharded / distributed deployment requires proxy layer

Page 89: N-O-SQL, new database technologies on the rise

MongoDB

» http://www.mongodb.org/

» 10gen

Page 90: N-O-SQL, new database technologies on the rise

MongoDB

» cf. CouchDB, really

» except for:

» C++

» performance focus

» runtime queries (mapreduce still available)

» native drivers (no REST/HTTP layering)

» no MVCC: update-in-place

» auto sharding (alpha)

Page 91: N-O-SQL, new database technologies on the rise

Riak

» http://riak.basho.com/

» Basho Technologies

Page 92: N-O-SQL, new database technologies on the rise

Riak

» buckets/keys, links

» values/content = bucket + metadata

» pluggable storage engines (fs, (D)ETS, InnoDB)

» HTTP/REST API

» automatic distribution

» mapreduce using JavaScript

Page 93: N-O-SQL, new database technologies on the rise

Jackrabbit

» http://jackrabbit.apache.org/

» Day Software

Page 94: N-O-SQL, new database technologies on the rise

Jackrabbit

» reference implementation for JSR 170 & 283

» remoting: WebDAV & RMI

» persistence: RDBMS, fs, memory

Page 95: N-O-SQL, new database technologies on the rise

Jackrabbit

» Java-centric (duh)

» complex repository model (nodes+properties)

» mixins, inheritance

» workspaces

» query language

» no partitioning/sharding

Page 96: N-O-SQL, new database technologies on the rise

JCR API levels

Page 97: N-O-SQL, new database technologies on the rise

Graph databases

» Neo4j

» AllegroGraph (RDF)

Page 98: N-O-SQL, new database technologies on the rise

Neo4j

» http://neo4j.org/

» Neo Technology

Page 99: N-O-SQL, new database technologies on the rise

Neo4j

» data = nodes + relationships + key/value properties
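A minimal property-graph sketch in plain Python, assuming nodes and relationships are simple objects that each carry a property dict (the `Node`, `Relationship`, `relate` and `neighbours` names are hypothetical; this is not the Neo4j API). It shows that relationships are first-class records with their own type and properties, not just foreign keys.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    properties: dict = field(default_factory=dict)
    outgoing: list = field(default_factory=list)  # relationships starting here


@dataclass(eq=False)
class Relationship:
    kind: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


def relate(start: Node, kind: str, end: Node, **properties) -> Relationship:
    """Create a typed, directed relationship carrying its own properties."""
    rel = Relationship(kind, start, end, properties)
    start.outgoing.append(rel)
    return rel


def neighbours(node: Node, kind: str) -> list:
    """Follow outgoing relationships of one type; no join or index lookup needed."""
    return [rel.end for rel in node.outgoing if rel.kind == kind]
```

Traversal cost depends on how many relationships a node has, not on the total size of the graph.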

Page 100: N-O-SQL, new database technologies on the rise

Neo4j

» many language bindings, little remoting

» ‘whiteboard’ friendly

» scaling to complexity (rather than volume?)

» lots of focus on domain modelling

» SPARQL/SAIL impl for triple geeks

» mostly RAM centric (with disk swapping & persistence)

Page 101: N-O-SQL, new database technologies on the rise

Experiences & (h)in(d)sights

Page 102: N-O-SQL, new database technologies on the rise

NOSQL applicability

» Horizontal scaling

» Multi-Master

» Data representation

» search for simplicity

» data that doesn’t fit the E-R model (graphs, trees, versions)

» Speed

Page 103: N-O-SQL, new database technologies on the rise

Tools for the trade

» non-relational data: Couch, Mongo, Riak

» massive quantities: Cassandra, HBase

» persistent caching: Redis, Voldemort

» graphs: neo4j

Page 104: N-O-SQL, new database technologies on the rise

Tool selection

» be careful with the marketese: beware of smoke and mirrors!

» monitor dev list, IRC, Twitter, blogs

» monitor project ‘sponsors’

» mix-and-match

» DON’T NOSQL WITHOUT INTERNAL SYS ARCHS & DEV(OP)S !

Page 105: N-O-SQL, new database technologies on the rise

[Diagram: aptness versus complexity, with the labels internet, enterprise, corporate and community; braces mark the NOSQL and SQL ranges]

Page 106: N-O-SQL, new database technologies on the rise

Our NOSQL-based project: Lily

» (open source)

» scalable store (Apache HBase)

» and search (Apache SOLR)

» content repository

» alpha due mid 2010

» www.lilycms.org or @outerthought

Page 107: N-O-SQL, new database technologies on the rise

Lily architecture

[Architecture diagram. Recoverable labels: Lily clients; Lily store nodes (query, indexer, update; WAL / MQ; M/R); distributed process coordination and configuration (ZooKeeper); Lily Store Server; HBase Region Server (documents, secondary indexes); Hadoop DFS (replicas); SOLR (inverted index, index replicas); REST]

Page 108: N-O-SQL, new database technologies on the rise

When combining store and search, make sure your (search) index doesn’t become the store.

Page 109: N-O-SQL, new database technologies on the rise

Key lessons learned

» importance of keyspace design

» secondary indexing

» data de-normalization

» schema vs. code flexibility?

» distribution is everywhere and you shouldn’t forget about it

Page 110: N-O-SQL, new database technologies on the rise

Reading material

» Amazon Dynamo, Google BigTable, CAP

» http://nosql.mypopescu.com/

» http://nosql-database.org/

» http://twitter.com/nosqlupdate

» http://highscalability.com/

Page 111: N-O-SQL, new database technologies on the rise

Questions?

http://www.flickr.com/photos/leehaywood/4237636853/

Page 112: N-O-SQL, new database technologies on the rise

» [email protected]

» @stevenn

Thanks for your attention !