NoSQL for SQL Professionals Dipti Borkar Director, Product Management

Characteristics of no sql databases


Page 1: Characteristics of no sql databases

NoSQL for SQL Professionals

Dipti Borkar Director, Product Management

Page 2: Characteristics of no sql databases

Link to Slides

http://bit.ly/17pgrcP

Page 3: Characteristics of no sql databases

Macro Trends Driving NoSQL Technology

More Data + More Users + Interactive Apps = NoSQL

Page 4: Characteristics of no sql databases

Lacking Solutions, Users Forced to Invent

• Bigtable: November 2006
• Dynamo: October 2007
• Cassandra: August 2008
• Voldemort: February 2009

Very few organizations can build and maintain database software technology. But every organization building interactive web applications needs this technology.

Page 5: Characteristics of no sql databases

What Is the Biggest Data Management Problem Driving Use of NoSQL in the Coming Year?

Lack of flexibility/rigid schemas   49%
Inability to scale out data         35%
Performance challenges              29%
Cost                                16%
All of these                        12%
Other                               11%

Source: Couchbase Survey, December 2011, n = 1351.

Page 6: Characteristics of no sql databases

Relational vs. NoSQL

Page 7: Characteristics of no sql databases

Key Differences

Page 8: Characteristics of no sql databases

Relational Technology Scales Up

Web/App Server Tier: Application Scales Out
• Just add more commodity web servers
• [Chart: System Cost and Application Performance vs. Users]

Relational Database: RDBMS Scales Up
• Get a bigger, more complex server
• Expensive and disruptive sharding, doesn’t perform at web scale
• [Chart: System Cost and Application Performance vs. Users; won’t scale beyond this point]

Page 9: Characteristics of no sql databases

Couchbase Server Scales Out Like App Tier

Web/App Server Tier: Application Scales Out
• Just add more commodity web servers
• [Chart: System Cost and Application Performance vs. Users]

Couchbase Distributed Data Store: NoSQL Database Scales Out
• Cost and performance mirror the app tier
• Scaling out flattens the cost and performance curves
• [Chart: System Cost and Application Performance vs. Users]

Page 10: Characteristics of no sql databases
Page 11: Characteristics of no sql databases

Differences

• 1. Tables vs Document
  - Relational has tables with predefined columns: the schema is determined before data can be inserted.
  - Best practice is to normalize by splitting into several tables, joined by PK-FK relations.

Page 12: Characteristics of no sql databases

Differences

• Tables vs Document (contd.)
  - In Couchbase, there are no tables, only documents.
  - A logical entity is stored within a single document.
  - Different documents do not need to have the same set of fields or structure.
  - You differentiate types of documents either by the key names you provide or by adding attributes.
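Both ways of telling document types apart can be sketched in a few lines. This is only an illustration: a plain Python dict stands in for a bucket, and the `user::`/`photo::` key prefixes and the `type` attribute are assumed naming conventions, not anything Couchbase requires.

```python
# A plain dict stands in for a bucket: keys map to JSON-like documents.
# Key prefixes and the "type" attribute are illustrative conventions only.
bucket = {
    "user::1": {"type": "user", "first": "Dipti", "last": "Borkar"},
    "user::2": {"type": "user", "first": "Joe", "last": "Smith", "zip": "94040"},
    "photo::d043": {"type": "photo", "user_id": 2, "comment": "NYC"},
}


def docs_by_key_prefix(store, prefix):
    """Select documents whose key starts with a type prefix such as 'user::'."""
    return {key: doc for key, doc in store.items() if key.startswith(prefix)}


def docs_by_type_attribute(store, doc_type):
    """Select documents by a 'type' attribute stored inside each document."""
    return {key: doc for key, doc in store.items() if doc.get("type") == doc_type}


users_by_key = docs_by_key_prefix(bucket, "user::")
users_by_attr = docs_by_type_attribute(bucket, "user")
```

Note that the two user documents do not share the same set of fields, which is exactly what the document model allows.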

Page 13: Characteristics of no sql databases

Relational vs Document Data Model

Relational data model: highly structured table organization with rigidly defined data formats and record structure. [Figure: a table with columns C1, C2, C3, C4]

Document data model: collection of complex documents with arbitrary, nested data formats and varying “record” format. [Figure: a set of JSON documents]

Page 14: Characteristics of no sql databases

Differences

• Joins vs logical single document
  - Single logical document. No need for joins.
  - If normalized into several documents, then use a series of gets.

• Transactions
  - Relational: Atomicity can span several records across several tables.
  - NoSQL: Atomicity is confined to the document level.

Series of gets (pseudocode):

    recipe = couchbase.get("my-recipe-id");
    reviews = couchbase.multiget(recipe.comments);
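The series-of-gets pattern can be tried out with a small stand-in. The `InMemoryBucket` class below is hypothetical and is not a Couchbase SDK; it only mimics a single get plus a multi-get, and the recipe and review contents are made up for illustration.

```python
class InMemoryBucket:
    """Hypothetical stand-in for a document store client (not a real SDK)."""

    def __init__(self, docs):
        self._docs = dict(docs)

    def get(self, key):
        """Fetch one document by key."""
        return self._docs[key]

    def multi_get(self, keys):
        """Fetch several documents in one call, skipping missing keys."""
        return {key: self._docs[key] for key in keys if key in self._docs}


# Made-up sample data: one recipe document that lists the keys of its reviews.
bucket = InMemoryBucket({
    "my-recipe-id": {"title": "Pancakes", "comments": ["review-1", "review-2"]},
    "review-1": {"text": "Great"},
    "review-2": {"text": "Too sweet"},
})

# First get the parent document, then fetch the documents it references.
recipe = bucket.get("my-recipe-id")
reviews = bucket.multi_get(recipe["comments"])
```

The application does the work a join would do in SQL: it reads the referenced keys from the first document and issues a second lookup for them.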

Page 15: Characteristics of no sql databases

Key Couchbase Concepts

• Clients read from and write to Documents (user/application data).
• Documents are stored in Data Buckets (multitenant architecture).
• Data Buckets live on Server Nodes, based on bucket partitioning.
• Server Nodes (servers) form a Couchbase Cluster, which is dynamically scalable.

Page 16: Characteristics of no sql databases

RDBMS Example: User Profile

User Info

KEY  First  Last    ZIP_id
1    Dipti  Borkar  2
2    Joe    Smith   2
3    Ali    Dodson  2
4    John   Doe     3

Address Info

ZIP_id  CITY  STATE  ZIP
1       DEN   CO     30303
2       MV    CA     94040
3       CHI   IL     60609
4       NY    NY     10010

To get information about a specific user, you perform a join across two tables. For example, user 1 has ZIP_id 2, which joins to the Address Info row 2: MV, CA, 94040.
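As a rough illustration of that join, the sketch below loads the two tables into an in-memory SQLite database using Python's standard `sqlite3` module. The table and column names (`user_info`, `address_info`, `user_key`) are assumptions chosen for this example, not names from the slides.

```python
import sqlite3

# In-memory database holding the two tables from the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE address_info ("
    "zip_id INTEGER PRIMARY KEY, city TEXT, state TEXT, zip TEXT)"
)
conn.execute(
    "CREATE TABLE user_info ("
    "user_key INTEGER PRIMARY KEY, first TEXT, last TEXT, "
    "zip_id INTEGER REFERENCES address_info(zip_id))"
)
conn.executemany(
    "INSERT INTO address_info VALUES (?, ?, ?, ?)",
    [(1, "DEN", "CO", "30303"), (2, "MV", "CA", "94040"),
     (3, "CHI", "IL", "60609"), (4, "NY", "NY", "10010")],
)
conn.executemany(
    "INSERT INTO user_info VALUES (?, ?, ?, ?)",
    [(1, "Dipti", "Borkar", 2), (2, "Joe", "Smith", 2),
     (3, "Ali", "Dodson", 2), (4, "John", "Doe", 3)],
)

# Join the user row to its address row through the ZIP_id foreign key.
row = conn.execute(
    "SELECT u.first, u.last, a.city, a.state, a.zip "
    "FROM user_info AS u JOIN address_info AS a ON a.zip_id = u.zip_id "
    "WHERE u.user_key = ?",
    (1,),
).fetchone()
```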

Page 17: Characteristics of no sql databases

Document Example: User Profile

All data in a single document

{ "ID": 1, "FIRST": "Dipti", "LAST": "Borkar", "ZIP": "94040", "CITY": "MV", "STATE": "CA" }

[Figure: the User Info row plus the Address Info row equals one JSON document]

Page 18: Characteristics of no sql databases

Making a Change Using RDBMS

User Table

User ID  First  Last    Zip    Country ID
1        Dipti  Borkar  94040  001
2        Joe    Smith   94040  001
3        Ali    Dodson  94040  001
4        Sarah  Gorin   NW1    002
5        Bob    Young   30303  001
6        Nancy  Baker   10010  001
7        Ray    Jones   31311  001
8        Lee    Chen    V5V3M  008
...
50000    Doug   Moore   04252  001
50001    Mary   White   SW195  002
50002    Lisa   Clark   12425  001

Country Table

Country ID  Country name
001         USA
002         UK
003         Argentina
004         Australia
005         Aruba
006         Austria
007         Brazil
008         Canada
009         Chile
...
130         Portugal
131         Romania
132         Russia
133         Spain
134         Sweden

Photo Table

User ID  Photo ID  Comment  Country ID
2        d043      NYC      001
2        b054      Bday     007
5        c036      Miami    001
7        d072      Sunset   133
5002     e086      Spain    133

Status Table

User ID  Status ID  Text     Country ID
1        a42        At conf  134
4        b26        excited  007
5        c32        hockey   008
12       d83        Go A’s   001
5000     e34        sailing  005

Affiliations Table

User ID  Affl ID  Affl Name  Country ID
2        a42      Cal        001
4        b96      USC        001
7        c14      UW         001
8        e22      Oxford     002

Page 19: Characteristics of no sql databases

Making the Same Change With a Document DB

{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA",
  "STATUS": { "TEXT": "At Conf", "GEO_LOC": "134" },
  "COUNTRY": "USA"
}

Just add information to a document
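A minimal sketch of the same change in code, assuming documents are stored as JSON strings under a key. The dict named `bucket` and the key `user::1` are stand-ins invented for this example; a real client would replace the dict reads and writes with its own get and set calls.

```python
import json

# A dict stands in for the bucket; the value is the stored JSON text.
bucket = {
    "user::1": json.dumps({
        "ID": 1, "FIRST": "Dipti", "LAST": "Borkar",
        "ZIP": "94040", "CITY": "MV", "STATE": "CA",
    })
}

# Read the document, add the new fields, and write it back.
doc = json.loads(bucket["user::1"])
doc["STATUS"] = {"TEXT": "At Conf", "GEO_LOC": "134"}
doc["COUNTRY"] = "USA"
bucket["user::1"] = json.dumps(doc)

updated = json.loads(bucket["user::1"])
```

No other document is touched and no schema has to be altered first; only this one document gains the new fields.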

Page 20: Characteristics of no sql databases

Relational vs Document Performance

Relational: the data for one user is spread across four tables.

User Table

User ID  First  Last    Zip
1        Frank  Weigel  94040
2        Joe    Smith   94040
3        Ali    Dodson  94040
4        Sarah  Gorin   NW1
5        Bob    Young   30303
6        Nancy  Baker   10010
7        Ray    Jones   31311
8        Lee    Chen    V5V3
...
5000     Doug   Moore   04252
5001     Mary   White   41694
5002     Lisa   Clark   12425

Photo Table

User ID  Photo ID  Comment
2        d043      NYC
2        b054      Bday
5        c036      Miami
7        d072      Sunset
5002     e086      Spain

Status Table

User ID  Status ID  Text
1        a42        At conf
4        b26        excited
5        c32        hockey
12       d83        Go A’s
5000     e34        sailing

Affiliations Table

User ID  Affiliations ID  Affiliations Name
2        a42              Cal
4        b96              USC
7        c14              UW
8        e22              Oxford

Document: [Figure: the rows that belong to each user (for example, user 1 Frank Weigel with status a42 “At conf”, or user 5002 Lisa Clark with photo e086 “Spain”) are stored together in one JSON document per user.]

Faster response times and higher throughput

Page 21: Characteristics of no sql databases

Document Databases Easily Accommodate Unstructured Data

Hotels

{
  "ID": 1,
  "NAME": "Fairmont San Francisco",
  "DESCRIPTION": "Historic grandeur…",
  "AVG_REVIEWER_SCORE": "4.3",
  "AMENITY": [
    { "TYPE": "gym", "DESCRIPTION": "fitness center" },
    { "TYPE": "wifi", "DESCRIPTION": "free wifi" }
  ],
  "RATE_TYPE": "nightly",
  "PRICE": "$199",
  "REVIEWS": ["review_1", "review_2"],
  "ATTRACTIONS": "Chinatown"
}

{
  "ID": 2,
  "NAME": "W San Francisco",
  "DESCRIPTION": "Chic, hip accommodations..",
  "AVG_REVIEWER_SCORE": "4.0",
  "AMENITY": [
    { "TYPE": "spa", "DESCRIPTION": "Bliss Spa" },
    { "TYPE": "wifi", "DESCRIPTION": "free wifi" },
    { "TYPE": "dining", "DESCRIPTION": "bar/lounge" }
  ],
  "RATE_TYPE": "nightly",
  "PRICE": "$194",
  "REVIEWS": ["review_1", "review_2"]
}

Page 22: Characteristics of no sql databases

Document Databases Easily Accommodate Unstructured Data

Hotels

{ "ID": 1, "NAME": "Fairmont San Francisco", … }

Reviews

{ "REVIEW_ID": 1, "REVIEW": "Loved Hotel & Location", "WOULD RECOMMEND": "yes", "AVG_REVIEWER_SCORE": "5", "REVIEW_DATE": "May 29, 2013", "USER_PROFILE_ID": "271" }

{ "REVIEW_ID": 2, "REVIEW": "Nice, but a few kinks", "WOULD RECOMMEND": "yes", "AVG_REVIEWER_SCORE": "4", "REVIEW_DATE": "May 22, 2013", "USER_PROFILE_ID": "923" }

Page 23: Characteristics of no sql databases

Document Databases Easily Accommodate Unstructured Data

Hotel Descriptions

{ "ID": 1, "NAME": "Fairmont San Francisco", … }

Reviews

{ "REVIEW_ID": 1, "REVIEW": "Loved Hotel…", … }

{ "REVIEW_ID": 2, "REVIEW": "Nice, but …", … }

User Profiles

{ "USER_ID": 1, "DISPLAY_NAME": "Ted’s Trip Experience", "CITY": "Saratoga", "STATE": "California", "NUM_OF_REVIEWS": "8" }

{ "USER_ID": 2, "DISPLAY_NAME": "WhatWhat567", "CITY": "Kansas City", "STATE": "MO", "NUM_OF_REVIEWS": "3" }

Page 24: Characteristics of no sql databases

Document Databases Easily Accommodate Unstructured Data

Hotel Descriptions

{ "ID": 1, "NAME": "Fairmont San Francisco", … }

Reviews

{ "REVIEW_ID": 1, "REVIEW": "Loved Hotel…", … }

{ "REVIEW_ID": 2, "REVIEW": "Nice, but …", … }

User Profiles

{ "USER_ID": 1, "DISPLAY": "Ted’s Trip…", … }

{ "USER_ID": 2, "DISPLAY": "WhatWhat …", … }

Document IDs associate related objects

• Hotels point to reviews

• Reviews point to users
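Following those references in application code might look like the sketch below. It is an illustration under stated assumptions: a dict stands in for the store, the key scheme (`hotel_1`, `user_271`) is invented here, and the user display names are placeholders rather than data from the slides.

```python
# A dict stands in for the document store. Key names are assumed conventions,
# and the user display names are placeholders for illustration.
store = {
    "hotel_1": {"NAME": "Fairmont San Francisco",
                "REVIEWS": ["review_1", "review_2"]},
    "review_1": {"REVIEW": "Loved Hotel & Location", "USER_PROFILE_ID": "271"},
    "review_2": {"REVIEW": "Nice, but a few kinks", "USER_PROFILE_ID": "923"},
    "user_271": {"DISPLAY_NAME": "placeholder-user-271"},
    "user_923": {"DISPLAY_NAME": "placeholder-user-923"},
}


def reviews_with_authors(docs, hotel_key):
    """Walk hotel -> reviews -> users by following the stored document IDs."""
    hotel = docs[hotel_key]
    result = []
    for review_key in hotel["REVIEWS"]:
        review = docs[review_key]
        user = docs["user_" + review["USER_PROFILE_ID"]]
        result.append((review["REVIEW"], user["DISPLAY_NAME"]))
    return result


pairs = reviews_with_authors(store, "hotel_1")
```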

Page 25: Characteristics of no sql databases

Indexing with Document Databases: Index on AVG_REVIEWER_SCORE

Page 26: Characteristics of no sql databases

Indexing with Document Databases: Index on AVG_REVIEWER_SCORE

Index (sorted by score):
…
4.0, doc_id
4.0, doc_id
4.1, doc_id
4.3, doc_id
5.0, doc_id
…

Page 27: Characteristics of no sql databases

Querying with Document Databases: Query on AVG_REVIEWER_SCORE

Index (sorted by score, each entry paired with a doc_id): …, 3.4, 3.4, 3.5, 3.6, 3.7, 3.8, 4.0, 4.1, 4.3, 4.5, 4.7, 4.9, 5.0, 5.0, …

[Figure: a Query is run against the Index and returns the Matching Results]
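The idea of a sorted index answering a range query can be sketched with Python's standard `bisect` module. This is a simplified model, not how any particular database builds its indexes; scores are stored as numbers here (the slides show them as strings), and `hotel_3` is a hypothetical extra document.

```python
from bisect import bisect_left, bisect_right

# Sample documents; hotel_3 is hypothetical, added so the range has a miss.
documents = {
    "hotel_1": {"NAME": "Fairmont San Francisco", "AVG_REVIEWER_SCORE": 4.3},
    "hotel_2": {"NAME": "W San Francisco", "AVG_REVIEWER_SCORE": 4.0},
    "hotel_3": {"NAME": "Hypothetical Inn", "AVG_REVIEWER_SCORE": 3.5},
}

# The index is a list of (score, doc_id) pairs kept in sorted order.
index = sorted(
    (doc["AVG_REVIEWER_SCORE"], doc_id) for doc_id, doc in documents.items()
)


def query_score_range(sorted_index, low, high):
    """Return doc_ids whose score lies in [low, high] using binary search."""
    scores = [score for score, _ in sorted_index]
    start = bisect_left(scores, low)
    end = bisect_right(scores, high)
    return [doc_id for _, doc_id in sorted_index[start:end]]


matches = query_score_range(index, 4.0, 4.5)
```

Only the index is scanned to find the matching IDs; the documents themselves are fetched afterwards by ID.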

Page 28: Characteristics of no sql databases

Flavors of NoSQL

Page 29: Characteristics of no sql databases

NoSQL catalog

                        Key-Value  Data Structure  Document            Column     Graph
Cache (memory only)     memcached  redis
Database (memory/disk)  membase                    mongoDB, couchbase  cassandra  Neo4j

Page 30: Characteristics of no sql databases

The Key-Value Store – the foundation of NoSQL

Key → Opaque Binary Value

[Figure: a key pointing to a block of binary data]

Page 31: Characteristics of no sql databases

Memcached – the NoSQL precursor

memcached: Key → Opaque Binary Value

• In-memory only
• Limited set of operations
  - Blob storage: Set, Add, Replace, CAS
  - Retrieval: Get
  - Structured data: Append, Increment

“Simple and fast.”

Challenges: cold cache, disruptive elasticity
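To make the operation set concrete, here is a toy in-memory model of those operations. `TinyCache` is a hypothetical class written for this illustration; it is not a memcached client and simplifies the real protocol (no expiry, no eviction, no networking).

```python
class TinyCache:
    """In-memory toy model of memcached-style operations (not a real client)."""

    def __init__(self):
        self._data = {}      # key -> (value, cas_token)
        self._next_cas = 1   # token changes on every successful write

    def _store(self, key, value):
        self._data[key] = (value, self._next_cas)
        self._next_cas += 1
        return True

    def set(self, key, value):
        """Store unconditionally."""
        return self._store(key, value)

    def add(self, key, value):
        """Store only if the key does not exist yet."""
        return key not in self._data and self._store(key, value)

    def replace(self, key, value):
        """Store only if the key already exists."""
        return key in self._data and self._store(key, value)

    def gets(self, key):
        """Return (value, cas_token), or (None, None) if the key is missing."""
        return self._data.get(key, (None, None))

    def get(self, key):
        return self.gets(key)[0]

    def cas(self, key, value, token):
        """Store only if nobody has written the key since `token` was read."""
        return (key in self._data and self._data[key][1] == token
                and self._store(key, value))

    def append(self, key, suffix):
        """Concatenate onto an existing value."""
        return key in self._data and self._store(key, self._data[key][0] + suffix)

    def incr(self, key, delta=1):
        """Increment a numeric value; returns None if the key is missing."""
        if key not in self._data:
            return None
        new_value = int(self._data[key][0]) + delta
        self._store(key, new_value)
        return new_value


cache = TinyCache()
cache.set("greeting", "hello")
added_again = cache.add("greeting", "ignored")   # key exists, so add fails
cache.append("greeting", " world")
value, token = cache.gets("greeting")
first_cas = cache.cas("greeting", "hi", token)   # token is current
stale_cas = cache.cas("greeting", "hey", token)  # token is now stale
cache.set("hits", 41)
hits = cache.incr("hits")
```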

Page 32: Characteristics of no sql databases

Couchbase – document-oriented database

Couchbase: Key → JSON Object (“Document”)

{ "string" : "string", "string" : value, "string" : { "string" : "string", "string" : value }, "string" : [ array ] }

• Auto-sharding
• Disk-based with built-in memcached cache
• Cache refill on restart
• Memcached compatible (drop-in replacement)
• Highly available (data replication)
• Add or remove capacity to live cluster

When values are JSON objects (“documents”): create indices and views, and query against the views

Page 33: Characteristics of no sql databases

NoSQL catalog

                        Key-Value  Data Structure  Document   Column  Graph
Cache (memory only)     memcached  redis
Database (memory/disk)  membase                    couchbase

membase → couchbase

Page 34: Characteristics of no sql databases

MongoDB – Document-oriented database

MongoDB: Key → BSON Object (“Document”)

{ "string" : "string", "string" : value, "string" : { "string" : "string", "string" : value }, "string" : [ array ] }

• Disk-based with in-memory “caching”
• BSON (“binary JSON”) format and wire protocol
• Master-slave replication
• Auto-sharding
• Values are BSON objects
• Supports ad hoc queries – best when indexed

Page 35: Characteristics of no sql databases

MongoDB Architecture

Page 36: Characteristics of no sql databases

NoSQL catalog

                        Key-Value  Data Structure  Document            Column  Graph
Cache (memory only)     memcached  redis
Database (memory/disk)  membase                    mongoDB, couchbase

Page 37: Characteristics of no sql databases

Cassandra – Column overlays

• Disk-based system
• Clustered
• External caching required for low-latency reads
• “Columns” are overlaid on the data
• Not all rows must have all columns
• Supports efficient queries on columns
• Restart required when adding columns
• Good cross-datacenter support

Cassandra: Key → Opaque Binary Value

[Figure: the value is overlaid with Column 1, Column 2, and Column 3 (not present)]

Page 38: Characteristics of no sql databases

Cassandra Architecture

Page 39: Characteristics of no sql databases

NoSQL catalog

                        Key-Value  Data Structure  Document            Column     Graph
Cache (memory only)     memcached  redis
Database (memory/disk)  membase                    mongoDB, couchbase  cassandra

Page 40: Characteristics of no sql databases

Neo4j – Graph database

• Disk-based system
• External caching required for low-latency reads
• Nodes, relationships and paths
• Properties on nodes
• Delete, Insert, Traverse, etc.

Neo4j

[Figure: five nodes, each a Key → Opaque Binary Value pair, connected by relationships]
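The node, relationship, and traversal vocabulary can be illustrated with a small in-memory graph. This sketch does not use Neo4j or its query language; the node names, properties, and the `KNOWS` relationship type are all invented for the example.

```python
from collections import deque

# Nodes carry properties; relationships are (source, type, target) triples.
# All names and values below are invented for illustration.
nodes = {
    "alice": {"city": "MV"},
    "bob": {"city": "NY"},
    "carol": {"city": "CHI"},
    "dave": {"city": "DEN"},
}
relationships = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("alice", "KNOWS", "dave"),
]


def traverse(start, rel_type):
    """Breadth-first traversal along outgoing relationships of one type."""
    adjacency = {}
    for source, kind, target in relationships:
        if kind == rel_type:
            adjacency.setdefault(source, []).append(target)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


reachable = traverse("alice", "KNOWS")
```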

Page 41: Characteristics of no sql databases

NoSQL catalog

                        Key-Value  Data Structure  Document            Column     Graph
Cache (memory only)     memcached  redis
Database (memory/disk)  membase                    mongoDB, couchbase  cassandra  Neo4j

Page 42: Characteristics of no sql databases

Where is NoSQL a good fit?

Page 43: Characteristics of no sql databases

Market Adoption

Internet Companies

• Social Gaming
• Ad Networks
• Social Networks
• Online Business Services
• E-Commerce
• Online Media
• Content Management
• Cloud Services

Enterprises

• Communications
• Retail
• Financial Services
• Health Care
• Automotive/Airline
• Agriculture
• Consumer Electronics
• Business Systems

Page 44: Characteristics of no sql databases

Market Adoption – Customers: Internet Companies and Enterprises

More than 300 customers -- 5,000 production deployments worldwide

Page 45: Characteristics of no sql databases

Application Characteristics - Data driven

• 3rd party or user defined structure (Twitter feeds)

• Support for unlimited data growth (Viral apps)

• Data with non-homogenous structure

• Need to quickly and often change data structure

• Variable length documents

• Sparse data records

• Hierarchical data

Couchbase is a good fit

Page 46: Characteristics of no sql databases

Application Characteristics - Performance driven

• Low latency critical (e.g., 1 millisecond)

• High throughput (e.g., 200,000 ops/sec)

• Large number of users

• Unknown demand with sudden growth of users/data

• Predominantly direct document access

• Read / Mixed / Write heavy workloads

Couchbase is a good fit

Page 47: Characteristics of no sql databases

Common Use Cases

Social Gaming
• Couchbase stores player and game data
• Example customers include: Zynga, Tapjoy, Ubisoft, Tencent

Mobile Apps
• Couchbase stores user info and app content
• Example customers include: Kobo, Playtika

Ad Targeting
• Couchbase stores user information for fast access
• Example customers include: AOL, Mediamind, Convertro

Session Store
• Couchbase Server as a key-value store
• Example customers include: Concur, Sabre

User Profile Store
• Couchbase Server as a key-value store
• Example customers include: Tunewiki

High Availability Cache
• Couchbase Server used as a cache tier replacement
• Example customers include: Orbitz

Content & Metadata Store
• Couchbase document store with Elasticsearch
• Example customers include: McGraw Hill, Tunewiki

3rd Party Data Aggregation
• Couchbase stores social media and data feeds
• Example customers include: Sambacloud

Page 48: Characteristics of no sql databases

Q & A

Page 49: Characteristics of no sql databases

Thank you

[email protected]
@dborkar