Cassandra for Barcodes, Products and Scans:
The Backend Infrastructure at Scandit
@scanditwww.scandit.com February 1, 2012
Christof RodunerCo-founder and [email protected] Link: NoSQL concept and data model
2
AGENDA
About Scandit
Requirements
Apache Cassandra
Scandit backend
3
WHAT IS SCANDIT?
Scandit provides developers best-in-class tools to build, analyze and monetize product-centric apps.
ANALYZEUser Interest
MONETIZE Apps
IDENTIFY Products
4
ANALYZE:THE SCANALYTICS PLATFORM
Tool for app publishers App-specific usage statistics
Insights into consumer behavior: What do users scan?
Product categories? Groceries, electronics, books, cosmetics, …? Where do users scan?
At home? Or while in a retail store? Top products and brands
Identify new opportunities: Customer engagement Product interest Cross-selling and up-selling
5
BACKEND REQUIREMENTS
Product database Many millions of products Many different data sources Curation of product data (filtering, etc.)
Analysis of scans Accept and store high volumes of scans Generate statistics over extended time periods Correlate with product data
Provide reports to developers
6
BACKEND DESIGN GOALS
Scalability High-volume storage High-volume throughput Support large number of concurrent client requests (app)
Availability
Low maintenance
7
WHICH DATABASE?
Apache Cassandra Large, distributed key-value store (DHT) «NoSQL» Polyglot Persistence Inspired by:
Amazon’s Dynamo distributed storage system Google’s BigTable data model
Originally developed at Facebook Inbox search
8
WHY DID WE CHOOSE IT?
Looked very fast Even when data is much larger than RAM
Performs well in write-heavy environment
Proven scalability Without downtime
Tunable replication
Easy to run and maintain No sharding All nodes are the same - no coordinators, masters, slaves, …
Data model YMMV…
9
WHAT YOU HAVE TO GIVE UP
Joins Referential integrity Transactions Expressive query language Consistency (tunable, but…)
Limited support for: Schema Secondary indices
10
CASSANDRA DATA MODEL
Column families Rows Columns
(Supercolumns) We’ll skip them - Cassandra developers don’t like
them
Disclaimer: I tend to say «hash» when I mean «dictionary, map, associative array» (Can you tell my favorite language?)
11
COLUMNS AND ROWS
Column: Is a name-value pair row_key, CF, column, timestamp and value
Row: Has exactly one key Contains any number of columns Columns are always automatically sorted by their name
Column family: A collection of any number of rows (!) Has a name «Like a table»
12
EXAMPLE COLUMN FAMILY
A column family «users» containing two rows Columns can be different in every row
First row has a column named «phone», second row does not Rows can have many columns
You can add millions of them
"users": {
"christof": { "email": "[email protected]", "phone": "123-456-7890" }
"moritz": { "email": "[email protected]", "web": "www.example.com" }
}
Row with key «christof»
Two columns, automatically sorted by their names
(«email», «web»)
13
DATA IN COLUMN NAMES
Column names can be used to store data Frequent pattern in Cassandra Takes advantage of column sorting
"logins": {
"christof": { "2012-01-29 16:22:30 +0100": "208.115.113.86", "2012-01-30 07:48:03 +0100": "66.249.66.183", "2012-01-30 18:06:55 +0100": "208.115.111.70", "2012-01-31 12:37:26 +0100": "66.249.66.183" }
"moritz": { "2012-01-23 01:12:49 +0100": "205.209.190.116" }
}
14
SCHEMA AND DATA TYPES
Schema is optional Data type can be defined for:
Keys The values of all columns with a given name The column names in a CF
By default, data type BLOB is used
Data Types BLOB (default) ASCII text UTF8 text Timestamp Boolean
UUID Integer (arbitrary length) Float Double Decimal
15
CLUSTER ORGANIZATION
Node 3Token 128
Node 2Token 64
Node 4Token 192
Node 1Token 0
Range 1-64,stored on node 2
Range 65-128,stored on node 3
16
STORING A ROW
1. Calculate md5 hash for row keyExample: md5(“foobar") = 48
2. Determine data range for hashExample: 48 lies within range 1-64
3. Store row on node responsiblefor rangeExample: store on node 2
Node 3Token 128
Node 2Token 64
Node 4Token 192
Node 1Token 0
Range 1-64,stored on node 2
Range 65-128,stored on node 3
17
IMPLICATIONS
Cluster automatically balanced Load is shared equally between nodes No hotspots
Scaling out? Easy Divide data ranges by adding more nodes Cluster rebalances itself automatically
Range queries not possible You can’t retrieve «all rows from A-C» Rows are not stored in their «natural» order Rows are stored in order of their md5 hashes
18
IF YOU NEED RANGE QUERIES…
Option 1: «Order Preserving Partitioner» (OPP) OPP determines node based on a row’s key instead of its hash Don’t use it…
Manually balancing a cluster is hard Hotspots Balancing cluster for one column family creates hotspot for another
Option 2: Use columns instead of rows Columns are always sorted Rows can store millions of columns
19
REPLICATION
Tunable replication factor (RF)
RF > 1: rows are automatically replicated to next RF-1 nodes
Tunable replication strategy «Ensure two replicas in
different data centers, racks, etc.» Node 3
Token 128
Node 2Token 64
Node 4Token 192
Node 1Token 0
Replica 1 of row
«foobar»
Replica 2 of row
«foobar»
20
CLIENT ACCESS
Clients can send read and write requests to any node
This node will act as coordinator
Coordinator forwards request to nodes where data resides
Node 3Token 128
Node 2Token 64
Node 4Token 192
Node 1Token 0
Client
Request:
insert( "foobar": { "email": "[email protected]" })
Replica 2 of row
«foobar»
Replica 1 of row
«foobar»
21
CONSISTENCY LEVELS
For all requests, clients can set a consistency level (CL)
For writes: CL defines how many replicas must be written before «success» is
returned to client
For reads: CL defines how many replicas must respond before result is
returned to client
Consistency levels: ONE QUORUM ALL … (data center-aware levels)
22
INCONSISTENT DATA
Example scenario: Replication factor 2 Two existing replica for row «foobar» Client overwrites existing columns in «foobar» Replica 2 is down
What happens: Column is updated in replica 1, but not replica 2 (even with
CL=ALL !)
Timestamps to the rescue Every column has a timestamp Timestamps are supplied by clients Upon read, column with latest timestamp wins
→Use NTP
23
PREVENTING INCONSISTENCIES
Read repair
Hinted handoff
Anti entropy
24
RETRIEVING DATA (API)
At a row level, you can… Get all rows Get a single row by specifying its key Get a number of rows by specifying their keys Get a range of rows
Only with OPP, strongly discouraged
At a column level, you can… Get all columns Get a single column by specifying its name Get a number of columns by specifying their names Get a range of columns by specifying the name of the first
and last column
Again: no ranges of rows
25
CASSANDRA QUERY LANGUAGE (CQL)
UPDATE users SET
"email" = "[email protected]",
"phone" = "123-456-7890"
WHERE KEY = "christof";
"users": {
"christof": { "email": "[email protected]", "phone": "123-456-7890" }
"moritz": { "email": "[email protected]", "web": "www.example.com" }
}
26
CASSANDRA QUERY LANGUAGE (CQL)
SELECT * FROM users WHERE KEY = "christof";
"users": {
"christof": { "email": "[email protected]", "phone": "123-456-7890" }
"moritz": { "email": "[email protected]", "web": "www.example.com" }
}
27
CASSANDRA QUERY LANGUAGE (CQL)
SELECT FIRST 3 "2012-01-30 00:00:00 +0100" ..
"2012-01-31 23:59:59 +0100"
FROM logins
WHERE KEY = "christof";
"logins": {
"christof": { "2012-01-29 16:22:30 +0100": "208.115.113.86", "2012-01-30 07:48:03 +0100": "66.249.66.183", "2012-01-30 18:06:55 +0100": "208.115.111.70", "2012-01-31 12:37:26 +0100": "66.249.66.183" "2012-02-09 10:27:28 +0100": "218.315.37.21", }
"moritz": { "2012-01-23 01:12:49 +0100": "205.209.190.116" }
}
28
SECONDARY INDICES
Secondary indices can be defined for (single) columns
Secondary indices only support equality predicate (=) in queries
Each node maintains index for data it owns When indexed column is queried, request must be forwarded
to all nodes Sometimes better to manually maintain your own index
29
PRODUCTION EXPERIENCE
No stability issues
Very fast
Language bindings don’t have the same quality Out of sync, bugs
Data model is a mental twist
Design-time decisions sometimes hard to change
Rudimentary access control
30
TRYING OUT CASSANDRA
DataStax website Company founded by Cassandra developers Provides
Documentation Amazon Machine Image
Apache website
Mailing lists
31
CLUSTER AT SCANDIT
Several nodes in two data centers
Linux machines
Identical setup on every node Allows for easy failover
32
NODE ARCHITECTURE
Apache HTTP Server
Background WorkersRuby
Apache Cassandradata store
Website & REST APIRuby on Rails, Rack
BeanstalkdMessage queue
to other nodes
from
mob
ile a
pps
and
web
bro
wse
rs
Phusion Passengermod_passenger
MapReduce
THANK YOU!
www.scandit.com