
Page 1: Physical Data Storage

Physical Data Storage

Stephen Dawson-Haggerty

Page 2: Physical Data Storage

[Overview diagram: physical data sources exposed through multiple sMAP feeds, storage components (StreamFS, Hadoop/HDFS), and applications: data exploration/visualization, control loops, demand response, analytics, mobile feedback, fault detection]

Page 3: Physical Data Storage

Time-Series Databases

• Expected workload
• Related work
• Server architecture
• API
• Performance
• Future directions

Page 4: Physical Data Storage

[Figure: Dent circuit meters exposed as sMAP sources]

Write Workload

• sMAP sources (see the sketch below)
  – HTTP/REST protocol for exposing physical information
  – Data trickles in as it is generated
  – Typical data rates: 1 reading every 1-60 s
• Bulk imports
  – Existing databases
  – Migrations
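To make the first bullet concrete, here is a minimal Python sketch of polling an sMAP-style HTTP/REST feed. The URL and the "Readings" field are illustrative assumptions for this sketch, not the actual sMAP schema these sources implement.

import json
import time
import urllib.request

SOURCE_URL = "http://meter.example.local/data/dent0"  # hypothetical sMAP-style feed

def poll_once(url):
    # Fetch the feed and pull out a list of (unix timestamp, value) pairs.
    # The "Readings" key is an assumption for this sketch.
    with urllib.request.urlopen(url) as resp:
        doc = json.loads(resp.read())
    return doc.get("Readings", [])

# Data trickles in as it is generated, so an importer simply polls at roughly
# the source's reporting rate (typically one reading every 1-60 s) and forwards
# each new point to the time-series store.
if __name__ == "__main__":
    for _ in range(3):                 # bounded loop for the sketch
        for ts, value in poll_once(SOURCE_URL):
            print(ts, value)
        time.sleep(60)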

Page 5: Physical Data Storage

Read Workload

• Plotting engine
• Matlab & Python adaptors for analysis
• Mobile apps
• Batch analysis

Dominated by range queries

Latency is important for interactive data exploration

Page 6: Physical Data Storage

[Server architecture diagram: the readingdb process exposes a time-series interface (insert, resample, aggregate, query) over a streaming RPC pipeline, with bucketing and compression layered above a key-value store (page cache, lock manager, storage allocation); a separate storage mapper component uses SQL/MySQL]

Page 7: Physical Data Storage

Time series interface

db_open()
db_query(streamid, start, end): query points in a range
db_next(streamid, ref), db_prev(...): query points near a reference time
db_add(streamid, vector): insert points into the database
db_avail(streamid): retrieve the storage map
db_close()

All data is part of a stream, identified only by its streamid

A stream is a series of tuples: (timestamp, sequence, value, min, max)
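A hedged usage sketch of the interface above, written against a hypothetical Python adaptor; the module name and return-value shapes are assumptions, while the call names and arguments follow the slide.

import readingdb as rdb   # hypothetical Python adaptor for the server

rdb.db_open()

# Insert a vector of (timestamp, sequence, value, min, max) tuples into stream 42.
rdb.db_add(42, [(1300000000, 0, 20.5, 20.5, 20.5),
                (1300000060, 1, 20.7, 20.7, 20.7)])

# Range query: every point of stream 42 in a one-hour window.
points = rdb.db_query(42, 1300000000, 1300003600)

# Points adjacent to a reference time, and the stream's storage map.
after = rdb.db_next(42, 1300000000)
before = rdb.db_prev(42, 1300003600)
print(rdb.db_avail(42))

rdb.db_close()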

Page 8: Physical Data Storage

Storage Manager: BDB

• Berkeley Database: embedded key-value store
• Stores binary blobs using B+ trees
• Very mature: around since 1992; supports transactions, free-threading, replication
• We use version 4
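For a feel of what the storage manager provides, a minimal sketch using the bsddb3 Python bindings for Berkeley DB (the server embeds the C library directly; the key and value bytes here are placeholders).

from bsddb3 import db  # Python bindings for Berkeley DB

# Open (or create) a B+ tree database; the real server also sets up an
# environment with transactions and locking, omitted here for brevity.
buckets = db.DB()
buckets.open("buckets.db", None, db.DB_BTREE, db.DB_CREATE)

# Values are opaque binary blobs (compressed buckets); byte keys sort
# lexicographically, which is what makes range scans cheap.
key = bytes.fromhex("0000002a" "4d6b2780")   # placeholder (streamid, timestamp) key
buckets.put(key, b"<compressed bucket bytes>")
print(buckets.get(key))

buckets.close()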

Page 9: Physical Data Storage

RPC Evolution

• First: shared memory
  – Low latency
• Move to threaded TCP
• Google protocol buffers
  – Zig-zag integer representation, multiple language bindings
  – Extensible for multiple versions
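Since the slide highlights protocol buffers' zig-zag integer representation, here is a small Python illustration of that mapping (the server uses generated protobuf code, not hand-written encoders like this).

def zigzag_encode(n, bits=64):
    # Interleave signed values so small magnitudes become small unsigned
    # numbers: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4 (cheap to varint-encode).
    return (n << 1) ^ (n >> (bits - 1))

def zigzag_decode(z):
    return (z >> 1) ^ -(z & 1)

assert [zigzag_encode(v) for v in (0, -1, 1, -2, 2)] == [0, 1, 2, 3, 4]
assert all(zigzag_decode(zigzag_encode(v)) == v for v in range(-1000, 1000))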

Page 10: Physical Data Storage

On-Disk Format

• All data stores perform poorly with one key per reading
  – Index size is high
  – Unnecessary
• Solution: bucket readings
• Excellent locality of reference with B+ tree indexes
  – Data sorted by (streamid, timestamp)
  – Range queries translate into mostly large sequential IOs

[Diagram: readings grouped into buckets keyed by (streamid, timestamp)]
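A minimal sketch of the bucketing idea: readings fall into fixed-width time buckets, and each bucket is keyed by (streamid, bucket start) packed big-endian, so lexicographic B+ tree order matches (streamid, timestamp) order. The bucket width and exact key layout below are assumptions, not the project's actual on-disk format.

import struct

BUCKET_SECONDS = 3600  # assumed bucket width for this sketch

def bucket_key(streamid, timestamp):
    # Big-endian packing keeps byte-wise key order identical to numeric order,
    # so one stream's buckets are laid out contiguously in the B+ tree.
    start = timestamp - (timestamp % BUCKET_SECONDS)
    return struct.pack(">IQ", streamid, start)

# Readings from the same hour of the same stream share a bucket key,
# and all of stream 7 sorts before any of stream 8.
assert bucket_key(7, 1300000010) == bucket_key(7, 1300000900)
assert bucket_key(7, 0) < bucket_key(8, 0)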

Page 11: Physical Data Storage

On-Disk Format

• Represent in memory with a materialized structure: 32b/rec
  – Inefficient on disk: lots of repeated data, missing fields
• Solution: compression
  – First: delta-encode each bucket in a protocol buffer
  – Second: Huffman tree or run-length encoding (zlib)
• Combined compression is 2x better than gzip or either one alone
• 1M rec/second compress/decompress on modest hardware

[Diagram: buckets are compressed before being written to BDB pages]
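A rough Python sketch of the two-stage scheme above: delta-encode a bucket, then compress it with zlib. Fixed 16-byte records stand in for the protocol buffer used by the real server, purely for illustration.

import struct
import zlib

def compress_bucket(timestamps, values):
    # Delta-encode timestamps so regular sampling turns into long runs of
    # identical small deltas, which zlib then squeezes well.
    deltas = [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]
    raw = b"".join(struct.pack(">qd", d, v) for d, v in zip(deltas, values))
    return zlib.compress(raw)

def decompress_bucket(blob):
    raw = zlib.decompress(blob)
    recs = [struct.unpack(">qd", raw[i:i + 16]) for i in range(0, len(raw), 16)]
    ts, vals, t = [], [], 0
    for d, v in recs:
        t += d
        ts.append(t)
        vals.append(v)
    return ts, vals

ts = list(range(1300000000, 1300003600, 60))   # one reading per minute
vals = [20.0] * len(ts)
blob = compress_bucket(ts, vals)
assert decompress_bucket(blob) == (ts, vals)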

Page 12: Physical Data Storage

Other Services: Storage Mapping

• What is in the database?
  – Compute a set of tuples (start, end, n)
  – The desired interpretation is "the data source was alive"
• Different data sources have different ways of maintaining this information and maintaining confidence
  – Sometimes you have to infer it from the data
  – Sometimes data sources give you liveness/presence guarantees: "I haven't heard from you in an hour, but I'm still alive!"

Dead or alive?
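To illustrate the "infer it from the data" case, a small sketch that collapses a stream's timestamps into (start, end, n) regions, declaring the source dead whenever a gap exceeds a threshold; both the threshold and the rule are assumptions, not the project's actual heuristic.

def storage_map(timestamps, max_gap=300):
    # Collapse sorted timestamps into (start, end, n) regions; a gap longer
    # than max_gap seconds is treated as the source having been dead.
    if not timestamps:
        return []
    regions = []
    start = prev = timestamps[0]
    n = 1
    for t in timestamps[1:]:
        if t - prev > max_gap:
            regions.append((start, prev, n))
            start, n = t, 0
        prev = t
        n += 1
    regions.append((start, prev, n))
    return regions

# A source that reported for two minutes, went silent, then came back:
assert storage_map([0, 60, 120, 7200, 7260]) == [(0, 120, 3), (7200, 7260, 2)]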

Page 13: Physical Data Storage

readingdb6

• Up since December, supporting Cory Hall, SDH Hall, and most other LoCal deployments
  – Behind www.openbms.org
• > 2 billion points in 10k streams
  – 12 GB on disk, ~5 B/rec including the index
  – So... we fit in memory!
• Imports at around 300k points/sec
  – We maxed out the NIC

Page 14: Physical Data Storage

Low Latency RPC

Page 15: Physical Data Storage

Compression ratios

Page 16: Physical Data Storage

Write load

Importing old data: 150k points/sec
Continuous write load: 300-500 pts/sec

Page 17: Physical Data Storage

Future thoughts

• A component of a cloud storage stack for physical data
• Hadoop adaptor: improve MapReduce performance over an HBase-based solution
• The data is small: 2 billion points in 12 GB
  – We can go a long time without distributing this very much
  – Distribution is probably necessary for reasons other than performance

Page 18: Physical Data Storage

THE END