
IoT: what about data storage?
Vladimir Rodionov, Staff Software Engineer
© Hortonworks Inc. 2011 – 2016. All Rights Reserved


IoT data stream

– Sequence of data points
– Triplet: [ID][TIME][VALUE] – basic time series
– Multiplet: [ID][TIME][TAG1][…][TAGN][VALUE] – time series with tags
– Sometimes with location – spatial data
– But still, strictly, time series
– Do we have a good time-series data store? Open source? And commercially supported?
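A data point in such a stream can be sketched as a tiny Java type (the names here are illustrative, not from the talk):

```java
import java.util.Map;

// Illustrative model of a single IoT data point: the basic triplet
// [ID][TIME][VALUE], optionally extended with tags into a multiplet.
public class DataPoint {
    public final String id;                // series identifier
    public final long timestamp;           // epoch millis
    public final double value;             // measurement
    public final Map<String, String> tags; // empty for a plain triplet

    public DataPoint(String id, long timestamp, double value, Map<String, String> tags) {
        this.id = id;
        this.timestamp = timestamp;
        this.value = value;
        this.tags = tags;
    }

    public boolean isTriplet() { return tags.isEmpty(); }
}
```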



Apache HBase

– Open source
– Scalable, distributed NoSQL data store
– Commercially supported
– Temporal? Sure, you can do temporal stuff! Out of the box?


Time Series DB requirements

– The data store MUST preserve temporal locality of data for better in-memory caching
– The data store MUST provide efficient compression
  – Time series are highly compressible (less than 2 bytes per data point in some cases)
  – Facebook's custom compression codec produces less than 1.4 bytes per data point
– The data store MUST provide automatic time-based rollup aggregations (sum, count, avg, min, max, etc.) by minute, hour, day and so on – configurable. Most of the time it is aggregated data we are interested in.
– Efficient caching policy (RAM/SSD)
– SQL API (nice to have, but optional)
– Support for IoT use cases (write/read ratio up to 99/1, millions of ops)
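The rollup requirement can be illustrated with a minimal batch sketch (illustrative only, not how any particular store implements it): bucket raw points by minute and keep count/sum/min/max, from which avg falls out.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of a time-based rollup: bucket raw points by minute
// and keep count/sum/min/max, from which avg is derived. Real stores do
// this incrementally; this batch version only shows the aggregation shape.
public class MinuteRollup {
    public static class Agg {
        public long count;
        public double sum;
        public double min = Double.MAX_VALUE;
        public double max = -Double.MAX_VALUE;
        public double avg() { return sum / count; }
    }

    // timestamps in epoch millis; one Agg per minute bucket
    public static Map<Long, Agg> rollup(long[] ts, double[] values) {
        Map<Long, Agg> buckets = new TreeMap<>();
        for (int i = 0; i < ts.length; i++) {
            long minute = ts[i] / 60_000L;
            Agg a = buckets.computeIfAbsent(minute, k -> new Agg());
            a.count++;
            a.sum += values[i];
            a.min = Math.min(a.min, values[i]);
            a.max = Math.max(a.max, values[i]);
        }
        return buckets;
    }
}
```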


Ideal HBase Time Series DB

– Keeps raw data for hours
– Does not compact raw data at all
– Preserves raw data in the memory cache for periodic compactions and time-based rollup aggregations
– Stores full-resolution data only in compressed form
– Has different TTLs for different aggregation resolutions:
  – Days for by_min, by_10min, etc.
  – Months or years for by_hour
– Compaction should preserve temporal locality of both full-resolution and aggregated data
– Integration with Phoenix (SQL)


Write Path (for 99%)


Time Series DB HBase – write-path diagram: raw events land in the Region Server, where a Compressor coprocessor and an Aggregator coprocessor run; data is persisted to HDFS in three column families:

– CF:Raw – TTL hours
– CF:Compressed – TTL days/months
– CF:Aggregates – TTL months/years (one CF per resolution)


HBASE-14468 FIFO compaction

– First-in, first-out; no compaction at all
– TTL-expired data just gets archived
– Ideal for raw data storage
– No compaction – no block cache thrashing
– Raw data can be cached on write or on read
– Sustains 100s of MB/s write throughput per RegionServer
– Available in 0.98.17, 1.1+, 1.2+, HDP-2.4+
– Can easily be back-ported to 1.0 (do we need this?)
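A column family can be pointed at the FIFO policy through its per-family configuration. A minimal sketch, assuming hbase-client (0.98.17+/1.1+/1.2+) on the classpath; the family name "raw" and the 6-hour TTL are illustrative, not from the talk:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;

// Sketch: a column family configured for FIFO compaction (HBASE-14468).
public class FifoRawFamily {
    public static HColumnDescriptor build() {
        HColumnDescriptor raw = new HColumnDescriptor("raw");
        raw.setTimeToLive(6 * 3600); // raw data is kept for hours only
        raw.setConfiguration(
            "hbase.hstore.defaultengine.compactionpolicy.class",
            "org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy");
        return raw;
    }
}
```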


Exploring (Size-Tiered) Compaction

– Does not preserve temporal locality of data
– Compaction thrashes the block cache, so no efficient caching of data is possible
– It hurts the most-recent-most-valuable data access pattern
– Compression/aggregation becomes very heavy: reading recent raw data back to run it through the compressor requires many IO operations, because we cannot guarantee that recent data is in the block cache


HBASE-15181 Date Tiered Compaction

– DateTieredCompactionPolicy, inspired by CASSANDRA-6602
– Works better for time series than ExploringCompactionPolicy
– Better temporal locality helps with reads
– A good choice for compressed full-resolution and aggregated data
– Available in 0.98.17 and 1.2+; HDP-2.4 has it as well


Exploring Compaction + Max Size

– Set hbase.hstore.compaction.max.size
– This emulates date-tiered compaction
– Preserves temporal locality of data: data points that are close in time are stored in the same file, distant ones in separate files
– Compaction works better with the block cache, so more efficient caching of recent data is possible
– Good for the most-recent-most-valuable data access pattern
– Use it for compressed and aggregated data; it helps to keep recent data in the block cache
– We refer to this combination as ECPM
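A sketch of the ECPM setting, assuming hbase-client on the classpath; the family name and the 512 MB cap are illustrative, not from the talk:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;

// Sketch of "ECPM": keep the default ExploringCompactionPolicy but cap the
// size of files it may rewrite, so old, large files stop being recompacted
// and temporal locality of the store files is preserved.
public class EcpmFamily {
    public static HColumnDescriptor build() {
        HColumnDescriptor cf = new HColumnDescriptor("compressed");
        cf.setConfiguration("hbase.hstore.compaction.max.size",
            String.valueOf(512L * 1024 * 1024)); // illustrative 512 MB cap
        return cf;
    }
}
```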


HBASE-14496 Delayed compaction

– Files are eligible for minor compaction only if their age > delay
– Good for applications where the most recent data is the most valuable
– Prevents the block cache from thrashing on recent data due to frequent minor compactions of fresh store files
– Will enable this feature for ExploringCompactionPolicy
– Improves read latency for the most recent data
– ECP + Max size + Delay (1-2 days), or ECPMD, is a good option for compressed full-resolution and aggregated data
– Patch available; HBase 1.0+ (can be back-ported to 0.98)


Time Series DB HBase – the same write-path diagram (raw events → Region Server with Compressor and Aggregator coprocessors → HDFS), now with compaction policies assigned: FIFO for CF:Raw (TTL hours); ECPM or DTCP for CF:Compressed (TTL days/months) and for CF:Aggregates (TTL months/years, one CF per resolution).


HBase Block Cache and Time Series

– The current policy (LRU) is not optimal for time-series applications
– We need something similar to FIFO (both in RAM and on SSD)
– We need support for TB-size RAM/SSD-based caches
– The current off-heap bucket cache does not scale well (it keeps keys in the Java heap)
– For an SSD cache we could mirror the most recent store files, providing FIFO semantics without the complexity of disk-based cache management
– All of the above are future work items, but today:
  – Disable the cache for raw data (prevents extreme cache churn)
  – Enable cache-on-write/read for compressed data and aggregations
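The per-family caching advice above can be sketched as follows, assuming hbase-client on the classpath; family names are illustrative:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;

// Sketch: disable the block cache for raw data, cache compressed data on write.
public class CacheSettings {
    public static HColumnDescriptor rawFamily() {
        HColumnDescriptor raw = new HColumnDescriptor("raw");
        raw.setBlockCacheEnabled(false); // raw data would only churn the cache
        return raw;
    }

    public static HColumnDescriptor compressedFamily() {
        HColumnDescriptor cf = new HColumnDescriptor("compressed");
        cf.setCacheDataOnWrite(true); // keep freshly written blocks hot
        return cf;
    }
}
```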


Flexible Retention Policies

– Raw: hours
– Compressed: months
– Aggregates: years


Read/Write IO Reduction (estimate for 250K/sec data points)

Chart: relative read/write IO. Base: 100 (50-100 MB/s). FIFO + ECPM: ~50 (25-50 MB/s). +Compaction: ~10 (5-10 MB/s).
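One hedged way to sanity-check the "Base" bar, assuming roughly 40 bytes per raw KeyValue and a 5-10x compaction write amplification (both are assumptions, not figures from the talk):

```java
// Back-of-the-envelope IO estimate. The 40-byte KeyValue size and the
// 5-10x compaction write amplification are assumptions, not talk figures:
// 250K points/sec * 40 B ≈ 9.5 MB/s of ingest, which amplified 5-10x
// lands roughly in the 50-100 MB/s "Base" range.
public class IoEstimate {
    public static double ingestMBps(long pointsPerSec, int bytesPerPoint) {
        return pointsPerSec * (double) bytesPerPoint / (1024 * 1024);
    }
}
```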


Summary

– Disable major compaction
– Do not run the HDFS balancer
– Disable HBase auto region balancing: balance_switch false
– Disable region splits (DisabledRegionSplitPolicy)
– Pre-split the table in advance
– Have separate column families for raw, compressed and aggregated data (each aggregate resolution gets its own family)
– Increase hbase.hstore.blockingStoreFiles for all column families
– FIFO for raw data; ECPM(D) or DTCP (next session) for compressed and aggregated data
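The summary items that are table settings can be tied together in one descriptor. A sketch assuming hbase-client on the classpath; all names, TTLs and values are illustrative, not from the talk:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;

// Sketch: one table, splits disabled, separate families per data class.
public class TsdbTable {
    public static HTableDescriptor build() {
        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("tsdb"));
        table.setRegionSplitPolicyClassName(
            "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");

        HColumnDescriptor raw = new HColumnDescriptor("raw");
        raw.setTimeToLive(6 * 3600); // hours
        raw.setConfiguration(
            "hbase.hstore.defaultengine.compactionpolicy.class",
            "org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy");

        HColumnDescriptor compressed = new HColumnDescriptor("compressed");
        compressed.setTimeToLive(90 * 24 * 3600); // months

        HColumnDescriptor aggHour = new HColumnDescriptor("agg_hour"); // one CF per resolution
        aggHour.setTimeToLive(2 * 365 * 24 * 3600); // years

        for (HColumnDescriptor cf : new HColumnDescriptor[]{raw, compressed, aggHour}) {
            cf.setConfiguration("hbase.hstore.blockingStoreFiles", "1000");
            table.addFamily(cf);
        }
        return table; // pre-split at creation: admin.createTable(build(), splitKeys)
    }
}
```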


Summary (continued)

– Periodically run an internal job (coprocessor) to compress data and produce time-based rollup aggregations
– Do not cache raw data; use write/read caching for the others (if ECPM(D))
– Enable WAL compression to decrease write IO
– Use maximum compression for raw data (GZ) to decrease write IO
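The two IO-saving settings above look like this in a sketch, assuming hbase-client on the classpath; the family name is illustrative. GZ compression is a per-family setting, while WAL compression is a site-level property:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.io.compress.Compression;

// Sketch: maximum (GZ) compression for the raw family. WAL compression is
// enabled in hbase-site.xml instead:
//   <property>
//     <name>hbase.regionserver.wal.enablecompression</name>
//     <value>true</value>
//   </property>
public class RawFamilyCompression {
    public static HColumnDescriptor build() {
        HColumnDescriptor raw = new HColumnDescriptor("raw");
        raw.setCompressionType(Compression.Algorithm.GZ);
        return raw;
    }
}
```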


Read Path (for 1%)


SQL (Phoenix) integration

– Each time series has a set of named attributes, which we call meta (tags in OpenTSDB)
– Keep time-series meta in Phoenix table(s)
– Adding, deleting or updating a time series is a DML/DDL operation on a Phoenix table
– Meta is (mostly) static
– Define the set of meta attributes that forms the PK, and translate the PK to a unique ID
– Store ID, RTS (reversed time stamp) and VALUE in HBase
– Now you can index time series by any attribute(s) in Phoenix
– A query is a two-step process: Phoenix first, to select the list of IDs, then HBase, to run the query on the ID list
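The [ID][RTS] part of this layout can be sketched as a row-key encoder (illustrative; a production key might also be salted, which is omitted here). The reversed timestamp makes the newest point of a series sort first:

```java
import java.nio.ByteBuffer;

// Sketch of a time-series row key: fixed-width numeric ID followed by a
// reversed timestamp (RTS), so newer points sort before older ones under
// HBase's unsigned lexicographic byte ordering.
public class TsRowKey {
    public static byte[] encode(int id, long timestampMillis) {
        long rts = Long.MAX_VALUE - timestampMillis; // reversed time stamp
        return ByteBuffer.allocate(12).putInt(id).putLong(rts).array();
    }

    public static long decodeTimestamp(byte[] key) {
        return Long.MAX_VALUE - ByteBuffer.wrap(key).getLong(4);
    }
}
```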


Query Flow

Phoenix SQL – time-series definition (META):

  ID   Active  Version  MFG
  11   true    1.1      SA
  12   true    1.3      SA
  15   true    1.4      GE
  17   true    1.1      GE
  …    …       …        …
  345  false   1.0      SA

HBase Time Series DB – time-series data:

  ID   Timestamp  Value
  11   143897653  10.0
  12   143897753  11.3
  15   143897953  11.6
  17   143897853  11.9
  …    …          …
  345  143897753  11.0

1) SELECT ID FROM META WHERE MFG = 'SA' AND Version = '1.1'  →  ID set
2) GetAvgByIdSet(ID set, now(), now() - 24h)
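Using the example rows above, the two-step flow can be simulated in memory (illustrative only: `selectIds` stands in for the Phoenix query, `avgByIdSet` for the HBase-side aggregation):

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory simulation of the two-step query flow: step 1 filters META
// ("Phoenix"), step 2 aggregates values for the resulting ID set ("HBase").
public class QueryFlow {
    // META row: active flag, version, manufacturer
    public record Meta(boolean active, String version, String mfg) {}

    public static Set<Integer> selectIds(Map<Integer, Meta> meta, String mfg, String version) {
        Set<Integer> ids = new TreeSet<>();
        for (Map.Entry<Integer, Meta> e : meta.entrySet())
            if (e.getValue().mfg().equals(mfg) && e.getValue().version().equals(version))
                ids.add(e.getKey());
        return ids;
    }

    public static double avgByIdSet(Map<Integer, Double> data, Set<Integer> ids) {
        return ids.stream().mapToDouble(data::get).average().orElse(Double.NaN);
    }
}
```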


Time-Series DB API

– Group operations on ID sets by time range: min, max, avg, count, sum and other aggregations
– Pluggable aggregation functions
– Support for different time resolutions, with different approximations (linear, cubic, bi-cubic)
– Batch load support (for writes)
– Can be implemented in an HBase coprocessor layer
– Can work much, much faster than a regular SQL DBMS, because we already have aggregated data


Thank you

Q&A