Deletes Without Tombstones or TTLs

Eric Stevens, Principal Architect, ProtectWise, Inc.

Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016


Page 1: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Deletes Without Tombstones or TTLs

Eric Stevens, Principal Architect

ProtectWise, Inc.

Page 2: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

©2016 ProtectWise, Inc. All rights reserved.

About ProtectWise

An enterprise security company that records, analyzes, and visualizes your network on demand to detect complex threats that others can’t see

Big Data

Data Ingestion and Availability

● Well north of a billion new records per day

● Processed, analyzed, and stored in soft real time

● Fully indexed and searchable with p95 query response times <1 second

○ Shortening the OODA loop

● Hundreds of Cassandra servers

● Hundreds of Billions of Records

● Multiple Petabytes of Data

Page 3: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Use Case – Super Bowl 50

The Broncos weren’t the only team from Denver in Levi’s Stadium

With one sensor, ProtectWise captured the following data at Super Bowl 50:

● 8.806 Terabytes of data seen. Primarily HTTP, SSL and traffic to Amazon AWS, Facebook, Twitter, and Instagram.

● 1.550 Terabytes of data captured (82% optimization)

● 17 million URLs hit

● 8,085,949 DNS requests

With a single sensor deployed on the Levi's Public Wi-Fi Network, ProtectWise captured 8.806 Terabytes of data and was able to optimize it by 82% to just 1.550 Terabytes of data, a true testament to the scale and power of our platform.

Page 4: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Overview

● How Deletes (tombstones) in Cassandra Work Today

● The Limitations of Tombstones

● Misconceptions about Tombstones

● How TTL (Time to Live) in Cassandra works today

● The limitations of TTLs

● Why neither strategy works for ProtectWise

● Our unconventional solution

● Advantages of our solution

● Disadvantages of our solution

Page 5: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

The Trouble with Tombstones

Terrific

● Surgically target data for removal

● Easy to reason about from a read consistency perspective

Terrible

● Increases both write and read I/O pressure

● Not an effective means of reclaiming disk capacity

● May be difficult to locate correct records for deletion

● Makes reads more expensive

● Actual tombstones can often greatly outlive their deleted data (much longer than gc_grace)

Page 6: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

When do tombstones (and expired TTL’d records) go away?

● Never before they are gc_grace old (this is a good thing, and you get to control it)

● During compaction, for a tombstone past gc_grace, its partition key is checked against the bloom filters of all other SSTables for the given CQL table.

● If there is a bloom filter collision, the tombstone will remain, even if the bloom filter collision was a false positive

● If there is ANY data for that partition in any other SSTable, even other tombstones, the tombstone will not get cleaned up

● If bloom filters indicate there is no chance of overlap on that partition key, the tombstone will get cleaned up
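The purge rules on this slide can be sketched as a small Python model. This is a simplified illustration of the rules as stated here, not Cassandra's actual code: `TinyBloomFilter` and `can_purge_tombstone` are hypothetical names, and the real compaction path applies further checks that this sketch omits.

```python
import hashlib


class TinyBloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=64, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def can_purge_tombstone(partition_key, tombstone_age_s, gc_grace_s, other_filters):
    """Apply the slide's rules: old enough AND no other SSTable might hold the key."""
    if tombstone_age_s < gc_grace_s:
        return False  # never purged before gc_grace has elapsed
    # Any Bloom filter hit (even a false positive) keeps the tombstone alive.
    return not any(f.might_contain(partition_key) for f in other_filters)


key = bytes.fromhex("cafebabe")
holds_key = TinyBloomFilter()
holds_key.add(key)
empty = TinyBloomFilter()

print(can_purge_tombstone(key, 100, 10, [empty]))             # True
print(can_purge_tombstone(key, 100, 10, [empty, holds_key]))  # False
print(can_purge_tombstone(key, 5, 10, [empty]))               # False
```

The second call shows why tombstones outlive their data: one other SSTable that might contain the partition is enough to block the purge.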

Page 7: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Misconception about Tombstone Performance

● The performance degradation from tombstones isn’t from the tombstone itself.

● If you do

○ for (n <- 0 to 100000) { INSERT INTO table (partitionKey, clusterKey) VALUES ( 1, n ) }

● You can later create a range tombstone that is tiny in terms of bytes:

○ DELETE FROM table WHERE partitionKey = 1 AND clusterKey < 99999

● But if you then

○ SELECT * FROM table WHERE partitionKey = 1 LIMIT 1

● Cassandra will have to read then discard rows with clusterKey values from 0 to 99998 before the LIMIT 1 can be reached
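A rough way to see the cost is to count the discarded rows in a toy model of the read. This is only a sketch of the idea on this slide, assuming rows are scanned in clustering order; `select_with_limit` is a hypothetical helper, not a Cassandra API.

```python
def select_with_limit(clustering_keys, tombstone_upper_bound, limit):
    """Scan keys in clustering order, skipping rows shadowed by the range tombstone."""
    live, discarded = [], 0
    for clustering_key in clustering_keys:
        if clustering_key < tombstone_upper_bound:
            discarded += 1  # the row was read from disk, then thrown away
            continue
        live.append(clustering_key)
        if len(live) == limit:
            break
    return live, discarded


# Scala's "0 to 100000" is inclusive, so the partition holds 100001 rows.
rows = range(0, 100001)
live, discarded = select_with_limit(rows, tombstone_upper_bound=99999, limit=1)
print(live, discarded)  # [99999] 99999
```

The tombstone is a single small marker, yet the LIMIT 1 query still pays for every shadowed row in front of the first live one.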

Page 8: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

[Figure: SSTable 1 holds partition PK1 with clustering keys CK1 through CKn; SSTable 2 holds a tombstone for PK1, "DELETE 1 – n-1". Query shown: SELECT * FROM table WHERE pk1 LIMIT 1]

Page 9: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Compaction Review

[Figure: SSTables laid out from older data (left) to newer data (right); an arrow marks where writes land]

Page 10: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Tombstones in Compaction

[Figure: compaction timeline; a delete arrives while an older SSTable still contains the record to delete]

Page 11: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Tombstones in Compaction

[Figure: compaction timeline; other writes arrive while the SSTable containing the record to delete remains]

Page 12: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Tombstones in Compaction

[Figure: compaction timeline; other writes arrive while the SSTable containing the record to delete remains]

Page 13: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Tombstones in Compaction

[Figure: compaction timeline; other writes arrive while the SSTable containing the record to delete remains]

Page 14: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Tombstones in Compaction

[Figure: compaction timeline; other writes arrive; the record is finally deleted]

Page 15: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Tombstone Demo

Showing why tombstones are not the same thing as a delete.

Page 16: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Setup

cqlsh> CREATE TABLE testing(
   … p blob,
   … c blob,
   … v blob,
   … PRIMARY KEY(p,c)
   … ) WITH gc_grace_seconds=0;

Page 17: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Setup

cqlsh> INSERT INTO testing (p,c,v) VALUES (0xcafebabe, 0xdeadbeef, 0xfacefeed);

$ nodetool flush && ls *-Data.db

testing-testing-ka-1-Data.db

cqlsh> INSERT INTO testing (p,c,v) VALUES (0xcafebabe, 0xdeadbeef, 0xdeadc0de);

$ nodetool flush && ls *-Data.db

testing-testing-ka-1-Data.db
testing-testing-ka-2-Data.db

[Figure: SSTable 1 holds 0xcafebabe:0xdeadbeef:0xfacefeed; SSTable 2 holds 0xcafebabe:0xdeadbeef:0xdeadc0de]

Page 18: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Setup

cqlsh> select * from testing;

 p          | c          | v
------------+------------+------------
 0xcafebabe | 0xdeadbeef | 0xdeadc0de

cqlsh> DELETE FROM testing WHERE p=0xcafebabe AND c=0xdeadbeef;

$ nodetool flush && ls *-Data.db

testing-testing-ka-1-Data.db
testing-testing-ka-2-Data.db
testing-testing-ka-3-Data.db

[Figure: SSTable 1 holds 0xcafebabe:0xdeadbeef:0xfacefeed; SSTable 2 holds 0xcafebabe:0xdeadbeef:0xdeadc0de; SSTable 3 holds 0xcafebabe:0xdeadbeef:DELETE]

Page 19: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Let’s look at the data

$ hexdump testing-testing-ka-1-Data.db

0000000 4b 00 00 00 c3 00 04 ca fe ba be 7f ff ff ff 80
0000010 00 01 00 72 0a 00 04 de ad be ef 0e 00 71 05 34
0000020 3b d8 4e df f1 0d 00 14 0b 19 00 29 01 76 1a 00
0000030 70 04 fa ce fe ed 00 00 6f 9b 15 17

[Figure: SSTable 1 holds 0xcafebabe:0xdeadbeef:0xfacefeed]

Page 20: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Let’s look at the data

$ hexdump testing-testing-ka-2-Data.db

0000000 4b 00 00 00 c3 00 04 ca fe ba be 7f ff ff ff 80
0000010 00 01 00 72 0a 00 04 de ad be ef 0e 00 71 05 34
0000020 3b e3 86 df 23 0d 00 14 0b 19 00 29 01 76 1a 00
0000030 70 04 de ad c0 de 00 00 62 de 14 02

[Figure: SSTable 2 holds 0xcafebabe:0xdeadbeef:0xdeadc0de]

Page 21: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Let’s look at the data

$ hexdump testing-testing-ka-3-Data.db

0000000 33 00 00 00 c3 00 04 ca fe ba be 7f ff ff ff 80
0000010 00 01 00 94 07 00 04 de ad be ef ff 10 0a 00 f0
0000020 00 01 57 4f 2d 69 00 05 34 3b e6 ab 47 c8 00 00
0000030 db 77 12 69

[Figure: SSTable 3 holds 0xcafebabe:0xdeadbeef:DELETE]

Page 22: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Time to Compact

Simulate compaction happening on data that has been deleted, but where the tombstone is not involved in the compaction

% jmx_invoke -m org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction testing-testing-ka-1-Data.db,testing-testing-ka-2-Data.db

$ ls *-Data.db

testing-testing-ka-3-Data.db
testing-testing-ka-4-Data.db

[Figure: SSTable 1 (0xcafebabe:0xdeadbeef:0xfacefeed) and SSTable 2 (0xcafebabe:0xdeadbeef:0xdeadc0de) compact into SSTable 4, whose contents are shown as 0xcafebabe:0xdeadbeef:??????????]

Page 23: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Let’s look again:

$ hexdump testing-testing-ka-4-Data.db

0000000 4b 00 00 00 c3 00 04 ca fe ba be 7f ff ff ff 80
0000010 00 01 00 72 0a 00 04 de ad be ef 0e 00 71 05 34
0000020 3b e3 86 df 23 0d 00 14 0b 19 00 29 01 76 1a 00
0000030 70 04 de ad c0 de 00 00 62 de 14 02

[Figure: SSTable 4 holds 0xcafebabe:0xdeadbeef:0xdeadc0de]

Page 24: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

What happened?

● The tombstone for primary key (0xcafebabe,0xdeadbeef) was written in SSTable 3

● SSTable 3 wasn’t involved in the compaction

● ∴ The data at rest didn’t get cleaned up
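The demo outcome can be reproduced with a toy last-write-wins merge. This is a minimal sketch, not Cassandra's compaction code: the `Cell` type, the `compact` function, and the timestamps 1, 2, 3 are illustrative assumptions that mirror the order in which the three SSTables were flushed.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    timestamp: int
    value: Optional[str]  # None models a tombstone


def compact(sstables):
    """Merge the given SSTables, keeping the newest cell for each key."""
    merged = {}
    for sstable in sstables:
        for key, cell in sstable.items():
            if key not in merged or cell.timestamp > merged[key].timestamp:
                merged[key] = cell
    return merged


KEY = ("0xcafebabe", "0xdeadbeef")
sstable_1 = {KEY: Cell(1, "0xfacefeed")}
sstable_2 = {KEY: Cell(2, "0xdeadc0de")}
sstable_3 = {KEY: Cell(3, None)}  # the tombstone

# Only SSTables 1 and 2 are compacted, as in the demo.
sstable_4 = compact([sstable_1, sstable_2])
print(sstable_4[KEY].value)  # 0xdeadc0de

# Had the tombstone taken part, it would have won the merge.
print(compact([sstable_1, sstable_2, sstable_3])[KEY].value)  # None
```

The merge can only act on the SSTables it is given, so the deleted value survives on disk until its tombstone joins the same compaction.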

Page 25: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Why is this a problem?

● In all mainline compaction strategies:

○ Data written close together chronologically tends to compact together relatively quickly

○ Data written chronologically far apart tends to take a long time to compact together

■ This is why it’s an anti-pattern to append to or overwrite the same partition over long periods of time: your reads to that partition will end up needing to read out of a large number of SSTables

○ Because disk capacity is not recovered until the tombstone and its underlying data are involved in the same compaction, it can take a long time to recover disk capacity

● Some compaction strategies (DateTiered, TimeWindowed) have controls that allow for data to permanently stop compacting.

○ Under these conditions, it can become impossible to ever recover disk capacity

Note: See CASSANDRA-7019 for an upcoming alternative

Also “Improving Tombstone Compactions” today at 4:10 in 210C

Page 26: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

The Trouble with TTLs

Terrific

● Fire and forget, your data will “go away” fairly predictably

Trouble

● Once a TTL has been written, there is no way to change your mind except to write the record again with a new TTL

● Rows written to more than one time may have inconsistent TTLs leading to dirty or incomplete reads.

● TTL’d records may remain at rest much longer than you realize in some circumstances
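The "inconsistent TTLs" point can be illustrated with a toy model in which each written column carries its own expiry. This is a sketch under that assumption, with hypothetical column names (`src_ip`, `bytes`) and helper functions; it is not a client-driver API.

```python
def write(row, columns, now, ttl):
    """Store each column with its own expiry time (now + ttl)."""
    for name, value in columns.items():
        row[name] = (value, now + ttl)


def read(row, now):
    """Return only the columns that have not yet expired."""
    return {name: value for name, (value, expires_at) in row.items() if expires_at > now}


row = {}
write(row, {"src_ip": "10.0.0.1", "bytes": 512}, now=0, ttl=100)  # initial insert
write(row, {"bytes": 2048}, now=50, ttl=100)                      # later partial update

print(read(row, now=90))   # {'src_ip': '10.0.0.1', 'bytes': 2048}
print(read(row, now=120))  # {'bytes': 2048}
```

At time 120 the row is only partly there: the column rewritten later outlives the one written once, which is the incomplete read the slide warns about.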

Page 27: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Why Neither Strategy Works for Us

● Customers get to change their mind about how long they want us to retain their data

● Changing TTLs is expensive, both in terms of I/O pressure and in temporarily doubling the size of your data at rest

● Disks are cheap… lots of disks are not

● Cassandra data at rest has an ongoing cost; if a customer stops paying for it, we need to as well

● Timeliness of deletes is important

● Sensitive data spillage means we need to remove some data quickly

Page 28: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Our Unconventional Solution

Page 29: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Basic Strategy

Successfully used to delete significant amounts of data with little to no performance impact

Step 1: Set RF=1

● There are some weird anti-entropy corner cases that are solved if you disable replication

Step 2: Disconnect Drive

● If you have hot swappable drives, this is a lot easier; if not, you might have some temporary downtime due to the RF change.

Page 30: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016


Step 3

Page 31: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Deleting Compaction Strategy

Page 32: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Basic Strategy

For real this time.

Delete While Compacting

● During compaction, use deterministic logic to determine which records should be removed

● Prevent records from surviving the compaction process

● Clean up indexes at the time the record is removed

Evicting Compaction Strategy

● Records are removed in the next compaction once they are eligible for eviction

● If we need to recover capacity quickly, we can use user defined compaction to selectively target our oldest files

Page 33: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Features

Does it support feature X of my preferred compaction strategy?

Wrapping Compaction Strategy

● Acts as a parent strategy with your preferred child compaction strategy

● Child strategy is responsible for SSTable selection

● You get the characteristics of your strategy, with the deletes of our strategy

Backing up your deletes

● If you choose to, you can automatically create a backup of the deleted records

● Save yourself from deletion remorse

○ Incorrect deletion logic

○ Change of heart by you(r customer)

● Move those records to cheaper storage
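The parent/child split described on this slide might look roughly like the following. This is a sketch of the delegation pattern only: `SmallestFirstChild`, `DeletingWrapper`, and their methods are hypothetical names rather than the project's classes, SSTables are modelled as plain dicts, and the merge ignores write timestamps.

```python
class SmallestFirstChild:
    """Hypothetical child strategy: selects the N smallest SSTables."""

    def __init__(self, min_threshold=2):
        self.min_threshold = min_threshold

    def select_sstables(self, sstables):
        if len(sstables) < self.min_threshold:
            return []
        return sorted(sstables, key=len)[: self.min_threshold]


class DeletingWrapper:
    """Hypothetical parent strategy: delegates selection, adds deletion."""

    def __init__(self, child, should_delete):
        self.child = child
        self.should_delete = should_delete

    def compact_once(self, sstables):
        selected = self.child.select_sstables(sstables)  # child decides what compacts
        if not selected:
            return sstables
        merged = {}
        for sstable in selected:
            for key, value in sstable.items():
                if not self.should_delete(key):  # parent decides what survives
                    merged[key] = value
        untouched = [s for s in sstables if all(s is not sel for sel in selected)]
        return untouched + [merged]


sstables = [{"a": 1, "b": 2, "c": 3}, {"d": 4}, {"e": 5, "old-1": 6}]
wrapper = DeletingWrapper(SmallestFirstChild(), lambda key: key.startswith("old-"))
result = wrapper.compact_once(sstables)
print(result)  # [{'a': 1, 'b': 2, 'c': 3}, {'d': 4, 'e': 5}]
```

Swapping in a different child changes which SSTables get picked without touching the deletion logic, which is the property the slide is describing.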

Page 34: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Features

● Configurable and extensible

○ Several provided implementations can be reasonably surgically controlled by reading deletion rules out of a table you specify

○ Extend one of several base classes to provide more sophisticated custom logic

● Restoring backups

○ To restore accidentally deleted records, copy the backup files to the right path and run nodetool refresh

○ Or, if your topology has changed, you can restore them with sstableloader

Page 35: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

A Wrapping Compaction Strategy

Doesn’t change the fundamental characteristics of your preferred compaction strategy

ALTER TABLE bar WITH compaction = {
  'class': 'DeletingCompactionStrategy',
  'dcs_underlying_compactor': 'LeveledCompactionStrategy',
  'sstable_size_in_mb': 160
};

ALTER TABLE foo WITH compaction = {
  'class': 'DeletingCompactionStrategy',
  'dcs_underlying_compactor': 'SizeTieredCompactionStrategy',
  'min_threshold': '2',
  'max_threshold': '8'
};

Page 36: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Compaction’s Inner Workings

Credit: DataStax

https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_write_path_c.html

Page 37: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Compaction’s Inner Workings

Credit: DataStax

https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_write_path_c.html

Compaction Strategy selects SSTables

Returns SSTableIterators

Page 38: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Compaction’s Inner Workings

Credit: DataStax

https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_write_path_c.html

FilteringSSTableIterators exclude data which should be deleted, and also notify IndexManager if appropriate to clean up associated indexes.
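The filtering idea can be sketched as a lazy wrapper around a record iterator. The function and callback names below (`filtering_iterator`, `should_delete`, `on_evict`) are hypothetical stand-ins for illustration and do not reflect the project's actual classes or Cassandra's iterator interfaces.

```python
def filtering_iterator(records, should_delete, on_evict):
    """Yield surviving records lazily; report each evicted key to on_evict."""
    for key, value in records:
        if should_delete(key):
            on_evict(key)  # e.g. ask an index manager to drop entries for this key
            continue
        yield key, value


evicted = []
source = iter([("tenant-a:1", "x"), ("tenant-b:1", "y"), ("tenant-a:2", "z")])
survivors = list(
    filtering_iterator(source, lambda k: k.startswith("tenant-b:"), evicted.append)
)
print(survivors)  # [('tenant-a:1', 'x'), ('tenant-a:2', 'z')]
print(evicted)    # ['tenant-b:1']
```

Because the wrapper is an iterator itself, whatever consumes it to write the new SSTable never sees the excluded records.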

Page 39: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

An Evicting Compaction Strategy

Records involved in compaction which are convicted do not survive into the newly compacted SSTable

Rules: A => ✓, B => ✗, C => ✓, D => ✗, E => ✓

SSTable 1: A B C
SSTable 2: A B D
SSTable 3: C D E

New SSTable: A C E
Backup SSTable*: B D

* if configured to back up convicted records
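The diagram's example can be worked through in a few lines of Python. This is a sketch using the slide's own inputs; the function name `evicting_compaction` and the use of strings as SSTables are illustrative assumptions.

```python
# Rules from the slide: True means the record survives, False means it is convicted.
RULES = {"A": True, "B": False, "C": True, "D": False, "E": True}


def evicting_compaction(sstables, rules, backup=True):
    """Split compacted records into a new SSTable and an optional backup SSTable."""
    new_sstable, backup_sstable = set(), set()
    for sstable in sstables:
        for record in sstable:
            if rules[record]:
                new_sstable.add(record)
            else:
                backup_sstable.add(record)
    return sorted(new_sstable), (sorted(backup_sstable) if backup else [])


new, backed_up = evicting_compaction(["ABC", "ABD", "CDE"], RULES)
print("".join(new), "".join(backed_up))  # ACE BD
```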

Page 40: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Often Faster than Existing Compaction

● Compaction performance is often bounded by available write capacity

● Fewer records surviving into the target table reduces write pressure during compaction

● Testing of records for conviction is lightweight (depending on the complexity of your business logic), and mostly CPU bound

Page 41: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Limitations

Eventual Deletes

● Like all other baked-in deletion options, disk capacity is reclaimed only eventually

● Old SSTables still tend not to compact very frequently

● However, by triggering user defined compaction, you can reclaim space immediately without resorting to major compaction

Boundary Consistency

● Records past the deletion boundary may still be visible to your application

● You may get inconsistent reads for such records

● Evicted records may resurrect temporarily due to repair

○ They’ll end up in a new SSTable and will evict again during the next automatic compaction

Page 42: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Limitations

Requires deletion determinism

● Logic for deletes needs to be deterministic or you’ll end up with consistency issues

● Probably not a good idea to base any deletion logic on anything outside of the primary key except in narrow use cases

Repair = Resurrection

● Read repair, and in general any repair, may cause a record to fully resurrect temporarily

● Resurrected records will appear in the youngest SSTables

● They will disappear again when those new SSTables next compact (generally relatively quickly for an active cluster)
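Both limitations can be seen in a small two-replica model. This is a sketch under simplifying assumptions: replicas are plain dicts, `repair` naively copies missing records, and the primary key is a hypothetical (tenant, event_time) pair compared against a fixed cutoff.

```python
def convict(primary_key, cutoff):
    """Deterministic rule: depends only on the primary key and a fixed cutoff."""
    _tenant, event_time = primary_key
    return event_time < cutoff


def compact(replica, cutoff):
    """Drop every convicted record during compaction."""
    return {pk: value for pk, value in replica.items() if not convict(pk, cutoff)}


def repair(target, source):
    """Naive repair: copy over any record the target is missing."""
    return {**source, **target}


CUTOFF = 100
old, new = ("tenant-a", 50), ("tenant-a", 150)
replica_a = {old: "x", new: "y"}
replica_b = dict(replica_a)

replica_a = compact(replica_a, CUTOFF)    # replica A evicts the old record
replica_a = repair(replica_a, replica_b)  # repair copies it back from replica B
resurrected = old in replica_a
replica_a = compact(replica_a, CUTOFF)    # the next compaction evicts it again
print(resurrected, old in replica_a)  # True False
```

Because `convict` looks only at the primary key, every replica reaches the same verdict whenever it compacts, so the resurrection is temporary. A rule that depended on something that differs between replicas would not have that guarantee.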

Page 43: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Current Project Status

● Supports and is tested against the Cassandra 2.x series

● In 3.x the package and class names changed, so it needs to be ported

● Tests are written in Scala; they cover a lot of surface area but would need to be rewritten prior to contribution

● Needs additional general purpose convictors

● Principally tested against STCS and deserves better coverage for other child strategies

Page 44: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

Availability & Compatibility

GitHub

https://github.com/protectwise/cassandra-util

Also includes:

● Our DataStax Driver Wrapper for Scala

● Our CCM wrapper lib for automating unit tests in Scala

Page 45: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

We’re Hiring!

www.protectwise.com/careers.html

Especially if you’re in Denver!

Scala, Akka, Spark, Node, DevOps

Page 46: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016


Page 47: Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

See Our Other Talks

Cold Storage that Isn’t Glacial

Tomorrow 10:45 Room LL20D

Using Approximate Data for Small, Insightful Analytics

Tomorrow 2:00 Room LL20A