Webinar: Dyn + DataStax - helping companies deliver exceptional end-user experience

Dyn + DataStax: Helping Companies Deliver Exceptional End-User Experience

May 17, 2016

Tim Chadwick, Principal Engineer, Infrastructure, Dyn
Rick Bross, Principal Engineer, Scalability, Dyn

The Story at Dyn

The Road to Production

Lessons and Direction

Journey to DataStax Enterprise

The Story at Dyn

Dyn is a cloud-based Internet Performance Management (IPM) company that provides unrivaled visibility and control into cloud and public Internet resources.

Dyn’s platform monitors, controls and optimizes applications and infrastructure through Data, Analytics, and Traffic Steering, ensuring traffic gets delivered faster, safer, and more reliably than ever.

http://techcrunch.com/2016/05/10/dyn-series-b/

DNS Overview

Dyn Global: 20+ Data Centers

tchadwick@piedmont:~$ dig SOA ifc.com | grep -A 1 "ANSWER SECT"
;; ANSWER SECTION:
ifc.com. 7175 IN SOA ns1.p28.dynect.net. postmaster.ifc.com. 2016042900 3600 600 604800 1800

Build a sustainable system that can track usage by customer and zone (domain).

Who Needs These Data?

The consumers are our customers, our billing department, and Chris Baker.

For each five-minute interval of an invoice period, determine the Queries per Second (QPS), then sort the values in descending order.

Discard the top 5%; the maximum remaining value is the customer’s 95th percentile, which is the monthly bill rate.
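
The billing rule above can be sketched in a few lines of Python. This is a minimal illustration, not Dyn's billing code: the function name, the sample data, and the choice to round the discarded 5% down to a whole number of samples are all assumptions.

```python
def ninety_fifth_percentile(qps_samples):
    """Return the bill rate: the largest sample left after dropping the top 5%."""
    if not qps_samples:
        raise ValueError("need at least one five-minute QPS sample")
    ordered = sorted(qps_samples, reverse=True)  # descending order
    discard = len(ordered) * 5 // 100            # assumption: round the 5% down
    return ordered[discard]                      # maximum of what remains


# 100 hypothetical samples with QPS values 1..100: the top five (96..100) are discarded.
samples = list(range(1, 101))
print(ninety_fifth_percentile(samples))  # 95
```

For scale, a 30-day invoice period has 30 × 288 = 8,640 five-minute intervals, so under the same rounding assumption 432 samples would be discarded.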

http://dyn.com/blog/the-95th-percentile-burstable-billing-model-managed-dns/
https://en.wikipedia.org/wiki/Burstable_billing#95th_percentile

Traffic Telemetry

1. Operations
  - Flexible Topology
  - Resilient Clusters
  - Visibility and Administration

2. Data Model
  - Idempotent Writes
  - Low Concurrency
  - Application Redundancy

Oh, and it must perform well.

Benchmarking Cassandra Scalability on AWS

Over a million writes per second

Priorities that Led to DataStax Enterprise

Consult the Experts

Oh Baby!

One sec, new priority...

FidelityCustomer

Enterprise Requirement Ahead!

Sunnyvale (USSNN1)

North Bergen (USNBN1)

CREATE KEYSPACE qld WITH replication = { 'class': 'NetworkTopologyStrategy', ...};

USE qld;

CREATE TABLE qld_logs (
  key text,
  row_seq bigint,
  logline text,
  PRIMARY KEY ((key), row_seq)
) WITH COMPACT STORAGE AND ...
  compaction = {'class': 'SizeTieredCompactionStrategy'} AND ...

Detailed DNS Query Log - DSE Cluster

Success!

Back to our original goal....

• Customers
• Zones (Domains)
• Zone Record Types
• Fully Qualified Domain Names (qnames)
• Regions (ANYCAST)
• Data Centers
• Nameservers
• “Top 10s”

Many, many more customers.
Many, many more dimensions.

I Want More From You....

DataStax Enterprise Provided the Tools

North Bergen
Sunnyvale

CREATE TABLE "QueryCountSummaryCF"
CREATE TABLE "QueryZoneCountCF"
CREATE TABLE "QueryHostCountCF"
CREATE TABLE "QueryCountSummaryRollupsCF"
CREATE TABLE "QueryZoneCountRollupsCF"
CREATE TABLE "QueryHostCountRollupsCF"
CREATE TABLE "QueryPlatformCountCF"

WITH DEFAULT_TIME_TO_LIVE = 31536000;
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
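
Cassandra TTLs are expressed in seconds, so the table default above works out to one year of retention. A quick check of the arithmetic (the helper name is illustrative only):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def ttl_seconds(days):
    """Convert a retention period in days to a TTL in seconds."""
    return days * SECONDS_PER_DAY


print(ttl_seconds(365))  # 31536000
```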

The Working Solution

Factoids

Throughput:
  ○ > 12k writes/s
  ○ 99th percentile < 5ms
  ○ Avg read latency < 10ms

Size:
  ○ 200GB -> 1.2TB, steady
  ○ ~ 12B data points

How DataStax Enterprise Provided Value

● Support in Every Phase
  ○ Proof of Concept
  ○ Design
  ○ Operations
  ○ Optimization
● Integrated Toolkit
  ○ OpsCenter
  ○ Spark

We get the value of many, many people at the cost of about 1/2 FTE.

Lessons Learned

Top Lessons Learned

1. Include all teams in planning, deployment and implementation.
2. Consult knowledgeable people before making decisions and “optimizations”.
3. Understand compaction strategies to immediately eliminate those that are not a fit.
4. Ensure that client load balancing policies and consistency levels match DC topology and schema replication factors.
5. Model and understand all failure scenarios.
6. Use Spark to aggregate data in order to save storage and improve performance.

#1: Include all teams

● Product management
● Application engineering
● DBAs
● Operations
● Network engineering
● System engineering
● Finance and Management

#2: Consult knowledgeable people . . .

● Schema
● Cluster topology and tuning
● Tuning
● Compaction algorithms
● Client interaction

Talk to DataStax! They’ve probably seen it before!

#3: Understand Compaction Strategies!

DTCS was our first choice. It didn’t work . . . .

Tim Goodaire, September 02, 2015 17:10

We have changed the compaction strategy, concurrent_compactors, compaction_throughput, and heap size. It took a while for the cluster to complete the compactions, but it's done now. The cluster is up and appears to be healthy.

Today, we've been adding a few more nodes and resetting the heap size back to 8 GB.

#4: Ensure client and cluster settings match

Load balancing policies, read and write consistency, schema replication factor, cluster topology . . .
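
One way to sanity-check that consistency levels and replication factors agree is the standard overlap rule: a read is guaranteed to see the latest write when read replicas + write replicas > replication factor. The sketch below is illustrative only; the function names and the RF of 3 are assumptions, not Dyn's settings.

```python
def quorum(replication_factor):
    """Replicas that must respond for QUORUM: a strict majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3  # hypothetical replication factor
q = quorum(rf)
print(q)                                  # 2
print(reads_see_latest_write(q, q, rf))   # True: QUORUM reads + QUORUM writes
print(reads_see_latest_write(1, 1, rf))   # False: ONE + ONE may miss a write

# With a replication factor of 2, QUORUM needs both replicas,
# so a single node failure blocks QUORUM reads and writes.
print(quorum(2))                          # 2
```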

#5: Model Failure Scenarios

What happens when a node fails? Two? The DC?

Will the client fail? How will queries be satisfied?
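
These questions can be worked through on paper or in a small model before they come up in production. The sketch below is a simplified worst-case model, not a description of Dyn's cluster. It assumes a replication factor of 3 in each of the two data centers named earlier, assumes every failed node held a replica of the partition being queried, and covers only three consistency levels.

```python
def required_acks(level, rf_by_dc, local_dc):
    """Replica acknowledgements needed for a few common consistency levels."""
    total_rf = sum(rf_by_dc.values())
    needed = {
        "ONE": 1,
        "LOCAL_QUORUM": rf_by_dc[local_dc] // 2 + 1,
        "QUORUM": total_rf // 2 + 1,
    }
    return needed[level]


def can_satisfy(level, rf_by_dc, down_by_dc, local_dc):
    """True if enough replicas survive (worst case: each down node held one)."""
    alive = {dc: max(rf - down_by_dc.get(dc, 0), 0) for dc, rf in rf_by_dc.items()}
    available = alive[local_dc] if level == "LOCAL_QUORUM" else sum(alive.values())
    return available >= required_acks(level, rf_by_dc, local_dc)


rf_by_dc = {"USSNN1": 3, "USNBN1": 3}  # hypothetical replication factors
scenarios = {
    "one node down": {"USSNN1": 1},
    "two nodes down": {"USSNN1": 2},
    "whole DC down": {"USSNN1": 3},
}
for name, down in scenarios.items():
    for level in ("ONE", "LOCAL_QUORUM", "QUORUM"):
        ok = can_satisfy(level, rf_by_dc, down, local_dc="USSNN1")
        print(f"{name:15} {level:13} {'ok' if ok else 'FAILS'}")
```

In this model a client pinned to the failed data center with LOCAL_QUORUM fails even though the other data center is healthy, which is the kind of answer the "Will the client fail?" question is meant to surface.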

#6: Use DSE Spark to aggregate

700 rows for a single 5 minute interval

20 rows for an hour interval

Daily billing went from 14 hours to 2 hours on DSE/C*, and to 12 minutes with DSE/Spark.
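
The idea behind the rollup tables is to pre-aggregate fine-grained rows into coarser time buckets so that billing reads far fewer rows. The deck does this with DSE Spark; the sketch below shows only the aggregation logic in plain Python, with a hypothetical row shape (customer, zone, epoch seconds, query count).

```python
from collections import defaultdict


def rollup_hourly(rows):
    """Sum (customer, zone, epoch_seconds, count) rows into hourly buckets."""
    totals = defaultdict(int)
    for customer, zone, ts, count in rows:
        hour_start = ts - ts % 3600  # truncate the timestamp to the hour
        totals[(customer, zone, hour_start)] += count
    return dict(totals)


# Twelve five-minute rows covering one hour for one hypothetical customer and zone.
base = 1_000 * 3600
rows = [("acme", "example.com", base + i * 300, 100) for i in range(12)]
hourly = rollup_hourly(rows)
print(len(rows), "->", len(hourly))            # 12 -> 1
print(hourly[("acme", "example.com", base)])   # 1200
```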

What’s Next?

© DataStax, All Rights Reserved. 29

● Rely on best practices to support more analytical use cases across products.

● Complete development of generic C* solution, for quicker time to market, and greater scale in our hybrid cloud.

● Consider new opportunities for relying on DSE for products delivering services.

Contacts and Thanks!

Tim Chadwick
Principal Engineer, Infrastructure
https://www.linkedin.com/in/timjchadwick
tchadwick@dyn.com
@DynData

Rick Bross
Principal Engineer, Data Analytics
https://www.linkedin.com/in/rickbross
rbross@dyn.com

Dyn, Inc.
150 Dow St – Tower Two
Manchester, NH 03101
603-668-4998

Coming Soon!

● June 8: How to Half Hour - Building Data Pipelines with SMACK: Storage Strategy using Cassandra and DSE

● July 6: How to Half Hour - Building Data Pipelines with SMACK: Analyzing Data with Spark

● For the latest schedule of webinars, check out our Webinars page: http://www.datastax.com/resources/webinars.

Appendix

Client

● Client cluster and session object configuration
  ○ Cluster seeds (DCAwareRoundRobinPolicy implications)
  ○ Other load balancing policies to wrap
  ○ Read and write consistency setting
  ○ # connections per host
  ○ # requests per connection
  ○ Pool timeout
● Client query settings
  ○ Read and write consistency (may override default for specific query)
  ○ Batches (rarely if ever should be used)
  ○ Stored procedures (usually best practice for groups of queries - ex. we use for high velocity inserts)
  ○ Sync or Async? Depends on the specific query, but usually best practice with stored procedures.
  ○ Write with a consistent TTL per table.
  ○ How many threads should share the client session object? We’ve found that balancing the DC capabilities, client latency, and a (native) thread pool turbocharges inserts.
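
The last point, one session object shared by a bounded pool of threads, can be sketched as follows. `FakeSession` is a hypothetical stand-in so the example runs without a cluster, and the worker count of 8 is arbitrary. The right number depends on the cluster and client latency, as noted above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeSession:
    """Hypothetical stand-in for a driver session shared across threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self.rows = []

    def execute(self, row):
        with self._lock:  # keep the in-memory list consistent
            self.rows.append(row)


def insert_all(session, rows, workers):
    """Write every row through one shared session using a bounded thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for completion and re-raises any worker exception
        list(pool.map(session.execute, rows))


session = FakeSession()
insert_all(session, range(1000), workers=8)
print(len(session.rows))  # 1000
```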

Cassandra Cluster

● Network topology
  ○ Colocated latency? Inter DC latency?
  ○ Replication factor per DC per schema
● Schema
  ○ Don’t mix schemas with different use cases!
  ○ Dyn’s usage pattern
    ■ Optimize INSERTs.
    ■ Ensure READs succeed.
    ■ Avoid UPDATEs (“out of order” TTLs)
    ■ Ban DELETEs (turn off the repair service)
  ○ Attempt to have all (voluminous) tables use the same compaction strategy.
  ○ Use consistent TTLs for writes. If you override the default, always override with the same value.
● Compaction algorithms
  ○ With time series data, no deletes, no updates and consistent TTLs: you can use DTCS, which will simply drop old sstables.

Client/Cluster Settings - Must Work Together!
