Reporting from the Trenches: Intuit & Cassandra

Reporting from the Trenches – How Intuit Uses Cassandra Effectively to Improve Customer Experiences

Rekha Joshi, Staff Engineer, Intuit, Inc.

Thank you for joining. We will begin shortly.

Webinar Housekeeping

© 2015 DataStax, All Rights Reserved. 2

All attendees placed on mute

Input questions at any time using the online interface

Speaker Bio

Rekha Joshi

Staff Engineer at Intuit, Inc.

O’Reilly Certified Apache Cassandra Architect

1 About Intuit

2 Use Case: Personalized A/B Testing

3 Database Requirements

4 Cassandra: Intuit NoSQL Standard

5 Using Cassandra Effectively

Intuit On Mission

Intuit Data Platforms

50M+ customers to handle

Manage all of the data

6+ petabytes of data

Complex compliance

Public and private cloud

45M+ Customers

Manage all of the data

6+ Petabytes of data

Complex compliance

Use Case: Personalized A/B Testing

No Personalized A/B Testing?

Opinion-vs-Opinion Wars

Huge Investment

Angry Customer

With Personalized A/B Testing!!

Experiment, experiment, experiment!

Let Data Be The Decision Maker!

Use Case: Personalized A/B Testing

To Continuously Improve User Experience, Data Is Better Than Guesswork!

Personalized A/B Testing Platform

User Assignment

Personalization Service

Segmentation Filters and Sampling

Personalization Engine

Analytics

Set up and administration

Profile Store

User Actions

A/B Testing Service
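The slides name the User Assignment and Sampling components but do not say how they work. As a minimal sketch of one common approach (deterministic hash-based bucketing), and not a description of Intuit's platform, the following assumes hypothetical names throughout: `assign_variant`, `experiment_id`, `sample_rate`, and the 10,000-bucket granularity are all illustrative.

```python
import hashlib
from typing import Optional, Sequence

BUCKETS = 10_000  # hypothetical granularity for sampling decisions


def _bucket(key: str) -> int:
    """Map a string key to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % BUCKETS


def assign_variant(user_id: str, experiment_id: str,
                   variants: Sequence[str] = ("A", "B"),
                   sample_rate: float = 1.0) -> Optional[str]:
    """Return the variant for a user, or None if the user is not sampled."""
    # Sampling: only a fraction of users enter the experiment.
    if _bucket(f"{experiment_id}:sample:{user_id}") >= sample_rate * BUCKETS:
        return None
    # Assignment: the same user always lands in the same variant.
    index = _bucket(f"{experiment_id}:assign:{user_id}") % len(variants)
    return variants[index]
```

Because the result depends only on the user and experiment identifiers, repeated calls agree without storing the assignment, although a real platform may still persist it for analytics.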

Deployment

Amazon Cloud, Jenkins, Coopr Chef, CloudFormation, ECS/Docker

Monitoring

CloudWatch, Splunk, Graphite, Grafana, Logstash, Prometheus, New Relic

Alerting

Sensu, New Relic Alerts, HipChat, PagerDuty

Database Requirements

• High Data Security
• No Data Loss
• No Downtime
• Linear Scalability
• Tunable Consistency
• Performance Under Workloads

All This Data!!!!!

Can I Lift This Alone?

Need for Speed

Cassandra, Who?

Cassandra is a Java-based, NoSQL, linearly scalable, fault-tolerant, distributed, masterless, time series database with best-in-class tunable performance.

Cassandra: The Hybrid Kid has the Edge!

Dynamo (Amazon): Cassandra inherits data distribution

Bigtable (Google): Cassandra inherits data model

Cassandra: Masterless Architecture, Linear Scalability, Tunable Consistency/Performance

Influencing: Application, Query Access Patterns

Cassandra and DataStax Enterprise

Advanced Security

Integrated Analytics (Spark)

Advanced Tools

24/7 Support

A Truly Successful Software

• Solves A Real Need
• Is A Building Block for Platforms
• Becomes Open Source
• Gets Commercial Backing
• Tools Ecosystem Builds Around It
• Establishes Strong User Base
• Companies in Critical Domains Use It!!

Database Options

Intuit and Cassandra

Cassandra = Intuit Technology Standard of Choice for NoSQL Distributed Database

• High Data Security
• No Data Loss
• No Downtime
• Linear Scalability
• Tunable Consistency
• Performance Under Workloads

Other NoSQL variants

Did You Use Cassandra Effectively?

Garbage Collection Issue

New objects are created at a faster rate than they are GC’ed. This can cause STOP-THE-WORLD GC pauses!

• Configure heap size, MAX_HEAP_SIZE
• Set up GC logging, CASSANDRA_HEAP_DIR
• Configure CMS GC/G1GC
• Automated heap dump
• Upgrade system

Cassandra is a Java-based, NoSQL, linearly scalable, fault-tolerant, distributed, time series database.
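For context on what configuring MAX_HEAP_SIZE overrides, here is a small sketch of the default sizing rule as documented in the comments of the stock cassandra-env.sh shipped with Cassandra 2.x. The function names are hypothetical, and the constants should be checked against the script in your own version before relying on them.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Default heap: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))."""
    half_capped = min(system_memory_mb // 2, 1024)
    quarter_capped = min(system_memory_mb // 4, 8192)
    return max(half_capped, quarter_capped)


def default_new_heap_mb(max_heap_mb: int, cpu_cores: int) -> int:
    """Default young generation: min(100 MB per core, 1/4 of the heap)."""
    return min(100 * cpu_cores, max_heap_mb // 4)


# Example: a 16 GB, 8-core node gets a 4 GB heap with an 800 MB young generation.
heap = default_max_heap_mb(16 * 1024)
young = default_new_heap_mb(heap, cpu_cores=8)
```

The 8 GB cap reflects the concern on this slide: a larger CMS heap tends to mean longer pauses, so adding heap is not automatically a fix.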

Clock Issue

When you move setups or do upgrades, ensure the NTP server is set correctly.

Cassandra is a NoSQL, linearly scalable, fault-tolerant, distributed, time series database.
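Why the clock matters: Cassandra resolves conflicting writes to the same cell by timestamp, so a writer with a fast clock can make an older write win. The sketch below is a simplified model of that last-write-wins rule. `Cell`, `reconcile`, and `stamp` are hypothetical names, and real tie-breaking is more involved than shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp_micros: int  # write timestamp taken from the writer's clock


def reconcile(a: Cell, b: Cell) -> Cell:
    """Last-write-wins: keep the cell with the higher timestamp (ties keep a)."""
    return a if a.timestamp_micros >= b.timestamp_micros else b


def stamp(value: str, true_time_micros: int, clock_skew_micros: int = 0) -> Cell:
    """Build a cell the way a writer whose clock is off by the skew would."""
    return Cell(value, true_time_micros + clock_skew_micros)


# A writer whose clock runs 5 s fast writes first; a correct writer follows 1 s later.
stale = stamp("old address", true_time_micros=0, clock_skew_micros=5_000_000)
fresh = stamp("new address", true_time_micros=1_000_000)
winner = reconcile(stale, fresh)  # the older write wins because of the skew
```

With synchronized clocks the same two writes reconcile to "new address", which is the behavior the application expects.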

Understand the Node Ring

Repeat after me: Cassandra is a Java-based, NoSQL, linearly scalable, fault-tolerant, distributed, masterless, time series database with best-in-class tunable performance.

• nodetool status
• nodetool ring
• nodetool info
• nodetool cfstats
• nodetool tpstats
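As a sketch of how `nodetool status` output can feed monitoring, the snippet below groups node addresses by the two-letter status/state code at the start of each row (for example UN for Up/Normal, DN for Down/Normal). The sample text is illustrative rather than captured from a real cluster, its host IDs are placeholders, and `nodes_by_state` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Row prefix: status (Up/Down) + state (Normal/Leaving/Joining/Moving), then address.
STATE_LINE = re.compile(r"^([UD])([NLJM])\s+(\S+)")

# Illustrative sample only; columns and values vary by Cassandra version.
SAMPLE_STATUS = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load     Tokens  Owns    Host ID  Rack
UN  10.0.0.1   1.2 GB   256     33.3%   aaaa     rack1
DN  10.0.0.2   1.1 GB   256     33.3%   bbbb     rack1
UN  10.0.0.3   1.3 GB   256     33.4%   cccc     rack1
"""


def nodes_by_state(status_output: str) -> Dict[str, List[str]]:
    """Group node addresses by their two-letter status/state code."""
    result: Dict[str, List[str]] = {}
    for line in status_output.splitlines():
        match = STATE_LINE.match(line.strip())
        if match:
            code = match.group(1) + match.group(2)
            result.setdefault(code, []).append(match.group(3))
    return result


down_nodes = nodes_by_state(SAMPLE_STATUS).get("DN", [])
```

In practice the input would come from running the command itself, for instance through `subprocess.run(["nodetool", "status"], capture_output=True, text=True).stdout`.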

What If A Node Goes Down?

• Replication
• Consistency
• nodetool repair
• nodetool decommission
• nodetool snapshot

Cassandra is a NoSQL, linearly scalable, fault-tolerant, distributed, masterless, time series database.
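How replication and consistency interact when a node is down comes down to simple arithmetic. The sketch below shows the standard quorum rule and the condition under which reads are guaranteed to overlap with the latest write; the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set overlaps every write set (W + R > RF)."""
    return write_replicas + read_replicas > replication_factor


def tolerated_failures(replication_factor: int, required_replicas: int) -> int:
    """Replicas that can be down while the consistency level is still met."""
    return replication_factor - required_replicas


# With RF = 3, QUORUM needs 2 replicas, so one node can be down.
rf = 3
q = quorum(rf)
can_lose = tolerated_failures(rf, q)
```

Writing and reading at ONE with RF = 3 gives 1 + 1 = 2, which is not greater than 3, so a read may miss the latest write until repair or read repair catches up.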

Tuning The Application

Cassandra is a Java-based, NoSQL, linearly scalable, fault-tolerant, distributed, masterless, time series database with best-in-class tunable performance.

• Refactor data model
• Revisit the usage access patterns
• Paranoid monitoring

Tuning For Reads

• Caching Layer – Key Cache/Row Cache
• SSTable Compaction Frequency
• Reading from multiple SSTables is inefficient

Cassandra is a Java-based, NoSQL, linearly scalable, fault-tolerant, distributed, time series database with best-in-class tunable performance.
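To illustrate why reading from multiple SSTables is inefficient and what compaction buys, here is a deliberately simplified model in which each SSTable is a dictionary of key to (timestamp, value). It ignores bloom filters, caches, and tombstones, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

SSTable = Dict[str, Tuple[int, str]]  # key -> (write timestamp, value)


def read(key: str, sstables: List[SSTable]) -> Tuple[Optional[str], int]:
    """Return (newest value or None, number of SSTables consulted)."""
    newest: Optional[Tuple[int, str]] = None
    consulted = 0
    for table in sstables:
        consulted += 1  # in this model every SSTable may hold a newer version
        cell = table.get(key)
        if cell is not None and (newest is None or cell[0] > newest[0]):
            newest = cell
    return (newest[1] if newest else None, consulted)


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge SSTables into one, keeping the newest version of each key."""
    merged: SSTable = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    return merged


tables = [{"u1": (1, "a")}, {"u1": (3, "c")}, {"u1": (2, "b"), "u2": (1, "x")}]
before = read("u1", tables)            # three SSTables consulted
after = read("u1", [compact(tables)])  # one SSTable consulted
```

The trade-off is that compaction itself costs disk I/O, which is why its frequency appears on the slide as a tuning knob rather than something to maximize.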

Tuning For Writes

Cassandra is a Java-based, NoSQL, linearly scalable, fault-tolerant, distributed, time series database with best-in-class tunable performance.

• Memtable – Fast Writes
• CommitLog – Separate Dedicated Disk
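The two bullets above map onto the write path: each write is appended to the commit log for durability and applied to the in-memory memtable, which is later flushed to an immutable, sorted SSTable. The toy model below sketches that flow; the class name, the entry-count threshold, and the in-memory "log" are all simplifying assumptions.

```python
from typing import Dict, List, Tuple


class WritePathSketch:
    """Toy model of the commit log -> memtable -> SSTable write path."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.commit_log: List[Tuple[str, str]] = []  # sequential, append-only
        self.memtable: Dict[str, str] = {}           # in-memory, fast to update
        self.sstables: List[Dict[str, str]] = []     # immutable once flushed
        self.flush_threshold = flush_threshold       # hypothetical size limit

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value            # then the in-memory update
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Persist the memtable sorted by key, then discard the log entries it covered.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()
```

Since the commit log is only ever appended to, placing it on its own disk keeps those sequential writes from competing with SSTable reads and compaction I/O.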

Tuning the System

• EXT4 Filesystem
• System Memory, CPU, Disk
• Paranoid Monitoring

Cassandra is a NoSQL, linearly scalable, fault-tolerant, distributed, masterless, time series database.

Little-Talked-About Aspect Of The Pareto Principle!

Heavy Lifting? Easy!

Thank you!

Input questions at any time using the online interface

Q & A

https://www.linkedin.com/in/rekhajoshm

https://twitter.com/rekhajoshm