DataStax Enterprise in the Field
Daniel Cohen, Solutions Engineer @ DataStax
© DataStax, All Rights Reserved.
But Enough About Me…
• Solutions Engineer at DataStax
• LA ➜ SF ➜ NYC ➜ SF ➜ London
• Previously at JP Morgan in London
• Finance & digital media
1 Introductions
2 Top Customer Questions
3 Field Lessons: Big Irish Bank
4 Field Lessons: Big British Bank
Top Customer Questions
• What are all the other [banks] doing?
• How many nodes do I need?
• What do you mean, SSDs?
• How do I load data from [Oracle]?
• We already have [MongoDB] for NoSQL. What’s the difference?
What are all the other [banks] doing?
“Tell me secrets about my competitors.”
Transform Legacy Infrastructure
[Diagram: Global Users; User Interface / Application Services; DataStax Enterprise (DSE) cluster; Legacy Systems (…USA Equities, USA FX, UK FX, UK Bonds).]
Transition Legacy to Microservices
[Diagram: Users; µServices A, B, C, D in DC NY1 and A, Z, B in DC LDN1; Messages and Data tiers in DC NY1 and DC LDN1 (Data: USA Customers, UK Accounts); Legacy; DSE.]
How many nodes do I need?
“How long is a piece of string?”
The Node Count Dance
• “How many nodes do I need?” is a natural question.
  – Large organizations buy hardware months in advance.
• Desires ➔ Storage, Throughput, Latency, SLAs
• Realities
  – Cost
  – Data center capacity (space)
  – Operational capacity (people)
  – Your hardware
  – Your use cases
• Lesson 1 ➔ Computer science is about trade-offs.
• Lesson 2 ➔ Test, iterate, test.
• Lesson 3 ➔ Good news! DSE scales linearly, so capacity grows in proportion to node count.
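The trade-off between storage and throughput can be turned into a back-of-the-envelope node count. The following is a minimal Python sketch, not a DataStax sizing tool. It assumes that usable storage per node and throughput per node come from your own tests on your own hardware, and that capacity scales linearly with node count (Lesson 3). The helper `estimate_nodes` and all of its parameters are hypothetical.

```python
import math


def estimate_nodes(data_tb: float,
                   replication_factor: int,
                   usable_tb_per_node: float,
                   target_ops_per_s: float,
                   measured_ops_per_s_per_node: float) -> int:
    """Back-of-the-envelope node count (hypothetical helper).

    Assumes linear scaling: every added node contributes the same usable
    storage and the same throughput as the nodes measured in testing.
    """
    # Every row is stored replication_factor times across the cluster.
    storage_nodes = math.ceil(data_tb * replication_factor / usable_tb_per_node)
    # Per-node throughput should come from a load test at your own
    # replication factor and consistency level, not from a datasheet.
    throughput_nodes = math.ceil(target_ops_per_s / measured_ops_per_s_per_node)
    # A cluster needs at least as many nodes as there are replicas of a row.
    return max(storage_nodes, throughput_nodes, replication_factor)


if __name__ == "__main__":
    # Made-up inputs: 4 TB of raw data, RF 3, 1 TB usable per node,
    # 50,000 ops/s target, 5,000 ops/s measured per node.
    print(estimate_nodes(4, 3, 1.0, 50_000, 5_000))  # storage-bound: 12
```

With these made-up numbers, storage rather than throughput sets the node count. Rerunning the estimate with freshly measured figures after each test round is one way to put "test, iterate, test" into practice.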
What do you mean, SSDs?
“We have an amazing SAN.”
Storage Matters
SSD (consumer grade)
• 10K – 1M IOPS
• 400 MB/s – 3 GB/s bandwidth
• < 200 µs latency
15K RPM HDD (spinning rust)
• ~200 IOPS
• ~160 MB/s bandwidth
• > 5 ms latency
✴ Acknowledgements to my colleague Kathryn Erickson.
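To make the gap concrete, here is a rough Python sketch using the round figures above. Wherever the slide gives a bound, the sketch uses the bound itself, which flatters the HDD and understates the SSD. It also assumes that IOPS add up linearly across spindles, for example through striping with no overhead. The constant and function names are illustrative only.

```python
import math

# Round figures from the slide. Where a bound is given, the bound itself is
# used, which is generous to the HDD and harsh on the SSD.
HDD_IOPS = 200            # ~200 IOPS, 15K RPM HDD
HDD_LATENCY_MS = 5.0      # > 5 ms per random read
SSD_IOPS_LOW = 10_000     # low end of the 10K - 1M IOPS range
SSD_LATENCY_MS = 0.2      # < 200 µs per random read


def hdds_to_match(target_iops: int, per_disk_iops: int = HDD_IOPS) -> int:
    """Spindles needed to reach target_iops, assuming IOPS add up linearly."""
    return math.ceil(target_iops / per_disk_iops)


def serial_read_seconds(n_reads: int, latency_ms: float) -> float:
    """Time for n dependent random reads issued one after another."""
    return n_reads * latency_ms / 1000


if __name__ == "__main__":
    print("HDDs to match a low-end SSD:", hdds_to_match(SSD_IOPS_LOW))
    print("1,000 serial reads on HDD (s):", serial_read_seconds(1_000, HDD_LATENCY_MS))
    print("1,000 serial reads on SSD (s):", serial_read_seconds(1_000, SSD_LATENCY_MS))
```

Even at the low end of the SSD range, matching one drive takes about fifty 15K RPM spindles. A chain of dependent reads also runs roughly 25 times faster on the SSD.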
Storage Interfaces Matter
Interface         Transfer Rate
SATA III          6 Gb/s
SAS II            6 Gb/s
SAS III           12 Gb/s
PCIe Gen 2 x8     32 Gb/s
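These transfer rates are line rates in gigabits per second, while drive bandwidth is quoted in bytes per second. The following Python sketch converts one to the other by dividing by 8 in decimal units. It ignores line-encoding and protocol overhead, so every ceiling it reports is optimistic. It then checks which interfaces could carry the top of the quoted SSD range (3 GB/s) and a single 15K RPM HDD (~160 MB/s). The helper names are illustrative.

```python
# Interface line rates from the table above, in gigabits per second.
INTERFACES_GBPS = {
    "SATA III": 6,
    "SAS II": 6,
    "SAS III": 12,
    "PCIe Gen 2 x8": 32,
}


def ceiling_mb_per_s(gbps: float) -> float:
    """Naive upper bound in MB/s (decimal units: 1 Gb = 1000 Mb, 8 bits per byte).

    Ignores encoding and protocol overhead, so real throughput is lower.
    """
    return gbps * 1000 / 8


def interfaces_that_fit(drive_mb_per_s: float) -> list[str]:
    """Interfaces whose naive ceiling is at least the drive's bandwidth."""
    return [name for name, gbps in INTERFACES_GBPS.items()
            if ceiling_mb_per_s(gbps) >= drive_mb_per_s]


if __name__ == "__main__":
    for name, gbps in INTERFACES_GBPS.items():
        print(f"{name:14} ~{ceiling_mb_per_s(gbps):,.0f} MB/s ceiling")
    print("3 GB/s SSD fits:", interfaces_that_fit(3000))
    print("160 MB/s HDD fits:", interfaces_that_fit(160))
```

Any of these interfaces can keep up with a spinning disk. Of the interfaces in the table, only the PCIe link has room for a drive at the top of the SSD range.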
A Nondeterministic Path to Failure
• What about my incredible SAN?
  – Do not use network attached storage with DSE.
• But our SAN is awesome! We paid a lot of money for it.
  – No! Do not use network attached storage with DSE.
• Fine. What about EBS?
  – Let’s discuss!
Starting Points
Workload            CPU           RAM            Storage
DSE (Read Heavy)    8-24 cores    32-128 GB ✴    Local SSD (0.5-2 TB)
DSE (Write Heavy)   12-32 cores   32-128 GB      Local SSD (1-3 TB)
DSE + Search        16-32 cores   128 GB         Local SSD (1-3 TB)
DSE + Analytics     16-32 cores   128+ GB        Local SSD (1-3 TB)
✴ Got extra RAM? Cache is king.
✴✴ 1 Gb Ethernet is fine; 10 Gb is future-proof.
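One way to use the table is to place a candidate node against it before testing. The following Python sketch encodes the ranges above and reports, per dimension, whether a node falls below, within, or above the starting point. The workload keys and function names are hypothetical. The example uses the i2.xlarge instance from the Big Irish Bank PoT later in this deck. An AWS vCPU is typically a hardware thread rather than a physical core, so comparing vCPUs to cores is itself approximate.

```python
from typing import Optional

# Ranges from the Starting Points table as (low, high); None means "or more".
STARTING_POINTS: dict[str, dict[str, tuple[float, Optional[float]]]] = {
    "dse_read_heavy":  {"cores": (8, 24),  "ram_gb": (32, 128),   "ssd_tb": (0.5, 2)},
    "dse_write_heavy": {"cores": (12, 32), "ram_gb": (32, 128),   "ssd_tb": (1, 3)},
    "dse_search":      {"cores": (16, 32), "ram_gb": (128, 128),  "ssd_tb": (1, 3)},
    "dse_analytics":   {"cores": (16, 32), "ram_gb": (128, None), "ssd_tb": (1, 3)},
}


def classify(value: float, low: float, high: Optional[float]) -> str:
    """Place one value relative to a (low, high) range."""
    if value < low:
        return "below"
    if high is not None and value > high:
        return "above"
    return "within"


def compare_to_starting_point(workload: str, cores: float,
                              ram_gb: float, ssd_tb: float) -> dict[str, str]:
    """Compare a candidate node with the starting point for one workload."""
    ranges = STARTING_POINTS[workload]
    node = {"cores": cores, "ram_gb": ram_gb, "ssd_tb": ssd_tb}
    return {dim: classify(node[dim], *ranges[dim]) for dim in ranges}


if __name__ == "__main__":
    # i2.xlarge: 4 vCPU, 30.5 GB RAM, 1 x 800 GB local SSD.
    print(compare_to_starting_point("dse_read_heavy", 4, 30.5, 0.8))
```

A result of "below" is not a verdict. These are starting points, and the Big Irish Bank PoT below ran on exactly this instance type.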
We already have [MongoDB] for NoSQL. What’s the difference?
“Behold the one true NoSQL database.”
© DataStax, All Rights Reserved.
NoSQL16
© DataStax, All Rights Reserved.
NoSQL16
© DataStax, All Rights Reserved.
NoSQL16
© DataStax, All Rights Reserved.
NoSQL16
© DataStax, All Rights Reserved.
NoSQLFan
tasy16
© DataStax, All Rights Reserved.
NoSQLFan
tasy16
Proof of Technology @ Big Irish Bank
Initial Goals
• Deploy on AWS
• Ingest ten years of (fake) customer data efficiently
• Fast retrieval & search
Synopsis
• Payment Services Directive (PSD II) and Open Banking
• Customer access to current and historical data via APIs
• Competitive PoT versus other database vendors
Hardware
PoT Mark 1
• c4.8xlarge (AWS)
• 36 vCPU, 60 GB RAM
• EBS only
PoT Recommendation
• 6 x i2.xlarge (AWS)
• 4 vCPU, 30.5 GB RAM
• 1 x 800 GB local SSD
PoT Final
• 6 x i2.xlarge (AWS)
• 4 vCPU, 30.5 GB RAM
• 1 x 800 GB local SSD
Production
• 8 nodes across 2 data centers (4:4)
• HP DL380 Gen9 ➔ 32 cores, 256 GB RAM, 3.2 TB SSDs on SAS III
• 10 Gb Ethernet, fiber between DCs
Lessons
1) The Node Count Dance is iterative.
• Initial node count estimates were low.
• Early refusal to modify AWS setup.
• Avoid rigidity. Test, iterate, test.
2) Quis custodiet ipsos custodes?
• Hit a performance plateau at 5,000 ops/s.
• Added a second JMeter instance; throughput doubled to 10,000 ops/s.
• JMeter was the bottleneck!
• Who will test the testers?
3) EBS is still network attached.
• 99th-percentile read latency (ms)
  ▫ 3.311 ➔ local SSD
  ▫ 35.425 ➔ EBS Provisioned SSD
• Competing vendor falsified numbers.
• Lies, damned lies, and statistics.
4) Not all data needs to be hot.
• PoT Mark 1 ➔ 10 years of hot data
  ▫ ~20 billion transactions
  ▫ ~30 nodes to reach latency targets
• PoT Final ➔ 2 years of hot data
• Do not architect by convenience.
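The numbers behind lessons 2 to 4 can be checked with a few lines of Python. This is an illustrative sketch, and the function names are hypothetical. The p99 helper uses the nearest-rank definition of a percentile, which is one common definition among several; the slides do not say which one their tooling used. The hot-data estimate assumes that transactions are spread evenly across the ten years and that node count scales linearly with hot data volume.

```python
import math

# 99th-percentile read latencies (ms) quoted on the Lessons slide.
P99_LOCAL_SSD_MS = 3.311
P99_EBS_MS = 35.425


def p99_nearest_rank(samples_ms: list[float]) -> float:
    """99th percentile using the nearest-rank definition (one of several)."""
    ordered = sorted(samples_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer math
    return ordered[rank - 1]


def scaling_efficiency(ops_before: float, ops_after: float,
                       generators_before: int, generators_after: int) -> float:
    """Throughput gain divided by load-generator gain. A value near 1.0 means
    throughput rose in step with the generators, so the generators, not the
    cluster, were the limit."""
    return (ops_after / ops_before) / (generators_after / generators_before)


def hot_data_nodes(nodes_for_all_years: int, years_total: int, years_hot: int) -> int:
    """Linear estimate, assuming data is spread evenly across years."""
    return math.ceil(nodes_for_all_years * years_hot / years_total)


if __name__ == "__main__":
    print(f"EBS p99 / local SSD p99: {P99_EBS_MS / P99_LOCAL_SSD_MS:.1f}x")
    print("JMeter scaling efficiency:", scaling_efficiency(5_000, 10_000, 1, 2))
    print("Nodes for 2 hot years:", hot_data_nodes(30, 10, 2))
```

EBS shows roughly ten times the tail latency of local SSD. The linear estimate for two hot years lands at 6 nodes, the same count as PoT Final. The slides do not say that this is how that figure was reached, so treat the match as a plausibility check only.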
Production Pilot @ Big British Bank
Initial Goals
• Transition from mothballed trials of OrientDB and Titan
• Ingest enormous quantities of data from a legacy DB
• Prove graph at scale
Synopsis
• Customer 360° use case across the banking group
• DSE Graph
• Dissatisfied with other graph databases
Hardware
Pilot Mark 1
• “Private Cloud”
• N x Hosted VM
• 8 vCPU, 112 GB RAM
• SAN only (for now)
Pilot Mark 2
• “Hadoop Leftovers”
• 4 x HP DL380s
• 24 cores, 512 GB RAM
• 1 x 2.1 TB SSD
• 14 x 2 TB HDDs
Pilot Final
• 3 x Dell C6220
• 12 cores, 128 GB RAM
• 6 x 1 TB SATA HDDs
  ▫ 2 x OS
  ▫ 1 x commit log
  ▫ 3 x data, caches
Production Target
• 16 nodes across 2 data centers (8:8)
• HP DL380 Gen9 ➔ 24 cores, 528 GB RAM, 3.4 TB SSDs
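On spinning disks, the Pilot Final layout gives the commit log a disk of its own. That way, its sequential appends do not compete for a disk head with data-file reads and compaction. The following Python sketch renders that layout as the directory settings in `cassandra.yaml` (`commitlog_directory`, `data_file_directories`, `saved_caches_directory`). The mount points are hypothetical. Placing saved caches on the first data disk is an assumption, since the slide only says that the data disks also hold caches.

```python
# Hypothetical mount points for the six 1 TB SATA HDDs in a Pilot Final node.
# Disks 1-2 hold the OS and are left out of the database layout.
COMMITLOG_MOUNT = "/mnt/disk3"
DATA_MOUNTS = ["/mnt/disk4", "/mnt/disk5", "/mnt/disk6"]


def directory_settings(commitlog_mount: str, data_mounts: list[str]) -> str:
    """Render the directory-related cassandra.yaml settings as YAML text.

    The commit log gets a spindle of its own; data files are spread across
    the remaining disks. Saved caches go on the first data disk (assumption).
    """
    lines = [f"commitlog_directory: {commitlog_mount}/commitlog",
             "data_file_directories:"]
    lines += [f"    - {mount}/data" for mount in data_mounts]
    lines.append(f"saved_caches_directory: {data_mounts[0]}/saved_caches")
    return "\n".join(lines)


if __name__ == "__main__":
    print(directory_settings(COMMITLOG_MOUNT, DATA_MOUNTS))
```

The printed fragment is meant to be merged by hand into an existing `cassandra.yaml`. It is not a complete configuration file.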
Lessons
1) DSE essentials are critical.
• Great team but zero DSE experience.
• Ad hoc education introduces risk.
• Walk before you run.
2) The Node Count Dance applies to Graph.
• Data size unknown due to privacy.
• Load 5% of the data, then extrapolate.
• Test, iterate, test.
3) Hardware matters, of course.
• Leftover Hadoop boxes, spinning rust.
• Get creative with configuration & tuning.
• “Under no circumstances should you do load tests on these boxes.”
4) Avoid surprises before deadlines.
• Upgraded from RHEL 6.7 to 7.1.
• CPU spikes made nodes unusably slow.
• Revert!
• Nobody move, nobody gets hurt.
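The "load 5%, extrapolate" step amounts to scaling each measurement by 100/5 and adding a margin. The following Python sketch shows that arithmetic. The sample figures (150 GB on disk, 2 hours to load) and the 30% headroom are made up for illustration. It assumes that each measurement grows linearly with data volume, which is itself an assumption worth testing. In a graph, for example, the number of edges may not grow in proportion to the number of vertices.

```python
def extrapolate(sample_value: float, sample_percent: float) -> float:
    """Scale a measurement taken on a sample up to the full data set,
    assuming it grows linearly with data volume."""
    return sample_value * 100 / sample_percent


def with_headroom(value: float, headroom_percent: float) -> float:
    """Add a safety margin for growth and for a sample that is not representative."""
    return value * (100 + headroom_percent) / 100


if __name__ == "__main__":
    # Hypothetical 5% load: 150 GB on disk, 2 hours to load.
    disk_gb = extrapolate(150, 5)
    load_hours = extrapolate(2, 5)
    print(f"Projected size: {disk_gb:,.0f} GB; with 30% headroom: "
          f"{with_headroom(disk_gb, 30):,.0f} GB")
    print(f"Projected load time: {load_hours:.0f} h")
```

If a second, larger sample does not land near the first projection, that is a sign the linear assumption does not hold for this data.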