
Golang Performance : microbenchmarks, profilers, and a war story


Page 1: Golang Performance : microbenchmarks, profilers, and a war story

The fastest NoSQL database

Talking about Go Performance

Try it while I blab!

github.com/aerospike/aerospike-server

github.com/aerospike/aerospike-client-go

Page 2: Golang Performance : microbenchmarks, profilers, and a war story

Who am I ?

Brian [email protected]

@bbulkow

TRS-80, PC, Apple II, Vax 11/70, Wang

First product: lightpen university teaching kiosk

Palo Alto High School ( ‘85 )

Liberate / NetComputer through the boom

10B market cap in 1999, employee 32

2003-2007 “time off” ( startups )

Citrusleaf / Aerospike history

42 year old first-time CEO (me)
2008 Prototype
2010 First sales “get the band back together”
2011+ 3 rounds of funding (Draper, ALP, NEA, CNTP)
70 employees, 2 offices

Page 3: Golang Performance : microbenchmarks, profilers, and a war story

Does Brian know performance?


Undergrad project: image converter
Single pass arbitrary scale and rotate w/ Nyquist filters

Novell
Fastest AppleTalk server + router available

Starlight Networks
150Mb/sec video server on P133

Liberate
HTML technology for embedded systems

Aggregate Knowledge
Realtime recommendations: 2x faster in first week

Aerospike
10x faster than existing NoSQL, 100x faster than RDBMS

Page 4: Golang Performance : microbenchmarks, profilers, and a war story

Internet Technology Stack

MILLIONS OF CONSUMERS BILLIONS OF DEVICES

APP SERVERS

DATA WAREHOUSE INSIGHTS

WRITE CONTEXT

In-memory NoSQL

WRITE REAL-TIME CONTEXT READ RECENT CONTENT

PROFILE STORE

Cookies, email, deviceID, IP address, location, segments, clicks, likes, tweets, search terms...

REAL-TIME ANALYTICS

Best sellers, top scores, trending tweets

BATCH ANALYTICS

Discover patterns, segment data: location patterns, audience affinity

Page 5: Golang Performance : microbenchmarks, profilers, and a war story

Who uses Aerospike?

theTradeDesk

… to name a few!

Page 6: Golang Performance : microbenchmarks, profilers, and a war story

Aerospike is High Performance

[Chart: results for the Balanced and Read-Heavy workloads, axis from 0 to 1,700,000. Series: Aerospike 3 (in-memory), Aerospike 3 (persistent), Aerospike 2, Cassandra, MongoDB, Couchbase 1.8, Couchbase 2.0]

Page 7: Golang Performance : microbenchmarks, profilers, and a war story

Easy Clients ( better than JSON )

Go

Python

Page 8: Golang Performance : microbenchmarks, profilers, and a war story

Also, analytics

http://www.aerospike.com/community/labs/

Page 9: Golang Performance : microbenchmarks, profilers, and a war story

If it is so good, why haven't I heard of it?

Established in 2009 (newer than most)

Used in Advertising – ad exchanges, data exchanges, targeting, real-time bidding, real-time attribution.

Open Sourced in June 2014

Page 10: Golang Performance : microbenchmarks, profilers, and a war story

When should I use Aerospike?

Redis, but with scale & flash

Cassandra, but fast

User data, session data, behavior, fraud…

API billing ~ retail actions ~ recommendations

Up and running in 10 minutes ( vagrant, EC2 … )

Page 11: Golang Performance : microbenchmarks, profilers, and a war story

Why does Aerospike care about Go?

It’s cool!

Promises performance with expressiveness ( as an old C guy, Go is aimed at me )

Our customers are diving in, deploying

What about (other versions of other languages)…( sure, they’re cool too! )

Go!

Page 12: Golang Performance : microbenchmarks, profilers, and a war story

Some old microbenchmarks

Profilers, how to run it

War story: optimizing our Go client

( sure, we know Go isn’t JUST about performance )

Let’s talk about….

Page 13: Golang Performance : microbenchmarks, profilers, and a war story

Old Microbenchmark

On Nov 22, 2009, I posted to Golang Nuts

Page 14: Golang Performance : microbenchmarks, profilers, and a war story

Old Microbenchmark

Seconds (Nov 2009)
1.1 - python (CPython 2.6.2, the distro release with no tweaks)
4.6 - go (current hg release)
4.2 - ruby 1.8 (distro release)
1.1 - ruby 1.9 (distro release)

Pike said:

"I suspect the great majority of the time in your benchmark is due to Go's current rudimentary garbage collector. Tests like this generate a lot of garbage that is collected slowly. From experiments I've done, a better implementation can make a huge difference. Profiling this test shows at least 50% of the time is in the allocator and collector, as opposed to about 5% printing the string and less than 15% in the map code. A better allocator and collector would make a dramatic change."

"The short answer: the Go runtime is new and completely untuned. The libraries need work too."

Page 15: Golang Performance : microbenchmarks, profilers, and a war story

Microbenchmark “T1”

for i := 0; i < 1000000; i++ {
    x = ( 2 * x ) + x + 1
}

1.96 s (big integer only) Python
1.04 ms (2.17s big.Int) Go
5 ms (2.15s BigNum) Java

Good news: go is right in the hunt, but easier to code

Amazon m3.xlarge (4 core [email protected]), Python 2.6.9, Go 1.3.3, Java 1.7.0_71, Amazon Linux (3.16)
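The two Go timings on this slide (native integer vs. big.Int) can be reproduced with a sketch along these lines. The function names t1Native and t1Big are made up for illustration, and the fixed-width version wraps around on overflow, so only its timing is meaningful, not its value. The big.Int run uses a smaller iteration count here to keep the example quick; raise it to 1000000 to match the slide.

```go
package main

import (
	"fmt"
	"math/big"
	"time"
)

// t1Native runs x = 2x + x + 1 with a fixed-width integer.
// The value wraps around on overflow for large n.
func t1Native(n int) uint64 {
	var x uint64
	for i := 0; i < n; i++ {
		x = (2 * x) + x + 1
	}
	return x
}

// t1Big runs the same recurrence with arbitrary precision,
// so the number keeps growing and each step gets slower.
func t1Big(n int) *big.Int {
	x := new(big.Int)
	t := new(big.Int)
	two := big.NewInt(2)
	one := big.NewInt(1)
	for i := 0; i < n; i++ {
		t.Mul(two, x) // t = 2x
		t.Add(t, x)   // t = 2x + x
		x.Add(t, one) // x = 2x + x + 1
	}
	return x
}

func main() {
	start := time.Now()
	v := t1Native(1000000)
	fmt.Println("native, 1000000 iterations:", time.Since(start), "(wrapped value", v, ")")

	start = time.Now()
	b := t1Big(100000)
	fmt.Println("big.Int, 100000 iterations:", time.Since(start), "(bits:", b.BitLen(), ")")
}
```

Printing the results keeps the compiler from discarding the loops as dead code.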

Page 16: Golang Performance : microbenchmarks, profilers, and a war story

Microbenchmarks

T5 – the 2009 benchmark

12.5 sec Python
12.56 sec Go
2.56 sec Java

Good news: not slower than python!
Bad news: Holy Crap compared to Java

Amazon m3.xlarge (4 core [email protected]), Python 2.6.9, Go 1.3.3, Java 1.7.0_71, Amazon Linux (3.16)

Page 17: Golang Performance : microbenchmarks, profilers, and a war story

Microbenchmarks – the old code

T5 – the 2009 benchmark (slower CPU)

for x := 0; x < 1000000; x++ {
    a := make(map[int]string)
    for a1 := 0; a1 < 50; a1++ {
        a[a1] = strconv.Itoa(a1)
    }
}

12.56 seconds

Amazon m3.xlarge (4 core [email protected]), Python 2.6.9, Go 1.3.3, Java 1.7.0_71, Amazon Linux (3.16)

Page 18: Golang Performance : microbenchmarks, profilers, and a war story

Microbenchmarks – tune the map

T5 – the 2009 benchmark

for x := 0; x < 1000000; x++ {
    a := make(map[int]string, 50)
    for a1 := 0; a1 < 50; a1++ {
        a[a1] = strconv.Itoa(a1)
    }
}

7.80 seconds

Amazon m3.xlarge (4 core [email protected]), Python 2.6.9, Go 1.3.3, Java 1.7.0_71, Amazon Linux (3.16)
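One way to see why the size hint helps is to count heap allocations per map fill rather than wall-clock time. The sketch below uses testing.AllocsPerRun from the standard library; the helper names fillUnsized and fillSized are invented here, and the exact counts depend on the Go version, so none are quoted.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// fillUnsized builds the 50-entry map with no size hint,
// as in the original T5 loop body. The map grows as it fills.
func fillUnsized() int {
	a := make(map[int]string)
	for a1 := 0; a1 < 50; a1++ {
		a[a1] = strconv.Itoa(a1)
	}
	return len(a)
}

// fillSized passes the final size to make, so the runtime
// can reserve room for all 50 entries up front.
func fillSized() int {
	a := make(map[int]string, 50)
	for a1 := 0; a1 < 50; a1++ {
		a[a1] = strconv.Itoa(a1)
	}
	return len(a)
}

func main() {
	unsized := testing.AllocsPerRun(1000, func() { fillUnsized() })
	sized := testing.AllocsPerRun(1000, func() { fillSized() })
	fmt.Printf("allocations per fill: unsized=%.1f sized=%.1f\n", unsized, sized)
}
```

Fewer allocations per iteration means less work for the allocator and the collector, which is where the Pike quote says most of the time went.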

Page 19: Golang Performance : microbenchmarks, profilers, and a war story

Microbenchmarks – remove the Itoa

T5 – the 2009 benchmark

for x := 0; x < 1000000; x++ {
    a := make(map[int]string, 50)
    for a1 := 0; a1 < 50; a1++ {
        a[a1] = "123456"
    }
}

5.45 seconds

Amazon m3.xlarge (4 core [email protected]), Python 2.6.9, Go 1.3.3, Java 1.7.0_71, Amazon Linux (3.16)

Page 20: Golang Performance : microbenchmarks, profilers, and a war story

Microbenchmarks – singleton Map

T5 – the 2009 benchmark

a := make(map[int]string, 50)
for x := 0; x < 1000000; x++ {
    // a := make(map[int]string, 50)
    for a1 := 0; a1 < 50; a1++ {
        a[a1] = "123456"
    }
}

2.03 seconds! Finally better than Java!

Amazon m3.xlarge (4 core [email protected]), Python 2.6.9, Go 1.3.3, Java 1.7.0_71, Amazon Linux (3.16)
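The last two variants can be timed side by side with a small harness like the one below. This is only a sketch: freshMaps, reusedMap and timeIt are names made up here, and the outer count is reduced from the slides' 1000000 so it finishes quickly.

```go
package main

import (
	"fmt"
	"time"
)

// freshMaps allocates a new sized map on every outer iteration.
func freshMaps(outer int) int {
	total := 0
	for x := 0; x < outer; x++ {
		a := make(map[int]string, 50)
		for a1 := 0; a1 < 50; a1++ {
			a[a1] = "123456"
		}
		total += len(a)
	}
	return total
}

// reusedMap allocates once and overwrites the same 50 keys each time.
func reusedMap(outer int) int {
	a := make(map[int]string, 50)
	total := 0
	for x := 0; x < outer; x++ {
		for a1 := 0; a1 < 50; a1++ {
			a[a1] = "123456"
		}
		total += len(a)
	}
	return total
}

// timeIt reports how long one call of f(outer) takes.
func timeIt(f func(int) int, outer int) time.Duration {
	start := time.Now()
	f(outer)
	return time.Since(start)
}

func main() {
	const outer = 100000 // the slides use 1000000
	fmt.Println("fresh map per iteration:", timeIt(freshMaps, outer))
	fmt.Println("single reused map:      ", timeIt(reusedMap, outer))
}
```

Note that the reused-map variant no longer does the same work as the Java version on the next slide, which still creates a HashMap and calls Integer.toString on every iteration, so the two numbers are not a like-for-like comparison.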

Page 21: Golang Performance : microbenchmarks, profilers, and a war story

Microbenchmarks – Java

T5 – the 2009 benchmark

for (int x = 0; x < 1000000; x++) {
    HashMap<Integer, String> a = new HashMap<Integer, String>();
    for (int a1 = 0; a1 < 50; a1++) {
        a.put(a1, Integer.toString(a1));
    }
}

2.56 seconds

Amazon m3.xlarge (4 core [email protected]), Python 2.6.9, Go 1.3.3, Java 1.7.0_71, Amazon Linux (3.16)

Page 22: Golang Performance : microbenchmarks, profilers, and a war story

Any ideas?

( I haven’t figured it out yet )

Page 23: Golang Performance : microbenchmarks, profilers, and a war story

Next microbenchmarks!

Float, String

Go Channels vs Java Futures … couldn’t code the java part in time!

Simple TCP echo, but with transactions

Log processing

Ruby 2.1, Go 1.4…

Your votes ?

Page 24: Golang Performance : microbenchmarks, profilers, and a war story

Profilers

pprof is pretty great!

Import in all your main’s, does not seem to hurt

import _ "net/http/pprof"

Add the HTTP listener ( only on flag )

// launch http pprof listener if in profile mode
if *profileMode {
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
}

Page 25: Golang Performance : microbenchmarks, profilers, and a war story

Profilers

Take a 30 second snapshot

go tool pprof http://localhost:6060/debug/pprof/profile?seconds=xx

pprof prompt: ‘top 10’

(pprof) top 10
Total: 3852 samples
    1187  30.8%  30.8%     1254  32.6% syscall.Syscall
     304   7.9%  38.7%      304   7.9% ExternalCode
     172   4.5%  43.2%      175   4.5% github.com/aerospike/aerospike-client-go/pkg/ripemd160._Block
     137   3.6%  46.7%      233   6.0% runtime.mallocgc
      98   2.5%  49.3%       98   2.5% runtime.futex
      79   2.1%  51.3%       86   2.2% runtime.MSpan_Sweep
      77   2.0%  53.3%       77   2.0% scanblock
      68   1.8%  55.1%       68   1.8% runtime.xchg
      46   1.2%  56.3%       46   1.2% runtime.epollwait

Page 26: Golang Performance : microbenchmarks, profilers, and a war story

Profilers

(pprof) web

Page 27: Golang Performance : microbenchmarks, profilers, and a war story
Page 28: Golang Performance : microbenchmarks, profilers, and a war story

Profilers

Good old ‘oprofile’, let’s not forget it ( especially if you can get kernel symbols, hard )

sudo yum -y install oprofile

Start capturing

sudo opcontrol --reset
sudo opcontrol --no-vmlinux
sudo opcontrol --start

Run your program

sudo opcontrol --dump
sudo opcontrol --shutdown

Dump your result

sudo opreport -l --demangle=smart --debug-info

Cheat Sheet http://www.bonsai.com/wiki/howtos/tuning/oprofile/

Page 29: Golang Performance : microbenchmarks, profilers, and a war story

Profilers

opreport

samples  %        linenr info                 image name  app name    symbol name
28106    56.5877  (no location information)   no-vmlinux  no-vmlinux  /no-vmlinux
6216     12.5151  rand.go:76                  benchmark   benchmark   math/rand.(*Rand).Int31n
3940      7.9327  rng.go:232                  benchmark   benchmark   math/rand.(*rngSource).Int63
1987      4.0006  benchmark.go:255            benchmark   benchmark   main.randString
1584      3.1892  rand.go:43                  benchmark   benchmark   math/rand.(*Rand).Int63
1465      2.9496  rand.go:93                  benchmark   benchmark   math/rand.(*Rand).Intn
1421      2.8610  rand.go:49                  benchmark   benchmark   math/rand.(*Rand).Int31
354       0.7127  ripemd160block.go:45        benchmark   benchmark   github.com/aerospike/aerospike-client-go/pkg/ripemd160._Block
349       0.7027  mgc0.c:720                  benchmark   benchmark   scanblock
307       0.6181  malloc.goc:40               benchmark   benchmark   runtime.mallocgc
205       0.4127  mgc0.c:1783                 benchmark   benchmark   runtime.MSpan_Sweep
138       0.2778  memmove_amd64.s:33          benchmark   benchmark   runtime.memmove
131       0.2638  asm_amd64.s:600             benchmark   benchmark   runtime.xchg

Page 30: Golang Performance : microbenchmarks, profilers, and a war story

Tuning the Aerospike Client

What does the client do?

Maintain the DHT state
Keep a connection pool
Make requests to the right servers
Box / unbox to wire protocol…

SIMPLE

Page 31: Golang Performance : microbenchmarks, profilers, and a war story

Tuning the Aerospike Client

Attempt 1: run pprof

The usual dance of making life easy for the garbage collector (just like java)

pprof worked
the hot objects showed up

Cache easily with Sized Channels
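The "sized channel" caching idea can be sketched as a small free list built on a buffered channel. The slides do not say which objects the client caches, so byte buffers are an assumption here, and bufferPool, get and put are illustrative names rather than the client's actual API.

```go
package main

import "fmt"

// bufferPool caches byte slices in a buffered ("sized") channel.
// The channel capacity bounds how many idle buffers are kept.
type bufferPool struct {
	free    chan []byte
	bufSize int
}

func newBufferPool(capacity, bufSize int) *bufferPool {
	return &bufferPool{free: make(chan []byte, capacity), bufSize: bufSize}
}

// get returns a cached buffer if one is waiting, otherwise allocates.
// It never blocks.
func (p *bufferPool) get() []byte {
	select {
	case b := <-p.free:
		return b
	default:
		return make([]byte, p.bufSize)
	}
}

// put hands a buffer back. If the pool is full, or the buffer is too
// small to be reused, it is dropped and left to the garbage collector.
func (p *bufferPool) put(b []byte) {
	if cap(b) < p.bufSize {
		return
	}
	select {
	case p.free <- b[:p.bufSize]:
	default:
	}
}

func main() {
	pool := newBufferPool(64, 4096)
	buf := pool.get()
	copy(buf, "request bytes")
	pool.put(buf)
	fmt.Println("idle buffers in pool:", len(pool.free))
}
```

Channel operations are safe across goroutines, so no extra locking is needed, and the select with a default case keeps both paths non-blocking.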

Page 32: Golang Performance : microbenchmarks, profilers, and a war story

Tuning the Aerospike Client

Attempt 2: oprofile

oprofile found rand() taking time
Optimization gave nothing
… not sure why not …

Currently happy with throughput

Page 33: Golang Performance : microbenchmarks, profilers, and a war story

Tuning the Aerospike Client

Latency problem at customer site

User validating a server install with a quick Go client
“17 ms average latency @ 20K TPS” --- terrible!

Server measured at 0.4 ms @ 40k TPS
 -- ping ok
 -- it’s the client!

Where’s the latency source? GC? Green Threads? Network?
 -- Profile shows low GC load
 -- Hard to measure thread latency

EC2 m3.xlarge ($0.05/hr)
4 core E5-2670 @ 2.5 Ghz
Bare metal vs Virtual
Centos 6 vs Latest Kernel
Intel SSDs vs RAM

Page 34: Golang Performance : microbenchmarks, profilers, and a war story

Tuning the Aerospike Client

Go

Java

Page 35: Golang Performance : microbenchmarks, profilers, and a war story

What happened?

•  Not sure what happened at deployment (yet, suspect old kernel)

•  A week lost by developers using MacOS, Laptop (MacOS is showing bad latency)

•  C code is running slower – we think it’s random fill of buffer

•  Lesson: just switch to Linux 3.12-ish kernels

•  Lesson: fewer lines ~ 11k Go, 17k Java

•  Lesson: for network / IO, these languages are THE SAME

Page 36: Golang Performance : microbenchmarks, profilers, and a war story