gordon-chung
past and present...
gord[at]live.ca | @gord_chung
v4 features
□ simplified scheduling
□ less pandas, more numpy
□ Redis incoming driver
□ in-memory incoming Ceph driver
□ other general features:
■ http://gnocchi.xyz/releasenotes/4.0.html
■ http://gnocchi.xyz/releasenotes/unreleased.html
scheduling
incoming data sharded into sacks to allow simple division of work across metricd workers
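A minimal sketch of the sack idea: hash each metric into a fixed set of sacks so workers claim whole sacks instead of individual metrics. The sack count, hash choice, and static worker split here are illustrative assumptions, not Gnocchi's exact code.

```python
# Sketch of sack-based sharding (illustrative, not Gnocchi's actual code).
import hashlib
import uuid

NUM_SACKS = 128  # assumed sack count for illustration


def sack_for_metric(metric_id: uuid.UUID, num_sacks: int = NUM_SACKS) -> int:
    """Map a metric id to a sack via a stable hash."""
    digest = hashlib.md5(metric_id.bytes).hexdigest()
    return int(digest, 16) % num_sacks


def sacks_for_worker(worker_index, total_workers, num_sacks=NUM_SACKS):
    """Static division of sacks across metricd workers: worker w takes
    every sack s where s % total_workers == w."""
    return [s for s in range(num_sacks) if s % total_workers == worker_index]
```

With 18 metricd workers, the 128 sacks partition cleanly: every sack belongs to exactly one worker, so no per-metric coordination is needed.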
numpy
old: Pandas - a monolithic, all-in-one, data analysis toolkit
new: Numpy - a lightweight, high-performance, N-dimensional array (and a bit more) library
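The kind of work this swap covers can be sketched with a numpy-only rollup: bucket raw points by granularity and aggregate, the job previously delegated to pandas resampling. Function name and signature here are illustrative.

```python
# Illustrative numpy-only rollup, replacing a pandas resample() call.
import numpy as np


def rollup(timestamps, values, granularity):
    """Aggregate (timestamp, value) pairs into fixed-size buckets.

    timestamps: epoch seconds, values: floats, granularity: seconds.
    Returns bucket start times and per-bucket means.
    """
    ts = np.asarray(timestamps, dtype=np.int64)
    vals = np.asarray(values, dtype=np.float64)
    buckets = ts - (ts % granularity)          # floor to bucket start
    uniq, inverse = np.unique(buckets, return_inverse=True)
    sums = np.bincount(inverse, weights=vals)  # per-bucket sums
    counts = np.bincount(inverse)              # per-bucket counts
    return uniq, sums / counts                 # per-bucket means
```

For example, `rollup([0, 30, 60, 90], [1, 3, 5, 7], 60)` yields buckets `[0, 60]` with means `[2.0, 6.0]`.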
in-memory
the memory is mightier.
leverage Redis driver or LevelDB/RocksDB internals for Ceph
benchmarks
back with another one of those block rockin’ beats
environment v2 & v3
node1
- OpenStack controller node
- Ceph Monitor Service
- Redis (coordination)
node2
- OpenStack Compute Node
- Ceph OSD node (10 OSDs + SSD Journal)
- 18 metricd (24 in v2)
node3
- Gnocchi API (32 workers)
- Ceph OSD node (10 OSDs + SSD Journal)
- 18 metricd (24 in v2)
node4
- OpenStack Compute Node
- Ceph OSD node (10 OSDs + SSD Journal)
- PostgreSQL
- 18 metricd (24 in v2)
environment v4.x
node1
- OpenStack controller node
- Ceph Monitor Service
- Redis
- MySQL
node2
- OpenStack Compute Node
- Ceph OSD node (10 OSDs + SSD Journal)
node3
- OpenStack Compute Node
- Ceph OSD node (10 OSDs + SSD Journal)
- Gnocchi API (32 workers)
- 18 metricd
all nodes are physical servers:
- 24 CPU (48 hyperthreaded)
- 256GB memory
- 10K disks
- 1GB network
- CentOS 7.1
fewer services and less hardware when running v4; all Gnocchi services run on a single node
all tests use Ceph as a storage driver for aggregates.
data generated using the benchmark tool in the client (modified to use threads). 4 clients with 12 threads each, running simultaneously.
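The threaded load generator can be sketched as below. The endpoint path and measure payload shape follow Gnocchi's REST API (`POST /v1/metric/<id>/measures` with a list of timestamp/value pairs), but the function names and the injected `post` callable are assumptions, not the actual benchmark tool.

```python
# Sketch of a threaded measure-flooding client (illustrative names;
# payload shape modeled on Gnocchi's measures API).
import concurrent.futures


def flood(post, metric_ids, points_per_metric, threads=12):
    """POST points_per_metric measures to every metric via a thread pool.

    `post` is any callable(url, json_body) -> status code; injected so
    the sketch stays transport-agnostic (swap in requests.post, etc.).
    """
    def one_call(metric_id):
        body = [{"timestamp": i, "value": float(i)}
                for i in range(points_per_metric)]
        return post("/v1/metric/%s/measures" % metric_id, body)

    with concurrent.futures.ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(one_call, metric_ids))
```

Running four such clients concurrently reproduces the deck's 4 × 12-thread setup; each call is one batched POST, matching the per-metric call counts quoted in the test cases.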
write throughput
total datapoints written per second. (higher is better)
number of requests made per second. (higher is better)
write throughput
test case 1: 1K resources, 20 metrics each. flood Gnocchi with 60 individual points per metric. 1.2M calls/run. run it a few times.
time to POST 1.2M individual measures for 20K metrics to Gnocchi.
post time
v3.1 had an anomaly that caused degradation over time.
processing time
v4 tests use 18 metricd, v3 test uses 54 metricd
time to aggregate all measures according to policy. (lower is better)
v4 only comparison
processing time
processing time
number of recorded, unprocessed measures over a single run
poor scheduling logic resulted in inefficient handling of many tiny objects in v3.
processing time
number of recorded, unprocessed measures over a single run.
backlog size depends on both the API’s ability to write data and metricd’s ability to process it.
test case 2: 1K resources, 20 metrics each. flood Gnocchi with 60 batched points per metric. 20K calls/run. run it a few times.
processing time
v4 tests use 18 metricd for 3x8 aggregates/metric; v2 and v3 tests use 72 and 54 metricd respectively
time to aggregate all measures according to policy. (lower is better)
aggregation time
time to aggregate 60 measures of a metric into 3x8 aggregates(lower is better)
average time reflects a combination of scheduling efficiency, computation efficiency and IO performance.
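What "3x8 aggregates" means can be spelled out as code: 3 rollup granularities, each with 8 aggregation methods, so 24 aggregated series per metric. The specific granularities and method list below are assumptions for illustration (per-granularity timestamp bucketing is omitted for brevity).

```python
# Illustrative "3x8 aggregates": 3 granularities x 8 methods = 24 series
# per metric (granularity/method choices assumed, bucketing omitted).
import numpy as np

GRANULARITIES = (60, 3600, 86400)  # assumed: minute/hour/day, in seconds
METHODS = {
    "mean": np.mean, "min": np.min, "max": np.max, "sum": np.sum,
    "std": np.std, "median": np.median, "count": len,
    "95pct": lambda a: np.percentile(a, 95),
}


def aggregate_all(values):
    """Return {granularity: {method: value}} over a raw series."""
    vals = np.asarray(values, dtype=np.float64)
    return {g: {name: float(fn(vals)) for name, fn in METHODS.items()}
            for g in GRANULARITIES}
```

Computing all 24 values for every batch of incoming measures is the per-metric work that the aggregation-time charts measure, which is why both computation efficiency and scheduling show up in the average.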
test case 3: 500 resources, 20 metrics each. flood Gnocchi with 720 batched points per metric. 10K calls/run. run it a few times.
time to aggregate all measures according to policy. (lower is better)
processing time
v4 tests use 18 metricd for 3x8 aggregates/metric; v2 and v3 tests use 72 metricd
aggregation time
time to aggregate 720 measures of a metric into 3x8 aggregates(lower is better)
computation efficiency improved for larger series: ~3x improvement for 60 points and ~6x improvement for 720 points
some more numbers
peep this...
time to aggregate metric with varying unbatched measure sizes (lower is better)
processing time
numbers represent optimal performance. benchmark was taken under zero load.
time to retrieve a single time series using curl and client(lower is better)
query time
client overhead attributed to, but not limited to, formatting
no significant performance difference vs v3
time to aggregate all measures according to default ‘medium’ policy. (lower is better)
default configurations
v3 tests use 54 metricd.v4 tests use 18 metricd.
- v3 medium policy:
  - minute/hourly/daily rollups
  - 8 aggregates each
- v4 medium policy:
  - minute/hourly rollups
  - 6 aggregates each
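A v4-style medium policy could be written as a Gnocchi archive-policy payload. The general shape (name, aggregation_methods, definition with granularity/timespan entries) follows Gnocchi's REST API, but the timespans and the exact method list below are illustrative assumptions, not the shipped defaults.

```python
# Hedged sketch of a v4 "medium"-style archive policy payload
# (timespans and method list are illustrative assumptions).
medium_policy = {
    "name": "medium",
    "aggregation_methods": ["mean", "min", "max", "sum", "std", "count"],
    "definition": [
        {"granularity": "1m", "timespan": "7d"},    # minute rollups
        {"granularity": "1h", "timespan": "365d"},  # hourly rollups
    ],
}
# 2 rollup levels x 6 aggregation methods = 12 aggregated series per
# metric, vs 3 levels x 8 methods = 24 under the v3 medium policy.
```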
thanks!
Any questions?
You can find me at @gord_chung | gord[at]live.ca
Credits
Special thanks to all the people who made and released these awesome resources for free:
□ Presentation template by SlidesCarnival