gordon-chung
past and present...
gord[at]live.ca | @gord_chung
v4 features
□ simplified scheduling
□ less pandas, more numpy
□ Redis incoming driver
□ in-memory incoming Ceph driver
□ other general features:
■ http://gnocchi.xyz/releasenotes/4.0.html
■ http://gnocchi.xyz/releasenotes/unreleased.html
scheduling
incoming data sharded into sacks to allow simple division of work across metricd workers
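A minimal sketch of the sack idea: hash each metric into a fixed set of sacks so workers claim whole sacks instead of individual metrics. The sack count, hash choice, and static worker split here are illustrative assumptions, not Gnocchi's exact code.

```python
# Sketch of sack-based sharding (illustrative, not Gnocchi's actual code).
import hashlib
import uuid

NUM_SACKS = 128  # assumed sack count for illustration


def sack_for_metric(metric_id: uuid.UUID, num_sacks: int = NUM_SACKS) -> int:
    """Map a metric id to a sack via a stable hash."""
    digest = hashlib.md5(metric_id.bytes).hexdigest()
    return int(digest, 16) % num_sacks


def sacks_for_worker(worker_index, total_workers, num_sacks=NUM_SACKS):
    """Static division of sacks across metricd workers: worker w takes
    every sack s where s % total_workers == w."""
    return [s for s in range(num_sacks) if s % total_workers == worker_index]
```

With 18 metricd workers, the 128 sacks partition cleanly: every sack belongs to exactly one worker, so no per-metric coordination is needed.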
numpy
old: Pandas - a monolithic, all-in-one, data analysis toolkit
new: Numpy - a lightweight, high-performance, N-dimensional array (and a bit more) library
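The kind of work this swap covers can be sketched with a numpy-only rollup: bucket raw points by granularity and aggregate, the job previously delegated to pandas resampling. Function name and signature here are illustrative.

```python
# Illustrative numpy-only rollup, replacing a pandas resample() call.
import numpy as np


def rollup(timestamps, values, granularity):
    """Aggregate (timestamp, value) pairs into fixed-size buckets.

    timestamps: epoch seconds, values: floats, granularity: seconds.
    Returns bucket start times and per-bucket means.
    """
    ts = np.asarray(timestamps, dtype=np.int64)
    vals = np.asarray(values, dtype=np.float64)
    buckets = ts - (ts % granularity)          # floor to bucket start
    uniq, inverse = np.unique(buckets, return_inverse=True)
    sums = np.bincount(inverse, weights=vals)  # per-bucket sums
    counts = np.bincount(inverse)              # per-bucket counts
    return uniq, sums / counts                 # per-bucket means
```

For example, `rollup([0, 30, 60, 90], [1, 3, 5, 7], 60)` yields buckets `[0, 60]` with means `[2.0, 6.0]`.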
in-memory
the memory is mightier.
leverage Redis driver or LevelDB/RocksDB internals for Ceph
benchmarks
back with another one of those block rockin’ beats
environment v2 & v3
node1
- OpenStack controller node
- Ceph Monitor Service
- Redis (coordination)
node2
- OpenStack Compute Node
- Ceph OSD node (10 OSDs + SSD Journal)
- 18 metricd (24 in v2)
node3
- Gnocchi API (32 workers)
- Ceph OSD node (10 OSDs + SSD Journal)
- 18 metricd (24 in v2)
node4
- OpenStack Compute Node
- Ceph OSD node (10 OSDs + SSD Journal)
- PostgreSQL
- 18 metricd (24 in v2)
environment v4.x
node1
- OpenStack controller node
- Ceph Monitor Service
- Redis
- MySQL
node2
- OpenStack Compute Node
- Ceph OSD node (10 OSDs + SSD Journal)
node3
- OpenStack Compute Node
- Ceph OSD node (10 OSDs + SSD Journal)
- Gnocchi API (32 workers)
- 18 metricd
all nodes are physical servers:
- 24 CPU (48 hyperthreaded)
- 256GB memory
- 10K disks
- 1GB network
- CentOS 7.1
fewer services and less hardware when running v4; all Gnocchi services run on a single node
all tests use Ceph as a storage driver for aggregates.
data generated using the benchmark tool in the client (modified to use threads). 4 clients with 12 threads each, running simultaneously.
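The threaded load generator can be sketched as below. The endpoint path and measure payload shape follow Gnocchi's REST API (`POST /v1/metric/<id>/measures` with a list of timestamp/value pairs), but the function names and the injected `post` callable are assumptions, not the actual benchmark tool.

```python
# Sketch of a threaded measure-flooding client (illustrative names;
# payload shape modeled on Gnocchi's measures API).
import concurrent.futures


def flood(post, metric_ids, points_per_metric, threads=12):
    """POST points_per_metric measures to every metric via a thread pool.

    `post` is any callable(url, json_body) -> status code; injected so
    the sketch stays transport-agnostic (swap in requests.post, etc.).
    """
    def one_call(metric_id):
        body = [{"timestamp": i, "value": float(i)}
                for i in range(points_per_metric)]
        return post("/v1/metric/%s/measures" % metric_id, body)

    with concurrent.futures.ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(one_call, metric_ids))
```

Running four such clients concurrently reproduces the deck's 4 × 12-thread setup; each call is one batched POST, matching the per-metric call counts quoted in the test cases.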
write throughput
total datapoints written per second. (higher is better)
number of requests made per second. (higher is better)
write throughput
test case 1: 1K resources, 20 metrics each. flood Gnocchi with 60 individual points per metric. 1.2M calls/run. run it a few times.
time to POST 1.2M individual measures for 20K metrics to Gnocchi.
post time
v3.1 had an anomaly that caused degradation over time.
processing time
v4 tests use 18 metricd, v3 test uses 54 metricd
time to aggregate all measures according to policy. (lower is better)
v4 only comparison
processing time
processing time
number of recorded, unprocessed measures over a single run
poor scheduling logic resulted in inefficient handling of many tiny objects in v3.
processing time
number of recorded, unprocessed measures over a single run.
backlog size depends on both the API’s ability to write data and metricd’s ability to process it.
test case 2: 1K resources, 20 metrics each. flood Gnocchi with 60 batched points per metric. 20K calls/run. run it a few times.
processing time
v4 tests use 18 metricd for 3x8 aggregates/metric; v2 and v3 tests use 72 and 54 metricd respectively
time to aggregate all measures according to policy. (lower is better)
aggregation time
time to aggregate 60 measures of a metric into 3x8 aggregates(lower is better)
average time reflects a combination of scheduling efficiency, computation efficiency and IO performance.
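What "3x8 aggregates" means can be spelled out as code: 3 rollup granularities, each with 8 aggregation methods, so 24 aggregated series per metric. The specific granularities and method list below are assumptions for illustration (per-granularity timestamp bucketing is omitted for brevity).

```python
# Illustrative "3x8 aggregates": 3 granularities x 8 methods = 24 series
# per metric (granularity/method choices assumed, bucketing omitted).
import numpy as np

GRANULARITIES = (60, 3600, 86400)  # assumed: minute/hour/day, in seconds
METHODS = {
    "mean": np.mean, "min": np.min, "max": np.max, "sum": np.sum,
    "std": np.std, "median": np.median, "count": len,
    "95pct": lambda a: np.percentile(a, 95),
}


def aggregate_all(values):
    """Return {granularity: {method: value}} over a raw series."""
    vals = np.asarray(values, dtype=np.float64)
    return {g: {name: float(fn(vals)) for name, fn in METHODS.items()}
            for g in GRANULARITIES}
```

Computing all 24 values for every batch of incoming measures is the per-metric work that the aggregation-time charts measure, which is why both computation efficiency and scheduling show up in the average.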
test case 3: 500 resources, 20 metrics each. flood Gnocchi with 720 batched points per metric. 10K calls/run. run it a few times.
time to aggregate all measures according to policy. (lower is better)
processing time
v4 tests use 18 metricd for 3x8 aggregates/metric; v2 and v3 tests use 72 metricd
aggregation time
time to aggregate 720 measures of a metric into 3x8 aggregates(lower is better)
computation efficiency improved for larger series: ~3x improvement for 60 points and ~6x improvement for 720 points
some more numbers
peep this...
time to aggregate metric with varying unbatched measure sizes (lower is better)
processing time
numbers represent optimal performance. benchmark was taken under zero load.
time to retrieve a single time series using curl and client(lower is better)
query time
client overhead attributed to, but not limited to, formatting
no significant performance difference vs v3
time to aggregate all measures according to default ‘medium’ policy. (lower is better)
default configurations
v3 tests use 54 metricd.v4 tests use 18 metricd.
- v3 medium policy:
  - minute/hourly/daily rollups
  - 8 aggregates each
- v4 medium policy:
  - minute/hourly rollups
  - 6 aggregates each
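A v4-style medium policy could be written as a Gnocchi archive-policy payload. The general shape (name, aggregation_methods, definition with granularity/timespan entries) follows Gnocchi's REST API, but the timespans and the exact method list below are illustrative assumptions, not the shipped defaults.

```python
# Hedged sketch of a v4 "medium"-style archive policy payload
# (timespans and method list are illustrative assumptions).
medium_policy = {
    "name": "medium",
    "aggregation_methods": ["mean", "min", "max", "sum", "std", "count"],
    "definition": [
        {"granularity": "1m", "timespan": "7d"},    # minute rollups
        {"granularity": "1h", "timespan": "365d"},  # hourly rollups
    ],
}
# 2 rollup levels x 6 aggregation methods = 12 aggregated series per
# metric, vs 3 levels x 8 methods = 24 under the v3 medium policy.
```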
thanks!
Any questions?
You can find me at @gord_chung | gord[at]live.ca
Credits
Special thanks to all the people who made and released these awesome resources for free:
□ Presentation template by SlidesCarnival