
SPECIAL REPORT

Low latency and real-time kernels for telco and NFV

This report allows decision makers to choose the appropriate Linux kernel technology for their applications.

Contents

Kernel performance summary
NFV and the kernel
Reliable
Supportable
Efficient
Kernel comparison
Stress-ng
Ubuntu kernel release cycle
Summary analysis and results
NFV leadership
Test case descriptions and key findings
Wakeup events per second
Throughput and overhead
Sigma
Cyclic test
Latency distribution, idle machine
Latency distribution, stress mixed
Latency distribution, stress heavy
Latency distribution, 8 × VMs
Latency distribution, kernel build
UDP test
HZ-test
TCP test
Stress-ng tests
Stress-ng bogo ops/sec (real time)
Stress-ng bogo ops/sec (user + system time)
Stress-ng instructions per cycle
References
Contact

Executive overview

The kernel is the fundamental core of a computer operating system. It is the first program to load, and it manages all core functions of the computer. With the expanding role of the Linux kernel in systems today, Canonical is often asked to provide leadership and support for kernel offerings for many purposes. Recently, Network Functions Virtualisation (NFV) has taken root in the production environments of telecoms operators. The individual components of NFV, known as Virtualised Network Functions (VNFs), are critically tied to a general purpose operating system, as opposed to being integrated into application-specific devices. This union makes performance and support of core operating system components, like the kernel, the foundational metrics of success.

There are flexibility and economic benefits to using general purpose software for function-specific applications, but as VNFs are mission critical services in a telco infrastructure, certain key determining factors should be kept in mind when selecting an operating system to host them. This special report will demonstrate that Ubuntu 16.04 LTS offers the necessary, proven reliability, while also offering supported kernel choices that enhance the efficiency of VNF operations and functions.

Executive conclusion

Low-latency Linux kernel configurations are more favourable for both throughput and user space CPU access. Preempt-full Linux kernels tend to provide “real-time” performance but at significant operational cost. Low-latency kernels offer near “real-time” performance without the same significant overhead of real-time alternatives.


Kernel performance summary

Ubuntu, the most widely deployed server operating system on the Internet, supports multiple kernel options across multiple architectures. This paper addresses the two x86 64-bit Linux kernels currently supported by Canonical, the low latency Linux kernel and the all-around, generic Linux kernel, in the context of comparison to additional preemptive patches available. As workloads that require characteristics like low latency1, low jitter2 and other similar features are increasingly migrated to Ubuntu, the kernel team at Canonical continues to analyse and review the performance characteristics of a multitude of kernel compilation and patch options.

In this special report, we have focused on comparing a low latency kernel to a kernel patched to be fully preemptive, or real-time3, with the generic kernel as a baseline. The testing, performed and analysed by Colin Ian King, one of Canonical’s kernel engineers and leading performance experts, has shown that the majority of real world workloads requiring low latency, low jitter and real-time characteristics are efficiently serviced by the currently supported, Ubuntu low-latency Linux kernel. The Ubuntu low-latency Linux kernel offers balanced, high performance for real world NFV infrastructure.

1 Low latency kernels are a type of soft real-time kernel in that they attempt to meet expected deadlines, but can miss; events are completed with minimal delays (low latencies). It is not considered a fatal error to miss deadlines, merely a degradation. An example would be live audio-visual communications.

2 Jitter refers to variability in timing of an expected signal. During audio or network communications, jitter creates a degraded experience. Low jitter refers to minimising variability and maintaining undetectable (e.g. inaudible) variations in the kernel’s responsiveness when servicing jitter-sensitive applications.

3 Real-time kernels attempt to always have a deterministic response time to service events and attempt to never miss a deadline. Deadlines must always be met, regardless of system load, or a real-time process request is considered to have failed. An example would be life-supporting medical equipment.


NFV and the kernel

Modern NFV solutions that facilitate traditional and modern real-time communications are not immune to jitter and delay. They must be implemented, though, in a manner that does not consume the entirety of a system’s resources before they have scaled to a system density that provides acceptable return on investment. The Ubuntu low latency kernel makes it possible to implement a well-supported model with investment returns that meet business expectations.

Examples like Radio Access Functions demonstrate real world workloads that require low jitter, low latency, and quality of service. They may be running LTE functions on Radio Access Network (RAN) VNFs providing not just data, but voice services as well. Voice communications systems are extremely sensitive to delay, as delay and stuttering (jitter) make the verbal communications experience suboptimal.

Other Virtualised Network Functions may not place the same latency demands on the kernel subsystem. VNFs like Broadband Network Gateways, Deep Packet Inspection, virtualised Customer Premise Equipment, etc., all require performance and stability, but likely scale better in an NFV infrastructure hosted by the Ubuntu generic kernel.


Reliable

Kernel reliability is achieved through a thoughtful design process, a skilled engineering team and community, and volume of use in production. While design and engineering are obvious to most people, volume and diversity of implementation is perhaps even more important. The more organisations that rely on a piece of software (in this case, the Ubuntu Linux kernel), the more rigorously it is tested, refined, and improved.

Most VNFs are deployed in clouds, implemented as Big Software – software that scales beyond the scope of a single system, based on scale-out, microservices architectures. The vast majority of cloud workloads run on Ubuntu, in fact more than all other Linux distributions put together. That means the Ubuntu Linux kernel is the most production tested kernel in the world of Linux today.

Since reliability is the most important aspect of an operating system kernel, Canonical, the commercial sponsor of Ubuntu, performs continuous integration testing across a multitude of kernel combinations. Ubuntu offers performance diversity in kernel selection, but with reliability always being the focus.


Supportable

Telcos and service providers are tasked with mission critical operations. Beyond the assumption that the technologies chosen to support those operations are highly reliable, those technologies must also be supported. The Ubuntu Linux kernel team provides telco SLA-backed support for the upstream Linux kernel, as delivered by Ubuntu and Ubuntu Advantage.

The generic Linux kernel is excellent for most workloads, but not always the optimal Linux kernel for every application. As providers of scalable and performance-bound solutions, it is important to be able to choose the right kernel for the right service. Ubuntu supports the generic Linux kernel, as well as a low-latency Ubuntu Linux kernel, using entirely upstream, open source, community-reviewed code.

The economic benefit of Canonical performing kernel maintenance and support is another key factor in meeting business expectations for investment returns. For every organisation to provide its own fixes, security patches, kernel module integration, and operating system platform testing would be cost prohibitive, and often just not possible. Ubuntu Linux kernels are built, patched and maintained by the kernel experts at Canonical who work on Ubuntu every day.


Efficient

Most Linux kernels are capable of running a wide variety of workloads. Identifying the right kernel with the right features for optimal efficiency is not always as easy as picking the aptly named option. With this in mind, Canonical’s Colin Ian King ran an extensive series of tests using a program he created specifically to stress particular aspects of the kernel’s subsystems.

The efficiency analysis was performed in an effort to determine optimal kernel selection for performance sensitive and hyperscale workloads, where small percentages of difference can have tremendous impact on the ability of a solution to scale while providing the appropriate quality of service.

A performant system must be operationally efficient. A performant kernel must be responsive, but not to the detriment of other subsystems. When performing analysis of real-time and low-latency kernels, there are some key metrics to observe. Wake-up times should be clustered together. The kernel should return to servicing requests as quickly and efficiently as possible.

In order to provide immediate responsiveness, yield points are added to the kernel’s subroutines. These yield points allow an external operation to interrupt the kernel and ask it to give immediate priority to something else.

There must be a balance between the kernel completing existing tasks, and external tasks taking priority. If that balance is skewed in either direction, the kernel will either be slow to respond to external requests, or the kernel will become sluggish to respond to non-priority requests, burdening the system, and throttling performance via indirect subsystems. The required yield points to implement a fully preemptive kernel can increase interrupts significantly and affect overall system performance.


Kernel comparison

While many different kernel configurations were exercised against a set of load scenarios, this paper compares the load results of the Ubuntu 16.04 LTS generic kernel 4.4.0-18-generic, the Ubuntu low-latency kernel 4.4.0-18-low-latency and an Ubuntu Linux kernel patched to be fully preemptive 4.4.0-rt5-preempt-full. The analysis of these loads focuses on:

• Minimising latency

• Minimising clock jitter

• Total system throughput

• Scheduling overhead

• Balanced operational efficiency

The intent was to measure and determine the overall CPU cost of real time and low latency kernels. The tool used to perform the in-depth analysis of the kernels is called stress-ng4,5. Stress-ng provides a useful set of results, as it is capable of stressing the majority of Linux system calls. This provides confidence that the results reflect performance across the majority of kernel interfaces.

Stress-ng

Colin Ian King wrote stress-ng to:

• Push a machine to its absolute limits

• Identify kernel bugs and perform regression testing

  • Make the machine, via kernel-specific functions, perform in different and very specific ways depending on the test and the mode in which it is run

4 project page: kernel.ubuntu.com/~cking/stress-ng/ 5 git repository: git://kernel.ubuntu.com/cking/stress-ng.git


Ubuntu kernel release cycle

Each Ubuntu Long Term Support (LTS) release ships with a current, stable kernel. That GA kernel release will continue to receive regular kernel updates every 3 weeks throughout its 5 year life cycle, with critical patches being released as needed between the 3 week updates.

Additional hardware enablement kernels (HWE) are released beginning with the first .2 release, e.g. 16.04.2, and for each point release thereafter. These HWE kernels receive the same level of support and maintenance as the GA kernel. The user is free to elect to remain on the original GA kernel or opt in to the HWE kernel stream.

The cadence provides a stable, progressive kernel release policy. Some Linux distributions do not change kernel versions for many years, potentially resulting in large, overly complex patch sets being applied to a dated kernel that cannot take full advantage of the latest hardware and software solutions in the market. By offering HWE update kernels, Ubuntu does not suffer from unwieldy patch sets or a lack of feature offerings, but maintains kernel stability throughout the lifecycle of an LTS release.

The Ubuntu Kernel Support and Schedule is published as a wiki, updated as necessary. The current kernel support schedule for the Ubuntu 16.04 LTS release is diagrammed below, as is the overall LTS release schedule in general.


Summary analysis and results

A range of stress tests was run to determine whether the preempt kernels are resilient and to compare the low-latency Ubuntu Linux kernel against the various preempt kernel configurations. The complete results are published as an addendum to this paper.

The tests show that, in general, the real-time 4.4.0-rt5-preempt-full kernel consumes more CPU, reducing throughput for user processes and diminishing the CPU time available to user space. Low-latency kernel configurations are more favourable for both throughput and user space CPU access. Preempt-full provides “real-time” performance at an operational cost. Low-latency kernels offer near “real-time” performance with significantly less overall system overhead than the current preempt-full kernel.

Unless a workload has an exact preemption specification that the low-latency kernel cannot meet, low-latency is an excellent option to use for servicing real-time and low latency requirements. It requires no extra kernel patches and is easier and less costly to maintain than the preempt kernel options.

None of this is to say that fully preemptive real-time kernels are unnecessary. Some systems, particularly dedicated devices, may be better suited to a preemptive solution than a balanced one. But, in hyper-scale architectures with densely packed compute nodes, reducing overhead while maintaining responsiveness is the balanced approach likely to be preferred by most telecoms operators.

The Linux Foundation recently announced the intention to include real-time kernel support in the upstream kernel. The Ubuntu Linux kernel team will work with the community and continue to monitor the progress of the real-time kernel. As the work matures, and begins to offer additional value and capability, Canonical will continue to consider real-time kernel support as a standard practice.

The summary analysis of the results is as follows:

1. The most aggressive preemption configuration, preempt-full, should, in theory, provide the best preemption characteristics without a long latency distribution tail. Essentially, the time to respond to a request should remain consistent, and requests should be efficiently clustered, to reduce excessive churn against the kernel’s other priorities.


a. This comes at an overall performance cost; interrupt rates are higher, the CPU spends slightly more time in the fully active C0 running state, and hence power usage on the test machine is higher (~0.3W).

b. Preempt-full actually seems to add 2-3 microseconds extra to latencies compared to generic or low-latency kernels, not reduce them.

c. There is a noticeable reduction in throughput on several use cases tested with stress-ng with preempt-full.

d. POSIX timer cyclic tests with preempt-full show an unexpected additional latency overhead of tens of microseconds. The distribution spread is also wider on preempt-full.

2. The low-latency kernel configuration provides improved latency timings compared to the generic kernel and at times is as good as, or better than some of the preempt kernels.

a. Low-latency, like preemption, is not cost free; it has overhead compared to the generic kernel, and depending on the test scenario this can range from a 10-60% throughput decrease.

b. Unlike preempt-full, low-latency does seem to be responsive in some heavy workloads such as heavily loaded VMs, where NFV applications typically run.

c. Low latency does not have the expensive interrupt overhead that preempt-full has.

3. Generally speaking, low-latency and generic kernels can handle wakeup events well in a timely manner on relatively low-loaded systems where there is plenty of free CPU headroom.

4. Under heavy loads, the preempt-full kernel should be able to service wakeup events with a shorter tail in the latency distribution than generic and low-latency. However, tests such as VM stress tests show this not to be the case. The choice of full preemption may not win over low-latency for the desired low latency outcomes for certain loads and activity profiles.


NFV leadership

Careful analysis and understanding of intended workload is required when choosing the best kernel option for respective NFV applications. Canonical works with the world’s largest and most progressive telco operators and network equipment vendors to ensure that the kernel offerings in Ubuntu meet their needs. As of this publication, the generic and low-latency Linux kernels satisfy almost all use cases for telco and NFV.

Beyond the operating system kernel, Canonical also offers its V-PIL program, the VNF Performance and Interoperability Lab. V-PIL allows operators to objectively compare the performance and interoperability of service chained functions within a standardized, objective environment. The rapid interchange of VNFs within the service chain is enabled by the use of Juju and its application modeling capabilities.

The analysis and results from the tests published here, as well as real-world experience, have brought Canonical to a leadership position in the SDN and NFV spaces. For further analysis and discussion of your NFV infrastructure requirements, we encourage you to contact us at ubuntu.com/about/contact-us/form

Test case descriptions and key findings


Wakeup events per second (idle system)

In this set of tests, we check to see if there is any noticeable overhead in terms of system utilization on an idle machine.

Key findings:

1. The preempt-full kernel shows a very high interrupt rate, > 2,000 per second, and consequently also a high context switch rate.

2. The high interrupt rate causes the preempt-full kernel to be resident in the processor C0 (fully on) state for longer than other kernels (1-1.5% more CPU time overall).

3. The preempt-full kernel consumes ~0.3 Watts more than the average of the other kernels. This is due to the higher interrupt rate.


Throughput and overhead


In this set of tests, we run several stress-ng stress tests across the kernels and compare throughput in terms of stress-ng “bogo operations per second”. The “Bogo/ops (usr+sys time)” metrics are useful for measuring system throughput across a range of test scenarios. These show the total number of bogo operations divided by the combined user and system time consumed across all 8 busy CPUs on the test machine.

The 4.4 generic kernel is used as a baseline for comparison. The statistics on the chart “% change compared to 4.4 generic kernel” show the percentage difference in throughput compared to this 4.4 baseline kernel.

Stress-ng test “cpu 8” - this has minimal system call overhead as it just consumes CPU cycles in userspace performing compute bound calculations.

1. Differences across the kernels are ~1-2%, which is surprising as this is generally a user space compute bound stressor.

Stress-ng test “clock 8” - this exercises clock setting and reading.

1. Preempt-none has same throughput as generic.

2. Preempt-full has a ~2.9% reduction in throughput compared to 4.4 generic.

3. 4.4 low-latency has a ~1.1% reduction in throughput compared to 4.4 generic.

Stress-ng test “context 8” - this forces rapid stack switching (much like lightweight thread context switches) using the swapcontext library call. This should be all user space bound processing and will exercise register save/restore and cache/memory read/write rates.

1. Of all the preemption configurations, preempt-full is showing the largest decrease in throughput by ~21.3%.

2. The 4.4 low-latency kernel has a significant performance decrease of ~90%.
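To make the stack-switching pattern concrete, the following is a minimal sketch of a swapcontext-based ping-pong of the kind this stressor exercises. It is illustrative only and is not the stress-ng implementation; the iteration count and stack size are arbitrary choices.

/* Minimal sketch: two user-space contexts ping-pong via swapcontext(3).
 * Illustrative only; not the stress-ng implementation. */
#include <stdio.h>
#include <ucontext.h>

#define ITERATIONS 100000

static ucontext_t main_ctx, work_ctx;
static long switches;

static void worker(void)
{
    for (;;) {
        switches++;
        /* Save worker state and resume the main context */
        swapcontext(&work_ctx, &main_ctx);
    }
}

int main(void)
{
    static char stack[64 * 1024];   /* stack for the second context */

    getcontext(&work_ctx);
    work_ctx.uc_stack.ss_sp = stack;
    work_ctx.uc_stack.ss_size = sizeof(stack);
    work_ctx.uc_link = &main_ctx;
    makecontext(&work_ctx, worker, 0);

    for (long i = 0; i < ITERATIONS; i++) {
        /* Save main state and switch to the worker context */
        swapcontext(&main_ctx, &work_ctx);
    }
    printf("performed %ld user space context switches\n", switches);
    return 0;
}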

Stress-ng test “hdd” - this performs various file I/O read/writes to simulate busy file activity.

1. Of the preemption configurations, preempt-ll has a performance decrease of ~57%, and preempt-full has a decrease of 28%.

Stress-ng test “msg” - this performs inter-process communication between a parent and child process pair sending messages using System V messages.

1. Low latency, preempt-rtb, and preempt-full are seeing a performance throughput decrease of ~60% or more.

2. Preempt-voluntary and preempt-none show the same performance as the 4.4 generic kernel.
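As an illustration of the System V message passing exercised here, the sketch below has a parent flood a private queue while a child drains it. The message size and count are arbitrary, and this is not the stress-ng implementation.

/* Minimal sketch of System V message queue IPC between parent and child.
 * Illustrative only; not the stress-ng implementation. */
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/wait.h>
#include <unistd.h>

#define MESSAGES 100000

struct msgbuf_s {
    long mtype;
    char mtext[32];
};

int main(void)
{
    /* Private queue visible only to this process and its children */
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid < 0) {
        perror("msgget");
        return 1;
    }

    pid_t child = fork();
    if (child == 0) {
        struct msgbuf_s msg;
        /* Child: receive every message the parent sends */
        for (int i = 0; i < MESSAGES; i++)
            msgrcv(qid, &msg, sizeof(msg.mtext), 0, 0);
        _exit(0);
    }

    struct msgbuf_s msg = { .mtype = 1 };
    strcpy(msg.mtext, "ping");
    /* Parent: send messages; msgsnd blocks while the queue is full */
    for (int i = 0; i < MESSAGES; i++)
        msgsnd(qid, &msg, sizeof(msg.mtext), 0);

    waitpid(child, NULL, 0);
    msgctl(qid, IPC_RMID, NULL);    /* remove the queue */
    printf("sent and received %d messages\n", MESSAGES);
    return 0;
}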

Stress-ng test “sigfd” - this generates a heavy context switch load by generating SIGRT signals which are then read via a process using signalfd(2).

1. Preempt-voluntary, preempt-rtb, preempt-none show a 15-25% throughput increase compared to 4.4 generic.

2. Preempt-full shows a ~45% performance decrease compared to 4.4 generic.
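The signalfd pattern can be sketched as follows: a child floods its parent with SIGRTMIN while the parent consumes the signals as reads on a signalfd descriptor, each read being a wakeup. This is illustrative only; the signal choice and counts are assumptions and differ from stress-ng itself.

/* Minimal sketch of receiving real-time signals via signalfd(2).
 * Illustrative only; not the stress-ng implementation. */
#include <signal.h>
#include <stdio.h>
#include <sys/signalfd.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGRTMIN);

    /* Block SIGRTMIN so it is delivered via the signalfd, not a handler */
    sigprocmask(SIG_BLOCK, &mask, NULL);

    int sfd = signalfd(-1, &mask, 0);
    if (sfd < 0) {
        perror("signalfd");
        return 1;
    }

    pid_t parent = getpid();
    pid_t child = fork();
    if (child == 0) {
        /* Child: flood the parent with real-time signals until killed */
        for (;;)
            kill(parent, SIGRTMIN);
    }

    struct signalfd_siginfo si;
    long received = 0;
    /* Parent: each read() is a wakeup and typically a context switch */
    while (received < 100000) {
        if (read(sfd, &si, sizeof(si)) != sizeof(si))
            break;
        received++;
    }
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    printf("received %ld signals via signalfd\n", received);
    return 0;
}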

Stress-ng test “sock” - this performs rapid socket connect/send/receive/disconnects between a client and server on the test machine.

1. All preempt and low-latency kernels show small 5-15% performance increase over the 4.4 generic kernel. This corresponds to faster handling of message related wakeups and context switching.

Stress-ng test “switch” - this forces rapid context switching by ping-ponging messages between parent and child processes over a message pipe. This should show any IPC and context switch overhead.

1. Preempt-lowlatency, preempt-rtb and preempt-full show a performance decrease of ~10-15% compared to 4.4 generic.
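A minimal sketch of that pipe ping-pong pattern is shown below; each one-byte message forces the scheduler to switch between parent and child. It is illustrative only, not the stress-ng implementation, and the round count is arbitrary.

/* Minimal sketch: parent and child ping-pong one-byte messages over pipes,
 * forcing a context switch each way. Illustrative only. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

#define ROUNDS 100000

int main(void)
{
    int to_child[2], to_parent[2];
    char byte = 0;

    pipe(to_child);
    pipe(to_parent);

    pid_t child = fork();
    if (child == 0) {
        /* Child: wait for a byte, then echo it back */
        for (int i = 0; i < ROUNDS; i++) {
            read(to_child[0], &byte, 1);
            write(to_parent[1], &byte, 1);
        }
        _exit(0);
    }

    for (int i = 0; i < ROUNDS; i++) {
        write(to_child[1], &byte, 1);       /* wake the child */
        read(to_parent[0], &byte, 1);       /* sleep until the child replies */
    }
    waitpid(child, NULL, 0);
    printf("completed %d ping-pong round trips\n", ROUNDS);
    return 0;
}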

Stress-ng test “timer” - this exercises the timer clock interrupt handler by generating multiple instances of a 1MHz timer clock. Each clock event is caught by a signal handler, so potentially this generates millions of interrupts and software signals.

1. Preempt-rtb and preempt-full show a performance decrease of ~75% compared to 4.4 generic.

2. Preempt-none is about the same as 4.4 generic.

3. 4.4 low-latency kernels show ~10% performance loss compared to 4.4 generic.

4. Marginal differences between preempt-none and preempt-voluntary to 4.4 generic.
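The following sketch shows the kind of 1 MHz POSIX timer load being described: a repeating timer delivers a real-time signal caught by a handler, so the process is interrupted up to a million times per second (expiries may be coalesced). It is illustrative only; stress-ng’s implementation and signal choice may differ. Building may require linking with -lrt on older glibc versions.

/* Minimal sketch: a repeating 1 MHz POSIX timer delivering signals to a
 * handler. Illustrative only; not the stress-ng implementation. */
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t ticks;

static void handler(int sig)
{
    (void)sig;
    ticks++;            /* one software signal per delivered timer expiry */
}

int main(void)
{
    struct sigaction sa = { .sa_handler = handler };
    sigemptyset(&sa.sa_mask);
    sigaction(SIGRTMIN, &sa, NULL);

    struct sigevent sev = {
        .sigev_notify = SIGEV_SIGNAL,
        .sigev_signo  = SIGRTMIN,
    };
    timer_t timerid;
    timer_create(CLOCK_MONOTONIC, &sev, &timerid);

    /* 1 MHz: expire every 1,000 nanoseconds */
    struct itimerspec its = {
        .it_value    = { .tv_sec = 0, .tv_nsec = 1000 },
        .it_interval = { .tv_sec = 0, .tv_nsec = 1000 },
    };
    timer_settime(timerid, 0, &its, NULL);

    time_t end = time(NULL) + 1;
    while (time(NULL) < end)
        pause();        /* woken by each delivered timer signal */

    timer_delete(timerid);
    printf("caught %d timer signals in about 1 second\n", (int)ticks);
    return 0;
}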

Stress-ng test “udp” - this exercises data transmission between clients and servers on the test machine using UDP. This performs rapid connects, sends, receives and disconnects, as well as generating a high context switch rate.

1. 4.4 low-latency throughput shows little difference to 4.4 generic.

2. Preempt-full shows a ~32% performance decrease compared to 4.4 generic.

3. Preempt-low-level and preempt-rtb show a ~13-17% performance decrease compared to 4.4 generic.

Stress-ng test “vm” - this exercises memory allocation and freeing using mmap and munmap. Memory is also written to, hogging memory bandwidth.

1. All kernels are comparable in performance.
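A minimal sketch of that mmap/munmap churn is shown below; the mapping size and iteration count are arbitrary and this is not the stress-ng implementation.

/* Minimal sketch: map anonymous memory, touch every page, unmap, repeat.
 * Illustrative only; not the stress-ng implementation. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define MAP_SIZE   (16 * 1024 * 1024)   /* 16 MB per iteration */
#define ITERATIONS 100

int main(void)
{
    for (int i = 0; i < ITERATIONS; i++) {
        void *p = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        /* Write to every page so memory bandwidth is actually consumed */
        memset(p, 0xaa, MAP_SIZE);
        munmap(p, MAP_SIZE);
    }
    printf("mapped, touched and unmapped %d MB\n",
           ITERATIONS * (MAP_SIZE / (1024 * 1024)));
    return 0;
}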


Sigma

In this test, the latencies are measured on cyclic tests [2] and we measure the maximum latency required for 50%, 68.27%, 95.45%, 99.73%, 99.99% and 99.99994% of wakeups on timers to occur (these correspond to standard deviation sigmas of 0.67, 1, 2, 3, 4 and 5, see [5]). For example, the 2 sigma results measure the maximum latency in microseconds required for 95.449974% of wakeups to occur.

In an ideal system, we want a low maximum latency for sigma 5, that is, 99.99994% of wakeups can occur with a low latency skew, showing that the kernel is able to meet the low latency demands for effectively all events.
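To illustrate how such sigma figures can be derived, the sketch below sorts a set of wakeup latency samples and reads off the value below which a given fraction of wakeups fall. The sample data is made up purely for illustration and the latency_at helper is hypothetical, not part of the published tooling.

/* Sketch: derive a sigma latency figure from a set of latency samples.
 * Percentage-to-sigma mapping follows the report (e.g. 2 sigma = 95.449974%).
 * Illustrative only; sample data is made up. */
#include <stdio.h>
#include <stdlib.h>

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Maximum latency (us) needed for `fraction` of the wakeups to have occurred */
static long latency_at(long *samples, size_t n, double fraction)
{
    qsort(samples, n, sizeof(long), cmp_long);
    size_t idx = (size_t)(fraction * (double)n);
    if (idx >= n)
        idx = n - 1;
    return samples[idx];
}

int main(void)
{
    long samples[] = { 3, 5, 4, 7, 6, 5, 4, 35, 5, 6 };  /* made-up data */
    size_t n = sizeof(samples) / sizeof(samples[0]);

    printf("1 sigma (68.27%%): %ld us\n", latency_at(samples, n, 0.6827));
    printf("2 sigma (95.45%%): %ld us\n", latency_at(samples, n, 0.9545));
    return 0;
}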


We measure two sets of wakeups on POSIX timer sleeps: 10,000us sleeps and 500us sleeps.

1. Generic and low-latency kernels have less overhead than preempt, and can handle < 68.27% (1 sigma) of samples faster than preempt.

2. Generic and low-latency kernels behave well when not under heavy load.

3. Under heavy load, preempt kernels can service the full set of events with less latency.

Note that the sigma tests do not show the full latency distributions and are hence of limited value. See the various idle, stress-mixed, stress-heavy and VM Cyclic tests next in this document.


Cyclic test


The cyclic tests run the cyclictest [2] high resolution timer test to measure latency skews on various system loads. The data gathered allows one to see the distribution of latencies from timer delays. The peak point (modal latency) shows where most latencies cluster, and the latency distribution graphs show how well the kernels can schedule around the modal point and if there are any long “tails” on the curve showing that there are sometimes long delays in the kernel before a process gets woken up.

An ideal system will have the majority of wakeups clustering at the modal point and should have a minimal “short tail” of latencies. A long tail of latencies shows that some paths in the kernel are taking a while to be preempted during critical sections where the kernel cannot be interrupted.

4 sets of tests are executed:

• POSIX timer: 10,000 us sleep, 500 us sleep.

• Clock nanosleep: 10,000 us sleep, 500 us sleep.

Each test is run 10,000 times and we measure the latency between the expected wakeup time and the actual wakeup time. We keep a count (frequency) of these latencies and see where the latencies cluster for the various kernels.
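The measurement can be sketched as follows: sleep to an absolute deadline with clock_nanosleep and record how late each wakeup is. This is a simplified illustration of the approach, with an arbitrary 500us interval, not the cyclictest tool used to produce the published results.

/* Sketch: cyclictest-style wakeup latency measurement with clock_nanosleep.
 * Illustrative only; the published results use the real cyclictest tool. */
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L
#define INTERVAL_NS  500000L        /* 500 us between wakeups */
#define SAMPLES      10000

int main(void)
{
    struct timespec next, now;
    long max_lat_ns = 0, sum_lat_ns = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < SAMPLES; i++) {
        /* Advance the absolute deadline by one interval */
        next.tv_nsec += INTERVAL_NS;
        if (next.tv_nsec >= NSEC_PER_SEC) {
            next.tv_nsec -= NSEC_PER_SEC;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);

        /* Latency = actual wakeup time minus requested wakeup time */
        long lat = (now.tv_sec - next.tv_sec) * NSEC_PER_SEC +
                   (now.tv_nsec - next.tv_nsec);
        sum_lat_ns += lat;
        if (lat > max_lat_ns)
            max_lat_ns = lat;
    }
    printf("avg latency %ld ns, max latency %ld ns\n",
           sum_lat_ns / SAMPLES, max_lat_ns);
    return 0;
}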

When referring to the original spreadsheet data (available here), the modal latency points for each test have been highlighted in yellow.


Latency distribution, idle machine

In this scenario, the system is idle.

1. POSIX timer sleeps have more latency skew than clock_nanosleep.

2. 4.4 preempt-full has the longest modal latency point, ~1us longer than the other kernels. This could be because of the additional overhead to manage full preemption.


Latency distribution, stress mixed

In this scenario, the system is loaded with 8 compute bound processes, 8 processes generating signals, 8 processes exercising various memory mappings and 8 processes changing CPU affinity.

1. POSIX timer sleeps have a latency skew of 2-4us more than clock_nanosleep.

2. The 4.2 and 4.4 generic and low-latency kernels show similar latency distributions, with the low-latency kernels showing less of a tail, indicating that they can meet the timing requirements better than the generic kernels.

3. The 4.4 preempt-full kernel has a 2-3us latency overhead on POSIX timers, which may be considered surprising.


Latency distribution, stress heavy

In this scenario, the system is overloaded with many stressors covering CPU compute, file system, network and scheduler. This is not a typical load scenario, but it is useful to see how responsive a system is under heavy pressure.

1. POSIX timer sleeps have a latency skew of 2-4us more than clock_nanosleep.

2. For POSIX timers, the low-latency kernel is the most responsive and the preempt full kernel shows a lag of 2-3us compared to other kernels.

3. For clock nanosleep, the preempt-full kernel manages to get a larger proportion of wakeups in the modal point compared to other kernels.


Latency distribution, 8 × VMs

In this scenario, the 8 CPU host is running 8 virtual machines running Ubuntu Xenial with 1 CPU per VM. Each VM is heavily overloaded running hundreds of processes - all the stressors in stress-ng.

1. For POSIX timers, the preempt-full kernel shows considerable latency delays of ~35us for the 10,000us timer and ~8us for the 500us timer compared to the low-latency kernel.

2. For clock nanosleep, the low-latency kernel compares favourably to the preempt kernels.


Latency distribution, kernel build

In this scenario, the Linux kernel is compiled with -j 8, that is, 8 concurrent compilations on the 8 CPU test machine.

1. The low-latency kernels compare favourably to the preempt kernels.

2. The preempt-full kernel suffers from latency skew of ~5us compared to the low-latency kernels.


UDP test

In this test, data is transferred via UDP from a sender to a receiver on the test machine. The test is run with packet sizes of 1 to 16384 bytes and receive time jitter and throughput rates are measured. An ideal system should have optimal data transfer rates and minimal jitter (latency skews) when receiving packets.

1. For receive rates, all 4.4 kernels perform almost identically.

2. The 4.4 kernels show a 1.7% improvement over the 4.2 kernels in throughput rate.

3. The jitter was too random in the measurements to be useful.
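For illustration, the sketch below shows a simplified loopback UDP transfer of the kind described: a child blasts fixed-size datagrams at a parent, which measures the receive rate. The port, packet size and packet count are arbitrary assumptions, and the jitter measurement of the real test is omitted.

/* Sketch: loopback UDP transfer with a simple receive-rate measurement.
 * Illustrative only; not the tooling used for the published results. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define PORT     40000      /* arbitrary port for illustration */
#define PKT_SIZE 1024
#define PACKETS  100000

int main(void)
{
    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(PORT),
        .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
    };

    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    bind(rx, (struct sockaddr *)&addr, sizeof(addr));

    /* Stop waiting one second after the last datagram arrives */
    struct timeval tmo = { .tv_sec = 1, .tv_usec = 0 };
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tmo, sizeof(tmo));

    pid_t child = fork();
    if (child == 0) {
        /* Sender: blast fixed-size datagrams at the receiver */
        int tx = socket(AF_INET, SOCK_DGRAM, 0);
        char buf[PKT_SIZE] = { 0 };
        for (int i = 0; i < PACKETS; i++)
            sendto(tx, buf, sizeof(buf), 0,
                   (struct sockaddr *)&addr, sizeof(addr));
        close(tx);
        _exit(0);
    }

    /* Receiver: count bytes until the sender goes quiet */
    char buf[PKT_SIZE];
    long bytes = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    t1 = t0;
    for (;;) {
        ssize_t n = recv(rx, buf, sizeof(buf), 0);
        if (n <= 0)
            break;                              /* timeout: sender finished */
        bytes += n;
        clock_gettime(CLOCK_MONOTONIC, &t1);    /* time of last packet */
    }

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0)
        secs = 1e-9;
    printf("received %ld bytes, %.1f MB/s\n", bytes, bytes / secs / 1e6);
    waitpid(child, NULL, 0);
    close(rx);
    return 0;
}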


HZ-test

This test measures nanosleep() jitters for various delays from 100us to 10,000,000us on the different kernels. The test is run with various loads to see how this affects the different kernels. The hz-test tool [4] was used to perform the measurements.

1. An ideal system should have minimal jitter (latency skew) for all delays.

2. For the stress-ng --cpu 8 load, there is little difference between the kernels. This is unsurprising as these scenarios involve little or no system call activity and hence minimal kernel space delays.

3. For the heavier loaded stress-ng --affinity 8 --sigq 8 --cpu 8 --vm 8 scenario, preempt-full shows ~5us extra latency on the 100us sleeps compared to the other configurations. The 4.4 low-latency kernel seems to be the best overall kernel for the lowest latencies.

4. For the kernel build on small 100us sleeps, the preempt-full kernel shows a large 136us latency compared to the other kernels that are averaging around 52us latency.

5. For the fully loaded 8 x VM test scenario, the 4.4 generic kernel suffers from large 362us latencies on the 100us sleeps, whereas the 4.4 preempt-full and 4.4 low-latency kernels perform well with minimal differences in latencies compared to the lightly loaded tests.


TCP test

In this test, data is transferred via TCP from a sender to a receiver on the test machine. The test is run with packet sizes of 1 to 16384 bytes and receive time jitter and throughput rates are measured.

1. For receive rates, all kernels perform almost identically.


Stress-ng tests

In these tests, we run each of the stress-ng [3] stress tests and compare various stress-ng metrics for each kernel. The gathered data is averaged and then normalised against this average. The normalised data is then summed to give an overall view of the system impact each kernel has on different types of metrics across a very wide range of stress tests.

Stress-ng bogo ops/sec (real time)

This metric shows the number of bogo operations each stressor has performed divided by the overall wall clock (run time) of the stress test. This gives us some notion of total bogo operations run over the 8 CPU machine based on wall clock time. The sum of normalised metrics shows:

1. For 4.4 kernels, the preempt-none and generic kernels give maximum bogo operation throughput, with the preempt-voluntary and low-latency kernels not far behind. The preempt-full kernel’s clearly lower throughput shows that full preemption carries an overhead cost in reduced throughput.

Stress-ng bogo ops/sec (user + system time)

This metric shows the number of bogo operations each stressor has performed divided by the total user and system time consumed by the stress test. This gives us some notion of total bogo operations run over the 8 CPU machine in terms of real CPU time consumed.

The Sum of normalised metrics shows:

1. For the 4.4 kernels, the preempt-full and preempt-rtb kernels have the least throughput.

Stress-ng instructions per cycle

This statistic gives some notion of the amount of extra work being performed per cycle by each kernel. Assuming that the stress tests are, in general, not behaving differently per run, then the only difference left can be the differences in each kernel. A higher number of instructions per cycle may indicate that the kernel is working harder per cycle to achieve the same stressor throughput. The sum of normalised metrics shows:

1. The preempt-full kernel is performing more work than other kernels.

2. The generic kernels perform less work than other kernels.

3. The low-latency kernels are not much behind the generic kernels.

This metric is possibly dubious, but may indicate that preemption does exercise the CPUs harder per unit of execution time.


References

[1] Real-time preemption patches: kernel.org/pub/linux/kernel/projects/rt/4.4/

[2] cyclictest: rt.wiki.kernel.org/index.php/Cyclictest

[3] stress-ng: kernel.ubuntu.com/~cking/stress-ng/

[4] hz-test: kernel.ubuntu.com/git/cking/debug-code/.git/tree/hz-test

[5] Standard deviation: en.wikipedia.org/wiki/Standard_deviation

Contact

If you want the raw data or if you would like to discuss the outcomes of this report with members of our engineering team, please contact [email protected]

© Canonical Limited 2016. Ubuntu, Kubuntu, Canonical and their associated logos are the registered trademarks of Canonical Ltd. All other trademarks are the properties of their respective owners. Any information referred to in this document may change without notice and Canonical will not be held responsible for any such changes.

Canonical Limited, Registered in England and Wales, Company number 110334C. Registered Office: 12-14 Finch Road, Douglas, Isle of Man, IM99 1TT. VAT Registration: GB 003 2322 47