Reflex Benchmarks at Intel fasterLAB - May 2013

Description / Use Case
In many areas of financial data processing, such as consuming high-speed market data feeds directly from the exchanges, the lowest possible latency is desirable. Lower latency leaves more time to make a decision and increases the likelihood of capturing good trades by being the first to act. This test is designed to measure the industry-standard "half-round-trip" latency of Reflex, which is known to be a reliable indicator of expected production performance.

This test measures the total latency of the combined Solarflare SFN5122F/OpenOnload/Reflex stack, including the latency contributed by Reflex itself.

Measurements are of half-round-trip latency between two Reflex instances, one running a sender program and the other running an echo program. Timestamps were taken at the sender just before sending a source message and immediately upon receipt of its echo, using the clock_gettime(CLOCK_MONOTONIC_RAW, ...) system call to eliminate clock skew effects. The difference between the two timestamps was then divided by two to give a half-round-trip time (1/2 RTT) measurement. The results were graphed as box plots in which the center line of the box indicates the mean latency. The upper and lower sides of the box indicate a quartile of data above and below the mean, respectively. The outer "whiskers" of the box plot indicate the 99th percentile of the measurements. The indicated events per second is the rate of outgoing source transmissions. The total number of events processed by Reflex is actually double this source rate because echo messages are received simultaneously.
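
The timing procedure described above can be sketched as follows. This is only an illustration under stated assumptions: it uses plain UDP sockets on the loopback interface as a stand-in for the two Reflex instances (the Reflex API is not shown in this report), and the function names are hypothetical. Python exposes the same clock through time.clock_gettime_ns on Unix-like systems.

```python
import socket
import threading
import time

# CLOCK_MONOTONIC_RAW is not available on every Unix platform;
# fall back to CLOCK_MONOTONIC where it is missing.
CLOCK = getattr(time, "CLOCK_MONOTONIC_RAW", time.CLOCK_MONOTONIC)


def echo_messages(sock, count):
    """Echo side: return each datagram to whoever sent it."""
    for _ in range(count):
        data, addr = sock.recvfrom(2048)
        sock.sendto(data, addr)


def measure_half_rtt_ns(count=100, payload_size=28):
    """Sender side: timestamp before send and after echo receipt, then halve."""
    echo_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    echo_sock.bind(("127.0.0.1", 0))  # loopback stands in for the second machine
    echo_addr = echo_sock.getsockname()
    echo_thread = threading.Thread(
        target=echo_messages, args=(echo_sock, count), daemon=True
    )
    echo_thread.start()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.settimeout(5.0)
    payload = bytes(payload_size)
    samples = []
    for _ in range(count):
        t_send = time.clock_gettime_ns(CLOCK)   # just before the send
        sender.sendto(payload, echo_addr)
        sender.recvfrom(2048)
        t_echo = time.clock_gettime_ns(CLOCK)   # immediately after the echo arrives
        samples.append((t_echo - t_send) / 2)   # half-round-trip time in ns
    echo_thread.join()
    sender.close()
    echo_sock.close()
    return samples
```

Because both timestamps come from the sender's own clock, the two machines never need synchronized clocks; the cost is that the result assumes the outbound and return paths take equal time.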

Measurements were taken for the following transmission modes of Reflex:

  • TCP_NORMAL - Mode allowing batching of TCP transmissions
  • TCP_URGENT - Latency-critical TCP transmissions
  • UDP_NORMAL - Mode allowing batching of UDP transmissions
  • UDP_URGENT - Latency-critical UDP transmissions

Each mode was tested at payload sizes of 28 bytes and 1460 bytes, both with and without OpenOnload kernel-bypass technology.
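
The combinations described above (four modes, two payload sizes, with and without OpenOnload) can be enumerated as a small test matrix. The dictionary keys and the function name below are illustrative choices, not part of Reflex or OpenOnload.

```python
from itertools import product

MODES = ["TCP_NORMAL", "TCP_URGENT", "UDP_NORMAL", "UDP_URGENT"]
PAYLOAD_BYTES = [28, 1460]
ONLOAD = [True, False]  # with and without kernel bypass


def build_test_matrix():
    """Return one entry per (mode, payload size, OpenOnload on/off) combination."""
    return [
        {"mode": mode, "payload_bytes": size, "onload": onload}
        for mode, size, onload in product(MODES, PAYLOAD_BYTES, ONLOAD)
    ]


matrix = build_test_matrix()
print(len(matrix))  # 4 modes x 2 payload sizes x 2 = 16
```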

The test bed consisted of two machines connected back-to-back (without a switch), each with the following configuration:
  • Industry Standard x86 Architecture
  • 16 core Intel(R) Xeon(R) CPU E5-2680 @ 2.70GHz
  • Solarstorm SFN5122F SFP+ Server Adapter
  • Reflex v1.1
  • OpenOnload version 201210-u1
  • Red Hat Enterprise Linux 6.4
  • Linux Kernel 2.6.32-358.el6.x86_64
  • Kernel arguments: "intel_idle.max_cstate=0 mce=ignore_ce isolcpus=4-15"
  • MTU set to 9000
  • scaling_governor set to "performance"
  • Unnecessary services were stopped
  • smp_affinity set for IRQs 105-120 to CPU5
  • Reflex environment variables: RF_WORK_MODE=2 RF_NEVER_WAIT=1
  • onload --profile=latency
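
As a worked example of the IRQ affinity setting listed above: on Linux, /proc/irq/<N>/smp_affinity takes a hexadecimal CPU bitmask, so pinning to CPU5 means setting bit 5. The sketch below only computes the values and does not write to /proc; the helper names are hypothetical, and it assumes a machine with at most 32 CPUs (larger systems use comma-separated 32-bit groups).

```python
def cpu_affinity_mask(cpus):
    """Return the hex bitmask string that selects the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def irq_affinity_settings(irqs, cpus):
    """Map each IRQ's smp_affinity path to the mask value it would receive."""
    mask = cpu_affinity_mask(cpus)
    return {f"/proc/irq/{irq}/smp_affinity": mask for irq in irqs}


# IRQs 105-120 pinned to CPU5, as in the configuration above.
settings = irq_affinity_settings(range(105, 121), [5])
print(settings["/proc/irq/105/smp_affinity"])  # prints 20 (hex for bit 5)
```

The same helper shows which CPUs the isolcpus=4-15 kernel argument covers: cpu_affinity_mask(range(4, 16)) gives fff0, so CPU5 lies inside the isolated set.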

Result charts (box plots of 1/2 RTT latency), one per configuration:
  • UDP_URGENT Solarflare, Payload 28 bytes
  • UDP_NORMAL Solarflare, Payload 28 bytes
  • TCP_URGENT Solarflare, Payload 28 bytes
  • TCP_NORMAL Solarflare, Payload 28 bytes
  • UDP_URGENT Solarflare, Payload 1460 bytes
  • UDP_NORMAL Solarflare, Payload 1460 bytes
  • TCP_URGENT Solarflare, Payload 1460 bytes
  • TCP_NORMAL Solarflare, Payload 1460 bytes
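
The statistics behind each box plot (mean, quartiles, 99th percentile) can be computed from raw 1/2 RTT samples roughly as follows. This is a sketch, not the tooling used for the published charts: it assumes the box edges are the 25th and 75th percentiles and that the upper whisker is the 99th percentile, and it uses the "inclusive" interpolation method of Python's statistics.quantiles, which may differ slightly from other tools.

```python
import statistics


def boxplot_summary(samples_ns):
    """Summarize half-RTT samples with the statistics shown on the box plots."""
    # quantiles(n=100) returns the 99 cut points between percentiles 1..99.
    cuts = statistics.quantiles(samples_ns, n=100, method="inclusive")
    return {
        "mean": statistics.mean(samples_ns),  # center line of the box
        "p25": cuts[24],                      # lower side of the box (assumed)
        "p75": cuts[74],                      # upper side of the box (assumed)
        "p99": cuts[98],                      # upper whisker
    }


def total_event_rate(source_rate_per_sec):
    """Reflex handles each source message plus its echo, so the rate doubles."""
    return 2 * source_rate_per_sec
```

For latency data the 99th percentile is usually the figure to watch, since a few slow outliers can leave the mean almost unchanged while still affecting trading decisions.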
Technical specifications