Network Interface Cards (NICs) are commonly used for connecting computers into networks and enabling devices in the network to communicate with each other.
Driven by the demand for higher performance and lower latency, Ethernet technology has continued to evolve. Nearly a decade after its release, 10G Ethernet is now a very competitive option for servers and workstations on tight budgets.
Here we evaluate five 10G NICs released within the last two years:
- Mellanox ConnectX CX311A
- SOLARFLARE 7322
- Chelsio T420-LL-CR
- QLOGIC QLE8442
- EXABLAZE EXANIC X2
The tests were carried out on two HFT systems with the target NICs connected back to back. The focus of the benchmarking is to obtain the latency and bandwidth of various 10G NICs with optimizations applied, and also to evaluate their performance on MPI-intensive operations. In particular, latency was measured with sockperf, a network benchmarking utility that works over the socket API.
Bandwidth was measured with iperf, a tool for actively measuring the maximum achievable bandwidth. It is commonly used for testing Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) data streams.
Intel MPI Benchmarks (IMB) was used to evaluate MPI-intensive operations. Since the purpose of this test was to evaluate the performance between two NICs, IMB was configured to run only two processes, one on each system. The MPI operations evaluated in this test are Pingpong and Exchange.
- Pingpong: a single message is sent between two processes.
- Exchange: Processes exchange data with both left and right in the chain.
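To make the two communication patterns concrete, here is a minimal sketch in plain Python (no MPI required) of who talks to whom. The function names are ours and purely illustrative; the sketch assumes the chain used by Exchange is periodic, so the last rank wraps around to the first.

```python
from typing import Tuple


def pingpong_partner(rank: int) -> int:
    """Return the peer of `rank` in a two-process ping-pong (ranks 0 and 1)."""
    if rank not in (0, 1):
        raise ValueError("ping-pong is defined here for ranks 0 and 1 only")
    return 1 - rank


def exchange_neighbors(rank: int, size: int) -> Tuple[int, int]:
    """Return the (left, right) neighbours of `rank` in a periodic chain."""
    if size < 1 or not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    # Modulo arithmetic closes the chain into a ring.
    return (rank - 1) % size, (rank + 1) % size


if __name__ == "__main__":
    # With two processes, the left and right neighbour are the same peer.
    for rank in range(2):
        print(rank, pingpong_partner(rank), exchange_neighbors(rank, 2))
```

With only two processes, as in this test, both Exchange neighbours of a rank are the other rank, so each process sends to and receives from the same peer twice per iteration, whereas Pingpong sends one message at a time.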
Benchmarking Configuration Details
The HFT System:
- Intel(R) Xeon(R) CPU E5-2687W v2 @ 3.40GHz (turbo mode: 4.0GHz)
- 6 X 8G DDR3 1866 MHz Memory
- System: CentOS 6.5
- MPI Library: OpenMPI 1.8.4 compiled by GCC 4.4.7
- Intel(R) MPI Benchmarks 4.0 Update 1 compiled by GCC 4.4.7
- Sockperf version 2.5.241 compiled by GCC 4.4.7
- Iperf version 2.0.5
- Over-clocking bclk to 104
- Kernel Boot command line Options:
- CPU No. 12 was pinned during tests
- turbo mode enabled
- hyper-threading disabled
Network Cards Detailed Models and driver version:
|Network Cards||Driver Version|
|Mellanox ConnectX CX311A||MLNX_EN for Linux v2.3-1.0.0|
|Chelsio T420-LL-CR||Chelsio Unified Wire v22.214.171.124 for Linux|
|QLOGIC QLE8442||Driver RPMS V-7.11.05|
|EXABLAZE EXANIC X2||Driver and software package v1.4.2|
The results of the IMB, iperf, and sockperf benchmarks are presented below.
Iperf Bandwidth Performance
The iperf benchmark was run with the default configuration, using the following command on the client side:
iperf -c 192.168.2.1
With the general optimizations described in the tuning section applied, the QLOGIC QLE8442 and Solarflare 7322 deliver the best performance; the Mellanox CX311A and EXANIC X2 are slightly behind, while the Chelsio T420-LL-CR gave the worst performance.
Note that the result for the EXANIC X2 was obtained using the exasock tool provided by Exablaze.
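For repeated runs it can help to extract the bandwidth figure from the iperf client output automatically. The following is a rough sketch, not the procedure used for the results above: it assumes the iperf 2 text report, where each summary line ends in a rate such as `<number> Gbits/sec`, and it assumes decimal prefixes for bit rates. The helper names are hypothetical.

```python
import re
import subprocess
from typing import Optional

# Conversion factors to Gbit/s, assuming decimal (1000-based) prefixes.
_UNIT_TO_GBITS = {"K": 1e-6, "M": 1e-3, "G": 1.0}
_RATE_RE = re.compile(r"([\d.]+)\s+([KMG])bits/sec")


def parse_iperf_gbits(output: str) -> Optional[float]:
    """Return the rate of the last report line in Gbit/s, or None if absent."""
    last = None
    for line in output.splitlines():
        match = _RATE_RE.search(line)
        if match:
            last = float(match.group(1)) * _UNIT_TO_GBITS[match.group(2)]
    return last


def run_iperf_client(server_ip: str) -> Optional[float]:
    """Run `iperf -c <server_ip>` and return the reported rate in Gbit/s."""
    result = subprocess.run(
        ["iperf", "-c", server_ip],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_iperf_gbits(result.stdout)
```

Calling `run_iperf_client("192.168.2.1")` corresponds to the client command shown above; the parser can also be applied to saved log files.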
Sockperf UDP Latency Performance
The sockperf benchmark was run with the parameters -t 5 -m 12, which set the running time to 5 seconds and the message size to 12 bytes, using the following command on the client side:
sockperf pp -i 192.168.2.2 -t 5 -m 12
Note that, to achieve the optimised latency of the EXANIC X2 card as suggested by the official documentation, the exasock tool was used and the process was pinned to CPU core 12, as follows:
exasock taskset -c 12 sockperf pp -i 192.168.0.1 -t 5 -m 12
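The two invocations above differ only in their prefix, so a small helper can build either form from the same parameters. This is a sketch for scripting the runs; the function name and its keyword arguments are our own, and only the flags already shown above are used.

```python
from typing import List, Optional


def sockperf_pingpong_cmd(
    server_ip: str,
    seconds: int = 5,
    msg_size: int = 12,
    cpu: Optional[int] = None,
    use_exasock: bool = False,
) -> List[str]:
    """Build the sockperf ping-pong client command as an argument list."""
    cmd: List[str] = []
    if use_exasock:
        # Wrap the process with the Exablaze exasock tool.
        cmd.append("exasock")
    if cpu is not None:
        # Pin the process to a single CPU core.
        cmd += ["taskset", "-c", str(cpu)]
    cmd += ["sockperf", "pp", "-i", server_ip,
            "-t", str(seconds), "-m", str(msg_size)]
    return cmd


if __name__ == "__main__":
    print(" ".join(sockperf_pingpong_cmd("192.168.2.2")))
    print(" ".join(sockperf_pingpong_cmd("192.168.0.1", cpu=12, use_exasock=True)))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting concerns.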
With the help of the exasock tool, the EXANIC X2 achieved the best latency, while the Mellanox CX311A and the Solarflare card are slightly behind.
IMB MPI Performance
The MPI benchmark was launched using command:
mpirun -n 2 -host 192.168.2.1,192.168.2.2 --npernode 1 --allow-run-as-root IMB-MPI1 pingpong
Pingpong: a single message is sent between two MPI processes.
The Chelsio T420 gives the best overall performance for both small and large messages. For messages smaller than 4 KB, the EXANIC X2 gives the second-best performance, while for larger messages the Solarflare 7322 and QLE8442 start to perform better.
The CX311A delivers one of the top bandwidth results together with a middle-range latency score. The Solarflare 7322 delivers the best bandwidth, around 23% better than the average, and also middle-range latency. In the MPI Pingpong operation, the CX311A delivers solid performance, while the Solarflare 7322 delivers middle-range bandwidth for messages smaller than 4096 bytes but gives the best performance for larger messages.
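A figure such as "23% better than the average" can be reproduced from the per-card results with a one-line calculation. The sketch below assumes the average is taken over all tested cards, including the card being compared; the function name is ours, and any numbers passed to it should be the measured values, which are not reproduced here.

```python
from statistics import mean
from typing import Mapping


def percent_above_mean(results: Mapping[str, float], name: str) -> float:
    """Return how far `results[name]` lies above the mean of all results, in percent."""
    avg = mean(list(results.values()))
    return (results[name] - avg) / avg * 100.0
```

A negative return value means the card is below the average. If the average is instead meant to exclude the card under comparison, the mean should be computed over the remaining entries only, which gives a larger percentage for the best card.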
The EXANIC X2 card has a great advantage in latency with the help of the provided exasock tool.
The Chelsio T420-LL-CR is the oldest of the NICs tested, and its ranking in bandwidth and latency reflects this. However, its performance in MPI operations is not far behind.
The QLE8442 delivers the best bandwidth but ranks lowest in latency, while delivering middle-range performance in MPI operations.
- knowledge base and experience
- random driver issues
- with a better understanding of the NICs, continue to apply optimizations and push these cards to the limit
- with the same configuration, compare other cards or newcomers in terms of bandwidth and latency