The Syncious team ran the Intel MPI Benchmarks (IMB) suite on Microsoft Azure compute servers (VMs). We selected VMs of type A9 and H16R. The H16R VMs performed much better than the A9 VMs: we measured lower latency and higher throughput on H16R than on the A9 series. Both A9 and H16R VMs support RDMA. If high-speed RDMA is not required, the more cost-effective A11 or H16 VM types can be selected.
Setup used:
Two Microsoft Azure A9 VMs with the following specifications:
Number of CPU cores: 16
CPU Type: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
RAM: 112GB
Intel MPI Version: 5.1.3.181
Operating System: CentOS 7.1.1503
Two Microsoft Azure H16R VMs with the following specifications:
Number of CPU cores: 16
CPU Type: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
RAM: 112GB
Intel MPI Version: 5.1.3.181
Operating System: CentOS 7.1.1503
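Before running IMB, we verified that Intel MPI could launch ranks across both VMs. The short C program below is our own illustrative sketch, not part of the IMB suite; the compiler wrapper and launch command in the comments reflect typical Intel MPI usage and may differ for other deployments. It simply prints which host each rank runs on.

/* mpi_check.c - minimal sanity check that MPI ranks start on both VMs.
 * This is our own illustrative sketch, not part of the IMB suite.
 * Typical Intel MPI build and launch (exact flags may vary):
 *   mpiicc mpi_check.c -o mpi_check
 *   mpirun -n 32 -ppn 16 -hosts <vm1>,<vm2> ./mpi_check
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &name_len);

    printf("Rank %d of %d running on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}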
The IMB Sendrecv test was used for point-to-point MPI operations. H16R showed lower latency than A9. For example, with 16 processes running on each MPI host, A9 reported an average latency of 1.98 microseconds for the zero-byte message size, while H16R reported an average latency of 1.45 microseconds for the same test. H16R also outperformed A9 at larger message sizes: for a message size of 4194304 bytes, the A9 average latency was 2474 microseconds, whereas the H16R average latency was 1986 microseconds.
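As background, the timing loop behind these numbers looks roughly like the sketch below. This is our own simplified illustration of a Sendrecv-style measurement, not the IMB source code; the iteration count, buffer handling, and ring communication pattern are assumptions on our part.

/* sendrecv_latency.c - illustrative sketch of a Sendrecv-style timing loop,
 * similar in spirit to IMB Sendrecv (this is not the IMB source).
 * Each rank sends to its right neighbour and receives from its left
 * neighbour in a ring; the reported time is the average per MPI_Sendrecv.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define ITERATIONS 1000

int main(int argc, char **argv)
{
    int rank, size;
    size_t msg_bytes = (argc > 1) ? (size_t)atol(argv[1]) : 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char *sendbuf = calloc(msg_bytes ? msg_bytes : 1, 1);
    char *recvbuf = calloc(msg_bytes ? msg_bytes : 1, 1);
    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < ITERATIONS; i++) {
        MPI_Sendrecv(sendbuf, (int)msg_bytes, MPI_CHAR, right, 0,
                     recvbuf, (int)msg_bytes, MPI_CHAR, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    double usec = (MPI_Wtime() - t0) * 1.0e6 / ITERATIONS;

    if (rank == 0)
        printf("%zu bytes: %.2f usec per MPI_Sendrecv\n", msg_bytes, usec);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}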
Here is a graph comparing H16R and A9 Sendrecv latency for different message sizes:
H16R also reported higher throughput during the IMB Sendrecv test. For example, with 16 processes on each host, A9 reported a throughput of 3662 MB/s whereas H16R reported 9638 MB/s. Note that these are bidirectional throughput figures.
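To clarify what bidirectional means here: each MPI_Sendrecv call both sends and receives a message, so the reported bandwidth counts roughly twice the message size per call. The small program below shows that conversion under our assumptions (2 x message size per call, and a 2^20-byte megabyte, which we understand older IMB versions use); the input values are hypothetical and not taken from the runs above.

/* bw_from_latency.c - sketch of how a bidirectional Sendrecv figure relates
 * to the measured per-call time. Assumptions: each MPI_Sendrecv moves
 * 2 x msg_bytes, and 1 MB = 2^20 bytes. */
#include <stdio.h>

int main(void)
{
    /* Hypothetical example values, not taken from the benchmark runs. */
    double msg_bytes = 4.0 * 1024 * 1024;   /* 4 MiB message          */
    double usec_per_call = 2000.0;          /* measured time per call */

    double bytes_per_s = 2.0 * msg_bytes / (usec_per_call * 1.0e-6);
    printf("Bidirectional throughput: %.0f MB/s\n",
           bytes_per_s / (1024.0 * 1024.0));
    return 0;
}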
The graph below shows the throughput comparison for different message sizes:
The H16R series continued to outperform the A9 series for collective MPI operations as well. We ran the IMB Alltoall test. With 16 processes running on each MPI host, A9 reported an average latency of 0.2 microseconds whereas H16R reported 0.15 microseconds. For the larger message size of 4194304 bytes, A9 reported an average latency of 328129 microseconds whereas H16R reported 240126 microseconds.
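For reference, a simplified Alltoall timing loop in the spirit of IMB Alltoall is sketched below. Again, this is our own illustration rather than the IMB source; the iteration count and buffer sizes are assumptions.

/* alltoall_latency.c - illustrative timing loop for MPI_Alltoall, similar
 * in spirit to IMB Alltoall (this is not the IMB source). Each rank
 * exchanges msg_bytes with every other rank in every iteration.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define ITERATIONS 100

int main(int argc, char **argv)
{
    int rank, size;
    size_t msg_bytes = (argc > 1) ? (size_t)atol(argv[1]) : 1024;
    if (msg_bytes == 0)
        msg_bytes = 1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank sends msg_bytes to, and receives msg_bytes from, every rank. */
    char *sendbuf = malloc(msg_bytes * (size_t)size);
    char *recvbuf = malloc(msg_bytes * (size_t)size);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < ITERATIONS; i++) {
        MPI_Alltoall(sendbuf, (int)msg_bytes, MPI_CHAR,
                     recvbuf, (int)msg_bytes, MPI_CHAR, MPI_COMM_WORLD);
    }
    double usec = (MPI_Wtime() - t0) * 1.0e6 / ITERATIONS;

    if (rank == 0)
        printf("%zu bytes per rank pair: %.2f usec per MPI_Alltoall\n",
               msg_bytes, usec);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}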
The graph below shows the IMB Alltoall latency numbers for different message sizes:
Based on our benchmarking, we concluded that using the H16R series for HPC workloads gives better networking performance than the A9 series. The H16R series delivers lower latency and higher throughput for point-to-point as well as collective MPI operations.
H16R HPC machines have the newer and more powerful Intel(R) Xeon(R) CPU E5-2667 v3 processor running at 3.20GHz, while the A9 series has the slightly older Xeon(R) CPU E5-2670 0 processor running at 2.60GHz. The InfiniBand hardware used in H16R also appears to be newer than that of the A9 series. Overall, the results show that distributed HPC workloads running on the H16R series should take less time to complete. Unfortunately, H16R VMs are not available in all Azure regions. Hence, if cost and location are not constraints, H16R is the preferred option for HPC workloads on Microsoft Azure.