Benchmarking Microsoft Azure A9 and H16R based HPC clusters

The Syncious team ran the Intel MPI Benchmarks (IMB) on Microsoft Azure compute servers (VMs) of types A9 and H16R. H16R VMs perform much better than A9 VMs: we measured lower latency and higher throughput with H16R than with the A9 series. Both A9 and H16R VMs support RDMA. If high-speed RDMA is not required, the more cost-effective VM types A11 or H16 can be selected instead.


Setup used:

Two Microsoft Azure A9 VMs with the following specifications:

Number of CPU cores: 16
CPU Type: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
RAM: 112GB
Intel MPI Version: 5.1.3.181
Operating System: CentOS 7.1.1503

Two Microsoft Azure H16R VMs with the following specifications:

Number of CPU cores: 16
CPU Type: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
RAM: 112GB
Intel MPI Version: 5.1.3.181
Operating System: CentOS 7.1.1503

The IMB Sendrecv test was used for point-to-point MPI operations. H16R showed lower latency than A9. For example, with 16 processes running on each MPI host, A9 reported an average latency of 1.98 microseconds for a zero-byte message size; for the same test, H16R reported an average latency of 1.45 microseconds. H16R also outperformed A9 at larger message sizes: for a message size of 4194304 bytes, the A9 average latency was 2474 microseconds whereas the H16R average latency was 1986 microseconds.
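
For context, the communication pattern the Sendrecv test measures can be sketched in a few lines of MPI C code. This is a simplified illustration, not the actual IMB source: each rank sends to its right neighbour and receives from its left neighbour in a periodic chain, and the loop time is averaged to get the latency figure. The iteration count here is illustrative.

```c
/* Minimal sketch of the Sendrecv measurement pattern (not the actual
 * IMB source): ranks form a ring, each sending right and receiving
 * left, and the averaged loop time is reported as latency. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int iters = 1000;        /* illustrative iteration count */
    const size_t bytes = 0;        /* zero-byte message, as in the test above */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char *sbuf = malloc(bytes ? bytes : 1);
    char *rbuf = malloc(bytes ? bytes : 1);

    int right = (rank + 1) % size;         /* neighbour we send to */
    int left  = (rank + size - 1) % size;  /* neighbour we receive from */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        MPI_Sendrecv(sbuf, (int)bytes, MPI_BYTE, right, 0,
                     rbuf, (int)bytes, MPI_BYTE, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    double t = (MPI_Wtime() - t0) / iters;

    if (rank == 0)
        printf("avg Sendrecv time: %.2f usec\n", t * 1e6);

    free(sbuf);
    free(rbuf);
    MPI_Finalize();
    return 0;
}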

Here is a graph comparing H16R and A9 for different message sizes:

[Figure: IMB Sendrecv average latency vs. message size, 32 processes (sendrecv_np32_latency)]

H16R also reported higher throughput in the IMB Sendrecv test. For example, with 16 processes on each host, A9 reported 3662 MB/s whereas H16R reported 9638 MB/s. Note that these are bidirectional throughput numbers.
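
The bidirectional figure follows from how IMB accounts for Sendrecv traffic: every iteration moves the message both out of and into a rank, so twice the message size is divided by the averaged time. The sketch below shows that arithmetic under two assumptions: the time value is hypothetical, not one of our measured results, and a Mbyte is taken as 2^20 bytes, the convention IMB used at the time.

```c
#include <stdio.h>

/* Illustrative only: how the Sendrecv Mbytes/sec column is derived.
 * The message travels in both directions per iteration, so the byte
 * count is doubled before dividing by the averaged time. */
int main(void)
{
    double msg_bytes = 4194304.0;  /* 4 MB message size */
    double t_sec     = 870e-6;     /* hypothetical avg time per iteration */

    /* assuming 1 Mbyte = 2^20 bytes, as in IMB of that era */
    double mbytes_per_sec = (2.0 * msg_bytes / 1048576.0) / t_sec;
    printf("bidirectional throughput: %.0f MB/s\n", mbytes_per_sec);
    return 0;
}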

The graph below shows comparison for different message sizes:

[Figure: IMB Sendrecv bidirectional throughput vs. message size, 32 processes (sendrecv_np32_gbps)]

The H16R series continued to outperform the A9 series for collective MPI operations as well. We ran the IMB Alltoall test. With 16 processes running on each MPI host, A9 reported an average latency of 0.2 microseconds whereas H16R reported an average latency of 0.15 microseconds. For the larger message size of 4194304 bytes, A9 reported an average latency of 328129 microseconds whereas H16R reported 240126 microseconds.
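
To make the collective measurement concrete, here is a minimal sketch of timing MPI_Alltoall the way a benchmark such as IMB does. It is not the actual IMB source, and the iteration count and per-rank block size are illustrative; in Alltoall every rank exchanges a block with every other rank, which is why latency grows so steeply with message size.

```c
/* Minimal sketch of timing MPI_Alltoall (not the actual IMB source):
 * each rank exchanges a block of `bytes` with every other rank. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int iters = 100;        /* illustrative iteration count */
    const size_t bytes = 4096;    /* illustrative per-rank block size */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* send/recv buffers hold one block per rank */
    char *sbuf = malloc(bytes * size);
    char *rbuf = malloc(bytes * size);
    memset(sbuf, 0, bytes * size);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        MPI_Alltoall(sbuf, (int)bytes, MPI_BYTE,
                     rbuf, (int)bytes, MPI_BYTE, MPI_COMM_WORLD);
    double t = (MPI_Wtime() - t0) / iters;

    if (rank == 0)
        printf("avg Alltoall time: %.2f usec\n", t * 1e6);

    free(sbuf);
    free(rbuf);
    MPI_Finalize();
    return 0;
}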

The graph below shows the IMB Alltoall latency numbers for different message sizes:

[Figure: IMB Alltoall average latency vs. message size, 32 processes (alltoall_np32_latency)]

Based on our benchmarking, we conclude that the H16R series gives better networking performance for HPC workloads than the A9 series: lower latency and higher throughput for point-to-point as well as collective MPI operations.

H16R HPC machines have the newer, more powerful Intel(R) Xeon(R) CPU E5-2667 v3 processor running at 3.20GHz, whereas the A9 series has the older Xeon(R) CPU E5-2670 0 processor running at 2.60GHz. The InfiniBand hardware used in H16R also appears to be newer than that of the A9 series. Overall, the results show that distributed HPC workloads will take less time to complete on the H16R series. Unfortunately, H16R VMs are not available in all Azure regions. Hence, if cost and location are not constraints, H16R is the preferred option for HPC workloads on Microsoft Azure.
