The Syncious team selected OpenFOAM as one of the applications for analyzing public cloud based HPC clusters. OpenFOAM (for "Open source Field Operation And Manipulation") is free, open-source software used to solve computational fluid dynamics (CFD) problems. Computationally intensive CFD problems can be solved using HPC clusters.
The team referred to a blog by Dr. Donald Kinghorn for the configuration of the OpenFOAM solver. We would like to thank Dr. Donald Kinghorn for such an informative blog.
OpenFOAM provides parallelization by partitioning (decomposing) the geometry of the model (the domain). The newly created partitions are assigned to individual processors (CPU cores) in the HPC cluster. However, some domain elements are shared between partitions, so the processors need to communicate with each other. Hence, the speed of communication between processors can affect the overall performance. A sketch of this decomposition step is shown below.
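As an illustration of the decomposition step, here is a minimal sketch that writes a decomposeParDict and runs OpenFOAM's decomposePar utility. The case directory name, the subdomain count, and the choice of the "simple" geometric method are assumptions for the example, not the exact settings used in our tests.

```python
import subprocess
from pathlib import Path

# Hypothetical case directory; adjust to the actual OpenFOAM case path.
CASE = Path("cavity")

# Minimal decomposeParDict splitting the domain into 16 subdomains
# with the "simple" geometric method (4 x 4 x 1 blocks).
DECOMPOSE_PAR_DICT = """\
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

numberOfSubdomains 16;

method          simple;

simpleCoeffs
{
    n               (4 4 1);
    delta           0.001;
}
"""


def decompose_case(case: Path = CASE) -> None:
    """Write the decomposition dictionary and partition the mesh."""
    (case / "system" / "decomposeParDict").write_text(DECOMPOSE_PAR_DICT)
    # decomposePar reads system/decomposeParDict and creates
    # processor0 .. processorN-1 directories for the parallel run.
    subprocess.run(["decomposePar", "-case", str(case)], check=True)


if __name__ == "__main__":
    decompose_case()
```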
Public cloud based HPC systems were selected to analyze the effect of inter-processor communication within an HPC cluster on OpenFOAM performance. In particular, we wanted to analyze the effect of remote direct memory access (RDMA), which affects inter-processor communication between two or more nodes of an HPC cluster.
Microsoft Azure virtual machines (VMs) of types H16r and H16 were selected for comparing the results. Both VM types have the same CPU configuration, but only H16r provides RDMA capability.
For performance analysis and comparison, two clusters were created as described in the configuration details below. The Syncious on-demand HPC solution was used to deploy these clusters.
Setup Details:
Cluster 1: Two VMs with Non-RDMA Connection:
Number of CPU cores: 16
CPU Type: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
RAM: 112GB
Intel MPI Version: 5.1.3.181
Operating System: CentOS 7.1.1503
OpenFOAM Version: 2.3
RDMA Enabled: NO
Interconnect: Ethernet
Cluster 2: Two VMs with RDMA Connection:
Number of CPU cores: 16
CPU Type: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
RAM: 112GB
Intel MPI Version: 5.1.3.181
Operating System: CentOS 7.1.1503
OpenFOAM Version: 2.3
RDMA Enabled: YES
Interconnect: DAPL with Intel MPI (underlying hardware interconnect is InfiniBand); see the launch sketch below
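To make the DAPL fabric selection explicit, the following sketch shows one way to launch icoFoam across both VMs with Intel MPI. The environment variable values are the commonly documented Intel MPI settings for Azure RDMA instances, and the host file name, case name, and core count are assumptions for the example rather than the exact configuration we deployed.

```python
import os
import subprocess


def launch_icofoam_over_rdma(n_procs: int, hostfile: str = "hosts") -> None:
    """Launch icoFoam across both VMs, forcing Intel MPI onto the DAPL/InfiniBand fabric."""
    env = os.environ.copy()
    # Typical Intel MPI settings for Azure RDMA (DAPL over InfiniBand).
    # These values are assumptions taken from the usual guidance, not
    # necessarily the exact settings used in these tests.
    env["I_MPI_FABRICS"] = "shm:dapl"
    env["I_MPI_DAPL_PROVIDER"] = "ofa-v2-ib0"
    env["I_MPI_DYNAMIC_CONNECTION"] = "0"
    # The host file lists both VMs; the case is assumed to be decomposed
    # into n_procs subdomains already.
    subprocess.run(
        ["mpirun", "-np", str(n_procs), "-f", hostfile,
         "icoFoam", "-case", "cavity", "-parallel"],
        env=env,
        check=True,
    )


if __name__ == "__main__":
    launch_icofoam_over_rdma(32)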
Selected Problem To Solve:
The configuration of the test problem was selected based on recommendations from the blog "OpenFOAM performance on Quad socket Xeon and Opteron". The lid-driven cavity flow problem was selected from the OpenFOAM tutorials and decomposed for parallel execution on multiple CPU cores of the HPC systems. The Navier-Stokes solver "icoFoam" was selected for incompressible laminar flow on a 2D mesh. A sketch of the case preparation is shown below.
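For reference, here is a minimal sketch of preparing the lid-driven cavity case from the OpenFOAM 2.3 tutorials and building its mesh. The working directory and the degree of mesh refinement are assumptions; the decomposition and the parallel launch are covered in the sketches above.

```python
import os
import shutil
import subprocess
from pathlib import Path


def prepare_cavity_case(workdir: Path = Path(".")) -> Path:
    """Copy the lid-driven cavity tutorial case and generate its 2D mesh."""
    # FOAM_TUTORIALS is set by the OpenFOAM environment (e.g. after sourcing
    # etc/bashrc); the path below matches the OpenFOAM 2.3 tutorial layout.
    tutorials = Path(os.environ["FOAM_TUTORIALS"])
    src = tutorials / "incompressible" / "icoFoam" / "cavity"
    case = workdir / "cavity"
    shutil.copytree(src, case)
    # Build the 2D mesh from the case's blockMeshDict. For scaling tests the
    # mesh would normally be refined well beyond the tutorial default.
    subprocess.run(["blockMesh", "-case", str(case)], check=True)
    return case


if __name__ == "__main__":
    prepare_cavity_case()
```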
Results:
As expected, there is no visible difference between the two VM types (RDMA and non-RDMA) when only a single VM is used to solve the problem. As the number of cores is increased, the performance improves linearly on a single VM. The results shown in the tables are for single-VM performance of icoFoam.
When processors from both VMs of the HPC cluster are used together, there is a significant difference between the two clusters. As an experiment, an equal number of cores from each VM in the cluster was selected to run the icoFoam solver.
The RDMA-based cluster shows a considerable performance improvement. In fact, it shows almost linear performance scaling as the number of cores used from both VMs in the cluster is increased.
Here are the results for the RDMA-based cluster:
The non-RDMA cluster, however, does not scale linearly as the number of cores from both VMs is increased. In fact, the performance degrades once the core count grows beyond a certain point. Hence, inter-processor communication between the two VMs is the bottleneck in this cluster.
Here are the results for the non-RDMA-based cluster:
The following plot compares the results of the RDMA-based and non-RDMA-based clusters. A sketch of how these scaling numbers can be extracted from the solver logs follows.
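For readers who want to reproduce the comparison, here is a minimal sketch that reads the "ExecutionTime" entries OpenFOAM writes to its solver log and computes speedup and parallel efficiency relative to the smallest run. The log naming scheme (log.icoFoam.<cores>) is an assumption for the example; adapt it to however the runs were actually logged.

```python
import re
from pathlib import Path

# Hypothetical naming scheme for the solver logs, e.g. log.icoFoam.16.
LOG_PATTERN = "log.icoFoam.*"
EXEC_TIME_RE = re.compile(r"ExecutionTime\s*=\s*([\d.]+)\s*s")


def last_execution_time(log_file: Path) -> float:
    """Return the final 'ExecutionTime = ... s' value reported by the solver."""
    times = EXEC_TIME_RE.findall(log_file.read_text())
    if not times:
        raise ValueError(f"no ExecutionTime entries in {log_file}")
    return float(times[-1])


def scaling_table(log_dir: Path = Path(".")) -> None:
    """Print speedup and parallel efficiency relative to the smallest core count."""
    runs = {}
    for log_file in sorted(log_dir.glob(LOG_PATTERN)):
        cores = int(log_file.suffix.lstrip("."))  # core count from the file name
        runs[cores] = last_execution_time(log_file)
    base_cores = min(runs)
    base_time = runs[base_cores]
    print(f"{'cores':>6} {'time (s)':>10} {'speedup':>8} {'efficiency':>10}")
    for cores in sorted(runs):
        speedup = base_time / runs[cores]
        efficiency = speedup * base_cores / cores
        print(f"{cores:>6} {runs[cores]:>10.1f} {speedup:>8.2f} {efficiency:>10.0%}")


if __name__ == "__main__":
    scaling_table()
```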
Discussion:
For a single VM, icoFoam performance scales linearly with core count.
For a multi-node cluster, however, RDMA and the underlying interconnect play a very significant role in performance. Most public clouds do not provide RDMA capability, so selecting an RDMA-capable HPC cluster in the cloud is very important for such CFD problems.