Hybrid HPC at an Engineering Design Company

Case Study: HPC and VDI On-Premise and in the Cloud

Background

A leading engineering design company serves major organisations in the automotive and aerospace sectors. To stay competitive and innovate efficiently, it integrated High-Performance Computing (HPC) into its operations. As the company's HPC requirements grew, so did its need for robust computational resources to accelerate its R&D. This case study explores how SyncHPC supported the company's R&D design and simulation processes.

Objectives

Enhance Computational Power: Reduce simulation time to improve productivity and speed up customer deliverables.
Cater to Flexible Project Needs: Each project needs different computational resources, and the simulation applications themselves vary. For example, some are memory-intensive while others are CPU-intensive.
Optimise Costs: Match the use of computational resources to actual need. Organisations often build more capacity than they require, or find themselves short of computing power.

On-Premise HPC Deployment

Infrastructure:

Multiple SyncHPC-based HPC clusters with a total capacity of about 1,000 CPU cores and a few high-performance GPUs.
Support for the Slurm scheduler, added for consistency across cloud and on-premise environments.
In-house IT team managing system maintenance, software updates, and data security.

Advantages:

Performance at Fixed Cost: Because the hardware is a capital (CAPEX) investment, users run their regular, predictable workloads on the on-premise clusters.
Support: With an in-house HPC team, the company could respond directly to user requirements.

Challenges:

Scalability Limitations: Expanding the cluster required significant capital investment and time to procure new hardware.
Maintenance Overhead: High operational costs related to energy consumption, cooling, and physical space.
Obsolescence: Rapid advancements in technology necessitated frequent upgrades to stay competitive.

Cloud HPC Solution

Transition:
To overcome the limitations of its on-premise infrastructure, the team piloted a cloud-based HPC solution across multiple cloud providers.

Deployment:

Used cloud resources for simulation and modelling tasks that could not be accommodated on-premise.
Implemented a mix of reserved and pay-as-you-go instances to optimise costs.
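A small cost model shows why mixing reserved and pay-as-you-go capacity can be cheaper than using either alone. This is a minimal sketch under a deliberately simplified billing assumption: reserved nodes are billed every hour at a discounted rate whether busy or idle, and any demand above the reserved pool is billed per node-hour at an on-demand rate. The function names, the demand profile, and all rates are hypothetical and are not taken from SyncHPC or from any cloud provider's price list.

```python
# Hypothetical cost model: names, rates and demand figures are illustrative only.

def period_cost(demand: list[int], reserved: int,
                reserved_rate: float, on_demand_rate: float) -> float:
    """Cost of one period, given the number of nodes needed in each hour."""
    # Reserved nodes are billed every hour, whether or not they are busy.
    reserved_cost = reserved * reserved_rate * len(demand)
    # Demand above the reserved pool spills over to pay-as-you-go nodes.
    overflow_node_hours = sum(max(0, d - reserved) for d in demand)
    return reserved_cost + overflow_node_hours * on_demand_rate


def best_reserved(demand: list[int], reserved_rate: float,
                  on_demand_rate: float) -> int:
    """Try every reserved-pool size from 0 to peak demand; keep the cheapest."""
    return min(range(max(demand) + 1),
               key=lambda r: period_cost(demand, r, reserved_rate, on_demand_rate))


if __name__ == "__main__":
    # A steady baseline of 2 nodes with a short peak of 8 nodes.
    demand = [2, 2, 2, 2, 8, 8, 2, 2]
    r = best_reserved(demand, reserved_rate=0.6, on_demand_rate=1.0)
    print(r, round(period_cost(demand, r, 0.6, 1.0), 2))  # prints: 2 21.6
```

With a short peak, reserving enough for the baseline and bursting the peak on pay-as-you-go capacity comes out cheapest in this model. Real pricing adds commitment terms, per-instance-type rates, and data-transfer charges that the sketch ignores.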

Advantages:

Scalability: Easily scale resources on demand to accommodate peak workloads during critical design phases.
Cost Efficiency: Pay-as-you-go pricing allowed for better budget management without heavy upfront investment.
Access to Latest Technologies: Immediate availability of current hardware, including both Intel and AMD CPUs and powerful GPUs, for complex simulations.

Challenges:

Data Transfer Bandwidth: Initial data transfers to the cloud lengthened simulation start times and affected productivity. After network bandwidth was upgraded, users faced few hurdles.
Latency for Pre/Post-Processing: The customer built a VDI in the cloud using SyncHPC and initially experienced latency issues. These were later reduced by improving the connectivity between the cloud and the customer's premises.
Cost: Cloud usage must be tightly managed, because cloud costs can escalate dramatically if left uncontrolled. SyncHPC's auto-scale and HPC-Unit features brought costs under significantly tighter control, and its reports and notification alerts helped management optimise both resource utilisation and cost.
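Autoscaling keeps cloud spend in check by starting only the nodes that queued work needs, within a hard limit. The internal logic of SyncHPC's auto-scale and HPC-Unit features is not described in this case study, so the following is only a generic, hypothetical sketch of a budget-capped scale-out rule. The function name, its parameters, and the assumption that each new node must be affordable for at least one hour are all illustrative.

```python
import math


def scale_out(pending_cores: int, cores_per_node: int, running_nodes: int,
              max_nodes: int, budget_left: float, node_hour_cost: float) -> int:
    """Return how many extra cloud nodes to start for queued work.

    Hypothetical policy: cover the pending core demand, but never exceed the
    cluster's node cap or the number of nodes the remaining budget can run
    for one hour.
    """
    if pending_cores <= 0:
        return 0
    # Whole nodes needed to cover the queued cores.
    wanted = math.ceil(pending_cores / cores_per_node)
    # Headroom left under the configured node cap.
    room = max(0, max_nodes - running_nodes)
    # Nodes the remaining budget can pay for during one hour.
    affordable = max(0, math.floor(budget_left / node_hour_cost))
    return min(wanted, room, affordable)


if __name__ == "__main__":
    # 300 queued cores on 64-core nodes need 5 nodes; the cap and budget allow it.
    print(scale_out(300, 64, running_nodes=4, max_nodes=10,
                    budget_left=1000.0, node_hour_cost=5.0))  # prints: 5
    # The same queue with little budget left is throttled to 2 nodes.
    print(scale_out(300, 64, running_nodes=4, max_nodes=10,
                    budget_left=12.0, node_hour_cost=5.0))  # prints: 2
```

Taking the minimum of the three limits means whichever constraint is tightest (queue demand, node cap, or budget) wins, which is the behaviour a cost-controlled burst policy generally needs. A production scheduler would also scale nodes back down when they go idle.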

Results

Performance Comparison:

Computational Fluid Dynamics (CFD) simulations ran significantly faster on the cloud because jobs could scale out to more resources.
Collaboration between teams increased, leading to faster iterations on design and development.

Cost Analysis:

Cost optimisation was initially a challenge because the setup was new. Management can now determine the right resources (cloud or on-premise) to allocate to specific workloads.
Management aims to focus on innovative R&D initiatives rather than continuous hardware upgrades.

Conclusion

The organisation determined that a hybrid HPC model, retaining on-premise resources for sensitive projects while using cloud infrastructure for scalable, computationally intensive tasks, best met its needs. This approach balanced performance, cost-effectiveness, and data security.

Recommendations

Hybrid Infrastructure Strategy: Develop a clear strategy for when to use on-premise versus cloud resources, based on project requirements.
Optimise Data Management: Implement efficient data transfer solutions to minimise latency and streamline workflows between on-premise and cloud environments.
Regular Assessment: Continuously evaluate performance metrics and costs to ensure the most efficient use of HPC resources.
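A hybrid strategy is easier to apply consistently when the placement rule is written down. The sketch below encodes the split described in the conclusion: sensitive projects stay on-premise, and other work uses free on-premise cores first and bursts to the cloud otherwise. The `Job` fields, the `place` function, and the rule ordering are assumptions made for illustration, not SyncHPC behaviour.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int
    sensitive: bool  # e.g. covered by customer confidentiality requirements


def place(job: Job, onprem_free_cores: int) -> str:
    """Hypothetical hybrid placement rule; returns "on-premise" or "cloud"."""
    # Sensitive work never leaves the on-premise clusters, even if it must wait.
    if job.sensitive:
        return "on-premise"
    # Use fixed-cost on-premise capacity first when it can take the job now.
    if job.cores <= onprem_free_cores:
        return "on-premise"
    # Otherwise burst the job to cloud capacity.
    return "cloud"


if __name__ == "__main__":
    free = 256
    for job in (Job("routine-run", 128, False),
                Job("cfd-peak", 2048, False),
                Job("confidential-model", 2048, True)):
        print(job.name, "->", place(job, free))
```

A real policy would also weigh memory-intensive versus CPU-intensive profiles, expected queue wait, and data-transfer cost, which is where the regular assessment of metrics and costs feeds back into the rule.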

This case study illustrates how a large engineering design services company effectively navigated the challenges of high-performance computing to improve productivity.