Deploying and maintaining HPC infrastructure is costly, so many organizations are evaluating whether to offload their HPC workloads to the cloud. Choosing between on-premise and cloud is complex, however, and CIOs need to weigh several aspects.
Organizations typically select one of three HPC deployment models:
- 100% On-premise HPC
- 100% Cloud-based HPC
- Hybrid HPC
Let’s go through the important considerations for selecting a particular HPC deployment model.
Identify the HPC requirements:
A. Workload:
- Percentage utilization of HPC cluster
- Application, MPI library, other supporting libraries
- Duration of each run
- Multi-user support and job scheduler
- On-premise server integration (License servers, Databases, Active Directory, etc.)
- Input and output data during processing
- Is it “embarrassingly parallel” or “tightly coupled”?
- GUI requirements
- High Availability (HA) requirements
- Security considerations
- ‘Burst’ scenario: an event in which a large HPC infrastructure is required for a short period of time.
B. Hardware:
- Number of CPU Cores, CPU model ID and speed
- Memory size and type of memory (DDR4, etc.)
- Disk size, disk IOPS
- Data storage and file system (Ext3, Ext4, NTFS, NFS, Lustre, GPFS, etc.)
- Interconnect/network between compute nodes (10 Gbps Ethernet, 56 Gbps InfiniBand, etc.)
C. Software:
- Operating System of each node
- HPC Management and monitoring software
- Job scheduler
- User management
- MPI libraries, compilers, etc.
D. Data:
- Input and output data size and its movement to and from HPC
- Its storage requirements, such as IOPS
- Backup and Disaster recovery requirements
- Location of data
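The data-movement item above lends itself to a quick back-of-the-envelope check. The following is a minimal sketch, not a measurement tool: the function name `transfer_hours`, the `efficiency` factor (the share of nominal link speed actually achieved) and the example figures are all assumptions made for illustration.

```python
def transfer_hours(data_gb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move data_gb gigabytes over a link.

    Assumptions (hypothetical, for illustration only):
    - decimal units, so 1 GB = 8 gigabits
    - efficiency is the fraction of the nominal link speed that is
      actually achieved (protocol overhead, shared links, etc.)
    """
    if data_gb < 0:
        raise ValueError("data_gb must not be negative")
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")

    gigabits = data_gb * 8
    seconds = gigabits / (link_gbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical job: 2 TB of input and 5 TB of output over a 1 Gbps link.
    upload = transfer_hours(2000, 1.0)
    download = transfer_hours(5000, 1.0)
    print(f"upload: {upload:.1f} h, download: {download:.1f} h")
```

If the estimated transfer time is comparable to the duration of each run, data location and movement may weigh as heavily in the decision as compute cost.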
E. Others:
- HPC administrator availability and cost to the company
- Power consumption and power backup
- Real-estate for HPC cluster
- Cooling/HVAC requirements
- Vendor support and SLA
Once the requirements are identified, the selection depends on cost analysis, productivity, and other aspects such as security, legal constraints, and location of service.
Let’s discuss the most prominent aspects used for decision making.
Pricing / Cost-analysis:
Each item listed in the HPC requirements section carries a price tag. The team should work out the Total Cost of Ownership (TCO) based on these requirements. The TCO includes many factors, such as capital expenditure, maintenance cost, depreciation of assets, and duration of usage.
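To make the comparison concrete, here is a minimal sketch of how an on-premise TCO could be set against a pay-per-use cloud cost. It is a simplified model under stated assumptions, not a complete costing method: the cost categories, the names `OnPremCosts`, `cloud_tco` and `break_even_utilization`, the flat per-core-hour cloud price, and every figure in the example are hypothetical. Depreciation is reduced to a single `residual_value` at the end of the period.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760  # 365 days * 24 hours


@dataclass
class OnPremCosts:
    """Hypothetical on-premise cost items, all in the same currency."""
    capital_expenditure: float       # one-time hardware and setup cost
    annual_maintenance: float        # vendor support, spares
    annual_power_and_cooling: float
    annual_admin: float              # HPC administrator cost
    annual_facility: float           # real estate, racks
    residual_value: float = 0.0      # resale value at end of period (assumed)

    def tco(self, years: int) -> float:
        """Total cost over the given number of years."""
        annual = (self.annual_maintenance + self.annual_power_and_cooling
                  + self.annual_admin + self.annual_facility)
        return self.capital_expenditure + annual * years - self.residual_value


def cloud_tco(cores: int, utilization: float, price_per_core_hour: float,
              annual_data_cost: float, years: int) -> float:
    """Cloud cost when only the core-hours actually used are paid for."""
    core_hours = cores * utilization * HOURS_PER_YEAR
    return (core_hours * price_per_core_hour + annual_data_cost) * years


def break_even_utilization(on_prem_total: float, cores: int,
                           price_per_core_hour: float,
                           annual_data_cost: float, years: int) -> float:
    """Utilization at which the cloud cost equals the on-premise TCO."""
    full_use_compute = cores * HOURS_PER_YEAR * price_per_core_hour
    return (on_prem_total / years - annual_data_cost) / full_use_compute


if __name__ == "__main__":
    # All figures below are made up for illustration.
    on_prem = OnPremCosts(500_000, 50_000, 40_000, 80_000, 10_000)
    total = on_prem.tco(years=3)
    cloud = cloud_tco(1000, 0.5, 0.05, 20_000, years=3)
    u = break_even_utilization(total, 1000, 0.05, 20_000, years=3)
    print(f"on-premise: {total:,.0f}  cloud: {cloud:,.0f}  break-even: {u:.0%}")
```

In this model, cluster utilization is the deciding variable: below the break-even utilization the pay-per-use option costs less, and above it the on-premise cluster does. A real analysis would also need the cloud provider's actual pricing structure and the productivity factors discussed next.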
Productivity:
Productivity is as important as the pricing of an HPC deployment model. Productivity considerations include HPC cluster compute performance, ease of end-user workflow, multi-user management, HPC job scheduling infrastructure, data management and movement, and the deployment time required for ‘burst’ scenarios.
In summary, selecting an HPC deployment model is a complex process. This article gives some guidelines for identifying HPC requirements. Organizations should map these requirements against each of the deployment models. Syncious helps organizations calculate the TCO and the associated productivity matrix for each of the three deployment models.