Using MIG (Multi-Instance GPU) with SyncHPC-AI

Introduction

As AI, ML, and HPC workloads grow, efficient GPU utilization has become critical. Traditionally, a single GPU served one workload at a time, often leaving resources underused. To solve this, NVIDIA introduced Multi-Instance GPU (MIG) technology.

MIG allows a single GPU to be securely partitioned into up to seven fully isolated GPU instances for CUDA applications, each with its own dedicated compute, memory, and cache paths. This provides predictable performance, quality of service (QoS), and fault isolation, making MIG well suited to running independent workloads side by side. For example, several ML training jobs can each run on their own instance of a single physical GPU.

With MIG, multiple workloads or users can share a GPU without interfering with one another, enabling parallel job execution and high overall utilization. It works seamlessly across bare-metal deployments, containers (Pods), and virtualized environments, treating each instance as if it were a dedicated physical GPU.

What is MIG?

MIG (Multi-Instance GPU) is a feature available in NVIDIA’s Ampere architecture and later GPUs. It enables a single GPU, such as the H100 or H200, to be divided into up to seven independent GPU instances. Each instance behaves like a dedicated GPU with its own memory, cache, and compute cores.

This means multiple users or workloads can share one physical GPU without interfering with each other. For organizations running diverse workloads like AI inference, training, or VDI (Virtual Desktop Infrastructure), MIG maximizes GPU utilization and reduces costs.
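
As a quick illustration, here is a minimal sketch that uses NVIDIA's NVML Python bindings (the nvidia-ml-py package, imported as pynvml) to check whether MIG mode is enabled on the first GPU. This is not SyncHPC code; it assumes an NVIDIA driver and nvidia-ml-py are installed.

    import pynvml  # pip install nvidia-ml-py

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        print("GPU 0:", pynvml.nvmlDeviceGetName(handle))
        try:
            # Returns the current and pending MIG modes of the device.
            current, pending = pynvml.nvmlDeviceGetMigMode(handle)
            print("MIG enabled:", current == pynvml.NVML_DEVICE_MIG_ENABLE)
        except pynvml.NVMLError:
            print("MIG is not supported on this GPU")
    finally:
        pynvml.nvmlShutdown()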

How MIG Works

  • A high-end GPU (such as the H100, H200, or B100) can be split into up to 7 smaller GPU instances.
  • Each instance behaves like a separate GPU to the OS, drivers, and applications.
  • Each partition has guaranteed resources:
    • Compute (SMs)
    • High-Bandwidth Memory (HBM)
    • Cache
    • Copy Engines

This ensures predictable performance and strong isolation.
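
Because each partition appears to the driver as its own device, the instances can be enumerated programmatically. The sketch below (again using nvidia-ml-py, and assuming GPU 0 is already in MIG mode with instances created) lists each MIG device with its UUID and memory size:

    import pynvml

    pynvml.nvmlInit()
    try:
        parent = pynvml.nvmlDeviceGetHandleByIndex(0)
        # A single GPU can expose up to seven MIG devices.
        for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(parent)):
            try:
                mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(parent, i)
            except pynvml.NVMLError:
                continue  # this slot has no instance configured
            mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
            print(f"MIG device {i}: {pynvml.nvmlDeviceGetUUID(mig)}, "
                  f"{mem.total // (1024 ** 2)} MiB")
    finally:
        pynvml.nvmlShutdown()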

Using MIG with SyncHPC

  1. Flexible Partitioning from SyncHPC: SyncHPC allows GPUs to be split into configurations such as 1g.5gb, 2g.10gb, or larger slices, depending on workload requirements.
  2. Scalability: Multiple instances on one GPU enable organizations to run different jobs simultaneously, improving throughput and reducing idle GPU time.
  3. Scheduling Pods on GPU Instances: SyncHPC lets users schedule Pods on specific GPU instances based on the input they provide. This can be used to place ML training jobs on one or more GPU instances; SyncHPC interacts with the Kubernetes scheduler to achieve this (see the first sketch after this list).
  4. AI Inferencing Workloads: Multiple models can be deployed on separate GPU instances, giving finer-grained control over the inference infrastructure (see the second sketch after this list).
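
SyncHPC's own scheduling API is not shown in this post, but at the Kubernetes layer the NVIDIA device plugin advertises MIG slices as extended resources (for example, nvidia.com/mig-1g.5gb under its "mixed" strategy). The first sketch below uses the official Kubernetes Python client to request one such slice for a training Pod; the container image is a hypothetical placeholder:

    from kubernetes import client, config

    config.load_kube_config()

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="mig-training-job"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="trainer",
                    image="registry.example.com/ml-trainer:latest",  # hypothetical image
                    resources=client.V1ResourceRequirements(
                        # Ask the scheduler for exactly one 1g.5gb MIG slice.
                        limits={"nvidia.com/mig-1g.5gb": "1"}
                    ),
                )
            ],
        ),
    )

    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)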
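
For inference, one common way to pin a process to a specific instance is to set CUDA_VISIBLE_DEVICES to that instance's MIG UUID (as reported by nvidia-smi -L). The second sketch launches one model server per slice; serve_model.py and the UUIDs are illustrative placeholders:

    import os
    import subprocess

    # Placeholder MIG UUIDs; substitute the real values from `nvidia-smi -L`.
    mig_uuids = [
        "MIG-11111111-2222-3333-4444-555555555555",
        "MIG-66666666-7777-8888-9999-000000000000",
    ]

    # One server process per MIG slice: each process sees only its own
    # slice, so the deployed models cannot interfere with one another.
    for port, uuid in enumerate(mig_uuids, start=8000):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=uuid)
        subprocess.Popen(["python", "serve_model.py", "--port", str(port)], env=env)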

Supported GPUs

MIG is supported on NVIDIA data center GPUs starting with the Ampere architecture. For the full, up-to-date list of supported GPUs, see the NVIDIA MIG User Guide: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#supported-gpus
