Performance of CUDA Programs on Cloud-Based Machines with a Graphics Processor

A high-end graphics processor significantly increases the price of a workstation. Hence, if you want to run an extremely compute-intensive application on a graphics processor, the cloud can be an option.

Syncious selected a simple but compute-intensive problem to evaluate the performance of cloud-based machines equipped with a graphics processing unit (GPU). The selected problem is defined by NVIDIA as a benchmark to compare CPU vs. GPU processing. The CUDA (Compute Unified Device Architecture) programming model created by NVIDIA is a heterogeneous model in which both the CPU and the GPU are used for general-purpose computation.

An NVIDIA Tesla GPU was selected for this experiment. The NVIDIA Tesla series is used in datacenter servers and in the cloud. The selected GPU model is the NVIDIA Tesla M60. In subsequent blog posts, we’ll compare the performance of the Tesla M60 and the Tesla V100.

Selected Problem: Perform addition of two floating-point vectors of size 11444777

The above problem was implemented in C along with CUDA code. The same program computes the vector addition on the CPU and on the GPU, respectively. The results are as follows:
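A minimal sketch of such a benchmark is shown below. The original Syncious program is not published here, so all kernel, variable, and initialization choices are illustrative assumptions; the structure (a CPU reference loop, a one-element-per-thread CUDA kernel, and a tolerance check of 0.000001) follows the description in this post.

```cuda
// Sketch of the CPU-vs-GPU vector-addition benchmark (assumed details).
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

#define N 11444777  // vector size from the problem statement

// GPU kernel: each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// CPU reference implementation.
void vecAddCPU(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

int main(void) {
    size_t bytes = N * sizeof(float);
    float *a = (float *)malloc(bytes), *b = (float *)malloc(bytes);
    float *cCPU = (float *)malloc(bytes), *cGPU = (float *)malloc(bytes);
    for (int i = 0; i < N; ++i) { a[i] = i * 0.5f; b[i] = i * 0.25f; }

    // CPU pass (time this section with a host timer of your choice).
    vecAddCPU(a, b, cCPU, N);

    // GPU pass: copy inputs, launch kernel, copy result back.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (N + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dA, dB, dC, N);  // time with cudaEvent_t
    cudaDeviceSynchronize();
    cudaMemcpy(cGPU, dC, bytes, cudaMemcpyDeviceToHost);

    // Compare the two output arrays within the 0.000001 tolerance.
    int mismatches = 0;
    for (int i = 0; i < N; ++i)
        if (fabsf(cCPU[i] - cGPU[i]) > 1e-6f) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(a); free(b); free(cCPU); free(cGPU);
    return 0;
}
```

Kernel time is typically measured with `cudaEventRecord`/`cudaEventElapsedTime`; note that such a measurement covers only the kernel launch, not the host-device memory copies.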

Time taken on CPU = 47.625000 (ms).

Time taken on GPU = 0.045000 (ms).

The output arrays computed on the CPU and on the GPU agree within a tolerance of 0.000001.


These timings correspond to a speedup of roughly 1,000× (47.625 ms / 0.045 ms ≈ 1058). The graph clearly shows the advantage of using servers with graphics processors for CUDA programming. Many industrial applications are being ported to CUDA. Hence, cloud-based graphics processors are very useful for compute-intensive workloads running on a GPU.

