Performance of CUDA Programs on Cloud-Based Machines with Graphics Processors

A high-end graphics processor significantly increases the price of a workstation. Hence, if you want to run an extremely compute-intensive application on a graphics processor, a cloud machine can be an option.

Syncious selected a simple but compute-intensive problem to evaluate the performance of cloud-based machines that have a graphics processing unit (GPU). The selected problem is defined by NVIDIA as a benchmark to compare CPU and GPU processing. The CUDA (Compute Unified Device Architecture) programming model created by NVIDIA is a heterogeneous model in which both the CPU and the GPU are used for general-purpose computation.

An NVIDIA Tesla GPU was selected for this experiment. The NVIDIA Tesla series is used in datacenter servers and cloud machines. The selected GPU model is the NVIDIA Tesla M60. In subsequent blog posts, we’ll compare the performance of the Tesla M60 and the Tesla V100.

Selected Problem: Add two floating-point vectors, each with 11444777 elements

The above problem was written in C along with CUDA code. The same program computes the vector addition on the CPU and on the GPU. The results are as follows:

Time taken on CPU = 47.625000 (ms).

Time taken on GPU = 0.045000 (ms).

The output arrays computed on the CPU and on the GPU agree to within a tolerance of 0.000001.
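
For readers who want to try a similar experiment, here is a minimal sketch of such a program. It is not the code used for the measurements above. The names (vectorAdd, vectorAddCpu, firstMismatch), the block size of 256 threads and the random input values are all assumptions made for illustration. The GPU figure it prints covers only the kernel execution, not the host-device copies. Which of these the timings above include is not stated in this post.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <chrono>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// GPU kernel: each thread adds one pair of elements.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {                 // the last block may be only partly used
        c[i] = a[i] + b[i];
    }
}

// CPU reference: the same addition in a plain loop.
void vectorAddCpu(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

// Index of the first element differing by more than tol, or -1 if none.
int firstMismatch(const float *x, const float *y, int n, float tol)
{
    for (int i = 0; i < n; ++i)
        if (fabsf(x[i] - y[i]) > tol) return i;
    return -1;
}

int main()
{
    const int n = 11444777;                        // vector length from the post
    const size_t bytes = (size_t)n * sizeof(float);

    float *hA = (float *)malloc(bytes), *hB = (float *)malloc(bytes);
    float *hCpu = (float *)malloc(bytes), *hGpu = (float *)malloc(bytes);
    if (!hA || !hB || !hCpu || !hGpu) { fprintf(stderr, "malloc failed\n"); return 1; }

    for (int i = 0; i < n; ++i) {                  // assumed inputs: random in [0, 1]
        hA[i] = rand() / (float)RAND_MAX;
        hB[i] = rand() / (float)RAND_MAX;
    }

    // Time the CPU version with a host clock.
    auto t0 = std::chrono::steady_clock::now();
    vectorAddCpu(hA, hB, hCpu, n);
    auto t1 = std::chrono::steady_clock::now();
    double cpuMs = std::chrono::duration<double, std::milli>(t1 - t0).count();

    float *dA, *dB, *dC;
    CUDA_CHECK(cudaMalloc((void **)&dA, bytes));
    CUDA_CHECK(cudaMalloc((void **)&dB, bytes));
    CUDA_CHECK(cudaMalloc((void **)&dC, bytes));
    CUDA_CHECK(cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice));

    // Time the kernel with CUDA events recorded on the default stream.
    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));
    const int threads = 256;                       // assumed block size
    const int blocks = (n + threads - 1) / threads;
    CUDA_CHECK(cudaEventRecord(start));
    vectorAdd<<<blocks, threads>>>(dA, dB, dC, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaEventRecord(stop));
    CUDA_CHECK(cudaEventSynchronize(stop));        // wait until the kernel has finished
    float gpuMs = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&gpuMs, start, stop));

    CUDA_CHECK(cudaMemcpy(hGpu, dC, bytes, cudaMemcpyDeviceToHost));

    int bad = firstMismatch(hCpu, hGpu, n, 1e-6f);
    printf("Time taken on CPU = %f (ms).\n", cpuMs);
    printf("Time taken on GPU = %f (ms).\n", gpuMs);
    printf(bad < 0 ? "Outputs match within 0.000001.\n"
                   : "Mismatch at index %d.\n", bad);

    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hCpu); free(hGpu);
    return bad < 0 ? 0 : 1;
}
```

Saved under a hypothetical name such as vector_add.cu, the sketch should build with nvcc vector_add.cu -o vector_add on a machine with the CUDA toolkit installed. Kernel launches are asynchronous, so the sketch records CUDA events and waits on the stop event before reading the elapsed time. A host clock read immediately after the launch would measure little more than the launch overhead.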

The graph clearly shows the advantage of using servers with graphics processors for CUDA programs: for this problem, the GPU run is roughly 1,000 times faster than the CPU run (47.625 ms versus 0.045 ms). Many industrial applications are being converted to CUDA. Hence, cloud-based graphics processors are very useful for compute-intensive workloads that run on a GPU.

Published by Syncious
