RACC2 – GPU computing

GPU computing uses a graphics processing unit (GPU) as a co-processor to accelerate a central processing unit (CPU) for general scientific computing. While GPUs were originally designed for graphics workloads, they are now widely used to speed up compute-intensive applications.
Many parallelised applications run significantly faster by offloading their most computationally demanding sections to the GPU, while the remainder of the code continues to run on the CPU. A typical CPU has a small number of powerful cores (usually four to eight), whereas a GPU contains hundreds or thousands of smaller cores, enabling much higher throughput for suitable workloads.
Many scientific applications support GPU acceleration and can be developed or enhanced using frameworks such as NVIDIA’s CUDA toolkit, which provides GPU-optimised libraries as well as debugging and performance-tuning tools.
On RACC2, users can request one or more GPUs and combine them with a suitable number of CPU cores. This is illustrated in the example SLURM submission script below, available in /software/slurm_examples/gpu/.
#!/bin/bash
# standard CPU directives; tip: use cpus-per-task to allocate one or more cores per GPU
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --threads-per-core=1
# plus the GPU line; you can request one or more GPUs on the same node
#SBATCH --gres=gpu:1
# partition 'gpuscavenger' or project partition;
# jobs in gpuscavenger use idle time on GPUs and might get killed and re-queued
#SBATCH --partition=gpuscavenger
#SBATCH --job-name=example_gpu_job
#SBATCH --output=gpu_out.txt
# 24 hours is the default in the partition 'gpu_limited'
#SBATCH --time=24:00:00
#SBATCH --mem=48G

# optional, for debugging
hostname
nvidia-smi
echo CUDA_VISIBLE_DEVICES $CUDA_VISIBLE_DEVICES

# and the actual job
./gpu_hello_apptainer.sh
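Once adapted, the script is submitted in the usual way; the filename below is a placeholder rather than the actual name of the example in /software/slurm_examples/gpu/:

sbatch my_gpu_job.sh      # placeholder filename
squeue -u $USER           # check the job's position and state in the queue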
The script above is an example of a job where we expect almost all of the work to be done on the GPU, so we request just one CPU core (--cpus-per-task=1). This one-core allocation does not suit every job: some applications do substantial work on both the CPUs and the GPUs, and for those it can be beneficial to allocate more CPU cores. GPUs are requested with the directive #SBATCH --gres=gpu:N, where N is the number of GPUs your job will use; in the example above we allocate just one.
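For a hybrid job of that kind, a minimal sketch of the relevant directives is shown below; the two-GPU, eight-cores-per-GPU ratio is an illustrative assumption, not an RACC2 recommendation:

# Sketch: a job that does substantial work on both CPUs and GPUs
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16   # e.g. 8 CPU cores per GPU (assumed ratio)
#SBATCH --gres=gpu:2         # two GPUs on the same node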
There are 2 options to access GPUs (both illustrated in the sketch after this list):
- the partition 'gpuscavenger', where jobs use idle time on GPUs and can be killed and re-queued when the resources are needed elsewhere, and
- a project partition, for projects with their own GPU allocation.
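A minimal sketch of the two routes; --requeue is a standard sbatch option, the assumption here being that a pre-empted scavenger job should go back in the queue rather than be lost, and the project partition name is a placeholder:

# option 1: scavenge idle GPU time; the job may be pre-empted
#SBATCH --partition=gpuscavenger
#SBATCH --requeue            # re-queue the job if it is killed by pre-emption

# option 2: use your project's own partition (placeholder name)
##SBATCH --partition=myproject_gpu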
The example submission script above includes several commands that can help diagnose potential issues with GPU access. Printing the hostname identifies the node on which the job is running. The nvidia-smi command displays information about the installed NVIDIA driver and the available GPUs; successful output confirms that GPUs are present on the node and that the NVIDIA driver is correctly installed. Echoing CUDA_VISIBLE_DEVICES shows which of the node's GPUs SLURM has assigned to the job.
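The same check can be run interactively before submitting a batch job; a minimal sketch, assuming the partition from the example above:

srun --partition=gpuscavenger --gres=gpu:1 --pty nvidia-smi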
The example job script gpu_hello_apptainer.sh demonstrates how to use Apptainer with PyTorch to run an application on a GPU.
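The contents of gpu_hello_apptainer.sh are not reproduced here, but the following is a minimal sketch of the same idea, assuming a stock PyTorch container image; the --nv flag makes the host's NVIDIA driver and the allocated GPUs visible inside the container:

#!/bin/bash
# Sketch of gpu_hello_apptainer.sh (the image URI is an assumption)
apptainer exec --nv docker://pytorch/pytorch:latest \
  python -c "import torch; print('CUDA available:', torch.cuda.is_available())"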
The following GPU nodes are currently available:
| node | GPUs | GPU memory (per device) | system RAM |
| --- | --- | --- | --- |
| racc2-gpu-0 | 3 x NVIDIA H100 | 96 GB | 384 GB |
| racc2-gpu-1 | 2 x NVIDIA H100 | 96 GB | 768 GB |
| racc2-gpu-1 | 4 x NVIDIA L40S | 48 GB | 768 GB |
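Because racc2-gpu-1 hosts two different GPU models, it may be possible to request a specific model by type; this sketch assumes the cluster defines GPU types in its SLURM configuration, and the type label 'l40s' is a guess rather than a confirmed RACC2 name:

# Sketch: requesting a specific GPU model (type name is an assumption)
#SBATCH --gres=gpu:l40s:2   # two L40S devices, if the 'l40s' type is defined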