# GPU Jobs

## Overview

GPUs have become essential for many life science workloads: cryo-EM reconstruction, molecular dynamics, deep learning, and protein structure prediction. Slurm provides native GPU scheduling through its Generic Resource (GRES) system.

---

## Requesting GPUs

### Basic GPU Request

```bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1              # 1 GPU
```

### Specific GPU Type

If the cluster has multiple GPU types, add the type name to the request (these lines are alternatives; use one):

```bash
#SBATCH --gres=gpu:a100:2         # 2 NVIDIA A100 GPUs
#SBATCH --gres=gpu:v100:1         # 1 NVIDIA V100 GPU
```

Check available GPU types with:

```bash
$ sinfo -p gpu -o "%N %G"
NODELIST   GRES
gpu[01-04] gpu:a100:4
gpu[05-08] gpu:v100:4
```
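To pick a GPU type from a script rather than by eye, you can parse the `%N %G` output above. The sketch below is a hypothetical helper (`list_gpu_types` is not a Slurm command). It assumes the two-column format shown above and strips socket-affinity suffixes such as `(S:0-1)`, which some clusters append to GRES strings.

```bash
#!/usr/bin/env bash
# list_gpu_types (hypothetical helper): read `sinfo -o "%N %G"` lines on stdin
# and print "<nodelist> <gpu_type> <gpus_per_node>" for each GPU GRES entry.
list_gpu_types() {
    local nodes gres entry name type count
    local -a entries
    while read -r nodes gres; do
        [[ "$nodes" == "NODELIST" ]] && continue      # skip the header line
        IFS=',' read -ra entries <<< "$gres"         # a node may list several GRES
        for entry in "${entries[@]}"; do
            entry="${entry%%(*}"                      # drop "(S:0-1)"-style suffixes
            IFS=':' read -r name type count <<< "$entry"
            [[ "$name" == "gpu" ]] || continue        # ignore non-GPU GRES
            if [[ -z "$count" ]]; then                # untyped form, e.g. "gpu:4"
                count=$type
                type="untyped"
            fi
            echo "$nodes $type $count"
        done
    done
}
```

On a cluster, run it as `sinfo -p gpu -h -o "%N %G" | list_gpu_types`. The `-h` flag drops the header, although the function skips it anyway.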

### GPU Shorthand Options

Slurm provides convenience flags that complement `--gres`:

```bash
--gpus=2                  # 2 GPUs total for the job
--gpus-per-node=1         # 1 GPU on each allocated node
--gpus-per-task=1         # 1 GPU for each task
--cpus-per-gpu=4          # Automatically allocate 4 CPUs per GPU
--mem-per-gpu=32G         # Automatically allocate 32G memory per GPU
```
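The per-GPU flags scale with the GPU count, so the job's totals are simple multiplication. For example, `--gpus=2 --cpus-per-gpu=4 --mem-per-gpu=32G` yields 8 CPUs and 64G of host memory. Note that `--mem-per-gpu` cannot be combined with `--mem` or `--mem-per-cpu` in the same job. The helper below (`gpu_job_totals`, a hypothetical name) makes the arithmetic explicit and takes memory in whole gigabytes.

```bash
#!/usr/bin/env bash
# gpu_job_totals (hypothetical helper): total CPUs and host memory a job gets
# from --gpus combined with --cpus-per-gpu and --mem-per-gpu.
# Arguments: <gpus> <cpus_per_gpu> <mem_per_gpu_in_GB>
gpu_job_totals() {
    local gpus=$1 cpus_per_gpu=$2 mem_per_gpu_gb=$3
    echo "cpus=$(( gpus * cpus_per_gpu )) mem=$(( gpus * mem_per_gpu_gb ))G"
}

# Equivalent to: #SBATCH --gpus=2 --cpus-per-gpu=4 --mem-per-gpu=32G
gpu_job_totals 2 4 32    # prints: cpus=8 mem=64G
```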

---

## GPU Environment

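When Slurm allocates GPUs to your job, it sets `CUDA_VISIBLE_DEVICES` to the assigned devices, so CUDA programs use only those GPUs. Many clusters also constrain devices with cgroups. In that case the job cannot see the node's other GPUs at all:

```bash
#!/bin/bash
#SBATCH --gres=gpu:2

echo "Allocated GPUs: $CUDA_VISIBLE_DEVICES"
# Output with device cgroups: Allocated GPUs: 0,1
# (without them, the physical indices appear instead, e.g. 2,3)

nvidia-smi    # Lists only your GPUs when device cgroups are enabled
```

Your code should address GPUs starting from index 0. The CUDA runtime renumbers the devices listed in `CUDA_VISIBLE_DEVICES` as 0, 1, and so on, so you never need to know which physical GPUs you received.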
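The renumbering matters when one job runs several independent single-GPU programs side by side. The following is a minimal sketch. It assumes your program takes no GPU argument and simply uses device 0. Give each background process one entry from the job's `CUDA_VISIBLE_DEVICES` list rather than a number from 0 to N-1, because on clusters without device cgroups the list may hold physical indices. `run_one_per_gpu` is a hypothetical helper name. Launching tasks with `srun --gpus-per-task=1` is the Slurm-native alternative.

```bash
#!/usr/bin/env bash
# run_one_per_gpu (hypothetical helper): start one copy of a command per GPU
# listed in CUDA_VISIBLE_DEVICES, pinning each copy to a single device,
# then wait for all copies to finish.
run_one_per_gpu() {
    local -a ids
    local id
    IFS=',' read -ra ids <<< "${CUDA_VISIBLE_DEVICES:-}"
    for id in "${ids[@]}"; do
        # The child sees exactly one GPU, which CUDA exposes to it as device 0.
        CUDA_VISIBLE_DEVICES="$id" "$@" &
    done
    wait
}

# Example inside a job with --gres=gpu:2 (my_gpu_program is a placeholder):
# run_one_per_gpu ./my_gpu_program
```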

---

## Common GPU Job Patterns

### Single-GPU Job (most common)

```bash
#!/bin/bash
#SBATCH --job-name=gpu_job
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --time=12:00:00

module load cuda/12.0

./my_gpu_program
```
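If a GPU request is mistyped or lands on the wrong partition, some programs silently fall back to the CPU. A cheap guard at the top of the script catches this early. The sketch below (`require_gpus` is a hypothetical name) only counts the entries in `CUDA_VISIBLE_DEVICES`, so it assumes Slurm sets that variable for GPU jobs on your cluster.

```bash
#!/usr/bin/env bash
# require_gpus (hypothetical helper): fail fast when the job sees fewer GPUs
# than expected, based on the entries in CUDA_VISIBLE_DEVICES.
require_gpus() {
    local want=$1
    local -a ids=()
    if [[ -n "${CUDA_VISIBLE_DEVICES:-}" ]]; then
        IFS=',' read -ra ids <<< "$CUDA_VISIBLE_DEVICES"
    fi
    if (( ${#ids[@]} < want )); then
        echo "error: expected ${want} GPU(s), got CUDA_VISIBLE_DEVICES='${CUDA_VISIBLE_DEVICES:-}'" >&2
        return 1
    fi
}

# In the job script, before the GPU program:
# require_gpus 1 || exit 1
```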

### Multi-GPU Job (single node)

```bash
#!/bin/bash
#SBATCH --job-name=multi_gpu
#SBATCH --partition=gpu
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=16
#SBATCH --mem=128G
#SBATCH --time=24:00:00

module load pytorch/2.0

python train.py --gpus 4
```
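Hard-coding `--gpus 4` lets the script and the `--gres` line drift apart. One option is to derive the count at run time. This sketch prefers `SLURM_GPUS_ON_NODE`, which recent Slurm versions set in GPU jobs, and falls back to counting `CUDA_VISIBLE_DEVICES`. `gpus_on_node` is a hypothetical helper, and `--gpus` is the example `train.py` script's own argument, not a standard flag.

```bash
#!/usr/bin/env bash
# gpus_on_node (hypothetical helper): number of GPUs usable on this node.
# Prefers Slurm's SLURM_GPUS_ON_NODE, then counts CUDA_VISIBLE_DEVICES entries.
gpus_on_node() {
    if [[ -n "${SLURM_GPUS_ON_NODE:-}" ]]; then
        echo "$SLURM_GPUS_ON_NODE"
    elif [[ -n "${CUDA_VISIBLE_DEVICES:-}" ]]; then
        local -a ids
        IFS=',' read -ra ids <<< "$CUDA_VISIBLE_DEVICES"
        echo "${#ids[@]}"
    else
        echo 0
    fi
}

# In the script above:
# python train.py --gpus "$(gpus_on_node)"
```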

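### Multi-Node GPU Job (distributed training)

`torchrun` starts one worker process per GPU itself. Launch exactly one `torchrun` per node with `srun`, and give that single task all of the node's GPUs and CPUs:

```bash
#!/bin/bash
#SBATCH --job-name=distributed
#SBATCH --partition=gpu
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1       # one torchrun launcher per node
#SBATCH --gpus-per-node=4         # torchrun starts one worker per GPU
#SBATCH --cpus-per-task=32        # 8 CPUs per GPU for the launcher's 4 workers
#SBATCH --mem-per-gpu=32G
#SBATCH --time=2-00:00:00

module load pytorch/2.0

# Use the first node in the allocation as the rendezvous host
head_node=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)

srun torchrun \
    --nnodes=$SLURM_NNODES \
    --nproc_per_node=$SLURM_GPUS_ON_NODE \
    --rdzv_id=$SLURM_JOB_ID \
    --rdzv_backend=c10d \
    --rdzv_endpoint=${head_node}:29500 \
    train_distributed.py
```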
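The number of training processes equals the number of `torchrun` launchers times `--nproc_per_node`, so it is worth printing at the top of the job log. The hypothetical helper below does that arithmetic. It also flags a common mistake: starting several launchers per node (for example, `--ntasks-per-node=4` combined with `--nproc_per_node=4`), which puts more workers on a node than it has GPUs.

```bash
#!/usr/bin/env bash
# check_torchrun_layout (hypothetical helper): print the total number of
# training processes a `srun torchrun` launch creates, and fail if any node
# would run more workers than it has GPUs.
# Arguments: <nodes> <launchers_per_node> <nproc_per_node> <gpus_per_node>
check_torchrun_layout() {
    local nodes=$1 launchers=$2 nproc=$3 gpus=$4
    local per_node=$(( launchers * nproc ))
    echo "world_size=$(( nodes * per_node ))"
    if (( per_node > gpus )); then
        echo "error: ${per_node} workers per node but only ${gpus} GPUs" >&2
        return 1
    fi
}

# In the batch script, before srun (launchers per node = --ntasks-per-node):
# check_torchrun_layout "$SLURM_NNODES" "${SLURM_NTASKS_PER_NODE:-1}" \
#     "$SLURM_GPUS_ON_NODE" "$SLURM_GPUS_ON_NODE" || exit 1
```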

---

## Life Science GPU Examples

### AlphaFold Protein Prediction

```bash
#!/bin/bash
#SBATCH --job-name=alphafold
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=24:00:00

module load alphafold/2.3

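# run_alphafold.sh is a site-provided wrapper; check `module help alphafold/2.3`
# for the flags it expects (database paths, --max_template_date, etc.)
run_alphafold.sh \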
    --fasta_paths=target.fasta \
    --output_dir=af_output \
    --model_preset=monomer \
    --db_preset=full_dbs
```
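AlphaFold's `monomer` presets expect exactly one sequence per FASTA file, while complexes with several chains need `--model_preset=multimer`. A quick pre-flight check counts the `>` header lines before you spend GPU hours. The sketch below uses a hypothetical `fasta_preset` helper and only distinguishes these two cases.

```bash
#!/usr/bin/env bash
# fasta_preset (hypothetical helper): suggest "monomer" or "multimer" for
# AlphaFold's --model_preset by counting FASTA header ('>') lines.
fasta_preset() {
    local n
    n=$(grep -c '^>' -- "$1" 2>/dev/null) || true   # grep exits 1 on zero matches
    case "$n" in
        1)    echo monomer ;;
        ''|0) echo "error: no FASTA records found in $1" >&2; return 1 ;;
        *)    echo multimer ;;
    esac
}

# Before submitting:  fasta_preset target.fasta
```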

### CryoSPARC Refinement (via cluster lane)

CryoSPARC manages its own GPU job submission (see the CryoSPARC + Slurm module), but the underlying Slurm job looks like:

```bash
#SBATCH --gres=gpu:2
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
```

### Desmond Molecular Dynamics (GPU)

```bash
#!/bin/bash
#SBATCH --job-name=desmond_gpu
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=2-00:00:00

module load schrodinger/2026-1

$SCHRODINGER/desmond -HOST localhost -in desmond_md_job.cfg -WAIT
```

### RELION with GPU Acceleration

```bash
#!/bin/bash
#SBATCH --job-name=relion_gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=5
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:4
#SBATCH --mem=0                   # 0 = all memory on the node
#SBATCH --time=3-00:00:00

module load relion/4.0

srun relion_refine_mpi \
    --i particles.star \
    --o Refine3D/run1 \
    --gpu "" \
    --j $SLURM_CPUS_PER_TASK
```
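The `--ntasks=5` and `gpu:4` pairing follows a common RELION layout: one leader rank that coordinates the run without doing GPU work, plus one follower rank per GPU. Because 3D auto-refine splits the followers between two independent half-sets, an even number of followers (an odd total) keeps them balanced. The hypothetical helper below encodes this rule of thumb for choosing `--ntasks`.

```bash
#!/usr/bin/env bash
# relion_ntasks (hypothetical helper): MPI rank count for a GPU refinement
# laid out as one leader rank (no GPU work) plus one follower rank per GPU.
# Warns when the followers cannot split evenly between the two half-sets.
relion_ntasks() {
    local gpus=$1
    if (( gpus % 2 != 0 )); then
        echo "note: ${gpus} followers do not split evenly between two half-sets" >&2
    fi
    echo $(( gpus + 1 ))
}

relion_ntasks 4    # prints 5, matching --ntasks=5 with --gres=gpu:4 above
```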

---

## Monitoring GPU Jobs

### Check GPU Utilization

```bash
# From a login node: run nvidia-smi as an extra step inside running job 12345
$ srun --jobid=12345 --overlap nvidia-smi

# Or SSH to the node (if permitted)
$ ssh gpu03 nvidia-smi
```

### Check GPU Memory Usage

```bash
$ nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu \
    --format=csv
index, name, memory.used [MiB], memory.total [MiB], utilization.gpu [%]
0, NVIDIA A100-SXM4-80GB, 45231 MiB, 81920 MiB, 97 %
```
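For a quick health check, make the same query machine-friendly with `--format=csv,noheader,nounits`, which drops the header and the `MiB`/`%` units. The sketch below uses a hypothetical `summarize_gpus` helper. It turns those rows into percentages and flags GPUs whose utilization is below a threshold you choose (50% by default here).

```bash
#!/usr/bin/env bash
# summarize_gpus (hypothetical helper): read rows produced by
#   nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu
#              --format=csv,noheader,nounits
# and print memory use as a percentage, flagging GPUs below a utilization
# threshold (first argument, default 50).
summarize_gpus() {
    local min_util=${1:-50}
    awk -F', *' -v min="$min_util" '{
        mem_pct = ($4 > 0) ? 100 * $3 / $4 : 0
        flag = ($5 + 0 < min + 0) ? "  <-- low utilization" : ""
        printf "GPU %s (%s): memory %.0f%%, util %s%%%s\n", $1, $2, mem_pct, $5, flag
    }'
}
```

On the GPU node, pipe the `nvidia-smi` query shown in the comment into `summarize_gpus 50`.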

### After Job Completion

```bash
$ sacct -j 12345 --format=JobID,Elapsed,MaxRSS,AllocTRES%60,State
```
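The `AllocTRES` column is a comma-separated list of `key=value` pairs, such as `cpu=4,gres/gpu=1,mem=16G`. To use one value in a script, request allocation-only, headerless, parsable output from `sacct` with `-X -n -P`, then split the pairs. The `tres_field` helper below is a hypothetical sketch of that approach.

```bash
#!/usr/bin/env bash
# tres_field (hypothetical helper): print the value for one key of an
# AllocTRES string, e.g. tres_field "cpu=4,gres/gpu=1,mem=16G" gres/gpu -> 1.
# Returns 1 if the key is absent.
tres_field() {
    local tres=$1 key=$2 pair
    local -a pairs
    IFS=',' read -ra pairs <<< "$tres"
    for pair in "${pairs[@]}"; do
        if [[ "${pair%%=*}" == "$key" ]]; then
            echo "${pair#*=}"
            return 0
        fi
    done
    return 1
}

# Usage after the job finishes:
# tres_field "$(sacct -j 12345 -X -n -P --format=AllocTRES)" gres/gpu
```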

---

## Tips for GPU Jobs

1. **Request the right number of CPUs per GPU.** Many GPU applications need CPU cores for data loading. A good starting point is 4-8 CPUs per GPU.

2. **Request enough host memory per GPU.** GPU jobs often need substantial system RAM in addition to GPU memory (32-64 GB per GPU for cryo-EM).

3. **Don't hoard GPUs.** Request only what you need. Idle GPUs waste expensive resources.

4. **Check GPU utilization.** If `nvidia-smi` shows low GPU utilization, your job may be bottlenecked on CPU, I/O, or memory.

5. **Use local SSD for staging** when available. GPU jobs processing large datasets (cryo-EM particles, training data) benefit from fast local storage.
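Here is a minimal staging sketch for tip 5: copy inputs to node-local storage once at the start of the job, then remove them on exit. The location of that storage is site-specific. The hypothetical `LOCAL_SCRATCH` variable below stands in for it and falls back to `$TMPDIR`, then `/tmp`.

```bash
#!/usr/bin/env bash
# stage_to_local (hypothetical helper): copy an input file or directory to
# node-local scratch and print the path of the local copy.
# LOCAL_SCRATCH is an assumed, site-specific variable; fall back to $TMPDIR, then /tmp.
stage_to_local() {
    local src=$1
    local base=${LOCAL_SCRATCH:-${TMPDIR:-/tmp}}
    local dest
    dest=$(mktemp -d "${base}/stage.XXXXXX") || return 1
    cp -r -- "$src" "$dest/" || return 1
    echo "$dest/$(basename -- "$src")"
}

# Typical use in a batch script: stage once, then clean up when the job exits.
# work=$(stage_to_local /path/to/particles) || exit 1
# trap 'rm -rf "$(dirname "$work")"' EXIT
# ./my_gpu_program "$work"      # placeholder program and argument
```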

> **ParallelCluster Note:** GPU instance types (p3, p4d, p5) are typically configured as separate compute resources with specific GPU counts. Check `sinfo -p gpu -o "%N %G %c %m"` to see available GPU configurations.

---

## Exercises

### Request 1 GPU and verify the allocation

Submit a job to the GPU partition requesting 1 GPU. Inside the job, run `nvidia-smi` to confirm a GPU was allocated, and print `$CUDA_VISIBLE_DEVICES` to see which GPU device was assigned.

**Hint / Solution**

```bash
sbatch --partition=gpu --gres=gpu:1 --cpus-per-task=4 --mem=16G \
    --time=00:10:00 --output=gpu_test_%j.out --job-name=gpu_test \
    --wrap='echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; nvidia-smi'
```

### Request a specific GPU type

First, check what GPU types are available on your cluster using `sinfo`. Then submit a job requesting a specific GPU type (e.g., A100 or V100).

**Hint / Solution**

```bash
# Check available GPU types
sinfo -p gpu -o "%N %G"

# Request a specific type (adjust the type to match your cluster)
sbatch --partition=gpu --gres=gpu:a100:1 --cpus-per-task=4 --mem=16G \
    --time=00:10:00 --output=gpu_type_%j.out --job-name=gpu_type \
    --wrap='nvidia-smi --query-gpu=name --format=csv,noheader'
```

### Check CUDA_VISIBLE_DEVICES with multiple GPUs

Submit a job requesting 2 GPUs. Inside the job, print `$CUDA_VISIBLE_DEVICES` and use `nvidia-smi` to list the allocated GPU details (name, memory, index).

**Hint / Solution**

```bash
sbatch --partition=gpu --gres=gpu:2 --cpus-per-task=8 --mem=32G \
    --time=00:10:00 --output=multi_gpu_%j.out --job-name=multi_gpu \
    --wrap='echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; nvidia-smi --query-gpu=index,name,memory.total --format=csv'
```

### Use sacct to see GPU allocation details

After a GPU job completes, use `sacct` with the `AllocTRES` format field to see exactly which trackable resources (including GPUs) were allocated. Compare the GPU allocation with the CPU and memory allocation.

**Hint / Solution**

```bash
# After your GPU job completes:
sacct -j <jobid> --format=JobID,JobName,AllocTRES%60,Elapsed,State

# The AllocTRES column will show something like:
# billing=12,cpu=4,gres/gpu=1,mem=16G,node=1
```
