GPU Jobs
Exercises
- Request 1 GPU and verify the allocation
Submit a job to the GPU partition requesting 1 GPU. Inside the job, run nvidia-smi to confirm a GPU was allocated, and print $CUDA_VISIBLE_DEVICES to see which GPU device was assigned.
Hint / Solution
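One possible solution is a small batch script, sketched here under some assumptions: the partition is called `gpu` (as in the type-specific exercise), and the job name, output file name, and the `show_visible_devices` helper are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Assumed partition name; adjust to your cluster.
#SBATCH --partition=gpu
# Request one GPU of any type.
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:05:00
# Hypothetical job and output names; %j expands to the job ID.
#SBATCH --job-name=gpu_check
#SBATCH --output=gpu_check_%j.out

# Hypothetical helper: print the GPU indices exposed to this job.
show_visible_devices() {
    echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
}

show_visible_devices

# List the GPUs the job can see; guarded so the script still exits cleanly
# on a node without the NVIDIA driver tools.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
else
    echo "nvidia-smi not found in PATH"
fi
```

Saved as, say, `gpu_check.sh` (a hypothetical file name), the script can be submitted with `sbatch gpu_check.sh`. The output file should then show a single GPU in the `nvidia-smi` table and a single entry in `CUDA_VISIBLE_DEVICES`.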
- Request a specific GPU type
First, check what GPU types are available on your cluster using sinfo. Then submit a job requesting a specific GPU type (e.g., A100 or V100).
Hint / Solution
# Check available GPU types
sinfo -p gpu -o "%N %G"
# Request a specific type (adjust the type to match your cluster)
sbatch --partition=gpu --gres=gpu:a100:1 --cpus-per-task=4 --mem=16G \
--time=00:10:00 --output=gpu_type_%j.out --job-name=gpu_type \
--wrap='nvidia-smi --query-gpu=name --format=csv,noheader'
- Check CUDA_VISIBLE_DEVICES with multiple GPUs
Submit a job requesting 2 GPUs. Inside the job, print $CUDA_VISIBLE_DEVICES and use nvidia-smi to list the allocated GPU details (name, memory, index).
Hint / Solution
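A minimal sketch of one way to approach this, assuming the partition is named `gpu` and that at least one node there offers two GPUs. The job name, output file name, and the `count_visible_gpus` helper are hypothetical.

```shell
#!/bin/bash
# Assumed partition name; adjust to your cluster.
#SBATCH --partition=gpu
# Request two GPUs of any type.
#SBATCH --gres=gpu:2
#SBATCH --time=00:05:00
# Hypothetical job and output names; %j expands to the job ID.
#SBATCH --job-name=gpu_multi
#SBATCH --output=gpu_multi_%j.out

# Hypothetical helper: count the comma-separated entries in CUDA_VISIBLE_DEVICES.
# An unset or empty variable yields 0.
count_visible_gpus() {
    printf '%s\n' "${CUDA_VISIBLE_DEVICES:-}" | awk -F',' '{ print NF }'
}

echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
echo "Visible GPU count: $(count_visible_gpus)"

# Print index, name, and total memory for each GPU visible to the job.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,memory.total --format=csv
else
    echo "nvidia-smi not found in PATH"
fi
```

With two GPUs allocated, the variable would typically hold two comma-separated entries, and the `nvidia-smi` query should list two rows below the CSV header.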
- Use sacct to see GPU allocation details
After a GPU job completes, use sacct with the AllocTRES format field to see exactly what trackable resources (including GPUs) were allocated. Compare this with the CPU and memory allocation.
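The comparison could be sketched as follows. The `JOBID` variable and the `gpu_count_from_tres` helper are hypothetical, and the sample AllocTRES string in the comment is illustrative rather than real output; the exact entries depend on how accounting is configured on your cluster.

```shell
# Hypothetical helper: extract the generic GPU count from an AllocTRES string
# such as "billing=4,cpu=4,gres/gpu=1,mem=16G,node=1" (illustrative sample).
gpu_count_from_tres() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F'=' '$1 == "gres/gpu" { print $2 }'
}

# Set JOBID in the environment to the ID of a completed GPU job before running.
JOBID="${JOBID:-}"

if command -v sacct >/dev/null 2>&1 && [ -n "$JOBID" ]; then
    # One line per job allocation (-X), with TRES, CPU, and memory side by side.
    sacct -j "$JOBID" -X --format=JobID,JobName,AllocTRES%50,AllocCPUS,ReqMem,Elapsed,State

    # Machine-readable AllocTRES for the same job, passed to the helper above.
    tres=$(sacct -j "$JOBID" -X --noheader --parsable2 --format=AllocTRES)
    echo "GPUs allocated: $(gpu_count_from_tres "$tres")"
fi
```

The `%50` suffix widens the AllocTRES column so the string is not truncated. In that string, the `cpu=` and `mem=` entries should agree with the AllocCPUS and ReqMem columns, while the GPU count only appears under `gres/gpu`.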