# Containers on Slurm

## Why Containers in HPC?

Containers package software and its dependencies into a portable unit. In HPC, containers solve common problems:

- **Reproducibility:** Same container, same results, regardless of the host OS
- **Dependency conflicts:** Different projects need different library versions
- **Portability:** Move workloads between clusters, cloud, and laptops
- **Complex software stacks:** Pre-built containers for tools like CryoSPARC, AlphaFold, TensorFlow

---

## Container Runtimes in HPC

| Runtime | Use Case | Privilege Required |
|---------|----------|--------------------|
| **Apptainer/Singularity** | HPC standard, runs unprivileged | No |
| **Docker** | Development, CI/CD | Yes (root; not available on most HPC clusters) |
| **Podman** | Rootless Docker alternative | No |
| **Slurm native OCI** | Direct `--container` support (newer) | No |

**Apptainer** (formerly Singularity) is the de facto standard for HPC containers because it runs without root privileges and integrates naturally with shared filesystems and schedulers.

---

## Apptainer/Singularity with Slurm

### Running a Container in a Batch Job

```bash
#!/bin/bash
#SBATCH --job-name=container_job
#SBATCH --time=02:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G

module load apptainer  # or singularity

apptainer exec my_image.sif ./run_analysis.sh
```

### Common Apptainer Commands

```bash
# Run a command inside a container
apptainer exec image.sif <command>

# Get an interactive shell
apptainer shell image.sif

# Run the container's default command
apptainer run image.sif

# Pull a container from Docker Hub
apptainer pull docker://ubuntu:22.04

# Build from a definition file
apptainer build my_image.sif my_definition.def
```

### Bind Mounts (Accessing Your Data)

By default, Apptainer mounts your home directory and current working directory.
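A quick way to confirm these default mounts (image name illustrative; run on a machine where Apptainer is installed):

```shell
# The host's current working directory is visible inside the container
apptainer exec my_image.sif pwd   # same path as on the host
apptainer exec my_image.sif ls    # same files as on the host
```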
For other paths, bind them explicitly:

```bash
# Bind external directories into the container
apptainer exec --bind /data:/data,/scratch:/scratch image.sif ./analysis.sh
```

In a job script:

```bash
#!/bin/bash
#SBATCH --job-name=container_analysis
#SBATCH --time=04:00:00

apptainer exec \
    --bind /shared/databases:/databases \
    --bind $SLURM_SUBMIT_DIR:/work \
    my_pipeline.sif \
    /work/run_pipeline.sh
```

### GPU Containers

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=12:00:00

module load apptainer

# --nv flag enables NVIDIA GPU support
apptainer exec --nv tensorflow_latest.sif python train.py
```

The `--nv` flag maps the host's NVIDIA drivers and GPU devices into the container.

---

## Real-World Container Examples

### AlphaFold via Container

```bash
#!/bin/bash
#SBATCH --job-name=alphafold
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=24:00:00

module load apptainer

apptainer exec --nv \
    --bind /shared/alphafold_data:/data \
    --bind $PWD:/output \
    alphafold_2.3.sif \
    python /app/run_alphafold.py \
        --fasta_paths=/output/target.fasta \
        --output_dir=/output/results \
        --data_dir=/data
```

### Bioconductor R Analysis

```bash
#!/bin/bash
#SBATCH --job-name=bioc_analysis
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00

module load apptainer

apptainer exec \
    --bind $SLURM_SUBMIT_DIR:/work \
    bioconductor_3.18.sif \
    Rscript /work/deseq2_analysis.R
```

### Custom Pipeline Container

```bash
#!/bin/bash
#SBATCH --job-name=nf_rnaseq
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --time=12:00:00

module load apptainer nextflow

nextflow run nf-core/rnaseq \
    -profile singularity \
    --input samplesheet.csv \
    --genome GRCh38
```

Many pipeline frameworks (Nextflow, Snakemake) have built-in support for running each step in a container.
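For Snakemake, the equivalent is a pair of command-line flags; a sketch (flag names as documented by Snakemake, but verify against your installed version; the bind path is illustrative):

```shell
# Run each Snakemake rule inside the container it declares,
# passing bind arguments through to the Singularity/Apptainer runtime
snakemake --use-singularity \
    --singularity-args "--bind /shared/databases:/databases" \
    --cores 16
```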
---

## Slurm Native OCI Container Support

Since Slurm 21.08, there is native support for OCI containers via the `--container` flag:

```bash
# Run a job using an OCI container bundle
sbatch --container=/path/to/oci/bundle --time=01:00:00 --wrap="hostname"

# Interactive
srun --container=/path/to/oci/bundle --pty bash
```

This is a newer feature and requires administrator configuration (`oci.conf`). Most sites still use Apptainer/Singularity.

> **ParallelCluster Note:** Since compute nodes are ephemeral, store Apptainer/Singularity `.sif` images on **shared storage** (FSx for Lustre or EFS), not on the head node's local disk. To avoid pull-time overhead, pre-stage images as part of a **custom AMI** or post-install script.

> **PCS Note:** Similar to ParallelCluster, container images should live on shared storage attached to your PCS cluster. Custom AMIs can also be used to pre-bake images onto compute nodes.

---

## Tips

1. **Pull containers to shared storage**, not your home directory. Container images can be large (several GB).
2. **Use `--bind` for data paths.** Don't copy data into the container.
3. **Match GPU drivers.** The CUDA version inside your container must be compatible with the host's NVIDIA driver. The `--nv` flag handles this for most cases.
4. **Build on a machine with root access** (your laptop, a CI system), then copy the `.sif` file to the cluster. Building on the cluster itself requires `--fakeroot` support.
5. **Cache management.** Apptainer caches images in `$HOME/.apptainer`. Set `APPTAINER_CACHEDIR` to a scratch location if home is limited.

## References

- SchedMD: Containers Guide
- Apptainer Documentation
- SchedMD: OCI Container Support
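As a closing example, several of the tips above can be folded into one small wrapper; this is a sketch (the `build_bind_list` helper and the paths are hypothetical), pinning the Apptainer cache to scratch and assembling a `--bind` list from host paths:

```shell
#!/bin/bash
# Sketch: keep the image cache off $HOME and build the
# comma-separated list that --bind expects.
export APPTAINER_CACHEDIR=/scratch/$USER/apptainer_cache

# build_bind_list /a /b  ->  /a:/a,/b:/b  (hypothetical helper)
build_bind_list() {
    local out="" p
    for p in "$@"; do
        out+="${out:+,}${p}:${p}"
    done
    printf '%s\n' "$out"
}

BINDS=$(build_bind_list /data /scratch)
echo "$BINDS"   # -> /data:/data,/scratch:/scratch
# apptainer exec --bind "$BINDS" image.sif ./analysis.sh   # actual run, cluster only
```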