# Containers on Slurm

## Why Containers in HPC?

Containers package software and its dependencies into a portable unit. In HPC, containers solve common problems:

- **Reproducibility:** Same container, same results, regardless of the host OS
- **Dependency conflicts:** Different projects need different library versions
- **Portability:** Move workloads between clusters, cloud, and laptops
- **Complex software stacks:** Pre-built containers for tools like CryoSPARC, AlphaFold, TensorFlow

---

## Container Runtimes in HPC

| Runtime | Use Case | Privilege Required |
|---------|----------|--------------------|
| **Apptainer/Singularity** | HPC standard, runs unprivileged | No |
| **Docker** | Development, CI/CD | Yes (root; not available on most HPC clusters) |
| **Podman** | Rootless Docker alternative | No |
| **Slurm native OCI** | Direct `--container` support (newer) | No |

**Apptainer** (formerly Singularity) is the de facto standard for HPC containers because it runs without root privileges and integrates naturally with shared filesystems and schedulers.

---

## Apptainer/Singularity with Slurm

### Running a Container in a Batch Job

```bash
#!/bin/bash
#SBATCH --job-name=container_job
#SBATCH --time=02:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G

module load apptainer  # or singularity

apptainer exec my_image.sif ./run_analysis.sh
```

### Common Apptainer Commands

```bash
# Run a command inside a container
apptainer exec image.sif <command>

# Get an interactive shell
apptainer shell image.sif

# Run the container's default command
apptainer run image.sif

# Pull a container from Docker Hub
apptainer pull docker://ubuntu:22.04

# Build from a definition file
apptainer build my_image.sif my_definition.def
```

### Bind Mounts (Accessing Your Data)

By default, Apptainer mounts your home directory and current working directory.
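A quick way to confirm these default mounts (image name illustrative; run on a machine where Apptainer is installed):

```shell
# The host's current working directory is visible inside the container
apptainer exec my_image.sif pwd   # same path as on the host
apptainer exec my_image.sif ls    # same files as on the host
```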
For other paths, bind them explicitly:

```bash
# Bind external directories into the container
apptainer exec --bind /data:/data,/scratch:/scratch image.sif ./analysis.sh
```

In a job script:

```bash
#!/bin/bash
#SBATCH --job-name=container_analysis
#SBATCH --time=04:00:00

apptainer exec \
    --bind /shared/databases:/databases \
    --bind $SLURM_SUBMIT_DIR:/work \
    my_pipeline.sif \
    /work/run_pipeline.sh
```

### GPU Containers

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=12:00:00

module load apptainer

# --nv flag enables NVIDIA GPU support
apptainer exec --nv tensorflow_latest.sif python train.py
```

The `--nv` flag maps the host's NVIDIA drivers and GPU devices into the container.

---

## Real-World Container Examples

### AlphaFold via Container

```bash
#!/bin/bash
#SBATCH --job-name=alphafold
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=24:00:00

module load apptainer

apptainer exec --nv \
    --bind /shared/alphafold_data:/data \
    --bind $PWD:/output \
    alphafold_2.3.sif \
    python /app/run_alphafold.py \
        --fasta_paths=/output/target.fasta \
        --output_dir=/output/results \
        --data_dir=/data
```

### Bioconductor R Analysis

```bash
#!/bin/bash
#SBATCH --job-name=bioc_analysis
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00

module load apptainer

apptainer exec \
    --bind $SLURM_SUBMIT_DIR:/work \
    bioconductor_3.18.sif \
    Rscript /work/deseq2_analysis.R
```

### Custom Pipeline Container

```bash
#!/bin/bash
#SBATCH --job-name=nf_rnaseq
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --time=12:00:00

module load apptainer nextflow

nextflow run nf-core/rnaseq \
    -profile singularity \
    --input samplesheet.csv \
    --genome GRCh38
```

Many pipeline frameworks (Nextflow, Snakemake) have built-in support for running each step in a container.
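For Snakemake, the equivalent is a pair of command-line flags; a sketch (flag names as documented by Snakemake, but verify against your installed version; the bind path is illustrative):

```shell
# Run each Snakemake rule inside the container it declares,
# passing bind arguments through to the Singularity/Apptainer runtime
snakemake --use-singularity \
    --singularity-args "--bind /shared/databases:/databases" \
    --cores 16
```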
---

## Slurm Native OCI Container Support

Since Slurm 21.08, there is native support for OCI containers via the `--container` flag:

```bash
# Run a job using an OCI container bundle
sbatch --container=/path/to/oci/bundle --time=01:00:00 --wrap="hostname"

# Interactive
srun --container=/path/to/oci/bundle --pty bash
```

This is a newer feature and requires administrator configuration (`oci.conf`). Most sites still use Apptainer/Singularity.

> **ParallelCluster Note:** Since compute nodes are ephemeral, store Apptainer/Singularity `.sif` images on **shared storage** (FSx for Lustre or EFS), not on the head node's local disk. To avoid pull-time overhead, pre-stage images as part of a **custom AMI** or post-install script.

> **PCS Note:** Similar to ParallelCluster, container images should live on shared storage attached to your PCS cluster. Custom AMIs can also be used to pre-bake images onto compute nodes.

---

## Tips

1. **Pull containers to shared storage**, not your home directory. Container images can be large (several GB).
2. **Use `--bind` for data paths.** Don't copy data into the container.
3. **Match GPU drivers.** The CUDA version inside your container must be compatible with the host's NVIDIA driver. The `--nv` flag handles this for most cases.
4. **Build on a machine with root access** (your laptop, a CI system), then copy the `.sif` file to the cluster. Building on the cluster itself requires `--fakeroot` support.
5. **Cache management.** Apptainer caches images in `$HOME/.apptainer`. Set `APPTAINER_CACHEDIR` to a scratch location if home is limited.

## References

- SchedMD: Containers Guide
- Apptainer Documentation
- SchedMD: OCI Container Support
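As a closing example, several of the tips above can be folded into one small wrapper; this is a sketch (the `build_bind_list` helper and the paths are hypothetical), pinning the Apptainer cache to scratch and assembling a `--bind` list from host paths:

```shell
#!/bin/bash
# Sketch: keep the image cache off $HOME and build the
# comma-separated list that --bind expects.
export APPTAINER_CACHEDIR=/scratch/$USER/apptainer_cache

# build_bind_list /a /b  ->  /a:/a,/b:/b  (hypothetical helper)
build_bind_list() {
    local out="" p
    for p in "$@"; do
        out+="${out:+,}${p}:${p}"
    done
    printf '%s\n' "$out"
}

BINDS=$(build_bind_list /data /scratch)
echo "$BINDS"   # -> /data:/data,/scratch:/scratch
# apptainer exec --bind "$BINDS" image.sif ./analysis.sh   # actual run, cluster only
```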