
Installation

Exercises

  1. Verify Munge authentication across nodes

From the management node, generate a Munge credential and validate it on a remote compute node. Confirm that a credential encoded on the management node decodes successfully both locally and on the compute node.

Hint / Solution
# Local test
munge -n | unmunge

# Remote test -- pipe a credential to a compute node
munge -n | ssh cpu001 unmunge

# If the remote test fails, check:
# - Is the munge.key identical on both hosts?
# - Is the munge service running on cpu001?
# - Is the clock skew between hosts within the credential TTL (300 seconds by default)?
systemctl status munge
ssh cpu001 systemctl status munge
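The clock-skew check in the list above can be scripted. The following is a minimal sketch, not part of Munge itself: `skew_ok` is a hypothetical helper that compares two epoch timestamps, and its 300-second default is an assumption chosen to match Munge's default credential lifetime. Pass a tighter bound as the third argument if your site requires one.

```shell
# skew_ok LOCAL_EPOCH REMOTE_EPOCH [MAX_SECONDS]
# Hypothetical helper: succeeds when the two timestamps differ by at most
# MAX_SECONDS. The default of 300 is an assumption (Munge's default TTL).
skew_ok() (
    diff=$(( $1 - $2 ))
    # Take the absolute value so the order of the arguments does not matter.
    [ "$diff" -lt 0 ] && diff=$(( 0 - diff ))
    [ "$diff" -le "${3:-300}" ]
)

# Example call against the compute node used in this exercise:
#   skew_ok "$(date +%s)" "$(ssh cpu001 date +%s)" || echo "clock skew too large"
```

The SSH round trip adds a little latency to the remote timestamp, so treat a result close to the limit as a reason to check time synchronization rather than as a pass.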
  2. Check SlurmDBD connectivity

Verify that slurmctld can communicate with slurmdbd. Look at the slurmdbd log to confirm the connection is established and the database is reachable.

Hint / Solution
# Check slurmdbd is running
systemctl status slurmdbd

# Verify the database connection from Slurm's perspective
sacctmgr show cluster
# If your cluster is listed, slurmdbd is reachable; a non-empty ControlHost
# means slurmctld has registered with it

# Check slurmdbd log for connection messages
tail -50 /var/log/slurm/slurmdbd.log | grep -i "connection\|error\|database"
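To turn the `sacctmgr show cluster` check into a pass/fail test, you can parse its pipe-separated output. This is a rough sketch: `cluster_registered` is a hypothetical helper, `mycluster` is a placeholder name, and the check assumes that slurmdbd fills in ControlHost once slurmctld has registered.

```shell
# cluster_registered NAME
# Hypothetical helper: reads the output of
#   sacctmgr -nP show cluster format=Cluster,ControlHost
# on stdin (no header, fields separated by "|") and succeeds only when
# NAME appears with a non-empty ControlHost field.
cluster_registered() {
    awk -F'|' -v name="$1" '
        $1 == name && $2 != "" { found = 1 }
        END { exit !found }
    '
}

# Example call ("mycluster" is a placeholder for your ClusterName):
#   sacctmgr -nP show cluster format=Cluster,ControlHost | cluster_registered mycluster
```

An empty ControlHost for a listed cluster usually points at slurmctld rather than at the database, so check the slurmctld log next in that case.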
  3. Verify the cluster is registered in accounting

Confirm that your cluster appears in the SlurmDBD accounting database. Then verify that a root account exists and at least one user is associated with it.

Hint / Solution
# List registered clusters
sacctmgr show cluster format=Cluster,ControlHost,ControlPort,RPC

# List accounts
sacctmgr show account format=Account,Description,Organization

# List user associations
sacctmgr show association format=Cluster,Account,User,Fairshare
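The association listing also contains rows for the accounts themselves, which have an empty User field. As an illustrative sketch, the hypothetical helper `users_in_account` below filters those out and prints only the users attached to a given account. The sample names in the comments are placeholders.

```shell
# users_in_account ACCOUNT
# Hypothetical helper: reads the output of
#   sacctmgr -nP show association format=Cluster,Account,User
# on stdin and prints the distinct users associated with ACCOUNT,
# one per line. Rows with an empty User field are skipped.
users_in_account() {
    awk -F'|' -v acct="$1" '$2 == acct && $3 != "" { print $3 }' | sort -u
}

# Example call; empty output means no user is associated with "root":
#   sacctmgr -nP show association format=Cluster,Account,User | users_in_account root
```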
  4. Test configless mode with sackd

On a login node (or a test node), start sackd pointing at the controller and confirm that it pulls down the Slurm configuration. Verify that Slurm client commands work without a local copy of slurm.conf.

Hint / Solution
# On the login/test node, remove or rename any local slurm.conf
mv /etc/slurm/slurm.conf /etc/slurm/slurm.conf.bak

# Start sackd pointing at the controller
sackd --conf-server mgmt01:6817

# Test that client commands work (they fetch config from sackd)
sinfo
squeue
scontrol show config | head -5

# If successful, sackd fetched the config from slurmctld
# Restore slurm.conf if you're not keeping configless mode
mv /etc/slurm/slurm.conf.bak /etc/slurm/slurm.conf
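Renaming slurm.conf by hand makes it easy to forget the restore step. One possible approach, sketched here with a hypothetical helper named `with_file_hidden`, is to wrap the rename, the test command, and the restore in a single function so that the file comes back even when the command fails.

```shell
# with_file_hidden FILE COMMAND [ARGS...]
# Hypothetical helper: renames FILE to FILE.bak, runs COMMAND, then moves
# FILE back regardless of how COMMAND exited. Returns COMMAND's exit status.
with_file_hidden() (
    file=$1
    shift
    mv "$file" "$file.bak" || exit 1
    status=0
    "$@" || status=$?
    mv "$file.bak" "$file"
    exit "$status"
)

# Example call, assuming sackd is already running on this node:
#   with_file_hidden /etc/slurm/slurm.conf sinfo
```

The function body runs in a subshell, so its variables do not leak into your interactive session. It does not protect against the shell being killed mid-run, in which case the `.bak` file has to be restored manually.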

References