Installation
Exercises
- Verify Munge authentication across nodes
From the management node, generate a Munge credential and validate it on a remote compute node. Confirm that a credential created on the management node decodes successfully both locally and on the compute node.
Hint / Solution
# Local test
munge -n | unmunge
# Remote test -- pipe a credential to a compute node
munge -n | ssh cpu001 unmunge
# If the remote test fails, check:
# - Is the munge.key identical on both hosts?
# - Is the munge service running on cpu001?
# - Are the clocks in sync? Skew beyond the credential TTL (300 seconds by default) makes unmunge reject the credential
systemctl status munge
ssh cpu001 systemctl status munge
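The remote test above extends naturally to several nodes. The following is a minimal sketch, assuming passwordless ssh from the management node; the function name `check_munge_nodes` and its variable names are hypothetical and not part of Munge or Slurm.

```shell
# check_munge_nodes NODE...
# Encodes a credential locally and asks each node to decode it over ssh.
# Prints one OK/FAIL line per node; returns 1 if any node failed.
check_munge_nodes() {
    munge_failed=0
    for munge_node in "$@"; do
        # BatchMode stops ssh from prompting for a password inside the loop
        if munge -n | ssh -o BatchMode=yes "$munge_node" unmunge >/dev/null 2>&1; then
            echo "OK   $munge_node"
        else
            echo "FAIL $munge_node"
            munge_failed=1
        fi
    done
    return "$munge_failed"
}
```

Call it with the nodes to test, for example `check_munge_nodes cpu001 cpu002` (`cpu002` is a placeholder for a second compute node). For any node reported as FAIL, work through the checklist above.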
- Check SlurmDBD connectivity
Verify that slurmctld can communicate with slurmdbd. Look at the slurmdbd log to confirm the connection is established and the database is reachable.
Hint / Solution
# Check slurmdbd is running
systemctl status slurmdbd
# Verify the database connection from Slurm's perspective
sacctmgr show cluster
# If this lists your cluster, slurmdbd is reachable; a populated ControlHost
# column indicates that slurmctld has registered with it
# Check slurmdbd log for connection messages
tail -50 /var/log/slurm/slurmdbd.log | grep -i "connection\|error\|database"
- Verify the cluster is registered in accounting
Confirm that your cluster appears in the SlurmDBD accounting database. Then verify that a root account exists and at least one user is associated with it.
Hint / Solution
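One way to approach this is with three `sacctmgr` queries, sketched here as a small shell function. The function name `check_accounting` and its messages are hypothetical; the cluster name is passed as an argument and should match the `ClusterName` in your slurm.conf.

```shell
# check_accounting CLUSTER
# -n drops the header, -P prints pipe-delimited fields without padding
check_accounting() {
    acct_cluster=$1
    # 1. The cluster must be listed in the accounting database
    if ! sacctmgr -nP show cluster format=cluster | grep -qx "$acct_cluster"; then
        echo "cluster $acct_cluster is not registered"
        return 1
    fi
    # 2. The root account must exist
    if ! sacctmgr -nP show account name=root format=account | grep -qx root; then
        echo "root account is missing"
        return 1
    fi
    # 3. The account's own association has an empty user field, so any
    #    non-empty line here is a user attached to root
    if ! sacctmgr -nP show assoc cluster="$acct_cluster" account=root format=user | grep -q .; then
        echo "no user is associated with root"
        return 1
    fi
    echo "accounting looks OK for $acct_cluster"
}
```

Run it as `check_accounting <your-cluster-name>`. The same three `sacctmgr show` commands can also be run by hand, without `-nP`, to read the full tables.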
- Test configless mode with sackd
On a login node (or a test node), start sackd pointing at the controller and confirm that it pulls down the Slurm configuration. Verify that Slurm client commands work without a local copy of slurm.conf.
Hint / Solution
# On the login/test node, remove or rename any local slurm.conf
mv /etc/slurm/slurm.conf /etc/slurm/slurm.conf.bak
# Start sackd pointing at the controller
# (the controller must have SlurmctldParameters=enable_configless set)
sackd --conf-server mgmt01:6817
# Test that client commands work (they fetch config from sackd)
sinfo
squeue
scontrol show config | head -5
# If successful, sackd fetched the config from slurmctld
# Restore slurm.conf if you're not keeping configless mode
mv /etc/slurm/slurm.conf.bak /etc/slurm/slurm.conf
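While sackd is running and before the local file is restored, the check can be scripted. This is a minimal sketch: the function name `verify_configless` is hypothetical, and the default path is taken from the commands above.

```shell
# verify_configless [PATH_TO_LOCAL_SLURM_CONF]
# Succeeds only if no local slurm.conf exists and a client command still works.
verify_configless() {
    local_conf=${1:-/etc/slurm/slurm.conf}
    if [ -e "$local_conf" ]; then
        echo "local config still present: $local_conf"
        return 1
    fi
    # sinfo exits non-zero when it cannot find a configuration or reach slurmctld
    if sinfo >/dev/null 2>&1; then
        echo "client commands work without a local slurm.conf"
    else
        echo "sinfo failed; check that sackd is running and can reach the controller"
        return 1
    fi
}
```

Running `verify_configless` with no argument checks the default path; pass a different path if your installation keeps slurm.conf elsewhere.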