Running on SLURM
You will learn: how to submit a PlayMolecule job to a SLURM cluster, poll its status, and use the resources requested by the app’s manifest.
Prerequisites:
First app run completed.
SSH access to a node that can run sbatch.
outdir on a path visible to all SLURM nodes (a shared filesystem); a local /tmp/ directory will not work for a remote worker.
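A remote worker cannot see a node's local /tmp/, so it can help to sanity-check outdir before submitting. The sketch below is a hypothetical helper (check_shared_outdir is not part of playmolecule). It only catches the common mistakes of a relative path or a path under a node-local directory. It cannot prove that a path is actually on a shared filesystem, and the list of local prefixes is an assumption.

```python
import posixpath
from pathlib import PurePosixPath

# Node-local locations a remote SLURM worker cannot see (an assumed list; adjust for your cluster).
LOCAL_PREFIXES = ("/tmp", "/var/tmp", "/dev/shm")


def check_shared_outdir(outdir: str) -> str:
    """Hypothetical helper: reject outdir values that are obviously node-local.

    Returns the normalized path, or raises ValueError. Passing this check does not
    prove the path is on a shared filesystem; it only catches common mistakes.
    """
    if not posixpath.isabs(outdir):
        raise ValueError(f"outdir must be an absolute path, got {outdir!r}")
    normalized = PurePosixPath(posixpath.normpath(outdir))  # collapses ".." and "." segments
    for prefix in LOCAL_PREFIXES:
        if normalized.is_relative_to(prefix):  # component-wise, so "/tmpdata" is not under "/tmp"
            raise ValueError(f"{outdir!r} is node-local; use a path on a shared filesystem")
    return str(normalized)
```

For example, "/shared/scratch/me/run" passes through unchanged, while "/tmp/run" and "/shared/../tmp/x" both raise ValueError.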
Setup
from playmolecule import JobStatus
from playmolecule.apps import proteinprepare
Step 2 — Submit
ed.run(queue="slurm", partition="normalCPU", ncpu=1, ngpu=0)
Passing queue="slurm" to run() submits through SLURM instead of running the container locally. Every other keyword (partition, ncpu, ngpu, memory, walltime, nodelist, exclude, envvars, prerun, …) is forwarded to the SLURM queue.
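The following illustrates what "forwarded to the SLURM queue" typically means. It shows roughly how such keywords correspond to sbatch options. The mapping is an assumption, not PlayMolecule's actual implementation, and the function name to_sbatch_args is hypothetical. Only the sbatch flags themselves (--partition, --cpus-per-task, --gres, --mem, --time, --nodelist, --exclude) are standard SLURM options.

```python
from typing import List, Optional, Sequence


def to_sbatch_args(
    partition: Optional[str] = None,
    ncpu: Optional[int] = None,
    ngpu: Optional[int] = None,
    memory: Optional[int] = None,  # assumed to be in MB, sbatch's default unit for --mem
    walltime: Optional[str] = None,  # assumed to already be an sbatch time string, e.g. "02:00:00"
    nodelist: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical: translate run()-style keywords into sbatch command-line flags."""
    args: List[str] = []
    if partition:
        args.append(f"--partition={partition}")
    if ncpu:
        args.append(f"--cpus-per-task={ncpu}")
    if ngpu:  # ngpu=0 or None requests no GPUs
        args.append(f"--gres=gpu:{ngpu}")
    if memory:
        args.append(f"--mem={memory}")
    if walltime:
        args.append(f"--time={walltime}")
    if nodelist:
        args.append("--nodelist=" + ",".join(nodelist))
    if exclude:
        args.append("--exclude=" + ",".join(exclude))
    return args
```

With the Step 2 arguments (partition="normalCPU", ncpu=1, ngpu=0), this sketch yields ["--partition=normalCPU", "--cpus-per-task=1"]; ngpu=0 adds no --gres request.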
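The call returns as soon as the job has been submitted; it does not wait for the job to finish. The SLURM queue object is stored on ed, and the job ID appears in SLURM's normal accounting, so tools such as squeue and sacct will show it.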
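Because the job ID lives in SLURM's own accounting, you can also inspect the job outside Python. For example, you can run squeue --user=$USER --noheader --format="%i %T", where %i is the job ID and %T the full state name. The hypothetical helper below (parse_squeue is not part of playmolecule) parses that two-column output into a dictionary.

```python
from typing import Dict


def parse_squeue(output: str) -> Dict[str, str]:
    """Hypothetical helper: map job ID -> state from `squeue --noheader --format="%i %T"` output."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            states[fields[0]] = fields[1]
    return states
```

In practice you would pass it the stdout of subprocess.run(["squeue", "--user", user, "--noheader", "--format", "%i %T"], capture_output=True, text=True). Job IDs are kept as strings so that array-job IDs such as 123_4 survive intact.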
Step 3 — Poll status
print(ed.status)
You’ll see one of the four states from JobStatus:
JobStatus.WAITING_INFO: submitted but not yet running.
JobStatus.RUNNING
JobStatus.COMPLETED
JobStatus.ERROR
A simple polling loop:
import time
while ed.status not in (JobStatus.COMPLETED, JobStatus.ERROR):
    time.sleep(30)  # wait 30 s between status checks
print("Done:", ed.status)
For background — what each state means and how it’s detected — see Job lifecycle.
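The loop above waits forever if the job stalls in the queue. The following is a minimal bounded variant, assuming you want to give up after a deadline. wait_until_done is a hypothetical helper. It takes the status as a callable so that it can be exercised without a cluster. With a real job you would pass lambda: ed.status, with JobStatus.COMPLETED and JobStatus.ERROR as the terminal states.

```python
import time
from typing import Callable, Collection, TypeVar

S = TypeVar("S")


def wait_until_done(
    get_status: Callable[[], S],
    terminal: Collection[S],
    interval: float = 30.0,
    timeout: float = 24 * 3600.0,
) -> S:
    """Poll get_status() every `interval` seconds until it returns a terminal state.

    Raises TimeoutError if no terminal state is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout  # monotonic clock is immune to wall-clock changes
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout} s")
        time.sleep(interval)
```

For the job from Step 2, this would be: final = wait_until_done(lambda: ed.status, (JobStatus.COMPLETED, JobStatus.ERROR)).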
Step 4 — Use app-default resources
The manifest declares per-app resources (CPUs, GPUs) that you don’t have to repeat. If you don’t pass ncpu / ngpu, the app’s defaults are used. So for an app whose manifest sets ncpu=4, ngpu=1, this is enough:
proteinprepare(outdir="/shared/scratch/me/run").run(queue="slurm", partition="normalCPU")
Override only when you need to deviate from the manifest defaults.
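The precedence rule is simple: an explicit keyword wins, otherwise the manifest value is used. The following is a minimal sketch. It assumes the manifest's resources are available as a plain dictionary; both the real manifest format and the name resolve_resources are assumptions for illustration.

```python
from typing import Dict, Optional


def resolve_resources(
    manifest_resources: Dict[str, int],
    ncpu: Optional[int] = None,
    ngpu: Optional[int] = None,
) -> Dict[str, int]:
    """Hypothetical: explicit ncpu/ngpu override the manifest defaults; None means 'not passed'."""
    resolved = dict(manifest_resources)  # copy, so the app's declared defaults are not mutated
    if ncpu is not None:
        resolved["ncpu"] = ncpu
    if ngpu is not None:
        resolved["ngpu"] = ngpu  # 0 is a valid override, hence `is not None` rather than truthiness
    return resolved
```

Checking with is not None matters here: an explicit ngpu=0 must be able to override a manifest that requests a GPU.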
Step 5 — Preset the queue once
When PM_QUEUE_CONFIG is set, ed.run() with no arguments picks up the queue, partition, and resources from the environment and submits to SLURM automatically:
export PM_QUEUE_CONFIG='{"queue": "slurm", "cpu_partition": "normalCPU", "gpu_partition": "normalGPU"}'
ed.run()  # picks gpu_partition if the manifest requests GPUs, cpu_partition otherwise
Useful in shared CI scripts and admin-managed environments where users shouldn’t have to know which partition to use.
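The partition choice described above amounts to one branch on the requested GPU count. The following sketch assumes PM_QUEUE_CONFIG holds JSON like the export line shows. pick_partition is a hypothetical name, and this is not the library's code.

```python
import json
import os
from typing import Mapping, Optional


def pick_partition(ngpu: int, environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical: choose a partition from PM_QUEUE_CONFIG based on the requested GPUs.

    Returns None when PM_QUEUE_CONFIG is unset or lacks the relevant key.
    """
    env = os.environ if environ is None else environ
    raw = env.get("PM_QUEUE_CONFIG")
    if raw is None:
        return None
    config = json.loads(raw)
    key = "gpu_partition" if ngpu > 0 else "cpu_partition"
    return config.get(key)
```

Passing environ explicitly keeps the function testable without touching the real process environment.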
Recap
Always set outdir to a shared-filesystem path before submitting to SLURM.
ed.run(queue="slurm", ...) returns immediately; query ed.status to see progress.
The app manifest provides default ncpu / ngpu; override them only when needed.
PM_QUEUE_CONFIG lets ed.run() pick the partition automatically.
Side note: ed.slurm(...)
ed.slurm(partition=..., ncpu=..., ...) is a thin alias for ed.run(queue="slurm", ...) that pre-dates the unified run(queue=...) interface. New code should use run(queue="slurm") — it composes with PM_QUEUE_CONFIG, parallels the local-run call, and avoids a second method to remember. slurm() will continue to work.
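The alias pattern is just keyword forwarding. The following toy sketch shows why slurm(...) and run(queue="slurm", ...) end up identical. The class Job is a hypothetical stand-in that records the call instead of submitting anything; it is not playmolecule's object.

```python
from typing import Any, Dict


class Job:
    """Hypothetical stand-in for the object returned by an app call."""

    def __init__(self) -> None:
        self.last_call: Dict[str, Any] = {}

    def run(self, queue: str = "local", **kwargs: Any) -> Dict[str, Any]:
        # Record what would be submitted instead of submitting anything.
        self.last_call = {"queue": queue, **kwargs}
        return self.last_call

    def slurm(self, **kwargs: Any) -> Dict[str, Any]:
        # Thin alias: forwards every keyword to run() with queue fixed to "slurm".
        return self.run(queue="slurm", **kwargs)
```

Because slurm() only fixes one argument and forwards the rest, any keyword that run() gains later is automatically available through the alias too.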