Run an app on SLURM#
Goal#
Submit a PlayMolecule job to a SLURM cluster and let it run asynchronously.
Minimal example#
from playmolecule.apps import proteinprepare
ed = proteinprepare(
outdir="/shared/scratch/me/proteinprepare-3ptb",
pdbid="3ptb",
)
ed.run(queue="slurm", partition="normalCPU", ncpu=1, ngpu=0)
outdir must be on a filesystem visible to all SLURM nodes. The call returns immediately; the job runs on a worker.
Parameters that matter#
Pass queue="slurm" to run(); every other keyword is forwarded to the SLURM submission:
Parameter |
Type |
What it does |
|---|---|---|
|
|
Queue to run on. Pass a list and the queue offering earliest start is used. |
|
|
CPUs requested. Defaults to the app manifest’s |
|
|
GPUs requested. Defaults to the app manifest’s |
|
|
RAM in MiB. |
|
|
Minimum GPU memory in MiB (requires |
|
|
Timeout in seconds. |
|
|
SLURM priority class. |
|
|
Job identifier shown in |
|
|
Whitelist of nodes — jobs will be duplicated across them, not load-balanced. |
|
|
Blacklist of nodes. |
|
|
Comma-separated env vars to propagate from the submit node to the worker. |
|
|
Shell commands run on the worker before the container starts (e.g., |
|
|
|
|
|
Email address for |
|
|
SLURM stdout file path. |
|
|
SLURM stderr file path. |
When ncpu / ngpu aren’t passed explicitly, PlayMolecule reads them from the app manifest’s resource defaults. Override only when you want to deviate from them.
Preset the queue from the environment#
Set the queue config once and ed.run() with no arguments will route to SLURM automatically:
export PM_QUEUE_CONFIG='{"queue": "slurm", "cpu_partition": "normalCPU", "gpu_partition": "normalGPU"}'
ed.run() # picks gpu_partition if the manifest requests GPUs, cpu_partition otherwise
Other keys in the JSON pass through as kwargs (e.g., memory, walltime).
Check on the job#
print(ed.status) # JobStatus.WAITING_INFO / RUNNING / COMPLETED / ERROR
See Check job status for the polling pattern.
Gotchas#
/tmp/is not shared. If you setoutdir=/tmp/...your job will start and immediately fail when the worker can’t read the inputs. Use shared storage.Logs go to wherever SLURM was configured to write them (and to
outdir/run_<id>/). Use--output/outputstreamto override.The submitting Python process does not need to stay alive — the job is owned by SLURM. Status queries work from any process by reconstructing the
ExecutableDirectoryfromdirname.
Side note: ed.slurm(...)#
ed.slurm(partition=..., ncpu=..., ...) is a thin alias for ed.run(queue="slurm", ...) retained for backwards compatibility. New code should prefer run(queue="slurm") so the same call style works for local, SLURM, and HTTP backends, and so PM_QUEUE_CONFIG can drop the kwargs entirely.
See also#
run()slurm()