Run an app on SLURM#

Goal#

Submit a PlayMolecule job to a SLURM cluster and let it run asynchronously.

Minimal example#

from playmolecule.apps import proteinprepare

ed = proteinprepare(
    outdir="/shared/scratch/me/proteinprepare-3ptb",
    pdbid="3ptb",
)
ed.run(queue="slurm", partition="normalCPU", ncpu=1, ngpu=0)

outdir must be on a filesystem visible to all SLURM nodes. The call returns immediately; the job runs on a worker.

Parameters that matter#

Pass queue="slurm" to run(); every other keyword is forwarded to the SLURM submission:

Parameter	Type	What it does
`partition`	`str` or `list[str]`	Queue to run on. Pass a list and the queue offering earliest start is used.
`ncpu`	`int`	CPUs requested. Defaults to the app manifest’s `resources.ncpu`.
`ngpu`	`int`	GPUs requested. Defaults to the app manifest’s `resources.ngpu`.
`memory`	`int`	RAM in MiB.
`gpumemory`	`int`	Minimum GPU memory in MiB (requires `gpu_mem` SLURM feature).
`walltime`	`int`	Timeout in seconds.
`priority`	`str`	SLURM priority class.
`jobname`	`str`	Job identifier shown in `squeue`.
`nodelist`	`list[str]`	Whitelist of nodes — jobs will be duplicated across them, not load-balanced.
`exclude`	`list[str]`	Blacklist of nodes.
`envvars`	`str`	Comma-separated env vars to propagate from the submit node to the worker.
`prerun`	`list[str]`	Shell commands run on the worker before the container starts (e.g., `module load apptainer`).
`mailtype`	`str`	`BEGIN,END,FAIL,...` — what to email on.
`mailuser`	`str`	Email address for `mailtype`.
`outputstream`	`str`	SLURM stdout file path.
`errorstream`	`str`	SLURM stderr file path.

When ncpu / ngpu aren’t passed explicitly, PlayMolecule reads them from the app manifest’s resource defaults. Override only when you want to deviate from them.

Preset the queue from the environment#

Set the queue config once and ed.run() with no arguments will route to SLURM automatically:

export PM_QUEUE_CONFIG='{"queue": "slurm", "cpu_partition": "normalCPU", "gpu_partition": "normalGPU"}'

ed.run()    # picks gpu_partition if the manifest requests GPUs, cpu_partition otherwise

Other keys in the JSON pass through as kwargs (e.g., memory, walltime).

Check on the job#

print(ed.status)        # JobStatus.WAITING_INFO / RUNNING / COMPLETED / ERROR

See Check job status for the polling pattern.

Gotchas#

/tmp/ is not shared. If you set outdir=/tmp/... your job will start and immediately fail when the worker can’t read the inputs. Use shared storage.
Logs go to wherever SLURM was configured to write them (and to outdir/run_<id>/). Use --output / outputstream to override.
The submitting Python process does not need to stay alive — the job is owned by SLURM. Status queries work from any process by reconstructing the ExecutableDirectory from dirname.

Side note: `ed.slurm(...)`#

ed.slurm(partition=..., ncpu=..., ...) is a thin alias for ed.run(queue="slurm", ...) retained for backwards compatibility. New code should prefer run(queue="slurm") so the same call style works for local, SLURM, and HTTP backends, and so PM_QUEUE_CONFIG can drop the kwargs entirely.