Run an app on SLURM#

Goal#

Submit a PlayMolecule job to a SLURM cluster and let it run asynchronously.

Minimal example#

from playmolecule.apps import proteinprepare

ed = proteinprepare(
    outdir="/shared/scratch/me/proteinprepare-3ptb",
    pdbid="3ptb",
)
ed.run(queue="slurm", partition="normalCPU", ncpu=1, ngpu=0)

outdir must be on a filesystem visible to all SLURM nodes. The call returns immediately; the job runs on a worker.

Parameters that matter#

Pass queue="slurm" to run(); every other keyword is forwarded to the SLURM submission:

Parameter     Type              What it does
------------  ----------------  --------------------------------------------------
partition     str or list[str]  Queue to run on. If a list is passed, the queue offering the earliest start is used.
ncpu          int               CPUs requested. Defaults to the app manifest’s resources.ncpu.
ngpu          int               GPUs requested. Defaults to the app manifest’s resources.ngpu.
memory        int               RAM in MiB.
gpumemory     int               Minimum GPU memory in MiB (requires gpu_mem SLURM feature).
walltime      int               Timeout in seconds.
priority      str               SLURM priority class.
jobname       str               Job identifier shown in squeue.
nodelist      list[str]         Whitelist of nodes. Jobs will be duplicated across them, not load-balanced.
exclude       list[str]         Blacklist of nodes.
envvars       str               Comma-separated env vars to propagate from the submit node to the worker.
prerun        list[str]         Shell commands run on the worker before the container starts (e.g., module load apptainer).
mailtype      str               Events to email on: BEGIN,END,FAIL,...
mailuser      str               Email address for mailtype.
outputstream  str               SLURM stdout file path.
errorstream   str               SLURM stderr file path.

If ncpu or ngpu is not passed explicitly, PlayMolecule takes it from the app manifest’s resource defaults, so pass these only when the job needs something different.
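
Because memory and gpumemory are given in MiB and walltime in seconds, it can help to convert from human-friendly units before calling run(). The sketch below is illustrative only: gib_to_mib and hours_to_seconds are hypothetical helpers, not part of PlayMolecule, and the values are placeholders rather than recommendations.

```python
def gib_to_mib(gib: float) -> int:
    """Convert GiB to whole MiB (1 GiB = 1024 MiB)."""
    return int(round(gib * 1024))


def hours_to_seconds(hours: float) -> int:
    """Convert hours to whole seconds."""
    return int(round(hours * 3600))


# Keyword arguments using the parameter names documented in this section.
# The specific values are placeholders.
slurm_kwargs = {
    "partition": ["normalCPU"],
    "ncpu": 4,
    "memory": gib_to_mib(8),           # 8 GiB expressed in MiB
    "walltime": hours_to_seconds(12),  # 12 hours expressed in seconds
    "jobname": "proteinprepare-3ptb",
}

# With an ExecutableDirectory `ed` as in the minimal example:
# ed.run(queue="slurm", **slurm_kwargs)
```

Keeping the keyword arguments in one dictionary also makes it easy to reuse the same resource request across several apps.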

Preset the queue from the environment#

Set the queue config once and ed.run() with no arguments will route to SLURM automatically:

# In the shell:
export PM_QUEUE_CONFIG='{"queue": "slurm", "cpu_partition": "normalCPU", "gpu_partition": "normalGPU"}'

# In Python:
ed.run()    # picks gpu_partition if the manifest requests GPUs, cpu_partition otherwise

Other keys in the JSON pass through as kwargs (e.g., memory, walltime).
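
The same configuration can be built from Python, which avoids hand-quoting JSON in the shell. This is a sketch under one assumption worth verifying: that PlayMolecule reads PM_QUEUE_CONFIG after the variable has been set in the current process (setting it before importing playmolecule is the conservative choice). The memory and walltime values are placeholders.

```python
import json
import os

queue_config = {
    "queue": "slurm",
    "cpu_partition": "normalCPU",
    "gpu_partition": "normalGPU",
    # Extra keys are described as passing through as kwargs.
    "memory": 8192,     # MiB
    "walltime": 43200,  # seconds
}

# Serialize to JSON and export it for this process and its children.
os.environ["PM_QUEUE_CONFIG"] = json.dumps(queue_config)

# Round-trip check: the variable holds valid JSON with the expected keys.
parsed = json.loads(os.environ["PM_QUEUE_CONFIG"])
print(parsed["queue"], parsed["cpu_partition"], parsed["gpu_partition"])
```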

Check on the job#

print(ed.status)        # JobStatus.WAITING_INFO / RUNNING / COMPLETED / ERROR

See Check job status for the polling pattern.
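
As a rough illustration of polling, the helper below reads a status repeatedly until it reaches a terminal state or a timeout expires. It is a generic sketch, not PlayMolecule's own API: wait_until_done and its arguments are hypothetical, and it assumes the status object exposes a .name such as COMPLETED or ERROR (falling back to the last dotted part of str()).

```python
import time
from typing import Callable, Iterable


def wait_until_done(
    get_status: Callable[[], object],
    terminal: Iterable[str] = ("COMPLETED", "ERROR"),
    interval: float = 30.0,
    timeout: float = 3600.0,
) -> str:
    """Poll get_status() until its name is in `terminal`; return that name.

    Raises TimeoutError if no terminal state is seen within `timeout` seconds.
    """
    terminal_names = set(terminal)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        # Enum members have .name; otherwise use the last dotted part of str().
        name = getattr(status, "name", None) or str(status).split(".")[-1]
        if name in terminal_names:
            return name
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {name} after {timeout} s")
        time.sleep(interval)


# Hypothetical usage with an ExecutableDirectory `ed`:
# final = wait_until_done(lambda: ed.status, interval=60)
```

Passing a callable rather than the object itself keeps the helper independent of how the status is obtained, so it works equally from a fresh process that has reconstructed the directory.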

Gotchas#

  • /tmp/ is not shared between nodes. If you set outdir=/tmp/... your job will start and then fail immediately because the worker can’t read the inputs. Use shared storage.

  • Logs go to wherever SLURM is configured to write them, and also to outdir/run_<id>/. To override the location, pass outputstream (the counterpart of sbatch --output).

  • The submitting Python process does not need to stay alive — the job is owned by SLURM. Status queries work from any process by reconstructing the ExecutableDirectory from dirname.
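
One way to catch the /tmp/ mistake before submitting is a small guard on outdir. The helper below is hypothetical (not part of PlayMolecule), and its list of node-local prefixes is an assumption to adapt to your cluster. It cannot prove that a path is shared; it only rejects obviously local ones.

```python
from pathlib import PurePosixPath

# Assumed node-local locations; adjust for your cluster.
NODE_LOCAL_PREFIXES = ("/tmp", "/var/tmp", "/dev/shm")


def check_outdir(outdir: str) -> str:
    """Return outdir unchanged, or raise ValueError if it looks node-local."""
    path = PurePosixPath(outdir)
    if not path.is_absolute():
        raise ValueError(f"outdir should be an absolute path: {outdir!r}")
    for prefix in NODE_LOCAL_PREFIXES:
        base = PurePosixPath(prefix)
        # Matches the prefix itself and anything beneath it, but not /tmpfoo.
        if path == base or base in path.parents:
            raise ValueError(f"outdir {outdir!r} is under node-local {prefix}")
    return outdir


print(check_outdir("/shared/scratch/me/proteinprepare-3ptb"))
```

Calling check_outdir(...) on the value passed as outdir surfaces the problem on the submit node instead of after the job has been queued.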

Side note: ed.slurm(...)#

ed.slurm(partition=..., ncpu=..., ...) is a thin alias for ed.run(queue="slurm", ...) retained for backwards compatibility. New code should prefer run(queue="slurm") so the same call style works for local, SLURM, and HTTP backends, and so PM_QUEUE_CONFIG can drop the kwargs entirely.

See also#