jobqueues.slurmqueue module

class jobqueues.slurmqueue.SlurmQueue(_configapp=None, _configfile=None, _findExecutables=True, _logger=True)

Bases: SimQueue

Queue system for SLURM

Parameters:
  • jobname (str, default=None) – Job name (identifier)

  • partition (str or list of str, default=None) – The queue (partition) or list of partitions to run on. If a list is given, the partition offering the earliest initiation will be used.

  • priority (str, default=None) – Job priority

  • ngpu (int, default=1) – Number of GPUs to use for a single job

  • ncpu (int, default=1) – Number of CPUs to use for a single job

  • memory (int, default=1000) – Amount of memory per job (MiB)

  • gpumemory (int, default=None) – Only run on GPUs with at least this much memory. Requires a special SLURM setup; check how to define gpu_mem in your SLURM configuration.

  • walltime (int, default=None) – Job timeout (s)

  • mailtype (str, default=None) – When to send emails. Separate multiple options with commas, e.g. 'END,FAIL'.

  • mailuser (str, default=None) – User email address.

  • outputstream (str, default='slurm.%N.%j.out') – Output stream.

  • errorstream (str, default='slurm.%N.%j.err') – Error stream.

  • datadir (str, default=None) – The path in which to store completed trajectories.

  • trajext (str, default='xtc') – Extension of trajectory files. This is needed to copy them to datadir.

  • nodelist (list, default=None) – A list of nodes on which to run every job at the same time. Use with care: each job will be duplicated on all listed nodes!

  • exclude (list, default=None) – A list of nodes on which the jobs should not run. Use this to restrict the jobs to the remaining nodes.

  • envvars (str, default='ACEMD_HOME,HTMD_LICENSE_FILE') – Environment variables to propagate from the submission node to the running node (comma-separated).

  • prerun (list, default=None) – Shell commands to execute on the running node before the job (e.g. loading modules)

Examples

>>> s = SlurmQueue()
>>> s.partition = 'multiscale'
>>> s.submit('/my/runnable/folder/')  # Folder containing a run.sh bash script
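
A more complete configuration, as a sketch built only from the parameters documented above (the email address and folder paths are placeholders, and exact scheduler behavior depends on your SLURM setup):

>>> s = SlurmQueue()
>>> s.partition = 'multiscale'
>>> s.ngpu = 1                        # one GPU per job
>>> s.ncpu = 4                        # four CPU cores per job
>>> s.memory = 4000                   # memory per job, in MiB
>>> s.walltime = 86400                # kill jobs after 24 hours (seconds)
>>> s.mailtype = 'END,FAIL'
>>> s.mailuser = 'user@example.com'   # placeholder address
>>> s.prerun = ['module load acemd']  # run on the node before each job
>>> s.submit(['/my/folder1/', '/my/folder2/'])  # each folder contains a run.sh
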
inprogress()

Returns the total number of running and queued workunits of the specific group in the engine.

Returns:

total – Total running and queued workunits

Return type:

int
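
As a usage sketch, inprogress() can drive a simple polling loop that blocks until all submitted jobs have left the queue (the 30-second interval is an arbitrary choice):

>>> import time
>>> while s.inprogress() > 0:  # jobs still running or queued
...     time.sleep(30)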

jobInfo()
property memory

Subclasses need to have this property. This property is expected to return an integer in MiB.

property ncpu

Subclasses need to have this property

property ngpu

Subclasses need to have this property

retrieve()

Subclasses need to implement this method
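
The memory, ncpu and ngpu properties and the retrieve() method are part of the SimQueue interface that concrete queues such as SlurmQueue implement. A minimal subclass sketch follows; the import path and the no-op retrieve() body are assumptions for illustration, and a real subclass may need further SimQueue members (e.g. submit(), inprogress(), stop()):

>>> from jobqueues.simqueue import SimQueue  # assumed import path
>>> class MyQueue(SimQueue):
...     def __init__(self):
...         super().__init__()
...         self._ncpu, self._ngpu, self._memory = 1, 0, 1000
...     @property
...     def ncpu(self):
...         return self._ncpu
...     @ncpu.setter
...     def ncpu(self, value):
...         self._ncpu = value
...     @property
...     def ngpu(self):
...         return self._ngpu
...     @ngpu.setter
...     def ngpu(self, value):
...         self._ngpu = value
...     @property
...     def memory(self):
...         return self._memory  # integer, in MiB
...     @memory.setter
...     def memory(self, value):
...         self._memory = value
...     def retrieve(self):
...         pass  # copy completed results back here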

stop()

Cancels all currently running and queued jobs
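
For instance, when aborting an interactive wait, stop() can clean up everything still in the queue (a sketch reusing the polling loop shown for inprogress()):

>>> import time
>>> try:
...     while s.inprogress() > 0:
...         time.sleep(30)
... except KeyboardInterrupt:
...     s.stop()  # cancel all running and queued jobs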

submit(dirs, commands=None, _dryrun=False, nvidia_mps=False)

Submits all directories

Parameters:
  • dirs (list) – A list of executable directories.

  • nvidia_mps (bool) – Whether to use Nvidia’s Multi-Process Service (MPS) to share GPU resources among all jobs in dirs.
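
As a sketch, several prepared directories can be submitted in one call; with nvidia_mps=True the jobs would share GPUs through Nvidia's Multi-Process Service (the paths are placeholders):

>>> dirs = ['/data/job1/', '/data/job2/', '/data/job3/']  # each contains a run.sh
>>> s.submit(dirs, nvidia_mps=True)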