jobqueues.slurmqueue module
- class jobqueues.slurmqueue.SlurmQueue(_configapp=None, _configfile=None, _findExecutables=True, _logger=True)
Bases: SimQueue
Queue system for SLURM
- Parameters:
jobname (str, default=None) – Job name (identifier)
partition (str or list of str, default=None) – The queue (partition) or list of queues to run on. If list, the one offering earliest initiation will be used.
priority (str, default=None) – Job priority
ngpu (int, default=1) – Number of GPUs to use for a single job
ncpu (int, default=1) – Number of CPUs to use for a single job
memory (int, default=1000) – Amount of memory per job (MiB)
gpumemory (int, default=None) – Only run on GPUs with at least this much memory. Needs special setup of SLURM. Check how to define gpu_mem on SLURM.
walltime (int, default=None) – Job timeout (s)
mailtype (str, default=None) – When to send emails. Separate options with commas like ‘END,FAIL’.
mailuser (str, default=None) – User email address.
outputstream (str, default='slurm.%N.%j.out') – Output stream.
errorstream (str, default='slurm.%N.%j.err') – Error stream.
datadir (str, default=None) – The path in which to store completed trajectories.
trajext (str, default='xtc') – Extension of trajectory files. This is needed to copy them to datadir.
nodelist (list, default=None) – A list of nodes on which to run every job at the same time! Careful! The jobs will be duplicated!
exclude (list, default=None) – A list of nodes on which not to run the jobs. Use this to restrict the nodes on which the jobs are allowed to run.
envvars (str, default='ACEMD_HOME,HTMD_LICENSE_FILE') – Envvars to propagate from submission node to the running node (comma-separated)
prerun (list, default=None) – Shell commands to execute on the running node before the job (e.g. loading modules)
Examples
>>> s = SlurmQueue()
>>> s.partition = 'multiscale'
>>> s.submit('/my/runnable/folder/')  # Folder containing a run.sh bash script
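The units in the parameter list are easy to mix up: memory is given in MiB and walltime in seconds. The following is a minimal sketch of a helper that converts more familiar units and sets the documented attributes by plain assignment, the same way the example above sets partition. The names configure_queue, MIB_PER_GIB and SECONDS_PER_HOUR are hypothetical and not part of jobqueues, the prerun command is only a placeholder, and a SimpleNamespace stands in for the queue so the snippet runs without SLURM installed.

```python
from types import SimpleNamespace

MIB_PER_GIB = 1024
SECONDS_PER_HOUR = 3600


def configure_queue(queue, *, partition, memory_gib, walltime_hours,
                    mail_events=(), prerun=()):
    """Hypothetical helper: set documented SlurmQueue attributes on `queue`."""
    queue.partition = partition
    queue.memory = int(memory_gib * MIB_PER_GIB)              # memory is in MiB
    queue.walltime = int(walltime_hours * SECONDS_PER_HOUR)   # walltime is in seconds
    if mail_events:
        queue.mailtype = ",".join(mail_events)                # e.g. 'END,FAIL'
    if prerun:
        queue.prerun = list(prerun)                           # shell commands run before the job
    return queue


# Stand-in object so the sketch runs anywhere. With jobqueues installed,
# a SlurmQueue() instance would be passed instead (assumption, untested here).
queue = configure_queue(
    SimpleNamespace(),
    partition="multiscale",
    memory_gib=4,
    walltime_hours=2,
    mail_events=("END", "FAIL"),
    prerun=["module load cuda"],  # placeholder command
)
print(queue.memory, queue.walltime, queue.mailtype)  # 4096 7200 END,FAIL
```

Keeping the conversions in one place avoids passing, say, hours where seconds are expected.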
- inprogress()
Returns the sum of the number of running and queued workunits of the specific group in the engine.
- Returns:
total – Total running and queued workunits
- Return type:
int
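Because inprogress() reports the combined count of running and queued workunits, it can drive a simple polling loop. The sketch below assumes only the inprogress() and retrieve() methods listed on this page. wait_for_queue and FakeQueue are hypothetical names introduced for illustration, and the fake queue simply pretends that one job finishes per poll so the snippet runs without SLURM.

```python
import time


def wait_for_queue(queue, poll_seconds=30.0, sleep=time.sleep):
    """Hypothetical helper: poll until nothing is running or queued, then retrieve."""
    polls = 0
    while queue.inprogress() > 0:
        polls += 1
        sleep(poll_seconds)
    queue.retrieve()
    return polls


class FakeQueue:
    """Stand-in queue: each call to inprogress() lets one job finish."""

    def __init__(self, jobs):
        self.jobs = jobs
        self.retrieved = False

    def inprogress(self):
        remaining = self.jobs
        self.jobs = max(0, self.jobs - 1)
        return remaining

    def retrieve(self):
        self.retrieved = True


fake = FakeQueue(jobs=3)
# A no-op sleep keeps the demonstration instantaneous.
polls = wait_for_queue(fake, poll_seconds=0, sleep=lambda seconds: None)
print(polls, fake.retrieved)  # 3 True
```

On a real cluster a poll interval of tens of seconds is usually kinder to the scheduler than a tight loop.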
- jobInfo()
- property memory
Subclasses need to have this property. It is expected to return an integer in MiB.
- property ncpu
Subclasses need to have this property
- property ngpu
Subclasses need to have this property
- retrieve()
Subclasses need to implement this method
- stop()
Cancels all currently running and queued jobs
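One way stop() might be used is as a cleanup step when a waiting script is interrupted, so that jobs are not left behind on the cluster. The sketch below is an assumption about usage, not a documented pattern: submit_guarded, RecordingQueue and interrupted_wait are hypothetical names, and the recording stand-in only logs calls instead of talking to SLURM.

```python
class RecordingQueue:
    """Stand-in queue that records calls instead of contacting SLURM."""

    def __init__(self):
        self.calls = []

    def submit(self, folder):
        self.calls.append(("submit", folder))

    def stop(self):
        self.calls.append(("stop",))


def submit_guarded(queue, folder, wait):
    """Hypothetical helper: submit, wait, and cancel the jobs if the wait is interrupted."""
    queue.submit(folder)
    try:
        wait(queue)
    except KeyboardInterrupt:
        queue.stop()  # cancels all currently running and queued jobs
        raise


def interrupted_wait(queue):
    # Simulates the user pressing Ctrl-C while waiting.
    raise KeyboardInterrupt


recorder = RecordingQueue()
try:
    submit_guarded(recorder, "/my/runnable/folder/", interrupted_wait)
except KeyboardInterrupt:
    pass
print(recorder.calls)
```

Since stop() cancels every running and queued job, a guard like this suits scripts that own all the jobs on the queue object.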