jobqueues.slurmqueue module#
- class jobqueues.slurmqueue.SlurmQueue(_configapp=None, _configfile=None, _findExecutables=True, _logger=True)#
Bases:
SimQueue
Queue system for SLURM
- Parameters:
jobname (str, default=None) – Job name (identifier)
partition (str or list of str, default=None) – The queue (partition) or list of queues to run on. If list, the one offering earliest initiation will be used.
priority (str, default=None) – Job priority
ngpu (int, default=1) – Number of GPUs to use for a single job
ncpu (int, default=1) – Number of CPUs to use for a single job
memory (int, default=1000) – Amount of memory per job (MiB)
gpumemory (int, default=None) – Only run on GPUs with at least this much memory. Requires a special SLURM setup; check how to define gpu_mem in SLURM.
walltime (int, default=None) – Job timeout (s)
mailtype (str, default=None) – When to send emails. Separate options with commas like ‘END,FAIL’.
mailuser (str, default=None) – User email address.
outputstream (str, default='slurm.%N.%j.out') – Output stream.
errorstream (str, default='slurm.%N.%j.err') – Error stream.
datadir (str, default=None) – The path in which to store completed trajectories.
trajext (str, default='xtc') – Extension of trajectory files. This is needed to copy them to datadir.
nodelist (list, default=None) – A list of nodes on which to run every job simultaneously. Careful: each job will be duplicated across all listed nodes!
exclude (list, default=None) – A list of nodes on which not to run the jobs. Use this to restrict the nodes on which the jobs are allowed to run.
envvars (str, default='ACEMD_HOME,HTMD_LICENSE_FILE') – Envvars to propagate from submission node to the running node (comma-separated)
prerun (list, default=None) – Shell commands to execute on the running node before the job (e.g. loading modules)
Examples
>>> s = SlurmQueue()
>>> s.partition = 'multiscale'
>>> s.submit('/my/runnable/folder/')  # Folder containing a run.sh bash script
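For orientation, the parameters above correspond to standard SLURM batch directives. The header below is a sketch of what a submission script using these options could look like; the directive names are standard sbatch options, but the exact script SlurmQueue generates may differ:

```shell
#!/bin/bash
#SBATCH --job-name=myjob              # jobname
#SBATCH --partition=multiscale        # partition
#SBATCH --gres=gpu:1                  # ngpu
#SBATCH --cpus-per-task=1             # ncpu
#SBATCH --mem=1000                    # memory (MiB)
#SBATCH --output=slurm.%N.%j.out      # outputstream
#SBATCH --error=slurm.%N.%j.err       # errorstream
#SBATCH --mail-type=END,FAIL          # mailtype
#SBATCH --mail-user=user@example.com  # mailuser
```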
- inprogress()#
Returns the total number of running and queued workunits of the specific group in the engine.
- Returns:
total – Total running and queued workunits
- Return type:
int
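The inprogress() count is typically polled to wait until all submitted jobs have finished. The sketch below shows that pattern; to keep it runnable without a SLURM cluster, a trivial stub class stands in for a configured SlurmQueue (with the real library you would simply use a SlurmQueue instance):

```python
import time

class _StubQueue:
    """Stand-in for a configured SlurmQueue, so this sketch runs anywhere."""
    def __init__(self):
        self._remaining = 3  # pretend three workunits are running/queued
    def inprogress(self):
        # A real SlurmQueue queries SLURM for running + queued workunits.
        n = self._remaining
        self._remaining = max(0, self._remaining - 1)
        return n
    def retrieve(self):
        # A real SlurmQueue copies back completed trajectories here.
        pass

queue = _StubQueue()  # with the real library: queue = SlurmQueue()
while queue.inprogress() != 0:
    queue.retrieve()  # collect any results that have completed so far
    time.sleep(0)     # in practice, sleep for e.g. 60 seconds between polls
print("all jobs finished")
```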
- jobInfo()#
- property memory#
Subclasses need to have this property. This property is expected to return an integer in MiB
- property ncpu#
Subclasses need to have this property
- property ngpu#
Subclasses need to have this property
- retrieve()#
Subclasses need to implement this method
- stop()#
Cancels all currently running and queued jobs
- submit(dirs, commands=None, runscripts=None, _dryrun=False, nvidia_mps=False)#
Submits all directories
- Parameters:
dirs (list) – A list of executable directories. By default it will search for the run.sh script in each directory. You can override the script name by setting the runscripts parameter.
commands (list) – A list of commands to run in each directory. If not provided, the run.sh script will be executed. The length of commands must be the same as the length of dirs.
runscripts (list) – A list of run scripts to run in each directory. If not provided, the run.sh script will be detected and executed. This can be used if each folder contains a differently named run script. The length of runscripts must be the same as the length of dirs.
nvidia_mps (bool) – Whether to use Nvidia’s Multi-Process Service (MPS) to share GPU resources among all jobs in dirs.
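The list arguments must align element-wise with dirs. Below is a sketch of preparing per-directory run scripts for submit(); the directory and script names are illustrative, and the final call is shown commented out since it requires a configured queue and a SLURM cluster:

```python
# Each directory gets its own run script; the lists must have equal length.
dirs = ['/jobs/sim1', '/jobs/sim2', '/jobs/sim3']
runscripts = ['/jobs/sim1/run_a.sh',
              '/jobs/sim2/run_b.sh',
              '/jobs/sim3/run_c.sh']

assert len(runscripts) == len(dirs), "runscripts must match dirs in length"

# With a configured SlurmQueue this would submit one job per directory:
# queue.submit(dirs, runscripts=runscripts)
```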