playmolecule.datacenter module

class playmolecule.datacenter.DataCenter(session)

Bases: object

Class which manages all the datasets in the PlayMolecule backend

Parameters

session (Session object) – A Session object

download_dataset(datasetid, path, tmpdir=None, attempts=1, _logger=True)

Download a dataset from the backend

Parameters
  • datasetid (int) – The ID of the dataset we want to download

  • path (str) – The location to which to download the dataset

  • tmpdir (str) – A location to store temporary data. If set to None, the default is /tmp/

  • attempts (int) – Number of times to attempt downloading the dataset. Can help with unstable connections

  • _logger (bool) – Set to False to reduce the verbosity

Examples

>>> dc.download_dataset(182, "./dataset_182/")
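
When several datasets are needed, the call above can be wrapped in a small loop. The following is a minimal sketch, assuming `dc` is an already constructed DataCenter. The helper name `download_many` and the `dataset_<id>` directory layout are illustrative choices, not part of PlayMolecule.

```python
import os


def download_many(dc, dataset_ids, root, attempts=3):
    """Download each dataset into its own directory under `root`.

    `dc` is expected to expose download_dataset(datasetid, path, attempts=...)
    as documented above. The helper name and the directory naming scheme
    are assumptions made for this sketch.
    """
    targets = {}
    for datasetid in dataset_ids:
        path = os.path.join(root, f"dataset_{datasetid}")
        # Allow a few attempts per dataset to cope with unstable connections.
        dc.download_dataset(datasetid, path, attempts=attempts, _logger=False)
        targets[datasetid] = path
    return targets
```

The returned dictionary maps each dataset ID to the directory that was passed to download_dataset for it.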

get_dataset_tags(datasetid, _logger=True)

Returns tags associated with a dataset

Parameters
  • datasetid (int) – The ID of the dataset for which to return tags

  • _logger (bool) – Set to False to reduce verbosity

get_datasets(public=None, datasetid=None, remotepath=None, useronly=False, startswith=None, tags=None, taggedonly=False, completedonly=False, group=None, filelist=False, execid=None, returnobj=False, _logger=True)

Get a list of datasets filtered by various arguments

Parameters
  • public (bool) – If set to True it will only return public datasets. If set to False it will only return private datasets. If set to None this parameter will be ignored.

  • datasetid (int) – The ID of a specific dataset for which we want to retrieve information

  • remotepath (str) – The remote (virtual) path at which the dataset is located

  • useronly (bool) – Returns only datasets of the currently logged in user of the Session

  • startswith (str) – Returns any datasets whose remote (virtual) path starts with the specific string

  • tags (list) – Returns only datasets which have the specified tags

  • taggedonly (bool) – If set to True it will only return datasets which have tags

  • completedonly (bool) – If set to True it will only return datasets whose jobs have completed successfully

  • group (str) – Only get datasets related to a job group

  • filelist (bool) – Set to True to return the files of each dataset as well

  • execid (str) – Search for datasets produced by this execution ID

  • returnobj (bool) – Set to True to return Dataset objects instead of dictionaries

  • _logger (bool) – Set to False to reduce verbosity

Returns

datasetlist – A list of datasets retrieved with the above filters

Return type

list

Examples

>>> datasets = dc.get_datasets(remotepath="KdeepTrainer/models/PDBBind2019")
>>> datasets = dc.get_datasets(tags=["app:kdeep"])
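
Several filters can be passed in one call, and with returnobj=True the results are Dataset objects that can be downloaded directly. The sketch below assumes that the filters combine to narrow the result, which the parameter descriptions suggest but do not state. The helper name `download_matching` and the `match_<n>` directory names are hypothetical.

```python
import os


def download_matching(dc, prefix, root, tags=None):
    """Download every completed dataset whose remote path starts with `prefix`.

    Assumes `dc` is a DataCenter and that returnobj=True yields objects
    with a download(path) method, as documented for Dataset.
    """
    datasets = dc.get_datasets(
        startswith=prefix,
        tags=tags,
        completedonly=True,  # only datasets whose jobs completed successfully
        returnobj=True,      # Dataset objects instead of dictionaries
        _logger=False,
    )
    paths = []
    for index, ds in enumerate(datasets):
        path = os.path.join(root, f"match_{index}")
        ds.download(path)
        paths.append(path)
    return paths
```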

remove_dataset(datasetid, _logger=True)

Removes (deletes) a dataset from the backend

Parameters
  • datasetid (int) – The ID of the dataset we want to delete

  • _logger (bool) – Set to False to reduce verbosity

remove_dataset_tag(datasetid, tag, _logger=True)

Removes a tag attached to a dataset

Parameters
  • datasetid (int) – The ID of the dataset

  • tag (str) – The tag we want to remove from the dataset

  • _logger (bool) – Set to False to reduce verbosity
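
get_dataset_tags and remove_dataset_tag can be combined to drop several tags while skipping those the dataset does not carry. This is a minimal sketch under one stated assumption: that get_dataset_tags returns an iterable of tag strings, which the documentation above does not confirm. The helper name `strip_tags` is hypothetical.

```python
def strip_tags(dc, datasetid, unwanted):
    """Remove every tag in `unwanted` that the dataset currently carries.

    Assumption: dc.get_dataset_tags returns an iterable of tag strings.
    Returns the list of tags that were actually removed.
    """
    current = set(dc.get_dataset_tags(datasetid, _logger=False))
    removed = []
    for tag in unwanted:
        if tag in current:
            dc.remove_dataset_tag(datasetid, tag, _logger=False)
            removed.append(tag)
    return removed
```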

upload_dataset(localpath, remotepath, comments='', public=False, execid='', overwrite=False, tags=None, tmpdir=None, attempts=1, _logger=True)

Uploads a dataset to the backend data center

Parameters
  • localpath (str) – The location of the file we want to upload

  • remotepath (str) – The remote (virtual) location to which the file should be uploaded

  • comments (str) – Comments to attach to the specific dataset

  • public (bool) – Set to True to make the dataset public (available to all users)

  • execid (str) – Optionally, relate this dataset to a specific job execution by passing its job ID

  • overwrite (bool) – Set to True to overwrite existing datasets at the specified remote (virtual) location

  • tags (list of str) – A list of tags to attach to the specific dataset

  • tmpdir (str) – Location to use for creating the upload archive file. The file will be deleted after uploading. If set to None it will use /tmp/

  • attempts (int) – Number of times to attempt uploading the file. Can help with unstable connections

  • _logger (bool) – Set to False to reduce the verbosity
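
A small wrapper can check the local path before any upload is attempted. The sketch below assumes `dc` is a DataCenter. The helper name `upload_checked` and the pre-flight check are illustrative additions, and the return value of upload_dataset is passed through unchanged because the documentation above does not describe it.

```python
import os


def upload_checked(dc, localpath, remotepath, tags=None, public=False, attempts=3):
    """Upload `localpath` only if it exists locally, without overwriting.

    `dc` is expected to expose upload_dataset as documented above. The
    helper name and the existence check are assumptions of this sketch.
    """
    if not os.path.exists(localpath):
        raise FileNotFoundError(f"nothing to upload at {localpath!r}")
    return dc.upload_dataset(
        localpath,
        remotepath,
        public=public,
        overwrite=False,  # do not replace a dataset already at remotepath
        tags=list(tags) if tags is not None else None,
        attempts=attempts,
    )
```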

exception playmolecule.datacenter.DataCenterError

Bases: Exception

class playmolecule.datacenter.Dataset(datasetid, files=(), _session=None, _props=None)

Bases: object

download(path, tmpdir=None, attempts=1, _logger=True)

Download the dataset from the backend

Parameters
  • path (str) – The location to which to download the dataset

  • tmpdir (str) – A location to store temporary data. If set to None, the default is /tmp/

  • attempts (int) – Number of times to attempt downloading the dataset. Can help with unstable connections

  • _logger (bool) – Set to False to reduce the verbosity

Examples

>>> ds.download("./dataset_data/")

property identifier

list_files()

move(remotepath)

remove(_logger=True)

remove_tags()

subset(key)

Gets a subset of the dataset

Parameters

key (str or list or regex) – Select files of the dataset either as a single string, a list of strings or a regular expression

Examples

>>> ds2 = ds.subset(["mol6.xtc", "mol6.pdb"])
>>> ds2 = ds.subset(re.compile(r".*\.png"))

Or alternatively, using indexing:

>>> ds2 = ds[["mol6.xtc", "mol6.pdb"]]
>>> ds2 = ds[re.compile(r".*\.png")]
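
To make the three key types concrete, the following stand-in applies the same kind of selection to a plain list of file names. It is only a sketch: it assumes a regular expression key is applied with `match` (anchored at the start of the name), which may differ from what Dataset.subset does internally. The function `select_files` is hypothetical.

```python
import re


def select_files(files, key):
    """Select names from `files` by a single string, a list, or a regex.

    Assumption: regex keys are tested with re.Pattern.match. This mimics
    the documented key types and is not the PlayMolecule implementation.
    """
    if isinstance(key, str):
        return [name for name in files if name == key]
    if isinstance(key, re.Pattern):
        return [name for name in files if key.match(name)]
    wanted = set(key)
    return [name for name in files if name in wanted]
```

Under this assumption, with files `["mol6.xtc", "mol6.pdb", "plot.png", "notes_png"]`, the pattern `re.compile(r".*\.png")` selects only `plot.png`, while a pattern with an unescaped dot such as `re.compile(".*.png")` also selects `notes_png`, because an unescaped dot matches any character.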

update()

update_comments(comments)

playmolecule.datacenter.download_dataset(session, datasetid, path, files=None, tmpdir=None, attempts=1, _logger=True)

Download a dataset from the backend

Parameters
  • session (Session) – A Session object

  • datasetid (int) – The ID of the dataset we want to download

  • path (str) – The location to which to download the dataset

  • files (list) – A subset list of files inside of the dataset to retrieve

  • tmpdir (str) – A location to store temporary data. If set to None, the default is /tmp/

  • attempts (int) – Number of times to attempt downloading the dataset. Can help with unstable connections

  • _logger (bool) – Set to False to reduce the verbosity

Examples

>>> download_dataset(session, 182, "./dataset_182/")
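
The `files` parameter restricts the download to part of a dataset. A minimal sketch of a wrapper around the module-level function follows. It assumes `session` is a valid Session and that `files` holds file names as they appear inside the dataset; the exact naming convention is not specified above. The wrapper name `download_files` is hypothetical.

```python
def download_files(session, datasetid, path, files, attempts=3):
    """Download only the named files of a dataset.

    The import is done lazily so that this sketch can be defined even
    where PlayMolecule is not installed. The wrapper itself is an
    illustration, not part of the library.
    """
    from playmolecule.datacenter import download_dataset

    files = list(files)
    if not files:
        raise ValueError("files must name at least one file in the dataset")
    return download_dataset(session, datasetid, path, files=files, attempts=attempts)
```

For example, `download_files(session, 182, "./dataset_182/", ["mol6.xtc", "mol6.pdb"])` would request just those two files, provided they exist in dataset 182.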

playmolecule.datacenter.get_datasets(session, public=None, datasetid=None, remotepath=None, useronly=False, startswith=None, tags=None, taggedonly=False, completedonly=False, group=None, filelist=False, execid=None, returnobj=False)

Get a list of datasets filtered by various arguments

Parameters
  • session (Session) – A Session object

  • public (bool) – If set to True it will only return public datasets. If set to False it will only return private datasets. If set to None this parameter will be ignored.

  • datasetid (int) – The ID of a specific dataset for which we want to retrieve information

  • remotepath (str) – The remote (virtual) path at which the dataset is located

  • useronly (bool) – Returns only datasets of the currently logged in user of the Session

  • startswith (str) – Returns any datasets whose remote (virtual) path starts with the specific string

  • tags (list) – Returns only datasets which have the specified tags

  • taggedonly (bool) – If set to True it will only return datasets which have tags

  • completedonly (bool) – If set to True it will only return datasets whose jobs have completed successfully

  • group (str) – Only get datasets related to a job group

  • filelist (bool) – Set to True to return the files of each dataset as well

  • execid (str) – Search for datasets produced by this execution ID

  • returnobj (bool) – Set to True to return Dataset objects instead of dictionaries

Returns

datasetlist – A list of datasets retrieved with the above filters

Return type

list

Examples

>>> datasets = get_datasets(session, remotepath="KdeepTrainer/models/PDBBind2019")
>>> datasets = get_datasets(session, tags=["app:kdeep"])

playmolecule.datacenter.move_dataset(session, datasetid, remotepath, _logger=True)

Changes the remote location of a dataset

Parameters
  • session (Session) – A Session object

  • datasetid (int) – The ID of the dataset we want to move

  • remotepath (str) – The new remote location of the dataset

  • _logger (bool) – Set to False to reduce verbosity
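
When a dataset is relocated under a different top-level folder, the new remote path can be derived from the old one. The helper below is a sketch that assumes remote (virtual) paths are "/"-separated, as in the examples in this document. `rebase_remote_path` is a hypothetical name and not part of PlayMolecule.

```python
import posixpath


def rebase_remote_path(remotepath, old_prefix, new_prefix):
    """Replace a leading remote path prefix with a new one.

    Assumption: remote paths use "/" as separator. Raises ValueError if
    `remotepath` is not located under `old_prefix`.
    """
    path = remotepath.strip("/")
    old = old_prefix.strip("/")
    new = new_prefix.strip("/")
    if path != old and not path.startswith(old + "/"):
        raise ValueError(f"{remotepath!r} is not located under {old_prefix!r}")
    # Keep whatever follows the old prefix and attach it to the new one.
    rest = path[len(old):].lstrip("/")
    return posixpath.join(new, rest) if rest else new
```

The result can then be passed as the remotepath argument of move_dataset. The target folder "archive/kdeep" is made up for illustration:

>>> newpath = rebase_remote_path("KdeepTrainer/models/PDBBind2019", "KdeepTrainer/models", "archive/kdeep")
>>> move_dataset(session, 182, newpath)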

playmolecule.datacenter.remove_dataset(session, datasetid, _logger=True)

Removes (deletes) a dataset from the backend

Parameters
  • session (Session) – A Session object

  • datasetid (int) – The ID of the dataset we want to delete

  • _logger (bool) – Set to False to reduce verbosity

playmolecule.datacenter.update_dataset_comments(session, datasetid, comments, _logger=True)

Changes the comments of a dataset

Parameters
  • session (Session) – A Session object

  • datasetid (int) – The ID of the dataset whose comments we want to update

  • comments (str) – The new comments of the dataset

  • _logger (bool) – Set to False to reduce verbosity