playmolecule.datacenter module

class playmolecule.datacenter.DataCenter(session)

Bases: object

Class which manages all the datasets in the PlayMolecule backend

Parameters

session (Session object) – A Session object

download_dataset(datasetid, path, tmpdir=None, attempts=1, _logger=True)

Download a dataset from the backend

Parameters
  • datasetid (int) – The ID of the dataset we want to download

  • path (str) – The location to which to download the dataset

  • tmpdir (str) – A location to store temporary data. If set to None, the default is /tmp/

  • attempts (int) – Number of times to attempt downloading the dataset. Can help with unstable connections

  • _logger (bool) – Set to False to reduce verbosity

Examples

>>> dc.download_dataset(182, "./dataset_182/")
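When several datasets are needed, the call above can be wrapped in a small loop. The following is a minimal sketch, not part of the library: the helper name download_many and the dataset_<id> directory layout are hypothetical, and dc is assumed to be any object exposing download_dataset() with the signature documented above.

```python
import os


def download_many(dc, dataset_ids, root, attempts=3):
    """Download each dataset into its own subdirectory of `root`.

    Hypothetical helper: `dc` is assumed to expose download_dataset()
    as documented above. Returns a mapping of dataset ID to target path.
    """
    targets = {}
    for dsid in dataset_ids:
        # One directory per dataset so that files do not collide.
        target = os.path.join(root, f"dataset_{dsid}")
        # Retry a few times and silence per-call logging.
        dc.download_dataset(dsid, target, attempts=attempts, _logger=False)
        targets[dsid] = target
    return targets
```

With a real DataCenter instance this would be called as download_many(dc, [182, 183], "./data"). Whether the target directories are created by the backend call is not stated in the documentation above, so the sketch does not rely on it.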

get_dataset_tags(datasetid, _logger=True)

Returns tags associated with a dataset

Parameters
  • datasetid (int) – The ID of the dataset for which to return tags

  • _logger (bool) – Set to False to reduce verbosity

get_datasets(public=None, datasetid=None, remotepath=None, useronly=False, startswith=None, tags=None, taggedonly=False, completedonly=False, group=None, _logger=True)

Get a list of datasets filtered by various arguments

Parameters
  • public (bool) – If set to True it will only return public datasets. If set to False it will only return private datasets. If set to None this parameter will be ignored.

  • datasetid (int) – The ID of a specific dataset for which we want to retrieve information

  • remotepath (str) – The remote (virtual) path at which the dataset is located

  • useronly (bool) – Returns only datasets of the currently logged in user of the Session

  • startswith (str) – Returns any datasets whose remote (virtual) path starts with the specific string

  • tags (list) – Returns only datasets which have the specified tags

  • taggedonly (bool) – If set to True it will only return datasets which have tags

  • completedonly (bool) – If set to True it will only return datasets whose jobs have completed successfully

  • group (str) – Only get datasets related to a job group

  • _logger (bool) – Set to False to reduce verbosity

Returns

datasetlist – A list of datasets retrieved with the above filters

Return type

list

Examples

>>> datasets = dc.get_datasets(remotepath="KdeepTrainer/models/PDBBind2019")
>>> datasets = dc.get_datasets(tags=["app:kdeep"])
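Several filters can be passed in a single call. The sketch below wraps one such combination; the helper name find_completed is hypothetical, and it assumes that the filters narrow the result jointly, which the documentation above does not state explicitly.

```python
def find_completed(dc, prefix, tags=None):
    """Query the current user's completed datasets under a remote prefix.

    Hypothetical helper: `dc` is assumed to expose get_datasets() as
    documented above. The assumption that filters are combined (rather
    than applied alternatively) is not confirmed by the documentation.
    """
    return dc.get_datasets(
        startswith=prefix,   # remote (virtual) path prefix
        tags=tags,           # optional list of tags, e.g. ["app:kdeep"]
        useronly=True,       # only datasets of the logged-in user
        completedonly=True,  # only datasets whose jobs completed successfully
        _logger=False,
    )
```

For example, find_completed(dc, "KdeepTrainer/models/", tags=["app:kdeep"]) would forward those filters in one request.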

remove_dataset(datasetid, _logger=True)

Removes (deletes) a dataset from the backend

Parameters
  • datasetid (int) – The ID of the dataset we want to delete

  • _logger (bool) – Set to False to reduce verbosity

remove_dataset_tag(datasetid, tag, _logger=True)

Removes a tag attached to a dataset

Parameters
  • datasetid (int) – The ID of the dataset

  • tag (str) – The tag we want to remove from the dataset

  • _logger (bool) – Set to False to reduce verbosity
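get_dataset_tags and remove_dataset_tag can be combined to remove a tag only when it is actually attached. This is a minimal sketch under two assumptions: the helper name drop_tag_if_present is hypothetical, and the value returned by get_dataset_tags is assumed to be a collection of strings that supports membership tests (its exact type is not documented above).

```python
def drop_tag_if_present(dc, datasetid, tag):
    """Remove `tag` from a dataset only if the dataset currently has it.

    Hypothetical helper: `dc` is assumed to expose get_dataset_tags()
    and remove_dataset_tag() as documented above. Returns True if a
    removal was requested, False otherwise.
    """
    # Assumption: the returned tags support the `in` operator with strings.
    tags = dc.get_dataset_tags(datasetid, _logger=False)
    if tag in tags:
        dc.remove_dataset_tag(datasetid, tag, _logger=False)
        return True
    return False
```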

upload_dataset(localpath, remotepath, comments='', public=False, execid='', overwrite=False, tags=None, tmpdir=None, attempts=1, _logger=True)

Uploads a dataset to the backend data center

Parameters
  • localpath (str) – The location of the file we want to upload

  • remotepath (str) – The remote (virtual) location to which the file should be uploaded

  • comments (str) – Comments to attach to the specific dataset

  • public (bool) – Set to True to make the dataset public (available to all users)

  • execid (str) – Optionally, relate this dataset to a specific job execution by passing its job ID

  • overwrite (bool) – Set to True to overwrite existing datasets at the specified remote (virtual) location

  • tags (list of str) – A list of tags to attach to the specific dataset

  • tmpdir (str) – Location to use for creating the upload archive file. The file will be deleted after uploading. If set to None it will use /tmp/

  • attempts (int) – Number of times to attempt uploading the file. Can help with unstable connections

  • _logger (bool) – Set to False to reduce verbosity
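The section above gives no usage example for upload_dataset, so the following sketch shows how the documented parameters might be combined for a private, tagged upload. The helper name upload_tagged and the default of three attempts are hypothetical; dc is assumed to be any object exposing upload_dataset() with the signature documented above, and the return value of upload_dataset is not documented, so it is ignored here.

```python
def upload_tagged(dc, localpath, remotepath, tags, comments="", attempts=3):
    """Upload a private dataset with tags, refusing to overwrite.

    Hypothetical helper: `dc` is assumed to expose upload_dataset()
    as documented above.
    """
    dc.upload_dataset(
        localpath,
        remotepath,
        comments=comments,
        public=False,       # keep the dataset private
        overwrite=False,    # do not replace an existing dataset at remotepath
        tags=list(tags),    # documented as a list of str
        attempts=attempts,  # retry on unstable connections
        _logger=False,
    )
```

A call such as upload_tagged(dc, "./model/", "KdeepTrainer/models/PDBBind2019", ["app:kdeep"]) would then forward those arguments unchanged.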

exception playmolecule.datacenter.DataCenterError

Bases: Exception

class playmolecule.datacenter.Dataset(datasetid, files=(), _session=None, _props=None)

Bases: object

download(path, tmpdir=None, attempts=1, _logger=True)

Download the dataset from the backend

Parameters
  • path (str) – The location to which to download the dataset

  • tmpdir (str) – A location to store temporary data. If set to None, the default is /tmp/

  • attempts (int) – Number of times to attempt downloading the dataset. Can help with unstable connections

  • _logger (bool) – Set to False to reduce verbosity

Examples

>>> ds.download("./dataset_data/")

property identifier

list_files(_logger=True, attempts=5)

remove(_logger=True)

remove_tags()