playmolecule.datacenter module
- class playmolecule.datacenter.DataCenter(session)
Bases: object
Class which manages all the datasets in the PlayMolecule backend
- Parameters:
session (Session object) – A Session object
- download_dataset(datasetid, path, tmpdir=None, attempts=1, _logger=True)
Download a dataset from the backend
- Parameters:
datasetid (int) – The ID of the dataset we want to download
path (str) – The location to which to download the dataset
tmpdir (str) – A location to store temporary data. If set to None, the default is /tmp/
attempts (int) – Number of times to attempt downloading the dataset. Can help with unstable connections
_logger (bool) – Set to False to reduce the verbosity
Examples
>>> dc.download_dataset(182, "./dataset_182/")
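The `attempts` parameter describes retrying a transfer over an unstable connection. The sketch below shows that retry pattern in plain Python, assuming a failed attempt surfaces as a `ConnectionError` or `TimeoutError`. `with_retries` and `flaky_download` are hypothetical names, not part of playmolecule, and the library's actual retry logic may differ.

```python
import time


def with_retries(func, attempts=1, delay=0.0):
    """Call func() up to `attempts` times; re-raise the last error if all fail.

    Hypothetical helper illustrating the `attempts` semantics only.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_exc = None
    for i in range(attempts):
        try:
            return func()
        except (ConnectionError, TimeoutError) as exc:
            last_exc = exc
            if i < attempts - 1:
                time.sleep(delay)  # optional pause before the next attempt
    raise last_exc


# A fake transfer that fails twice, then succeeds on the third call
calls = {"n": 0}


def flaky_download():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection reset")
    return "./dataset_182/"


result = with_retries(flaky_download, attempts=3)
print(result)  # ./dataset_182/
```

With the default `attempts=1`, the first connection error is raised immediately.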
- get_dataset_tags(datasetid, _logger=True)
Returns tags associated with a dataset
- get_datasets(public=None, datasetid=None, remotepath=None, useronly=False, startswith=None, tags=None, taggedonly=False, completedonly=False, group=None, filelist=False, execid=None, returnobj=False, _logger=True)
Get a list of datasets filtered by various arguments
- Parameters:
public (bool) – If set to True it will only return public datasets. If set to False it will only return private datasets. If set to None this parameter will be ignored.
datasetid (int) – The ID of a specific dataset for which we want to retrieve information
remotepath (str) – The remote (virtual) path at which the dataset is located
useronly (bool) – Returns only datasets of the currently logged-in user of the Session
startswith (str) – Returns any datasets whose remote (virtual) path starts with the specific string
tags (list) – Returns only datasets which have the specified tags
taggedonly (bool) – If set to True it will only return datasets which have tags
completedonly (bool) – If set to True it will only return datasets whose jobs have completed successfully
group (str) – Only get datasets related to a job group
filelist (bool) – Set to True to return the files of each dataset as well
execid (str) – Search for datasets produced by this execution ID
returnobj (bool) – Set to True to return Dataset objects instead of dictionaries
_logger (bool) – Set to False to reduce verbosity
- Returns:
datasetlist – A list of datasets retrieved with the above filters
- Return type:
list
Examples
>>> datasets = dc.get_datasets(remotepath="KdeepTrainer/models/PDBBind2019")
>>> datasets = dc.get_datasets(tags=["app:kdeep"])
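As a rough illustration of how several of these filters combine, the following sketch applies `public`, `startswith`, `tags` and `taggedonly` to a list of plain dictionaries on the client side. The dictionary keys and the helper `filter_datasets` are hypothetical, and `tags` is assumed to mean "has all of the listed tags"; the real filtering happens in the backend and may behave differently.

```python
def filter_datasets(datasets, public=None, startswith=None, tags=None, taggedonly=False):
    """Client-side sketch of a few get_datasets filters.

    Assumes each dataset is a dict with hypothetical keys
    "public" (bool), "remotepath" (str) and "tags" (list of str).
    """
    result = []
    for ds in datasets:
        # public is tri-state: True/False filters, None ignores the flag
        if public is not None and ds["public"] != public:
            continue
        if startswith is not None and not ds["remotepath"].startswith(startswith):
            continue
        # assumption: every requested tag must be present on the dataset
        if tags and not set(tags).issubset(ds["tags"]):
            continue
        if taggedonly and not ds["tags"]:
            continue
        result.append(ds)
    return result


datasets = [
    {"id": 1, "public": True, "remotepath": "KdeepTrainer/models/PDBBind2019", "tags": ["app:kdeep"]},
    {"id": 2, "public": False, "remotepath": "KdeepTrainer/runs/test", "tags": []},
    {"id": 3, "public": False, "remotepath": "Other/data", "tags": ["app:kdeep", "type:model"]},
]
print([d["id"] for d in filter_datasets(datasets, startswith="KdeepTrainer")])  # [1, 2]
print([d["id"] for d in filter_datasets(datasets, tags=["app:kdeep"])])  # [1, 3]
print([d["id"] for d in filter_datasets(datasets, public=False, taggedonly=True)])  # [3]
```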
- remove_dataset(datasetid, _logger=True)
Removes (deletes) a dataset from the backend
- remove_dataset_tag(datasetid, tag, _logger=True)
Removes a tag attached to a dataset
- upload_dataset(localpath, remotepath, comments='', public=False, execid='', overwrite=False, tags=None, tmpdir=None, attempts=1, _logger=True)
Uploads a dataset to the backend data center
- Parameters:
localpath (str) – The location of the file we want to upload
remotepath (str) – The remote (virtual) location to which the file should be uploaded
comments (str) – Comments to attach to the specific dataset
public (bool) – Set to True to make the dataset public (available to all users)
execid (str) – Optionally relate this dataset to a specific job execution by passing its execution ID
overwrite (bool) – Set to True to overwrite existing datasets at the specified remote (virtual) location
tags (list of str) – A list of tags to attach to the specific dataset
tmpdir (str) – Location to use for creating the upload archive file. The file will be deleted after uploading. If set to None it will use /tmp/
attempts (int) – Number of times to attempt uploading the file. Can help with unstable connections
_logger (bool) – Set to False to reduce the verbosity
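The `tmpdir` description indicates that the local data is packed into a temporary archive before upload and deleted afterwards. A minimal sketch of that pattern with the standard library follows; the gzipped tar format, the helper `upload_via_archive` and the `send` callback (standing in for the network transfer) are assumptions, not playmolecule's implementation.

```python
import os
import shutil
import tempfile


def upload_via_archive(localpath, send, tmpdir=None):
    """Pack `localpath` into a temporary .tar.gz, hand it to `send`, then delete it.

    `send` is a hypothetical callback standing in for the network upload.
    With tmpdir=None, tempfile falls back to the system temp directory
    (usually /tmp/ on Linux).
    """
    workdir = tempfile.mkdtemp(dir=tmpdir)
    try:
        base = os.path.join(workdir, "upload")
        if os.path.isdir(localpath):
            # archive the directory contents
            archive = shutil.make_archive(base, "gztar", root_dir=localpath)
        else:
            # archive a single file under its own name
            archive = shutil.make_archive(
                base,
                "gztar",
                root_dir=os.path.dirname(os.path.abspath(localpath)),
                base_dir=os.path.basename(localpath),
            )
        return send(archive)
    finally:
        # the archive is removed whether or not the upload succeeded
        shutil.rmtree(workdir, ignore_errors=True)
```

Cleaning up in `finally` keeps temporary archives from piling up in `tmpdir` when an upload attempt fails.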
- class playmolecule.datacenter.Dataset(datasetid, files=(), _session=None, _props=None)
Bases: object
- download(path, tmpdir=None, attempts=1, _logger=True)
Download the dataset from the backend
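- Parameters:
path (str) – The location to which to download the dataset
tmpdir (str) – A location to store temporary data. If set to None, the default is /tmp/
attempts (int) – Number of times to attempt downloading the dataset. Can help with unstable connections
_logger (bool) – Set to False to reduce the verbosity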
Examples
>>> ds.download("./dataset_data/")
- property identifier
- list_files()
- move(remotepath)
- remove(_logger=True)
- remove_tags()
- subset(key)
Gets a subset of the dataset
- Parameters:
key (str or list or regex) – Select files of the dataset either as a single string, a list of strings or a regular expression
Examples
>>> ds2 = ds.subset(["mol6.xtc", "mol6.pdb"])
>>> ds2 = ds.subset(re.compile(r".*\.png"))
Alternatively, using indexing:
>>> ds2 = ds[["mol6.xtc", "mol6.pdb"]]
>>> ds2 = ds[re.compile(r".*\.png")]
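The three accepted key types can be dispatched as in this sketch, which works on a plain list of file names. `select_files` is a hypothetical helper; whether `subset` uses full-match or search semantics for regular expressions, and how it treats names missing from the dataset, are assumptions here.

```python
import re


def select_files(files, key):
    """Return the names in `files` selected by a str, a list of str, or a compiled regex.

    Hypothetical helper; regexes are assumed to need a full match and
    unknown names are assumed to be an error.
    """
    if isinstance(key, re.Pattern):
        return [f for f in files if key.fullmatch(f)]
    if isinstance(key, str):
        key = [key]  # a single name behaves like a one-element list
    missing = [k for k in key if k not in files]
    if missing:
        raise KeyError(f"files not in dataset: {missing}")
    return [f for f in files if f in key]


files = ["mol6.xtc", "mol6.pdb", "plot.png", "notes.txt"]
print(select_files(files, ["mol6.xtc", "mol6.pdb"]))  # ['mol6.xtc', 'mol6.pdb']
print(select_files(files, re.compile(r".*\.png")))    # ['plot.png']
print(select_files(files, "notes.txt"))               # ['notes.txt']
```

In a pattern such as `r".*\.png"` the dot before the extension is escaped; an unescaped `.` would match any character.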
- update()
- update_comments(comments)
- playmolecule.datacenter.download_dataset(session, datasetid, path, files=None, tmpdir=None, attempts=1, _logger=True)
Download a dataset from the backend
- Parameters:
session (Session) – A Session object
datasetid (int) – The ID of the dataset we want to download
path (str) – The location to which to download the dataset
files (list) – A list of files inside the dataset to retrieve, if only a subset is needed
tmpdir (str) – A location to store temporary data. If set to None, the default is /tmp/
attempts (int) – Number of times to attempt downloading the dataset. Can help with unstable connections
_logger (bool) – Set to False to reduce the verbosity
Examples
>>> download_dataset(session, 182, "./dataset_182/")
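If a dataset is delivered as a tar archive (an assumption; the parameters above only mention an archive on the upload side), restricting the download to a `files` subset could look like the following. `extract_selected` is a hypothetical helper that extracts only the named archive members into `path`.

```python
import os
import tarfile


def extract_selected(archive_path, path, files=None):
    """Extract a tar archive into `path`, optionally only the named members.

    Hypothetical sketch; playmolecule may transfer and unpack data differently.
    """
    os.makedirs(path, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        members = tar.getmembers()
        if files is not None:
            wanted = set(files)
            # keep only members whose archive name was requested
            members = [m for m in members if m.name in wanted]
        tar.extractall(path, members=members)
    return sorted(m.name for m in members)
```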
- playmolecule.datacenter.get_datasets(session, public=None, datasetid=None, remotepath=None, useronly=False, startswith=None, tags=None, taggedonly=False, completedonly=False, group=None, filelist=False, execid=None, returnobj=False)
Get a list of datasets filtered by various arguments
- Parameters:
session (Session) – A Session object
public (bool) – If set to True it will only return public datasets. If set to False it will only return private datasets. If set to None this parameter will be ignored.
datasetid (int) – The ID of a specific dataset for which we want to retrieve information
remotepath (str) – The remote (virtual) path at which the dataset is located
useronly (bool) – Returns only datasets of the currently logged-in user of the Session
startswith (str) – Returns any datasets whose remote (virtual) path starts with the specific string
tags (list) – Returns only datasets which have the specified tags
taggedonly (bool) – If set to True it will only return datasets which have tags
completedonly (bool) – If set to True it will only return datasets whose jobs have completed successfully
group (str) – Only get datasets related to a job group
filelist (bool) – Set to True to return the files of each dataset as well
execid (str) – Search for datasets produced by this execution ID
returnobj (bool) – Set to True to return Dataset objects instead of dictionaries
- Returns:
datasetlist – A list of datasets retrieved with the above filters
- Return type:
list
Examples
>>> datasets = get_datasets(session, remotepath="KdeepTrainer/models/PDBBind2019")
>>> datasets = get_datasets(session, tags=["app:kdeep"])
- playmolecule.datacenter.move_dataset(session, datasetid, remotepath, _logger=True)
Changes the remote location of a dataset
- playmolecule.datacenter.remove_dataset(session, datasetid, _logger=True)
Removes (deletes) a dataset from the backend
- playmolecule.datacenter.update_dataset_comments(session, datasetid, comments, _logger=True)
Updates the comments attached to a dataset