moleculekit.smallmol.tools.clustering module

moleculekit.smallmol.tools.clustering.DiceDistances(fp1, fps)

Returns the row of Dice similarities between a reference fingerprint and a list of fingerprints.

Parameters:
  • fp1 (rdkit fingerprint) – The RDKit fingerprint used as the reference

  • fps (list) – The list of RDKit fingerprints to compare against the reference

Returns:

dicerow – A list of the Dice similarities between fp1 and the fingerprints in fps

Return type:

list
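To make the returned row concrete, here is a minimal pure-Python sketch of the Dice coefficient, 2|A∩B| / (|A| + |B|). It assumes each fingerprint is represented as a set of on-bit indices. The helper names `dice_similarity` and `dice_row` are hypothetical and are not part of moleculekit or RDKit, and returning 0.0 for two empty fingerprints is a convention chosen for this sketch only.

```python
def dice_similarity(a, b):
    """Dice coefficient of two sets of on-bit indices (illustrative helper)."""
    total = len(a) + len(b)
    if total == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return 2.0 * len(a & b) / total


def dice_row(reference, fingerprints):
    """One similarity per fingerprint, in the same order as the input list."""
    return [dice_similarity(reference, fp) for fp in fingerprints]


# Hypothetical fingerprints given as sets of on-bit indices.
reference = {1, 4, 7, 9}
fingerprints = [{1, 4, 7, 9}, {1, 4}, {2, 3}]

row = dice_row(reference, fingerprints)
print(row)  # identical -> 1.0, two shared bits -> 2*2/(4+2), disjoint -> 0.0
```

The row has one entry per fingerprint, which mirrors the list shape described for dicerow above.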

moleculekit.smallmol.tools.clustering.ParallelExecutor(**joblib_args)

A wrapper for joblib.Parallel to allow custom progress bars.

moleculekit.smallmol.tools.clustering.TanimotoDistances(fp1, fps)

Returns the row of Tanimoto similarities between a reference fingerprint and a list of fingerprints.

Parameters:
  • fp1 (rdkit fingerprint) – The RDKit fingerprint used as the reference

  • fps (list) – The list of RDKit fingerprints to compare against the reference

Returns:

tanimotorow – A list of the Tanimoto similarities between fp1 and the fingerprints in fps

Return type:

list
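For comparison, the Tanimoto coefficient is |A∩B| / |A∪B|. The sketch below again assumes fingerprints are plain sets of on-bit indices. `tanimoto_similarity` and `tanimoto_row` are hypothetical names used for illustration, not functions from moleculekit or RDKit, and the 0.0 result for two empty fingerprints is a convention of this sketch.

```python
def tanimoto_similarity(a, b):
    """Tanimoto coefficient of two sets of on-bit indices (illustrative helper)."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / union


def tanimoto_row(reference, fingerprints):
    """One similarity per fingerprint, in the same order as the input list."""
    return [tanimoto_similarity(reference, fp) for fp in fingerprints]


# Hypothetical fingerprints given as sets of on-bit indices.
query = {1, 4, 7, 9}
library = [{1, 4, 7, 9}, {1, 4}, {2, 3}]

tanimoto_values = tanimoto_row(query, library)
print(tanimoto_values)  # identical -> 1.0, two shared of four total -> 0.5, disjoint -> 0.0
```

For the same pair of fingerprints, the Tanimoto value is never larger than the Dice value, so a fixed cutoff is stricter under Tanimoto.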

moleculekit.smallmol.tools.clustering.cluster(smallmol_list, method, distThresholds=0.2, returnDetails=True, removeHs=True)

Returns the SmallMol objects grouped into clusters. It can also return the details of the computed clusters.

Parameters:
  • smallmol_list (list) – The list of moleculekit.smallmol.smallmol.SmallMol objects

  • method (str) – The clustering method. One of [‘maccs’, ‘pathFingerprints’, ‘atomsFingerprints’, ‘torsionsFingerprints’, ‘circularFingerprints’, ‘shape’, ‘mcs’]

  • distThresholds (float) – The distance cutoff for the clusters. Default: 0.2

  • returnDetails (bool) – If True, the cluster details are also returned. Default: True

  • removeHs (bool) – If True, the hydrogens are not considered. Default: True

Returns:

  • clusters (list) – A list of lists containing the SmallMol objects grouped by cluster membership

  • details (list) – A list with all the cluster details (returned when returnDetails is True)
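To illustrate what a distance cutoff such as distThresholds does, here is a small pure-Python sketch of neighbor-count clustering on a precomputed distance matrix. This is an assumption made for illustration: this page does not state which clustering algorithm moleculekit applies, and `threshold_clusters` is a hypothetical helper. Distances here could be taken as 1 minus a similarity such as the Tanimoto value.

```python
def threshold_clusters(dist, cutoff=0.2):
    """Group item indices so that members lie within `cutoff` of a centroid.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Items with the most neighbors inside the cutoff become centroids first.
    """
    n = len(dist)
    # Neighbors of each item: every other item at distance <= cutoff.
    neighbors = [
        [j for j in range(n) if j != i and dist[i][j] <= cutoff]
        for i in range(n)
    ]
    # Visit items with many neighbors first; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))

    assigned = [False] * n
    clusters = []
    for i in order:
        if assigned[i]:
            continue
        # The centroid comes first, followed by its still-unassigned neighbors.
        members = [i] + [j for j in neighbors[i] if not assigned[j]]
        for m in members:
            assigned[m] = True
        clusters.append(members)
    return clusters


# Hypothetical pairwise distances for four molecules.
distances = [
    [0.00, 0.10, 0.15, 0.90],
    [0.10, 0.00, 0.30, 0.80],
    [0.15, 0.30, 0.00, 0.85],
    [0.90, 0.80, 0.85, 0.00],
]

clusters = threshold_clusters(distances, cutoff=0.2)
print(clusters)  # [[0, 1, 2], [3]]
```

Lowering the cutoff splits clusters apart: with `cutoff=0.05` every item in this example ends up in its own cluster.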

moleculekit.smallmol.tools.clustering.getMaximumCommonSubstructure(smallmol_list, removeHs=True, returnAtomIdxs=False)

Returns the maximum common substructure (MCS) and, if returnAtomIdxs is True, two lists of lists. The first contains, for each molecule, the atom indexes that are part of the MCS; the second contains, for each molecule, the atom indexes that are not part of the MCS.

Parameters:
  • smallmol_list (list) – The list of SmallMol objects

  • removeHs (bool) – If True, the hydrogens are not considered. Default: True

  • returnAtomIdxs (bool) – If True, the lists of the atom indexes are returned. Default: False

Returns:

  • mcs_mol (rdkit.Chem.rdchem.Mol) – The MCS molecule

  • atom_mcs_list (list) – A list of lists containing the atom indexes that are part of the MCS

  • atom_no_mcs_list (list) – A list of lists containing the atom indexes that are not part of the MCS
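The relationship between the two index lists can be sketched in plain Python: for each molecule, the atoms outside the MCS are the complement of the atoms inside it. The helper `split_atom_indexes` and the example data are hypothetical, and the sorted ordering is a choice made for this sketch; the ordering moleculekit uses is not stated here.

```python
def split_atom_indexes(n_atoms, mcs_atoms):
    """Return (indexes in the MCS, indexes not in the MCS) for one molecule."""
    in_mcs = set(mcs_atoms)
    matched = sorted(in_mcs)
    unmatched = [i for i in range(n_atoms) if i not in in_mcs]
    return matched, unmatched


# Hypothetical input: (number of atoms, atom indexes matched by the MCS).
molecules = [
    (6, [0, 1, 2, 3]),
    (5, [1, 2, 3, 4]),
]

atom_mcs_list = []
atom_no_mcs_list = []
for n_atoms, mcs_atoms in molecules:
    matched, unmatched = split_atom_indexes(n_atoms, mcs_atoms)
    atom_mcs_list.append(matched)
    atom_no_mcs_list.append(unmatched)

print(atom_mcs_list)     # [[0, 1, 2, 3], [1, 2, 3, 4]]
print(atom_no_mcs_list)  # [[4, 5], [0]]
```

Each outer list has one inner list per molecule, in the same order as the input molecules.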