moleculekit.smallmol.smallmol module

class moleculekit.smallmol.smallmol.SmallMol(mol, ignore_errors=False, force_reading=False, fixHs=True, removeHs=False, verbose=True, sanitize=True, _logger=True, **kwargs)

Bases: object

Class to manipulate small molecule structures

Parameters:
  • mol (rdkit.Chem.rdchem.Mol or filename or SMILES or moleculekit.smallmol.smallmol.SmallMol or moleculekit.molecule.Molecule) – One of: (i) an RDKit molecule, (ii) the path to a molecule file (“.pdb”/“.mol2”), (iii) a SMILES string, (iv) another SmallMol object, or (v) a moleculekit.molecule.Molecule object.

  • ignore_errors (bool) – If True, errors will not be raised.

  • force_reading (bool) – If True and the provided mol cannot be read directly, the molecule is first converted to SDF and then read.

  • fixHs (bool) – If True, missing hydrogens are added and the existing hydrogens are correctly assigned in the molecular graph.

  • removeHs (bool) – If True, hydrogens are removed.

  • verbose (bool) – If True, additional information is logged during initialization.

  • sanitize (bool) – If True, the molecule is sanitized after reading.

Examples

>>> from moleculekit.smallmol.smallmol import SmallMol
>>> SmallMol('CCO')
>>> SmallMol('ligand.pdb', fixHs=False, removeHs=True)
>>> sm = SmallMol('benzamidine.mol2')
>>> print(sm)
SmallMol with 18 atoms and 1 conformers
Atom field - bondtype
Atom field - charge
...

addHs(addCoords=True)

Adds explicit hydrogen atoms to the molecule in place.

Parameters:

addCoords (bool) – If True, 3D coordinates are also generated for the added hydrogens. Default: True

align(refmol)

Aligns the molecule in place onto a reference molecule using an Open3DAlign overlay.

The molecule’s coordinates are modified so that it is superimposed onto the reference and the resulting RMSD is logged.

Parameters:

refmol (SmallMol or rdkit.Chem.rdchem.Mol or moleculekit.molecule.Molecule) – The reference molecule to align this molecule onto

assignStereoChemistry(from3D=True)

Assigns stereochemistry to the molecule in place.

Parameters:

from3D (bool) – If True, the stereochemistry is derived from the 3D conformer coordinates. If False, it is assigned from the molecular graph, recomputing and overwriting any existing stereo information. Default: True

containsMetals(metalSMARTS='[Mg,Ca,Zn,As,Mn,Al,Pd,Pt,Co,Ba,Cr,Cu,Ni,Ag,Fe,Hg,Cd,Gd,Na]')

Returns True if the molecule contains any atom matching metalSMARTS.

Parameters:

metalSMARTS (str) – SMARTS pattern used to detect metal atoms

Returns:

contains – True if the molecule contains metals, else False

Return type:

bool
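
As a rough, RDKit-free illustration of what the default metalSMARTS covers, the sketch below pulls the element symbols out of the bracketed alternation and checks a list of element symbols against them. This is a simplification: the real method matches the SMARTS pattern with RDKit, and `elements_in_smarts_list` and `contains_metals` are hypothetical helpers that only understand a single bracket atom of comma-separated element symbols.

```python
DEFAULT_METAL_SMARTS = "[Mg,Ca,Zn,As,Mn,Al,Pd,Pt,Co,Ba,Cr,Cu,Ni,Ag,Fe,Hg,Cd,Gd,Na]"


def elements_in_smarts_list(smarts: str) -> set[str]:
    """Extract element symbols from a simple '[A,B,C]' SMARTS alternation.

    Hypothetical helper: handles only one bracket atom whose alternatives
    are bare element symbols separated by commas.
    """
    text = smarts.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected a single bracketed SMARTS atom")
    return {sym.strip() for sym in text[1:-1].split(",") if sym.strip()}


def contains_metals(elements: list[str], metal_smarts: str = DEFAULT_METAL_SMARTS) -> bool:
    """Return True if any element symbol appears in the metal list."""
    metals = elements_in_smarts_list(metal_smarts)
    return any(el in metals for el in elements)


print(contains_metals(["C", "C", "O", "H"]))   # False
print(contains_metals(["C", "O", "O", "Zn"]))  # True
```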

copy()

Creates a copy of the molecule object.

Returns:

newsmallmol – A copy of the object

Return type:

SmallMol

depict(sketch=True, filename=None, ipython=False, optimize=False, optimizemode='std', removeHs=True, atomlabels=None, highlightAtoms=None, resolution=(400, 200))

Depicts the molecule. The depiction can be saved to an SVG file and can also be returned as a Jupyter notebook rendering.

Parameters:
  • sketch (bool) – Set to True for 2D depiction

  • filename (str | None) – The name of the SVG file to write the depiction to

  • ipython (bool) – Set to True to return the Jupyter notebook rendering

  • optimize (bool) – Set to True to optimize the conformation. Only applies to 3D depictions (sketch=False).

  • optimizemode (str) – Set the optimization mode for 3D conformation

  • removeHs (bool) – Set to True to hide hydrogens in the depiction

  • atomlabels (str | None) – Any combination of the following codes as a single string, e.g. ‘%a%i%c%*’: a: atom name, i: atom index, c: atom formal charge (+/-), *: chirality (if the atom is chiral)

  • highlightAtoms (list | None) – List of atom indices to highlight. It can also be a list of lists of atom indices, in which case a different color is used for each list

  • resolution (tuple) – Resolution in pixels: (X, Y)

Returns:

ipython_svg – An SVG rendering object if ipython is True, otherwise None

Return type:

IPython.display.SVG or None

Example

>>> import numpy as np
>>> sm.depict(ipython=True, sketch=False, optimize=True, optimizemode='std')
>>> sm.depict(ipython=True, sketch=True)
>>> sm.depict(ipython=True, sketch=True, atomlabels="%a%i%c")
>>> ids = np.intersect1d(sm.get('idx', 'hybridization SP2'), sm.get('idx', 'element C'))
>>> sm.depict(ipython=True, sketch=True, highlightAtoms=ids.tolist(), removeHs=False)

dropFrames(frames='all')

Removes conformers (frames) from the molecule in place.

Parameters:

frames (str | int | list | ndarray) – The frame indices to remove. Use "all" to remove every conformer, an integer to remove a single frame, or a list/array of indices to remove several. Default: “all”

Raises:

RuntimeError – If any requested frame index is greater than or equal to the number of conformers
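
The three accepted forms of frames can be sketched on a plain Python list standing in for the conformers. `drop_frames` is a hypothetical helper that mirrors the documented behavior (removal in place, RuntimeError for an index at or beyond the number of conformers); it additionally rejects negative indices, which is an assumption of this sketch rather than documented behavior.

```python
import numbers


def drop_frames(conformers: list, frames="all") -> None:
    """Remove conformers in place, following the documented dropFrames rules."""
    n_frames = len(conformers)
    if isinstance(frames, str):
        if frames != "all":
            raise ValueError('frames must be "all", an integer, or a sequence of integers')
        conformers.clear()
        return
    if isinstance(frames, numbers.Integral):
        frames = [frames]
    indices = sorted({int(f) for f in frames}, reverse=True)
    # Out-of-range indices raise RuntimeError, as documented; this sketch
    # also rejects negative indices.
    bad = [f for f in indices if f < 0 or f >= n_frames]
    if bad:
        raise RuntimeError(f"Frame indices {bad} out of range for {n_frames} conformers")
    # Delete from the highest index down so earlier deletions do not shift later ones.
    for f in indices:
        del conformers[f]


confs = ["conf0", "conf1", "conf2", "conf3"]
drop_frames(confs, [1, 3])
print(confs)  # ['conf0', 'conf2']
```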

filter(sel)

Not implemented.

Parameters:

sel (str | ndarray) – Atom selection (string, boolean mask, or integer index array).

Raises:

NotImplementedError – Always, since filtering atoms is not supported.

foundBondBetween(sel1, sel2, bondtype=None)

Checks whether at least one bond exists between the two atom selections.

It is possible to restrict the check to a specific bond type. If one or more matching bonds are found, a tuple (True, details) is returned where details describes each bond. If no matching bond is found, the bare value False is returned.

Parameters:
  • sel1 (str) – The selection for the first set of atoms

  • sel2 (str) – The selection for the second set of atoms

  • bondtype (str | int | None) – The bond type, as an index or a string. If None, bonds of any type are considered. Default: None

Returns:

result – If a bond was found, a tuple (True, details) where details is a list of lists, each holding the (idx1, idx2) atom indices of the bond and its bond type as a string. If no bond was found, the bare boolean False.

Return type:

tuple or bool
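
The asymmetric return value is easy to trip over, so the sketch below reproduces its shape on plain data. `found_bond_between` is hypothetical: it takes a list of (idx1, idx2, bondtype) triples and two sets of atom indices, whereas the real method takes selection strings and also accepts integer bond types.

```python
def found_bond_between(bonds, sel1, sel2, bondtype=None):
    """Mimic the documented return shape of foundBondBetween.

    bonds: iterable of (idx1, idx2, bondtype_str) triples.
    sel1, sel2: sets of atom indices.
    """
    details = []
    for a, b, btype in bonds:
        if bondtype is not None and btype != bondtype:
            continue
        # A bond counts if its two ends fall in different selections, in either order.
        if (a in sel1 and b in sel2) or (a in sel2 and b in sel1):
            details.append([(a, b), btype])
    if details:
        return True, details
    return False


bonds = [(0, 1, "SINGLE"), (1, 2, "DOUBLE"), (2, 3, "SINGLE")]
result = found_bond_between(bonds, {1}, {2})
if result:  # a bare False means no bond was found
    found, details = result
    print(details)  # [[(1, 2), 'DOUBLE']]
```

Because the no-bond case is a bare False rather than a (False, ...) tuple, check the result before unpacking it; unpacking False directly raises a TypeError.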

property frame: int

The currently active conformer (frame) index.

Returns:

frame – The index of the active conformer

Return type:

int

Raises:

RuntimeError – If the stored frame index is out of the range of available conformers

generateConformers(num_confs=400, optimizemode='mmff', align=True, append=True, pruneRmsThresh=0.5, maxAttempts=10000, seed=None, numThreads=1, useRandomCoords=True)

Generates ligand conformers.

Parameters:
  • num_confs (int) – Number of conformers to generate.

  • optimizemode (str) – The force field used to optimize the conformers. Either ‘uff’ or ‘mmff’.

  • align (bool) – If True, the conformers are aligned to the first one

  • append (bool) – If True, the new conformers are appended to the existing ones. If False, the current conformers are deleted and only the new ones are kept.

  • pruneRmsThresh (float) – The RMSD threshold for pruning conformers

  • maxAttempts (int) – The maximum number of attempts to generate conformers

  • seed (int | None) – The seed for the random number generator

  • numThreads (int) – The number of threads to use when embedding multiple conformations

  • useRandomCoords (bool) – Start the embedding from random coordinates instead of using eigenvalues of the distance matrix
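
The idea behind pruneRmsThresh can be sketched as greedy RMSD pruning: a conformer is kept only if it is at least the threshold away from every conformer already kept. This is a simplified stand-in, not RDKit's embedding code: `rmsd` here compares raw coordinates without superimposing the conformers first, and RDKit's own pruning may compute the RMSD differently (for example, after optimal alignment or over heavy atoms only).

```python
import math


def rmsd(conf_a, conf_b):
    """Coordinate RMSD between two conformers, without superimposing them first."""
    if len(conf_a) != len(conf_b):
        raise ValueError("conformers must have the same number of atoms")
    total = sum((pa[k] - pb[k]) ** 2 for pa, pb in zip(conf_a, conf_b) for k in range(3))
    return math.sqrt(total / len(conf_a))


def prune_conformers(conformers, prune_rms_thresh=0.5):
    """Greedy pruning: keep a conformer only if it is at least prune_rms_thresh
    away from every conformer kept so far."""
    kept = []
    for conf in conformers:
        if all(rmsd(conf, other) >= prune_rms_thresh for other in kept):
            kept.append(conf)
    return kept


c0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
c1 = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)]  # RMSD to c0 is about 0.07
c2 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]  # RMSD to c0 is about 1.41
print(len(prune_conformers([c0, c1, c2])))  # 2
```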

get(returnField, sel='all', convertType=True, invert=False)

Returns the values of an atom property (returnField) for the atoms matched by the selection. The selection is itself expressed in terms of another atom property.

Parameters:
  • returnField (str) – The field of the atom to return

  • sel (str) – The selection string. It is an atom field name followed by one or more space-separated values to match for that field, for example "idx 0 1 7" or "element N". Atoms whose value for that field equals any of the given values are selected. Use "all" to select every atom.

  • convertType (bool) – If True, the returnField values are converted to RDKit objects where possible. Default: True

  • invert (bool) – If True, the selection is inverted Default: False

Returns:

values – The array of values for the property

Return type:

np.array
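
The "field value1 value2 ..." selection syntax described for sel can be illustrated without RDKit. The sketch below is a hypothetical, simplified re-implementation over a plain dict of per-atom fields: `select` and `get` here are illustrative helpers, not the SmallMol method, and they skip the RDKit type handling (such as matching ‘sp2’ or ‘1’ against hybridization types) that the examples for the real method show.

```python
def select(fields: dict[str, list], sel: str = "all", invert: bool = False) -> list[int]:
    """Indices of atoms matched by 'all' or by 'field value1 value2 ...'."""
    n_atoms = len(next(iter(fields.values())))
    if sel.strip() == "all":
        mask = [True] * n_atoms
    else:
        field, *values = sel.split()
        if not values:
            raise ValueError("a selection needs a field name and at least one value")
        wanted = set(values)
        # Compare as strings, since the values in the selection string are text.
        mask = [str(v) in wanted for v in fields[field]]
    if invert:
        mask = [not m for m in mask]
    return [i for i, m in enumerate(mask) if m]


def get(fields: dict[str, list], return_field: str, sel: str = "all", invert: bool = False) -> list:
    """Values of return_field for the selected atoms."""
    column = fields[return_field]
    return [column[i] for i in select(fields, sel, invert)]


atoms = {
    "idx": [0, 1, 2, 3],
    "element": ["C", "C", "O", "H"],
}
print(get(atoms, "element", "idx 0 1 3"))  # ['C', 'C', 'H']
print(get(atoms, "idx", "element C"))      # [0, 1]
```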

Example

>>> sm.get('element', 'idx 0 1 7')
array(['C', 'C', 'H'],
      dtype='<U1')
>>> sm.get('hybridization', 'element N')
array([rdkit.Chem.rdchem.HybridizationType.SP2,
       rdkit.Chem.rdchem.HybridizationType.SP2], dtype=object)
>>> sm.get('hybridization', 'element N', convertType=False)
array([3, 3])
>>> sm.get('element', 'hybridization sp2')
array(['C', 'C', 'C', 'C', 'C', 'C', 'C', 'N', 'N'],
      dtype='<U1')
>>> sm.get('element', 'hybridization S')
array(['H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H'],
      dtype='<U1')
>>> sm.get('element', 'hybridization 1')
array(['H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H'],
      dtype='<U1')
>>> sm.get('atomobject', 'element N')
array([<rdkit.Chem.rdchem.Atom object at 0x7faf616dd120>,
       <rdkit.Chem.rdchem.Atom object at 0x7faf616dd170>], dtype=object)

getAtoms()

Returns an array with the rdkit.Chem.rdchem.Atom objects present in the molecule.

Returns:

atoms – An object array of the rdkit.Chem.rdchem.Atom objects of the molecule

Return type:

ndarray

getCenter()

Returns the geometrical center of the molecule for the currently active conformation.

Returns:

center – The (x, y, z) coordinates of the geometrical center

Return type:

ndarray
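
The geometrical center is the unweighted average of the atom coordinates, as opposed to a mass-weighted center of mass. A plain-Python sketch, assuming a hypothetical list of conformers (each a list of (x, y, z) tuples) where `frame` plays the role of the active conformer index:

```python
def geometric_center(conformers, frame):
    """Unweighted mean (x, y, z) of the atoms in the selected conformer."""
    coords = conformers[frame]
    if not coords:
        raise ValueError("the conformer has no atoms")
    n_atoms = len(coords)
    return tuple(sum(atom[k] for atom in coords) / n_atoms for k in range(3))


conformers = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)],
    [(1.0, 1.0, 1.0), (3.0, 1.0, 1.0), (2.0, 4.0, 1.0)],
]
print(geometric_center(conformers, frame=0))  # (1.0, 1.0, 0.0)
```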

getDescriptors(prefix='', ignore=('Ipc',))

Calculates descriptors for the molecule.

Returns rdkit descriptors for the molecule, like DESC_NumRotatableBonds or DESC_MolLogP. See rdkit.Chem.Descriptors for more.

Parameters:
  • prefix (str) – A string prefix to add to all dictionary keys

  • ignore (list | tuple) – Names of descriptors to skip

Returns:

descriptors – A dictionary containing all descriptors of the molecule

Return type:

dict
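
How prefix and ignore shape the returned dictionary can be sketched with toy descriptor functions. In RDKit, rdkit.Chem.Descriptors.descList is a list of (name, function) pairs; the `toy_calculators` list below only imitates that layout on a toy "molecule" (a list of element symbols), and `compute_descriptors` is a hypothetical helper, not the method's implementation.

```python
def compute_descriptors(mol, calculators, prefix="", ignore=("Ipc",)):
    """Evaluate every (name, function) pair except the ignored ones,
    storing each result under prefix + name."""
    skip = set(ignore)
    return {f"{prefix}{name}": func(mol) for name, func in calculators if name not in skip}


# Toy "molecule" and descriptor functions, for illustration only.
toy_mol = ["C", "C", "O"]
toy_calculators = [
    ("NumAtoms", len),
    ("NumCarbons", lambda m: m.count("C")),
    ("Ipc", lambda m: 0.0),  # skipped by the default ignore list
]
print(compute_descriptors(toy_mol, toy_calculators, prefix="DESC_"))
# {'DESC_NumAtoms': 3, 'DESC_NumCarbons': 2}
```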

getFingerprint(mode, radius=2, num_bits=1024)

Computes a single molecular fingerprint of the requested type.

Parameters:
  • mode (str) – The fingerprint type to compute. One of ‘Morgan’, ‘MACCS’, ‘AvalonCount’.

  • radius (int) – Radius to define a local environment. Only used for the ‘Morgan’ fingerprint.

  • num_bits (int) – The number of bits to use in the fingerprint. A larger value reduces bit collisions. Used for the ‘Morgan’ and ‘AvalonCount’ fingerprints.

Returns:

fingerprint – The computed fingerprint for the chosen mode: a hashed Morgan count fingerprint for ‘Morgan’, a MACCS keys bit vector for ‘MACCS’, or an Avalon count fingerprint for ‘AvalonCount’.

Return type:

rdkit fingerprint object

Raises:

RuntimeError – If mode is not one of the supported fingerprint types

getProp(prop_name)

Returns a given property of the molecule.

Parameters:

prop_name (str) – The name of the property to return

Returns:

value – The value of the property

Return type:

str

getTautomers(canonical=True, genConformers=False, returnScores=True, maxTautomers=200, filterTauts=None)

Enumerates the tautomers of the molecule.

Parameters:
  • canonical (bool) – If True, only the single canonical tautomer is returned. If False, all enumerated tautomers are returned. Default: True

  • genConformers (bool) – If True, a conformer is generated for each returned tautomer. Default: False

  • returnScores (bool) – If True, the tautomer scores are also returned alongside the tautomers. Default: True

  • maxTautomers (int) – The maximum number of tautomers to enumerate. Default: 200

  • filterTauts (float | None) – If not None, only tautomers whose score is within this value of the maximum score are kept. Default: None

Returns:

  • tautomers (SmallMolLib) – A library containing the enumerated tautomers

  • scores (list) – The scores of the returned tautomers. Only returned if returnScores is True.
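
The filterTauts rule (keep tautomers whose score is within the given value of the maximum score) can be sketched on plain lists. `filter_by_score` is a hypothetical helper; treating a score exactly at the cutoff as kept is an assumption of this sketch.

```python
def filter_by_score(tautomers, scores, filter_tauts=None):
    """Keep tautomers whose score is within filter_tauts of the best score."""
    if filter_tauts is None or not scores:
        return list(tautomers), list(scores)
    best = max(scores)
    kept = [(t, s) for t, s in zip(tautomers, scores) if best - s <= filter_tauts]
    return [t for t, _ in kept], [s for _, s in kept]


tauts, scores = filter_by_score(["taut_a", "taut_b", "taut_c"], [10, 8, 3], filter_tauts=2)
print(tauts, scores)  # ['taut_a', 'taut_b'] [10, 8]
```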

isChiral(returnDetails=False)

Returns True if the molecule has at least one chiral atom. If returnDetails is True, a tuple is returned instead, holding that boolean and a list of tuples with each chiral atom and its chiral type.

Parameters:

returnDetails (bool) – If True, also returns the chiral atoms and their chiral types. Default: False

Returns:

  • ischiral (bool) – True if the molecule has at least one chiral atom

  • details (list) – A list of tuples with the chiral atoms and their chiral types. Only returned if returnDetails is True.

Example

>>> chiralmol.isChiral()
True
>>> chiralmol.isChiral(returnDetails=True)
(True, [('C2', 'R')])

property ligname: str

The ligand name of the molecule.

Returns the value of the molecule’s _Name property, or "UNK" if it is not set.

Returns:

ligname – The ligand name

Return type:

str

property numAtoms: int

The number of atoms in the molecule.

Returns:

numatoms – The number of atoms

Return type:

int

property numFrames: int

The number of conformers (frames) of the molecule.

Returns:

numframes – The number of conformers

Return type:

int

removeHs()

Removes explicit hydrogen atoms from the molecule in place.

sanitize()

Sanitizes the molecule in place using rdkit.

This cleans up the molecule, computing properties such as valences, ring information and aromaticity.

setProp(key, value)

Sets a property on the molecule.

The value is stored as a string on the underlying molecule.

Parameters:
  • key (str) – The name of the property to set

  • value – The value to store. It is converted to a string before being stored.

stripSalts()

Removes any salts from the molecule.

toMolecule(ids=None)

Returns the molecule converted to a moleculekit.molecule.Molecule object.

Parameters:

ids (list | None) – The conformer ids to store in the Molecule object. If None, all conformers are included. Default: None

Returns:

mol – The moleculekit Molecule object

Return type:

Molecule

toSMARTS(explicitHs=False)

Returns the SMARTS string of the molecule.

Parameters:

explicitHs (bool) – Set to True to keep the hydrogens

Returns:

smart – The SMARTS string

Return type:

str

toSMILES(explicitHs=False, kekulizeSmile=True)

Returns the SMILES string of the molecule.

Parameters:
  • explicitHs (bool) – Set to True to keep the hydrogens

  • kekulizeSmile (bool) – Set to True to return the SMILES in Kekulé form

Returns:

smi – The SMILES string

Return type:

str

view(*args, **kwargs)

Visualizes the molecule.

The molecule is converted to a moleculekit.molecule.Molecule and all arguments are forwarded to its view method.

write(fname, frames=None, merge=True)

Writes the molecule to a file.

The output format is determined by the file extension. For .sdf files the molecule is written with rdkit; other formats are written by first converting to a moleculekit.molecule.Molecule.

Parameters:
  • fname (str) – The output file name. The extension determines the file format.

  • frames (list | None) – The conformer indices to write. If None, all conformers are written. Default: None

  • merge (bool) – Only used for .sdf output. If True, all conformers are written to a single file. If False, one file is written per conformer with the frame index appended to the file name. Default: True
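
The effect of merge on .sdf output can be sketched as a function that lists the files a call would produce. `planned_output_files` is hypothetical, and the ‘<stem>_<frame><ext>’ naming is an assumption made only for illustration; the documentation above says only that the frame index is appended to the file name, so the real suffix format may differ. The sketch treats non-.sdf formats as a single output file, since merge applies only to .sdf output.

```python
import os


def planned_output_files(fname, frames, merge=True):
    """List the files a write call would produce under an assumed naming scheme.

    Assumption for illustration: unmerged .sdf output is named
    '<stem>_<frame><ext>'.
    """
    stem, ext = os.path.splitext(fname)
    # merge only matters for .sdf output; this sketch treats other formats
    # as a single output file.
    if merge or ext.lower() != ".sdf":
        return [fname]
    return [f"{stem}_{frame}{ext}" for frame in frames]


print(planned_output_files("ligand.sdf", [0, 2], merge=False))
# ['ligand_0.sdf', 'ligand_2.sdf']
```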