How to convert a Molecule to RDKit or OpenFF
Goal
Hand a Molecule off to RDKit (for cheminformatics) or to the OpenFF Toolkit (for force-field assignment, charge calculation, parameterization), and round-trip the result back if needed.
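Which conversion to use
| You want… | Call |
|---|---|
| An RDKit Mol (rdkit.Chem.Mol) | mol.toRDKitMol() |
| An OpenFF Molecule | mol.toOpenFFMolecule() |
| Round-trip back from RDKit to a moleculekit Molecule | Molecule.fromRDKitMol(rdmol) |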
The OpenFF conversion goes through RDKit internally, so a healthy RDKit conversion is the prerequisite for a healthy OpenFF conversion.
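Because the OpenFF path reuses the RDKit conversion, it can help to run the RDKit step on its own first, so that a failure is reported at the stage that caused it. The following is a minimal sketch, not part of moleculekit: to_openff_checked is a hypothetical helper, and it assumes only the two calls shown on this page, toRDKitMol(sanitize=True) and toOpenFFMolecule(sanitize=True).

```python
def to_openff_checked(mol):
    """Convert to OpenFF only after the RDKit conversion succeeds.

    Hypothetical helper: `mol` is any object exposing toRDKitMol() and
    toOpenFFMolecule(), such as a moleculekit Molecule.
    """
    try:
        # Run the RDKit conversion on its own so problems surface here.
        mol.toRDKitMol(sanitize=True)
    except Exception as err:
        # Broad catch: the exact exception type is not assumed in this sketch.
        raise ValueError(
            "RDKit conversion failed; fix bonds, charges or hydrogens first"
        ) from err
    return mol.toOpenFFMolecule(sanitize=True)
```

Calling to_openff_checked(lig) in place of lig.toOpenFFMolecule(sanitize=True) keeps the original RDKit error attached as the cause of the ValueError.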
Minimal example
from moleculekit.molecule import Molecule
from moleculekit.tools.preparation import systemPrepare
mol = Molecule("3PTB")
# Template the ligand from RCSB SMILES so it has correct bond orders and
# formal charges before conversion.
mol.templateResidueFromSmiles(
mol.resname == "BEN",
smiles="NC(=N)c1ccccc1",
addHs=True,
)
# Copy the ligand into a standalone Molecule for cheminformatics work.
lig = mol.copy(sel="resname BEN")
rdmol = lig.toRDKitMol(sanitize=True)
rdmol is now a fully-fledged rdkit.Chem.Mol you can pass into any RDKit pipeline:
from rdkit import Chem
from rdkit.Chem import Descriptors  # submodule; must be imported explicitly
print(Chem.MolToSmiles(rdmol))  # canonical SMILES
print(Descriptors.MolWt(rdmol))  # average molecular weight
Parameters that matter
Molecule.toRDKitMol
| Parameter | Type | Default | What it does |
|---|---|---|---|
| sanitize | | | Run RDKit’s sanitization (valence, aromaticity, kekulization). Required for most downstream RDKit operations. |
| | | | Force Kekulé bond perception (alternating single/double) instead of aromatic flags. |
| | | | Assign stereochemistry from 3D coordinates. |
| | | | If |
Molecule.toOpenFFMolecule
| Parameter | Type | Default | What it does |
|---|---|---|---|
| | | | Forwarded to the internal RDKit conversion. |
| | | | Forwarded to the internal RDKit conversion. |
| | | | Forwarded to the internal RDKit conversion. |
Per-atom mol.charge values are copied onto offmol.partial_charges automatically. Residue / chain / insertion identity is carried through offmol.atoms[i].metadata so the OpenFF Topology hierarchy schemes can reconstruct the residue structure.
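To see what that metadata looks like in practice, the sketch below groups atom indices by residue. It is illustrative only: group_atoms_by_residue is a hypothetical helper, and the keys chain_id, residue_number, insertion_code and residue_name are an assumption based on the names used by OpenFF's default hierarchy schemes. Inspect offmol.atoms[0].metadata on your own molecule to confirm which keys are actually set.

```python
from collections import defaultdict

# Assumed metadata keys (OpenFF's default residue hierarchy convention).
RESIDUE_KEYS = ("chain_id", "residue_number", "insertion_code", "residue_name")


def group_atoms_by_residue(atoms):
    """Map a residue identifier tuple to the indices of its atoms.

    `atoms` is any sequence of objects with a dict-like `.metadata`,
    for example offmol.atoms. Missing keys are recorded as None.
    """
    groups = defaultdict(list)
    for index, atom in enumerate(atoms):
        key = tuple(atom.metadata.get(k) for k in RESIDUE_KEYS)
        groups[key].append(index)
    return dict(groups)
```

With a converted molecule this would be called as group_atoms_by_residue(offmol.atoms); a single-residue ligand should come back as one group containing every atom index.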
Common variations
# OpenFF Molecule for parameter assignment
offmol = lig.toOpenFFMolecule(sanitize=True)
# Assign OpenFF Sage 2.1.0 (SMIRNOFF) parameters via the OpenFF Toolkit
from openff.toolkit.typing.engines.smirnoff import ForceField
ff = ForceField("openff-2.1.0.offxml")
system = ff.create_openmm_system(offmol.to_topology())
# Build a Molecule from an RDKit Mol (e.g. a SMILES + embedded conformer)
from rdkit import Chem
from rdkit.Chem import AllChem
rdmol = Chem.MolFromSmiles("CCO")
rdmol = Chem.AddHs(rdmol)
AllChem.EmbedMolecule(rdmol)
AllChem.MMFFOptimizeMolecule(rdmol)
mol = Molecule.fromRDKitMol(rdmol)
# Convert an entire protein–ligand complex (not just the ligand)
rdmol = mol.toRDKitMol(sanitize=False) # sanitize=False for non-canonical residues
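For a full complex it is often unclear in advance whether sanitization will pass. One option, sketched here under assumptions, is to try the strict conversion first and fall back to sanitize=False, keeping a flag so later code knows which result it received. to_rdkit_with_fallback is a hypothetical helper, not a moleculekit function.

```python
def to_rdkit_with_fallback(mol):
    """Return (rdmol, sanitized) for any object exposing toRDKitMol()."""
    try:
        return mol.toRDKitMol(sanitize=True), True
    except Exception:
        # Broad catch is deliberate in this sketch: the exact exception
        # type raised on sanitization failure is not assumed here.
        return mol.toRDKitMol(sanitize=False), False
```

An unsanitized result is still usable for tasks such as writing coordinates, but operations that depend on valence or aromaticity perception may fail or give misleading results, so the flag is worth checking before running them.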
Gotchas
Conversion needs explicit bonds. If mol.bonds is empty (plain PDB load), run templateResidueFromSmiles() for non-canonical residues so the conversion gets correct bond orders, or pass guessBonds=True for a distance-based fallback.
sanitize=True will raise on residues whose bonding or formal charges are inconsistent with chemical rules. For a freshly read PDB this often happens around metal centers and covalent ligands; template those residues first (see Custom residues from SMILES).
Hydrogens must be present for stereochemistry / valence checks. Run systemPrepare() (or set explicit Hs via templateResidueFromSmiles(..., addHs=True)) before conversion.
toOpenFFMolecule requires openff-toolkit and openff-units installed.
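The first and third points above can be checked before attempting a conversion. The sketch below is a hypothetical pre-flight helper, not part of moleculekit; it assumes only that the molecule exposes mol.bonds (bond index pairs) and mol.element (per-atom element symbols).

```python
def conversion_problems(mol):
    """Return a list of human-readable issues likely to break conversion.

    Hypothetical helper: relies only on `mol.bonds` and `mol.element`.
    """
    problems = []
    if len(mol.bonds) == 0:
        problems.append("no bonds: template residues or pass guessBonds=True")
    # Normalise symbols so stray whitespace does not hide hydrogens.
    elements = {str(e).strip() for e in mol.element}
    if "H" not in elements:
        problems.append(
            "no hydrogens: run systemPrepare() or template with addHs=True"
        )
    return problems
```

An empty list means neither of these two issues was detected; it does not guarantee that sanitization will succeed, since inconsistent formal charges are not checked here.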