How to convert a Molecule to RDKit or OpenFF#

Goal#

Hand a Molecule off to RDKit (for cheminformatics) or to the OpenFF Toolkit (for force-field assignment, charge calculation, parameterization), and round-trip the result back if needed.

Which conversion to use#

You want…

Call

An rdkit.Chem.Mol for substructure search, descriptors, fingerprints, SMILES generation

toRDKitMol()

An openff.toolkit.topology.Molecule for OpenFF parameterization, partial-charge calculation, force-field assignment

toOpenFFMolecule()

Round-trip back from RDKit to a moleculekit Molecule

fromRDKitMol()

The OpenFF conversion goes through RDKit internally, so a healthy RDKit conversion is the prerequisite for a healthy OpenFF conversion.

Minimal example#

from moleculekit.molecule import Molecule
from moleculekit.tools.preparation import systemPrepare

mol = Molecule("3PTB")

# Template the ligand from RCSB SMILES so it has correct bond orders and
# formal charges before conversion.
mol.templateResidueFromSmiles(
    mol.resname == "BEN",
    smiles="NC(=N)c1ccccc1",
    addHs=True,
)

# Copy the ligand into a standalone Molecule for cheminformatics work.
lig = mol.copy(sel="resname BEN")
rdmol = lig.toRDKitMol(sanitize=True)

rdmol is now a fully-fledged rdkit.Chem.Mol you can pass into any RDKit pipeline:

from rdkit import Chem
print(Chem.MolToSmiles(rdmol))   # canonical SMILES
print(Chem.Descriptors.MolWt(rdmol))

Parameters that matter#

Molecule.toRDKitMol#

Parameter

Type

Default

What it does

sanitize

bool

False

Run RDKit’s sanitization (valence, aromaticity, kekulization). Required for most downstream RDKit operations.

kekulize

bool

False

Force Kekulé bond perception (alternating single/double) instead of aromatic flags.

assignStereo

bool

True

Assign stereochemistry from 3D coordinates.

guessBonds

bool

False

If mol.bonds is empty, run a distance-based guesser before converting.

Molecule.toOpenFFMolecule#

Parameter

Type

Default

What it does

sanitize

bool

False

Forwarded to the internal toRDKitMol call.

kekulize

bool

False

Forwarded to the internal toRDKitMol call.

assignStereo

bool

True

Forwarded to the internal toRDKitMol call.

Per-atom mol.charge values are copied onto offmol.partial_charges automatically. Residue / chain / insertion identity is carried through offmol.atoms[i].metadata so the OpenFF Topology hierarchy schemes can reconstruct the residue structure.

Common variations#

# OpenFF Molecule for parameter assignment
offmol = lig.toOpenFFMolecule(sanitize=True)

# Assign GAFF parameters via OpenFF
from openff.toolkit.typing.engines.smirnoff import ForceField
ff = ForceField("openff-2.1.0.offxml")
system = ff.create_openmm_system(offmol.to_topology())
# Build a Molecule from an RDKit Mol (e.g. a SMILES + embedded conformer)
from rdkit import Chem
from rdkit.Chem import AllChem

rdmol = Chem.MolFromSmiles("CCO")
rdmol = Chem.AddHs(rdmol)
AllChem.EmbedMolecule(rdmol)
AllChem.MMFFOptimizeMolecule(rdmol)
mol = Molecule.fromRDKitMol(rdmol)
# Convert an entire protein–ligand complex (not just the ligand)
rdmol = mol.toRDKitMol(sanitize=False)   # sanitize=False for non-canonical residues

Gotchas#

  • Conversion needs explicit bonds. If mol.bonds is empty (plain PDB load), run templateResidueFromSmiles() for non-canonical residues so the conversion gets correct bond orders, or pass guessBonds=True for a distance-based fallback.

  • sanitize=True will raise on residues whose bonding or formal charges are inconsistent with chemical rules. For a freshly read PDB this often happens around metal centers and covalent ligands; template those residues first (see Custom residues from SMILES).

  • Hydrogens must be present for stereochemistry / valence checks. Run systemPrepare() (or set explicit Hs via templateResidueFromSmiles(..., addHs=True)) before conversion.

  • toOpenFFMolecule requires openff-toolkit and openff-units installed.

See also#