How to generate docking poses for downstream MD#

Goal#

Use HTMD’s dock() wrapper around AutoDock Vina to generate candidate poses of a ligand inside (or around) a protein binding pocket, then write each pose out as a starting structure for a separate MD trajectory.

Minimal example#

from moleculekit.molecule import Molecule
from htmd.dock import dock

protein = Molecule("protein.pdb")
ligand  = Molecule("ligand.mol2")            # or .sdf / .pdbqt

poses, scores = dock(protein, ligand,
                     center=protein.getCenter("resname POCKET_RES"),
                     extent=[20, 20, 20],    # XYZ box size in Å
                     numposes=20)

for i, pose in enumerate(poses):
    pose.write(f"./poses/pose_{i:02d}.pdb")
    print(f"pose {i:2d}: affinity {scores[i, 0]:6.2f} kcal/mol, "
          f"RMSD lb {scores[i, 1]:.2f} ub {scores[i, 2]:.2f}")

dock() calls AutoDock Vina under the hood and returns a 2-tuple: poses is a list of Molecule objects (ligand-only, in their docked positions), and scores is an (N, 3) numpy array of Vina’s per-pose (affinity kcal/mol, RMSD lower bound, RMSD upper bound). Merge each pose with the protein if you need a complex.

Parameters that matter#

Parameter

What it does

protein, ligand

The receptor and ligand molecules. Hydrogens should already be present and correctly assigned at your target pH (use systemPrepare() upstream).

center

XYZ centre of the search box in Å. Pass the centroid of a known binding pocket residue, or a literal [x, y, z].

extent

Box edge lengths in Å ([dx, dy, dz]). Larger boxes cover more of the protein surface but slow the search and dilute scoring.

numposes

Maximum number of poses Vina emits. Vina caps this at 20; values above 20 silently return at most 20 poses.

babelexe

Path to obabel (Open Babel) for format conversion. Default "obabel" - must be on $PATH.

vinaexe

Path to the Vina binary. Default is f"{platform.system()}-vina" (e.g. Linux-vina, Darwin-vina), resolved via shutil.which - the platform-prefixed binary must be on $PATH.

Common variations#

Build an MD system from each pose#

from htmd.builder import amber
from htmd.builder.solvate import solvate
from moleculekit.tools.preparation import systemPrepare

for i, pose in enumerate(poses):
    complex_mol = protein.copy()
    complex_mol.append(pose)
    prepared, _ = systemPrepare(complex_mol, pH=7.4)
    solvated = solvate(prepared, pad=12)
    amber.build(solvated, outdir=f"./build_pose_{i:02d}",
                ionize=True, saltconc=0.15)

This is the docking → MD pipeline: generate poses, build a complete simulation system from each, then run all of them as parallel trajectories (the seed pool for an adaptive-binding study).

Targeted docking with multiple ligands#

from glob import glob
all_poses = {}
all_scores = {}
for ligand_path in glob("./compound_library/*.mol2"):
    lig = Molecule(ligand_path)
    poses, scores = dock(protein, lig, center=center, extent=extent, numposes=10)
    all_poses[ligand_path] = poses
    all_scores[ligand_path] = scores

For library screening you’d want a more featureful Vina wrapper (parallel docking, per-compound scoring) - this loop pattern handles small libraries (≤100 compounds) on a single workstation.

Re-score Vina poses with FFEvaluate#

from ffevaluation.ffevaluate import FFEvaluate, loadParameters

# Build the complex once (any pose), then plug each pose's coordinates back in
complex0 = protein.copy(); complex0.append(poses[0])
prepared, _ = systemPrepare(complex0, pH=7.4)
amber.build(prepared, outdir="./scoring_build", ionize=True, saltconc=0.15)

# Load the built system: prmtop carries the topology (bonds, atom types, charges);
# the PDB provides the coordinates ParameterSet doesn't store.
prm  = loadParameters("./scoring_build/structure.prmtop")
mol  = Molecule("./scoring_build/structure.prmtop")
mol.read("./scoring_build/structure.pdb")
ffev = FFEvaluate(mol, prm, betweensets=("protein", "resname LIG"))

# Boolean mask for the ligand atoms in `mol`. The pose Molecules from dock()
# are ligand-only and share the same atom order, so we can slot pose coords
# straight into the ligand block of `mol.coords`.
lig_mask = mol.resname == "LIG"
assert lig_mask.sum() == poses[0].numAtoms, "ligand atom count must match"

for i, pose in enumerate(poses):
    mol.coords[lig_mask] = pose.coords
    score = ffev.calculateEnergies(mol.coords)["total"]
    print(f"pose {i}: ff-score {score:.2f} kcal/mol")

Useful sanity check: Vina’s score isn’t a real energy - cross-checking with a force-field re-score (or MM-GBSA later) can flag obviously-wrong poses.

Gotchas#

  • Inputs must be single-frame. Both protein and ligand must have exactly one frame in mol.coords (third axis = 1). dock raises NameError otherwise - drop extra frames with mol.dropFrames(keep=0) before calling.

  • Vina expects PDBQT format. HTMD’s dock calls Open Babel internally to convert your .pdb / .mol2 to PDBQT and back. If obabel is missing or returns a malformed PDBQT, dock raises a confusing Vina error - check obabel --version first.

  • Hydrogens must be assigned before docking. Vina assumes the ligand and protein are at their correct protonation states; running dock on a heavy-atom-only PDB gives docked-but-wrong poses. Use systemPrepare upstream.

  • A 20 Å × 20 Å × 20 Å search box is the usual default for a single pocket. For metal-binding sites, peptide-binding grooves, or large allosteric cavities, increase the box.

  • The poses returned are ligand-only - you need to append the protein (or build the system from the complex) for any MD downstream.

  • Vina scores are not free energies. Use them for ranking, not for thermodynamic claims.

See also#