# How to generate docking poses for downstream MD ## Goal Use HTMD's {py:func}`~htmd.dock.dock` wrapper around AutoDock Vina to generate candidate poses of a ligand inside (or around) a protein binding pocket, then write each pose out as a starting structure for a separate MD trajectory. ## Minimal example ```python from moleculekit.molecule import Molecule from htmd.dock import dock protein = Molecule("protein.pdb") ligand = Molecule("ligand.mol2") # or .sdf / .pdbqt poses, scores = dock(protein, ligand, center=protein.getCenter("resname POCKET_RES"), extent=[20, 20, 20], # XYZ box size in Å numposes=20) for i, pose in enumerate(poses): pose.write(f"./poses/pose_{i:02d}.pdb") print(f"pose {i:2d}: affinity {scores[i, 0]:6.2f} kcal/mol, " f"RMSD lb {scores[i, 1]:.2f} ub {scores[i, 2]:.2f}") ``` {py:func}`~htmd.dock.dock` calls AutoDock Vina under the hood and returns a **2-tuple**: `poses` is a list of {py:class}`~moleculekit.molecule.Molecule` objects (ligand-only, in their docked positions), and `scores` is an `(N, 3)` numpy array of Vina's per-pose `(affinity kcal/mol, RMSD lower bound, RMSD upper bound)`. Merge each pose with the protein if you need a complex. ## Parameters that matter | Parameter | What it does | | --- | --- | | `protein`, `ligand` | The receptor and ligand molecules. Hydrogens should already be present and correctly assigned at your target pH (use {py:func}`~moleculekit.tools.preparation.systemPrepare` upstream). | | `center` | XYZ centre of the search box in Å. Pass the centroid of a known binding pocket residue, or a literal `[x, y, z]`. | | `extent` | Box edge lengths in Å (`[dx, dy, dz]`). Larger boxes cover more of the protein surface but slow the search and dilute scoring. | | `numposes` | Maximum number of poses Vina emits. Vina caps this at **20**; values above 20 silently return at most 20 poses. | | `babelexe` | Path to `obabel` (Open Babel) for format conversion. Default `"obabel"` - must be on `$PATH`. | | `vinaexe` | Path to the Vina binary. Default is `f"{platform.system()}-vina"` (e.g. `Linux-vina`, `Darwin-vina`), resolved via `shutil.which` - the platform-prefixed binary must be on `$PATH`. | ## Common variations ### Build an MD system from each pose ```python from htmd.builder import amber from htmd.builder.solvate import solvate from moleculekit.tools.preparation import systemPrepare for i, pose in enumerate(poses): complex_mol = protein.copy() complex_mol.append(pose) prepared, _ = systemPrepare(complex_mol, pH=7.4) solvated = solvate(prepared, pad=12) amber.build(solvated, outdir=f"./build_pose_{i:02d}", ionize=True, saltconc=0.15) ``` This is the docking → MD pipeline: generate poses, build a complete simulation system from each, then run all of them as parallel trajectories (the seed pool for an adaptive-binding study). ### Blind docking (whole-protein search box) ```python poses, scores = dock(protein, ligand, numposes=20) ``` Omitting `center` and `extent` triggers the whole-protein auto-bounding-box: `dock` sets the box centre to the protein's geometric centre and the extent to the protein's bounding-box dimensions plus a 10 Å buffer on each side. Useful when the binding site is unknown - no need to compute the box yourself. Expect noisier scores and more time per dock. ### Targeted docking with multiple ligands ```python from glob import glob all_poses = {} all_scores = {} for ligand_path in glob("./compound_library/*.mol2"): lig = Molecule(ligand_path) poses, scores = dock(protein, lig, center=center, extent=extent, numposes=10) all_poses[ligand_path] = poses all_scores[ligand_path] = scores ``` For library screening you'd want a more featureful Vina wrapper (parallel docking, per-compound scoring) - this loop pattern handles small libraries (≤100 compounds) on a single workstation. ### Re-score Vina poses with FFEvaluate ```python from ffevaluation.ffevaluate import FFEvaluate, loadParameters # Build the complex once (any pose), then plug each pose's coordinates back in complex0 = protein.copy(); complex0.append(poses[0]) prepared, _ = systemPrepare(complex0, pH=7.4) amber.build(prepared, outdir="./scoring_build", ionize=True, saltconc=0.15) # Load the built system: prmtop carries the topology (bonds, atom types, charges); # the PDB provides the coordinates ParameterSet doesn't store. prm = loadParameters("./scoring_build/structure.prmtop") mol = Molecule("./scoring_build/structure.prmtop") mol.read("./scoring_build/structure.pdb") ffev = FFEvaluate(mol, prm, betweensets=("protein", "resname LIG")) # Boolean mask for the ligand atoms in `mol`. The pose Molecules from dock() # are ligand-only and share the same atom order, so we can slot pose coords # straight into the ligand block of `mol.coords`. lig_mask = mol.resname == "LIG" assert lig_mask.sum() == poses[0].numAtoms, "ligand atom count must match" for i, pose in enumerate(poses): mol.coords[lig_mask] = pose.coords score = ffev.calculateEnergies(mol.coords)["total"] print(f"pose {i}: ff-score {score:.2f} kcal/mol") ``` Useful sanity check: Vina's score isn't a real energy - cross-checking with a force-field re-score (or MM-GBSA later) can flag obviously-wrong poses. ## Gotchas - **Inputs must be single-frame.** Both `protein` and `ligand` must have exactly one frame in `mol.coords` (third axis = 1). `dock` raises `NameError` otherwise - drop extra frames with `mol.dropFrames(keep=0)` before calling. - **Vina expects PDBQT format.** HTMD's `dock` calls Open Babel internally to convert your `.pdb` / `.mol2` to PDBQT and back. If `obabel` is missing or returns a malformed PDBQT, `dock` raises a confusing Vina error - check `obabel --version` first. - Hydrogens **must** be assigned before docking. Vina assumes the ligand and protein are at their correct protonation states; running `dock` on a heavy-atom-only PDB gives docked-but-wrong poses. Use `systemPrepare` upstream. - A 20 Å × 20 Å × 20 Å search box is the usual default for a single pocket. For metal-binding sites, peptide-binding grooves, or large allosteric cavities, increase the box. - The poses returned are **ligand-only** - you need to append the protein (or build the system from the complex) for any MD downstream. - Vina scores are not free energies. Use them for ranking, not for thermodynamic claims. ## See also - {doc}`How to use a custom force field with amber.build ` - feeding the docked complex into the build pipeline. - {doc}`How to evaluate force-field energies on a frame ` - re-scoring poses with a real force field. - [AutoDock Vina docs](https://vina.scripps.edu/) - flags / scoring details not exposed by HTMD's wrapper.