Mutation, gap closing, and auto-segmentation#
You will learn: how to fix missing chain/segment IDs with autoSegment(), mutate a residue using the Dunbrack rotamer library, and (with the ProMod3 Singularity image) close missing-residue gaps — the surrounding workflow you typically combine with systemPrepare().
Prerequisites:
The Custom residues from SMILES tutorial.
Setup#
from moleculekit.molecule import Molecule
from moleculekit.tools.preparation import systemPrepare
from moleculekit.tools.autosegment import autoSegment
from moleculekit.tools.modelling import model_gaps
Step 1 — Auto-segment a structure with a chain break#
We use 1ITG, an integrin headpiece that has 13 residues missing from the deposited structure (residues 141–153 are absent from the PDB file). After loading and filtering to protein only, the molecule carries a single chain letter but lacks meaningful segids:
mol = Molecule("1itg")
mol.filter("protein")
mol.segid[:] = ""
mol.chain[:] = ""
moleculekit.molecule - INFO - Removed 67 atoms. 1070 atoms remaining in the molecule.
Loading drops 67 non-protein atoms; clearing segids and chain IDs gives us a clean slate that mirrors what you receive from many homology-model or trajectory outputs.
mol = autoSegment(mol, sel="protein", fields=("chain", "segid"))
sorted(set(zip(mol.chain, mol.segid)))
moleculekit.tools.autosegment - INFO - Residue gap between GLY:140: and MET:154: with backbone distance 17.3A > 4.0A. Creating new chain.
[('A', 'P0'), ('B', 'P1')]
autoSegment() detects that the backbone distance between GLY 140 and MET 154 (the flanking residues of the gap) exceeds the default 4 Å spatial threshold, and so it creates two independent segments: P0 on chain A (residues 55–140) and P1 on chain B (residues 154–209). Both the chain and segid fields are now consistent, which avoids warnings during systemPrepare().
Step 2 — Mutate a residue with the “best” rotamer#
We mutate GLN 95 (a surface-exposed glutamine in segment P0) to alanine. The "best" rotamer mode queries the Dunbrack backbone-dependent rotamer library for the phi/psi bin of that residue and picks the rotamer with the lowest van der Waals clash energy against the surrounding atoms:
mut = mol.copy()
print("before:", set(mut.resname[(mut.chain == "A") & (mut.resid == 95)]))
mut.mutateResidue("chain A and resid 95", "ALA", rotamer_mode="best")
print("after :", set(mut.resname[(mut.chain == "A") & (mut.resid == 95)]))
before: {'GLN'}
after : {'ALA'}
before: {'GLN'} then after: {'ALA'} — the mutation flipped the residue identity. mutateResidue() modifies the molecule in place: the old sidechain is stripped, an ideal ALA template is Kabsch-aligned onto the backbone (N, CA, C), and the new sidechain is placed. Notice that the two masks are recomputed against mut after the mutation, not reused from before — the mutation shrinks the atom array (GLN has more sidechain atoms than ALA), and a precomputed mask would now be the wrong length (see the warning in the Atom selection tutorial).
Step 3 — Compare “random” vs “best”, and add minimization#
rotamer_mode="random" samples a rotamer weighted by its library probability rather than scoring clash energy, which is faster but may leave residual steric strain:
mut_random = mol.copy()
mut_random.mutateResidue("chain A and resid 95", "ALA", rotamer_mode="random")
Use "random" when throughput matters more than placement quality.
Adding minimize=True invokes an OpenMM soft-potential minimization after rotamer placement that relaxes sidechain dihedrals to remove residual strain. When OpenMM is not installed the step is silently skipped:
mut_min = mol.copy()
mut_min.mutateResidue("chain A and resid 95", "ALA", rotamer_mode="best", minimize=True)
moleculekit.openmmtools - INFO - OpenMM not available -- skipping soft-potential minimization.
minimize=True is the safest choice for downstream MD setup; without it the rotamer placement is still physically reasonable but may retain small clashes.
Step 4 — Close the gap with ProMod3#
model_gaps() calls ProMod3 inside a Singularity/Apptainer container to reconstruct the missing loop. This cell is skipped in CI because it requires the ProMod3 image:
modeled_p0 = model_gaps(
mol,
sequence="DCSPGIWQLDCTHLEGKVILVAVHVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTVHTDNGSNFTSTTVKAACWWAGIKQEFGIPNKELKKIIGQVRDQAEHLKTAVQMAVFIHNKKRKGGIGGYSAGERIVDIIATDIQ",
segid="P0",
promod_img="/path/to/promod3.sif",
)
model_gaps() aligns sequence against the atoms present in segment P0, writes a temporary FASTA alignment, runs ProMod3’s loop-modelling pipeline inside the container, and returns a new Molecule with the gap filled.
ProMod3 is available as an Apptainer/Singularity image from the OpenStructure project — follow the instructions at https://openstructure.org/promod3/ to obtain promod3.sif. If neither apptainer nor singularity is on the PATH, model_gaps() raises a RuntimeError immediately; there is no fallback gap-filling path.
Step 5 — Wrap up with systemPrepare#
With segmentation and mutation done, pass the result through systemPrepare() to assign protonation states:
pmol, specs = systemPrepare(mut_min, verbose=False)
pmol.numAtoms
moleculekit.tools.preparation - WARNING - Dubious protonation state: the pKa of 1 residues is within 1.0 units of pH 7.4.
moleculekit.tools.preparation - WARNING - Dubious protonation state: HIS 67 A (pKa= 6.87)
2198
The full pipeline — segment, mutate, prepare — is now complete.
Recap#
autoSegment()detects backbone discontinuities and assigns a unique segid (and optionally chain letter) per topologically connected fragment; usefields=("chain", "segid")to keep both fields consistent.mutateResidue()withselandnewresswaps a residue’s sidechain using Dunbrack rotamer selection:rotamer_mode="best"minimises VdW clashes against neighbours,rotamer_mode="random"samples by probability for speed. Addminimize=Trueto relax residual strain with OpenMM.model_gaps()fills missing residues by sequence using the ProMod3 loop-modelling engine — but it requires the ProMod3 Singularity image; there is no fallback.
Next#
Back to the system-prep index