Mutation, gap closing, and auto-segmentation#
You will learn: how to fix missing chain/segment IDs with autoSegment(), mutate a residue using the Dunbrack rotamer library, and (with the ProMod3 Singularity image) close missing-residue gaps — the surrounding workflow you typically combine with systemPrepare().
Prerequisites:
The Custom residues from SMILES tutorial.
Setup#
from moleculekit.molecule import Molecule
from moleculekit.tools.preparation import systemPrepare
from moleculekit.tools.autosegment import autoSegment
from moleculekit.tools.modelling import model_gaps
Step 1 — Auto-segment a structure with a chain break#
We use 1ITG, an integrin headpiece that has 13 residues missing from the deposited structure (residues 141–153 are absent from the PDB file). After loading and filtering to protein only, the molecule carries a single chain letter but lacks meaningful segids:
mol = Molecule("1itg")
mol.filter("protein")
moleculekit.molecule - INFO - Removed 67 atoms. 1070 atoms remaining in the molecule.
array([1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080,
1081, 1082, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091,
1092, 1093, 1094, 1095, 1096, 1097, 1098, 1099, 1100, 1101, 1102,
1103, 1104, 1105, 1106, 1107, 1108, 1109, 1110, 1111, 1112, 1113,
1114, 1115, 1116, 1117, 1118, 1119, 1120, 1121, 1122, 1123, 1124,
1125, 1126, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1134, 1135,
1136], dtype=uint32)
Loading drops 67 non-protein atoms, leaving a single chain A and the deposited segid 1. There’s no need to reset those — autoSegment() with fields=("chain", "segid") overwrites both fields from scratch based on backbone continuity, so any pre-existing chain/segid values are irrelevant.
mol = autoSegment(mol, sel="protein", fields=("chain", "segid"))
sorted(set(zip(mol.chain, mol.segid)))
moleculekit.tools.autosegment - INFO - autoSegment: break before MET:154:A (after GLY:140:A)
[('A', 'P0'), ('B', 'P1')]
autoSegment() detects that the backbone is broken between GLY 140 and MET 154 (the flanking residues of the gap) — their C–N distance far exceeds the peptide-bond cutoff (protein_cutoff, 2 Å by default) — and so it creates two independent segments: P0 on chain A (residues 55–140) and P1 on chain B (residues 154–209). Both the chain and segid fields are now consistent, which avoids warnings during systemPrepare().
Step 2 — Mutate a residue with the “best” rotamer#
We mutate GLN 95 (a surface-exposed glutamine in segment P0) to tryptophan — a bulky aromatic sidechain whose placement genuinely depends on the rotamer chosen (alanine, by contrast, has no rotameric freedom, so its "best" and "random" modes would be indistinguishable). The "best" rotamer mode queries the Dunbrack backbone-dependent rotamer library for the phi/psi bin of that residue and picks the rotamer with the lowest van der Waals clash energy against the surrounding atoms:
mut = mol.copy()
print("before:", set(mut.resname[(mut.chain == "A") & (mut.resid == 95)]))
mut.mutateResidue("chain A and resid 95", "TRP", rotamer_mode="best")
print("after :", set(mut.resname[(mut.chain == "A") & (mut.resid == 95)]))
before: {'GLN'}
after : {'TRP'}
before: {'GLN'} then after: {'TRP'} — the mutation flipped the residue identity. mutateResidue() modifies the molecule in place: the old sidechain is stripped, an ideal TRP template is Kabsch-aligned onto the backbone (N, CA, C), and the new sidechain is placed. Notice that the two masks are recomputed against mut after the mutation, not reused from before — the mutation changes the size of the atom array (TRP’s indole sidechain has more atoms than GLN’s), and a precomputed mask would now be the wrong length (see the warning in the Atom selection tutorial).
Step 3 — Compare “random” vs “best”, and add minimization#
rotamer_mode="random" samples a rotamer weighted by its library probability rather than scoring clash energy, which is faster but may leave residual steric strain:
mut_random = mol.copy()
mut_random.mutateResidue("chain A and resid 95", "TRP", rotamer_mode="random")
Use "random" when throughput matters more than placement quality.
Adding minimize=True invokes an OpenMM soft-potential minimization after rotamer placement that relaxes sidechain dihedrals to remove residual strain. When OpenMM is not installed the step is silently skipped:
mut_min = mol.copy()
mut_min.mutateResidue("chain A and resid 95", "TRP", rotamer_mode="best", minimize=True)
moleculekit.openmmtools - INFO - Soft-potential minimization: 59.8 -> 59.3 kcal/mol (delta = -0.5 kcal/mol)
minimize=True is the safest choice for downstream MD setup; without it the rotamer placement is still physically reasonable but may retain small clashes.
Step 4 — Close the gap with ProMod3#
model_gaps() calls ProMod3 inside a Singularity/Apptainer container to reconstruct the missing loop. This cell is skipped in CI because it requires the ProMod3 image:
modeled_p0 = model_gaps(
mol,
sequence="DCSPGIWQLDCTHLEGKVILVAVHVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTVHTDNGSNFTSTTVKAACWWAGIKQEFGIPNKELKKIIGQVRDQAEHLKTAVQMAVFIHNKKRKGGIGGYSAGERIVDIIATDIQ",
segid="P0",
promod_img="/path/to/promod3.sif",
)
model_gaps() aligns sequence against the atoms present in segment P0, writes a temporary FASTA alignment, runs ProMod3’s loop-modelling pipeline inside the container, and returns a new Molecule with the gap filled.
ProMod3 is available as an Apptainer/Singularity image from the OpenStructure project — follow the instructions at https://openstructure.org/promod3/ to obtain promod3.sif. If neither apptainer nor singularity is on the PATH, model_gaps() raises a RuntimeError immediately; there is no fallback gap-filling path.
Step 5 — Wrap up with systemPrepare#
With segmentation and mutation done, pass the result through systemPrepare() to assign protonation states:
pmol, specs = systemPrepare(mut_min, verbose=False)
pmol.numAtoms
moleculekit.tools.preparation - WARNING - Dubious protonation state: the pKa of 1 residues is within 1.0 units of pH 7.4.
moleculekit.tools.preparation - WARNING - Dubious protonation state: HIS 67 A (pKa= 6.87)
2212
The full pipeline — segment, mutate, prepare — is now complete.
Recap#
autoSegment()detects backbone discontinuities from atomic coordinates and assigns a unique segid (and optionally chain letter) per backbone-continuous segment; usefields=("chain", "segid")to keep both fields consistent.mutateResidue()withselandnewresswaps a residue’s sidechain using Dunbrack rotamer selection:rotamer_mode="best"minimises VdW clashes against neighbours,rotamer_mode="random"samples by probability for speed. Addminimize=Trueto relax residual strain with OpenMM.model_gaps()fills missing residues by sequence using the ProMod3 loop-modelling engine — but it requires the ProMod3 Singularity image; there is no fallback.
Next#
Back to the system-prep index