Build a bicyclic peptide#

You will learn: how to build a peptide whose backbone is threaded onto a small-molecule scaffold through three side-chain crosslinks, exemplified by PDB 8QFZ chain B - a 12-mer cyclic peptide whose three cysteines are thioether-linked to an LFI scaffold.

Prerequisites:

  • HTMD installed.

  • You’ve worked through Build a stapled peptide - this tutorial extends the same scaffolded-NCAA pattern.

Note

The workflow below is identical to Build a protein with a ligand - the only change is the single SMILES you pass to templateResidueFromSmiles() (for the LFI scaffold). detectNonStandardResidues() reads the three SG-Cn thioether bonds and the three CYS chain-positions from the input structure’s connectivity on its own, and parameterizeFromSpecs() emits a custombonds list and topology / parameter files which you still pass explicitly to build() — same plumbing as tutorial 02.

What the bicycle is#

PDB 8QFZ chain B is a 12-mer peptide containing three cysteine residues - CYS11 (N-terminal), CYS17 (mid-chain), CYS22 (C-terminal) - whose sulphur atoms (SG) are each thioether-bonded to one of the three anchor carbons (C10, C11, C12) of a small heterocyclic scaffold residue called LFI (1,3,5-tris(3-bromopropanoyl)hexahydro-1,3,5-triazine). Each SG-Cn closure replaces an LFI bromine, so the templating SMILES carries the unbound form with the three Br leaving groups.

The combined topology makes the peptide bicyclic: the backbone plus the three SG-Cn crosslinks define two non-trivial loops. In the build flow this means:

  1. detectNonStandardResidues() emits one ScaffoldSpec (for LFI) plus three ChainResidueSpec entries (for the three CYS), each with a different new_resname because the three CYS sit in three distinct chain-position buckets (N-terminal, mid-chain, C-terminal).

  2. parameterizeFromSpecs() produces one LFI.cif for the scaffold plus one .prepi per CYS bucket, and three entries in custombonds - one per SG-Cn thioether closure.

Note

This tutorial skips solvation and ionisation so the build runs in seconds and the focus stays on the scaffolding. For a production run, solvate first with solvate() (and keep ionize=True on the build). If you want implicit-solvent dynamics downstream, pass gbsa=True to build() — this only sets the GB-compatible radii on the prmtop; the GB model itself is enabled by the MD engine at run time.

Setup#

from moleculekit.molecule import Molecule
from moleculekit.tools.autosegment import autoSegment
from moleculekit.tools.nonstandard_residues import detectNonStandardResidues
from moleculekit.tools.preparation import systemPrepare
from htmd.builder import amber
from htmd.builder.nonstandard import parameterizeFromSpecs
Copyright by Acellera Ltd. By executing you are accepting the License. In order to register, run htmd_register on your terminal.
The registration information must be valid so that it might be verified.
rdkit - INFO - Enabling RDKit 2026.03.3 jupyter extensions

Step 1 - Load and segment#

mol = Molecule("8QFZ")
mol.filter("chain B")
mol = autoSegment(mol, fields=("segid", "chain"))
moleculekit.molecule - INFO - Removed 958 atoms. 139 atoms remaining in the molecule.

Step 2 - Detect#

specs = detectNonStandardResidues(mol)
for spec in specs:
    print(spec)
ChainResidueSpec(resname='CYS', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fd15889d7c0>
UniqueResidueID<resname: 'CYS', chain: 'A', resid: 11, insertion: '', segid: 'P0'>, new_resname='XX1', anchor_atom='SG', is_n_term=True, is_c_term=False)
ChainResidueSpec(resname='CYS', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fd1064c50a0>
UniqueResidueID<resname: 'CYS', chain: 'A', resid: 17, insertion: '', segid: 'P0'>, new_resname='XX2', anchor_atom='SG', is_n_term=False, is_c_term=False)
ChainResidueSpec(resname='CYS', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fd1064c5a30>
UniqueResidueID<resname: 'CYS', chain: 'A', resid: 22, insertion: '', segid: 'P0'>, new_resname='XX3', anchor_atom='SG', is_n_term=False, is_c_term=True)
ScaffoldSpec(resname='LFI', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fd1064c5a60>
UniqueResidueID<resname: 'LFI', chain: 'C', resid: 101, insertion: '', segid: 'P2'>)

You should see one ScaffoldSpec(resname='LFI', ...) and three ChainResidueSpec(resname='CYS', ..., new_resname='XX*') entries. The three CYS residues each get a different new_resname so the downstream parameterization can write a distinct topology per residue. Two reasons for the rename:

  1. They’re not canonical anymore. Each one carries an extra SG-Cn thioether bond to the scaffold, so the standard ff14SB CYS template no longer matches - the sulphur valence is different, and ff14SB wouldn’t know what to do at the new bond. Renaming to a non-canonical bucket lifts these residues out of the built-in CYS library and lets parameterizeFromSpecs write a per-residue topology that combines ff14SB backbone types with GAFF2 sulphur-side typing.

  2. Each sits in a different chain position. CYS11 is N-terminal, CYS17 is mid-chain, CYS22 is C-terminal - so the backbone hydrogen counts and formal charges differ between them. The detect step deduplicates by (resname, anchor_atom, partner_resname, is_n_term, is_c_term); giving the three CYS the same new_resname would collapse them into one bucket and lose the chain-position distinction.

The HG hydrogen on each CYS sulphur is dropped at build time by amber.build (the CYS anchor variants that ship with HTMD’s bond library have no HG); the rename plus the custombonds entry are what tell the builder which CYS variant to apply.

Step 3 - Template the LFI scaffold from SMILES#

LFI_SMILES = "C1N(CN(CN1C(=O)CCBr)C(=O)CCBr)C(=O)CCBr"
mol.templateResidueFromSmiles("resname LFI", LFI_SMILES, addHs=True)
moleculekit.rdkittools - INFO - Stripped unmatched terminal heavy atoms from SMILES template (e.g. leaving group displaced by a covalent link, or carboxyl -OH on a non-terminal amino acid). Modified SMILES: 'CCC(=O)N1CN(C(=O)CC)CN(C(=O)CC)C1'

The SMILES describes the unbound LFI with all three bromopropanoyl arms intact. templateResidueFromSmiles matches the central triazinane skeleton against the residue in the structure, sets bond orders and formal charges on the matched atoms, then strips the three unmatched terminal Br atoms automatically - the same mechanism that strips a terminal -OH for mid-chain amino acids. What remains is the bound LFI form with three carbons (C10, C11, C12) primed for the thioether closures.

The three CYS residues are canonical and don’t need a SMILES template.

Step 4 - Prepare#

prepared, specs = systemPrepare(mol, pH=7.4, detect_specs=specs)
---- Molecule chain report ----
Chain A:
    First residue:  XX1    11  
    Final residue:  XX3    22  
Chain B:
    First residue:  HOH   201  
    Final residue:  HOH   214  
Chain C:
          residue:  LFI   101  
---- End of chain report ----
moleculekit.rdkittools - INFO - Stripped unmatched terminal heavy atoms from SMILES template (e.g. leaving group displaced by a covalent link, or carboxyl -OH on a non-terminal amino acid). Modified SMILES: 'N[C@H](C=O)CS'
moleculekit.rdkittools - INFO - Stripped unmatched terminal heavy atoms from SMILES template (e.g. leaving group displaced by a covalent link, or carboxyl -OH on a non-terminal amino acid). Modified SMILES: 'N[C@H](C=O)CS'
moleculekit.rdkittools - INFO - Stripped unmatched terminal heavy atoms from SMILES template (e.g. leaving group displaced by a covalent link, or carboxyl -OH on a non-terminal amino acid). Modified SMILES: 'N[C@H](C=O)CS'
moleculekit.tools.preparation - INFO - Found 3 covalent bonds from protein to non-protein molecules.
moleculekit.tools.preparation - INFO - Freezing protein residue XX1:A:11 bonded to non-protein molecule LFI:C:101
moleculekit.tools.preparation - INFO - Freezing protein residue XX2:A:17 bonded to non-protein molecule LFI:C:101
moleculekit.tools.preparation - INFO - Freezing protein residue XX3:A:22 bonded to non-protein molecule LFI:C:101
moleculekit.tools.preparation - WARNING - The following residues have not been optimized: LFI, XX3
moleculekit.tools.preparation - INFO - Modified residue HIS    12 A to HID

PDB2PQR protonates the canonical residues normally. The three CYS specs from step 2 carry the rename + HG-drop instructions, which systemPrepare applies at the end of the prep. The bond-capture mechanism preserves the three pre-existing SG-Cn bonds across the PDB2PQR pass.

Step 5 - Parameterize#

out = parameterizeFromSpecs(
    specs,
    prepared,
    outdir="./params",
    charge_method="gasteiger",
)
print(out)
ClusterOutputs(topo_paths=['./params/cluster_000/XX1.prepi', './params/cluster_000/XX2.prepi', './params/cluster_000/XX3.prepi', './params/cluster_000/LFI.cif'], frcmod_paths=['./params/cluster_000/XX1.frcmod', './params/cluster_000/XX2.frcmod', './params/cluster_000/XX3.frcmod', './params/cluster_000/LFI.frcmod'], custombonds=[('segid "P0" and chain "A" and resid 11 and name "SG"', 'segid "P2" and chain "C" and resid 101 and name "C12"'), ('segid "P0" and chain "A" and resid 17 and name "SG"', 'segid "P2" and chain "C" and resid 101 and name "C10"'), ('segid "P0" and chain "A" and resid 22 and name "SG"', 'segid "P2" and chain "C" and resid 101 and name "C11"')], xml_paths=['./params/gaff_combined.xml'])

parameterizeFromSpecs runs antechamber once per cluster and emits:

  • One LFI.cif for the scaffold (full GAFF2 typing).

  • Three .prepi files, one per CYS bucket (each carries ff14SB types on the standard backbone + canonical sidechain atoms, with the sulphur side staying GAFF2 to match the scaffold).

  • Three custombonds entries - one per SG-Cn thioether closure.

The .frcmod files include cross-force-field bond terms that bridge ff14SB backbone atoms (N, CT, C, O, …) to GAFF2 sidechain types (lowercase atom types) so tLeap can resolve the canonical-to-scaffold junctions.

Step 6 - Build#

amber.build(
    prepared,
    outdir="./build",
    custombonds=out.custombonds,
    topo=out.topo_paths,
    param=out.frcmod_paths,
    caps={"P0": ("none", "none")},
    ionize=False,
)
htmd.builder.amber - INFO - Detecting disulfide bonds.
htmd.builder.amber - INFO - Starting the build.
htmd.builder.amber - INFO - Finished building.
moleculekit.tools.sequencestructuralalignment - INFO - Alignment #0 was done on 12 residues: 1-12
<moleculekit.molecule.Molecule object at 0x7fd102259f10>
Molecule with 275 atoms and 1 frames
Atom field - altloc shape: (275,)
Atom field - atomtype shape: (275,)
Atom field - beta shape: (275,)
Atom field - chain shape: (275,)
Atom field - charge shape: (275,)
Atom field - coords shape: (275, 3, 1)
Atom field - element shape: (275,)
Atom field - formalcharge shape: (275,)
Atom field - insertion shape: (275,)
Atom field - masses shape: (275,)
Atom field - name shape: (275,)
Atom field - occupancy shape: (275,)
Atom field - record shape: (275,)
Atom field - resid shape: (275,)
Atom field - resname shape: (275,)
Atom field - segid shape: (275,)
Atom field - serial shape: (275,)
Atom field - virtualsite shape: (275,)
angles shape: (430, 3)
bonds shape: (283, 2)
bondtype shape: (283,)
box shape: (3, 1)
boxangles shape: (3, 1)
crystalinfo: None
dihedrals shape: (909, 4)
fileloc shape: (1, 2)
impropers shape: (67, 4)
reps: 
step shape: (1,)
time shape: (1,)
topoloc: /tmp/tmpkpq4sreb/build/structure.prmtop
viewname: structure.prmtop

caps={"P0": ("none", "none")} disables auto-capping on the peptide segment: the N-terminal CYS is bonded to LFI (no free amine for an ACE cap to attach to) and the C-terminal CYS already carries OXT from its template.

Step 7 - Verify#

import numpy as np
built = Molecule("./build/structure.prmtop")
built.read("./build/structure.pdb")

sg_idxs = np.where(built.name == "SG")[0]
lfi_c_idxs = np.where(
    (built.resname == "LFI") & np.isin(built.name, ["C10", "C11", "C12"])
)[0]
n_thioether = sum(
    1 for sg in sg_idxs for c in lfi_c_idxs if built.hasBond(int(sg), int(c))[0]
)
print("SG-Cn thioether bonds:", n_thioether)        # should be 3
SG-Cn thioether bonds: 3

Gotchas#

  • Pass the unbound scaffold SMILES (with the leaving groups). templateResidueFromSmiles recognises the leaving-group atoms as terminal heavy atoms unmatched by the structure and strips them automatically; templating the bound form (no leaving groups) doesn’t have enough atoms to disambiguate the MCS match.

  • detectNonStandardResidues buckets canonical anchors by (resname, anchor_atom, partner_resname, is_n_term, is_c_term). With three CYS in three distinct chain positions you get three distinct buckets — and downstream parameterizeFromSpecs writes one .prepi per bucket. Don’t be surprised by three CYS-derived topology files; the cluster pipeline needs them distinct because the backbone protonation differs at each position.

  • The same flow handles other “constrained peptide” scaffolds (TATA, TBMT, …) - swap LFI_SMILES for the scaffold’s SMILES and let detect emit the appropriate ScaffoldSpec + canonical-anchor renames.

See also#