---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Build a bicyclic peptide

**You will learn:** how to build a peptide whose backbone is threaded onto a small-molecule **scaffold** through three side-chain crosslinks, exemplified by PDB `8QFZ` chain B - a 12-mer cyclic peptide whose three cysteines are thioether-linked to an LFI scaffold.

**Prerequisites:**
- HTMD installed.
- You've worked through {doc}`Build a stapled peptide <04-stapled-peptide>` - this tutorial extends the same scaffolded-NCAA pattern.

```{note}
The workflow below is **identical** to {doc}`Build a protein with a ligand <02-protein-ligand>` - the only change is the single SMILES you pass to {py:meth}`~moleculekit.molecule.Molecule.templateResidueFromSmiles` (for the LFI scaffold). {py:func}`~moleculekit.tools.nonstandard_residues.detectNonStandardResidues` reads the three `SG-Cn` thioether bonds and the three CYS chain-positions from the input structure's connectivity on its own, and {py:func}`~htmd.builder.nonstandard.parameterizeFromSpecs` emits a `custombonds` list and topology / parameter files which you still pass explicitly to {py:func}`~htmd.builder.amber.build` — same plumbing as tutorial 02.
```

## What the bicycle is

PDB `8QFZ` chain B is a 12-mer peptide containing three cysteine residues - **CYS11 (N-terminal), CYS17 (mid-chain), CYS22 (C-terminal)** - whose sulphur atoms (`SG`) are each thioether-bonded to one of the three anchor carbons (`C10`, `C11`, `C12`) of a small heterocyclic scaffold residue called **`LFI`** (1,3,5-tris(3-bromopropanoyl)hexahydro-1,3,5-triazine). Each `SG-Cn` closure replaces an LFI bromine, so the templating SMILES carries the unbound form with the three `Br` leaving groups.

The combined topology makes the peptide **bicyclic**: the backbone plus the three SG-Cn crosslinks define two non-trivial loops. In the build flow this means:

1. {py:func}`~moleculekit.tools.nonstandard_residues.detectNonStandardResidues` emits one {py:class}`~moleculekit.tools.nonstandard_residues.ScaffoldSpec` (for `LFI`) plus three {py:class}`~moleculekit.tools.nonstandard_residues.ChainResidueSpec` entries (for the three `CYS`), each with a *different* `new_resname` because the three CYS sit in three distinct chain-position buckets (N-terminal, mid-chain, C-terminal).
2. {py:func}`~htmd.builder.nonstandard.parameterizeFromSpecs` produces one `LFI.cif` for the scaffold plus one `.prepi` per CYS bucket, and three entries in `custombonds` - one per `SG-Cn` thioether closure.

```{note}
This tutorial **skips solvation and ionisation** so the build runs in seconds and the focus stays on the scaffolding. For a production run, solvate first with {py:func}`~htmd.builder.solvate.solvate` (and keep `ionize=True` on the build). If you want implicit-solvent dynamics downstream, pass `gbsa=True` to {py:func}`~htmd.builder.amber.build` — this only sets the GB-compatible radii on the prmtop; the GB model itself is enabled by the MD engine at run time.
```

## Setup

```{code-cell} python
from moleculekit.molecule import Molecule
from moleculekit.tools.autosegment import autoSegment
from moleculekit.tools.nonstandard_residues import detectNonStandardResidues
from moleculekit.tools.preparation import systemPrepare
from htmd.builder import amber
from htmd.builder.nonstandard import parameterizeFromSpecs
```

```{code-cell} python
:tags: [remove-input]
from acellera_docs_theme.molstar import show3d
```

## Step 1 - Load and segment

```{code-cell} python
mol = Molecule("8QFZ")
mol.filter("chain B")
mol = autoSegment(mol, fields=("segid", "chain"))
```

```{code-cell} python
:tags: [remove-input]
show3d(mol)
```

## Step 2 - Detect

```{code-cell} python
specs = detectNonStandardResidues(mol)
for spec in specs:
    print(spec)
```

You should see one `ScaffoldSpec(resname='LFI', ...)` and three `ChainResidueSpec(resname='CYS', ..., new_resname='XX*')` entries. The three CYS residues each get a **different** `new_resname` so the downstream parameterization can write a distinct topology per residue. Two reasons for the rename:

1. **They're not canonical anymore.** Each one carries an extra `SG-Cn` thioether bond to the scaffold, so the standard ff14SB `CYS` template no longer matches - the sulphur valence is different, and ff14SB wouldn't know what to do at the new bond. Renaming to a non-canonical bucket lifts these residues out of the built-in `CYS` library and lets `parameterizeFromSpecs` write a per-residue topology that combines ff14SB backbone types with GAFF2 sulphur-side typing.
2. **Each sits in a different chain position.** `CYS11` is N-terminal, `CYS17` is mid-chain, `CYS22` is C-terminal - so the backbone hydrogen counts and formal charges differ between them. The detect step deduplicates by `(resname, anchor_atom, partner_resname, is_n_term, is_c_term)`; giving the three CYS the same `new_resname` would collapse them into one bucket and lose the chain-position distinction.

The `HG` hydrogen on each CYS sulphur is dropped at build time by `amber.build` (the CYS anchor variants that ship with HTMD's bond library have no `HG`); the rename plus the `custombonds` entry are what tell the builder which CYS variant to apply.

## Step 3 - Template the LFI scaffold from SMILES

```{code-cell} python
LFI_SMILES = "C1N(CN(CN1C(=O)CCBr)C(=O)CCBr)C(=O)CCBr"
mol.templateResidueFromSmiles("resname LFI", LFI_SMILES, addHs=True)
```

The SMILES describes the **unbound LFI** with all three bromopropanoyl arms intact. `templateResidueFromSmiles` matches the central triazinane skeleton against the residue in the structure, sets bond orders and formal charges on the matched atoms, then **strips the three unmatched terminal `Br` atoms** automatically - the same mechanism that strips a terminal `-OH` for mid-chain amino acids. What remains is the bound LFI form with three carbons (`C10`, `C11`, `C12`) primed for the thioether closures.

The three CYS residues are canonical and don't need a SMILES template.

## Step 4 - Prepare

```{code-cell} python
prepared, specs = systemPrepare(mol, pH=7.4, detect_specs=specs)
```

PDB2PQR protonates the canonical residues normally. The three CYS specs from step 2 carry the rename + `HG`-drop instructions, which `systemPrepare` applies at the end of the prep. The bond-capture mechanism preserves the three pre-existing `SG-Cn` bonds across the PDB2PQR pass.

## Step 5 - Parameterize

```{code-cell} python
out = parameterizeFromSpecs(
    specs,
    prepared,
    outdir="./params",
    charge_method="gasteiger",
)
print(out)
```

`parameterizeFromSpecs` runs antechamber once per cluster and emits:

- One `LFI.cif` for the scaffold (full GAFF2 typing).
- Three `.prepi` files, one per CYS bucket (each carries ff14SB types on the standard backbone + canonical sidechain atoms, with the sulphur side staying GAFF2 to match the scaffold).
- Three `custombonds` entries - one per `SG-Cn` thioether closure.

The `.frcmod` files include **cross-force-field bond terms** that bridge ff14SB backbone atoms (`N`, `CT`, `C`, `O`, ...) to GAFF2 sidechain types (lowercase atom types) so tLeap can resolve the canonical-to-scaffold junctions.

## Step 6 - Build

```{code-cell} python
amber.build(
    prepared,
    outdir="./build",
    custombonds=out.custombonds,
    topo=out.topo_paths,
    param=out.frcmod_paths,
    caps={"P0": ("none", "none")},
    ionize=False,
)
```

`caps={"P0": ("none", "none")}` disables auto-capping on the peptide segment: the N-terminal CYS is bonded to LFI (no free amine for an ACE cap to attach to) and the C-terminal CYS already carries `OXT` from its template.

## Step 7 - Verify

```{code-cell} python
import numpy as np
built = Molecule("./build/structure.prmtop")
built.read("./build/structure.pdb")

sg_idxs = np.where(built.name == "SG")[0]
lfi_c_idxs = np.where(
    (built.resname == "LFI") & np.isin(built.name, ["C10", "C11", "C12"])
)[0]
n_thioether = sum(
    1 for sg in sg_idxs for c in lfi_c_idxs if built.hasBond(int(sg), int(c))[0]
)
print("SG-Cn thioether bonds:", n_thioether)        # should be 3
```

## Gotchas

- Pass the **unbound** scaffold SMILES (with the leaving groups). `templateResidueFromSmiles` recognises the leaving-group atoms as terminal heavy atoms unmatched by the structure and strips them automatically; templating the *bound* form (no leaving groups) doesn't have enough atoms to disambiguate the MCS match.
- `detectNonStandardResidues` buckets canonical anchors by `(resname, anchor_atom, partner_resname, is_n_term, is_c_term)`. With three CYS in three distinct chain positions you get **three** distinct buckets — and downstream `parameterizeFromSpecs` writes one `.prepi` per bucket. Don't be surprised by three CYS-derived topology files; the cluster pipeline needs them distinct because the backbone protonation differs at each position.
- The same flow handles other "constrained peptide" scaffolds (TATA, TBMT, ...) - swap `LFI_SMILES` for the scaffold's SMILES and let detect emit the appropriate `ScaffoldSpec` + canonical-anchor renames.

## See also

- {doc}`Build a stapled peptide <04-stapled-peptide>` - the simpler case of a direct NCAA-NCAA crosslink with no scaffold.
- {doc}`Build a cyclic peptide <05-cyclic-peptide>` - head-to-tail cyclisation as another macrocycle topology.
- {doc}`System-building overview <../../explanation/system-building>` - the conceptual map of the whole stack.