---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Build a membrane-embedded protein

**You will learn:** how to fetch a bilayer-aligned protein from the [OPM database](https://opm.phar.umich.edu/), wrap a custom-composition lipid bilayer around it, and produce an AMBER `prmtop` for the full membrane system - all in one script.

**Prerequisites:**
- HTMD installed.
- You've worked through {doc}`Build a protein with a ligand <02-protein-ligand>` - this tutorial reuses the same NCAA detection / templating / parameterization pipeline.

## The system

PDB `5VBL` is the **apelin receptor** (a GPCR, chain B) bound to a **peptide agonist** (chain A) containing five chain-resident non-canonical amino acids - `200`, `ALC`, `HRG`, `NLE`, `OIC`. The structure also carries a co-crystallised `OLC` (monoolein, a monoacylglycerol lipid from crystallization), which we discard before wrapping our own bilayer. The receptor is membrane-embedded - OPM ships it with a fitted bilayer thickness of 33.4 Å - so we'll wrap a mixed **POPC / cholesterol** bilayer around it.

```{note}
This tutorial sets `minimize=0` and `equilibrate=0` on {py:func}`~htmd.membranebuilder.build_membrane.buildMembrane` so the tutorial executes in seconds. The resulting membrane carries only the **initial lipid placement** - lipid tails will have clashes and the bilayer is not relaxed. For a production system, set `minimize` to a few hundred steps and `equilibrate` to a few nanoseconds (and run on `platform="CUDA"` if you have a GPU).
```

## Setup

```{code-cell} python
from moleculekit.molecule import Molecule
from moleculekit.opm import get_opm_pdb
from moleculekit.tools.autosegment import autoSegment
from moleculekit.tools.nonstandard_residues import detectNonStandardResidues
from moleculekit.tools.preparation import systemPrepare
from htmd.builder import amber
from htmd.builder.nonstandard import parameterizeFromSpecs
from htmd.builder.solvate import solvate
from htmd.membranebuilder.build_membrane import buildMembrane
import numpy as np
```

```{code-cell} python
:tags: [remove-input]
from acellera_docs_theme.molstar import show3d
```

## Step 1 - Load the RCSB structure and align it to OPM

```{code-cell} python
mol = Molecule("5VBL")
mol.remove("water", _logger=False)

ref, thickness = get_opm_pdb("5VBL", validateElements=False)
mol.align("protein and name CA", refmol=ref, mode="structure")

print(f"bilayer thickness: {thickness} Å, atoms after removing water: {mol.numAtoms}")
```

```{code-cell} python
:tags: [remove-input]
show3d(mol)
```

The workflow is: load the structure you actually want to simulate (from RCSB or a local file), then rotate it onto OPM's membrane-aligned orientation. {py:func}`~moleculekit.opm.get_opm_pdb` downloads OPM's pre-aligned copy of the PDB - already rotated/translated so the bilayer center sits at `z=0` and the membrane normal points along `+z` (HTMD strips the `DUM` membrane-marker pseudo-atoms before returning, since the default `keep=False`). {py:meth}`~moleculekit.molecule.Molecule.align` with `mode="structure"` runs TM-align between the two, so the two structures don't need to have identical atom counts (OPM often trims residues that don't sit in the bilayer plane). The returned `thickness` is the OPM-fit double-leaflet thickness (33.4 Å for 5VBL). This is the alignment {py:func}`~htmd.membranebuilder.build_membrane.buildMembrane` expects from its `solute=` argument.

## Step 2 - Detect the non-standard residues

```{code-cell} python
mol = autoSegment(mol, fields=("segid", "chain"))

specs = detectNonStandardResidues(mol)
for spec in specs:
    print(spec)
```

Look at what detect found:

- Five `ChainResidueSpec` entries for the chain-resident NCAAs - `HRG`, `ALC`, `OIC`, `NLE`, and `200` (the C-terminal one). These are exactly the NCAAs we expect from the peptide's chemistry.
- **Two extra `ChainResidueSpec` entries for canonical residues** - `GLU` (resid 10) renamed to `XX1` with `anchor_atom='CD'`, and `LYS` (resid 13) renamed to `XX2` with `anchor_atom='NZ'`. Those are the canonical residues at the **two ends of an isopeptide bond**: the apelin agonist is a side-chain macrocycle closed by a `GLU.CD - LYS.NZ` γ-glutamyl / ε-lysyl crosslink. Detect saw the inter-residue bond on the side-chain atoms (not the backbone) and emitted the rename + anchor automatically, so the downstream parameterization gives each end of the isopeptide its own per-residue topology. The `custombonds` list emitted by {py:func}`~htmd.builder.nonstandard.parameterizeFromSpecs` will close the ring at build time.
- One `LigandSpec` for the free `OLC` lipid co-crystallised with the protein.

`OLC` is monoolein, a crystallization lipid — not part of the system we want to simulate, and we're about to build our own POPC/cholesterol bilayer — so drop it from both the structure and the spec list:

```{code-cell} python
mol.remove("resname OLC", _logger=False)
specs = [s for s in specs if s.resname != "OLC"]
```

Now template every chain-resident NCAA - but **not** `GLU` / `LYS` (they're canonical residues whose only modification is the inter-side-chain bond, which detect handles entirely from connectivity):

```{code-cell} python
smiles = {
    "200": "c1cc(ccc1C[C@@H](C(=O)O)N)Cl",
    "ALC": "C1CCC(CC1)C[C@@H](C=O)N",
    "HRG": "C(CCNC(=N)N)C[C@@H](C=O)N",
    "NLE": "CCCC[C@@H](C=O)N",
    "OIC": "C1CC[C@H]2[C@@H](C1)C[C@H](N2)C=O",
}
for resname, smi in smiles.items():
    mol.templateResidueFromSmiles(f'resname "{resname}"', smi, addHs=True)

prepared, specs = systemPrepare(mol, pH=7.4, detect_specs=specs)

out = parameterizeFromSpecs(
    specs, prepared, outdir="./params", charge_method="gasteiger"
)
print(out)
```

## Step 3 - Build the membrane around the protein

```{code-cell} python
memb = buildMembrane(
    [80, 80],
    ratioupper={"popc": 0.7, "chl1": 0.3},
    ratiolower={"popc": 0.7, "chl1": 0.3},
    minimize=0,
    equilibrate=0,
    platform="CPU",
    outdir="./memb",
    solute=prepared,
)
print(f"membrane mol: {memb.numAtoms} atoms")
```

What the arguments do:

| Argument | Effect |
| --- | --- |
| `[80, 80]` | XY footprint in Å. Make it ~10 Å wider than the protein in each direction so the bilayer fully surrounds it. |
| `ratioupper` / `ratiolower` | Per-leaflet lipid mole fractions. Different upper/lower compositions are allowed - useful for asymmetric bilayers. Run {py:func}`~htmd.membranebuilder.build_membrane.listLipids` to see what's available. |
| `solute=prepared` | The OPM-aligned protein. {py:func}`~htmd.membranebuilder.build_membrane.buildMembrane` carves the **membrane-spanning** protein footprint out of the lipid placement (only protein heavy atoms inside the bilayer-thickness slab are used; extramembrane domains don't affect the carve) so lipids and protein don't overlap from the start. |
| `minimize=0`, `equilibrate=0` | Skip the OpenMM relaxation step - only the initial lipid placement runs. Set both to positive values for production. |
| `platform="CPU"` | OpenMM platform for the lipid placement step. Switch to `"CUDA"` if you have a GPU. |

The returned `memb` is the **lipid bilayer + a water buffer above and below** - the protein is *not* included in the return; we'll merge them in the next step.

## Step 4 - Merge protein + membrane (lipids only)

```{code-cell} python
memb.remove("water", _logger=False)

system = Molecule()
system.append(prepared)
system.append(memb)
print(f"merged system: {system.numAtoms} atoms")
```

```{code-cell} python
:tags: [remove-input]
show3d(system)
```

Two things going on here:

- `buildMembrane` already carved the protein footprint out of the lipid placement (that's what `solute=prepared` does in the previous step), so the membrane and protein occupy disjoint XY space and no additional clash-removal is needed - a direct append is enough.
- We drop `buildMembrane`'s own water layer (the builder solvates the bilayer with a `waterbuff` z-buffer above and below — default 20 Å — using an internal call to `solvate`) and re-solvate the full system in the next step so the water box covers the protein's extramembrane regions too.

## Step 5 - Solvate with a membrane-aware box

```{code-cell} python
lipid_mask = system.atomselect("lipid")
xy_min = system.coords[lipid_mask, :2, 0].min(axis=0)
xy_max = system.coords[lipid_mask, :2, 0].max(axis=0)
z_min = system.coords[:, 2, 0].min() - 15
z_max = system.coords[:, 2, 0].max() + 15

system = solvate(
    system,
    minmax=[[xy_min[0], xy_min[1], z_min],
            [xy_max[0], xy_max[1], z_max]],
)
```

The XY box matches the lipid extent (so water doesn't poke out past the bilayer edge), and Z extends 15 Å above and below the tallest / lowest atom in the system - enough to fully solvate the protein's intracellular and extracellular domains. `solvate` will only place waters where there's space - no waters appear inside the bilayer.

```{code-cell} python
:tags: [remove-input]
show3d(system)
```

## Step 6 - Build under AMBER

```{code-cell} python
amber.build(
    system,
    outdir="./build",
    custombonds=out.custombonds,
    topo=out.topo_paths,
    param=out.frcmod_paths,
    caps={"P0": ("none", "none")},
    ionize=True,
    saltconc=0.15,
)
```

A couple of notes on the arguments:

- `caps={"P0": ("none", "none")}` - the tuple is ordered `(N-cap, C-cap)`. The peptide inhibitor's C-terminal NCAA (`200`) carries its own `OXT`, so the C-cap is suppressed; we set both to `"none"` here for symmetry, but only the C-cap is strictly required. Real cap names elsewhere are uppercase (`ACE`, `NME`) — the library lookup is case-sensitive. If your autoSegment assigns a different segid to the inhibitor, swap `"P0"` for the correct value (inspect `set(prepared.segid)`).
- `ionize=True, saltconc=0.15` - tLeap places counter-ions and salt in the waters added by `solvate` in the previous step.

The output `./build/structure.prmtop` + `structure.pdb` is now ready for an equilibration protocol (e.g. {py:func}`acemd.protocols.setup_equilibration`).

## Gotchas

- {py:func}`~htmd.membranebuilder.build_membrane.buildMembrane` with `solute=` assumes the protein already has the bilayer centered at `z=0` and the membrane normal pointing along `+z`. If you're not sure your structure is in that frame, align it onto an OPM reference with {py:func}`~moleculekit.opm.get_opm_pdb` (for PDBs present in OPM) or {py:func}`~moleculekit.opm.align_to_opm` (for new structures) first.
- {py:func}`~htmd.membranebuilder.build_membrane.buildMembrane` returns the bilayer **without** the protein; merge them explicitly. Same goes for any cofactors or co-crystallised ligands.
- `minimize=0, equilibrate=0` produces a raw, unrelaxed membrane straight out of the initial lipid placement. Re-run {py:func}`~htmd.membranebuilder.build_membrane.buildMembrane` with `minimize=300, equilibrate=1` (1 ns) before any production simulation; if your machine has a GPU also switch `platform="CUDA"`.
- The default lipid library ({py:func}`~htmd.membranebuilder.build_membrane.listLipids`) covers the **PA / PC / PE / PG / PS** head-group families across the **DA / DL / DM / DO / DP / DS / POP / SDP** tail variants (so POPC, POPE, POPG, POPS, POPA, DOPC, DOPE, ..., plus cholesterol `chl1`). There's no `POPI` and no sphingolipids in the shipped library. Mixed-composition leaflets work; if you need a lipid that isn't listed, add a custom lipid CIF to the lipid directory (or open an issue upstream).

## See also

- {doc}`Build a protein with a ligand <02-protein-ligand>` - the NCAA workflow this tutorial reuses.
- {doc}`System-building overview <../../explanation/system-building>` - where the membrane builder fits in the wider stack.