Build a membrane-embedded protein#

You will learn: how to fetch a bilayer-aligned protein from the OPM database, wrap a custom-composition lipid bilayer around it, and produce an AMBER prmtop for the full membrane system - all in one script.

Prerequisites:

  • HTMD installed.

  • You’ve worked through Build a protein with a ligand - this tutorial reuses the same NCAA detection / templating / parameterization pipeline.

The system#

PDB 5VBL is the apelin receptor (a GPCR, chain B) bound to a peptide agonist (chain A) containing five chain-resident non-canonical amino acids - 200, ALC, HRG, NLE, OIC. The structure also carries a co-crystallised OLC (monoolein, a monoacylglycerol lipid from crystallization), which we discard before wrapping our own bilayer. The receptor is membrane-embedded - OPM ships it with a fitted bilayer thickness of 33.4 Å - so we’ll wrap a mixed POPC / cholesterol bilayer around it.

Note

This tutorial sets minimize=0 and equilibrate=0 on buildMembrane() so the tutorial executes in seconds. The resulting membrane carries only the initial lipid placement - lipid tails will have clashes and the bilayer is not relaxed. For a production system, set minimize to a few hundred steps and equilibrate to a few nanoseconds (and run on platform="CUDA" if you have a GPU).

Setup#

from moleculekit.molecule import Molecule
from moleculekit.opm import get_opm_pdb
from moleculekit.tools.autosegment import autoSegment
from moleculekit.tools.nonstandard_residues import detectNonStandardResidues
from moleculekit.tools.preparation import systemPrepare
from htmd.builder import amber
from htmd.builder.nonstandard import parameterizeFromSpecs
from htmd.builder.solvate import solvate
from htmd.membranebuilder.build_membrane import buildMembrane
import numpy as np
Copyright by Acellera Ltd. By executing you are accepting the License. In order to register, run htmd_register on your terminal.
The registration information must be valid so that it might be verified.
rdkit - INFO - Enabling RDKit 2026.03.3 jupyter extensions

Step 1 - Load the RCSB structure and align it to OPM#

mol = Molecule("5VBL")
mol.remove("water", _logger=False)

ref, thickness = get_opm_pdb("5VBL", validateElements=False)
mol.align("protein and name CA", refmol=ref, mode="structure")

print(f"bilayer thickness: {thickness} Å, atoms after removing water: {mol.numAtoms}")
bilayer thickness: 33.4 Å, atoms after removing water: 2949
moleculekit.molecule - INFO - Structural alignement gave TM-Scores of [0.99999999] and local RMSDs of [0.00049815]

The workflow is: load the structure you actually want to simulate (from RCSB or a local file), then rotate it onto OPM’s membrane-aligned orientation. get_opm_pdb() downloads OPM’s pre-aligned copy of the PDB - already rotated/translated so the bilayer center sits at z=0 and the membrane normal points along +z (HTMD strips the DUM membrane-marker pseudo-atoms before returning, since the default keep=False). align() with mode="structure" runs TM-align between the two, so the two structures don’t need to have identical atom counts (OPM often trims residues that don’t sit in the bilayer plane). The returned thickness is the OPM-fit double-leaflet thickness (33.4 Å for 5VBL). This is the alignment buildMembrane() expects from its solute= argument.

Step 2 - Detect the non-standard residues#

mol = autoSegment(mol, fields=("segid", "chain"))

specs = detectNonStandardResidues(mol)
for spec in specs:
    print(spec)
ChainResidueSpec(resname='HRG', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b780cb00>
UniqueResidueID<resname: 'HRG', chain: 'A', resid: 8, insertion: '', segid: 'P0'>, new_resname=None, anchor_atom=None, is_n_term=False, is_c_term=False)
ChainResidueSpec(resname='ALC', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b780cbc0>
UniqueResidueID<resname: 'ALC', chain: 'A', resid: 9, insertion: '', segid: 'P0'>, new_resname=None, anchor_atom=None, is_n_term=False, is_c_term=False)
ChainResidueSpec(resname='GLU', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b780cbf0>
UniqueResidueID<resname: 'GLU', chain: 'A', resid: 10, insertion: '', segid: 'P0'>, new_resname='XX1', anchor_atom='CD', is_n_term=False, is_c_term=False)
ChainResidueSpec(resname='LYS', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b780cc80>
UniqueResidueID<resname: 'LYS', chain: 'A', resid: 13, insertion: '', segid: 'P0'>, new_resname='XX2', anchor_atom='NZ', is_n_term=False, is_c_term=False)
ChainResidueSpec(resname='OIC', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b780ccb0>
UniqueResidueID<resname: 'OIC', chain: 'A', resid: 14, insertion: '', segid: 'P0'>, new_resname=None, anchor_atom=None, is_n_term=False, is_c_term=False)
ChainResidueSpec(resname='NLE', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b780cce0>
UniqueResidueID<resname: 'NLE', chain: 'A', resid: 15, insertion: '', segid: 'P0'>, new_resname=None, anchor_atom=None, is_n_term=False, is_c_term=False)
ChainResidueSpec(resname='200', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b780cd40>
UniqueResidueID<resname: '200', chain: 'A', resid: 17, insertion: '', segid: 'P0'>, new_resname=None, anchor_atom=None, is_n_term=False, is_c_term=True)
LigandSpec(resname='OLC', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b7831010>
UniqueResidueID<resname: 'OLC', chain: 'D', resid: 1102, insertion: '', segid: 'P3'>)
moleculekit.tools.autosegment - INFO - autoSegment: break before CYS:19:B (after 200:17:A)

Look at what detect found:

  • Five ChainResidueSpec entries for the chain-resident NCAAs - HRG, ALC, OIC, NLE, and 200 (the C-terminal one). These are exactly the NCAAs we expect from the peptide’s chemistry.

  • Two extra ChainResidueSpec entries for canonical residues - GLU (resid 10) renamed to XX1 with anchor_atom='CD', and LYS (resid 13) renamed to XX2 with anchor_atom='NZ'. Those are the canonical residues at the two ends of an isopeptide bond: the apelin agonist is a side-chain macrocycle closed by a GLU.CD - LYS.NZ γ-glutamyl / ε-lysyl crosslink. Detect saw the inter-residue bond on the side-chain atoms (not the backbone) and emitted the rename + anchor automatically, so the downstream parameterization gives each end of the isopeptide its own per-residue topology. The custombonds list emitted by parameterizeFromSpecs() will close the ring at build time.

  • One LigandSpec for the free OLC lipid co-crystallised with the protein.

OLC is monoolein, a crystallization lipid — not part of the system we want to simulate, and we’re about to build our own POPC/cholesterol bilayer — so drop it from both the structure and the spec list:

mol.remove("resname OLC", _logger=False)
specs = [s for s in specs if s.resname != "OLC"]

Now template every chain-resident NCAA - but not GLU / LYS (they’re canonical residues whose only modification is the inter-side-chain bond, which detect handles entirely from connectivity):

smiles = {
    "200": "c1cc(ccc1C[C@@H](C(=O)O)N)Cl",
    "ALC": "C1CCC(CC1)C[C@@H](C=O)N",
    "HRG": "C(CCNC(=N)N)C[C@@H](C=O)N",
    "NLE": "CCCC[C@@H](C=O)N",
    "OIC": "C1CC[C@H]2[C@@H](C1)C[C@H](N2)C=O",
}
for resname, smi in smiles.items():
    mol.templateResidueFromSmiles(f'resname "{resname}"', smi, addHs=True)

prepared, specs = systemPrepare(mol, pH=7.4, detect_specs=specs)

out = parameterizeFromSpecs(
    specs, prepared, outdir="./params", charge_method="gasteiger"
)
print(out)
---- Molecule chain report ----
Chain A:
    First residue:  LYS     1  
    Final residue:  200    17  
Chain B:
    First residue:  CYS    19  
    Final residue:  ARG   330  
Chain C:
          residue:  ZN   1101  
---- End of chain report ----
ClusterOutputs(topo_paths=['./params/cluster_000/XX1.prepi', './params/cluster_000/XX2.prepi', './params/cluster_001/HRG.prepi', './params/cluster_002/ALC.prepi', './params/cluster_003/OIC.prepi', './params/cluster_004/NLE.prepi', './params/cluster_005/200.prepi'], frcmod_paths=['./params/cluster_000/XX1.frcmod', './params/cluster_000/XX2.frcmod', './params/cluster_001/HRG.frcmod', './params/cluster_002/ALC.frcmod', './params/cluster_003/OIC.frcmod', './params/cluster_004/NLE.frcmod', './params/cluster_005/200.frcmod'], custombonds=[('segid "P0" and chain "A" and resid 13 and name "NZ"', 'segid "P0" and chain "A" and resid 10 and name "CD"')], xml_paths=['./params/gaff_combined.xml'])
moleculekit.rdkittools - INFO - Stripped unmatched terminal heavy atoms from SMILES template (e.g. leaving group displaced by a covalent link, or carboxyl -OH on a non-terminal amino acid). Modified SMILES: 'N[C@H](C=O)CCC=O'
moleculekit.rdkittools - INFO - Stripped unmatched terminal heavy atoms from SMILES template (e.g. leaving group displaced by a covalent link, or carboxyl -OH on a non-terminal amino acid). Modified SMILES: 'NCCCC[C@H](N)C=O'
moleculekit.tools.preparation - WARNING - The following residues have not been optimized: ZN
moleculekit.tools.preparation - INFO - Modified residue HIS    11 A to HID
moleculekit.tools.preparation - INFO - Modified residue CYS    19 B to CYX
moleculekit.tools.preparation - INFO - Modified residue ASP    75 B to ASH
moleculekit.tools.preparation - INFO - Modified residue CYS   102 B to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   181 B to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS  1006 B to CYM
moleculekit.tools.preparation - INFO - Modified residue HIS   265 B to HID
moleculekit.tools.preparation - INFO - Modified residue HIS   278 B to HID
moleculekit.tools.preparation - INFO - Modified residue CYS   281 B to CYX
moleculekit.tools.preparation - WARNING - Dubious protonation state: the pKa of 3 residues is within 1.0 units of pH 7.4.
moleculekit.tools.preparation - WARNING - Dubious protonation state:    HIS    11 A (pKa= 6.54)
moleculekit.tools.preparation - WARNING - Dubious protonation state:    ASP    75 B (pKa= 7.78)
moleculekit.tools.preparation - WARNING - Dubious protonation state:    CYS  1039 B (pKa= 7.59)
htmd.builder._ambertools - INFO - Fixed residue 200 atom name Cl -> CL to match the input structure.
htmd.builder._ambertools - INFO - Fixed residue 200 atom name Cl -> CL to match the input structure.

Step 3 - Build the membrane around the protein#

memb = buildMembrane(
    [80, 80],
    ratioupper={"popc": 0.7, "chl1": 0.3},
    ratiolower={"popc": 0.7, "chl1": 0.3},
    minimize=0,
    equilibrate=0,
    platform="CPU",
    outdir="./memb",
    solute=prepared,
)
print(f"membrane mol: {memb.numAtoms} atoms")
membrane mol: 58670 atoms
htmd.membranebuilder.build_membrane - INFO - upper leaflet: solute occupies 24.5% of XY area; placing 86 lipids ({'popc': 60, 'chl1': 26}).
htmd.membranebuilder.build_membrane - INFO - lower leaflet: solute occupies 21.4% of XY area; placing 89 lipids ({'popc': 62, 'chl1': 27}).
htmd.membranebuilder.build_membrane - INFO - Optimized rotation for 78 lipids near solute (within 15.0 A).
htmd.membranebuilder.ringpenetration - INFO - 0 penetrating molecule(s) remaining
htmd.builder.solvate - INFO - Using water pdb file at: /home/runner/work/htmd/htmd/htmd/share/solvate/wat.pdb
htmd.builder.solvate - INFO - Replicating 4 water segments, 2 by 2 by 1
htmd.builder.solvate - INFO - 6416 water molecules were added to the system.
htmd.builder.solvate - INFO - Using water pdb file at: /home/runner/work/htmd/htmd/htmd/share/solvate/wat.pdb
htmd.builder.solvate - INFO - Replicating 4 water segments, 2 by 2 by 1
htmd.builder.solvate - INFO - 6384 water molecules were added to the system.

What the arguments do:

Argument

Effect

[80, 80]

XY footprint in Å. Make it ~10 Å wider than the protein in each direction so the bilayer fully surrounds it.

ratioupper / ratiolower

Per-leaflet lipid mole fractions. Different upper/lower compositions are allowed - useful for asymmetric bilayers. Run listLipids() to see what’s available.

solute=prepared

The OPM-aligned protein. buildMembrane() carves the membrane-spanning protein footprint out of the lipid placement (only protein heavy atoms inside the bilayer-thickness slab are used; extramembrane domains don’t affect the carve) so lipids and protein don’t overlap from the start.

minimize=0, equilibrate=0

Skip the OpenMM relaxation step - only the initial lipid placement runs. Set both to positive values for production.

platform="CPU"

OpenMM platform for the lipid placement step. Switch to "CUDA" if you have a GPU.

The returned memb is the lipid bilayer + a water buffer above and below - the protein is not included in the return; we’ll merge them in the next step.

Step 4 - Merge protein + membrane (lipids only)#

memb.remove("water", _logger=False)

system = Molecule()
system.append(prepared)
system.append(memb)
print(f"merged system: {system.numAtoms} atoms")
merged system: 26216 atoms

Two things going on here:

  • buildMembrane already carved the protein footprint out of the lipid placement (that’s what solute=prepared does in the previous step), so the membrane and protein occupy disjoint XY space and no additional clash-removal is needed - a direct append is enough.

  • We drop buildMembrane’s own water layer (the builder solvates the bilayer with a waterbuff z-buffer above and below — default 20 Å — using an internal call to solvate) and re-solvate the full system in the next step so the water box covers the protein’s extramembrane regions too.

Step 5 - Solvate with a membrane-aware box#

lipid_mask = system.atomselect("lipid")
xy_min = system.coords[lipid_mask, :2, 0].min(axis=0)
xy_max = system.coords[lipid_mask, :2, 0].max(axis=0)
z_min = system.coords[:, 2, 0].min() - 15
z_max = system.coords[:, 2, 0].max() + 15

system = solvate(
    system,
    minmax=[[xy_min[0], xy_min[1], z_min],
            [xy_max[0], xy_max[1], z_max]],
)
htmd.builder.solvate - INFO - Using water pdb file at: /home/runner/work/htmd/htmd/htmd/share/solvate/wat.pdb
htmd.builder.solvate - INFO - Replicating 8 water segments, 2 by 2 by 2
htmd.builder.solvate - INFO - 20579 water molecules were added to the system.

The XY box matches the lipid extent (so water doesn’t poke out past the bilayer edge), and Z extends 15 Å above and below the tallest / lowest atom in the system - enough to fully solvate the protein’s intracellular and extracellular domains. solvate will only place waters where there’s space - no waters appear inside the bilayer.

Step 6 - Build under AMBER#

amber.build(
    system,
    outdir="./build",
    custombonds=out.custombonds,
    topo=out.topo_paths,
    param=out.frcmod_paths,
    caps={"P0": ("none", "none")},
    ionize=True,
    saltconc=0.15,
)
Disulfide Bond between: UniqueResidueID<resname: 'CYX', chain: 'B', resid: 101, insertion: '', segid: 'P1'>
                   and: UniqueResidueID<resname: 'CYX', chain: 'B', resid: 180, insertion: '', segid: 'P1'>

Disulfide Bond between: UniqueResidueID<resname: 'CYX', chain: 'B', resid: 18, insertion: '', segid: 'P1'>
                   and: UniqueResidueID<resname: 'CYX', chain: 'B', resid: 321, insertion: '', segid: 'P1'>
htmd.builder.amber - INFO - Detecting disulfide bonds.
htmd.builder.builder - INFO - 2 disulfide bonds were added
htmd.builder.amber - INFO - Starting the build.
htmd.builder.amber - INFO - Finished building.
htmd.builder.ionize - INFO - Adding 6 anions + 0 cations for neutralizing and 116 ions for the given salt concentration 0.15 M.
htmd.builder.amber - INFO - Starting the build.
htmd.builder.amber - INFO - Finished building.
moleculekit.writers - WARNING - Field "resid" of PDB overflows. Your data will be truncated to 4 characters.
moleculekit.tools.sequencestructuralalignment - INFO - Alignment #0 was done on 370 residues: 1-370
<moleculekit.molecule.Molecule object at 0x7fb1b0e3fa10>
Molecule with 87709 atoms and 1 frames
Atom field - altloc shape: (87709,)
Atom field - atomtype shape: (87709,)
Atom field - beta shape: (87709,)
Atom field - chain shape: (87709,)
Atom field - charge shape: (87709,)
Atom field - coords shape: (87709, 3, 1)
Atom field - element shape: (87709,)
Atom field - formalcharge shape: (87709,)
Atom field - insertion shape: (87709,)
Atom field - masses shape: (87709,)
Atom field - name shape: (87709,)
Atom field - occupancy shape: (87709,)
Atom field - record shape: (87709,)
Atom field - resid shape: (87709,)
Atom field - resname shape: (87709,)
Atom field - segid shape: (87709,)
Atom field - serial shape: (87709,)
Atom field - virtualsite shape: (87709,)
angles shape: (50445, 3)
bonds shape: (87711, 2)
bondtype shape: (87711,)
box shape: (3, 1)
boxangles shape: (3, 1)
crystalinfo: None
dihedrals shape: (111540, 4)
fileloc shape: (1, 2)
impropers shape: (1842, 4)
reps: 
step shape: (1,)
time shape: (1,)
topoloc: /tmp/tmpd1010s97/build/structure.prmtop
viewname: structure.prmtop

A couple of notes on the arguments:

  • caps={"P0": ("none", "none")} - the tuple is ordered (N-cap, C-cap). The peptide inhibitor’s C-terminal NCAA (200) carries its own OXT, so the C-cap is suppressed; we set both to "none" here for symmetry, but only the C-cap is strictly required. Real cap names elsewhere are uppercase (ACE, NME) — the library lookup is case-sensitive. If your autoSegment assigns a different segid to the inhibitor, swap "P0" for the correct value (inspect set(prepared.segid)).

  • ionize=True, saltconc=0.15 - tLeap places counter-ions and salt in the waters added by solvate in the previous step.

The output ./build/structure.prmtop + structure.pdb is now ready for an equilibration protocol (e.g. acemd.protocols.setup_equilibration()).

Gotchas#

  • buildMembrane() with solute= assumes the protein already has the bilayer centered at z=0 and the membrane normal pointing along +z. If you’re not sure your structure is in that frame, align it onto an OPM reference with get_opm_pdb() (for PDBs present in OPM) or align_to_opm() (for new structures) first.

  • buildMembrane() returns the bilayer without the protein; merge them explicitly. Same goes for any cofactors or co-crystallised ligands.

  • minimize=0, equilibrate=0 produces a raw, unrelaxed membrane straight out of the initial lipid placement. Re-run buildMembrane() with minimize=300, equilibrate=1 (1 ns) before any production simulation; if your machine has a GPU also switch platform="CUDA".

  • The default lipid library (listLipids()) covers the PA / PC / PE / PG / PS head-group families across the DA / DL / DM / DO / DP / DS / POP / SDP tail variants (so POPC, POPE, POPG, POPS, POPA, DOPC, DOPE, …, plus cholesterol chl1). There’s no POPI and no sphingolipids in the shipped library. Mixed-composition leaflets work; if you need a lipid that isn’t listed, add a custom lipid CIF to the lipid directory (or open an issue upstream).

See also#