Build a membrane-embedded protein#
You will learn: how to fetch a bilayer-aligned protein from the OPM database, wrap a custom-composition lipid bilayer around it, and produce an AMBER prmtop for the full membrane system - all in one script.
Prerequisites:
HTMD installed.
You’ve worked through Build a protein with a ligand - this tutorial reuses the same NCAA detection / templating / parameterization pipeline.
The system#
PDB 5VBL is the apelin receptor (a GPCR, chain B) bound to a peptide agonist (chain A) containing five chain-resident non-canonical amino acids - 200, ALC, HRG, NLE, OIC. The structure also carries a co-crystallised OLC (monoolein, a monoacylglycerol lipid from crystallization), which we discard before wrapping our own bilayer. The receptor is membrane-embedded - OPM ships it with a fitted bilayer thickness of 33.4 Å - so we’ll wrap a mixed POPC / cholesterol bilayer around it.
Note
This tutorial sets minimize=0 and equilibrate=0 on buildMembrane() so the tutorial executes in seconds. The resulting membrane carries only the initial lipid placement - lipid tails will have clashes and the bilayer is not relaxed. For a production system, set minimize to a few hundred steps and equilibrate to a few nanoseconds (and run on platform="CUDA" if you have a GPU).
Setup#
from moleculekit.molecule import Molecule
from moleculekit.opm import get_opm_pdb
from moleculekit.tools.autosegment import autoSegment
from moleculekit.tools.nonstandard_residues import detectNonStandardResidues
from moleculekit.tools.preparation import systemPrepare
from htmd.builder import amber
from htmd.builder.nonstandard import parameterizeFromSpecs
from htmd.builder.solvate import solvate
from htmd.membranebuilder.build_membrane import buildMembrane
import numpy as np
Copyright by Acellera Ltd. By executing you are accepting the License. In order to register, run htmd_register on your terminal.
The registration information must be valid so that it might be verified.
rdkit - INFO - Enabling RDKit 2026.03.3 jupyter extensions
Step 1 - Load the RCSB structure and align it to OPM#
mol = Molecule("5VBL")
mol.remove("water", _logger=False)
ref, thickness = get_opm_pdb("5VBL", validateElements=False)
mol.align("protein and name CA", refmol=ref, mode="structure")
print(f"bilayer thickness: {thickness} Å, atoms after removing water: {mol.numAtoms}")
bilayer thickness: 33.4 Å, atoms after removing water: 2949
moleculekit.molecule - INFO - Structural alignement gave TM-Scores of [0.99999999] and local RMSDs of [0.00049815]
The workflow is: load the structure you actually want to simulate (from RCSB or a local file), then rotate it onto OPM’s membrane-aligned orientation. get_opm_pdb() downloads OPM’s pre-aligned copy of the PDB - already rotated/translated so the bilayer center sits at z=0 and the membrane normal points along +z (HTMD strips the DUM membrane-marker pseudo-atoms before returning, since the default keep=False). align() with mode="structure" runs TM-align between the two, so the two structures don’t need to have identical atom counts (OPM often trims residues that don’t sit in the bilayer plane). The returned thickness is the OPM-fit double-leaflet thickness (33.4 Å for 5VBL). This is the alignment buildMembrane() expects from its solute= argument.
Step 2 - Detect the non-standard residues#
mol = autoSegment(mol, fields=("segid", "chain"))
specs = detectNonStandardResidues(mol)
for spec in specs:
print(spec)
ChainResidueSpec(resname='HRG', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b780cb00>
UniqueResidueID<resname: 'HRG', chain: 'A', resid: 8, insertion: '', segid: 'P0'>, new_resname=None, anchor_atom=None, is_n_term=False, is_c_term=False)
ChainResidueSpec(resname='ALC', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b780cbc0>
UniqueResidueID<resname: 'ALC', chain: 'A', resid: 9, insertion: '', segid: 'P0'>, new_resname=None, anchor_atom=None, is_n_term=False, is_c_term=False)
ChainResidueSpec(resname='GLU', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b780cbf0>
UniqueResidueID<resname: 'GLU', chain: 'A', resid: 10, insertion: '', segid: 'P0'>, new_resname='XX1', anchor_atom='CD', is_n_term=False, is_c_term=False)
ChainResidueSpec(resname='LYS', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b780cc80>
UniqueResidueID<resname: 'LYS', chain: 'A', resid: 13, insertion: '', segid: 'P0'>, new_resname='XX2', anchor_atom='NZ', is_n_term=False, is_c_term=False)
ChainResidueSpec(resname='OIC', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b780ccb0>
UniqueResidueID<resname: 'OIC', chain: 'A', resid: 14, insertion: '', segid: 'P0'>, new_resname=None, anchor_atom=None, is_n_term=False, is_c_term=False)
ChainResidueSpec(resname='NLE', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b780cce0>
UniqueResidueID<resname: 'NLE', chain: 'A', resid: 15, insertion: '', segid: 'P0'>, new_resname=None, anchor_atom=None, is_n_term=False, is_c_term=False)
ChainResidueSpec(resname='200', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b780cd40>
UniqueResidueID<resname: '200', chain: 'A', resid: 17, insertion: '', segid: 'P0'>, new_resname=None, anchor_atom=None, is_n_term=False, is_c_term=True)
LigandSpec(resname='OLC', residue=<moleculekit.molecule.UniqueResidueID object at 0x7fb1b7831010>
UniqueResidueID<resname: 'OLC', chain: 'D', resid: 1102, insertion: '', segid: 'P3'>)
moleculekit.tools.autosegment - INFO - autoSegment: break before CYS:19:B (after 200:17:A)
Look at what detect found:
Five
ChainResidueSpecentries for the chain-resident NCAAs -HRG,ALC,OIC,NLE, and200(the C-terminal one). These are exactly the NCAAs we expect from the peptide’s chemistry.Two extra
ChainResidueSpecentries for canonical residues -GLU(resid 10) renamed toXX1withanchor_atom='CD', andLYS(resid 13) renamed toXX2withanchor_atom='NZ'. Those are the canonical residues at the two ends of an isopeptide bond: the apelin agonist is a side-chain macrocycle closed by aGLU.CD - LYS.NZγ-glutamyl / ε-lysyl crosslink. Detect saw the inter-residue bond on the side-chain atoms (not the backbone) and emitted the rename + anchor automatically, so the downstream parameterization gives each end of the isopeptide its own per-residue topology. Thecustombondslist emitted byparameterizeFromSpecs()will close the ring at build time.One
LigandSpecfor the freeOLClipid co-crystallised with the protein.
OLC is monoolein, a crystallization lipid — not part of the system we want to simulate, and we’re about to build our own POPC/cholesterol bilayer — so drop it from both the structure and the spec list:
mol.remove("resname OLC", _logger=False)
specs = [s for s in specs if s.resname != "OLC"]
Now template every chain-resident NCAA - but not GLU / LYS (they’re canonical residues whose only modification is the inter-side-chain bond, which detect handles entirely from connectivity):
smiles = {
"200": "c1cc(ccc1C[C@@H](C(=O)O)N)Cl",
"ALC": "C1CCC(CC1)C[C@@H](C=O)N",
"HRG": "C(CCNC(=N)N)C[C@@H](C=O)N",
"NLE": "CCCC[C@@H](C=O)N",
"OIC": "C1CC[C@H]2[C@@H](C1)C[C@H](N2)C=O",
}
for resname, smi in smiles.items():
mol.templateResidueFromSmiles(f'resname "{resname}"', smi, addHs=True)
prepared, specs = systemPrepare(mol, pH=7.4, detect_specs=specs)
out = parameterizeFromSpecs(
specs, prepared, outdir="./params", charge_method="gasteiger"
)
print(out)
---- Molecule chain report ----
Chain A:
First residue: LYS 1
Final residue: 200 17
Chain B:
First residue: CYS 19
Final residue: ARG 330
Chain C:
residue: ZN 1101
---- End of chain report ----
ClusterOutputs(topo_paths=['./params/cluster_000/XX1.prepi', './params/cluster_000/XX2.prepi', './params/cluster_001/HRG.prepi', './params/cluster_002/ALC.prepi', './params/cluster_003/OIC.prepi', './params/cluster_004/NLE.prepi', './params/cluster_005/200.prepi'], frcmod_paths=['./params/cluster_000/XX1.frcmod', './params/cluster_000/XX2.frcmod', './params/cluster_001/HRG.frcmod', './params/cluster_002/ALC.frcmod', './params/cluster_003/OIC.frcmod', './params/cluster_004/NLE.frcmod', './params/cluster_005/200.frcmod'], custombonds=[('segid "P0" and chain "A" and resid 13 and name "NZ"', 'segid "P0" and chain "A" and resid 10 and name "CD"')], xml_paths=['./params/gaff_combined.xml'])
moleculekit.rdkittools - INFO - Stripped unmatched terminal heavy atoms from SMILES template (e.g. leaving group displaced by a covalent link, or carboxyl -OH on a non-terminal amino acid). Modified SMILES: 'N[C@H](C=O)CCC=O'
moleculekit.rdkittools - INFO - Stripped unmatched terminal heavy atoms from SMILES template (e.g. leaving group displaced by a covalent link, or carboxyl -OH on a non-terminal amino acid). Modified SMILES: 'NCCCC[C@H](N)C=O'
moleculekit.tools.preparation - WARNING - The following residues have not been optimized: ZN
moleculekit.tools.preparation - INFO - Modified residue HIS 11 A to HID
moleculekit.tools.preparation - INFO - Modified residue CYS 19 B to CYX
moleculekit.tools.preparation - INFO - Modified residue ASP 75 B to ASH
moleculekit.tools.preparation - INFO - Modified residue CYS 102 B to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 181 B to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 1006 B to CYM
moleculekit.tools.preparation - INFO - Modified residue HIS 265 B to HID
moleculekit.tools.preparation - INFO - Modified residue HIS 278 B to HID
moleculekit.tools.preparation - INFO - Modified residue CYS 281 B to CYX
moleculekit.tools.preparation - WARNING - Dubious protonation state: the pKa of 3 residues is within 1.0 units of pH 7.4.
moleculekit.tools.preparation - WARNING - Dubious protonation state: HIS 11 A (pKa= 6.54)
moleculekit.tools.preparation - WARNING - Dubious protonation state: ASP 75 B (pKa= 7.78)
moleculekit.tools.preparation - WARNING - Dubious protonation state: CYS 1039 B (pKa= 7.59)
htmd.builder._ambertools - INFO - Fixed residue 200 atom name Cl -> CL to match the input structure.
htmd.builder._ambertools - INFO - Fixed residue 200 atom name Cl -> CL to match the input structure.
Step 3 - Build the membrane around the protein#
memb = buildMembrane(
[80, 80],
ratioupper={"popc": 0.7, "chl1": 0.3},
ratiolower={"popc": 0.7, "chl1": 0.3},
minimize=0,
equilibrate=0,
platform="CPU",
outdir="./memb",
solute=prepared,
)
print(f"membrane mol: {memb.numAtoms} atoms")
membrane mol: 58670 atoms
htmd.membranebuilder.build_membrane - INFO - upper leaflet: solute occupies 24.5% of XY area; placing 86 lipids ({'popc': 60, 'chl1': 26}).
htmd.membranebuilder.build_membrane - INFO - lower leaflet: solute occupies 21.4% of XY area; placing 89 lipids ({'popc': 62, 'chl1': 27}).
htmd.membranebuilder.build_membrane - INFO - Optimized rotation for 78 lipids near solute (within 15.0 A).
htmd.membranebuilder.ringpenetration - INFO - 0 penetrating molecule(s) remaining
htmd.builder.solvate - INFO - Using water pdb file at: /home/runner/work/htmd/htmd/htmd/share/solvate/wat.pdb
htmd.builder.solvate - INFO - Replicating 4 water segments, 2 by 2 by 1
htmd.builder.solvate - INFO - 6416 water molecules were added to the system.
htmd.builder.solvate - INFO - Using water pdb file at: /home/runner/work/htmd/htmd/htmd/share/solvate/wat.pdb
htmd.builder.solvate - INFO - Replicating 4 water segments, 2 by 2 by 1
htmd.builder.solvate - INFO - 6384 water molecules were added to the system.
What the arguments do:
Argument |
Effect |
|---|---|
|
XY footprint in Å. Make it ~10 Å wider than the protein in each direction so the bilayer fully surrounds it. |
|
Per-leaflet lipid mole fractions. Different upper/lower compositions are allowed - useful for asymmetric bilayers. Run |
|
The OPM-aligned protein. |
|
Skip the OpenMM relaxation step - only the initial lipid placement runs. Set both to positive values for production. |
|
OpenMM platform for the lipid placement step. Switch to |
The returned memb is the lipid bilayer + a water buffer above and below - the protein is not included in the return; we’ll merge them in the next step.
Step 4 - Merge protein + membrane (lipids only)#
memb.remove("water", _logger=False)
system = Molecule()
system.append(prepared)
system.append(memb)
print(f"merged system: {system.numAtoms} atoms")
merged system: 26216 atoms
Two things going on here:
buildMembranealready carved the protein footprint out of the lipid placement (that’s whatsolute=prepareddoes in the previous step), so the membrane and protein occupy disjoint XY space and no additional clash-removal is needed - a direct append is enough.We drop
buildMembrane’s own water layer (the builder solvates the bilayer with awaterbuffz-buffer above and below — default 20 Å — using an internal call tosolvate) and re-solvate the full system in the next step so the water box covers the protein’s extramembrane regions too.
Step 5 - Solvate with a membrane-aware box#
lipid_mask = system.atomselect("lipid")
xy_min = system.coords[lipid_mask, :2, 0].min(axis=0)
xy_max = system.coords[lipid_mask, :2, 0].max(axis=0)
z_min = system.coords[:, 2, 0].min() - 15
z_max = system.coords[:, 2, 0].max() + 15
system = solvate(
system,
minmax=[[xy_min[0], xy_min[1], z_min],
[xy_max[0], xy_max[1], z_max]],
)
htmd.builder.solvate - INFO - Using water pdb file at: /home/runner/work/htmd/htmd/htmd/share/solvate/wat.pdb
htmd.builder.solvate - INFO - Replicating 8 water segments, 2 by 2 by 2
htmd.builder.solvate - INFO - 20579 water molecules were added to the system.
The XY box matches the lipid extent (so water doesn’t poke out past the bilayer edge), and Z extends 15 Å above and below the tallest / lowest atom in the system - enough to fully solvate the protein’s intracellular and extracellular domains. solvate will only place waters where there’s space - no waters appear inside the bilayer.
Step 6 - Build under AMBER#
amber.build(
system,
outdir="./build",
custombonds=out.custombonds,
topo=out.topo_paths,
param=out.frcmod_paths,
caps={"P0": ("none", "none")},
ionize=True,
saltconc=0.15,
)
Disulfide Bond between: UniqueResidueID<resname: 'CYX', chain: 'B', resid: 101, insertion: '', segid: 'P1'>
and: UniqueResidueID<resname: 'CYX', chain: 'B', resid: 180, insertion: '', segid: 'P1'>
Disulfide Bond between: UniqueResidueID<resname: 'CYX', chain: 'B', resid: 18, insertion: '', segid: 'P1'>
and: UniqueResidueID<resname: 'CYX', chain: 'B', resid: 321, insertion: '', segid: 'P1'>
htmd.builder.amber - INFO - Detecting disulfide bonds.
htmd.builder.builder - INFO - 2 disulfide bonds were added
htmd.builder.amber - INFO - Starting the build.
htmd.builder.amber - INFO - Finished building.
htmd.builder.ionize - INFO - Adding 6 anions + 0 cations for neutralizing and 116 ions for the given salt concentration 0.15 M.
htmd.builder.amber - INFO - Starting the build.
htmd.builder.amber - INFO - Finished building.
moleculekit.writers - WARNING - Field "resid" of PDB overflows. Your data will be truncated to 4 characters.
moleculekit.tools.sequencestructuralalignment - INFO - Alignment #0 was done on 370 residues: 1-370
<moleculekit.molecule.Molecule object at 0x7fb1b0e3fa10>
Molecule with 87709 atoms and 1 frames
Atom field - altloc shape: (87709,)
Atom field - atomtype shape: (87709,)
Atom field - beta shape: (87709,)
Atom field - chain shape: (87709,)
Atom field - charge shape: (87709,)
Atom field - coords shape: (87709, 3, 1)
Atom field - element shape: (87709,)
Atom field - formalcharge shape: (87709,)
Atom field - insertion shape: (87709,)
Atom field - masses shape: (87709,)
Atom field - name shape: (87709,)
Atom field - occupancy shape: (87709,)
Atom field - record shape: (87709,)
Atom field - resid shape: (87709,)
Atom field - resname shape: (87709,)
Atom field - segid shape: (87709,)
Atom field - serial shape: (87709,)
Atom field - virtualsite shape: (87709,)
angles shape: (50445, 3)
bonds shape: (87711, 2)
bondtype shape: (87711,)
box shape: (3, 1)
boxangles shape: (3, 1)
crystalinfo: None
dihedrals shape: (111540, 4)
fileloc shape: (1, 2)
impropers shape: (1842, 4)
reps:
step shape: (1,)
time shape: (1,)
topoloc: /tmp/tmpd1010s97/build/structure.prmtop
viewname: structure.prmtop
A couple of notes on the arguments:
caps={"P0": ("none", "none")}- the tuple is ordered(N-cap, C-cap). The peptide inhibitor’s C-terminal NCAA (200) carries its ownOXT, so the C-cap is suppressed; we set both to"none"here for symmetry, but only the C-cap is strictly required. Real cap names elsewhere are uppercase (ACE,NME) — the library lookup is case-sensitive. If your autoSegment assigns a different segid to the inhibitor, swap"P0"for the correct value (inspectset(prepared.segid)).ionize=True, saltconc=0.15- tLeap places counter-ions and salt in the waters added bysolvatein the previous step.
The output ./build/structure.prmtop + structure.pdb is now ready for an equilibration protocol (e.g. acemd.protocols.setup_equilibration()).
Gotchas#
buildMembrane()withsolute=assumes the protein already has the bilayer centered atz=0and the membrane normal pointing along+z. If you’re not sure your structure is in that frame, align it onto an OPM reference withget_opm_pdb()(for PDBs present in OPM) oralign_to_opm()(for new structures) first.buildMembrane()returns the bilayer without the protein; merge them explicitly. Same goes for any cofactors or co-crystallised ligands.minimize=0, equilibrate=0produces a raw, unrelaxed membrane straight out of the initial lipid placement. Re-runbuildMembrane()withminimize=300, equilibrate=1(1 ns) before any production simulation; if your machine has a GPU also switchplatform="CUDA".The default lipid library (
listLipids()) covers the PA / PC / PE / PG / PS head-group families across the DA / DL / DM / DO / DP / DS / POP / SDP tail variants (so POPC, POPE, POPG, POPS, POPA, DOPC, DOPE, …, plus cholesterolchl1). There’s noPOPIand no sphingolipids in the shipped library. Mixed-composition leaflets work; if you need a lipid that isn’t listed, add a custom lipid CIF to the lipid directory (or open an issue upstream).
See also#
Build a protein with a ligand - the NCAA workflow this tutorial reuses.
System-building overview - where the membrane builder fits in the wider stack.