Build a protein-RNA complex#

You will learn: how to build a solvated, ionised protein-RNA complex - same flow as a canonical protein build, with the RNA force field already covered by HTMD’s defaults.

Prerequisites:

HTMD installed.
Build a protein covers the canonical build steps used here.

The system#

PDB 1AUD is the NMR solution structure of the human U1A protein bound to its own 3’-UTR polyadenylation inhibition element (PIE). U1A is one of the workhorse models for studying RNP recognition: the protein binds the RNA loop nucleotides through a classic RRM β-sheet, and the RNA backbone wraps around the protein surface.

The only things that change versus tutorial 01 are (a) dropping the NMR ensemble down to a single frame, and (b) the AMBER RNA force field handles the nucleotides automatically because leaprc.RNA.OL3 is already in defaultFf().

Setup#

from moleculekit.molecule import Molecule
from moleculekit.tools.autosegment import autoSegment
from moleculekit.tools.preparation import systemPrepare
from htmd.builder import amber
from htmd.builder.solvate import solvate

Copyright by Acellera Ltd. By executing you are accepting the License. In order to register, run htmd_register on your terminal.
The registration information must be valid so that it might be verified.

Step 1 - Load and pick a single frame#

mol = Molecule("1AUD")
print(f"loaded {mol.numFrames} NMR conformers, {mol.numAtoms} atoms")

mol.dropFrames(keep=0)

loaded 31 NMR conformers, 1450 atoms

NMR depositions ship as an ensemble: 1AUD has 31 conformers. The downstream prep / parameterization / build pipeline works on a single frame at a time, so we keep frame 0. Pick a different frame, or run the prep on each frame separately, depending on your study design.

Step 2 - Segment#

mol = autoSegment(mol, fields=("segid", "chain"))

moleculekit.tools.autosegment - INFO - autoSegment: break before ALA:1:A (after C:50:B)

autoSegment() walks the connectivity and assigns one segment per chain. For 1AUD the result is two segments: the 30-nt RNA construct as one strand (the residue-number jump from 30 to 33 is not a chain break - the phosphodiester bond from O3’(30) to P(33) is intact at 1.61 Å, and autoSegment’s spatial validation recognises this) and the protein as the other.

Step 3 - Prepare#

prepared, specs = systemPrepare(mol, pH=7.4)
print(f"prepared system has {prepared.numAtoms} atoms")

---- Molecule chain report ----
Chain A:
    First residue:  G      19  
    Final residue:  C      50  
Chain B:
    First residue:  ALA     1  
    Final residue:  VAL   101  
---- End of chain report ----
prepared system has 2627 atoms

moleculekit.tools.preparation - INFO - Modified residue G      19 A to RG5
moleculekit.tools.preparation - INFO - Modified residue G      20 A to RG
moleculekit.tools.preparation - INFO - Modified residue C      21 A to RC
moleculekit.tools.preparation - INFO - Modified residue A      22 A to RA
moleculekit.tools.preparation - INFO - Modified residue G      23 A to RG
moleculekit.tools.preparation - INFO - Modified residue A      24 A to RA
moleculekit.tools.preparation - INFO - Modified residue G      25 A to RG
moleculekit.tools.preparation - INFO - Modified residue U      26 A to RU
moleculekit.tools.preparation - INFO - Modified residue C      27 A to RC
moleculekit.tools.preparation - INFO - Modified residue C      28 A to RC
moleculekit.tools.preparation - INFO - Modified residue U      29 A to RU
moleculekit.tools.preparation - INFO - Modified residue U      30 A to RU
moleculekit.tools.preparation - INFO - Modified residue C      33 A to RC
moleculekit.tools.preparation - INFO - Modified residue G      34 A to RG
moleculekit.tools.preparation - INFO - Modified residue G      35 A to RG
moleculekit.tools.preparation - INFO - Modified residue G      36 A to RG
moleculekit.tools.preparation - INFO - Modified residue A      37 A to RA
moleculekit.tools.preparation - INFO - Modified residue C      38 A to RC
moleculekit.tools.preparation - INFO - Modified residue A      39 A to RA
moleculekit.tools.preparation - INFO - Modified residue U      40 A to RU
moleculekit.tools.preparation - INFO - Modified residue U      41 A to RU
moleculekit.tools.preparation - INFO - Modified residue G      42 A to RG
moleculekit.tools.preparation - INFO - Modified residue C      43 A to RC
moleculekit.tools.preparation - INFO - Modified residue A      44 A to RA
moleculekit.tools.preparation - INFO - Modified residue C      45 A to RC
moleculekit.tools.preparation - INFO - Modified residue C      46 A to RC
moleculekit.tools.preparation - INFO - Modified residue U      47 A to RU
moleculekit.tools.preparation - INFO - Modified residue G      48 A to RG
moleculekit.tools.preparation - INFO - Modified residue C      49 A to RC
moleculekit.tools.preparation - INFO - Modified residue C      50 A to RC3
moleculekit.tools.preparation - INFO - Modified residue HIS     9 B to HIE
moleculekit.tools.preparation - INFO - Modified residue HIS    30 B to HIE
moleculekit.tools.preparation - WARNING - Dubious protonation state: the pKa of 1 residues is within 1.0 units of pH 7.4.
moleculekit.tools.preparation - WARNING - Dubious protonation state:    HIS     9 B (pKa= 6.44)

systemPrepare() protonates the protein side chains at pH 7.4 the same way it does for a soluble protein. The RNA contributes nothing to the protonation problem (RNA backbones are uniformly deprotonated above pH ~2), but systemPrepare may emit a “dubious protonation state” warning for histidines whose pKa is close to the chosen pH - in 1AUD that’s HIS 9, sitting near the RNA backbone where the local electrostatic field shifts its pKa. Inspect those manually if your study cares about the exact charge state.

Step 4 - Solvate#

solvated = solvate(prepared, pad=12)

htmd.builder.solvate - INFO - Using water pdb file at: /home/runner/work/htmd/htmd/htmd/share/solvate/wat.pdb
htmd.builder.solvate - INFO - Replicating 8 water segments, 2 by 2 by 2
htmd.builder.solvate - INFO - 10347 water molecules were added to the system.

pad=12 extends the simulation box by 12 Å on each side of the solute’s bounding box (waters that would clash with the solute are removed, so the actual water shell can be a bit thinner than 12 Å around bulges). RNA backbones are flexible and need a bit more padding than a globular protein - pad=10 would also work for a tutorial-sized system but pad=12 is safer for the typical RNA breathing motions.

Step 5 - Build under AMBER#

amber.build(solvated, outdir="./build", ionize=True, saltconc=0.15)

htmd.builder.amber - INFO - Detecting disulfide bonds.
htmd.builder.amber - INFO - Starting the build.
htmd.builder.amber - INFO - Finished building.
htmd.builder.ionize - INFO - Adding 0 anions + 21 cations for neutralizing and 58 ions for the given salt concentration 0.15 M.
htmd.builder.amber - INFO - Starting the build.
htmd.builder.amber - INFO - Finished building.
moleculekit.writers - WARNING - Field "resid" of PDB overflows. Your data will be truncated to 4 characters.
moleculekit.tools.sequencestructuralalignment - INFO - Alignment #0 was done on 101 residues: 32-132
htmd.builder.amber - ERROR - Error aligning nucleic for residue mapping: number of optimal alignments is larger than 9223372036854775807

<moleculekit.molecule.Molecule object at 0x7efc660fbbf0>
Molecule with 33519 atoms and 1 frames
Atom field - altloc shape: (33519,)
Atom field - atomtype shape: (33519,)
Atom field - beta shape: (33519,)
Atom field - chain shape: (33519,)
Atom field - charge shape: (33519,)
Atom field - coords shape: (33519, 3, 1)
Atom field - element shape: (33519,)
Atom field - formalcharge shape: (33519,)
Atom field - insertion shape: (33519,)
Atom field - masses shape: (33519,)
Atom field - name shape: (33519,)
Atom field - occupancy shape: (33519,)
Atom field - record shape: (33519,)
Atom field - resid shape: (33519,)
Atom field - resname shape: (33519,)
Atom field - segid shape: (33519,)
Atom field - serial shape: (33519,)
Atom field - virtualsite shape: (33519,)
angles shape: (4913, 3)
bonds shape: (33528, 2)
bondtype shape: (33528,)
box shape: (3, 1)
boxangles shape: (3, 1)
crystalinfo: None
dihedrals shape: (10724, 4)
fileloc shape: (1, 2)
impropers shape: (526, 4)
reps: 
step shape: (1,)
time shape: (1,)
topoloc: /tmp/tmp216ie4xq/build/structure.prmtop
viewname: structure.prmtop

That single call:

Loads leaprc.protein.ff14SB + leaprc.RNA.OL3 + leaprc.water.tip3p (all part of defaultFf()), so the protein, the RNA, and the water are each typed by their dedicated force field.
Adds Na⁺ counter-ions to neutralise the RNA backbone. RNA carries -1 per nucleotide, so a 30-nt construct contributes -30 in total - expect ~20-25 neutralising Na⁺ (the protein also carries some net charge that offsets a few). Then enough NaCl is added on top to reach 0.15 M.
No disulfide bonds in 1AUD; auto-detection finds nothing and moves on.

The output ./build/structure.prmtop + structure.pdb (and structure.crd, the AMBER inpcrd that setup_equilibration actually reads) is now ready for an equilibration protocol (e.g. acemd.protocols.setup_equilibration()).

Gotchas#

For an NMR ensemble, decide upfront whether you want one frame (default) or all of them. Each frame is a slightly different conformer, so running prep + build on each gives you an ensemble of starting structures for replica MD.
RNA has roughly one negative backbone charge per nucleotide, so even a small protein-RNA system needs a substantial number of neutralising Na⁺ ions on top of the 0.15 M NaCl. Don’t be surprised if Adding N cations is in the dozens.
For protein-DNA complexes the flow is identical - leaprc.DNA.bsc1 is also in defaultFf(), so a DNA-only or DNA-bound build works the same way (Molecule("1BNA") is the Dickerson dodecamer fixture used elsewhere as a reference).
If you have a modified nucleotide (5mC, m6A, …) treat it as a non-canonical residue: detect → SMILES template → parameterizeFromSpecs → amber.build, exactly as in tutorial 02. The cluster pipeline handles nucleic-acid backbones the same way it handles peptide backbones.