---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Build a protein-RNA complex

**You will learn:** how to build a solvated, ionised protein-RNA complex - same flow as a canonical protein build, with the RNA force field already covered by HTMD's defaults.

**Prerequisites:**
- HTMD installed.
- {doc}`Build a protein <01-protein>` covers the canonical build steps used here.

## The system

PDB `1AUD` is the **NMR solution structure of the human U1A protein bound to its own 3'-UTR polyadenylation inhibition element (PIE)**. U1A is one of the workhorse models for studying RNP recognition: the protein binds the RNA loop nucleotides through a classic RRM β-sheet, and the RNA backbone wraps around the protein surface.

The only things that change versus {doc}`tutorial 01 <01-protein>` are (a) dropping the NMR ensemble down to a single frame, and (b) the AMBER RNA force field handles the nucleotides automatically because `leaprc.RNA.OL3` is already in {py:func}`~htmd.builder.amber.defaultFf`.

## Setup

```{code-cell} python
from moleculekit.molecule import Molecule
from moleculekit.tools.autosegment import autoSegment
from moleculekit.tools.preparation import systemPrepare
from htmd.builder import amber
from htmd.builder.solvate import solvate
```

```{code-cell} python
:tags: [remove-input]
from acellera_docs_theme.molstar import show3d
```

## Step 1 - Load and pick a single frame

```{code-cell} python
mol = Molecule("1AUD")
print(f"loaded {mol.numFrames} NMR conformers, {mol.numAtoms} atoms")

mol.dropFrames(keep=0)
```

```{code-cell} python
:tags: [remove-input]
show3d(mol)
```

NMR depositions ship as an **ensemble**: 1AUD has 31 conformers. The downstream prep / parameterization / build pipeline works on a single frame at a time, so we keep frame 0. Pick a different frame, or run the prep on each frame separately, depending on your study design.

## Step 2 - Segment

```{code-cell} python
mol = autoSegment(mol, fields=("segid", "chain"))
```

{py:func}`~moleculekit.tools.autosegment.autoSegment` walks the connectivity and assigns one segment per chain. For 1AUD the result is two segments: the 30-nt RNA construct as one strand (the residue-number jump from 30 to 33 is not a chain break - the phosphodiester bond from O3'(30) to P(33) is intact at 1.61 Å, and autoSegment's spatial validation recognises this) and the protein as the other.

## Step 3 - Prepare

```{code-cell} python
prepared, specs = systemPrepare(mol, pH=7.4)
print(f"prepared system has {prepared.numAtoms} atoms")
```

{py:func}`~moleculekit.tools.preparation.systemPrepare` protonates the protein side chains at pH 7.4 the same way it does for a soluble protein. The RNA contributes nothing to the protonation problem (RNA backbones are uniformly deprotonated above pH ~2), but `systemPrepare` may emit a *"dubious protonation state"* warning for histidines whose pKa is close to the chosen pH - in 1AUD that's HIS 9, sitting near the RNA backbone where the local electrostatic field shifts its pKa. Inspect those manually if your study cares about the exact charge state.

## Step 4 - Solvate

```{code-cell} python
solvated = solvate(prepared, pad=12)
```

`pad=12` extends the simulation box by 12 Å on each side of the solute's bounding box (waters that would clash with the solute are removed, so the *actual* water shell can be a bit thinner than 12 Å around bulges). RNA backbones are flexible and need a bit more padding than a globular protein - `pad=10` would also work for a tutorial-sized system but `pad=12` is safer for the typical RNA breathing motions.

## Step 5 - Build under AMBER

```{code-cell} python
amber.build(solvated, outdir="./build", ionize=True, saltconc=0.15)
```

That single call:

- Loads `leaprc.protein.ff14SB` + `leaprc.RNA.OL3` + `leaprc.water.tip3p` (all part of {py:func}`~htmd.builder.amber.defaultFf`), so the protein, the RNA, and the water are each typed by their dedicated force field.
- Adds **Na⁺ counter-ions** to neutralise the RNA backbone. RNA carries -1 per nucleotide, so a 30-nt construct contributes -30 in total - expect ~20-25 neutralising Na⁺ (the protein also carries some net charge that offsets a few). Then enough NaCl is added on top to reach 0.15 M.
- No disulfide bonds in 1AUD; auto-detection finds nothing and moves on.

The output `./build/structure.prmtop` + `structure.pdb` (and `structure.crd`, the AMBER inpcrd that `setup_equilibration` actually reads) is now ready for an equilibration protocol (e.g. {py:func}`acemd.protocols.setup_equilibration`).

## Gotchas

- For an NMR ensemble, decide upfront whether you want one frame (default) or all of them. Each frame is a slightly different conformer, so running prep + build on each gives you an ensemble of starting structures for replica MD.
- RNA has **roughly one negative backbone charge per nucleotide**, so even a small protein-RNA system needs a substantial number of neutralising Na⁺ ions on top of the 0.15 M NaCl. Don't be surprised if `Adding N cations` is in the dozens.
- For protein-DNA complexes the flow is identical - `leaprc.DNA.bsc1` is also in {py:func}`~htmd.builder.amber.defaultFf`, so a DNA-only or DNA-bound build works the same way (`Molecule("1BNA")` is the Dickerson dodecamer fixture used elsewhere as a reference).
- If you have a modified nucleotide (5mC, m6A, ...) treat it as a non-canonical residue: detect → SMILES template → `parameterizeFromSpecs` → `amber.build`, exactly as in {doc}`tutorial 02 <02-protein-ligand>`. The cluster pipeline handles nucleic-acid backbones the same way it handles peptide backbones.

## See also

- {doc}`Build a protein <01-protein>` - the canonical build this tutorial mirrors.
- {doc}`Build a protein with a ligand <02-protein-ligand>` - what to add when the nucleic acid carries non-canonical modifications.
- {doc}`System-building overview <../../explanation/system-building>` - where the nucleic-acid force fields fit in the wider stack.