How to align structures

Goal

Superimpose one structure or trajectory onto a reference by minimising RMSD over a chosen atom selection.

Minimal example

```python
from moleculekit.molecule import Molecule
from moleculekit.util import uniformRandomRotation, molRMSD

ref = Molecule("3PTB")
mol = ref.copy()

# Apply a uniform random rotation around the centroid of mol
centroid = mol.coords[:, :, 0].mean(axis=0)
mol.rotateBy(uniformRandomRotation(), center=centroid)

ca = mol.atomselect("protein and name CA", indexes=True)
ca_ref = ref.atomselect("protein and name CA", indexes=True)
print(f"RMSD before align: {molRMSD(mol, ref, ca, ca_ref):.2f} Å")

mol.align("protein and name CA", refmol=ref)
print(f"RMSD after align:  {molRMSD(mol, ref, ca, ca_ref):.2f} Å")
```

The RMSD drops from several ångströms (after the random rotation) to essentially zero. Because mol is a rigidly rotated copy of ref, the two structures have identical internal geometry, so the optimal superposition recovers the original coordinates up to floating-point error.
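The near-zero result can be reproduced without moleculekit. The following is a minimal NumPy sketch of a least-squares rigid superposition (the Kabsch algorithm); it is an illustration of the idea, not moleculekit's internal implementation, and the names kabsch_align and rmsd are hypothetical helpers introduced here.

```python
import numpy as np


def kabsch_align(mobile, reference):
    """Rigidly superimpose `mobile` onto `reference` (both arrays of shape (N, 3))."""
    mob_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    P = mobile - mob_center      # centred mobile coordinates
    Q = reference - ref_center   # centred reference coordinates
    # SVD of the 3x3 covariance matrix yields the optimal rotation
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip one axis if needed so the result is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return P @ R + ref_center


def rmsd(a, b):
    """Root-mean-square deviation between two (N, 3) coordinate arrays."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 3))   # synthetic "structure", not real atoms

# Build a random proper rotation from a QR decomposition
rot, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(rot) < 0:
    rot[:, 0] *= -1

# Rigidly rotated and translated copy of the reference
mob = ref @ rot + np.array([1.0, -2.0, 0.5])
aligned = kabsch_align(mob, ref)

print(f"RMSD before: {rmsd(mob, ref):.2f}")
print(f"RMSD after:  {rmsd(aligned, ref):.2e}")
```

Since the copy differs from the reference only by a rotation and a translation, the residual after superposition is limited by floating-point precision.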

Parameters that matter

| Parameter | Type | Default | What it does |
|---|---|---|---|
| sel | str or np.ndarray | required | Atoms in mol used for the alignment |
| refmol | Molecule | None (self) | Reference molecule; if None, align to mol’s first frame |
| refsel | str or np.ndarray | same as sel | Atoms in refmol to align onto; must have the same count as sel when mode="index" |
| frames | list or range | all frames | Which frames of mol to align |
| mode | str | "index" | Atom-correspondence rule. "index" pairs sel and refsel in increasing atom-index order (the two selections must yield the same atom count). "structure" uses TM-align internally to find the best structural correspondence and is robust to mismatched sequences. |
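The "index" pairing rule is easy to misread: atoms are matched by increasing atom index, not by the order in which they appear in the selection. A minimal sketch of that rule, assuming plain integer index arrays; pair_by_index is a hypothetical helper written for illustration and is not part of moleculekit.

```python
import numpy as np


def pair_by_index(sel_idx, ref_idx):
    """Hypothetical helper: pair two index selections in increasing index order."""
    sel_sorted = np.sort(np.asarray(sel_idx))
    ref_sorted = np.sort(np.asarray(ref_idx))
    if sel_sorted.size != ref_sorted.size:
        # Mirrors the requirement that both selections yield the same atom count
        raise ValueError(
            f"selection sizes differ: {sel_sorted.size} vs {ref_sorted.size}"
        )
    return list(zip(sel_sorted.tolist(), ref_sorted.tolist()))


# The order in which indices are listed does not affect the pairing
pairs = pair_by_index([40, 12], [7, 95])
print(pairs)  # [(12, 7), (40, 95)]
```

If a specific atom-to-atom mapping is needed, the selections have to be built so that increasing index order already reflects it.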

Common variations

```python
# Structural alignment via TM-align: robust to mismatched sequences,
# does not require sel and refsel to have the same atom count
mol.align("protein", refmol=ref, mode="structure")

# Sequence-based alignment: handles mismatched residue numbering
# by first aligning sequences and then calling .align on matched residues
mol.alignBySequence(ref)
```

Gotchas

  • With mode="index" (the default), align() requires the same number of atoms in sel and refsel; a mismatch raises an error.

  • With mode="structure", TM-align finds the correspondence internally, so sel and refsel can have different atom counts and the alignment tolerates large conformational differences.

  • alignBySequence() handles residue count mismatches by first finding a sequence alignment, then calling align() (mode="index") on the matched residues.

  • If refmol is None, mol is aligned to its own first frame, which removes rigid-body drift across frames.
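The drift-removal behaviour in the last point can be illustrated on a synthetic trajectory. This is a NumPy sketch under stated assumptions: coordinates are stored as (natoms, 3, nframes), the layout implied by mol.coords[:, :, 0] above, and superpose, align_to_first_frame and rot_z are hypothetical helpers, not moleculekit functions.

```python
import numpy as np


def superpose(mobile, reference):
    """Least-squares rigid fit of `mobile` onto `reference` (both (N, 3))."""
    P = mobile - mobile.mean(axis=0)
    Q = reference - reference.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(U @ Vt))  # keep a proper rotation
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return P @ R + reference.mean(axis=0)


def align_to_first_frame(coords):
    """Align every frame of a (natoms, 3, nframes) array onto frame 0."""
    aligned = np.empty_like(coords)
    for f in range(coords.shape[2]):
        aligned[:, :, f] = superpose(coords[:, :, f], coords[:, :, 0])
    return aligned


def rot_z(theta):
    """Rotation matrix about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


rng = np.random.default_rng(1)
base = rng.normal(size=(20, 3))

# Synthetic rigid-body drift: each frame is rotated and translated a bit more
traj = np.stack([base @ rot_z(0.3 * f) + 0.5 * f for f in range(5)], axis=2)
aligned = align_to_first_frame(traj)

print("max deviation before:", np.abs(traj - traj[:, :, :1]).max())
print("max deviation after: ", np.abs(aligned - aligned[:, :, :1]).max())
```

Because the synthetic frames differ only by rigid motion, all of them collapse onto frame 0 after alignment. In a real trajectory the remaining deviation reflects internal conformational change.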

See also