# How to assign segments and chains
## Goal
Derive `segid` and/or `chain` fields for a structure that lacks them, splitting the system into segments by following each polymer’s physical backbone.
## Minimal example
```python
from moleculekit.molecule import Molecule
from moleculekit.tools.autosegment import autoSegment

mol = Molecule("3PTB")  # 4-letter PDB ID: fetched from the RCSB PDB
mol = autoSegment(mol)
print(set(mol.segid))
```
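To see which segments were produced and how many atoms each holds, a small counting helper can be handier than a bare `set`. The sketch below is a hypothetical helper (`summarize_segments` is not part of moleculekit); it accepts any iterable of labels, so `mol.segid` can be passed directly. The toy labels in the usage line are illustrative only and are not necessarily the names `autoSegment` generates.

```python
from collections import Counter


def summarize_segments(segids):
    """Return (segid, atom_count) pairs in order of first appearance."""
    # Counter is a dict subclass, so keys keep first-seen order (Python 3.7+).
    return list(Counter(segids).items())


# With a real structure you would call summarize_segments(mol.segid).
print(summarize_segments(["P0", "P0", "P1", "W", "W", "W"]))
```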
## How segments are decided
A new segment starts between two consecutive residues when any of the following holds:

- the backbone link distance exceeds the cutoff (protein C(i)–N(i+1); nucleic acid O3'(i)–P(i+1));
- the `chain` or `segid` already present in the file changes;
- the polymer type changes.

Water is collapsed into one segment and ions into another; the remaining (“other”) molecules get one segment per bonded molecule. Because continuity is read from coordinates, a gap in residue numbering over an intact backbone stays one segment, while a real spatial break is split.
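The break rule can be sketched in plain Python. This is an illustration under stated assumptions, not moleculekit's implementation: `Residue` and `segment_breaks` are hypothetical names, each residue is reduced to its incoming head atom (N or P) and outgoing tail atom (C or O3'), and the 2.0 Å default cutoff is an assumed illustrative value, not the library's default. Residue numbers are never consulted, which is why a numbering gap over an intact backbone does not split a segment.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Residue:
    """Hypothetical per-residue record (not a moleculekit type)."""
    chain: str
    segid: str
    polymer: str                    # e.g. "protein" or "nucleic"
    head: Optional[tuple] = None    # N (protein) or P (nucleic) coordinates
    tail: Optional[tuple] = None    # C (protein) or O3' (nucleic) coordinates


def segment_breaks(residues, cutoff=2.0):
    """Return a segment index per residue; cutoff (in Å) is an assumption."""
    labels = []
    seg = 0
    for i, res in enumerate(residues):
        if i > 0:
            prev = residues[i - 1]
            broken = (
                prev.chain != res.chain          # existing chain changes
                or prev.segid != res.segid       # existing segid changes
                or prev.polymer != res.polymer   # polymer type changes
                or prev.tail is None             # missing link atom
                or res.head is None
                or math.dist(prev.tail, res.head) > cutoff  # spatial break
            )
            if broken:
                seg += 1
        labels.append(seg)
    return labels


res = [
    Residue("A", "", "protein", head=(0.0, 0.0, 0.0), tail=(1.0, 0.0, 0.0)),
    Residue("A", "", "protein", head=(2.3, 0.0, 0.0), tail=(3.3, 0.0, 0.0)),  # C-N 1.3 Å
    Residue("A", "", "protein", head=(9.0, 0.0, 0.0), tail=(10.0, 0.0, 0.0)),  # 5.7 Å gap
]
print(segment_breaks(res))
```

The first two residues are linked (1.3 Å, roughly a peptide bond) and share segment 0; the 5.7 Å gap before the third starts segment 1.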
## Parameters that matter
| Parameter | Type | Default | What it does |
|---|---|---|---|
| `mol` | `Molecule` | required | Input molecule (a copy is returned; the original is unchanged) |
| `sel` | `str` | … | Restrict segmentation to this atom selection; atoms outside keep their existing … |
| … | `str` | … | Prefix for generated segment names, e.g. … |
| `fields` | `tuple` of `str` | … | Which field(s) to write: any combination of `"chain"` and `"segid"` |
| … | … | … | Max … |
| … | … | … | Max … |
| … | … | … | Max … |
| … | … | … | Max … |
| `single_other_segment` | `bool` | … | Put all non-polymer, non-water, non-ion molecules into one segment instead of one per molecule |
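Splitting the “other” bucket into bonded molecules amounts to finding connected components of a bond graph. The following sketch makes two loud assumptions: bonds are guessed purely from a distance threshold (`bond_cutoff`, 1.9 Å here, chosen for illustration), and components are found with a small union-find. `group_other_molecules` is a hypothetical function; its `single_other_segment` flag only mimics the documented option of the same name.

```python
import math
from itertools import combinations


def group_other_molecules(coords, bond_cutoff=1.9, single_other_segment=False):
    """Label each 'other' atom with a molecule index (hypothetical sketch)."""
    n = len(coords)
    if single_other_segment:
        return [0] * n  # lump every atom into one segment

    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Assumed bond guess: any two atoms closer than bond_cutoff are bonded.
    for i, j in combinations(range(n), 2):
        if math.dist(coords[i], coords[j]) < bond_cutoff:
            parent[find(i)] = find(j)

    # Renumber component roots 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(group_other_molecules(coords))                             # two molecules
print(group_other_molecules(coords, single_other_segment=True))  # one segment
```

The all-pairs loop is quadratic, which is fine for a handful of ligands; a production implementation would more likely use a spatial neighbour search.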
## Common variations
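```python
from moleculekit.molecule import Molecule
from moleculekit.tools.autosegment import autoSegment

mol = Molecule("3PTB")

# Segment only the protein; atoms outside the selection are left as they were
prot_seg = autoSegment(mol, sel="protein")
# Write both chain and segid in one call
both = autoSegment(mol, fields=("chain", "segid"))
# Lump every ligand/cofactor into a single "other" segment
lumped = autoSegment(mol, single_other_segment=True)
```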
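Per the parameter table, restricting the call with `sel` behaves like an overlay: atoms inside the selection receive freshly generated labels and atoms outside keep whatever they had. A minimal sketch of that merge, assuming labels are plain per-atom lists and a boolean mask marks the selection (`merge_within_selection` is a hypothetical helper, not a moleculekit function):

```python
def merge_within_selection(existing, generated, in_sel):
    """Use generated labels where in_sel is True, existing labels elsewhere."""
    if not (len(existing) == len(generated) == len(in_sel)):
        raise ValueError("all inputs must have one entry per atom")
    return [new if selected else old
            for old, new, selected in zip(existing, generated, in_sel)]


# Protein atoms (first two) get new segids; the two water atoms keep "W".
print(merge_within_selection(["X", "X", "W", "W"],
                             ["P0", "P0", "P1", "P1"],
                             [True, True, False, False]))
```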
## Gotchas
- `autoSegment()` returns a new `Molecule`; it does not mutate the input.
- Only coordinates and atom names are needed; explicit bonds are not required (they are guessed only for the “other” bucket).
- `segid` can be up to 4 characters (MD force-field convention); `chain` is a single character (PDB convention).
- When writing to PDB, only the `chain` field is stored in the standard CHAIN column; `segid` goes into the SEGID column, which many programs ignore.
- `autoSegment2` is deprecated and forwards to `autoSegment()` with a `DeprecationWarning`; use `autoSegment` directly.
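Because over-long labels may be truncated or rejected when the structure is written out, it can be worth checking them before saving. A minimal sketch, assuming only the length limits stated above (4 characters for `segid`, 1 for `chain`); `check_labels` is a hypothetical helper, not part of moleculekit. With a real structure it would be called as `check_labels(mol.segid, mol.chain)`.

```python
def check_labels(segids, chains):
    """Return a list of problems with segid/chain lengths (hypothetical helper)."""
    problems = []
    # Check each distinct label once rather than once per atom.
    for s in sorted(set(segids)):
        if len(s) > 4:
            problems.append(f"segid {s!r} exceeds 4 characters")
    for c in sorted(set(chains)):
        if len(c) > 1:
            problems.append(f"chain {c!r} is longer than 1 character")
    return problems


print(check_labels(["P0", "LIGAND"], ["A", "AB"]))
```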