How to assign segments and chains

Goal

Derive segid and/or chain fields for a structure that lacks them, using gap detection to split the structure into continuous segments automatically.

Minimal example

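from moleculekit.molecule import Molecule
from moleculekit.tools.autosegment import autoSegment

mol = Molecule("3PTB")   # fetch PDB entry 3PTB
mol = autoSegment(mol)   # returns a new Molecule with segid assigned per segment
print(set(mol.segid))    # the unique segment names that were assigned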
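Printing the set of segment names does not show where each segment starts and ends, which is what you want to check after auto-assignment. A minimal pure-Python sketch of such a summary follows; summarize_segments is a hypothetical helper (not part of moleculekit), and the two lists are made-up stand-ins for per-atom values such as mol.segid and mol.resid. Since the helper only iterates over its inputs, NumPy arrays should work as well.

```python
def summarize_segments(segids, resids):
    """Map each segment name to (first resid, last resid, atom count), in order of appearance."""
    summary = {}
    for seg, rid in zip(segids, resids):
        first, _, count = summary.get(seg, (rid, rid, 0))
        # "last" is simply the last resid seen for this segment
        summary[seg] = (first, rid, count + 1)
    return summary


# Made-up per-atom values standing in for mol.segid and mol.resid
segids = ["P0", "P0", "P0", "P1", "P1"]
resids = [16, 16, 17, 30, 31]
for seg, (first, last, n) in summarize_segments(segids, resids).items():
    print(f"{seg}: resid {first}-{last}, {n} atoms")
```

A segment whose resid range looks implausibly short or long is usually the first sign of a mis-split.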

Parameters that matter

| Parameter  | Type     | Default  | What it does |
|------------|----------|----------|--------------|
| mol        | Molecule | required | Input molecule (a copy is returned; the original is unchanged) |
| sel        | str      | "all"    | Restrict gap detection to this atom selection |
| basename   | str      | "P"      | Prefix for generated segment names, e.g. "P" → "P0", "P1", … |
| spatial    | bool     | True     | Treat a residue-numbering gap as a real gap only if the Cα atoms on either side are more than spatialgap Å apart |
| spatialgap | float    | 4.0      | Distance threshold (Å) for spatial gap detection |
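How basename, spatial and spatialgap interact can be illustrated with a small standalone sketch. It is not moleculekit's implementation: it assumes one entry per residue (a residue number plus the (x, y, z) of its Cα atom, or None when the Cα is missing), treats any step other than +1 in the numbering as a candidate gap, and keeps a candidate gap when spatial is off, when a Cα is missing, or when the two Cα atoms are more than spatialgap Å apart. The function name assign_segments is hypothetical.

```python
import math


def assign_segments(resids, ca_coords, basename="P", spatial=True, spatialgap=4.0):
    """Return one segment name per residue, e.g. "P0", "P0", "P1", ...

    resids: residue numbers in file order, one per residue.
    ca_coords: (x, y, z) of each residue's CA atom, or None if it has no CA.
    """
    names = []
    seg = 0
    for i, rid in enumerate(resids):
        if i > 0 and rid - resids[i - 1] != 1:
            # Numbering discontinuity: a candidate gap
            is_gap = True
            prev, cur = ca_coords[i - 1], ca_coords[i]
            # Assumption: without both CA atoms the gap cannot be checked, so it is kept
            if spatial and prev is not None and cur is not None:
                # Keep the gap only if the CA atoms are more than spatialgap apart
                is_gap = math.dist(prev, cur) > spatialgap
            if is_gap:
                seg += 1
        names.append(f"{basename}{seg}")
    return names


resids = [1, 2, 3, 10, 11, 20]
ca = [(-7.6, 0, 0), (-3.8, 0, 0), (0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (30, 0, 0)]
print(assign_segments(resids, ca))                 # ['P0', 'P0', 'P0', 'P0', 'P0', 'P1']
print(assign_segments(resids, ca, spatial=False))  # ['P0', 'P0', 'P0', 'P1', 'P1', 'P2']
```

With the default 4.0 Å threshold, the 3 → 10 numbering jump in this toy data is ignored because the Cα atoms are only 3.8 Å apart (the typical distance between bonded neighbours), so only the 11 → 20 jump starts a new segment.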

Common variations

# Assign segments to protein chains only
mol = autoSegment(mol, sel="protein")
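Restricting sel changes which residues take part in gap detection. The plain-Python sketch below illustrates the idea under stated assumptions: residues are dicts with hypothetical "resid", "resname" and "segid" keys, a small resname set stands in for moleculekit's "protein" selection, and residues outside the selection are skipped so they neither get a new name nor break a segment. How moleculekit itself treats unselected atoms is not asserted here; segment_selection is a hypothetical name.

```python
def segment_selection(residues, selected, basename="P"):
    """Assign segment names to the residues for which selected(residue) is True.

    residues: list of dicts with "resid", "resname" and "segid" keys.
    Returns new dicts; the input list is left unchanged.
    """
    out = [dict(r) for r in residues]  # shallow copies, mirroring "a copy is returned"
    seg = -1
    prev_rid = None
    for r in out:
        if not selected(r):
            continue  # outside the selection: keep the existing segid
        # Compare only with the previous *selected* residue
        if prev_rid is None or r["resid"] - prev_rid != 1:
            seg += 1
        r["segid"] = f"{basename}{seg}"
        prev_rid = r["resid"]
    return out


PROTEIN_RESNAMES = {"ILE", "VAL", "GLY"}  # hypothetical stand-in for sel="protein"
residues = [
    {"resid": 16, "resname": "ILE", "segid": ""},
    {"resid": 17, "resname": "VAL", "segid": ""},
    {"resid": 18, "resname": "GLY", "segid": ""},
    {"resid": 480, "resname": "CA", "segid": ""},
    {"resid": 701, "resname": "HOH", "segid": ""},
    {"resid": 702, "resname": "HOH", "segid": ""},
]
everything = segment_selection(residues, lambda r: True)
protein_only = segment_selection(residues, lambda r: r["resname"] in PROTEIN_RESNAMES)
print([r["segid"] for r in everything])    # ['P0', 'P0', 'P0', 'P1', 'P2', 'P2']
print([r["segid"] for r in protein_only])  # ['P0', 'P0', 'P0', '', '', '']
```

In this made-up data, selecting everything gives the ion and the waters segments of their own; restricting the selection keeps the naming to the protein.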

Gotchas

  • autoSegment() returns a new Molecule; it does not mutate the input.

  • segid can be up to 4 characters (MD force-field convention); chain is a single character (PDB convention).

  • Auto-assignment is driven by residue numbering (and, with spatial=True, by Cα distances), so structures with non-contiguous or missing residue numbers can end up split in unexpected places; inspect the result before use.

  • When writing a PDB file, chain is stored in the standard chain-identifier column, while segid goes into the segment-identifier column (columns 73-76), which many programs ignore.
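The width limits in the list above are easy to exceed with a long basename: "PROT" already fills all four segid characters, so "PROT0" overflows. A minimal sketch of a check to run before writing a PDB file follows; check_pdb_widths is a hypothetical helper, not part of moleculekit, and it accepts an empty chain because PDB files allow a blank chain identifier.

```python
def check_pdb_widths(segids, chains):
    """Return messages for names that do not fit the PDB segid (4) or chain (1) widths."""
    problems = []
    for name in sorted(set(segids)):
        if len(name) > 4:
            problems.append(f"segid {name!r} is longer than 4 characters")
    for name in sorted(set(chains)):
        if len(name) > 1:  # an empty chain is allowed (blank in the PDB file)
            problems.append(f"chain {name!r} is longer than 1 character")
    return problems


print(check_pdb_widths(["P0", "P11"], ["A"]))  # []
print(check_pdb_widths(["PROT0", "PROT1"], ["A", "AB"]))
```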

See also