moleculekit.tools.autosegment module

moleculekit.tools.autosegment.autoSegment(mol, sel='all', basename='P', spatial=True, spatialgap=4.0, fields=('segid',), field=None, _logger=True)

Detects resid gaps in a selection and assigns an incrementing segid to each fragment.

A new segment is started whenever a gap in resid numbering is found between consecutive residues in the selection (optionally confirmed by checking that the spatial distance between backbone atoms exceeds spatialgap). Water molecules are handled separately: each run of consecutive water residues forms its own segment with automatically renumbered resid values.
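
The gap rule described above can be illustrated with a minimal pure-Python sketch. It is not the moleculekit implementation: `segment_by_resid_gaps` is a hypothetical helper, it uses one representative backbone coordinate per residue instead of matching backbone atoms, it assumes a resid gap means consecutive resids differ by more than 1, it assumes numbering starts at 1, and it ignores the separate handling of water.

```python
import math


def segment_by_resid_gaps(resids, coords, basename="P", spatial=True, spatialgap=4.0):
    """Return one segment name per residue (hypothetical helper, illustration only).

    resids: residue numbers in selection order
    coords: one representative backbone (x, y, z) point per residue (assumption)
    """
    segids = []
    index = 1  # assumed starting index for the segment names
    for i, resid in enumerate(resids):
        if i > 0:
            # Assumed rule: a resid jump larger than 1 is a candidate gap.
            gap = abs(resid - resids[i - 1]) > 1
            if gap and spatial:
                # Confirm the candidate gap only if the residues are far apart.
                gap = math.dist(coords[i - 1], coords[i]) > spatialgap
            if gap:
                index += 1
        segids.append(f"{basename}{index}")
    return segids


resids = [1, 2, 3, 7, 8, 20]
coords = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0), (15.2, 0, 0), (40, 0, 0)]
with_spatial = segment_by_resid_gaps(resids, coords, spatial=True)
without_spatial = segment_by_resid_gaps(resids, coords, spatial=False)
```

In this toy input the jump from resid 3 to 7 is a numbering gap only (the two points are 3.8 Angstrom apart), so the spatial check keeps both residues in the same segment, while the jump from 8 to 20 is confirmed as a real gap.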

Use autoSegment() when the input has resid-based gaps (e.g. a PDB where residues are numbered with missing stretches). If you want to segment strictly by the covalent bond graph instead, use autoSegment2(). When you need a specific naming scheme that neither function produces, set mol.segid directly with mol.set("segid", "MY_SEG", sel="...").

Parameters:
  • mol (Molecule) – The Molecule object

  • sel (str) – Atom selection string on which to check for gaps. See more here

  • basename (str) – The basename for segment ids. For example if given ‘P’ it will name the segments ‘P1’, ‘P2’, …

  • spatial (bool) – If True, a discontinuity in resid counts as a gap only if matching backbone atoms of the two residues are farther apart than spatialgap Angstrom

  • spatialgap (float) – The distance, in Angstrom, above which a resid discontinuity is confirmed as a spatial gap

  • fields (tuple of str) – Fields in which to set the segments. Either “chain”, “segid”, or both.

Returns:

newmol – A new Molecule object with modified segids

Return type:

Molecule object

Example

>>> newmol = autoSegment(mol, "chain B", "P", fields=("chain", "segid"))

moleculekit.tools.autosegment.autoSegment2(mol, sel='(protein or resname ACE NME)', basename='P', fields=('segid',), residgaps=False, residgaptol=1, chaingaps=True, _logger=True)

Detects bonded segments in a selection and assigns an incrementing segid to each segment.

Segments are derived from the covalent bond graph: two residues belong to the same segment if and only if they are in the same connected component of the backbone bond graph (computed from mol.bonds supplemented by distance-based guessing over backbone atoms). This is more robust than resid-gap detection (autoSegment()) for structures where resid numbering is irregular or non-continuous.
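
The connected-component idea above can be sketched in plain Python with a small union-find. This is only an illustration under stated assumptions, not the library's code: `bonded_segments` is a hypothetical helper, the bonds are given directly as residue pairs rather than derived from mol.bonds or distance-based guessing, and segments are numbered from 1 in order of first appearance.

```python
def bonded_segments(residues, bonds, basename="P"):
    """Return one segment name per residue (hypothetical helper, illustration only).

    residues: residue identifiers in selection order
    bonds: iterable of (residue_a, residue_b) pairs joined by a backbone bond
    """
    parent = {r: r for r in residues}

    def find(r):
        # Follow parent links to the component root, compressing the path.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    names = {}  # component root -> segment name
    segids = []
    for r in residues:
        root = find(r)
        if root not in names:
            # Assumed naming: number components by first appearance, from 1.
            names[root] = f"{basename}{len(names) + 1}"
        segids.append(names[root])
    return segids


# Resid numbering is irregular (12 -> 50), but only connectivity matters here.
segments = bonded_segments([10, 11, 12, 50, 51], [(10, 11), (11, 12), (50, 51)])
bridged = bonded_segments([10, 11, 12, 50, 51], [(10, 11), (11, 12), (12, 50), (50, 51)])
```

The second call shows why this is robust to irregular numbering: once residues 12 and 50 are bonded, the jump in resid no longer splits the chain.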

Use autoSegment2() when you want to follow connectivity rather than resid sequence. Use autoSegment() when the input has predictable resid-based gaps. When you need a specific naming scheme, set mol.segid directly with mol.set("segid", "MY_SEG", sel="...").

Parameters:
  • mol (Molecule object) – The Molecule object

  • sel (str) – Atom selection string on which to check for gaps. See more here

  • basename (str) – The basename for segment ids. For example if given ‘P’ it will name the segments ‘P1’, ‘P2’, …

  • fields (tuple of strings) – Field to fix. Can be “segid” (default) or any other Molecule field or combinations thereof.

  • residgaps (bool) – Set to True to consider gaps in resids as structural gaps. Set to False to ignore resids

  • residgaptol (int) – The resid difference above which two consecutive residues are considered to be separated by a gap. For example, with residgaptol=1, 235 - 233 = 2 > 1, hence it is a gap. The default in the signature is 1; a value of 2 tolerates a single missing residue, which occurs in many PDBs without any real gap in the protein.

  • chaingaps (bool) – Set to True to consider changes in chains as structural gaps. Set to False to ignore chains
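
The residgaptol arithmetic from the parameter list can be written as a one-line check. `is_resid_gap` is a hypothetical helper used only to spell out the comparison; it assumes the absolute resid difference is what gets compared against the tolerance.

```python
def is_resid_gap(prev_resid, next_resid, residgaptol=1):
    """True if the resid difference exceeds the tolerance (illustration only)."""
    return abs(next_resid - prev_resid) > residgaptol


strict = is_resid_gap(233, 235, residgaptol=1)    # 2 > 1
tolerant = is_resid_gap(233, 235, residgaptol=2)  # 2 > 2 is false
```

With a tolerance of 1 a single missing residue is already a gap, whereas a tolerance of 2 lets it pass and only flags larger jumps.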

Returns:

newmol – A new Molecule object with modified segids

Return type:

Molecule object

Example

>>> newmol = autoSegment2(mol)