moleculekit.distance module

moleculekit.distance.calculate_contacts(mol, sel1, sel2, periodic, threshold=4)

Calculate atom contacts within a distance threshold for each frame.

For every frame of mol, finds all pairs of atoms (one from sel1 and one from sel2) whose interatomic distance is below threshold.

Parameters:
  • mol (Molecule) – The Molecule (single or multi-frame) whose coordinates are used.

  • sel1 (ndarray) – A 1D boolean atom-selection mask (length mol.numAtoms) selecting the first group of atoms.

  • sel2 (ndarray) – A 1D boolean atom-selection mask (length mol.numAtoms) selecting the second group of atoms. If it is equal to sel1, self-contacts within the selection are computed.

  • periodic (str | None) – How to treat periodic boundary conditions when computing distances. If None, no periodic wrapping is applied (the molecule box is ignored). If "chains", the minimum image convention is applied across atoms of different chains. If "selections", it is applied between the two selections. When not None, mol must contain non-zero box dimensions for every frame.

  • threshold (float) – Distance cutoff in Angstrom below which a pair of atoms is considered in contact. Default is 4.

Returns:

contacts – One entry per frame of mol. Each entry is a 2D array of shape (N, 2) and dtype uint32 containing the atom-index pairs in contact for that frame.

Return type:

list

Raises:

RuntimeError – If periodic is not None but the molecule has no valid box dimensions, if the number of box frames does not match the number of coordinate frames, or if periodic is not one of None, "chains" or "selections".
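
To check the semantics, the per-frame search described above can be reproduced with a brute-force loop. This is a minimal pure-Python sketch, not moleculekit's implementation. The helper names (min_image_delta, frame_contacts, calculate_contacts_reference) are hypothetical, and atom selections are passed as index lists instead of boolean masks. The box is assumed orthorhombic, and only the "selections"-style wrapping (applied to every sel1/sel2 pair) is modelled. Deduplicating pairs into sorted (i, j) tuples with i < j is a choice made here, not a documented guarantee of the real uint32 arrays.

```python
import math
from itertools import product


def min_image_delta(a, b, box):
    """Per-axis displacement b - a wrapped into [-L/2, L/2] (orthorhombic box assumed)."""
    delta = []
    for ai, bi, length in zip(a, b, box):
        d = bi - ai
        d -= length * round(d / length)  # shift by whole box lengths
        delta.append(d)
    return delta


def frame_contacts(coords, idx1, idx2, threshold=4.0, box=None):
    """Hypothetical brute-force contact search for one frame.

    coords: sequence of (x, y, z) tuples, one per atom.
    idx1, idx2: atom indices of the two selections (may be identical).
    box: (Lx, Ly, Lz) to apply the minimum image convention, or None.
    Returns sorted (i, j) pairs with i < j whose distance is below threshold.
    """
    pairs = set()
    for i, j in product(idx1, idx2):
        if i == j:
            continue  # skip an atom paired with itself
        if box is None:
            d = math.dist(coords[i], coords[j])
        else:
            d = math.hypot(*min_image_delta(coords[i], coords[j], box))
        if d < threshold:
            pairs.add((min(i, j), max(i, j)))  # deduplicate (i, j) / (j, i)
    return sorted(pairs)


def calculate_contacts_reference(frames, idx1, idx2, threshold=4.0, boxes=None):
    """Hypothetical: one list of contact pairs per frame, like the documented return value."""
    if boxes is None:
        return [frame_contacts(c, idx1, idx2, threshold) for c in frames]
    if len(boxes) != len(frames):
        raise RuntimeError("number of box frames does not match number of coordinate frames")
    return [frame_contacts(c, idx1, idx2, threshold, b) for c, b in zip(frames, boxes)]


# Atoms 0 and 1 are 9.5 Å apart in the raw coordinates but only 0.5 Å apart
# across the boundary of a 10 Å cubic box.
coords = [(0.25, 0.0, 0.0), (9.75, 0.0, 0.0), (0.25, 3.0, 0.0)]
print(frame_contacts(coords, [0], [1, 2]))                          # [(0, 2)]
print(frame_contacts(coords, [0], [1, 2], box=(10.0, 10.0, 10.0)))  # [(0, 1), (0, 2)]
```

The quadratic loop is fine for a handful of atoms. For real systems, a spatial index such as a k-d tree avoids comparing every pair.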

moleculekit.distance.cdist(coords1, coords2)

Compute the pairwise Euclidean distances between two sets of points.

Parameters:
  • coords1 (ndarray) – A 2D array of shape (N, D) with the coordinates of the first set of points.

  • coords2 (ndarray) – A 2D array of shape (M, D) with the coordinates of the second set of points. The second dimension D must match that of coords1.

Returns:

distances – A 2D array of shape (N, M) and dtype float32 where element [i, j] is the Euclidean distance between coords1[i] and coords2[j].

Return type:

ndarray

Examples

>>> from moleculekit.distance import cdist
>>> distances = cdist(coords1, coords2)  # coords1: (N, D), coords2: (M, D) -> (N, M)
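
To cross-check the output shape and indexing, the sketch below computes the same (N, M) matrix in plain Python. It is a hypothetical reference (cdist_reference is not part of moleculekit) that returns nested lists of Python floats rather than a float32 ndarray. Conceptually, it matches SciPy's scipy.spatial.distance.cdist with its default Euclidean metric.

```python
import math


def cdist_reference(coords1, coords2):
    """Hypothetical pure-Python equivalent: element [i][j] = |coords1[i] - coords2[j]|."""
    if coords1 and coords2 and len(coords1[0]) != len(coords2[0]):
        raise ValueError("coords1 and coords2 must have the same dimension D")
    return [[math.dist(p, q) for q in coords2] for p in coords1]


coords1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]                   # N = 2 points
coords2 = [(0.0, 3.0, 4.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]  # M = 3 points
d = cdist_reference(coords1, coords2)
print(len(d), len(d[0]))  # 2 3
print(d[0][0])            # 5.0, distance from coords1[0] to coords2[0]
```
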

moleculekit.distance.find_clashes(mol, sel1=None, sel2=None, overlap=0.6, exclude_bonded=True, exclude_14=True, guess_bonds=True)

Find pairs of atoms that sterically clash with each other.

A clash is defined as a pair of atoms whose interatomic distance is less than r_vdw_1 + r_vdw_2 - overlap, where r_vdw_1 and r_vdw_2 are the van der Waals (VdW) radii of the two atoms, taken from moleculekit.periodictable. Neighbor lookup uses the bundled cKDTree (ported from SciPy) for speed.
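
The clash criterion can be worked through by hand. In this sketch the helper names are hypothetical, and the 1.7 Å radius used for both atoms is an illustrative assumption. find_clashes itself looks the radii up in moleculekit.periodictable.

```python
def is_clash(distance, r_vdw_1, r_vdw_2, overlap=0.6):
    """Hypothetical helper: the documented test distance < r_vdw_1 + r_vdw_2 - overlap."""
    return distance < r_vdw_1 + r_vdw_2 - overlap


def overlap_amount(distance, r_vdw_1, r_vdw_2):
    """Hypothetical helper: the reported overlap, (r_vdw_1 + r_vdw_2) - distance."""
    return (r_vdw_1 + r_vdw_2) - distance


# Illustrative radii: 1.7 Å for both atoms (an assumption), so r1 + r2 = 3.4 Å.
print(is_clash(3.0, 1.7, 1.7))                # False: 0.4 Å overlap is tolerated
print(is_clash(2.5, 1.7, 1.7))                # True: 0.9 Å overlap exceeds 0.6 Å
print(is_clash(3.0, 1.7, 1.7, overlap=0.0))   # True: any overlap counts
print(is_clash(3.5, 1.7, 1.7, overlap=-0.2))  # True: spheres 0.1 Å apart still flagged
```
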

Parameters:
  • mol (Molecule) – The molecule to analyze.

  • sel1 (str | ndarray | None) – First selection (atom-selection string, boolean mask, or integer index array). If None, all atoms are used.

  • sel2 (str | ndarray | None) – Second selection (atom-selection string, boolean mask, or integer index array). If None, uses sel1 (self-clashes).

  • overlap (float) – How much VdW overlap, in Å, is tolerated before a pair is flagged as a clash. Default 0.6, i.e. atoms clash only when their VdW spheres overlap by more than 0.6 Å. Set to 0 for strict contact (any overlap counts), or to a negative value for a looser definition that also flags atoms whose VdW spheres are up to |overlap| Å apart.

  • exclude_bonded (bool) – If True, 1-2 (directly bonded) and 1-3 (angle) neighbors are excluded from the clash search. Default True.

  • exclude_14 (bool) – If True, 1-4 (dihedral) neighbors are also excluded. Default True.

  • guess_bonds (bool) – If True, supplements mol.bonds with moleculekit’s distance- and covalent-radius-based bond guesser. This catches inter-residue peptide bonds, disulfides, etc., which are often absent from mol.bonds on PDB-loaded structures. Set to False if mol.bonds is already complete (e.g. for systems built from a topology file). This skips the guessing overhead and avoids spurious bonds being guessed between overlapping atoms, which would otherwise hide those clashes through the bonded exclusions. Default True.

Returns:

  • clashes (numpy.ndarray of shape (N, 2), dtype int) – Pairs of atom indices that clash. Within each pair, the first index is always smaller than the second. Empty array if no clashes are found.

  • distances (numpy.ndarray of shape (N,), dtype float32) – Distance (Å) for each clash pair.

  • overlaps (numpy.ndarray of shape (N,), dtype float32) – Overlap amount (r_vdw_1 + r_vdw_2) - distance for each pair. All three arrays are sorted by overlap, most severe first.

Examples

>>> from moleculekit.molecule import Molecule
>>> from moleculekit.distance import find_clashes
>>> mol = Molecule("3ptb")
>>> clashes, distances, overlaps = find_clashes(mol)
>>> for (a, b), d, o in zip(clashes, distances, overlaps):
...     print(f"{mol.name[a]}({a}) <-> {mol.name[b]}({b}): "
...           f"d={d:.2f} overlap={o:.2f}")
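
The bonded exclusions and the ordering of the results can be illustrated with a small pure-Python model. This is a sketch under stated assumptions, not the library's code. The function names are hypothetical, and a brute-force double loop replaces the cKDTree. Every atom belongs to both selections (sel1 = sel2 = None), radii are passed in directly, and bond guessing is not modelled. Bond separation is taken as the shortest bond path found by breadth-first search, a choice made here that matters for ring systems.

```python
import math
from collections import deque


def bonded_exclusions(n_atoms, bonds, exclude_bonded=True, exclude_14=True):
    """Hypothetical helper: (i, j) pairs, i < j, excluded from the clash search."""
    excluded_seps = set()
    if exclude_bonded:
        excluded_seps |= {1, 2}  # 1-2 (bond) and 1-3 (angle) neighbours
    if exclude_14:
        excluded_seps.add(3)  # 1-4 (dihedral) neighbours
    if not excluded_seps:
        return set()
    max_sep = max(excluded_seps)

    neighbours = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    excluded = set()
    for start in range(n_atoms):
        # Breadth-first search gives the shortest bond-path length to each atom.
        sep = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            if sep[atom] == max_sep:
                continue  # nothing further away can be excluded
            for nb in neighbours[atom]:
                if nb not in sep:
                    sep[nb] = sep[atom] + 1
                    queue.append(nb)
        for other, s in sep.items():
            if other > start and s in excluded_seps:
                excluded.add((start, other))
    return excluded


def find_clashes_reference(coords, radii, bonds, overlap=0.6,
                           exclude_bonded=True, exclude_14=True):
    """Hypothetical brute-force model of the documented clash search."""
    excluded = bonded_exclusions(len(coords), bonds, exclude_bonded, exclude_14)
    hits = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):  # i < j, as in the documented output
            if (i, j) in excluded:
                continue
            d = math.dist(coords[i], coords[j])
            if d < radii[i] + radii[j] - overlap:
                hits.append(((i, j), d, radii[i] + radii[j] - d))
    hits.sort(key=lambda h: h[2], reverse=True)  # most severe overlap first
    clashes = [h[0] for h in hits]
    distances = [h[1] for h in hits]
    overlaps = [h[2] for h in hits]
    return clashes, distances, overlaps


# A five-atom chain 0-1-2-3-4 folded back so that atom 4 sits above atom 0.
# All radii are an illustrative 1.7 Å (assumption, not from periodictable).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0),
          (0.0, 1.5, 0.0), (0.0, 0.0, 2.0)]
radii = [1.7] * 5
bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(find_clashes_reference(coords, radii, bonds)[0])                    # [(0, 4)]
print(find_clashes_reference(coords, radii, bonds, exclude_14=False)[0])  # [(0, 3), (0, 4), (1, 4)]
```

With exclude_14=False, the pairs (0, 3) and (1, 4), which are three bonds apart, enter the search and are reported as clashes in this geometry.
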

moleculekit.distance.pdist(coords)

Compute the pairwise Euclidean distances within a single set of points.

Parameters:

coords (ndarray) – A 2D array of shape (N, D) with the coordinates of the points.

Returns:

distances – A 1D array of length N * (N - 1) / 2 and dtype float32 containing the condensed upper-triangular pairwise distances. Use squareform() to convert it to a full (N, N) distance matrix.

Return type:

ndarray

Examples

>>> from moleculekit.distance import pdist
>>> distances = pdist(coords)  # coords: (N, D) -> N * (N - 1) // 2 values
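
The condensed layout can be illustrated with a small pure-Python sketch. It assumes the row-major upper-triangle ordering also used by SciPy's scipy.spatial.distance.pdist, i.e. pairs (0, 1), (0, 2), ..., (0, N-1), (1, 2), and so on. The documentation above only says "condensed upper-triangular", so treat that ordering and the hypothetical helpers pdist_reference and condensed_index as assumptions.

```python
import math


def pdist_reference(coords):
    """Hypothetical helper: condensed distances in row-major upper-triangle order."""
    n = len(coords)
    return [math.dist(coords[i], coords[j]) for i in range(n) for j in range(i + 1, n)]


def condensed_index(i, j, n):
    """Hypothetical helper: position of pair (i, j), i < j, in the condensed vector."""
    if not 0 <= i < j < n:
        raise ValueError("expected 0 <= i < j < n")
    # Rows 0..i-1 contribute (n-1) + (n-2) + ... + (n-i) entries before row i.
    return n * i - i * (i + 1) // 2 + (j - i - 1)


# Four points on the corners of a 3 x 4 rectangle (D = 2).
coords = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
d = pdist_reference(coords)
print(len(d))                       # 6, i.e. 4 * 3 // 2
print(d[condensed_index(1, 2, 4)])  # 5.0, the diagonal between points 1 and 2
```
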

moleculekit.distance.squareform(distances)

Convert a condensed pairwise distance vector into a square distance matrix.

Parameters:

distances (ndarray) – A 1D condensed distance vector of length N * (N - 1) / 2, such as the one produced by pdist().

Returns:

matrix – A 2D symmetric distance matrix of shape (N, N) with a zero diagonal.

Return type:

ndarray

Examples

>>> from moleculekit.distance import pdist, squareform
>>> matrix = squareform(pdist(coords))  # (N, N)
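
The inverse operation can be sketched the same way. The sketch recovers N from the vector length m by solving m = N(N - 1)/2, then fills both triangles. This pure-Python squareform_reference is hypothetical and returns nested lists rather than an ndarray. It assumes the same row-major pair ordering as SciPy's squareform.

```python
import math


def squareform_reference(condensed):
    """Hypothetical helper: condensed vector -> symmetric (N, N) matrix with zero diagonal."""
    m = len(condensed)
    # Solve m = n * (n - 1) / 2 for n (positive root of the quadratic).
    n = (1 + math.isqrt(1 + 8 * m)) // 2
    if n * (n - 1) // 2 != m:
        raise ValueError(f"length {m} is not a valid condensed distance vector length")
    matrix = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = condensed[k]  # mirror into both triangles
            k += 1
    return matrix


matrix = squareform_reference([1.0, 2.0, 3.0])  # condensed pairs (0, 1), (0, 2), (1, 2)
for row in matrix:
    print(row)
# [0.0, 1.0, 2.0]
# [1.0, 0.0, 3.0]
# [2.0, 3.0, 0.0]
```
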