The atom-selection language
Moleculekit ships a VMD-inspired atom-selection language that lets you describe
subsets of atoms in a Molecule using a concise, readable syntax. The same
selection string is accepted wherever an atom selection is expected — by
atomselect(), filter(), remove(), copy(), set(), wrap(), align(), and every
other method that takes a sel argument.
What a selection produces
Every selection evaluates to a boolean mask — a NumPy array of bool with
length mol.numAtoms, where True marks selected atoms. You can also ask for
an array of integer indices instead:
from moleculekit.molecule import Molecule
mol = Molecule("3ptb")
# Boolean mask (default)
mask = mol.atomselect("protein and backbone")
print(mask.dtype, mask.shape) # bool (numAtoms,)
# Integer indices
idx = mol.atomselect("resname BEN", indexes=True)
print(idx) # array of uint32 atom indices
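The two return forms carry the same information. As a minimal sketch on a toy array (no Moleculekit involved; the mask values are invented for illustration), plain NumPy can convert between them:

```python
import numpy as np

# Toy stand-in for a selection over 6 atoms (values invented for illustration)
mask = np.array([True, False, True, True, False, False])

# Mask -> indices: positions where the mask is True
idx = np.flatnonzero(mask)

# Indices -> mask: start from all False and switch on the selected positions
rebuilt = np.zeros(mask.shape[0], dtype=bool)
rebuilt[idx] = True

print(idx.tolist())              # [0, 2, 3]
print(np.array_equal(rebuilt, mask))  # True
```

A mask always has one entry per atom, while an index array only has one entry per selected atom, so indices are more compact for small selections.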
The mask can be used everywhere a string is accepted — pass it directly to
filter, copy, etc. to skip re-parsing (faster when reusing the same
selection many times):
prot_mask = mol.atomselect("protein")
mol_prot = mol.copy(sel=prot_mask) # reuses precomputed mask
Keyword selections
The following keywords select entire chemical classes based on residue-name lookups and element checks:
| Keyword | What it selects |
|---|---|
| protein | All protein residues (canonical amino acids) |
|  | All nucleic acid residues (DNA and RNA) |
| water | Water molecules ( |
|  | Common lipid residues |
|  | Common monatomic ions |
| backbone | Protein backbone atoms ( |
|  | Protein sidechain atoms (non-backbone, non-hydrogen heavy atoms) |
| hydrogen | All atoms with |
|  | All non-hydrogen atoms |
|  | Every atom in the molecule |
|  | No atoms |
Per-atom field comparisons
You can test any per-atom field against a value or list of values:
# Single value
mol.atomselect("resname ALA")
mol.atomselect("chain A")
mol.atomselect("element C")
# List of values (space-separated, no commas)
mol.atomselect("name CA N C O")
mol.atomselect("resname ALA GLY VAL")
mol.atomselect("chain A B")
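Conceptually, a value list is a membership test applied to every atom. The sketch below illustrates that idea on invented data; `field_in` is a hypothetical helper, not part of Moleculekit, and plain lists stand in for the per-atom arrays:

```python
# Toy per-atom names (invented values, not read from a real structure)
names = ["N", "CA", "C", "O", "CB", "CA"]


def field_in(values, wanted):
    """Return a boolean mask: True where the field equals any wanted value."""
    wanted = set(wanted)
    return [v in wanted for v in values]


# Equivalent in spirit to the selection "name CA N C O"
backbone_like = field_in(names, ["CA", "N", "C", "O"])
print(backbone_like)  # [True, True, True, True, False, True]
```

The order of the values in the list does not matter, which is why "name CA N C O" and "name N CA C O" describe the same set of atoms in this model.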
Fields available for selection strings:
| Field | Description |
|---|---|
| name | Atom name |
| resname | Residue name |
| resid | Residue sequence number |
| index | Zero-based atom index |
| chain | Chain identifier |
|  | Segment identifier |
| element | Element symbol |
| occupancy | Occupancy value |
| beta | B-factor |
|  | Partial charge |
|  | Insertion code |
Comparison operators and ranges
Numeric fields support comparison operators and range syntax:
# Comparisons
mol.atomselect("resid > 50")
mol.atomselect("occupancy >= 0.5")
mol.atomselect("beta < 20")
# Range (inclusive on both ends)
mol.atomselect("resid 40 to 60")
mol.atomselect("index 0 to 99")
# Inequality with != (shown here on a string field)
mol.atomselect("chain != B")
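Because the range is inclusive on both ends, "resid 40 to 60" should select the same atoms as combining two comparisons with and ("resid >= 40 and resid <= 60"). A small sketch of the inclusive test on invented residue numbers, using a hypothetical `in_range` helper:

```python
# Toy residue numbers (invented for illustration)
resids = [38, 39, 40, 41, 59, 60, 61]


def in_range(values, low, high):
    """Inclusive range test: True where low <= value <= high."""
    return [low <= v <= high for v in values]


range_mask = in_range(resids, 40, 60)
print(range_mask)  # [False, False, True, True, True, True, False]

# The same mask built from two separate comparisons joined with "and"
two_sided = [(v >= 40) and (v <= 60) for v in resids]
print(range_mask == two_sided)  # True
```

Note that both boundary residues, 40 and 60, are part of the selection.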
Boolean composition
Combine selections with and, or, not, and parentheses:
mol.atomselect("protein and chain A")
mol.atomselect("resname ALA or resname GLY")
mol.atomselect("not water")
mol.atomselect("(protein and backbone) or (resname BEN and not hydrogen)")
Operator precedence from highest to lowest: not > and > or. Use
parentheses whenever the precedence could be ambiguous.
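Python's own boolean operators follow the same order (not, then and, then or), so they make a convenient way to see what the precedence rule implies. Under that rule, a string such as "not water and protein or resname BEN" is read as "((not water) and protein) or resname BEN". The sketch below uses plain booleans rather than Moleculekit selections and checks every combination of truth values:

```python
from itertools import product


def implicit(a, b, c):
    # No parentheses: grouping is decided by operator precedence
    return not a and b or c


def explicit(a, b, c):
    # Fully parenthesized according to not > and > or
    return ((not a) and b) or c


def alternative(a, b, c):
    # A different grouping that a reader might have intended instead
    return not (a and (b or c))


cases = list(product([False, True], repeat=3))
same = all(implicit(*t) == explicit(*t) for t in cases)
differs = [t for t in cases if implicit(*t) != alternative(*t)]

print(same)          # True: precedence matches the explicit grouping
print(len(differs))  # 4: the other grouping disagrees on half of the cases
```

The disagreement on half of the inputs is the practical argument for writing the parentheses out whenever not, and, and or appear in the same selection.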
Distance operators
Distance-based selections are evaluated at the current frame (mol.frame):
# All atoms within 5 Å of the ligand (including the ligand itself)
mol.atomselect("within 5 of resname BEN")
# All atoms within 5 Å of the ligand, excluding the ligand
mol.atomselect("exwithin 5 of resname BEN")
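The relationship between the two operators can be sketched on toy coordinates. This is an illustration of the semantics described above, not Moleculekit's implementation; the helper functions are hypothetical, and treating the cutoff as inclusive (<=) is an assumption:

```python
import math

# Toy coordinates in Å (invented); atoms 0 and 1 play the role of the ligand
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (4.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
ref = [True, True, False, False]


def within(cutoff, coords, ref):
    """True for atoms at most `cutoff` away from any reference atom."""
    ref_xyz = [c for c, r in zip(coords, ref) if r]
    return [any(math.dist(c, p) <= cutoff for p in ref_xyz) for c in coords]


def exwithin(cutoff, coords, ref):
    """Same as within(), but with the reference atoms themselves removed."""
    return [w and not r for w, r in zip(within(cutoff, coords, ref), ref)]


print(within(5.0, coords, ref))    # [True, True, True, False]
print(exwithin(5.0, coords, ref))  # [False, False, True, False]
```

In this model, exwithin is simply within combined with "not" the reference selection, which matches the description of "excluding the ligand".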
same … as operators
Expand a selection to cover complete residues, chains, or bond-graph fragments:
# All atoms in any residue that has at least one backbone atom within 5 Å of the ligand
mol.atomselect("same residue as (backbone and within 5 of resname BEN)")
# All atoms in any chain that contains a titratable histidine
mol.atomselect("same chain as resname HID HIE HIP")
# All atoms in the same covalently bonded fragment as the ligand
mol.atomselect("same fragment as resname BEN")
fragment groups atoms by connected components of the bond graph. For this to
work correctly, mol.bonds must be populated (see
Guess bonds).
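To make the connected-components idea concrete, here is a minimal breadth-first sketch over a toy bond list. It is not Moleculekit's code; `fragments` is a hypothetical helper, and the bond pairs are invented:

```python
from collections import defaultdict, deque


def fragments(num_atoms, bonds):
    """Label each atom with the id of its connected component."""
    neighbors = defaultdict(list)
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    labels = [-1] * num_atoms
    next_label = 0
    for start in range(num_atoms):
        if labels[start] != -1:
            continue  # already reached from an earlier atom
        labels[start] = next_label
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for other in neighbors[atom]:
                if labels[other] == -1:
                    labels[other] = next_label
                    queue.append(other)
        next_label += 1
    return labels


# Toy bond list: atoms 0-1-2 are bonded, 3-4 are bonded, atom 5 is isolated
labels = fragments(6, [(0, 1), (1, 2), (3, 4)])
print(labels)  # [0, 0, 0, 1, 1, 2]

# Expanding a seed that contains only atom 4 to its whole fragment
seed_labels = {labels[i] for i in {4}}
expanded = [lab in seed_labels for lab in labels]
print(expanded)  # [False, False, False, True, True, False]
```

With an empty bond list every atom ends up in its own fragment, so a "same fragment as" expansion would add nothing; that is the failure mode to expect when bonds are missing.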
Cheat-sheet
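| Expression | Example | Meaning |
|---|---|---|
| keyword | protein | Predefined chemical class |
| field value | resname ALA | Field equals value |
| field v1 v2 … | name CA N C O | Field equals any of the values |
| field a to b | resid 40 to 60 | Numeric range (inclusive) |
| field op value | resid > 50 | Numeric comparison |
| and, or, not | protein and chain A | Boolean logic |
| within D of sel | within 5 of resname BEN | Distance from selection |
| exwithin D of sel | exwithin 5 of resname BEN | Distance, excluding selection |
| same X as sel | same fragment as resname BEN | Whole-residue/chain/fragment expansion |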
Mask and index substitution
Any method that accepts a selection string also accepts:
- A boolean NumPy array of length mol.numAtoms: passed through without parsing, ideal for reusing expensive selections.
- An integer NumPy array of atom indices: converted automatically.
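import numpy as np
# Precompute once, reuse many times
prot_mask = mol.atomselect("protein")
prot = mol.copy(sel=prot_mask)
mol.set("beta", 0, sel=prot_mask)
# An integer index array is accepted in the same way
first_ten = mol.copy(sel=np.arange(10))
# filter() changes mol in place, so call it last: prot_mask is stale afterwards
mol.filter(prot_mask)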
Note that precomputed masks and index arrays go stale if the number or order
of atoms changes (e.g. after filter, remove, or adding hydrogens). Always
recompute after such operations.
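The danger with stale indices is that they can fail silently. The sketch below uses a plain list of invented atom names instead of a Molecule to show an old index pointing at a different atom after a removal:

```python
# Toy atom names (invented); the selection targets the carbonyl oxygen "O"
atoms = ["HOH", "N", "CA", "C", "O", "OXT"]
idx_before = [i for i, name in enumerate(atoms) if name == "O"]

# Simulate an operation that removes atoms, here dropping the water at index 0
atoms_after = [name for name in atoms if name != "HOH"]

# Reusing the old indices silently picks a different atom
stale_hit = [atoms_after[i] for i in idx_before]

# Recomputing the selection on the modified system gives the right one
idx_after = [i for i, name in enumerate(atoms_after) if name == "O"]
fresh_hit = [atoms_after[i] for i in idx_after]

print(idx_before, stale_hit)  # [4] ['OXT']
print(idx_after, fresh_hit)   # [3] ['O']
```

No error is raised in the stale case, which is why recomputing after any change to the atom list is the safer habit.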
What is not supported
- VMD’s index from < N range variant for loading trajectory subsets is not exposed.
- Complex regex on atom names (VMD’s =~ regex operator) is not implemented.
- The pbwithin periodic-boundary-aware distance selection is not available; use wrap first if working with periodic systems.
Further reading
Tutorial: Atom selection
How-to: Select atoms