moleculekit.molecule module#
- class moleculekit.molecule.Molecule(filename=None, name=None, **kwargs)#
Bases: object
Class to read, write and manipulate molecular structures.
Molecule is the main class of MoleculeKit. It stores all the relevant molecular information for a system and allows many different operations to be performed on it, including adding or removing atoms, bonds, calculating various properties of the system, and visualizing the molecule. Molecule can read a large variety of molecular file formats and convert between them, therefore it can also be used as a molecular file format converter.
- Parameters:
filename (str|list|None) – Optionally load a PDB file from the specified file. If there’s no file and the value is four characters long assume it is a PDB accession code and try to download from the RCSB web server.
name (str|None) – Give a name to the Molecule that will be used for visualization
Examples
>>> mol = Molecule('./test/data/dhfr/dhfr.pdb')
>>> mol = Molecule('3PTB')
>>> print(mol)
Molecule with 1701 atoms and 1 frames
Atom field - altloc shape: (1701,)
Atom field - atomtype shape: (1701,)
...
- addBond(idx1, idx2, btype)#
Add a new bond to a pair of atoms
If the bond already exists it will only update its type
- Parameters:
Examples
>>> mol.addBond(13, 18, "2") # Adds a double bond
- align(sel, refmol=None, refsel=None, frames=None, matchingframes=False, mode='index', _logger=True)#
Align conformations.
Align a given set of frames of this molecule to either the current active frame of this molecule (mol.frame) or the current frame of a different reference molecule. To align to any frame other than the current active one modify the refmol.frame property before calling this method.
- Parameters:
sel (str|ndarray) – An atom selection string, a boolean mask, or an integer index array. Atom selection string for aligning. See more here
refmol (Molecule|None) – Optionally pass a reference Molecule on which to align. If None is given, it will align on the first frame of the same Molecule
refsel (str|ndarray|None) – An atom selection string, a boolean mask, or an integer index array. Atom selection for the refmol if one is given. Default: same as sel. See more here
frames (list|range|ndarray|None) – A list of frames which to align. By default it will align all frames of the Molecule
matchingframes (bool) – If set to True it will align the selected frames of this molecule to the corresponding frames of the refmol. This requires both molecules to have the same number of frames.
mode (str) – Options are (‘index’, ‘structure’). Setting to ‘index’ will align two structures on the atoms selected in sel and refsel in increasing order of their indices. This means that if sel is name CA and resid 5 3 and refsel is name CA and resid 7 8, assuming that resid 3 comes before 5, it will align the CA of resid 3 to resid 7 in refmol and 5 to 8 instead of 5-7, 3-8 as one might expect from the atomselection strings. Setting mode to ‘structure’ will perform pure structural alignment regardless of atom order using the TM-Align method.
Examples
>>> mol=tryp.copy()
>>> mol.align('protein')
>>> mol.align('name CA', refmol=Molecule('3PTB'))
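The index-order pairing described for mode='index' can be illustrated without MoleculeKit. The following is a minimal plain-Python sketch, not the library's implementation; the atom indices are hypothetical values chosen only for illustration.

```python
# Hypothetical CA atom indices, keyed by resid (illustrative values only).
mol_ca_index = {3: 20, 5: 41}   # this molecule: resid 3 precedes resid 5
ref_ca_index = {7: 65, 8: 73}   # reference molecule

# Residues in the order they are written in the selection strings
# "name CA and resid 5 3" and "name CA and resid 7 8".
sel_resids = [5, 3]
refsel_resids = [7, 8]

# A selection is a set of atoms, so the written order is lost: atoms are
# paired in increasing order of their indices.
sel_atoms = sorted(mol_ca_index[r] for r in sel_resids)
refsel_atoms = sorted(ref_ca_index[r] for r in refsel_resids)
pairs = list(zip(sel_atoms, refsel_atoms))

print(pairs)  # [(20, 65), (41, 73)]: resid 3 pairs with 7, resid 5 with 8
```

If the written order matters, one option is to pass integer index arrays that are already sorted the way the pairing should happen, or to rename/renumber so that index order matches the intended correspondence.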
- alignBySequence(ref, molseg=None, refseg=None, molsel='all', refsel='all', nalignfragment=1, returnAlignments=False, maxalignments=1)#
Aligns the Molecule to a reference Molecule by their longest sequence alignment
- Parameters:
ref (Molecule) – The reference Molecule to which we want to align
molseg (str|None) – The segment of this Molecule we want to align. If None it will be guessed.
refseg (str|None) – The segment of ref we want to align to. If None it will be guessed.
molsel (str|ndarray) – An atom selection string, a boolean mask, or an integer index array. The atom selection of this Molecule we want to align
refsel (str|ndarray) – An atom selection string, a boolean mask, or an integer index array. The atom selection of ref we want to align to
nalignfragment (int) – The number of fragments used for the alignment.
returnAlignments (bool) – Return all alignments as a list of Molecules
maxalignments (int) – The maximum number of alignments we want to produce
- Returns:
mols – If returnAlignments is True it returns a list of Molecules each containing a different alignment. Otherwise it modifies the current Molecule with the best single alignment.
- Return type:
- altloc: Annotated[ndarray[tuple[Any, ...], dtype[object_]], 'Shape: (numAtoms,)']#
The alternative location flag of the atoms if read from a PDB.
- angles: Annotated[ndarray[tuple[Any, ...], dtype[uint32]], 'Shape: (numAngles, 3)']#
Atom triplets corresponding to angle terms.
- append(mol, collisions=False, coldist=1.3, removesel='all', invertcollisions=False)#
Append a molecule at the end of the current molecule
- Parameters:
mol (Molecule) – Target Molecule which to append to the end of the current Molecule
collisions (bool) – If set to True it will remove residues of mol which collide with atoms of this Molecule object.
coldist (float) – Collision distance in Angstrom between atoms of the two molecules. Anything closer will be considered a collision.
removesel (str|ndarray) – An atom selection string, a boolean mask, or an integer index array. Atomselection for atoms to be removed from the passed molecule in case of collisions.
invertcollisions (bool) – If invertcollisions is set to True it will remove residues of this Molecule which collide with atoms of the passed mol molecule.
Example
>>> mol=tryp.copy()
>>> mol.filter("not resname BEN")
array([1630, 1631, 1632, 1633, 1634, 1635, 1636, 1637, 1638], dtype=int32)
>>> lig=tryp.copy()
>>> lig.filter("resname BEN")
array([ 0, 1, 2, ..., 1698, 1699, 1700], dtype=int32)
>>> mol.append(lig)
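The collision rule used by the collisions and coldist parameters can be sketched in plain Python. This is a brute-force illustration under the assumption that a residue is dropped as a whole when any of its atoms is closer than coldist to an atom of the other molecule; the helper name colliding_residues and the coordinates are hypothetical, and the library may use a faster neighbor search.

```python
import math

def colliding_residues(own_coords, other_coords, other_resids, coldist=1.3):
    """Return resids of the other molecule with any atom closer than coldist."""
    hits = set()
    for xyz, resid in zip(other_coords, other_resids):
        if any(math.dist(xyz, own) < coldist for own in own_coords):
            hits.add(resid)
    return hits

# Hypothetical coordinates in Angstrom.
own = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
other = [(0.5, 0.0, 0.0), (9.0, 9.0, 9.0), (20.0, 0.0, 0.0)]
other_resids = [1, 1, 2]

to_remove = colliding_residues(own, other, other_resids)
# The whole of residue 1 goes, including its atom that does not collide.
kept = [xyz for xyz, r in zip(other, other_resids) if r not in to_remove]
print(to_remove, kept)
```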
- appendFrames(mol)#
Appends the frames of another Molecule object to this object.
- Parameters:
mol (Molecule) – A Molecule object.
- atomselect(sel, indexes=False, strict=False, fileBonds=True, guessBonds=True, _debug=False)#
Get a boolean mask or the indexes of a set of selected atoms
- Parameters:
sel (str|ndarray|None) – Either an atom selection string (see more here), a boolean mask of length numAtoms, or an integer array of atom indices. Non-string inputs short-circuit the selection engine entirely: the array is returned (or converted to the requested form) without any parsing, which lets you reuse a precomputed selection. The same trick works for any other Molecule method that takes an atom selection string (copy, filter, remove, get, etc.): pass a precomputed boolean mask or index array in place of the string and the call skips re-parsing, which is significantly faster when the same selection is reused many times. The mask or indices must match the current state of the molecule though: any change to the number or order of atoms (e.g. after filter, remove, append, sorting, or adding/removing hydrogens) makes them stale and they will silently refer to the wrong atoms.
indexes (bool) – If True returns the indexes instead of a bitmap
strict (bool) – If True it will raise an error if no atoms were selected.
fileBonds (bool) – If True will use bonds read from files.
guessBonds (bool) – If True will use guessed bonds.
- Returns:
asel – Either a boolean mask of selected atoms or their indexes
- Return type:
Examples
>>> mol = Molecule("3ptb")
>>> mol.atomselect('resname BEN')
array([False, False, False, ..., False, False, False], dtype=bool)
>>> mask = mol.resname == "BEN"  # equivalent to 'resname BEN'
>>> mol.atomselect(mask, indexes=True)  # boolean mask -> indices
array([1630, 1631, 1632, 1633, 1634, 1635, 1636, 1637, 1638], dtype=uint32)
>>> mask = (mol.resname == "BEN") & (mol.name == "C4")  # equivalent to 'resname BEN and name C4'
>>> mol.atomselect(mask)
array([False, False, False, ..., False, False, False])
>>> assert np.array_equal(mol.atomselect(mask), mol.atomselect("resname BEN and name C4"))
>>> mol.atomselect(np.array([0, 1, 2]))  # integer indices -> boolean mask
array([ True, True, True, ..., False, False, False])
>>> mol2 = mol.copy()
>>> _ = mol2.filter(mol2.resname == "BEN")  # same short-circuit works for any sel-taking method
>>> mol2.numAtoms
9
>>> mol.copy(sel=mol.resname == "BEN").numAtoms  # and for copy(sel=...)
9
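The warning about stale masks and indices is easy to reproduce with plain lists. The sketch below does not use MoleculeKit; it only mimics an atom-name array and a removal, to show how a precomputed selection silently points at the wrong atoms once the atom count changes.

```python
# Stand-in for mol.name: two residues with backbone atoms.
names = ["N", "CA", "C", "O", "N", "CA", "C", "O"]

# Precompute the indices of the CA atoms once.
ca_idx = [i for i, n in enumerate(names) if n == "CA"]

# Remove the first atom, as a filter() or remove() call might.
names = names[1:]

# The stored indices are still in range, so nothing fails, but they now
# point one atom too far along.
stale = [names[i] for i in ca_idx]

# Recomputing the selection after the change gives the right atoms again.
fresh_idx = [i for i, n in enumerate(names) if n == "CA"]
fresh = [names[i] for i in fresh_idx]

print(stale)  # ['C', 'C']
print(fresh)  # ['CA', 'CA']
```

A simple safeguard is to recompute the mask or indices after every call that changes the number or order of atoms.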
- atomtype: Annotated[ndarray[tuple[Any, ...], dtype[object_]], 'Shape: (numAtoms,)']#
The atom type of each atom.
- beta: Annotated[ndarray[tuple[Any, ...], dtype[float32]], 'Shape: (numAtoms,)']#
The beta factor value of each atom.
- bonds: Annotated[ndarray[tuple[Any, ...], dtype[uint32]], 'Shape: (numBonds, 2)']#
Atom pairs corresponding to bond terms.
- bondtype: Annotated[ndarray[tuple[Any, ...], dtype[object_]], 'Shape: (numBonds,)']#
The type of each bond in Molecule.bonds if available.
- box: Annotated[ndarray[tuple[Any, ...], dtype[float32]], 'Shape: (3, numFrames)']#
The box dimensions of the molecule.
- boxangles: Annotated[ndarray[tuple[Any, ...], dtype[float32]], 'Shape: (3, numFrames)']#
The box angles of the molecule.
- property boxvectors: Annotated[ndarray, 'Shape: (3, 3, numFrames), dtype: float64']#
The box vectors of the Molecule
- center(loc=(0, 0, 0), sel='all')#
Moves the geometric center of the Molecule to a given location
- Parameters:
- Returns:
translation – 3D coordinates of the translation applied to the Molecule
- Return type:
Examples
>>> mol=tryp.copy()
>>> mol.center()
>>> mol.center([10, 10, 10], 'name CA')
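The returned translation can be understood as the target location minus the geometric center of the selection. The following plain-Python sketch assumes that the shift is then applied to every atom, not only to the selected ones; the helper center_translation is hypothetical and is not part of MoleculeKit.

```python
def center_translation(coords, selected, loc=(0.0, 0.0, 0.0)):
    """Translation that moves the centroid of the selected atoms onto loc."""
    sel_coords = [c for c, keep in zip(coords, selected) if keep]
    centroid = [sum(axis) / len(sel_coords) for axis in zip(*sel_coords)]
    return [target - c for target, c in zip(loc, centroid)]

# Hypothetical coordinates; only the first two atoms are selected.
coords = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
selected = [True, True, False]

translation = center_translation(coords, selected, loc=(10.0, 10.0, 10.0))
# Assumption: the same shift is applied to all atoms of the molecule.
moved = [tuple(x + t for x, t in zip(c, translation)) for c in coords]

print(translation)  # [8.0, 10.0, 10.0]
```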
- chain: Annotated[ndarray[tuple[Any, ...], dtype[object_]], 'Shape: (numAtoms,)']#
The chain name of each atom.
- charge: Annotated[ndarray[tuple[Any, ...], dtype[float32]], 'Shape: (numAtoms,)']#
The charge of each atom.
- coords: Annotated[ndarray[tuple[Any, ...], dtype[float32]], 'Shape: (numAtoms, 3, numFrames)']#
The coordinates of the atoms.
- copy(frames=None, sel=None)#
Create a copy of the Molecule object
- Parameters:
- Returns:
newmol – A copy of the object
- Return type:
- crystalinfo: dict#
A dictionary containing crystallographic information. It has fields [‘sGroup’, ‘numcopies’, ‘rotations’, ‘translations’]
- deleteBonds(sel, inter=True)#
Deletes all bonds that contain atoms in sel or between atoms in sel.
- Parameters:
sel (str|ndarray) – An atom selection string, a boolean mask, or an integer index array. Atom selection string of atoms whose bonds will be deleted. See more here
inter (bool) – When True it will also delete bonds between atoms in sel and atoms outside of sel. When False it will only delete bonds between atoms in sel.
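The difference between the two inter settings comes down to "either end in sel" versus "both ends in sel". A minimal sketch in plain Python, assuming bonds are stored as index pairs; the helper bonds_to_delete is hypothetical and only illustrates the rule.

```python
def bonds_to_delete(bonds, sel, inter=True):
    """Pick the bonds that a deleteBonds-style call would drop."""
    sel = set(sel)
    if inter:
        # Any bond touching a selected atom, including bonds leaving sel.
        return [b for b in bonds if b[0] in sel or b[1] in sel]
    # Only bonds whose two atoms are both selected.
    return [b for b in bonds if b[0] in sel and b[1] in sel]

bonds = [(0, 1), (1, 2), (2, 3)]   # a hypothetical four-atom chain
sel = [0, 1]

with_inter = bonds_to_delete(bonds, sel, inter=True)
without_inter = bonds_to_delete(bonds, sel, inter=False)

print(with_inter)     # [(0, 1), (1, 2)]
print(without_inter)  # [(0, 1)]
```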
- dihedrals: Annotated[ndarray[tuple[Any, ...], dtype[uint32]], 'Shape: (numDihedrals, 4)']#
Atom quadruplets corresponding to dihedral terms.
- dropFrames(drop=None, keep=None)#
Removes trajectory frames from the Molecule
- Parameters:
drop (int|list|ndarray|None) – Index of frame, or list of frame indexes which we want to drop (and keep all others). By default it will remove all frames from the Molecule.
keep (int|list|ndarray|str|None) – Index of frame, or list of frame indexes which we want to keep (and drop all others).
Examples
>>> mol = Molecule('1sb0')
>>> mol.dropFrames(keep=[1,2])
>>> mol.numFrames == 2
True
>>> mol.dropFrames(drop=[0])
>>> mol.numFrames == 1
True
- element: Annotated[ndarray[tuple[Any, ...], dtype[object_]], 'Shape: (numAtoms,)']#
The element of each atom.
- empty(numAtoms, numFrames=0)#
Creates an empty molecule of numAtoms atoms.
- Parameters:
Example
>>> newmol = Molecule().empty(100)
- extendResidueFromSmiles(sel, extension_smiles=None, target_atom_sel=None, new_smiles=None, sanitizeSmiles=True, minimize=False, _logger=True)#
Extend a residue with a SMILES string
Two modes are supported (mutually exclusive):
Extension SMILES mode: provide extension_smiles (with a dummy atom *) and target_atom_sel to attach a moiety at a specific atom.
New SMILES mode: provide new_smiles with the complete SMILES of the modified molecule. MCS matching is used to identify unchanged atoms and generate 3D coordinates for the new ones.
This function requires that the residue already has bond orders assigned and is protonated, which can be achieved by using templateResidueFromSmiles if needed.
- Parameters:
sel (str|ndarray) – An atom selection string, a boolean mask, or an integer index array. The atom selection of the residue which we want to extend
extension_smiles (str|None) – The SMILES string of the extension moiety with a dummy atom *
target_atom_sel (str|ndarray|None) – An atom selection string, a boolean mask, or an integer index array. The atom selection of the target atom to which the extension will be attached (required with extension_smiles)
new_smiles (str|None) – Complete SMILES of the modified molecule (alternative to extension_smiles)
sanitizeSmiles (bool) – If True the SMILES string will be sanitized
minimize (bool) – If True and OpenMM is available, run a soft-potential energy minimization of the residue against its surroundings after insertion.
Examples
>>> mol = Molecule('3ptb')
>>> mol.templateResidueFromSmiles("resname BEN", "[NH2+]=C(N)c1ccccc1", addHs=True)
>>> mol.extendResidueFromSmiles("resname BEN", extension_smiles="*C(C)(C)C", target_atom_sel="resname BEN and name H6")
Or equivalently using the full modified SMILES:
>>> mol = Molecule('3ptb')
>>> mol.templateResidueFromSmiles("resname BEN", "[NH2+]=C(N)c1ccccc1", addHs=True)
>>> mol.extendResidueFromSmiles("resname BEN", new_smiles="[NH2+]=C(N)c1cc(C(C)(C)C)ccc1")
- fileloc: Annotated[list, 'Shape: (numFrames, 2)']#
The location of the files used to read this Molecule
- filter(sel, _logger=True)#
Removes all atoms not included in the selection
This modifies the current Molecule in-place.
- Parameters:
sel (str|ndarray) – Atom selection string, or a boolean array (one element per atom) flagging the atoms to keep. See more here
- Returns:
removed – An array of all atoms which did not belong to sel and were removed from the Molecule object
- Return type:
Examples
>>> mol=tryp.copy()
>>> mol.filter('protein')
- formalcharge: Annotated[ndarray[tuple[Any, ...], dtype[int32]], 'Shape: (numAtoms,)']#
The formal charge of each atom.
- property frame: int#
The currently active frame of the Molecule on which methods will be applied.
- Returns:
frame – The index of the currently active frame.
- Return type:
- Raises:
RuntimeError – If the active frame is out of range for the current number of frames.
- static fromDict(moldict)#
Create a Molecule from a dictionary representation.
This is the inverse of toDict().
- static fromOpenMMTopology(topology, positions)#
Converts an OpenMM topology and positions to a Molecule object
- Parameters:
topology (Topology) – The OpenMM topology to convert
positions (Quantity) – The positions of the atoms in the topology
Examples
>>> from openmm import app
>>> from openmm import unit
>>> pdbfile = app.PDBFile("3ptb.pdb")
>>> mol = Molecule.fromOpenMMTopology(pdbfile.topology, pdbfile.positions)
- static fromRDKitMol(rmol)#
Converts an RDKit molecule to a Molecule object
- Parameters:
rmol (Mol) – The RDKit molecule to convert
Examples
>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> rmol = Chem.MolFromSmiles("C1CCCCC1")
>>> rmol = Chem.AddHs(rmol)
>>> AllChem.EmbedMolecule(rmol)
>>> AllChem.MMFFOptimizeMolecule(rmol)
>>> mol = Molecule.fromRDKitMol(rmol)
- property fstep#
The frame-step of the trajectory
- get(field, sel=None, fileBonds=True, guessBonds=True)#
Retrieve a specific PDB field based on the selection
- Parameters:
field (str) – The field we want to get. To see a list of all available fields do print(Molecule._atom_and_coord_fields). The special value ‘index’ returns the 0-based atom indices of the selected atoms.
sel (str|ndarray|None) – An atom selection string, a boolean mask, or an integer index array. Atom selection string for which atoms we want to get the field from. Default all. See more here
fileBonds (bool) – If True will use bonds read from files.
guessBonds (bool) – If True will use guessed bonds. Both fileBonds and guessBonds can be combined.
- Returns:
vals – Array of values of field for all atoms in the selection. If field is ‘index’ it returns the 0-based indices of the selected atoms.
- Return type:
Examples
>>> mol=tryp.copy()
>>> mol.get('resname')
array(['ILE', 'ILE', 'ILE', ..., 'HOH', 'HOH', 'HOH'], dtype=object)
>>> mol.get('resname', sel='resid 158')
array(['LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU'], dtype=object)
- getCenter(sel='all', com=False)#
Get the center of an atom selection
- Parameters:
- Returns:
center – The center of mass or geometric center of the selection
- Return type:
- getDihedral(atom_quad)#
Get the value of a dihedral angle.
- Parameters:
atom_quad (list) – Four atom indexes corresponding to the atoms defining the dihedral
- Returns:
angle – The angle in radians
- Return type:
Examples
>>> mol.getDihedral([0, 5, 8, 12])
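For reference, a dihedral angle in radians can be computed from four positions with the standard atan2 formulation. The sketch below is plain Python and is not taken from MoleculeKit; the sign convention is an assumption and may differ from the one used by getDihedral, so only magnitudes are compared.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle (radians) around the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.atan2(_dot(_cross(b1, v), w), _dot(v, w))

# Hypothetical positions: cis, trans and perpendicular arrangements.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
perp = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, abs(trans), abs(perp))
```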
- getNeighbors(idx, bonds=None)#
Returns all atoms bonded to a specific atom
- getResidues(fields=('resid', 'insertion', 'chain'), sel='all', return_idx=True)#
Get unique ids for the residues of the Molecule
- Parameters:
fields (tuple|ndarray) – An array of Molecule attributes. Once a change in any of the fields happens, a new ID will be created in residues. The default fields are (“resid”, “insertion”, “chain”) which means that a new residue ID will be created if the resid or insertion or chain changes from the previous atom to the next one.
sel (str|ndarray) – An atom selection string, a boolean mask, or an integer index array. Atomselection for which to return the residues. Default is “all”. See more here
return_idx (bool) – If set to True, the method will return the indices of each unique residue
- Returns:
residues (numpy.ndarray) – An array of unique ids for the residues
idx (list) – A list of arrays, each containing the indices corresponding to the unique values in residues. Will only be returned if return_idx is set to True.
Examples
>>> mol = Molecule('5zmz')
>>> residues, idx = mol.getResidues()
>>> residues
array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4])
>>> idx
[array([0, 1, 2, 3, 4, 5, 6, 7]), array([ 8, 9, 10, 11, 12, 13, 14, 15, 16]), array([17, 18, 19, 20, 21, 22, 23, 24]), array([25, 26, 27, 28, 29]), array([30])]
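The "new ID whenever any field changes from one atom to the next" rule can be written in a few lines of plain Python. This is a sketch of the rule as described, not the library's code; the helper residue_ids and the per-atom values are hypothetical.

```python
def residue_ids(*fields):
    """Assign an incrementing ID each time any per-atom field changes."""
    ids = []
    current = -1
    previous = None
    for key in zip(*fields):
        if key != previous:
            current += 1
            previous = key
        ids.append(current)
    return ids

# Hypothetical per-atom arrays for six atoms.
resid = [1, 1, 2, 2, 2, 2]
insertion = ["", "", "", "A", "A", ""]
chain = ["A", "A", "A", "A", "B", "B"]

ids = residue_ids(resid, insertion, chain)
# Atom indices grouped per residue ID, analogous to the idx return value.
idx = [[i for i, r in enumerate(ids) if r == u] for u in sorted(set(ids))]

print(ids)  # [0, 0, 1, 2, 3, 4]
print(idx)  # [[0, 1], [2], [3], [4], [5]]
```

Note that resid 2 is split into four IDs here because the insertion code and the chain change while the resid stays the same.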
- getSequence(one_letter=True, dict_key='chain', return_idx=False, sel='all', _logger=True)#
Return the aminoacid sequence of the Molecule.
- Parameters:
one_letter (bool) – Whether to return one-letter or three-letter AA codes. There should be only one atom per residue.
dict_key (str|None) – If None, the function will return a dictionary with keys “protein” and “nucleic” (if they exist) and the concatenated sequence as the value. If “chain” or “segid” is passed, the function will return a dictionary with the sequence of each chain or segment.
return_idx (bool) – If True, the function also returns the indexes of the atoms of each residue in the sequence
sel (str|ndarray|None) – Atomselection for which to return the sequence
- Returns:
sequence – The primary sequence as a dictionary.
- Return type:
Examples
>>> mol = Molecule("3PTB")
>>> mol.getSequence()
{'A': 'IVGGYTCGANTVPYQVSLNSGYHFCGGSLINSQWVVSAAHCYKSGIQVRLGEDNINVVEGNEQFISASKSIVHPSYNSNTLNNDIMLIKLKSAASLNSRVASISLPTSCASAGTQCLISGWGNTKSSGTSYPDVLKCLKAPILSDSSCKSAYPGQITSNMFCAGYLEGGKDSCQGDSGGPVVCSGKLQGIVSWGSGCAQKNKPGVYTKVCNYVSWIKQTIASN'}
>>> mol.getSequence(sel="resid 16 to 50")
{'A': 'IVGGYTCGANTVPYQVSLNSGYHFCGGSLINSQ'}
>>> mol = Molecule("1LKK")
>>> seq = mol.getSequence(one_letter=False, dict_key="chain")
>>> seq.keys()
dict_keys(['A', 'B'])
>>> seq['B']
['PTR', 'GLU', 'GLU', 'ILE']
>>> seq = mol.getSequence(one_letter=True, dict_key="chain")
>>> seq['B']
'XEEI'
>>> seq = mol.getSequence(one_letter=True, dict_key="segid")
>>> seq.keys()
dict_keys(['1', '2'])
>>> seq, idx = mol.getSequence(return_idx=True)
>>> idx['B'][-1]  # The atom indexes of the last residue in chain B
array([1718, 1719, 1720, 1721, 1722, 1723, 1724, 1725, 1726, 1727, 1728, 1729, 1730, 1731, 1732, 1733, 1734, 1735, 1736, 1737])
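The 1LKK example shows the non-standard residue PTR rendered as 'X' in the one-letter sequence. A minimal sketch of that conversion, assuming a lookup table with an 'X' fallback; the table below is deliberately partial and the helper to_one_letter is hypothetical, not the mapping MoleculeKit ships.

```python
# Partial three-letter to one-letter table (illustrative subset only).
THREE_TO_ONE = {"GLU": "E", "ILE": "I", "GLY": "G", "VAL": "V"}

def to_one_letter(resnames):
    """Convert residue names to a one-letter string, 'X' for unknown names."""
    return "".join(THREE_TO_ONE.get(name, "X") for name in resnames)

seq = to_one_letter(["PTR", "GLU", "GLU", "ILE"])
print(seq)  # XEEI
```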
- guessBonds(rdkit=False)#
Guess the bonds of the Molecule and store them in-place.
Computes the bonds (and bond types) from the atom coordinates of the currently active frame and overwrites the bonds and bondtype fields of this Molecule with the result.
- Parameters:
rdkit (bool) – If True, use RDKit to guess bonds, which also assigns bond orders. If False, use the distance-based bond guesser and assign all bonds an unknown (‘un’) bond type.
Examples
>>> mol = Molecule("3PTB")
>>> mol.guessBonds()
- hasBond(idx1, idx2)#
Checks if the Molecule has a bond between two atom indexes
- impropers: Annotated[ndarray[tuple[Any, ...], dtype[uint32]], 'Shape: (numImpropers, 4)']#
Atom quadruplets corresponding to improper dihedral terms.
- insert(mol, index, collisions=False, coldist=1.3, removesel='all', invertcollisions=False)#
Insert the atoms of one molecule into another at a specific index.
This modifies the current Molecule in-place.
- Parameters:
mol (Molecule) – Molecule to be inserted
index (int) – The atom index at which the passed molecule will be inserted
collisions (bool) – If set to True it will remove residues which collide with atoms of the other molecule.
coldist (float) – Collision distance in Angstrom between atoms of the two molecules. Anything closer will be considered a collision.
removesel (str|ndarray) – An atom selection string, a boolean mask, or an integer index array. Atomselection restricting which atoms are considered when detecting collisions. When invertcollisions is False it selects atoms of the passed mol; when invertcollisions is True it selects atoms of this Molecule.
invertcollisions (bool) – If False (default), residues of the passed mol which collide with atoms of this Molecule are removed before insertion. If True, residues of this Molecule which collide with atoms of the passed mol are removed instead.
Example
>>> mol=tryp.copy()
>>> mol.numAtoms
1701
>>> mol.insert(tryp, 0)
>>> mol.numAtoms
3402
- insertion: Annotated[ndarray[tuple[Any, ...], dtype[object_]], 'Shape: (numAtoms,)']#
The insertion flag of the atoms if read from a PDB.
- masses: Annotated[ndarray[tuple[Any, ...], dtype[float32]], 'Shape: (numAtoms,)']#
The mass of each atom.
- moveBy(vector, sel=None)#
Move a selection of atoms by a given vector.
Alias of translateBy().
- Parameters:
Examples
>>> mol=tryp.copy()
>>> mol.moveBy([3, 45 , -8])
- mutateResidue(sel, newres, reconstruct=True, rotamer_mode='best', minimize=False)#
Mutate a residue, optionally reconstructing the side-chain.
When reconstruct is True (the default) the old side-chain is removed and a new one is built from ideal geometry, placed using the Dunbrack backbone-dependent rotamer library, and optionally refined with an OpenMM soft-potential energy minimization.
- Parameters:
sel (str|ndarray) – An atom selection string, a boolean mask, or an integer index array. Atom selection string for the residue we want to mutate. The selection needs to include all atoms of the residue. See more here
newres (str) – The three-letter code of the target residue.
reconstruct (bool) – If True (default), fully reconstruct the new side-chain. If False, use legacy behaviour (strip side-chain and rename).
rotamer_mode (str) – How to choose the rotamer. "best" (default) picks the rotamer with the lowest VdW clash energy against surrounding atoms. "first" picks the most probable rotamer. "random" samples a rotamer weighted by probability.
minimize (bool) – If True and OpenMM is installed, run a soft-potential energy minimization after side-chain placement. Default False.
Examples
>>> mol=tryp.copy()
>>> mol.mutateResidue('resid 158', 'ARG')
- name: Annotated[ndarray[tuple[Any, ...], dtype[object_]], 'Shape: (numAtoms,)']#
The name of each atom.
- property numAtoms#
Number of atoms in the molecule
- property numBonds#
Number of bonds in the molecule
- property numFrames#
Number of coordinate frames in the molecule
- occupancy: Annotated[ndarray[tuple[Any, ...], dtype[float32]], 'Shape: (numAtoms,)']#
The occupancy value of each atom if read from a PDB.
- read(filename, type=None, skip=None, frames=None, append=False, overwrite='all', keepaltloc='A', guess=None, guessNE=None, _logger=True, **kwargs)#
Read topology, coordinates and trajectory files in multiple formats.
Detects from the extension the file type and loads it into Molecule
- Parameters:
filename (str) – Name of the file we want to read
type (str|None) – File type of the file. If None, it’s automatically determined by the extension
skip (int|None) – If the file is a trajectory, skip every skip frames
frames (list|range|ndarray|None) – If the file is a trajectory, read only the given frames
append (bool) – If the file is a trajectory or coor file, append the coordinates to the previous coordinates. Note append is slow.
overwrite (str|list) – A list of the existing fields in Molecule that we wish to overwrite when reading this file. Set to None if you don’t want to overwrite any existing fields.
keepaltloc (str) – Set to any string to only keep that specific altloc. Set to ‘all’ if you want to keep all alternative atom positions.
guess (list|None) – Properties of the molecule to guess. Can be any combination of (‘bonds’, ‘angles’, ‘dihedrals’)
guessNE (list|None) – Properties of the molecule to guess if it’s Non-Existent. Can be any combination of (‘bonds’, ‘angles’, ‘dihedrals’)
- record: Annotated[ndarray[tuple[Any, ...], dtype[object_]], 'Shape: (numAtoms,)']#
The record field of a PDB file if the topology was read from a PDB.
- remove(selection, _logger=True)#
Remove atoms from the Molecule
- Parameters:
selection (str|ndarray) – An atom selection string, a boolean mask, or an integer index array. Atom selection string of the atoms we want to remove. See more here
- Returns:
removed – The list of atoms removed
- Return type:
Example
>>> mol=tryp.copy()
>>> mol.remove('name CA')
array([ 1, 9, 16, 20, 24, 36, 43, 49, 53, 58,...
- removeBond(idx1, idx2)#
Remove an existing bond between a pair of atoms
- renumberResidues(returnMapping=False, start=0, modulo=None)#
Renumbers protein residues incrementally.
It checks for changes in either of the resid, insertion, chain or segid fields and in case of a change it creates a new residue number.
The renumbering is applied in-place and the insertion codes are cleared.
- Parameters:
returnMapping (bool) – If set to True, the method will also return the mapping between the old and new residues
start (int) – The residue number assigned to the first residue. Subsequent residues are numbered incrementally.
modulo (int|None) – If given, the new residue numbers are wrapped using modulo arithmetic (resid % modulo).
- Returns:
mapping – Only returned if returnMapping is True. A DataFrame mapping the new residue numbers to the old resid, insertion, resname, chain and segid of each residue.
- Return type:
pandas.DataFrame
Examples
>>> mapping = mol.renumberResidues(returnMapping=True)
- reorderAtoms(order)#
Reorder atoms in Molecule
Changes the order of atoms in the Molecule to the defined order.
Examples
>>> mol = Molecule()
>>> _ = mol.empty(4)
>>> mol.name[:] = ['N', 'C', 'H', 'S']
>>> neworder = [1, 3, 2, 0]
>>> mol.reorderAtoms(neworder)
>>> print(mol.name)
['C' 'S' 'H' 'N']
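The example above implies that position i of the reordered molecule holds the atom that was at order[i]. The plain-Python sketch below reproduces that and also shows the index remapping that bond pairs would need to stay attached to the same atoms; the bonds are hypothetical and the remapping is an assumption about what a consistent reorder requires, not a description of the library's internals.

```python
names = ["N", "C", "H", "S"]
bonds = [(0, 1), (1, 3)]   # hypothetical bonds N-C and C-S, by old index
order = [1, 3, 2, 0]

# New position i receives the atom that used to be at order[i].
new_names = [names[i] for i in order]

# Inverse permutation: where did each old index end up?
old_to_new = {old: new for new, old in enumerate(order)}
new_bonds = [(old_to_new[a], old_to_new[b]) for a, b in bonds]

print(new_names)  # ['C', 'S', 'H', 'N']
print(new_bonds)  # [(3, 0), (0, 1)]
```

Reading the remapped bonds back through new_names gives N-C and C-S again, which is the check that the permutation was applied consistently.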
- reps: Representations#
A list of representations that is used when visualizing the molecule
- resid: Annotated[ndarray[tuple[Any, ...], dtype[int64]], 'Shape: (numAtoms,)']#
The residue ID of each atom.
- resname: Annotated[ndarray[tuple[Any, ...], dtype[object_]], 'Shape: (numAtoms,)']#
The residue name of each atom.
- rotateBy(M, center=(0, 0, 0), sel='all')#
Rotate a selection of atoms by a given rotation matrix around a center
- Parameters:
Examples
>>> from moleculekit.util import rotationMatrix
>>> mol = tryp.copy()
>>> mol.rotateBy(rotationMatrix([0, 1, 0], 1.57))
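Rotating around a center amounts to shifting the coordinates so the center sits at the origin, applying the matrix, and shifting back. The sketch below is plain Python under the assumption that M acts on column vectors (new = M · (old - center) + center); the helper rotate_about and the test point are hypothetical, and the library's own convention may differ.

```python
import math

def rotate_about(coords, M, center=(0.0, 0.0, 0.0)):
    """Rotate each point by the 3x3 matrix M around the given center."""
    out = []
    for p in coords:
        d = [p[i] - center[i] for i in range(3)]
        r = [sum(M[i][j] * d[j] for j in range(3)) for i in range(3)]
        out.append(tuple(r[i] + center[i] for i in range(3)))
    return out

# 90 degree rotation about the z axis.
t = math.pi / 2
Mz = [[math.cos(t), -math.sin(t), 0.0],
      [math.sin(t), math.cos(t), 0.0],
      [0.0, 0.0, 1.0]]

# The point sits 1 Angstrom along x from the center and ends up 1 along y.
rotated = rotate_about([(2.0, 1.0, 0.0)], Mz, center=(1.0, 1.0, 0.0))
print(rotated)
```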
- segid: Annotated[ndarray[tuple[Any, ...], dtype[object_]], 'Shape: (numAtoms,)']#
The segment ID of each atom.
- sequence(oneletter=True, noseg=False, return_idx=False, sel='all', _logger=True)#
Get the protein/nucleic sequence of the Molecule.
Deprecated: Use getSequence() instead. Note that getSequence returns by default a dictionary keyed by chain IDs rather than segment IDs.
- Parameters:
oneletter (bool) – If True, return the one-letter sequence. Otherwise return three-letter residue names.
noseg (bool) – If True, ignore segments and return a single combined sequence.
return_idx (bool) – If True, also return the atom indexes corresponding to each residue of the sequence.
sel (str|ndarray) – An atom selection string, a boolean mask, or an integer index array. Atom selection string restricting which atoms are used to build the sequence. See more here
- Returns:
sequence – A dictionary of sequences keyed by segment ID (or a single sequence if noseg is True).
- Return type:
- serial: Annotated[ndarray[tuple[Any, ...], dtype[int64]], 'Shape: (numAtoms,)']#
The serial number of each atom.
- set(field, value, sel=None)#
Set the values of a Molecule field based on the selection
- Parameters:
field (str) – The field we want to set. To see a list of all available fields do print(Molecule._atom_and_coord_fields).
value (string or integer) – All atoms that match the atom selection will have the field field set to this scalar value (or 3-vector if setting the coordinates)
sel (str|ndarray|None) – An atom selection string, a boolean mask, or an integer index array. Atom selection string for atom which to set. See more here
Examples
>>> mol=tryp.copy()
>>> mol.set('segid', 'P', sel='protein')
- setDihedral(atom_quad, radians, bonds=None, guessBonds=False)#
Sets the angle of a dihedral.
- Parameters:
atom_quad (list) – Four atom indexes corresponding to the atoms defining the dihedral
radians (float) – The angle in radians to which we want to set the dihedral
bonds (ndarray|None) – An array containing all bonds of the molecule. This is needed if multiple modifications are done as the bond guessing can get messed up if atoms come very close after the rotation.
guessBonds (bool) – Set to True if you want to guess bonds based on atom distances if they are not defined
Examples
>>> mol.setDihedral([0, 5, 8, 12], 0.16)
>>> # If we perform multiple modifications, calculate bonds first and pass them as argument to be safe
>>> bonds = mol._getBonds()
>>> mol.setDihedral([0, 5, 8, 12], 0.16, bonds=bonds)
>>> mol.setDihedral([18, 20, 24, 30], -1.8, bonds=bonds)
- step: Annotated[ndarray[tuple[Any, ...], dtype[uint64]], 'Shape: (numFrames)']#
The step for each frame of the simulation
- templateResidueFromMolecule(sel, refmol, addHs=False, onlyOnAtoms=None, guessBonds=False, _logger=True)#
Assign bonds, bond orders, formal charges and (optionally) hydrogens to a residue from a reference Molecule template.
Like templateResidueFromSmiles(), but the template is a reference Molecule (e.g. loaded from a CIF) that already carries bonds, bond orders and formal charges.
When the reference’s heavy-atom names are unique and equal to the residue’s, the atoms are mapped by name and the reference’s bond orders and formal charges are transferred verbatim (ideal for CIF references: unambiguous under molecular symmetry). When the names do not match (for example a reference read from an SDF file, whose atom names are only element symbols), the reference is converted to a SMILES and matched by element and connectivity instead; for a symmetric residue this can place a charge or double bond on a different but equivalent atom while giving the same molecule.
Either way the reference is used only as a template and is never appended, and the molecule is mutated in place. The reference must describe the same residue: it may carry extra terminal atoms (such as a free amino acid’s OXT or a covalent leaving group), which are stripped, but a reference missing heavy atoms of the residue raises an error.
- Parameters:
sel (str|ndarray) – An atom selection string, a boolean mask, or an integer index array of the residue(s) to template. May span multiple copies.
refmol (Molecule) – The reference template. Its bonds, bond orders and formal charges must already be correct. When its heavy-atom names are unique and match the residue’s they are used directly; otherwise it is matched by SMILES, so unnamed references (e.g. from SDF) are supported.
addHs (bool) – If True, add hydrogens after bond orders are transferred.
onlyOnAtoms (str|ndarray|None) – Restrict which heavy atoms get hydrogens (only used with addHs).
guessBonds (bool) – If True, distance-guess the residue’s bonds before templating.
Examples
>>> mol = Molecule("complex.pdb")
>>> mol.templateResidueFromMolecule("resname LIG", Molecule("LIG.cif"), addHs=True, guessBonds=True)
- templateResidueFromSmiles(sel, smiles, sanitizeSmiles=True, addHs=False, onlyOnAtoms=None, guessBonds=False, _logger=True)#
Assign bonds, bond orders, formal charges and (optionally) hydrogens to a residue from a SMILES template.
Uses RDKit’s Maximum Common Substructure (MCS) matching between the residue’s current bond graph and the SMILES template to transfer bond orders and formal charges. The molecule is mutated in place: the matched residue’s atoms are replaced by the templated copy.
Multiple residues can be templated in one call when sel spans several copies (e.g. "resname NAG" for several NAG residues, or a boolean mask covering multiple chain residues). Each residue is templated individually with the same SMILES, in residue order.
Cross-residue covalent bonds (peptide N-C, nucleic acid phosphodiester P-O3’, and any bond already present in mol.bonds that crosses the residue boundary) are detected and the boundary atoms’ H counts are reduced accordingly so addHs=True does not over-protonate them. If mol.bonds is empty for the residue, pass guessBonds=True to populate it via distance-based guessing.
If the SMILES template has heavy atoms that don’t map onto the residue (typically a leaving group displaced by a covalent link, or the terminal -OH / -OXT on a mid-chain amino acid SMILES), the function attempts to strip those terminal atoms automatically and retry the MCS match. If the mismatch can’t be resolved this way, a RuntimeError is raised.
- Parameters:
sel (str|ndarray) – An atom selection string, a boolean mask, or an integer index array. VMD-style atom selection or boolean mask of the residue(s) to template. May span multiple residues with the same chemistry.
smiles (str) – SMILES string of the template residue. RCSB-style ligand SMILES (i.e. fully protonated, with explicit formal charges and the full set of heavy atoms) work best.
sanitizeSmiles (bool) – If True, the SMILES is sanitized by RDKit before matching.
addHs (bool) – If True, hydrogens are added to the residue using RDKit’s AddHs after bond orders are transferred, respecting the boundary-bond H-count corrections described above.
onlyOnAtoms (str|ndarray|None) – An atom selection string, a boolean mask, or an integer index array. VMD-style selection within the residue restricting which heavy atoms get hydrogens added. Only used when addHs=True.
guessBonds (bool) – If True, run distance-based bond guessing on the residue before templating. Use this when mol.bonds is empty for the selection (e.g. PDB inputs without CONECT records).
- Raises:
RuntimeError – If the selection is empty, spans multiple residues with gaps in atom indexes, has no bonds, or the SMILES contains heavy atoms that cannot be matched (even after auto-stripping recognized terminal atoms).
Examples
>>> mol = Molecule("3ptb")
>>> mol.templateResidueFromSmiles("resname BEN", "[NH2+]=C(N)c1ccccc1", addHs=True)
>>> mol.templateResidueFromSmiles("resname GLY and resid 18", "C(C(=O))N", addHs=True)
Template every copy of a sugar at once (each residue templated individually with the same SMILES):
>>> mol.templateResidueFromSmiles(
...     "resname NAG",
...     "CC(=O)NC1C(O)C(O)C(CO)OC1O",
...     addHs=True,
... )
- time: Annotated[ndarray[tuple[Any, ...], dtype[float64]], 'Shape: (numFrames)']#
The time for each frame of the simulation
- toDict(fields=None)#
Returns a dictionary representation of the molecule
- toGraph(fields=None, distances=False)#
Converts the Molecule to a networkx graph.
Each node corresponds to an atom and edges correspond to bonds.
- Parameters:
- Returns:
graph – A graph whose nodes are atom indexes (carrying the selected fields as attributes) and whose edges are the bonds (carrying a ‘type’ attribute and, optionally, a ‘distance’ attribute).
- Return type:
networkx.Graph
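As a rough illustration of the node and edge layout described above, the sketch below builds the same kind of structure with plain dictionaries rather than networkx. The helper `build_graph`, the atom names, the coordinates and the bond types are all made up for illustration; none of them come from MoleculeKit.

```python
import math


def build_graph(names, coords, bonds, bondtypes, distances=False):
    """Hypothetical helper mirroring the documented node/edge layout."""
    # One node per atom index, carrying the chosen field as an attribute
    nodes = {i: {"name": name} for i, name in enumerate(names)}
    edges = {}
    for (a, b), btype in zip(bonds, bondtypes):
        attrs = {"type": btype}
        if distances:
            # Optional per-bond distance computed from the coordinates
            attrs["distance"] = math.dist(coords[a], coords[b])
        edges[(a, b)] = attrs
    return nodes, edges


names = ["C", "O", "H"]
coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.0, 0.0)]
nodes, edges = build_graph(names, coords, [(0, 1), (0, 2)], ["2", "1"], distances=True)
print(nodes[1], edges[(0, 1)])
```

With the real method the same information would presumably be read from the node and edge attributes of the returned networkx.Graph.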
- toOpenFFMolecule(sanitize=False, kekulize=False, assignStereo=True)#
Convert the Molecule to an OpenFF Molecule.
The conversion goes through an RDKit molecule. Partial charges are taken from the charge field and the per-atom residue identity (resname, resid, chain, insertion) is propagated into the OpenFF Molecule metadata so that its residue/chain hierarchy schemes can reproduce it.
- Parameters:
- Returns:
offmol – The OpenFF Molecule representation of this Molecule.
- Return type:
openff.toolkit.topology.Molecule
- toRDKitMol(sanitize=False, kekulize=False, assignStereo=True, guessBonds=False, _logger=True)#
Converts the Molecule to an RDKit molecule
- translateBy(vector, sel=None)#
Move a selection of atoms by a given vector
- Parameters:
Examples
>>> mol = tryp.copy()
>>> mol.translateBy([3, 45, -8])
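Conceptually, the translation adds the same vector to the coordinates of every selected atom. A minimal pure-Python sketch of that idea follows; `translate_by` and the boolean `mask` argument are hypothetical stand-ins for the method and its `sel` parameter, and the coordinates are invented.

```python
def translate_by(coords, vector, mask=None):
    """Return coordinates shifted by `vector`; only atoms flagged in `mask` move."""
    if mask is None:
        # No selection given: move every atom
        mask = [True] * len(coords)
    return [
        tuple(c + v for c, v in zip(atom, vector)) if selected else tuple(atom)
        for atom, selected in zip(coords, mask)
    ]


coords = [(0, 0, 0), (1, 1, 1)]
print(translate_by(coords, [3, 45, -8]))
print(translate_by(coords, [3, 45, -8], mask=[True, False]))
```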
- view(sel=None, style=None, color=None, guessBonds=True, viewer=None, hold=False, name=None, viewerhandle=None, gui=False)#
Visualizes the molecule in a molecular viewer
- Parameters:
sel (str|ndarray|None) – An atom selection string, a boolean mask, or an integer index array. Atom selection string for the representation. See more here
color (str|int|None) – Coloring mode (str) or ColorID (int). See more here.
guessBonds (bool) – Allow the viewer to guess bonds for the molecule
viewer (str|None) – Choose viewer backend. Resolution order: explicit viewer= argument, then the MOLECULEKIT_VIEWER environment variable, then moleculekit.config["viewer"], then auto-detection of vmd/pymol in PATH, then molstar as the fallback.
hold (bool) – If set to True, it will not visualize the molecule but instead collect representations until set back to False.
name (str|None) – A name to give to the molecule in the viewer
viewerhandle (VMD object, optional) – A specific viewer in which to visualize the molecule. If None it will use the current default viewer.
gui (bool) – If set to True, show the graphical user interface of the viewer (only used by the webgl/ngl backend).
- Returns:
The viewer handle for the webgl/ngl and molstar backends. For other backends and when hold is True nothing is returned.
- Return type:
viewer
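The backend resolution order listed for the viewer parameter can be sketched as a small fallback chain. This is only an illustration of the documented order, not MoleculeKit's implementation: `resolve_viewer` is a hypothetical function, the `environ`, `config` and `which` arguments exist only to make the sketch testable, and trying vmd before pymol during auto-detection is an assumption.

```python
import os
import shutil


def resolve_viewer(viewer=None, environ=None, config=None, which=shutil.which):
    """Hypothetical sketch of the documented viewer resolution order."""
    environ = os.environ if environ is None else environ
    config = {} if config is None else config
    # 1. An explicit viewer= argument wins
    if viewer is not None:
        return viewer
    # 2. Then the MOLECULEKIT_VIEWER environment variable
    if environ.get("MOLECULEKIT_VIEWER"):
        return environ["MOLECULEKIT_VIEWER"]
    # 3. Then the "viewer" configuration entry
    if config.get("viewer"):
        return config["viewer"]
    # 4. Then whichever of vmd/pymol is found in PATH (order assumed)
    for candidate in ("vmd", "pymol"):
        if which(candidate):
            return candidate
    # 5. Finally fall back to molstar
    return "molstar"


print(resolve_viewer(environ={}, config={}, which=lambda name: None))
```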
- virtualsite: Annotated[ndarray[tuple[Any, ...], dtype[bool]], 'Shape: (numAtoms,)']#
Whether the atom is a virtual site.
- wrap(wrapsel='all', fileBonds=True, guessBonds=False, wrapcenter=None, unitcell='rectangular')#
Wraps the coordinates of the molecule into the simulation box
It assumes that all bonded groups are sequential in Molecule. I.e. you don’t have a molecule B in between the atoms of molecule A. It also requires correct bonding (ideally read from a topology file).
- Parameters:
wrapsel (str|ndarray) – An atom selection string, a boolean mask, or an integer index array. Atom selection string of atoms on which to center the wrapping box. See more here
fileBonds (bool) – Whether to use the bonds read from the file or to guess them
guessBonds (bool) – Whether to guess the bonds. If fileBonds is True these will get appended
wrapcenter (array_like, optional) – The center around which to wrap. If not provided, it will be calculated from the atoms in wrapsel. Normally you want to use the wrapsel option and not the wrapcenter as the coordinates of the selection can change during a simulation and wrapsel will keep that selection in the center for all frames.
unitcell (str) – This option can be used for choosing between different unit cell representations of triclinic boxes. It doesn’t have any effect on rectangular boxes. The wrapping of a triclinic cell can be “rectangular”, “triclinic” or “compact”. Rectangular wrapping is the default and it wraps the box into a parallelepiped. Triclinic wrapping wraps the box into a triclinic box. Compact wrapping wraps the box into a shape that has the minimum volume. This can be useful for visualizing e.g. truncated octahedra or rhombic dodecahedra.
Examples
>>> mol = tryp.copy()
>>> mol.wrap()
>>> mol.wrap('protein')
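To see why wrapping needs correct bonds, it helps to look at what wrapping a rectangular box amounts to: each bonded group is moved as a whole by an integer number of box lengths, so molecules are never split across the box boundary. The sketch below is a simplified single-frame illustration under stated assumptions: `wrap_groups` is a hypothetical function, the group centroid is used as the reference point (the library may use a different one), and the box and coordinates are invented.

```python
import math


def wrap_groups(coords, group_of_atom, box, center):
    """Shift every bonded group by whole box lengths so that its centroid
    falls inside the rectangular box centred on `center`."""
    ngroups = max(group_of_atom) + 1
    # Accumulate per-group coordinate sums to obtain the centroids
    sums = [[0.0, 0.0, 0.0] for _ in range(ngroups)]
    counts = [0] * ngroups
    for xyz, g in zip(coords, group_of_atom):
        counts[g] += 1
        for d in range(3):
            sums[g][d] += xyz[d]
    # One translation per group, a multiple of the box length per dimension
    shifts = []
    for g in range(ngroups):
        shift = []
        for d in range(3):
            centroid = sums[g][d] / counts[g]
            nboxes = math.floor((centroid - center[d]) / box[d] + 0.5)
            shift.append(-box[d] * nboxes)
        shifts.append(shift)
    # Every atom of a group receives the same shift, keeping bonds intact
    return [
        tuple(xyz[d] + shifts[g][d] for d in range(3))
        for xyz, g in zip(coords, group_of_atom)
    ]


coords = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (12.0, 0.0, 0.0), (13.0, 0.0, 0.0)]
wrapped = wrap_groups(coords, [0, 0, 1, 1], box=(10.0, 10.0, 10.0), center=(0.0, 0.0, 0.0))
print(wrapped)
```

Because the shift is computed per group rather than per atom, a molecule that straddles the box edge stays whole even if some of its atoms end up slightly outside the box.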
- write(filename, sel=None, type=None, **kwargs)#
Writes the topology and coordinates of the Molecule in any of the supported formats.
- Parameters:
filename (str) – The filename of the file we want to write to disk
sel (str|ndarray|None) – An atom selection string, a boolean mask, or an integer index array. Atom selection string of the atoms we want to write. If None, it will write all atoms. See more here
type (str|None) – The filetype we want to write. By default, detected from the file extension
- property x#
Get the x coordinates at the current frame
- property y#
Get the y coordinates at the current frame
- property z#
Get the z coordinates at the current frame
- exception moleculekit.molecule.TopologyInconsistencyError(value)#
Bases: Exception
Raised when the topology of a Molecule is inconsistent or invalid.
- class moleculekit.molecule.UniqueAtomID(**kwargs)#
Bases: object
- static fromMolecule(mol, sel=None, idx=None)#
Create a UniqueAtomID from a single atom of a Molecule.
Exactly one of sel or idx must be given, and it must resolve to exactly one atom.
- Parameters:
- Returns:
uqid – A unique identifier for the selected atom.
- Return type:
- selectAtom(mol, indexes=True, ignore=None)#
Locate the atom matching this identifier in a Molecule.
- Parameters:
- Returns:
atom – The index of the matching atom if indexes is True, otherwise a boolean mask flagging it.
- Return type:
int or numpy.ndarray
- Raises:
RuntimeError – If the atom is no longer unique or no longer present in the Molecule.
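The lookup-and-validate behaviour described above (exactly one match, otherwise a RuntimeError) can be illustrated with a small stand-alone sketch. The function `select_atom` and the fields used as the identifier (name, resname, resid, chain) are assumptions chosen for illustration; the fields the class actually stores are not listed in this section.

```python
def select_atom(atoms, identifier):
    """Return the index of the single atom whose fields all match `identifier`."""
    matches = [
        i
        for i, atom in enumerate(atoms)
        if all(atom[key] == value for key, value in identifier.items())
    ]
    if len(matches) != 1:
        # Mirrors the documented failure mode: missing or ambiguous atom
        raise RuntimeError(f"Expected exactly one matching atom, found {len(matches)}")
    return matches[0]


atoms = [
    {"name": "N", "resname": "GLY", "resid": 18, "chain": "A"},
    {"name": "CA", "resname": "GLY", "resid": 18, "chain": "A"},
    {"name": "CA", "resname": "ALA", "resid": 19, "chain": "A"},
]
identifier = {"name": "CA", "resname": "ALA", "resid": 19, "chain": "A"}
print(select_atom(atoms, identifier))
```

An identifier of this kind keeps pointing at the same atom after other atoms are inserted or removed, which a plain index would not.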
- class moleculekit.molecule.UniqueResidueID(**kwargs)#
Bases: object
- static fromMolecule(mol, sel=None, idx=None)#
- Return type:
- selectAtoms(mol, indexes=True, ignore=None)#
Locate the atoms of the residue matching this identifier in a Molecule.
- Parameters:
- Returns:
atoms – The indexes of the matching atoms if indexes is True, otherwise a boolean mask flagging them.
- Return type:
- Raises:
RuntimeError – If no atoms of the residue are present in the Molecule.
- moleculekit.molecule.calculateUniqueBonds(bonds, bondtype)#
Given bonds and bond types, calculates the unique bonds, dropping any duplicates.
- Parameters:
- Returns:
uqbonds (numpy.ndarray) – The unique bonds of the molecule
uqbondtype (numpy.ndarray) – The corresponding bond types for uqbonds
Examples
>>> from moleculekit.molecule import Molecule, calculateUniqueBonds
>>> mol = Molecule('3PTB')
>>> mol.bonds, mol.bondtype = calculateUniqueBonds(mol.bonds, mol.bondtype)  # Overwrite the bonds and bondtypes with the unique ones
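The idea of dropping duplicate bonds can be sketched in a few lines of plain Python. This is an illustration only: `unique_bonds` is a hypothetical function, and two details are assumptions rather than documented behaviour, namely that a bond listed as (a, b) and as (b, a) counts as the same bond, and that the first bond type seen for a pair is the one kept.

```python
def unique_bonds(bonds, bondtypes):
    """Deduplicate bonds regardless of atom order, keeping one type per bond."""
    seen = {}
    for (a, b), btype in zip(bonds, bondtypes):
        # Normalise the pair so (3, 1) and (1, 3) map to the same key
        key = (min(a, b), max(a, b))
        # Assumption: the first type encountered for a pair is kept
        seen.setdefault(key, btype)
    keys = sorted(seen)
    return keys, [seen[key] for key in keys]


bonds = [(0, 1), (1, 0), (2, 1), (0, 1)]
bondtypes = ["1", "1", "2", "1"]
print(unique_bonds(bonds, bondtypes))
```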
- moleculekit.molecule.getBondedGroups(mol, bonds=None)#
Calculates all bonded groups in a Molecule
- Parameters:
- Returns:
groups (numpy.ndarray) – Groups is an array which contains the starting index of each group.
group (numpy.ndarray) – An array with the group index of each atom
Examples
>>> from moleculekit.molecule import Molecule, getBondedGroups
>>> mol = Molecule("structure.prmtop")
>>> mol.read("output.xtc")
>>> groups, _ = getBondedGroups(mol)
>>> for i in range(len(groups)-1):
...     print(f"Group {i} starts at index {groups[i]} and ends at index {groups[i+1]-1}")
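Bonded groups are the connected components of the bond graph. The sketch below computes them with a small union-find and returns two lists shaped like the documented return values. It is not the library's algorithm: `bonded_groups` is a hypothetical function, and the final entry of the start list (equal to the number of atoms) is an assumption inferred from the loop in the example above, which reads `groups[i+1]` for the last group.

```python
def bonded_groups(num_atoms, bonds):
    """Return (start index of each group plus a final sentinel, group index per atom)."""
    parent = list(range(num_atoms))

    def find(i):
        # Follow parents to the root, shortening the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Number the groups in order of first appearance
    labels = {}
    group = [labels.setdefault(find(i), len(labels)) for i in range(num_atoms)]
    # A group starts wherever the label changes (groups assumed sequential)
    starts = [i for i in range(num_atoms) if i == 0 or group[i] != group[i - 1]]
    starts.append(num_atoms)
    return starts, group


starts, group = bonded_groups(6, [(0, 1), (1, 2), (3, 4)])
print(starts, group)
```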
- moleculekit.molecule.mol_equal(mol1, mol2, checkFields=('record', 'serial', 'name', 'altloc', 'resname', 'chain', 'resid', 'insertion', 'occupancy', 'beta', 'segid', 'element', 'charge', 'masses', 'atomtype', 'formalcharge', 'virtualsite', 'coords'), exceptFields=None, fieldPrecision=None, dtypes=False, uqBonds=False, _logger=True)#
Compare two Molecules for equality.
- Parameters:
mol1 (Molecule) – The first molecule to compare
mol2 (Molecule) – The second molecule to compare to the first
checkFields (list) – A list of fields to compare. By default compares all atom information and coordinates in the molecule
exceptFields (list|None) – A list of fields to not compare.
fieldPrecision (dict|None) – A dictionary of field, precision key-value pairs which defines the numerical precision of the value comparisons of two arrays
dtypes (bool) – Set to True to also compare datatypes of the fields
uqBonds (bool) – Set to True to compare unique bonds instead of all bonds
- Returns:
equal – Returns True if the molecules are equal or False if they are not.
- Return type:
Examples
>>> mol_equal(mol1, mol2, checkFields=['resname', 'resid', 'segid'])
>>> mol_equal(mol1, mol2, exceptFields=['record', 'name'])
>>> mol_equal(mol1, mol2, fieldPrecision={'coords': 1e-5})
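The effect of a per-field precision can be illustrated with a stand-alone comparison over plain dictionaries. This is a sketch under assumptions, not the library's code: `fields_equal` is a hypothetical function, the precision is treated as an absolute tolerance, and fields without an entry are compared exactly.

```python
import math


def fields_equal(a, b, check_fields, field_precision=None):
    """Compare selected fields of two records, with optional per-field tolerance."""
    field_precision = field_precision or {}
    for field in check_fields:
        values_a, values_b = a[field], b[field]
        if len(values_a) != len(values_b):
            return False
        tolerance = field_precision.get(field)
        for x, y in zip(values_a, values_b):
            if tolerance is None:
                # No precision given: require exact equality
                if x != y:
                    return False
            elif not math.isclose(x, y, rel_tol=0.0, abs_tol=tolerance):
                # Assumption: precision acts as an absolute tolerance
                return False
    return True


rec1 = {"resname": ["GLY", "ALA"], "coords": [1.0, 2.0]}
rec2 = {"resname": ["GLY", "ALA"], "coords": [1.000001, 2.0]}
print(fields_equal(rec1, rec2, ["resname", "coords"]))
print(fields_equal(rec1, rec2, ["resname", "coords"], {"coords": 1e-5}))
```

A tolerance of this kind is mainly useful for floating-point fields such as coordinates, where values written to a file and read back may differ in the last digits.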