moleculekit.molecule module
- class moleculekit.molecule.Molecule(filename=None, name=None, **kwargs)
Bases: object
Class to manipulate molecular structures.
Molecule contains all the fields of a PDB file and is independent of any force field. It can contain multiple conformations and trajectories; however, all operations are performed on the current frame. The following PDB fields are accessible as attributes: record, serial, name, altloc, resname, chain, resid, insertion, coords, occupancy, beta, segid, element, charge. The coordinates are accessible via the coords attribute, an array of shape [number of atoms x 3 x number of frames] whose second dimension holds [x, y, z].
- Parameters:
filename (str or list of str) – Optionally load a PDB file from the specified file. If no such file exists and the value is four characters long, it is assumed to be a PDB accession code and the structure is downloaded from the RCSB web server.
name (str) – A name for the Molecule, used for visualization.
kwargs – Accepts any further arguments that should be passed to the Molecule.read method.
Examples
>>> mol = Molecule( './test/data/dhfr/dhfr.pdb' )
>>> mol = Molecule( '3PTB', name='Trypsin' )
>>> print(mol)
Molecule with 1701 atoms and 1 frames
Atom field - altloc shape: (1701,)
Atom field - atomtype shape: (1701,)
...
Methods
Attributes
- record
The record field of a PDB file if the topology was read from a PDB.
- Type:
np.ndarray
- serial
The serial number of each atom.
- Type:
np.ndarray
- name
The name of each atom.
- Type:
np.ndarray
- altloc
The alternative location flag of the atoms if read from a PDB.
- Type:
np.ndarray
- resname
The residue name of each atom.
- Type:
np.ndarray
- chain
The chain name of each atom.
- Type:
np.ndarray
- resid
The residue ID of each atom.
- Type:
np.ndarray
- insertion
The insertion flag of the atoms if read from a PDB.
- Type:
np.ndarray
- occupancy
The occupancy value of each atom if read from a PDB.
- Type:
np.ndarray
- beta
The beta factor value of each atom if read from a PDB.
- Type:
np.ndarray
- segid
The segment ID of each atom.
- Type:
np.ndarray
- element
The element of each atom.
- Type:
np.ndarray
- charge
The charge of each atom.
- Type:
np.ndarray
- masses
The mass of each atom.
- Type:
np.ndarray
- atomtype
The atom type of each atom.
- Type:
np.ndarray
- formalcharge
The formal charge of each atom.
- Type:
np.ndarray
- coords
A float32 array with shape (natoms, 3, nframes) containing the coordinates of the Molecule.
- Type:
np.ndarray
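To make the (natoms, 3, nframes) layout concrete, here is a NumPy-only sketch on a toy array (not a real Molecule; the sizes are arbitrary). It shows how one frame, one coordinate axis, or one atom's trajectory is sliced out. The x, y and z properties are described as returning per-axis coordinates at the current frame, which is what the `x` slice below assumes.

```python
import numpy as np

# Toy stand-in for Molecule.coords: 4 atoms, 3 spatial dimensions, 2 frames
natoms, nframes = 4, 2
coords = np.zeros((natoms, 3, nframes), dtype=np.float32)
coords[:, 0, 1] = [1.0, 2.0, 3.0, 4.0]  # x values of every atom in frame 1

frame = 1
xyz = coords[:, :, frame]      # (natoms, 3) positions in a single frame
x = coords[:, 0, frame]        # per-axis slice, analogous to the documented x property
atom2_traj = coords[2, :, :]   # (3, nframes) trajectory of atom 2
```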
- box
A float32 array with shape (3, nframes) containing the periodic box dimensions of an MD trajectory.
- Type:
np.ndarray
- boxangles
The angles of the box. If none are set they are assumed to be 90 degrees.
- Type:
np.ndarray
- bonds
Atom pairs corresponding to bond terms.
- Type:
np.ndarray
- bondtype
The type of each bond in Molecule.bonds if available.
- Type:
np.ndarray
- angles
Atom triplets corresponding to angle terms.
- Type:
np.ndarray
- dihedrals
Atom quadruplets corresponding to dihedral terms.
- Type:
np.ndarray
- impropers
Atom quadruplets corresponding to improper dihedral terms.
- Type:
np.ndarray
- crystalinfo
A dictionary containing crystallographic information. It has fields [‘sGroup’, ‘numcopies’, ‘rotations’, ‘translations’].
- Type:
dict
- frame
The current frame. Atom selection and get commands are evaluated on this frame.
- Type:
int
- reps
A list of representations used when visualizing the molecule.
- Type:
Representations object
- addBond(idx1, idx2, btype)
Add a new bond between a pair of atoms.
If the bond already exists, only its type is updated.
- Parameters:
Examples
>>> mol.addBond(13, 18, "2") # Adds a double bond
- align(sel, refmol=None, refsel=None, frames=None, matchingframes=False, mode='index', _logger=True)
Align conformations.
Align a given set of frames of this molecule to either the current active frame of this molecule (mol.frame) or the current frame of a different reference molecule. To align to any frame other than the current active one, modify the refmol.frame property before calling this method.
- Parameters:
sel (str) – Atom selection string for aligning. See more here
refmol (Molecule, optional) – Optionally pass a reference Molecule on which to align. If None is given, it will align on the current frame (mol.frame) of the same Molecule.
refsel (str, optional) – Atom selection for the refmol if one is given. Default: same as sel. See more here
frames (list or range) – A list of frames to align. By default, all frames of the Molecule are aligned.
matchingframes (bool) – If set to True, it will align the selected frames of this molecule to the corresponding frames of the refmol. This requires both molecules to have the same number of frames.
mode (str) – Options are (‘index’, ‘structure’). With ‘index’, the two structures are aligned on the atoms selected by sel and refsel, paired in increasing order of their indices. For example, if sel is name CA and resid 5 3 and refsel is name CA and resid 7 8, and resid 3 comes before resid 5, the CA of resid 3 is aligned to resid 7 in refmol and the CA of resid 5 to resid 8, not 5-7 and 3-8 as the order in the selection strings might suggest. With ‘structure’, a pure structural alignment is performed regardless of atom order, using the TM-Align method.
Examples
>>> mol=tryp.copy()
>>> mol.align('protein')
>>> mol.align('name CA', refmol=Molecule('3PTB'))
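The documentation does not say which fitting algorithm mode='index' uses internally. As an illustration only, the sketch below superposes two equally sized, index-paired coordinate sets with the standard Kabsch (SVD) least-squares method in plain NumPy. The function name kabsch_superpose is hypothetical and is not part of moleculekit.

```python
import numpy as np

def kabsch_superpose(mobile, reference):
    """Return `mobile` rigidly superposed onto `reference` (both (N, 3)).

    Row i of mobile is paired with row i of reference, similar to the
    index-ordered pairing described for mode='index'.
    """
    mob_c = mobile.mean(axis=0)
    ref_c = reference.mean(axis=0)
    P = mobile - mob_c                 # centered mobile coordinates
    Q = reference - ref_c              # centered reference coordinates
    H = P.T @ Q                        # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against a reflection
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                 # optimal rotation (mobile -> reference)
    return P @ R.T + ref_c
```

Because rows are paired purely by position, this says nothing about the order-independent TM-Align search used by mode='structure'.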
- alignBySequence(ref, molseg=None, refseg=None, molsel='all', refsel='all', nalignfragment=1, returnAlignments=False, maxalignments=1)
Aligns the Molecule to a reference Molecule by their longest sequence alignment.
- Parameters:
ref (Molecule object) – The reference Molecule to which we want to align
molsel (str) – The atom selection of this Molecule we want to align
refsel (str) – The atom selection of ref we want to align to
nalignfragment (int) – The number of fragments used for the alignment.
returnAlignments (bool) – Return all alignments as a list of Molecules
maxalignments (int) – The maximum number of alignments we want to produce
- Returns:
mols – If returnAlignments is True it returns a list of Molecules each containing a different alignment. Otherwise it modifies the current Molecule with the best single alignment.
- Return type:
- append(mol, collisions=False, coldist=1.3, removesel='all')
Append a molecule at the end of the current molecule.
- Parameters:
mol (Molecule) – Target Molecule to append to the end of the current Molecule
collisions (bool) – If set to True, it will remove residues of mol which collide with atoms of this Molecule object.
coldist (float) – Collision distance in Angstrom between atoms of the two molecules. Anything closer will be considered a collision.
removesel (str) – Atomselection for atoms to be removed from the passed molecule in case of collisions.
Example
>>> mol=tryp.copy()
>>> mol.filter("not resname BEN")
array([1630, 1631, 1632, 1633, 1634, 1635, 1636, 1637, 1638], dtype=int32)
>>> lig=tryp.copy()
>>> lig.filter("resname BEN")
array([ 0, 1, 2, ..., 1698, 1699, 1700], dtype=int32)
>>> mol.append(lig)
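The collision handling described for collisions and coldist can be pictured with the simplified NumPy sketch below. It assumes residues are identified by resid alone (real structures would also need chain and segid) and it ignores removesel; colliding_residues and keep_mask are hypothetical helpers, not moleculekit functions.

```python
import numpy as np

def colliding_residues(existing_xyz, new_xyz, new_resids, coldist=1.3):
    """Residue IDs of incoming atoms closer than coldist (Angstrom) to any
    existing atom. Brute-force O(N*M) pairwise distance check."""
    diff = new_xyz[:, None, :] - existing_xyz[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # (n_new, n_existing)
    clashing_atoms = (dist < coldist).any(axis=1)
    return np.unique(new_resids[clashing_atoms])

def keep_mask(new_resids, bad_resids):
    """Boolean mask of incoming atoms to keep: whole residues are dropped."""
    return ~np.isin(new_resids, bad_resids)
```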
- appendFrames(mol)
Appends the frames of another Molecule object to this object.
- Parameters:
mol (Molecule) – A Molecule object.
- atomselect(sel, indexes=False, strict=False, fileBonds=True, guessBonds=True, _debug=False)
Get a boolean mask or the indexes of a set of selected atoms.
- Parameters:
- Returns:
asel – Either a boolean mask of selected atoms or their indexes
- Return type:
np.ndarray
Examples
>>> mol=tryp.copy()
>>> mol.atomselect('resname MOL')
array([False, False, False, ..., False, False, False], dtype=bool)
- center(loc=(0, 0, 0), sel='all')
Moves the geometric center of the Molecule to a given location.
- Parameters:
Examples
>>> mol=tryp.copy()
>>> mol.center()
>>> mol.center([10, 10, 10], 'name CA')
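The centering arithmetic can be sketched in NumPy for a single frame. The sketch assumes that sel only chooses which atoms define the geometric center while every atom is translated by the same vector; center_selection is a hypothetical name, not a moleculekit function.

```python
import numpy as np

def center_selection(coords, mask, loc=(0.0, 0.0, 0.0)):
    """Translate all atoms so the geometric center of the masked atoms lands on loc.

    coords: (natoms, 3) positions of one frame; mask: boolean atom selection.
    """
    shift = np.asarray(loc, dtype=float) - coords[mask].mean(axis=0)
    return coords + shift  # same rigid shift for every atom
```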
- copy(frames=None, sel=None)
Create a copy of the Molecule object.
- Parameters:
frames (list of int) – If specified, only the selected frames will be copied.
sel (str) – Atom selection for atoms to keep in the copy.
- Returns:
newmol (Molecule) – A copy of the object.
- deleteBonds(sel, inter=True)
Deletes all bonds that contain atoms in sel or between atoms in sel.
- dropFrames(drop=None, keep=None)
Removes trajectory frames from the Molecule
- Parameters:
Examples
>>> mol = Molecule('1sb0')
>>> mol.dropFrames(keep=[1,2])
>>> mol.numFrames == 2
True
>>> mol.dropFrames(drop=[0])
>>> mol.numFrames == 1
True
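On the coordinate array alone, keep and drop amount to selecting indices along the frame axis (axis 2). The hypothetical drop_frames helper below sketches that, assuming keep takes precedence when both are given; a full implementation would also have to trim other per-frame arrays such as box.

```python
import numpy as np

def drop_frames(coords, drop=None, keep=None):
    """Return a (natoms, 3, nkept) copy of coords with only the requested frames."""
    keep_idx = np.arange(coords.shape[2])
    if keep is not None:
        keep_idx = np.asarray(keep, dtype=int)
    elif drop is not None:
        keep_idx = np.setdiff1d(keep_idx, np.asarray(drop, dtype=int))
    return coords[:, :, keep_idx]
```

Mirroring the doctest above: keeping frames [1, 2] leaves two frames, and then dropping frame 0 leaves one.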
- empty(numAtoms)
Creates an empty molecule of numAtoms atoms.
- Parameters:
numAtoms (int) – Number of atoms to create in the molecule.
Example
>>> newmol = Molecule().empty(100)
- filter(sel, _logger=True)
Removes all atoms not included in the selection.
- Parameters:
- Returns:
removed – An array of all atoms which did not belong to sel and were removed from the Molecule object
- Return type:
np.ndarray
Examples
>>> mol=tryp.copy()
>>> mol.filter('protein')
- property frame
The currently active frame of the Molecule on which methods will be applied
- property fstep
The frame-step of the trajectory
- get(field, sel=None)
Retrieve a specific PDB field based on the selection.
- Parameters:
- Returns:
vals – Array of values of field for all atoms in the selection.
- Return type:
np.ndarray
Examples
>>> mol=tryp.copy()
>>> mol.get('resname')
array(['ILE', 'ILE', 'ILE', ..., 'HOH', 'HOH', 'HOH'], dtype=object)
>>> mol.get('resname', sel='resid 158')
array(['LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU'], dtype=object)
- getCenter(sel='all', com=False)
Get the center of an atom selection
- getDihedral(atom_quad)
Get the value of a dihedral angle.
- Parameters:
atom_quad (list) – Four atom indexes corresponding to the atoms defining the dihedral
- Returns:
angle – The angle in radians
- Return type:
Examples
>>> mol.getDihedral([0, 5, 8, 12])
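For reference, a dihedral angle can be computed from the four atom positions with the common atan2 formulation sketched below (NumPy only). This is a generic geometric sketch; moleculekit's own implementation and sign convention are not stated in this documentation, and the function name dihedral is hypothetical.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians, in the range [-pi, pi]."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1n = b1 / np.linalg.norm(b1)            # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond
    v = b0 - np.dot(b0, b1n) * b1n
    w = b2 - np.dot(b2, b1n) * b1n
    x = np.dot(v, w)
    y = np.dot(np.cross(b1n, v), w)
    return np.arctan2(y, x)
```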
- getNeighbors(idx, bonds=None)
Returns all atoms bonded to a specific atom
- guessBonds(rdkit=False)
- hasBond(idx1, idx2)
Checks if the Molecule has a bond between two atom indexes
- insert(mol, index, collisions=0, coldist=1.3, removesel='all')
Insert the atoms of one molecule into another at a specific index.
- Parameters:
mol (Molecule) – Molecule to be inserted
index (integer) – The atom index at which the passed molecule will be inserted
collisions (bool) – If set to True it will remove residues of mol which collide with atoms of this Molecule object.
coldist (float) – Collision distance in Angstrom between atoms of the two molecules. Anything closer will be considered a collision.
removesel (str) – Atomselection for atoms to be removed from the passed molecule in case of collisions.
Example
>>> mol=tryp.copy()
>>> mol.numAtoms
1701
>>> mol.insert(tryp, 0)
>>> mol.numAtoms
3402
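Inserting atoms shifts the index of every existing atom at or after index, so index-based arrays such as bonds must be renumbered. The hypothetical insert_bonds helper below sketches that bookkeeping in NumPy; it does not reproduce how moleculekit handles it internally, nor the removal of colliding residues.

```python
import numpy as np

def insert_bonds(bonds, n_atoms_inserted, index, ins_bonds):
    """Merge bond lists when a block of atoms is inserted at `index`.

    bonds: (nb, 2) existing bonds; ins_bonds: (mb, 2) bonds of the inserted
    block, numbered from 0 within that block.
    """
    shifted = np.array(bonds, copy=True)
    shifted[shifted >= index] += n_atoms_inserted  # atoms after the insertion point move up
    new = np.asarray(ins_bonds) + index             # inserted atoms occupy index..index+n-1
    return np.vstack([shifted, new])
```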
- moveBy(vector, sel=None)
- mutateResidue(sel, newres)
Mutates a residue by deleting its sidechain and renaming it.
- Parameters:
Examples
>>> mol=tryp.copy()
>>> mol.mutateResidue('resid 158', 'ARG')
- property numAtoms
Number of atoms in the molecule
- property numBonds
Number of bonds in the molecule
- property numFrames
Number of coordinate frames in the molecule
- property numResidues
The number of residues in the Molecule
- read(filename, type=None, skip=None, frames=None, append=False, overwrite='all', keepaltloc='A', guess=None, guessNE=None, _logger=True, **kwargs)
Read topology, coordinates and trajectory files in multiple formats.
Detects the file type from the extension and loads the file into the Molecule.
- Parameters:
filename (str) – Name of the file we want to read
type (str, optional) – File type of the file. If None, it’s automatically determined by the extension
skip (int, optional) – If the file is a trajectory, read only every skip-th frame
frames (list, optional) – If the file is a trajectory, read only the given frames
append (bool, optional) – If the file is a trajectory or coor file, append the coordinates to the previous coordinates. Note that appending is slow.
overwrite (str, list of str) – A list of the existing fields in Molecule that we wish to overwrite when reading this file. Set to None if you don’t want to overwrite any existing fields.
keepaltloc (str) – Set to any string to only keep that specific altloc. Set to ‘all’ if you want to keep all alternative atom positions.
guess (list of str) – Properties of the molecule to guess. Can be any combination of (‘bonds’, ‘angles’, ‘dihedrals’)
guessNE (list of str) – Properties of the molecule to guess only if they do not already exist. Can be any combination of (‘bonds’, ‘angles’, ‘dihedrals’)
- remove(selection, _logger=True)
Remove atoms from the Molecule.
- Parameters:
selection (str) – Atom selection string of the atoms we want to remove. See more here
- Returns:
removed – The list of atoms removed
- Return type:
np.array
Example
>>> mol=tryp.copy()
>>> mol.remove('name CA')
array([ 1, 9, 16, 20, 24, 36, 43, 49, 53, 58,...
- removeBond(idx1, idx2)
Remove an existing bond between a pair of atoms
- renumberResidues(returnMapping=False, start=0, modulo=None)
Renumbers protein residues incrementally.
It checks for changes in any of the resid, insertion, chain or segid fields and, whenever one of them changes, assigns a new residue number.
- Parameters:
returnMapping (bool) – If set to True, the method will also return the mapping between the old and new residues
Examples
>>> mapping = mol.renumberResidues(returnMapping=True)
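The change-detection rule described above can be sketched in NumPy: compare each atom with the previous one across the four fields and take a cumulative sum of the changes. The renumber function is hypothetical and covers only start, not returnMapping or modulo.

```python
import numpy as np

def renumber(resid, insertion, chain, segid, start=0):
    """Consecutive residue numbers, starting a new residue whenever any of
    resid, insertion, chain or segid differs from the previous atom."""
    n = len(resid)
    changed = np.zeros(n, dtype=bool)
    for field in (resid, insertion, chain, segid):
        field = np.asarray(field)
        changed[1:] |= field[1:] != field[:-1]
    return start + np.cumsum(changed)  # first atom gets `start`
```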
- reorderAtoms(order)
Reorder atoms in Molecule
Changes the order of atoms in the Molecule to the defined order.
- Parameters:
order (list) – A list containing the new order of atoms
Examples
>>> mol = Molecule()
>>> _ = mol.empty(4)
>>> mol.name[:] = ['N', 'C', 'H', 'S']
>>> neworder = [1, 3, 2, 0]
>>> mol.reorderAtoms(neworder)
>>> print(mol.name)
['C' 'S' 'H' 'N']
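In the example, the new name array equals names[order], i.e. order[i] is the old index of the atom placed at position i. Index-based arrays such as bonds need the inverse permutation instead. The sketch below (NumPy, hypothetical reorder helper) shows both; it is not moleculekit's implementation.

```python
import numpy as np

def reorder(names, bonds, order):
    """Apply a new atom order to a per-atom array and to a bond list."""
    order = np.asarray(order)
    new_names = np.asarray(names)[order]          # gather: position i <- old atom order[i]
    old_to_new = np.empty_like(order)
    old_to_new[order] = np.arange(len(order))     # inverse permutation
    return new_names, old_to_new[np.asarray(bonds)]
```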
- rotateBy(M, center=(0, 0, 0), sel='all')
Rotate a selection of atoms by a given rotation matrix around a center
- Parameters:
Examples
>>> from moleculekit.util import rotationMatrix
>>> mol = tryp.copy()
>>> mol.rotateBy(rotationMatrix([0, 1, 0], 1.57))
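Rotating around a center amounts to translating the center to the origin, applying the matrix, and translating back. The sketch below builds an axis-angle matrix with Rodrigues' formula and applies it to a (natoms, 3) array. Both helper names are hypothetical, and the axis/angle convention of moleculekit.util.rotationMatrix is not assumed to be identical.

```python
import numpy as np

def axis_angle_matrix(axis, theta):
    """Right-handed rotation of theta radians about `axis` (Rodrigues' formula)."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])  # cross-product matrix of the unit axis
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def rotate_about(coords, M, center=(0.0, 0.0, 0.0)):
    """Rotate (natoms, 3) positions by matrix M around the point `center`."""
    c = np.asarray(center, dtype=float)
    return (coords - c) @ M.T + c
```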
- sequence(oneletter=True, noseg=False, return_idx=False, sel='all', _logger=True)
Return the amino acid sequence of the Molecule.
- Parameters:
oneletter (bool) – Whether to return one-letter or three-letter AA codes. There should be only one atom per residue.
noseg (bool) – Ignore segments and return the whole sequence as single string.
return_idx (bool) – If True, the function also returns the indexes of the atoms of each residue in the sequence
sel (str) – Atomselection for which to return the sequence
- Returns:
sequence – The primary sequence as a dictionary segid - string (if oneletter is True) or segid - list of strings (otherwise).
- Return type:
Examples
>>> mol=tryp.copy()
>>> mol.sequence()
{'0': 'IVGGYTCGANTVPYQVSLNSGYHFCGGSLINSQWVVSAAHCYKSGIQVRLGEDNINVVEGNEQFISASKSIVHPSYNSNTLNNDIMLIKLKSAASLNSRVASISLPTSCASAGTQCLISGWGNTKSSGTSYPDVLKCLKAPILSDSSCKSAYPGQITSNMFCAGYLEGGKDSCQGDSGGPVVCSGKLQGIVSWGSGCAQKNKPGVYTKVCNYVSWIKQTIASN'}
>>> sh2 = Molecule("1LKK")
>>> pYseq = sh2.sequence(oneletter=False)
>>> pYseq['1']
['PTR', 'GLU', 'GLU', 'ILE']
>>> pYseq = sh2.sequence(oneletter=True)
>>> pYseq['1']
'XEEI'
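The one-letter conversion can be sketched with a lookup table of the 20 standard amino acids, mapping anything else to 'X' as happens to PTR (phosphotyrosine) in the 1LKK example. The helper names are hypothetical, they take one residue name per residue rather than per atom, and moleculekit's actual table may cover additional residue names.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def one_letter(resnames):
    """Three-letter residue names -> one-letter string; non-standard -> 'X'."""
    return "".join(THREE_TO_ONE.get(r.upper(), "X") for r in resnames)

def residues_per_segment(segids, resnames):
    """Group per-residue names by segid, like the {segid: ...} dict returned above."""
    out = {}
    for seg, res in zip(segids, resnames):
        out.setdefault(seg, []).append(res)
    return out
```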
- set(field, value, sel=None)
Set the values of a Molecule field based on the selection.
- Parameters:
field (str) – The field we want to set. To see a list of all available fields do print(Molecule._atom_and_coord_fields).
value (string or integer) – All atoms that match the atom selection will have the field field set to this scalar value (or 3-vector if setting the coordinates)
sel (str) – Atom selection string for the atoms to set. See more here
Examples
>>> mol=tryp.copy()
>>> mol.set('segid', 'P', sel='protein')
- setDihedral(atom_quad, radians, bonds=None, guessBonds=False)
Sets the angle of a dihedral.
- Parameters:
atom_quad (list) – Four atom indexes corresponding to the atoms defining the dihedral
radians (float) – The angle in radians to which we want to set the dihedral
bonds (np.ndarray) – An array containing all bonds of the molecule. This is needed when performing multiple modifications, because bond guessing can go wrong if atoms come very close after a rotation.
guessBonds (bool) – Set to True if you want to guess bonds based on atom distances if they are not defined
Examples
>>> mol.setDihedral([0, 5, 8, 12], 0.16)
>>> # If we perform multiple modifications, calculate bonds first and pass them as argument to be safe
>>> bonds = mol._getBonds()
>>> mol.setDihedral([0, 5, 8, 12], 0.16, bonds=bonds)
>>> mol.setDihedral([18, 20, 24, 30], -1.8, bonds=bonds)
- toDict(fields=None)
Returns a dictionary representation of the molecule.
- toGraph(fields=None, distances=False)
Converts the Molecule to a networkx graph.
Each node corresponds to an atom and edges correspond to bonds
- toOpenFFMolecule()
- translateBy(vector, sel=None)
Move a selection of atoms by a given vector
- Parameters:
Examples
>>> mol=tryp.copy()
>>> mol.translateBy([3, 45, -8])
- view(sel=None, style=None, color=None, guessBonds=True, viewer=None, hold=False, name=None, viewerhandle=None, gui=False, pmviewurl='http://localhost:8090')
Visualizes the molecule in a molecular viewer
- Parameters:
sel (str) – Atom selection string for the representation. See more here
color (str or int) – Coloring mode (str) or ColorID (int). See more here.
guessBonds (bool) – Allow the viewer to guess bonds for the molecule
viewer (str ('pmview', 'pymol', 'vmd', 'webgl')) – Choose viewer backend. Default is taken from either moleculekit.config or if it doesn’t exist from moleculekit.config
hold (bool) – If set to True, it will not visualize the molecule but instead collect representations until set back to False.
name (str, optional) – A name to give to the molecule in the viewer
viewerhandle (VMD object, optional) – A specific viewer in which to visualize the molecule. If None, it will use the current default viewer.
pmviewurl (string) – URL of pmview REST server
- wrap(wrapsel='all', fileBonds=True, guessBonds=False)
Wraps the coordinates of the molecule into the simulation box.
It assumes that the atoms of each bonded group are contiguous in the Molecule, i.e. there is no molecule B in between the atoms of molecule A. It also requires correct bonding (ideally read from a topology file).
- Parameters:
wrapsel (str) – Atom selection string of atoms on which to center the wrapping box. See more here
Examples
>>> mol=tryp.copy()
>>> mol.wrap()
>>> mol.wrap('protein')
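A simplified picture of wrapping, for an orthorhombic box only (all box angles 90 degrees), is to shift each contiguous bonded group by whole box lengths so that its geometric center falls inside a box centered on a chosen point. The sketch below is NumPy only; wrap_groups is hypothetical, and whether moleculekit wraps by group center or by some other reference point is not stated here.

```python
import numpy as np

def wrap_groups(xyz, group_starts, box, center):
    """Wrap whole contiguous groups into an orthorhombic box centered on `center`.

    xyz: (natoms, 3) one frame; group_starts: start index of each group plus a
    final entry equal to natoms; box: (3,) box lengths.
    """
    out = np.array(xyz, dtype=float, copy=True)
    box = np.asarray(box, dtype=float)
    lower = np.asarray(center, dtype=float) - box / 2  # lower corner of the box
    for s, e in zip(group_starts[:-1], group_starts[1:]):
        gc = out[s:e].mean(axis=0)                     # group geometric center
        out[s:e] -= np.floor((gc - lower) / box) * box # whole-box shifts only
    return out
```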
- write(filename, sel=None, type=None, **kwargs)
Writes the topology and coordinates of the Molecule in any of the supported formats.
- Parameters:
- property x
Get the x coordinates at the current frame
- property y
Get the y coordinates at the current frame
- property z
Get the z coordinates at the current frame
- class moleculekit.molecule.Representations(mol)
Bases: object
Class that stores representations for Molecule.
Examples
>>> from moleculekit.molecule import Molecule
>>> mol = tryp.copy()
>>> mol.reps.add('protein', 'NewCartoon')
>>> print(mol.reps)
rep 0: sel='protein', style='NewCartoon', color='Name'
>>> mol.view()
>>> mol.reps.remove()
- add(sel=None, style=None, color=None, frames=None, opacity=None)
Adds a new representation for Molecule.
- Parameters:
sel (str) – Atom selection string for the representation. See more here
color (str or int) – Coloring mode (str) or ColorID (int). See more here.
frames (list) – List of frames to visualize with this representation. If None it will visualize the current frame only.
opacity (float) – Opacity of the representation. 0 is fully transparent and 1 is fully opaque.
- append(reps)
- list()
Lists all representations. Equivalent to using print.
- class moleculekit.molecule.UniqueAtomID(**kwargs)
Bases: object
- static fromMolecule(mol, sel=None, idx=None)
- selectAtom(mol, indexes=True, ignore=None)
- class moleculekit.molecule.UniqueResidueID(**kwargs)
Bases: object
- static fromMolecule(mol, sel=None, idx=None)
- selectAtoms(mol, indexes=True, ignore=None)
- moleculekit.molecule.calculateUniqueBonds(bonds, bondtype)
Given bonds and bond types, calculates the unique bonds, dropping any duplicates.
- Parameters:
bonds (np.ndarray) – The bonds of a molecule
bondtype (np.ndarray) – The bond type of each bond in the bonds array
- Returns:
uqbonds (np.ndarray) – The unique bonds of the molecule
uqbondtype (np.ndarray) – The corresponding bond types for uqbonds
Examples
>>> from moleculekit.molecule import Molecule, calculateUniqueBonds
>>> mol = Molecule('3PTB')
>>> # Overwrite the bonds and bond types with the unique ones
>>> mol.bonds, mol.bondtype = calculateUniqueBonds(mol.bonds, mol.bondtype)
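Deduplication can be sketched by sorting each atom pair so that (i, j) and (j, i) coincide, then keeping the first occurrence of each row. The unique_bonds helper below is hypothetical; which bond type moleculekit keeps when duplicates disagree is not documented here, so keeping the first one is an assumption.

```python
import numpy as np

def unique_bonds(bonds, bondtype):
    """Drop duplicate bonds, treating (i, j) and (j, i) as the same bond.

    Keeps the bond type of the first occurrence; output rows are sorted.
    """
    canon = np.sort(np.asarray(bonds), axis=1)  # canonical (low, high) order
    uq, first = np.unique(canon, axis=0, return_index=True)
    return uq, np.asarray(bondtype)[first]
```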
- moleculekit.molecule.getBondedGroups(mol, bonds=None)
Calculates all bonded groups in a Molecule.
- Parameters:
mol (Molecule) – A Molecule object
bonds (np.ndarray) – Optionally pass a different array of bonds. If None it will take the bonds from mol.bonds.
- Returns:
groups (np.ndarray) – Groups is an array which contains the starting index of each group.
group (np.ndarray) – An array with the group index of each atom
Examples
>>> mol = Molecule("structure.prmtop")
>>> mol.read("output.xtc")
>>> groups, _ = getBondedGroups(mol)
>>> for i in range(len(groups)-1):
...     print(f"Group {i} starts at index {groups[i]} and ends at index {groups[i+1]-1}")
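Bonded groups are the connected components of the bond graph. The sketch below finds them with a small union-find and returns two arrays in the layout the loop above relies on (start indices plus a final entry equal to the atom count). The function name bonded_groups is hypothetical, and the start-index array is only meaningful when each group occupies a contiguous index range, the same assumption wrap makes.

```python
import numpy as np

def bonded_groups(natoms, bonds):
    """Return (group_starts, group_of_atom) for the bond graph's components."""
    parent = list(range(natoms))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in bonds:
        ra, rb = find(int(a)), find(int(b))
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smallest index becomes the root

    # Number groups in order of their first atom
    labels = {}
    group = np.empty(natoms, dtype=int)
    for i in range(natoms):
        root = find(i)
        if root not in labels:
            labels[root] = len(labels)
        group[i] = labels[root]
    starts = np.flatnonzero(np.r_[True, group[1:] != group[:-1]])
    return np.append(starts, natoms), group
```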
- moleculekit.molecule.mol_equal(mol1, mol2, checkFields=('record', 'serial', 'name', 'altloc', 'resname', 'chain', 'resid', 'insertion', 'occupancy', 'beta', 'segid', 'element', 'charge', 'masses', 'atomtype', 'formalcharge', 'coords'), exceptFields=None, fieldPrecision=None, dtypes=False, uqBonds=False, _logger=True)
Compare two Molecules for equality.
- Parameters:
mol1 (Molecule) – The first molecule to compare
mol2 (Molecule) – The second molecule to compare to the first
checkFields (list) – A list of fields to compare. By default compares all atom information and coordinates in the molecule
exceptFields (list) – A list of fields to not compare.
fieldPrecision (dict) – A dictionary of field, precision key-value pairs which defines the numerical precision of the value comparisons of two arrays
dtypes (bool) – Set to True to also compare datatypes of the fields
uqBonds (bool) – Set to True to compare unique bonds instead of all bonds
_logger (bool) – Set to False to disable the printing of the differences in the two Molecules
- Returns:
equal – Returns True if the molecules are equal or False if they are not.
- Return type:
Examples
>>> mol_equal(mol1, mol2, checkFields=['resname', 'resid', 'segid'])
>>> mol_equal(mol1, mol2, exceptFields=['record', 'name'])
>>> mol_equal(mol1, mol2, fieldPrecision={'coords': 1e-5})