moleculekit.molecule module

class moleculekit.molecule.Molecule(filename=None, name=None, **kwargs)

Bases: object

Class to manipulate molecular structures.

Molecule contains all the fields of a PDB and it is independent of any force field. It can contain multiple conformations and trajectories, however all operations are done on the current frame. The following PDB fields are accessible as attributes (record, serial, name, altloc, resname, chain, resid, insertion, coords, occupancy, beta, segid, element, charge). The coordinates are accessible via the coords attribute ([number of atoms x 3 x number of frames] where [x,y,z] are the second dimension.

Parameters
  • filename (str or list of str) – Optionally load a PDB file from the specified file. If there’s no file and the value is four characters long assume it is a PDB accession code and try to download from the RCSB web server.

  • name (str) – Give a name to the Molecule that will be used for visualization

  • kwargs – Accepts any further arguments that should be passed to the Molecule.read method.

Examples

>>> mol = Molecule( './test/data/dhfr/dhfr.pdb' )  
>>> mol = Molecule( '3PTB', name='Trypsin' )
>>> print(mol)                                     
Molecule with 1701 atoms and 1 frames
Atom field - altloc shape: (1701,)
Atom field - atomtype shape: (1701,)
...

Methods

Attributes

numAtoms

Number of atoms in the Molecule

Type

int

numFrames

Number of conformers / simulation frames in the Molecule

Type

int

numResidues

Number of residues in the Molecule

Type

int

record

The record field of a PDB file if the topology was read from a PDB.

Type

np.ndarray

serial

The serial number of each atom.

Type

np.ndarray

name

The name of each atom.

Type

np.ndarray

altloc

The alternative location flag of the atoms if read from a PDB.

Type

np.ndarray

resname

The residue name of each atom.

Type

np.ndarray

chain

The chain name of each atom.

Type

np.ndarray

resid

The residue ID of each atom.

Type

np.ndarray

insertion

The insertion flag of the atoms if read from a PDB.

Type

np.ndarray

occupancy

The occupancy value of each atom if read from a PDB.

Type

np.ndarray

beta

The beta factor value of each atom if read from a PDB.

Type

np.ndarray

segid

The segment ID of each atom.

Type

np.ndarray

element

The element of each atom.

Type

np.ndarray

charge

The charge of each atom.

Type

np.ndarray

masses

The mass of each atom.

Type

np.ndarray

atomtype

The atom type of each atom.

Type

np.ndarray

coords

A float32 array with shape (natoms, 3, nframes) containing the coordinates of the Molecule.

Type

np.ndarray

box

A float32 array with shape (3, nframes) containing the periodic box dimensions of an MD trajectory.

Type

np.ndarray

boxangles

The angles of the box. If none are set they are assumed to be 90 degrees.

Type

np.ndarray

bonds

Atom pairs corresponding to bond terms.

Type

np.ndarray

bondtype

The type of each bond in Molecule.bonds if available.

Type

np.ndarray

angles

Atom triplets corresponding to angle terms.

Type

np.ndarray

dihedrals

Atom quadruplets corresponding to dihedral terms.

Type

np.ndarray

impropers

Atom quadruplets corresponding to improper dihedral terms.

Type

np.ndarray

crystalinfo

A dictionary containing crystallographic information. It has fields [‘sGroup’, ‘numcopies’, ‘rotations’, ‘translations’]

Type

dict

frame

The current frame. atomselection and get commands will be calculated on this frame.

Type

int

fileloc

The location of the files used to read this Molecule

Type

list

time

The time for each frame of the simulation

Type

list

fstep

The step for each frame of the simulation

Type

list

reps

A list of representations that is used when visualizing the molecule

Type

Representations object

viewname

The name used for the molecule in the viewer

Type

str

align(sel, refmol=None, refsel=None, frames=None, matchingframes=False)

Align conformations.

Align a given set of frames of this molecule to either the current active frame of this molecule (mol.frame) or the current frame of a different reference molecule. To align to any frame other than the current active one modify the refmol.frame property before calling this method.

Parameters
  • sel (str) – Atom selection string for aligning. See more here

  • refmol (Molecule, optional) – Optionally pass a reference Molecule on which to align. If None is given, it will align on the first frame of the same Molecule

  • refsel (str, optional) – Atom selection for the refmol if one is given. Default: same as sel. See more here

  • frames (list or range) – A list of frames which to align. By default it will align all frames of the Molecule

  • matchingframes (bool) – If set to True it will align the selected frames of this molecule to the corresponding frames of the refmol. This requires both molecules to have the same number of frames.

Examples

>>> mol=tryp.copy()
>>> mol.align('protein')
>>> mol.align('name CA', refmol=Molecule('3PTB'))
alignBySequence(ref, molseg=None, refseg=None, nalignfragment=1, returnAlignments=False, maxalignments=1)

Aligns the Molecule to a reference Molecule by their longest sequence alignment

Parameters
  • ref (Molecule object) – The reference Molecule to which we want to align

  • molseg (str) – The segment of this Molecule we want to align

  • refseg (str) – The segment of ref we want to align to

  • nalignfragments (int) – The number of fragments used for the alignment.

  • returnAlignments (bool) – Return all alignments as a list of Molecules

  • maxalignments (int) – The maximum number of alignments we want to produce

Returns

mols – If returnAlignments is True it returns a list of Molecules each containing a different alignment. Otherwise it modifies the current Molecule with the best single alignment.

Return type

list

append(mol, collisions=False, coldist=1.3)

Append a molecule at the end of the current molecule

Parameters
  • mol (Molecule) – Target Molecule which to append to the end of the current Molecule

  • collisions (bool) – If set to True it will remove residues of mol which collide with atoms of this Molecule object.

  • coldist (float) – Collision distance in Angstrom between atoms of the two molecules. Anything closer will be considered a collision.

Example

>>> mol=tryp.copy()
>>> mol.filter("not resname BEN")
>>> lig=tryp.copy()
>>> lig.filter("resname BEN")
>>> mol.append(lig)
appendFrames(mol)

Appends the frames of another Molecule object to this object.

Parameters

mol (Molecule) – A Molecule object.

atomselect(sel, indexes=False, strict=False, fileBonds=True, guessBonds=True)

Get a boolean mask or the indexes of a set of selected atoms

Parameters
  • sel (str) – Atom selection string. See more here

  • indexes (bool) – If True returns the indexes instead of a bitmap

  • strict (bool) – If True it will raise an error if no atoms were selected.

  • fileBonds (bool) – If True will use bonds read from files.

  • guessBonds (bool) – If True will use guessed bonds.

Returns

asel – Either a boolean mask of selected atoms or their indexes

Return type

np.ndarray

Examples

>>> mol=tryp.copy()
>>> mol.atomselect('resname MOL')
array([False, False, False, ..., False, False, False], dtype=bool)
center(loc=(0, 0, 0), sel='all')

Moves the geometric center of the Molecule to a given location

Parameters
  • loc (list, optional) – The location to which to move the geometric center

  • sel (str) – Atom selection string of the atoms whose geometric center we want to center on the loc position. See more here

Examples

>>> mol=tryp.copy()
>>> mol.center()
>>> mol.center([10, 10, 10], 'name CA')
copy()

Create a copy of the Molecule object

Returns

newmol – A copy of the object

Return type

Molecule

deleteBonds(sel, inter=True)

Deletes all bonds that contain atoms in sel or between atoms in sel.

Parameters
  • sel (str) – Atom selection string of atoms whose bonds will be deleted. See more here

  • inter (bool) – When True it will delete also bonds between atoms in sel with bonds to atoms outside of sel. When False it will only delete bonds between atoms in sel.

dropFrames(keep='all', drop=None)

Removes trajectory frames from the Molecule

Parameters
  • keep (int or list of ints) – Index of frame, or list of frame indexes which we want to keep (and drop all others).

  • drop (int or list of ints) – Index of frame, or list of frame indexes which we want to drop (and keep all others).

Examples

>>> mol = Molecule('1sb0')
>>> mol.dropFrames(keep=[1,2])
>>> mol.numFrames == 2
>>> mol.dropFrames(drop=[0])
>>> mol.numFrames == 1
empty(numAtoms)

Creates an empty molecule of numAtoms atoms.

Parameters

numAtoms (int) – Number of atoms to create in the molecule.

Example

>>> newmol = Molecule().empty(100)
filter(sel, _logger=True)

Removes all atoms not included in the selection

Parameters

sel (str) – Atom selection string. See more here

Examples

>>> mol=tryp.copy()
>>> mol.filter('protein')
property frame

The currently active frame of the Molecule on which methods will be applied

property fstep

The frame-step of the trajectory

get(field, sel=None)

Retrieve a specific PDB field based on the selection

Parameters
  • field (str) – The field we want to get. To see a list of all available fields do print(Molecule._atom_and_coord_fields).

  • sel (str) – Atom selection string for which atoms we want to get the field from. Default all. See more here

Returns

vals – Array of values of field for all atoms in the selection.

Return type

np.ndarray

Examples

>>> mol=tryp.copy()
>>> mol.get('resname')
array(['ILE', 'ILE', 'ILE', ..., 'HOH', 'HOH', 'HOH'], dtype=object)
>>> mol.get('resname', sel='resid 158')
array(['LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU'], dtype=object)
getDihedral(atom_quad)

Get the value of a dihedral angle.

Parameters

atom_quad (list) – Four atom indexes corresponding to the atoms defining the dihedral

Returns

angle – The angle in radians

Return type

float

Examples

>>> mol.getDihedral([0, 5, 8, 12])
insert(mol, index, collisions=0, coldist=1.3)

Insert the atoms of one molecule into another at a specific index.

Parameters
  • mol (Molecule) – Molecule to be inserted

  • index (integer) – The atom index at which the passed molecule will be inserted

  • collisions (bool) – If set to True it will remove residues of mol which collide with atoms of this Molecule object.

  • coldist (float) – Collision distance in Angstrom between atoms of the two molecules. Anything closer will be considered a collision.

Example

>>> mol=tryp.copy()
>>> mol.numAtoms
1701
>>> mol.insert(tryp, 0)
>>> mol.numAtoms
3402
moveBy(vector, sel=None)

Move a selection of atoms by a given vector

Parameters
  • vector (list) – 3D coordinates to add to the Molecule coordinates

  • sel (str) – Atom selection string of atoms which we want to move. See more here

Examples

>>> mol=tryp.copy()
>>> mol.moveBy([3, 45 , -8])
mutateResidue(sel, newres)

Mutates a residue by deleting its sidechain and renaming it

Parameters
  • sel (str) – Atom selection string for the residue we want to mutate. The selection needs to include all atoms of the residue. See more here

  • newres (str) – The name of the new residue

Examples

>>> mol=tryp.copy()
>>> mol.mutateResidue('resid 158', 'ARG')
property numAtoms

Number of atoms in the molecule

property numFrames

Number of coordinate frames in the molecule

property numResidues

The number of residues in the Molecule

read(filename, type=None, skip=None, frames=None, append=False, overwrite='all', keepaltloc='A', guess=None, guessNE=None, _logger=True, **kwargs)

Read topology, coordinates and trajectory files in multiple formats.

Detects from the extension the file type and loads it into Molecule

Parameters
  • filename (str) – Name of the file we want to read

  • type (str, optional) – File type of the file. If None, it’s automatically determined by the extension

  • skip (int, optional) – If the file is a trajectory, skip every skip frames

  • frames (list, optional) – If the file is a trajectory, read only the given frames

  • append (bool, optional) – If the file is a trajectory or coor file, append the coordinates to the previous coordinates. Note append is slow.

  • overwrite (str, list of str) – A list of the existing fields in Molecule that we wish to overwrite when reading this file. Set to None if you don’t want to overwrite any existing fields.

  • keepaltloc (str) – Set to any string to only keep that specific altloc. Set to ‘all’ if you want to keep all alternative atom positions.

  • guess (list of str) – Properties of the molecule to guess. Can be any combination of (‘bonds’, ‘angles’, ‘dihedrals’)

  • guessNE (list of str) – Properties of the molecule to guess if it’s Non-Existent. Can be any combination of (‘bonds’, ‘angles’, ‘dihedrals’)

remove(selection, _logger=True)

Remove atoms from the Molecule

Parameters

selection (str) – Atom selection string of the atoms we want to remove. See more here

Returns

removed – The list of atoms removed

Return type

np.array

Example

>>> mol=tryp.copy()
>>> mol.remove('name CA')               
array([   1,    9,   16,   20,   24,   36,   43,   49,   53,   58,...
renumberResidues(returnMapping=False)

Renumbers protein residues incrementally.

It checks for changes in either of the resid, insertion, chain or segid fields and in case of a change it creates a new residue number.

Parameters

returnMapping (bool) – If set to True, the method will also return the mapping between the old and new residues

Examples

>>> mapping = mol.renumberResidues(returnMapping=True)
reorderAtoms(order)

Reorder atoms in Molecule

Changes the order of atoms in the Molecule to the defined order.

Parameters

order (list) – A list containing the new order of atoms

Examples

>>> mol = Molecule()
>>> mol.empty(4)
>>> mol.name[:] = ['N', 'C', 'H', 'S']
>>> neworder = [1, 3, 2, 0]
>>> mol.reorderAtoms(neworder)
>>> print(mol.name)
array(['C', 'S', 'H', 'N'], dtype=object)
rotateBy(M, center=(0, 0, 0), sel='all')

Rotate a selection of atoms by a given rotation matrix around a center

Parameters
  • M (np.ndarray) – The rotation matrix

  • center (list) – The rotation center

  • sel (str) – Atom selection string for atoms to rotate. See more here

Examples

>>> mol = tryp.copy()
>>> mol.rotateBy(rotationMatrix([0, 1, 0], 1.57))
sequence(oneletter=True, noseg=False)

Return the aminoacid sequence of the Molecule.

Parameters
  • oneletter (bool) – Whether to return one-letter or three-letter AA codes. There should be only one atom per residue.

  • noseg (bool) – Ignore segments and return the whole sequence as single string.

Returns

sequence – The primary sequence as a dictionary segid - string (if oneletter is True) or segid - list of strings (otherwise).

Return type

str

Examples

>>> mol=tryp.copy()
>>> mol.sequence()
{'0': 'IVGGYTCGANTVPYQVSLNSGYHFCGGSLINSQWVVSAAHCYKSGIQVRLGEDNINVVEGNEQFISASKSIVHPSYNSNTLNNDIMLIKLKSAASLNSRVASISLPTSCASAGTQCLISGWGNTKSSGTSYPDVLKCLKAPILSDSSCKSAYPGQITSNMFCAGYLEGGKDSCQGDSGGPVVCSGKLQGIVSWGSGCAQKNKPGVYTKVCNYVSWIKQTIASN'}
>>> sh2 = Molecule("1LKK")
>>> pYseq = sh2.sequence(oneletter=False)
>>> pYseq['1']
['PTR', 'GLU', 'GLU', 'ILE']
>>> pYseq = sh2.sequence(oneletter=True)
>>> pYseq['1']
'XEEI'
set(field, value, sel=None)

Set the values of a Molecule field based on the selection

Parameters
  • field (str) – The field we want to set. To see a list of all available fields do print(Molecule._atom_and_coord_fields).

  • value (string or integer) – All atoms that match the atom selection will have the field field set to this scalar value (or 3-vector if setting the coordinates)

  • sel (str) – Atom selection string for atom which to set. See more here

Examples

>>> mol=tryp.copy()
>>> mol.set('segid', 'P', sel='protein')
setDihedral(atom_quad, radians, bonds=None)

Sets the angle of a dihedral.

Parameters
  • atom_quad (list) – Four atom indexes corresponding to the atoms defining the dihedral

  • radians (float) – The angle in radians to which we want to set the dihedral

  • bonds (np.ndarray) – An array containing all bonds of the molecule. This is needed if multiple modifications are done as the bond guessing can get messed up if atoms come very close after the rotation.

Examples

>>> mol.setDihedral([0, 5, 8, 12], 0.16)
>>> # If we perform multiple modifications, calculate bonds first and pass them as argument to be safe
>>> bonds = mol._getBonds()
>>> mol.setDihedral([0, 5, 8, 12], 0.16, bonds=bonds)
>>> mol.setDihedral([18, 20, 24, 30], -1.8, bonds=bonds)
view(sel=None, style=None, color=None, guessBonds=True, viewer=None, hold=False, name=None, viewerhandle=None, gui=False)

Visualizes the molecule in a molecular viewer

Parameters
  • sel (str) – Atom selection string for the representation. See more here

  • style (str) – Representation style. See more here.

  • color (str or int) – Coloring mode (str) or ColorID (int). See more here.

  • guessBonds (bool) – Allow VMD to guess bonds for the molecule

  • viewer (str ('vmd', 'webgl')) – Choose viewer backend. Default is taken from either moleculekit.config or if it doesn’t exist from moleculekit.config

  • hold (bool) – If set to True, it will not visualize the molecule but instead collect representations until set back to False.

  • name (str, optional) – A name to give to the molecule in VMD

  • viewerhandle (VMD object, optional) – A specific viewer in which to visualize the molecule. If None it will use the current default viewer.

wrap(wrapsel=None, fileBonds=True, guessBonds=False)

Wraps the coordinates of the molecule into the simulation box

Parameters

wrapsel (str) – Atom selection string of atoms on which to center the wrapping box. See more here

Examples

>>> mol=tryp.copy()
>>> mol.wrap()
>>> mol.wrap('protein')
write(filename, sel=None, type=None, **kwargs)

Writes the topology and coordinates of the Molecule in any of the supported formats.

Parameters
  • filename (str) – The filename of the file we want to write to disk

  • sel (str, optional) – Atom selection string of the atoms we want to write. If None, it will write all atoms. See more here

  • type (str, optional) – The filetype we want to write. By default, detected from the file extension

property x

Get the x coordinates at the current frame

property y

Get the y coordinates at the current frame

property z

Get the z coordinates at the current frame

class moleculekit.molecule.Representations(mol)

Bases: object

Class that stores representations for Molecule.

Examples

>>> from moleculekit.molecule import Molecule
>>> mol = tryp.copy()
>>> mol.reps.add('protein', 'NewCartoon')
>>> print(mol.reps)                     
rep 0: sel='protein', style='NewCartoon', color='Name'
>>> mol.view() 
>>> mol.reps.remove() 
add(sel=None, style=None, color=None)

Adds a new representation for Molecule.

Parameters
  • sel (str) – Atom selection string for the representation. See more here

  • style (str) – Representation style. See more here.

  • color (str or int) – Coloring mode (str) or ColorID (int). See more here.

append(reps)
list()

Lists all representations. Equivalent to using print.

remove(index=None)

Removed one or all representations.

Parameters

index (int) – The index of the representation to delete. If none is given it deletes all.

exception moleculekit.molecule.TopologyInconsistencyError(value)

Bases: Exception

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class moleculekit.molecule.UniqueAtomID(**kwargs)

Bases: object

static fromMolecule(mol, sel=None, idx=None)
selectAtom(mol, indexes=True, ignore=None)
class moleculekit.molecule.UniqueResidueID(**kwargs)

Bases: object

static fromMolecule(mol, sel=None, idx=None)
selectAtoms(mol, indexes=True, ignore=None)
moleculekit.molecule.calculateUniqueBonds(bonds, bondtype)

Given bonds and bondtypes calculates unique bonds dropping any duplicates

Parameters
  • bonds (np.ndarray) – The bonds of a molecule

  • bondtype (np.ndarray) – The bond type of each bond in the bonds array

Returns

  • uqbonds (np.ndarray) – The unique bonds of the molecule

  • uqbondtype (np.ndarray) – The corresponding bond types for uqbonds

Examples

>>> from moleculekit.molecule import Molecule
>>> mol = Molecule('3PTB')
>>> mol.bonds, mol.bondtype = calculateUniqueBonds(mol.bonds, mol.bondtype)  # Overwrite the bonds and bondtypes with the unique ones
moleculekit.molecule.mol_equal(mol1, mol2, checkFields=('record', 'serial', 'name', 'altloc', 'resname', 'chain', 'resid', 'insertion', 'occupancy', 'beta', 'segid', 'element', 'charge', 'masses', 'atomtype', 'coords'), exceptFields=None, fieldPrecision=None, _logger=True)

Compare two Molecules for equality.

Parameters
  • mol1 (Molecule) – The first molecule to compare

  • mol2 (Molecule) – The second molecule to compare to the first

  • checkFields (list) – A list of fields to compare. By default compares all atom information and coordinates in the molecule

  • exceptFields (list) – A list of fields to not compare.

  • fieldPrecision (dict) – A dictionary of field, precision key-value pairs which defines the numerical precision of the value comparisons of two arrays

  • _logger (bool) – Set to False to disable the printing of the differences in the two Molecules

Returns

equal – Returns True if the molecules are equal or False if they are not.

Return type

bool

Examples

>>> mol_equal(mol1, mol2, checkFields=['resname', 'resid', 'segid'])
>>> mol_equal(mol1, mol2, exceptFields=['record', 'name'])
>>> mol_equal(mol1, mol2, fieldPrecision={'coords': 1e-5})