# htmd.model module¶

Markov state models are a statistical tool for analysing molecular simulations which has met with lots of success. The Model class here, encapsulates all functionallity for the calculation of Markov models while hiding unnecessary details under the hood. It uses PyEMMA [1] internally to calculate Markov models.

References

1

PyEMMA 2: A Software Package for Estimation, Validation, and Analysis of Markov Models. Martin K. Scherer et al. JCTC 2015.

class htmd.model.Model(data=None, file=None)

Bases: object

Constructor for the Model class.

Parameters

data (MetricData object) – A MetricData object containing the discretized trajectories

Example

>>> model = Model(data)


Methods

Attributes

property P

The transition probability matrix

cktest()

Conducts a Chapman-Kolmogorow test.

property cluster_ofmicro

Mapping of microstates to clusters

Numpy array which at index i has the index of the cluster corresponding to microstate i.

copy()

Produces a deep copy of the object

Returns

data – A copy of the current object

Return type

Model object

Examples

>>> model2 = model.copy()

createCoreSetModel(threshold=0.5)

Given an MSM this function detects the states belonging to a core set and returns a new model consisting only of these states.

Parameters

threshold (float) – Membership probability threshold over which microstates belong to the core of a macrostate

Returns

• newmodel – A new model object

• frames (list) – A list of the frames that were kept in the new model

createState(microstates=None, indexpairs=None)

Creates a new state. Works both for new clusters and macrostates.

If creating a new cluster, it just reassigns the given frames to the new cluster. If creating a new macrostate, it removes the given microstates from their previous macrostate, creates a new one and assigns them to it.

Parameters
• microstates (list) – The microstates to split out into a new macrostate.

• indexpairs (list) – List of lists. Each row is a simulation index-frame pair which should be added to a new cluster.

eqDistribution(plot=True, save=None)

Obtain and plot the equilibrium probabilities of each macrostate

Parameters
• plot (bool, optional, default=True) – Disable plotting of the equilibrium distribution by setting it to False

• save (str) – Path of the file in which to save the figure

Returns

eq – An array of equilibrium probabilities of the macrostates

Return type

ndarray

Examples

>>> model = Model(data)
>>> model.markovModel(100, 5)
>>> model.eqDistribution()

getStates(states=None, statetype='macro', wrapsel='protein', alignsel='name CA', alignmol=None, samplemode='weighted', numsamples=50, simlist=None)

Get samples of MSM states in Molecule classes

Parameters
• states (ndarray, optional) – A list of states to visualize

• statetype (['macro','micro','cluster'], optional) – The type of state to visualize

• wrapsel (str, optional, default='protein') – Atom selection string to use for wrapping. See more here

• alignsel (str, optional, default='name CA') – Atom selection string used for aligning all frames. Set to None to disable aligning. See more here

• alignmol (Molecule object) – A reference molecule onto which to align all others

• samplemode (['weighted','random'], optional, default='weighted') – How to obtain the samples from the states

• numsamples (int) – Number of samples (conformations) for each state.

• simlist (numpy.ndarray of Sim objects) – Optionally pass a different (but matching, i.e. filtered) simlist for creating the Molecules.

Returns

mols – A list of Molecule objects containing the samples of each state

Return type

ndarray of Molecule objects

Examples

>>> model = Model(data)
>>> model.markovModel(100, 5)
>>> mols = model.getStates()
>>> for m in mols:
>>>     m.view()

load(filename)

Load a MetricData object from disk

Parameters

filename (str) – Path to the saved MetricData object

Examples

>>> model = Model()

property macro_ofcluster

Mapping of clusters to microstates

Numpy array which at index i has the index of the macrostate corresponding to cluster i. Clusters which were not connected and thus are not in the model have a macrostate value of -1.

property macro_ofmicro

Mapping of microstates to macrostates

Numpy array which at index i has the index of the macrostate corresponding to microstate i.

property macronum

Number of macrostates

markovModel(lag, macronum, units='frames', sparse=False, hmm=False)

Build a Markov model at a given lag time and calculate metastable states

Parameters
• lag (int) – The lag time at which to calculate the Markov state model. The units are specified with the units argument.

• macronum (int) – The number of macrostates (metastable states) to produce

• units (str) – The units of lag. Can be ‘frames’ or any time unit given as a string.

• sparse (bool) – Make the transition matrix sparse. Useful if lots (> 4000) states are used for the MSM. Warning: untested.

Examples

>>> model = Model(data)
>>> model.markovModel(150, 4)  # 150 frames lag, 4 macrostates

maxConnectedLag(lags)

Heuristic for getting the lagtime before a timescale drops.

It calculates the last lagtime before a drop occurs in the first implied timescale due to disconnected states. If the top timescale is closer to the second top timescale at the previous lagtime than to itself at the previous lagtime it means that a drop occured. The lagtime before the drop is returned.

Parameters

lags (np.ndarray or list) – A list of lag times for which to calculate the implied timescales

Returns

ml – The maximum lagtime before a drop occurs in the top timescale

Return type

int

Examples

>>> model = Model(data)
>>> model.maxConnectedLag(list(range(1, 100, 5)))

property micro_ofcluster

Mapping of clusters to microstates

Numpy array which at index i has the index of the microstate corresponding to cluster i. Clusters which were not connected and thus are not in the model have a microstate value of -1.

property micronum

Number of microstates

plotFES(dimX, dimY, temperature, states=False, s=10, cmap=None, fescmap=None, statescmap=None, plot=True, save=None, data=None)

Plots the free energy surface on any given two dimensions. Can also plot positions of states on top.

Parameters
• dimX (int) – Index of projected dimension to use for the X axis.

• dimY (int) – Index of projected dimension to use for the Y axis.

• temperature (float) – Simulation temperature.

• states (bool) – If True, will plot scatter plot of microstates coloured by macro state on top of FES.

• s (float) – Marker size for states.

• cmap – Sets the Matplotlib colormap for both fescmap and statescmap

• fescmap – Matplotlib colormap for the free energy surface

• statescmap – Matplotlib colormap for the states

• plot (bool) – If the method should display the plot of the FES

• save (str) – Path of the file in which to save the figure

• data (MetricData object) – Optionally you can pass a different MetricData object than the one used to build the model. For example if the user wants to build a model on distances but wants to plot the FES on top of RMSD values. The MetricData object needs to have the same simlist as the Model.

Examples

>>> import matplotlib as plt
>>> model.plotFES(0, 1, 300)
>>> model.plotFES(2, 3, 300, data=otherdata, states=True, fescmap=plt.cm.gray)

plotTimescales(lags=None, minlag=None, maxlag=None, numlags=25, units='frames', errors=None, nits=None, results=False, plot=True, save=None, njobs=-2)

Plot the implied timescales of MSMs of various lag times

Parameters
• lags (list) – Specify explicitly at which lag times to compute the timescales.

• minlag (float) – The minimum lag time for the timescales. Used in combination with maxlag and numlags.

• maxlag (float) – The maximum lag time for the timescales. If None will default to the mode length of the trajectories.

• numlags (int) – The number of points to place between minlag and maxlag.

• units (str) – The units of lag. Can be ‘frames’ or any time unit given as a string.

• errors (errors) – Calculate errors using Bayes (Refer to pyEMMA documentation)

• nits (int) – Number of implied timescales to calculate. Default: all

• results (bool) – If the method should return the calculated implied timescales

• plot (bool) – If the method should display the plot of implied timescales

• save (str) – Path of the file in which to save the figure

• njobs (int) – Number of parallel jobs to spawn for calculation of timescales. Negative numbers are used for spawning jobs as many as CPU threads. -1: for all CPUs -2: for all except one etc.

Returns

• If given results=True this method will return the following data

• its (np.ndarray) – The calculated implied timescales. 2D array with dimensions (len(lags), nits)

• lags (np.ndarray) – A list of the lag times that were used to calculate the implied timescales

Examples

>>> model = Model(data)
>>> model.plotTimescales()
>>> model.plotTimescales(lags=list(range(1,100,5)))
>>> model.plotTimescales(minlag=0.1, maxlag=20, numlags=25, units='ns')

sampleStates(states=None, frames=20, statetype='macro', replacement=False, samplemode='random', allframes=False)

Samples frames from a set of states

Parameters
• states (Union[list, int]) – A list of state indexes from which we want to sample

• frames (Union[None, int, list]) – An integer with the number of frames we want to sample per state or a list of same length as states which contains the number of frames we want from each of the states. If set to None it will return all frames of the states.

• statetype (['micro','macro','cluster'], optional) – The type of state we want to sample from.

• replacement (bool) – If we want to sample with or without replacement.

• samplemode (['random','even','weighted']) – What sampling strategy to use. For statetype == ‘macro’ this can be set to ‘even’ to sample evenly from all microstates in the macrostate or to ‘weighted’ to sample proportional to the equilibium probability of each microstate in the macrostate.

• allframes (bool) – Deprecated. Use frames=None instead.

Returns

• absframes (numpy.ndarray) – An array which contains for each state an array containing absolute trajectory frames

• relframes (numpy.ndarray) – An array which contains for each state a 2D array containing the trajectory ID and frame number for each of the sampled frames

Examples

>>> model = Model(data)
>>> model.markovModel(100, 5)
>>> model.sampleStates(range(5), [10, 3, 2, 50, 1])  # Sample from all 5 macrostates
>>> model.sampleStates(range(model.micronum), samplesnum, statetype='micro')  # Sample from all microstates

save(filename)

Save a Model object to disk

Parameters

filename (str) – Path of the file in which to save the object

Examples

>>> model = Model(data)
>>> model.markovModel(100, 5)
>>> model.save('./model.dat')

viewStates(states=None, statetype='macro', protein=None, ligand=None, viewer=None, mols=None, numsamples=50, wrapsel='protein', alignsel='name CA', gui=False, simlist=None)

Visualize macro/micro/cluster states in VMD

Parameters
• states (ndarray, optional) – A list of states to visualize

• statetype (['macro','micro','cluster'], optional) – The type of state to visualize

• protein (bool, optional) – Set to True to enable pure protein system visualization

• ligand (str, optional) – Atom selection string for the ligand. See more here

• viewer (VMD object, optional) – A viewer in which to visualize the states

• mols (ndarray, optional) – An array of Molecule objects to visualize

• numsamples (int) – Number of samples (conformations) for each state.

• wrapsel (str, optional, default='protein') – Atom selection string to use for wrapping. See more here

• alignsel (str, optional, default='name CA') – Atom selection string used for aligning all frames. See to None to disable aligning. See more here

• simlist (numpy.ndarray of Sim objects) – Optionally pass a different (but matching, i.e. filtered) simlist for visualizing the states.

Examples

>>> model = Model(data)
>>> model.markovModel(100, 5)
>>> model.viewStates(protein=True)

>>> model.viewStates(ligand='resname MOL')

htmd.model.getStateStatistic(reference, data, states, statetype='macro', weighted=False, method=<function mean>, axis=0, existing=False, target=None)

Calculates properties of the states.

Calculates properties of data corresponding to states. Can calculate for example the mean distances of atoms in a state, or the standard deviation of the RMSD in a state.

Parameters
• reference (Model object or MetricData object) – A model containing the state definitions or a MetricData object containing cluster definitions

• data (MetricData object) – A projection corresponding to the conformations in the states of the model. The data and the model need to share the same simlist

• states (list) – A list of the states for which we want to calculate the properties

• statetype (['macro','micro','cluster']) – The state type

• weighted (bool) – If the properties of the macrostates should be calculated as the weighted average of their microstates, where the weights are the equilibrium probabilities of the microstates

• method – A function pointer for the method that calculates our desired property. e.g. np.mean, np.std

• axis (int) – On which axis of the data the method should operate on

• existing (bool) – Not used

• target – Not used

Returns

statistic – A list which contains in each element the desired statistic of a specific state specifies in states

Return type

list

Examples

>>> data = MetricDistance.project(sims, 'protein and name CA', 'resname MOL')
>>> model = Model(data)
>>> model.markovModel(100, 5)
>>> # Get the standard deviation of distances in all macrostates
>>> getStateStatistic(model, data, list(range(5)), method=np.std)

htmd.model.macroAccumulate(model, microvalue)

Accumulate values of macrostates from a microstate array

Parameters
• model (Model object) – The model which to use to accumulate

• microvalue (ndarray) – An array of values corresponding to the microstates of the model

Returns

macrovalue – An array of values corresponding to the macrostates of the model

Return type

ndarray