htmd.model module#
Markov state models are a statistical tool for analysing molecular simulations which has met with lots of success. The Model class here, encapsulates all functionallity for the calculation of Markov models while hiding unnecessary details under the hood. It uses DeepTime [1] internally to calculate Markov models.
References
- class htmd.model.Model(data=None, file=None)#
Bases:
objectConstructor for the Model class.
- Parameters:
data (
MetricData|None) – AMetricDataobject containing the discretized trajectoriesfile (
str|None) – Path to a previously saved Model file to load. Provide either data or file.
Examples
>>> model = Model(data)
Methods
Attributes
- cktest(lags=None, minlag=None, maxlag=None, numlags=25, units='frames', plot=True, save=None, errors=None)#
Conducts a Chapman-Kolmogorov test.
- Parameters:
lags (
list|range|ndarray|None) – List of lag times to testminlag (
float|None) – Minimum lag time to test. Used if lags is None. By default it will use 10 if maxlag is greater than 20, otherwise it will use 2.maxlag (
float|None) – Maximum lag time to test. Used if lags is None. By default it will use the mode of the trajectory lengths.numlags (
int) – Number of lag times to test between minlag and maxlag. Used if lags is None.units (
str) – Units of the lag times. By default it will use frames.plot (
bool) – If the method should display the plot of the CK testsave (
str|None) – Path of the file in which to save the figureerrors (
int|None) – Number of Bayesian samples to use for the error bars. If set to None, it will not plot error bars and use a Maximum Likelihood MSM.
- property cluster_ofmicro: ndarray#
Mapping of microstates to clusters
Numpy array which at index i has the index of the cluster corresponding to microstate i.
- copy()#
Produces a deep copy of the object
- Returns:
data – A copy of the current object
- Return type:
Examples
>>> model2 = model.copy()
- createCoreSetModel(threshold=0.5)#
Given an MSM this function detects the states belonging to a core set and returns a new model consisting only of these states.
- createState(microstates=None, indexpairs=None)#
Creates a new state. Works both for new clusters and macrostates.
If creating a new cluster, it just reassigns the given frames to the new cluster. If creating a new macrostate, it removes the given microstates from their previous macrostate, creates a new one and assigns them to it.
- eqDistribution(plot=True, save=None)#
Obtain and plot the equilibrium probabilities of each macrostate
- Parameters:
- Returns:
eq – An array of equilibrium probabilities of the macrostates
- Return type:
Examples
>>> model = Model(data) >>> model.markovModel(100, 5) >>> model.eqDistribution()
- getStates(states=None, statetype='macro', wrapsel='protein', alignsel='name CA', alignmol=None, samplemode='weighted', numsamples=50, simlist=None)#
Get samples of MSM states in Molecule classes
- Parameters:
states (
list|int|range|ndarray|None) – A list of states to visualizestatetype (
str) – The type of state to visualize. Can be ‘macro’, ‘micro’ or ‘cluster’.wrapsel (
str|ndarray|None) – Atom selection used for wrapping, as an atom selection string, a boolean mask, or an integer index array (seeMolecule.atomselect). Set to None to disable wrapping. See more herealignsel (
str|ndarray|None) – Atom selection used for aligning all frames, as an atom selection string, a boolean mask, or an integer index array. Set to None to disable aligning. See more herealignmol (
Molecule|None) – A reference molecule onto which to align all otherssamplemode (
str) – How to obtain the samples from the states. Can be ‘weighted’ or ‘random’.numsamples (
int) – Number of samples (conformations) for each state.simlist (
ndarray|None) – Optionally pass a different (but matching, i.e. filtered) simlist for creating the Molecules.
- Returns:
mols – A list of
Moleculeobjects containing the samples of each state- Return type:
Examples
>>> model = Model(data) >>> model.markovModel(100, 5) >>> mols = model.getStates() >>> for m in mols: >>> m.view()
- load(filename)#
Load a
MetricDataobject from disk- Parameters:
filename (
str) – Path to the saved MetricData object
Examples
>>> model = Model() >>> model.load('./model.dat')
- property macro_ofcluster: ndarray#
Mapping of clusters to macrostates
Numpy array which at index i has the index of the macrostate corresponding to cluster i. Clusters which were not connected and thus are not in the model have a macrostate value of -1.
- property macro_ofmicro: ndarray#
Mapping of microstates to macrostates
Numpy array which at index i has the index of the macrostate corresponding to microstate i.
- markovModel(lag, macronum, units='frames', sparse=False)#
Build a Markov model at a given lag time and calculate metastable states
- Parameters:
lag (
float) – The lag time at which to calculate the Markov state model. The units are specified with the units argument.macronum (
int) – The number of macrostates (metastable states) to produceunits (
str) – The units of lag. Can be ‘frames’ or any time unit given as a string.sparse (
bool) – Make the transition matrix sparse. Useful if lots (> 4000) states are used for the MSM. Warning: untested.
Examples
>>> model = Model(data) >>> model.markovModel(150, 4) # 150 frames lag, 4 macrostates
- maxConnectedLag(lags, njobs=1)#
Heuristic for getting the lagtime before a timescale drops.
It calculates the last lagtime before a drop occurs in the first implied timescale due to disconnected states. If the top timescale is closer to the second top timescale at the previous lagtime than to itself at the previous lagtime it means that a drop occured. The lagtime before the drop is returned.
- Parameters:
- Returns:
ml – The maximum lagtime before a drop occurs in the top timescale
- Return type:
Examples
>>> model = Model(data) >>> model.maxConnectedLag(list(range(1, 100, 5)))
- property micro_ofcluster: ndarray#
Mapping of clusters to microstates
Numpy array which at index i has the index of the microstate corresponding to cluster i. Clusters which were not connected and thus are not in the model have a microstate value of -1.
- plotFES(dimX, dimY, temperature, states=False, s=10, cmap=None, fescmap=None, statescmap=None, plot=True, save=None, data=None, levels=7)#
Plots the free energy surface on any given two dimensions. Can also plot positions of states on top.
- Parameters:
dimX (
int) – Index of projected dimension to use for the X axis.dimY (
int) – Index of projected dimension to use for the Y axis.temperature (
float) – Simulation temperature.states (
bool) – If True, will plot scatter plot of microstates coloured by macro state on top of FES.s (
float) – Marker size for states.cmap (
strormatplotlib colormap) – Sets the Matplotlib colormap for both fescmap and statescmapfescmap (
strormatplotlib colormap) – Matplotlib colormap for the free energy surfacestatescmap (
strormatplotlib colormap) – Matplotlib colormap for the statesplot (
bool) – If the method should display the plot of the FES. If both plot=False and save=None, the method will return the figure and axes.save (
str|None) – Path of the file in which to save the figure. If both plot=False and save=None, the method will return the figure and axes.data (
MetricData|None) – Optionally you can pass a different MetricData object than the one used to build the model. For example if the user wants to build a model on distances but wants to plot the FES on top of RMSD values. The MetricData object needs to have the same simlist as the Model.levels (
int) – Number of contour levels to use for the free energy surface.
- Returns:
f (
matplotlib.figure.Figure) – The matplotlib figure objectax (
matplotlib.axes.Axes) – The matplotlib axes object
Examples
>>> import matplotlib as plt >>> model.plotFES(0, 1, 300) >>> model.plotFES(2, 3, 300, data=otherdata, states=True, fescmap=plt.cm.gray)
- plotTimescales(lags=None, minlag=None, maxlag=None, numlags=25, units='frames', errors=None, nits=None, results=False, plot=True, save=None, njobs=1, ylog=True)#
Plot the implied timescales of MSMs of various lag times
- Parameters:
lags (
list|range|ndarray|None) – Specify explicitly at which lag times to compute the timescales.minlag (
float|None) – The minimum lag time for the timescales. Used in combination with maxlag and numlags.maxlag (
float|None) – The maximum lag time for the timescales. If None will default to the mode length of the trajectories.numlags (
int) – The number of points to place between minlag and maxlag.units (
str) – The units of lag. Can be ‘frames’ or any time unit given as a string.errors (
int|None) – Number of Bayesian samples used to calculate errors with Bayesian MSMs. If None, no errors are calculated.nits (
int|None) – Number of implied timescales to calculate. If None, all are calculated.results (
bool) – If the method should return the calculated implied timescalesplot (
bool) – If the method should display the plot of implied timescalessave (
str|None) – Path of the file in which to save the figurenjobs (
int) – DEPRECATED: Number of parallel jobs to spawn for calculation of timescales. Negative numbers are used for spawning jobs as many as CPU threads. -1: for all CPUs -2: for all except one etc.ylog (
bool) – Set to False to get linear y axis instead of logarithmic
- Returns:
If given results=True this method will return the following dataits (
numpy.ndarray) – The calculated implied timescales. 2D array with dimensions (len(lags), nits)lags (
numpy.ndarray) – A list of the lag times that were used to calculate the implied timescales
Examples
>>> model = Model(data) >>> model.plotTimescales() >>> model.plotTimescales(lags=list(range(1,100,5))) >>> model.plotTimescales(minlag=0.1, maxlag=20, numlags=25, units='ns')
- sampleStates(states=None, frames=20, statetype='macro', replacement=False, samplemode='random', allframes=False)#
Samples frames from a set of states
- Parameters:
states (
list|int|range|ndarray|None) – A list of state indexes from which we want to sampleframes (
int|list|ndarray|None) – An integer with the number of frames we want to sample per state or a list of same length as states which contains the number of frames we want from each of the states. If set to None it will return all frames of the states.statetype (
str) – The type of state we want to sample from. Can be ‘micro’, ‘macro’ or ‘cluster’.replacement (
bool) – If we want to sample with or without replacement.samplemode (
str) – What sampling strategy to use. Can be ‘random’, ‘even’ or ‘weighted’. For statetype == ‘macro’ this can be set to ‘even’ to sample evenly from all microstates in the macrostate or to ‘weighted’ to sample proportional to the equilibrium probability of each microstate in the macrostate.allframes (
bool) – Deprecated. Use frames=None instead.
- Returns:
absframes (
numpy.ndarray) – An array which contains for each state an array containing absolute trajectory framesrelframes (
numpy.ndarray) – An array which contains for each state a 2D array containing the trajectory ID and frame number for each of the sampled frames
Examples
>>> model = Model(data) >>> model.markovModel(100, 5) >>> model.sampleStates(range(5), [10, 3, 2, 50, 1]) # Sample from all 5 macrostates >>> model.sampleStates(range(model.micronum), samplesnum, statetype='micro') # Sample from all microstates
- save(filename)#
Save a
Modelobject to disk- Parameters:
filename (
str) – Path of the file in which to save the object
Examples
>>> model = Model(data) >>> model.markovModel(100, 5) >>> model.save('./model.dat')
- viewStates(states=None, statetype='macro', protein=None, ligand=None, mols=None, numsamples=50, wrapsel='protein', alignsel='name CA', gui=False, simlist=None)#
Visualize macro/micro/cluster states in VMD
- Parameters:
states (
list|int|range|ndarray|None) – A list of states to visualizestatetype (
str) – The type of state to visualize. Can be ‘macro’, ‘micro’ or ‘cluster’.protein (
bool|None) – Set to True to enable pure protein system visualizationligand (
str|ndarray|None) – Atom selection for the ligand, as an atom selection string, a boolean mask, or an integer index array (seeMolecule.atomselect). See more heremols (
list|None) – A list ofMoleculeobjects to visualizenumsamples (
int) – Number of samples (conformations) for each state.wrapsel (
str|ndarray|None) – Atom selection used for wrapping, as an atom selection string, a boolean mask, or an integer index array (seeMolecule.atomselect). Set to None to disable wrapping. See more herealignsel (
str|ndarray|None) – Atom selection used for aligning all frames, as an atom selection string, a boolean mask, or an integer index array. Set to None to disable aligning. See more heregui (
bool) – Set to True to enable the GUI in the NGL viewer.simlist (
ndarray|None) – Optionally pass a different (but matching, i.e. filtered) simlist for visualizing the states.
Examples
>>> model = Model(data) >>> model.markovModel(100, 5) >>> model.viewStates(protein=True)
>>> model.viewStates(ligand='resname MOL')
- htmd.model.getStateStatistic(reference, data, states, statetype='macro', weighted=False, method=<function mean>, axis=0)#
Calculates properties of the states.
Calculates properties of data corresponding to states. Can calculate for example the mean distances of atoms in a state, or the standard deviation of the RMSD in a state.
- Parameters:
reference (
Model|MetricData) – A model containing the state definitions or a MetricData object containing cluster definitionsdata (
MetricData) – A projection corresponding to the conformations in the states of the model. The data and the model need to share the same simliststates (
list|range|ndarray) – A list of the states for which we want to calculate the propertiesstatetype (
str) – The state type. Can be ‘macro’, ‘micro’ or ‘cluster’.weighted (
bool) – If the properties of the macrostates should be calculated as the weighted average of their microstates, where the weights are the equilibrium probabilities of the microstatesmethod – A function pointer for the method that calculates our desired property. e.g. np.mean, np.std
axis (
int|None) – On which axis of the data the method should operate on
- Returns:
statistic – A list which contains in each element the desired statistic of a specific state specifies in states
- Return type:
Examples
>>> data = MetricDistance.project(sims, 'protein and name CA', 'resname MOL') >>> model = Model(data) >>> model.markovModel(100, 5) >>> # Get the standard deviation of distances in all macrostates >>> getStateStatistic(model, data, list(range(5)), method=np.std)
- htmd.model.macroAccumulate(model, microvalue)#
Accumulate values of macrostates from a microstate array