htmd.metricdata module

class htmd.metricdata.MetricData(dat=None, ref=None, description=None, simlist=None, fstep=0, parent=None, file=None, trajectories=None, cluster=None)

Bases: object

Class used for storing projected trajectories, their clustering and state assignments. Objects of this class are constructed by the project methods of the other projection classes. Only construct this class if you want to load saved data.

dat

The projected metrics

Type

numpy.ndarray

ref

Reference indices to the simulations and frames that generated the metrics

Type

numpy.ndarray

simlist

A simulation list generated by the simlist function

Type

numpy.ndarray of Sim objects

fstep

Size of simulation step in ns

Type

float

map

Contains the mapping from columns in dat to atom indices

Type

numpy.ndarray

parent

The MetricData object that was used to generate this object

Type

MetricData object

St

Assignment of simulation frames to clusters

Type

numpy.ndarray

K

Number of clusters

Type

int

N

Populations of clusters

Type

numpy.ndarray

Centers

Centers of clusters

Type

numpy.ndarray

property St
abs2rel(absFrames)

Convert absolute frame indexes into trajectory index-frame pairs

Useful when doing calculations on a concatenated data array of all trajectories. When you find a frame of interest you can deconcatenate the frame index to the corresponding trajectory index-frame pair.

Parameters

absFrames (list of int) – A list of absolute frame indexes

Returns

pairs – An array where each row is a trajectory index-frame pair

Return type

np.ndarray

Examples

>>> relidx = data.abs2rel(536)
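
The conversion described above can be illustrated with plain NumPy. This is a minimal sketch of the idea, not HTMD's implementation: the helper name abs_to_rel and the traj_lengths argument are hypothetical, and it assumes trajectories are concatenated in order with zero-based frame indexes.

```python
import numpy as np

def abs_to_rel(abs_frames, traj_lengths):
    """Map absolute frame indexes to (trajectory index, frame) pairs."""
    abs_frames = np.atleast_1d(np.asarray(abs_frames, dtype=int))
    lengths = np.asarray(traj_lengths, dtype=int)
    # Exclusive end offset of each trajectory in the concatenated array
    ends = np.cumsum(lengths)
    if np.any(abs_frames < 0) or np.any(abs_frames >= ends[-1]):
        raise IndexError("absolute frame index out of range")
    # First trajectory whose end offset is strictly greater than the index
    traj_idx = np.searchsorted(ends, abs_frames, side="right")
    starts = ends - lengths
    return np.column_stack([traj_idx, abs_frames - starts[traj_idx]])

# Three hypothetical trajectories of 100, 250 and 400 frames
pairs = abs_to_rel([0, 99, 100, 536], traj_lengths=[100, 250, 400])
```

Each row of pairs holds the trajectory index followed by the frame index inside that trajectory.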
abs2sim(absFrames)

Converts absolute frame indexes into Sim-frame pairs

Parameters

absFrames (list of int) – A list of absolute frame indexes

Returns

frames – An array of Frame objects containing the simulation object, the trajectory piece ID and the frame index.

Return type

np.ndarray

Examples

>>> simframes = data.abs2sim(563)  # absolute frame 563 as a Sim-frame pair
property aggregateTime

The total aggregate simulation time

Examples

>>> data.aggregateTime
append(other)
bootstrap(ratio, replacement=False)

Randomly sample a set of trajectories

Parameters
  • ratio (float) – What ratio of trajectories to keep. e.g. 0.8

  • replacement (bool) – If we should sample with replacement

Returns

bootdata – A new MetricData object containing only the sampled trajectories

Return type

MetricData object

Examples

>>> data = MetricSelfDistance.project(sims, 'protein and name CA')
>>> databoot = data.bootstrap(0.8)
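
The sampling step can be sketched with NumPy alone. The helper bootstrap_indices is hypothetical and only picks trajectory indexes. Rounding the kept count down is an assumption, so HTMD may round differently.

```python
import numpy as np

def bootstrap_indices(num_traj, ratio, replacement=False, seed=None):
    """Pick the indexes of the trajectories kept in one bootstrap round."""
    rng = np.random.default_rng(seed)
    # Assumption: the number of kept trajectories is rounded down
    n_keep = int(np.floor(num_traj * ratio))
    return rng.choice(num_traj, size=n_keep, replace=replacement)

kept = bootstrap_indices(10, 0.8, seed=0)
kept_with_repeats = bootstrap_indices(10, 0.8, replacement=True, seed=0)
```

Without replacement every kept index is unique; with replacement the same trajectory may be drawn more than once.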
cluster(clusterobj, mergesmall=None, batchsize=False)

Cluster the metrics

Parameters
  • clusterobj (ClusterMixin object) – The object of a clustering class from sklearn or with the same interface

  • mergesmall (int) – Clusters containing fewer than mergesmall conformations will be joined into their closest well-populated neighbour.

  • batchsize (int) – Batch sizes bigger than 0 will enable batching.

Examples

>>> from sklearn.cluster import MiniBatchKMeans
>>> data = MetricDistance.project(sims, 'protein and name CA', 'resname MOL')
>>> data.cluster(MiniBatchKMeans(n_clusters=1000), mergesmall=5)
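
The effect of mergesmall can be pictured with a small NumPy sketch. It is not HTMD's code: the helper merge_small is hypothetical, closeness is assumed to mean Euclidean distance between cluster centers, and cluster labels are left unrenumbered after merging.

```python
import numpy as np

def merge_small(assignments, centers, mergesmall):
    """Reassign frames of small clusters to the nearest large cluster."""
    assignments = np.asarray(assignments)
    centers = np.asarray(centers, dtype=float)
    counts = np.bincount(assignments, minlength=len(centers))
    big = np.flatnonzero(counts >= mergesmall)
    small = np.flatnonzero(counts < mergesmall)
    if len(big) == 0:
        raise ValueError("no cluster reaches the mergesmall population")
    merged = assignments.copy()
    for s in small:
        # Assumption: nearest neighbour by Euclidean center distance
        dist = np.linalg.norm(centers[big] - centers[s], axis=1)
        merged[assignments == s] = big[np.argmin(dist)]
    return merged

assignments = np.array([0, 0, 0, 1, 2, 2, 2])
centers = np.array([[0.0, 0.0], [4.0, 0.0], [5.0, 0.0]])
merged = merge_small(assignments, centers, mergesmall=2)
```

Cluster 1 holds a single frame, so it is absorbed by cluster 2, whose center is closer than that of cluster 0.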
combine(otherdata)

Combines two different metrics into one by concatenating them.

Parameters

otherdata (MetricData object) – Concatenates the metrics of otherdata to the current object's metrics

Examples

>>> dataRMSD = MetricRmsd.project(sims)
>>> dataDist = MetricSelfDistance.project(sims, 'protein and name CA')
>>> dataRMSD.combine(dataDist)
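
One way to picture the concatenation is column-wise stacking of the per-trajectory arrays. The sketch below assumes that layout (one 2D array per trajectory, frames as rows), and the helper combine_projections is hypothetical rather than part of HTMD.

```python
import numpy as np

def combine_projections(first, second):
    """Stack the columns of two lists of per-trajectory projections."""
    if len(first) != len(second):
        raise ValueError("different number of trajectories")
    combined = []
    for a, b in zip(first, second):
        if a.shape[0] != b.shape[0]:
            raise ValueError("trajectories differ in frame count")
        # Frames stay as rows; the metric columns are placed side by side
        combined.append(np.hstack([a, b]))
    return combined

rmsd = [np.zeros((5, 1)), np.zeros((3, 1))]   # 1 dimension per frame
dist = [np.ones((5, 4)), np.ones((3, 4))]     # 4 dimensions per frame
both = combine_projections(rmsd, dist)
```

The combined arrays keep the frame counts of the inputs and carry the sum of their dimensions.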
copy()

Produces a deep copy of the object

Returns

data – A copy of the current object

Return type

MetricData object

Examples

>>> data = MetricSelfDistance.project(sims, 'protein and name CA')
>>> data2 = data.copy()
property dat
deconcatenate(array)
dropDimensions(drop=None, keep=None)

Drop some dimensions of the data given their indexes

Parameters
  • drop (list) – A list of integer indexes of the dimensions to drop

  • keep (list) – A list of integer indexes of the dimensions to keep

Examples

>>> data.dropDimensions([1, 24, 3])
>>> data.dropDimensions(keep=[2, 10])
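
The relationship between drop and keep can be shown on a plain array. This sketch returns a new array instead of modifying an object, and the helper select_dimensions is hypothetical. Requiring exactly one of the two arguments is an assumption about the intended usage.

```python
import numpy as np

def select_dimensions(dat, drop=None, keep=None):
    """Return dat restricted to the requested columns."""
    if (drop is None) == (keep is None):
        raise ValueError("pass exactly one of drop or keep")
    if keep is None:
        # Keeping everything except the dropped columns
        keep = np.setdiff1d(np.arange(dat.shape[1]), drop)
    return dat[:, np.asarray(keep, dtype=int)]

dat = np.arange(12).reshape(3, 4)
dropped = select_dimensions(dat, drop=[1, 3])
kept = select_dimensions(dat, keep=[2])
```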
dropFrames(idx, frames)
dropTraj(limits=None, multiple=None, partial=None, idx=None, keepsims=None)

Drops trajectories based on their lengths

By default, drops all trajectories whose length differs from the statistical mode (the most common length).

Parameters
  • limits (list, optional) – Lower and upper limits of trajectory lengths we want to keep. e.g. [100, 500]

  • multiple (list, optional) – Drops trajectories whose length is not a multiple of lengths in the list. e.g. [50, 80]

  • partial (bool) – Not implemented yet

  • idx (list, optional) – A list of trajectory indexes to drop

  • keepsims (list of Sim objects) – A list of sims which we want to keep

Examples

>>> data = MetricSelfDistance.project(sims, 'protein and name CA')
>>> data.dropTraj()
>>> data.dropTraj(multiple=[100])
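
The default mode-length filter and the limits filter can be sketched as follows. The helper trajectories_to_keep is hypothetical. Treating the limits as inclusive and resolving ties between equally common lengths in favour of the shortest one are assumptions of this sketch, not documented HTMD behaviour.

```python
import numpy as np

def trajectories_to_keep(traj_lengths, limits=None):
    """Return the indexes of the trajectories that survive the filter."""
    lengths = np.asarray(traj_lengths)
    if limits is None:
        # Default: keep only trajectories of the most common length
        values, counts = np.unique(lengths, return_counts=True)
        mode = values[np.argmax(counts)]
        keep = lengths == mode
    else:
        lower, upper = limits
        keep = (lengths >= lower) & (lengths <= upper)
    return np.flatnonzero(keep)

lengths = [100, 100, 50, 100, 80]
kept_mode = trajectories_to_keep(lengths)
kept_limits = trajectories_to_keep(lengths, limits=[60, 100])
```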
load(filename)

Load a MetricData object from disk

Parameters

filename (str) – Path to the saved MetricData object

Examples

>>> data = MetricData()
>>> data.load('./data.dat')
property map
property numDimensions

The number of dimensions

Examples

>>> data.numDimensions
property numFrames

Get the total number of frames in all trajectories

Returns

nframes – Total number of frames in all trajectories

Return type

int

Examples

>>> data.numFrames
property numTrajectories

The number of trajectories

Examples

>>> data.numTrajectories
plotClusters(dimX, dimY, resolution=100, s=4, c=None, cmap=None, logplot=False, plot=True, save=None, data=None)

Plot a scatter-plot of the locations of the clusters on top of the count histogram.

Parameters
  • dimX (int) – Index of projected dimension to use for the X axis.

  • dimY (int) – Index of projected dimension to use for the Y axis.

  • resolution (int) – Resolution of bincount grid.

  • s (float) – Marker size for clusters.

  • c (list) – Colors or indexes for each cluster.

  • cmap (matplotlib.colors.Colormap) – Matplotlib colormap for the scatter plot.

  • logplot (bool) – Set True to plot the logarithm of counts.

  • plot (bool) – If the method should display the plot

  • save (str) – Path of the file in which to save the figure

  • data (MetricData object) – Optionally, a different MetricData object than the one used for clustering, for example to cluster on distances but plot the centers on top of RMSD values. The new MetricData object needs to have the same simlist as this object.

plotCounts(dimX, dimY, resolution=100, logplot=False, plot=True, save=None)

Plots a histogram of counts on any two given dimensions.

Parameters
  • dimX (int) – Index of projected dimension to use for the X axis.

  • dimY (int) – Index of projected dimension to use for the Y axis.

  • resolution (int) – Resolution of bincount grid.

  • logplot (bool) – Set True to plot the logarithm of counts.

  • plot (bool) – If the method should display the plot

  • save (str) – Path of the file in which to save the figure

plotTrajSizes()

Plot the lengths of all trajectories in a sorted bar plot

Examples

>>> data = MetricSelfDistance.project(sims, 'protein and name CA')
>>> data.plotTrajSizes()
property ref
rel2sim(relFrames, simlist=None)

Converts trajectory index-frame pairs into Sim-frame pairs

Parameters
  • relFrames (2D np.ndarray) – An array containing in each row trajectory index and frame pairs

  • simlist (numpy.ndarray of Sim objects) – Optionally pass a different (but matching, i.e. filtered) simlist for creating the Frames.

Returns

frames – An array of Frame objects containing the simulation object, the trajectory piece ID and the frame index.

Return type

np.ndarray

Examples

>>> simframes = data.rel2sim([100, 56])  # trajectory index 100, frame 56
sampleClusters(clusters=None, frames=20, replacement=False, allframes=False)

Samples frames from a set of clusters

Parameters
  • clusters (Union[None, list]) – A list of cluster indexes from which we want to sample

  • frames (Union[None, int, list]) – An integer with the number of frames we want to sample from each state or a list of same length as clusters which contains the number of frames we want from each of the clusters. If None is given it will return all frames.

  • replacement (bool) – If we want to sample with or without replacement.

  • allframes (bool) – Deprecated. Use frames=None instead.

Returns

  • absframes (numpy.ndarray) – An array which contains for each state an array containing absolute trajectory frames

  • relframes (numpy.ndarray) – An array which contains for each state a 2D array containing the trajectory ID and frame number for each of the sampled frames

Examples

>>> data.sampleClusters(range(5), [10, 3, 2, 50, 1])  # Sample from first 5 clusters, 10, 3, etc frames respectively
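
Per-cluster sampling can be sketched on a flat array of cluster assignments. The helper sample_from_clusters is hypothetical and returns absolute frame indexes only. Capping the request at the cluster population when sampling without replacement is an assumption.

```python
import numpy as np

def sample_from_clusters(assignments, clusters, frames, replacement=False, seed=None):
    """Draw absolute frame indexes from each requested cluster."""
    rng = np.random.default_rng(seed)
    assignments = np.asarray(assignments)
    sampled = []
    for cluster, n in zip(clusters, frames):
        members = np.flatnonzero(assignments == cluster)
        if len(members) == 0:
            sampled.append(np.array([], dtype=int))
            continue
        if not replacement:
            # Assumption: never ask for more frames than the cluster holds
            n = min(n, len(members))
        sampled.append(rng.choice(members, size=n, replace=replacement))
    return sampled

assignments = np.array([0, 1, 0, 2, 1, 0])
samples = sample_from_clusters(assignments, clusters=[0, 1], frames=[2, 5], seed=0)
```

The result holds one array per requested cluster, in the same order as clusters.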
sampleRegion(point=None, radius=None, limits=None, nsamples=20, singlemol=False)

Samples conformations from a region in the projected space.

Parameters
  • point (list or np.ndarray) – A point in the projected space. Undefined dimensions should have None value.

  • radius (float) – The radius around the point in which to sample conformations.

  • limits (np.ndarray) – A (2, ndim) dimensional array containing the min (1st row) and max (2nd row) limits for each dimension. None values will be interpreted as no limit in that dimension, or min/max value.

  • nsamples (int) – The number of conformations to sample.

  • singlemol (bool) – If True it will return all samples within a single Molecule instead of a list of Molecules.

Returns

  • absFrames (list) – A list of the absolute frame indexes sampled

  • relFrames (list of tuples) – A list of (trajNum, frameNum) tuples sampled

  • mols (Molecule or list of Molecules) – The conformations stored in a Molecule or a list of Molecules

Examples

>>> # Working with 4 dimensional data for example
>>> absFrames, relFrames, mols = data.sampleRegion(point=(0.5, 3, None, None), radius=0.1)  # Point undefined in dim 3, 4
>>> minlims = [-1, None, None, 4]  # No min limit for 2, 3 dim
>>> maxlims = [2,     3, None, 7]  # No max limit for 3 dim
>>> absFrames, relFrames, mols = data.sampleRegion(limits=np.array([minlims, maxlims]))
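
How None entries restrict the region can be sketched on a plain array of projected frames. Both helpers are hypothetical and only select candidate frames. Measuring the radius as a Euclidean distance over the defined dimensions and treating the limits as inclusive are assumptions.

```python
import numpy as np

def frames_in_radius(dat, point, radius):
    """Frames within radius of point, ignoring dimensions set to None."""
    dims = [i for i, p in enumerate(point) if p is not None]
    center = np.array([point[i] for i in dims], dtype=float)
    dist = np.linalg.norm(dat[:, dims] - center, axis=1)
    return np.flatnonzero(dist <= radius)

def frames_in_limits(dat, minlims, maxlims):
    """Frames inside per-dimension limits; None means no limit."""
    mask = np.ones(len(dat), dtype=bool)
    for dim, (lo, hi) in enumerate(zip(minlims, maxlims)):
        if lo is not None:
            mask &= dat[:, dim] >= lo
        if hi is not None:
            mask &= dat[:, dim] <= hi
    return np.flatnonzero(mask)

# Four hypothetical frames in a 4-dimensional projected space
dat = np.array([[0.50, 3.00, 9.0, 9.0],
                [0.55, 3.05, -1.0, 5.0],
                [2.00, 2.00, 0.0, 5.0],
                [0.00, 0.00, 0.0, 0.0]])
near = frames_in_radius(dat, (0.5, 3, None, None), radius=0.1)
boxed = frames_in_limits(dat, [-1, None, None, 4], [2, 3, None, 7])
```

A sampler would then draw nsamples frames from the selected indexes.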
save(filename)

Save a MetricData object to disk

Parameters

filename (str) – Path of the file in which to save the object

Examples

>>> data = MetricSelfDistance.project(sims, 'protein and name CA')
>>> data.save('./data.dat')
property simlist
splitCols()
property trajLengths

Get the lengths of all trajectories

Returns

lens – The lengths of all trajectories in the object

Return type

list

Examples

>>> data.trajLengths
class htmd.metricdata.Trajectory(projection=None, reference=None, sim=None, cluster=None)

Bases: object

property cluster
copy()
dropFrames(frames)
property numDimensions
property numFrames
property projection
property reference