htmd.metricdata module#
- class htmd.metricdata.MetricData(dat=None, ref=None, description=None, simlist=None, fstep=0, parent=None, file=None, trajectories=None, cluster=None)#
Bases: object
Class used for storing projected trajectories, their clustering and state assignments. Objects of this class are constructed by the project methods of the other projection classes. Only construct this class if you want to load saved data.
- dat#
The projected metrics
- Type
numpy.ndarray
- ref#
Reference indices to the simulations and frames that generated the metrics
- Type
numpy.ndarray
- map#
Contains the mapping from columns in dat to atom indices
- Type
numpy.ndarray
- parent#
The MetricData object that was used to generate this object
- Type
MetricData
object
- St#
Assignment of simulation frames to clusters
- Type
numpy.ndarray
- N#
Populations of clusters
- Type
numpy.ndarray
- Centers#
Centers of clusters
- Type
numpy.ndarray
- property St#
- abs2rel(absFrames)#
Convert absolute frame indexes into trajectory index-frame pairs
Useful when doing calculations on a concatenated data array of all trajectories. When you find a frame of interest, you can map its absolute index back to the corresponding trajectory index-frame pair.
- Parameters
absFrames (list of int) – A list of absolute index frames
- Returns
pairs – An array where each row is a trajectory index-frame pair
- Return type
np.ndarray
Examples
>>> relidx = data.abs2rel(536)
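The index conversion described above can be sketched in plain Python. This is only an illustration of the idea, not HTMD's implementation: the helper name `abs_to_rel` and the `traj_lengths` argument are hypothetical, and it assumes that frames are concatenated in trajectory order.

```python
from bisect import bisect_right
from itertools import accumulate


def abs_to_rel(abs_frames, traj_lengths):
    """Map absolute frame indexes to (trajectory index, frame) pairs.

    Hypothetical helper: assumes the concatenated array stores the
    trajectories back to back, in the order given by traj_lengths.
    """
    # ends[i] is the absolute index one past the last frame of trajectory i
    ends = list(accumulate(traj_lengths))
    total = ends[-1] if ends else 0
    pairs = []
    for a in abs_frames:
        if not 0 <= a < total:
            raise IndexError(f"absolute frame {a} is out of range")
        traj = bisect_right(ends, a)  # first trajectory whose end exceeds a
        start = ends[traj] - traj_lengths[traj]  # absolute index of its first frame
        pairs.append((traj, a - start))
    return pairs


# Three hypothetical trajectories of 500, 100 and 300 frames
print(abs_to_rel([0, 499, 500, 536, 899], [500, 100, 300]))
# [(0, 0), (0, 499), (1, 0), (1, 36), (2, 299)]
```

With these assumed lengths, absolute frame 536 falls in the second trajectory (index 1) at frame 36.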
- abs2sim(absFrames)#
Converts absolute frame indexes into Sim-frame pairs
- Parameters
absFrames (list of int) – A list of absolute index frames
- Returns
frames – An array of Frame objects containing the simulation object, the trajectory piece ID and the frame index.
- Return type
np.ndarray
Examples
>>> simframes = data.abs2sim(563) # 563rd frame to simulation/frame pairs
- property aggregateTime#
The total aggregate simulation time
Examples
>>> data.aggregateTime
- append(other)#
- bootstrap(ratio, replacement=False)#
Randomly sample a set of trajectories
- Parameters
- Returns
bootdata – A new MetricData object containing only the sampled trajectories
- Return type
MetricData object
Examples
>>> data = MetricSelfDistance.project(sims, 'protein and name CA')
>>> databoot = data.bootstrap(0.8)
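As a rough illustration of what sampling a ratio of trajectories means, the sketch below picks trajectory indexes with or without replacement. The function name `bootstrap_indices`, the `seed` argument and the rounding of `num_trajs * ratio` are assumptions made for this example; the way HTMD actually selects and rounds may differ.

```python
import random


def bootstrap_indices(num_trajs, ratio, replacement=False, seed=None):
    """Pick trajectory indexes for a bootstrap sample (illustrative only)."""
    rng = random.Random(seed)
    # Assumption: the sample size is the rounded fraction of trajectories
    k = int(round(num_trajs * ratio))
    if replacement:
        # The same trajectory may be drawn more than once
        picked = rng.choices(range(num_trajs), k=k)
    else:
        # Every trajectory is drawn at most once
        picked = rng.sample(range(num_trajs), k)
    return sorted(picked)


# Keep 80% of 10 hypothetical trajectories
print(bootstrap_indices(10, 0.8, seed=0))
```

Without replacement the result has no repeated indexes, so a ratio above 1 would raise `ValueError` in this sketch.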
- cluster(clusterobj, mergesmall=None, batchsize=False)#
Cluster the metrics
- Parameters
clusterobj (ClusterMixin object) – The object of a clustering class from sklearn or with the same interface
mergesmall (int) – Clusters containing fewer than mergesmall conformations will be joined into their closest well-populated neighbour.
batchsize (int) – Batch sizes bigger than 0 will enable batching.
Examples
>>> from sklearn.cluster import MiniBatchKMeans
>>> data = MetricDistance.project(sims, 'protein and name CA', 'resname MOL')
>>> data.cluster(MiniBatchKMeans(n_clusters=1000), mergesmall=5)
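The effect of mergesmall can be pictured with a small pure-Python sketch. It is not HTMD's code: the helper `merge_small` is hypothetical, "closest" is assumed to mean Euclidean distance between cluster centers, and cluster labels are left unrenumbered for simplicity.

```python
from collections import Counter
from math import dist


def merge_small(assignments, centers, mergesmall):
    """Reassign frames of small clusters to the nearest well-populated one.

    Illustrative assumptions: Euclidean distance between centers decides
    which neighbour is closest, and labels are not renumbered afterwards.
    """
    counts = Counter(assignments)
    big = [c for c in range(len(centers)) if counts[c] >= mergesmall]
    if not big:
        raise ValueError("no cluster reaches the mergesmall population")
    remap = {}
    for c in range(len(centers)):
        if counts[c] >= mergesmall:
            remap[c] = c  # well populated, keep as is
        else:
            # Join the closest well-populated cluster
            remap[c] = min(big, key=lambda b: dist(centers[c], centers[b]))
    return [remap[a] for a in assignments]


# Cluster 1 holds a single frame and is merged into its nearest neighbour, cluster 0
print(merge_small([0, 0, 0, 1, 2, 2, 2], [(0, 0), (1, 0), (5, 0)], mergesmall=2))
# [0, 0, 0, 0, 2, 2, 2]
```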
- combine(otherdata)#
Combines two different metrics into one by concatenating them.
- Parameters
otherdata (MetricData object) – Concatenates the metrics of otherdata to the current object's metrics
Examples
>>> dataRMSD = MetricRmsd.project(sims)
>>> dataDist = MetricSelfDistance.project(sims, 'protein and name CA')
>>> dataRMSD.combine(dataDist)
- copy()#
Produces a deep copy of the object
- Returns
data – A copy of the current object
- Return type
MetricData
object
Examples
>>> data = MetricSelfDistance.project(sims, 'protein and name CA')
>>> data2 = data.copy()
- property dat#
- deconcatenate(array)#
- dropDimensions(drop=None, keep=None)#
Drop some dimensions of the data given their indexes
- Parameters
Examples
>>> data.dropDimensions([1, 24, 3])
>>> data.dropDimensions(keep=[2, 10])
- dropFrames(idx, frames)#
- dropTraj(limits=None, multiple=None, partial=None, idx=None, keepsims=None)#
Drops trajectories based on their lengths
By default, drops all trajectories whose length differs from the statistical mode (the most common length).
- Parameters
limits (list, optional) – Lower and upper limits of trajectory lengths we want to keep. e.g. [100, 500]
multiple (list, optional) – Drops trajectories whose length is not a multiple of lengths in the list. e.g. [50, 80]
partial (bool) – Not implemented yet
idx (list, optional) – A list of trajectory indexes to drop
keepsims (list of
Sim
objects) – A list of sims which we want to keep
Examples
>>> data = MetricSelfDistance.project(sims, 'protein and name CA')
>>> data.dropTraj()
>>> data.dropTraj(multiple=[100])
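The length-based selection rules above can be sketched as follows. The helper `trajs_to_drop` is hypothetical and works on a plain list of trajectory lengths; it assumes that limits are inclusive and that a trajectory is kept when its length is a multiple of at least one entry in multiple. HTMD's exact behaviour may differ on these points.

```python
from collections import Counter


def trajs_to_drop(lengths, limits=None, multiple=None):
    """Return the indexes of trajectories that would be dropped (sketch)."""
    if not lengths:
        return []
    if limits is not None:
        lo, hi = limits
        # Assumption: both limits are inclusive
        return [i for i, n in enumerate(lengths) if not lo <= n <= hi]
    if multiple is not None:
        # Assumption: being a multiple of any listed length is enough to stay
        return [i for i, n in enumerate(lengths)
                if all(n % m != 0 for m in multiple)]
    # Default: keep only trajectories of the most common length
    mode_len = Counter(lengths).most_common(1)[0][0]
    return [i for i, n in enumerate(lengths) if n != mode_len]


lengths = [100, 100, 50, 80, 100, 250]  # hypothetical trajectory lengths
print(trajs_to_drop(lengths))                     # [2, 3, 5]
print(trajs_to_drop(lengths, limits=[60, 120]))   # [2, 5]
print(trajs_to_drop(lengths, multiple=[50]))      # [3]
```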
- static fromHDF5(filename=None, h5group=None)#
- load(filename)#
Load a MetricData object from disk
- Parameters
filename (str) – Path to the saved MetricData object
Examples
>>> data = MetricData()
>>> data.load('./data.dat')
- property map#
- property numDimensions#
The number of dimensions
Examples
>>> data.numDimensions
- property numFrames#
Get the total number of frames in all trajectories
- Returns
nframes – Total number of frames in all trajectories
- Return type
int
Examples
>>> data.numFrames
- property numTrajectories#
The number of trajectories
Examples
>>> data.numTrajectories
- plotClusters(dimX, dimY, resolution=100, s=4, c=None, cmap='Greys', logplot=False, plot=True, save=None, data=None, levels=7)#
Plot a scatter-plot of the locations of the clusters on top of the count histogram.
- Parameters
dimX (int) – Index of projected dimension to use for the X axis.
dimY (int) – Index of projected dimension to use for the Y axis.
resolution (int) – Resolution of bincount grid.
s (float) – Marker size for clusters.
c (list) – Colors or indexes for each cluster.
cmap (matplotlib.colors.Colormap) – Matplotlib colormap for the scatter plot.
logplot (bool) – Set True to plot the logarithm of counts.
plot (bool) – If the method should display the plot
save (str) – Path of the file in which to save the figure
data (MetricData object) – Optionally, a different MetricData object than the one used for clustering, for example to cluster on distances but plot the centers on top of RMSD values. The new MetricData object needs to have the same simlist as this object.
- plotCounts(dimX, dimY, resolution=100, logplot=False, plot=True, save=None, levels=7, cmap='viridis')#
Plots a histogram of counts on any two given dimensions.
- Parameters
dimX (int) – Index of projected dimension to use for the X axis.
dimY (int) – Index of projected dimension to use for the Y axis.
resolution (int) – Resolution of bincount grid.
logplot (bool) – Set True to plot the logarithm of counts.
plot (bool) – If the method should display the plot
save (str) – Path of the file in which to save the figure
- plotTrajSizes()#
Plot the lengths of all trajectories in a sorted bar plot
Examples
>>> data = MetricSelfDistance.project(sims, 'protein and name CA') >>> data.plotTrajSizes()
- property ref#
- rel2sim(relFrames, simlist=None)#
Converts trajectory index-frame pairs into Sim-frame pairs
- Parameters
relFrames (2D np.ndarray) – An array containing in each row trajectory index and frame pairs
simlist (numpy.ndarray of
Sim
objects) – Optionally pass a different (but matching, i.e. filtered) simlist for creating the Frames.
- Returns
frames – An array of Frame objects containing the simulation object, the trajectory piece ID and the frame index.
- Return type
np.ndarray
Examples
>>> simframes = data.rel2sim([100, 56]) # 100th simulation frame 56
- sampleClusters(clusters=None, frames=20, replacement=False, allframes=False)#
Samples frames from a set of clusters
- Parameters
clusters (Union[None, list]) – A list of cluster indexes from which we want to sample
frames (Union[None, int, list]) – An integer with the number of frames we want to sample from each state or a list of same length as clusters which contains the number of frames we want from each of the clusters. If None is given it will return all frames.
replacement (bool) – If we want to sample with or without replacement.
allframes (bool) – Deprecated. Use frames=None instead.
- Returns
absframes (numpy.ndarray) – An array which contains for each state an array containing absolute trajectory frames
relframes (numpy.ndarray) – An array which contains for each state a 2D array containing the trajectory ID and frame number for each of the sampled frames
Examples
>>> data.sampleClusters(range(5), [10, 3, 2, 50, 1]) # Sample from first 5 clusters, 10, 3, etc frames respectively
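How the frames argument is interpreted (a single integer, one count per cluster, or None for all frames) can be sketched in plain Python. The function `sample_clusters` below is a hypothetical stand-in that works on a flat list of per-frame cluster assignments and returns absolute frame indexes only. Capping the sample at the cluster population when sampling without replacement is an assumption of this sketch.

```python
import random


def sample_clusters(assignments, clusters, frames=20, replacement=False, seed=None):
    """Sample absolute frame indexes from each requested cluster (sketch)."""
    rng = random.Random(seed)
    clusters = list(clusters)
    # A single value (int or None) applies to every cluster
    if frames is None or isinstance(frames, int):
        frames = [frames] * len(clusters)
    if len(frames) != len(clusters):
        raise ValueError("frames must have one entry per cluster")
    sampled = []
    for c, n in zip(clusters, frames):
        members = [i for i, s in enumerate(assignments) if s == c]
        if n is None:
            sampled.append(members)  # return every frame of the cluster
        elif replacement:
            sampled.append(rng.choices(members, k=n) if members else [])
        else:
            # Assumption: never ask for more frames than the cluster holds
            sampled.append(rng.sample(members, min(n, len(members))))
    return sampled


assignments = [0, 1, 0, 2, 1, 0]  # hypothetical cluster index of each frame
print(sample_clusters(assignments, range(2), [2, 1], seed=0))
print(sample_clusters(assignments, [0, 1], frames=None))  # [[0, 2, 5], [1, 4]]
```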
- sampleRegion(point=None, radius=None, limits=None, nsamples=20, singlemol=False)#
Samples conformations from a region in the projected space.
- Parameters
point (list or np.ndarray) – A point in the projected space. Undefined dimensions should have None value.
radius (float) – The radius around the point in which to sample conformations.
limits (np.ndarray) – A (2, ndim) dimensional array containing the min (1st row) and max (2nd row) limits for each dimension. None values will be interpreted as no limit in that dimension, or min/max value.
nsamples (int) – The number of conformations to sample.
singlemol (bool) – If True it will return all samples within a single Molecule instead of a list of Molecules.
- Returns
absFrames (list) – A list of the absolute frame indexes sampled
relFrames (list of tuples) – A list of (trajNum, frameNum) tuples sampled
mols (Molecule or list of Molecules) – The conformations stored in a Molecule or a list of Molecules
Examples
>>> # Working with 4 dimensional data for example
>>> abs, rel, mols = data.sampleRegion(point=(0.5, 3, None, None), radius=0.1) # Point undefined in dim 3, 4
>>> minlims = [-1, None, None, 4] # No min limit for 2, 3 dim
>>> maxlims = [2, 3, None, 7] # No max limit for 3 dim
>>> abs, rel, mols = data.sampleRegion(limits=np.array([minlims, maxlims]))
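The role of None values in point and limits can be illustrated with a small selection sketch. It only finds which frames fall inside the region and does not load any Molecule objects. The function `frames_in_region` is hypothetical, and it assumes a Euclidean distance over the defined dimensions and inclusive limits.

```python
from math import sqrt


def frames_in_region(rows, point=None, radius=None, limits=None):
    """Return indexes of rows inside a sphere or a box (illustrative only)."""
    selected = []
    for i, row in enumerate(rows):
        if point is not None:
            # Dimensions where the point is None do not enter the distance
            d2 = sum((x - p) ** 2 for x, p in zip(row, point) if p is not None)
            inside = sqrt(d2) <= radius
        else:
            mins, maxs = limits
            # None means no limit on that side of that dimension
            inside = all(
                (lo is None or x >= lo) and (hi is None or x <= hi)
                for x, lo, hi in zip(row, mins, maxs)
            )
        if inside:
            selected.append(i)
    return selected


# Hypothetical 4-dimensional projected data, one row per frame
rows = [
    (0.5, 3.0, 9.0, 9.0),
    (0.55, 3.05, -1.0, 0.0),
    (1.0, 3.0, 0.0, 0.0),
    (0.0, 2.0, 100.0, 5.0),
]
print(frames_in_region(rows, point=(0.5, 3, None, None), radius=0.1))  # [0, 1]
minlims = [-1, None, None, 4]
maxlims = [2, 3, None, 7]
print(frames_in_region(rows, limits=(minlims, maxlims)))  # [3]
```

From the selected indexes, a fixed number of frames (nsamples in the method above) could then be drawn at random.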
- save(filename)#
Save a MetricData object to disk
- Parameters
filename (str) – Path of the file in which to save the object
Examples
>>> data = MetricSelfDistance.project(sims, 'protein and name CA')
>>> data.save('./data.dat')
- property simlist#
- splitCols()#
- toHDF5(filename=None, h5group=None)#