htmd.metricdata module#
- class htmd.metricdata.MetricData(dat=None, ref=None, description=None, simlist=None, fstep=0, parent=None, file=None, trajectories=None, cluster=None)#
Bases: object
Class used for storing projected trajectories, their clustering and state assignments. Objects of this class are constructed by the project methods of the other projection classes. Only construct this class if you want to load saved data.
- Parameters:
ref (list | ndarray | None) – Reference indices to the simulations and frames that generated the metrics
description (DataFrame | None) – A description of the metrics
simlist (ndarray | None) – A simulation list generated by the simlist function
fstep (float) – Size of simulation step in ns
parent (MetricData | None) – The MetricData object that was used to generate this object
cluster (list | ndarray | None) – A list of cluster indexes for each frame in the data
- trajectories#
A list of Trajectory objects
- Type:
list of Trajectory objects
- description#
A description of the metrics
- Type:
pandas.DataFrame
- N#
Number of frames in each cluster
- Type:
- Centers#
Cluster centers (if available)
- Type:
- property St#
List of the cluster assignments (state trajectory) of each trajectory.
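How St and N relate can be illustrated with a small pure-Python sketch. It assumes N is the number of frames assigned to each cluster across all state trajectories. `frames_per_cluster` is a hypothetical helper written for this illustration, not part of HTMD.

```python
from collections import Counter


def frames_per_cluster(state_trajs, num_clusters):
    """Count how many frames fall in each cluster.

    state_trajs is a list of per-trajectory lists of cluster indexes,
    mimicking the layout described for St (an assumption of this sketch).
    """
    counts = Counter(s for traj in state_trajs for s in traj)
    return [counts[c] for c in range(num_clusters)]


# Two hypothetical state trajectories over three clusters.
print(frames_per_cluster([[0, 0, 1], [2, 1, 1]], 3))  # [2, 3, 1]
```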
- abs2rel(absFrames)#
Convert absolute frame indexes into trajectory index-frame pairs
Useful when doing calculations on a concatenated data array of all trajectories. When you find a frame of interest you can deconcatenate the frame index to the corresponding trajectory index-frame pair.
- Parameters:
absFrames (int | list | ndarray) – An absolute frame index or a list of absolute frame indexes
- Returns:
pairs – An array where each row is a trajectory index-frame pair
- Return type:
Examples
>>> relidx = data.abs2rel(536)
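The deconcatenation described above can be sketched in pure Python. This is an illustration of the idea only, assuming trajectories are concatenated in order. `abs_to_rel` and the trajectory lengths are hypothetical and do not reproduce HTMD's implementation.

```python
from bisect import bisect_right
from itertools import accumulate


def abs_to_rel(abs_frame, traj_lengths):
    """Map an index into the concatenated frames to a (trajectory, frame) pair."""
    # ends[i] is the number of frames in trajectories 0..i combined.
    ends = list(accumulate(traj_lengths))
    if not ends or not 0 <= abs_frame < ends[-1]:
        raise IndexError("absolute frame index out of range")
    # First trajectory whose cumulative end lies beyond abs_frame.
    traj = bisect_right(ends, abs_frame)
    start = ends[traj - 1] if traj > 0 else 0
    return traj, abs_frame - start


# Three hypothetical trajectories of 300, 200 and 100 frames.
print(abs_to_rel(536, [300, 200, 100]))  # (2, 36)
```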
- abs2sim(absFrames)#
Converts absolute frame indexes into Sim-frame pairs
- Parameters:
absFrames (int | list | ndarray) – An absolute frame index or a list of absolute frame indexes
- Returns:
frames – An array of Frame objects containing the simulation object, the trajectory piece ID and the frame index.
- Return type:
Examples
>>> simframes = data.abs2sim(563) # 563rd frame to simulation/frame pairs
- property aggregateTime#
The total aggregate simulation time
Examples
>>> data.aggregateTime
- append(other)#
Append the trajectories of another MetricData object to this one
The simulation ids of all trajectories are renumbered after appending, and any existing clustering is reset.
- Parameters:
other (MetricData) – The MetricData object whose trajectories will be appended to this one
- Returns:
data – This object, with the trajectories of other appended
- Return type:
- bootstrap(ratio, replacement=False)#
Randomly sample a set of trajectories
- Parameters:
- Returns:
bootdata – A new MetricData object containing only the sampled trajectories
- Return type:
Examples
>>> data = MetricSelfDistance.project(sims, 'protein and name CA')
>>> databoot = data.bootstrap(0.8)
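A minimal sketch of trajectory bootstrapping, under two assumptions not stated in this reference: that ratio is the fraction of trajectories to sample, and that the sample size is rounded down. `bootstrap_indexes` and its seed argument are hypothetical names used only for illustration.

```python
import random


def bootstrap_indexes(num_traj, ratio, replacement=False, seed=None):
    """Pick int(ratio * num_traj) trajectory indexes at random."""
    rng = random.Random(seed)
    nsample = int(ratio * num_traj)  # rounding down is an assumption
    if replacement:
        # The same trajectory may be drawn more than once.
        return sorted(rng.choices(range(num_traj), k=nsample))
    # Every drawn trajectory is unique.
    return sorted(rng.sample(range(num_traj), nsample))


# Sample half of ten hypothetical trajectories without replacement.
print(bootstrap_indexes(10, 0.5, seed=0))
```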
- cluster(clusterobj, mergesmall=None, batchsize=0)#
Cluster the metrics
- Parameters:
clusterobj (ClusterMixin object) – The object of a clustering class from sklearn or with the same interface
mergesmall (int | None) – Clusters containing fewer than mergesmall conformations will be joined into their closest well-populated neighbour.
batchsize (int) – Batch sizes bigger than 0 will enable batching.
Examples
>>> from sklearn.cluster import MiniBatchKMeans
>>> data = MetricDistance.project(sims, 'protein and name CA', 'resname MOL')
>>> data.cluster(MiniBatchKMeans(n_clusters=1000), mergesmall=5)
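The effect of mergesmall can be sketched as follows. The sketch assumes that "closest" means the smallest Euclidean distance between cluster centers and leaves the surviving cluster indexes unchanged; HTMD may measure closeness or renumber clusters differently. `merge_small` is a hypothetical helper.

```python
from collections import Counter


def merge_small(labels, centers, mergesmall):
    """Reassign frames of clusters with fewer than `mergesmall` members
    to the nearest cluster that has at least `mergesmall` members."""
    counts = Counter(labels)
    big = [c for c in range(len(centers)) if counts[c] >= mergesmall]
    if not big:
        raise ValueError("no cluster reaches the mergesmall threshold")

    def sqdist(a, b):
        # Squared Euclidean distance between two centers.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    target = {}
    for c in range(len(centers)):
        if counts[c] >= mergesmall:
            target[c] = c  # well-populated clusters keep their index
        else:
            target[c] = min(big, key=lambda b: sqdist(centers[c], centers[b]))
    return [target[c] for c in labels]


# Cluster 1 holds a single frame and sits next to cluster 2.
labels = [0, 0, 0, 1, 2, 2, 2]
centers = [(0.0, 0.0), (0.9, 0.0), (1.0, 0.0)]
print(merge_small(labels, centers, mergesmall=2))  # [0, 0, 0, 2, 2, 2, 2]
```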
- combine(otherdata)#
Combines two different metrics into one by concatenating them.
- Parameters:
otherdata (MetricData) – Concatenates the metrics of otherdata to the current object's metrics
Examples
>>> dataRMSD = MetricRmsd.project(sims)
>>> dataDist = MetricSelfDistance.project(sims, 'protein and name CA')
>>> dataRMSD.combine(dataDist)
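Conceptually, combining two metrics appends the dimensions of one projection to those of the other, frame by frame. A pure-Python sketch of that idea, assuming both projections cover the same trajectories and frames (`combine_metrics` is a hypothetical helper, not the HTMD method):

```python
def combine_metrics(dat_a, dat_b):
    """Concatenate two projections frame by frame (column-wise)."""
    if len(dat_a) != len(dat_b):
        raise ValueError("both projections need the same number of trajectories")
    combined = []
    for traj_a, traj_b in zip(dat_a, dat_b):
        if len(traj_a) != len(traj_b):
            raise ValueError("matching trajectories need the same number of frames")
        # Each frame becomes the dimensions of A followed by those of B.
        combined.append([list(fa) + list(fb) for fa, fb in zip(traj_a, traj_b)])
    return combined


# One trajectory with two frames: a 1D metric joined with a 2D metric.
rmsd = [[[1.0], [1.2]]]
dist = [[[3.0, 4.0], [3.1, 4.1]]]
print(combine_metrics(rmsd, dist))  # [[[1.0, 3.0, 4.0], [1.2, 3.1, 4.1]]]
```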
- copy()#
Produces a deep copy of the object
- Returns:
data – A copy of the current object
- Return type:
Examples
>>> data = MetricSelfDistance.project(sims, 'protein and name CA')
>>> data2 = data.copy()
- property dat#
List of the projected metrics of each trajectory.
- deconcatenate(array)#
Split a concatenated data array back into per-trajectory pieces
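A sketch of the splitting step, assuming the array was built by concatenating the trajectories in order and that their lengths are known. The function below is a standalone illustration with a hypothetical traj_lengths argument, not the method's actual signature.

```python
def deconcatenate(array, traj_lengths):
    """Split a concatenated sequence back into per-trajectory pieces."""
    if len(array) != sum(traj_lengths):
        raise ValueError("array length does not match the total number of frames")
    pieces, start = [], 0
    for n in traj_lengths:
        # Slice out the frames belonging to the current trajectory.
        pieces.append(array[start:start + n])
        start += n
    return pieces


# Six concatenated frames from trajectories of length 3, 1 and 2.
print(deconcatenate([0, 1, 2, 3, 4, 5], [3, 1, 2]))  # [[0, 1, 2], [3], [4, 5]]
```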
- dropDimensions(drop=None, keep=None)#
Drop some dimensions of the data given their indexes
- Parameters:
Examples
>>> data.dropDimensions([1, 24, 3])
>>> data.dropDimensions(keep=[2, 10])
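The difference between drop and keep can be shown on plain lists of frames. The sketch assumes exactly one of the two arguments is given and that the remaining dimensions keep their original order. `drop_dimensions` is a hypothetical stand-in for the method.

```python
def drop_dimensions(frames, drop=None, keep=None):
    """Return frames with some columns removed (drop) or only some retained (keep)."""
    if (drop is None) == (keep is None):
        raise ValueError("pass exactly one of drop or keep")
    ndim = len(frames[0])
    if keep is not None:
        kept = sorted(set(keep))
    else:
        dropped = set(drop)
        kept = [d for d in range(ndim) if d not in dropped]
    return [[row[d] for d in kept] for row in frames]


frames = [[10, 11, 12, 13], [20, 21, 22, 23]]
print(drop_dimensions(frames, drop=[1, 3]))  # [[10, 12], [20, 22]]
print(drop_dimensions(frames, keep=[2, 0]))  # [[10, 12], [20, 22]]
```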
- dropFrames(idx, frames)#
Drop frames from a single trajectory of the data
- dropTraj(limits=None, multiple=None, partial=None, idx=None, keepsims=None)#
Drops trajectories based on their lengths
By default, drops all trajectories which are not of statistical mode (most common) length.
- Parameters:
limits (list | tuple | ndarray | None) – Lower and upper limits of trajectory lengths we want to keep. e.g. [100, 500]
multiple (list | range | ndarray | None) – Drops trajectories whose length is not a multiple of lengths in the list. e.g. [50, 80]
idx (list | range | ndarray | None) – A list of trajectory indexes to drop
keepsims (list | None) – A list of sims which we want to keep
- Returns:
dropidx – The indexes of the trajectories that were dropped
- Return type:
Examples
>>> data = MetricSelfDistance.project(sims, 'protein and name CA')
>>> data.dropTraj()
>>> data.dropTraj(multiple=[100])
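The two selection rules used in the examples can be sketched on a list of trajectory lengths. The helpers are hypothetical. The second one assumes a trajectory is kept when its length is a multiple of at least one listed value, which is one reading of the multiple parameter.

```python
from collections import Counter


def non_mode_indexes(traj_lengths):
    """Indexes of trajectories whose length differs from the most common length."""
    mode_len, _ = Counter(traj_lengths).most_common(1)[0]
    return [i for i, n in enumerate(traj_lengths) if n != mode_len]


def non_multiple_indexes(traj_lengths, multiple):
    """Indexes of trajectories whose length is a multiple of none of the given values."""
    return [i for i, n in enumerate(traj_lengths) if all(n % m != 0 for m in multiple)]


lengths = [100, 100, 80, 100, 150]
print(non_mode_indexes(lengths))             # [2, 4]
print(non_multiple_indexes(lengths, [100]))  # [2, 4]
```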
- static fromHDF5(filename=None, h5group=None)#
Load a MetricData object from an HDF5 file or group
- Parameters:
- Returns:
data – The loaded MetricData object
- Return type:
- load(filename)#
Load a MetricData object from disk
- Parameters:
filename (str | dict | MetricData) – Path to the saved MetricData object. A dict of attributes or a legacy MetricData object is also accepted for backward compatibility.
Examples
>>> data = MetricData()
>>> data.load('./data.dat')
- property map#
The description of the projected dimensions (alias for description).
- property numDimensions#
The number of dimensions
Examples
>>> data.numDimensions
- property numFrames#
Get the total number of frames in all trajectories
- Returns:
nframes – Total number of frames in all trajectories
- Return type:
Examples
>>> data.numFrames
- property numTrajectories#
The number of trajectories
Examples
>>> data.numTrajectories
- plotClusters(dimX, dimY, resolution=100, s=4, c=None, cmap='Greys', logplot=False, plot=True, save=None, data=None, levels=7)#
Plot a scatter-plot of the locations of the clusters on top of the count histogram.
- Parameters:
dimX (int) – Index of projected dimension to use for the X axis.
dimY (int) – Index of projected dimension to use for the Y axis.
resolution (int) – Resolution of bincount grid.
s (float) – Marker size for clusters.
c (list) – Colors or indexes for each cluster.
cmap (matplotlib.colors.Colormap) – Matplotlib colormap for the scatter plot.
logplot (bool) – Set True to plot the logarithm of counts.
plot (bool) – Whether the method should display the plot.
save (str | None) – Path of the file in which to save the figure.
data (MetricData | None) – Optionally, a different MetricData object than the one used for clustering, for example to cluster on distances but plot the centers on top of RMSD values. The new MetricData object needs to have the same simlist as this object.
levels (int) – Number of contour levels to draw.
- plotCounts(dimX, dimY, resolution=100, logplot=False, plot=True, save=None, levels=7, cmap='viridis')#
Plots a histogram of counts on any two given dimensions.
- Parameters:
dimX (int) – Index of projected dimension to use for the X axis.
dimY (int) – Index of projected dimension to use for the Y axis.
resolution (int) – Resolution of bincount grid.
logplot (bool) – Set True to plot the logarithm of counts.
plot (bool) – If the method should display the plot
save (str | None) – Path of the file in which to save the figure
levels (int) – Number of contour levels to draw.
cmap (str) – Name of the matplotlib colormap to use.
- plotTrajSizes()#
Plot the lengths of all trajectories in a sorted bar plot
Examples
>>> data = MetricSelfDistance.project(sims, 'protein and name CA')
>>> data.plotTrajSizes()
- property ref#
List of the reference simulation and frame indexes of each trajectory.
- rel2sim(relFrames, simlist=None)#
Converts trajectory index-frame pairs into Sim-frame pairs
- Parameters:
- Returns:
frames – An array of Frame objects containing the simulation object, the trajectory piece ID and the frame index.
- Return type:
Examples
>>> simframes = data.rel2sim([100, 56]) # 100th simulation frame 56
- sampleClusters(clusters=None, frames=20, replacement=False, allframes=False)#
Samples frames from a set of clusters
- Parameters:
clusters (int | list | range | ndarray | None) – A list of cluster indexes from which we want to sample. If None is given it will sample from all clusters.
frames (int | list | ndarray | None) – An integer with the number of frames we want to sample from each state or a list of same length as clusters which contains the number of frames we want from each of the clusters. If None is given it will return all frames.
replacement (bool) – If we want to sample with or without replacement.
allframes (bool) – Deprecated. Use frames=None instead.
- Returns:
absframes (numpy.ndarray) – An array which contains for each state an array containing absolute trajectory frames
relframes (numpy.ndarray) – An array which contains for each state a 2D array containing the trajectory ID and frame number for each of the sampled frames
Examples
>>> data.sampleClusters(range(5), [10, 3, 2, 50, 1]) # Sample from first 5 clusters, 10, 3, etc frames respectively
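A sketch of per-cluster sampling over a concatenated list of cluster assignments. It assumes that, without replacement, a cluster holding fewer frames than requested simply yields all of its frames; HTMD's behaviour in that case is not documented here. `sample_clusters` and its seed argument are hypothetical.

```python
import random


def sample_clusters(assignments, clusters, frames, replacement=False, seed=None):
    """For each requested cluster, sample absolute frame indexes assigned to it."""
    rng = random.Random(seed)
    if isinstance(frames, int):
        # One number means the same sample size for every cluster.
        frames = [frames] * len(clusters)
    sampled = []
    for cl, n in zip(clusters, frames):
        members = [i for i, a in enumerate(assignments) if a == cl]
        if replacement:
            picked = rng.choices(members, k=n) if members else []
        else:
            # Clamp to the cluster population (an assumption of this sketch).
            picked = rng.sample(members, min(n, len(members)))
        sampled.append(picked)
    return sampled


# Hypothetical assignments for six concatenated frames.
assignments = [0, 1, 0, 2, 1, 0]
print(sample_clusters(assignments, clusters=[0, 1], frames=[2, 5], seed=1))
```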
- sampleRegion(point=None, radius=None, limits=None, nsamples=20, singlemol=False)#
Samples conformations from a region in the projected space.
- Parameters:
point (list | tuple | ndarray | None) – A point in the projected space. Undefined dimensions should have None value.
radius (float | None) – The radius around the point in which to sample conformations.
limits (ndarray | None) – A (2, ndim) dimensional array containing the min (1st row) and max (2nd row) limits for each dimension. None values will be interpreted as no limit in that dimension, or min/max value.
nsamples (int) – The number of conformations to sample.
singlemol (bool) – If True it will return all samples within a single Molecule instead of a list of Molecules.
- Returns:
absFrames (list) – A list of the absolute frame indexes sampled
relFrames (list of tuples) – A list of (trajNum, frameNum) tuples sampled
mols (moleculekit.molecule.Molecule or list of Molecules) – The conformations stored in a Molecule or a list of Molecules
Examples
>>> import numpy as np
>>> # Working with 4 dimensional data for example
>>> abs, rel, mols = data.sampleRegion(point=(0.5, 3, None, None), radius=0.1) # Point undefined in dim 3, 4
>>> minlims = [-1, None, None, 4] # No min limit for 2, 3 dim
>>> maxlims = [2, 3, None, 7] # No max limit for 3 dim
>>> abs, rel, mols = data.sampleRegion(limits=np.array([minlims, maxlims]))
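How None values in point and limits restrict a selection can be sketched with two predicates. The sketch assumes a Euclidean distance over the defined dimensions and inclusive limits; neither detail is confirmed by this reference. `in_radius` and `in_limits` are hypothetical helpers.

```python
import math


def in_radius(frame, point, radius):
    """True if frame lies within radius of point, ignoring dimensions set to None."""
    sq = sum((x - p) ** 2 for x, p in zip(frame, point) if p is not None)
    return math.sqrt(sq) <= radius


def in_limits(frame, minlims, maxlims):
    """True if frame lies inside the box; None means no limit on that side."""
    return all(
        (lo is None or x >= lo) and (hi is None or x <= hi)
        for x, lo, hi in zip(frame, minlims, maxlims)
    )


# Three hypothetical frames in a 4 dimensional projected space.
frames = [(0.5, 3.05, 9.0, 9.0), (0.9, 3.0, 0.0, 0.0), (0.0, 2.0, 5.0, 5.0)]
near = [i for i, f in enumerate(frames) if in_radius(f, (0.5, 3, None, None), 0.1)]
boxed = [i for i, f in enumerate(frames) if in_limits(f, [-1, None, None, 4], [2, 3, None, 7])]
print(near, boxed)  # [0] [2]
```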
- save(filename)#
Save a MetricData object to disk
- Parameters:
filename (str) – Path of the file in which to save the object
Examples
>>> data = MetricSelfDistance.project(sims, 'protein and name CA')
>>> data.save('./data.dat')
- splitCols()#
- toHDF5(filename=None, h5group=None)#
Save a MetricData object to an HDF5 file or group
- class htmd.metricdata.Trajectory(projection=None, reference=None, sim=None, cluster=None)#
Bases: object
Class used for storing trajectory projections, their clustering and state assignments.
- Parameters:
- property cluster#
The cluster indexes for each frame in the simulation
- copy()#
Produce a deep copy of the object.
- Returns:
traj – A copy of the current object.
- Return type:
- dropFrames(frames)#
Drop frames from the trajectory
- static fromHDF5(h5group)#
Reconstruct a Trajectory from an open HDF5 group.
- Return type:
- property numDimensions#
The number of dimensions in the projection of this trajectory
- property numFrames#
The number of frames in this trajectory
- property projection#
The projected metrics of this simulation trajectory
- property reference#
The reference indices to the simulations and frames that generated the projections
- toHDF5(h5group)#
Write this trajectory’s data into an open HDF5 group.