# moleculekit.tools.hhblitsprofile module

moleculekit.tools.hhblitsprofile.getSequenceProfile(sequence, hhblits, hhblitsdb, ncpu=6, niter=4)

Calculates the sequence profile of a protein sequence using HHblits.

The file format description and unit conversions are taken from section 6 of the HH-suite user guide: https://hpc.nih.gov/apps/hhsuite-userguide.pdf

Parameters

• sequence (str) – The protein sequence as a string of one-letter amino acid codes

• hhblits (str) – The path to the hhblits executable

• hhblitsdb (str) – The path to the hhblits database that we want to search against. Should include the database name prefix, not just the folder.

• ncpu (int) – Number of CPUs to use for the search (default: 6)

• niter (int) – The number of hhblits iterations (default: 4). Higher values find more remote homologues.

Returns

• df (pandas.DataFrame) – A pandas dataframe containing all the information read from the file

• pssm (np.ndarray) – An N×20 numpy array, where N is the number of residues in the protein. Contains the transition probabilities to all 20 residues.
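
The unit conversion referenced in the description can be illustrated with a minimal sketch. According to the HH-suite user guide, numeric fields in an HHM profile file store -1000 × log2(p) as integers, and `*` stands for a probability of 0. The helper name `hhm_value_to_probability` is hypothetical and is not part of moleculekit; how the library performs this conversion internally is not shown in this page.

```python
def hhm_value_to_probability(value: str) -> float:
    """Convert one HHM field to a probability (hypothetical helper).

    Assumes the encoding described in the HH-suite user guide:
    the field holds -1000 * log2(p), and '*' means p = 0.
    """
    if value == "*":
        return 0.0
    return 2.0 ** (-int(value) / 1000.0)


# A few illustrative field values, not taken from a real profile
fields = ["0", "1000", "3000", "*"]
probabilities = [hhm_value_to_probability(f) for f in fields]
print(probabilities)  # [1.0, 0.5, 0.125, 0.0]
```

A field of 1000 therefore corresponds to a probability of 0.5, and larger values correspond to smaller probabilities.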

Examples

>>> hhb = '~/hhsuite-2.0.16-linux-x86_64/bin/hhblits'
>>> hhbdb = '~/hhsuite-2.0.16-linux-x86_64/databases/uniprot20_2016_02/uniprot20_2016_02'
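
The example above stops after defining the two paths. A sketch of how the call might continue, based only on the documented signature and return values, is given below. The sequence `seq` and the wrapper `profile_if_available` are hypothetical and for illustration only. Expanding `~` with `os.path.expanduser` is a precaution, since this page does not state whether the function expands it itself.

```python
import os

# Paths from the example above, with "~" expanded explicitly (an assumption:
# the docs do not say whether getSequenceProfile expands it).
hhb = os.path.expanduser('~/hhsuite-2.0.16-linux-x86_64/bin/hhblits')
hhbdb = os.path.expanduser(
    '~/hhsuite-2.0.16-linux-x86_64/databases/uniprot20_2016_02/uniprot20_2016_02'
)

# Hypothetical short sequence, for illustration only
seq = 'MKVKVGVNGFGRIGRLVTRAAF'


def profile_if_available(sequence, hhblits, hhblitsdb):
    """Call getSequenceProfile only if the hhblits executable exists (hypothetical wrapper)."""
    if not os.path.isfile(hhblits):
        return None  # hhblits is not installed at this path, so nothing is run
    # Imported here so the sketch still loads when moleculekit is absent
    from moleculekit.tools.hhblitsprofile import getSequenceProfile
    return getSequenceProfile(sequence, hhblits, hhblitsdb, ncpu=6, niter=4)


result = profile_if_available(seq, hhb, hhbdb)
if result is not None:
    df, pssm = result  # documented return values: DataFrame and N x 20 array
    print(pssm.shape)
```

Note that `hhblitsdb` points at the database name prefix inside the database folder, as the parameter description requires.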