moleculekit.opm module

moleculekit.opm.align_to_opm(mol, molsel='all', maxalignments=3, opmid=None, macrotype='protein')

Align a Molecule to proteins or nucleic acids in the OPM database by sequence search.

This function requires BLAST+ to be installed. You can download the latest BLAST+ executables from https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ and, once they are installed, add them to your PATH before starting Python so that the blastp and makeblastdb executables can be detected. Alternatively, install BLAST+ with conda install blast -c bioconda.
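Before calling align_to_opm, one quick way to confirm that Python can see the BLAST+ tools is to look them up on PATH with the standard library's shutil.which. Because a Python process inherits PATH when it starts, a PATH change made afterwards is not seen, which is why Python should be started after exporting it. This is a minimal sketch; the helper names find_blast_executables and check_blast_available are hypothetical and not part of moleculekit.

```python
import shutil


def find_blast_executables(names=("blastp", "makeblastdb")):
    """Map each BLAST+ executable name to its full path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


def check_blast_available():
    """Raise a RuntimeError naming any BLAST+ executable that cannot be found."""
    found = find_blast_executables()
    missing = [name for name, path in found.items() if path is None]
    if missing:
        raise RuntimeError(
            "BLAST+ executables not found on PATH: "
            + ", ".join(missing)
            + ". Install BLAST+ (e.g. conda install blast -c bioconda) and restart Python."
        )
    return found
```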

Parameters:

mol (Molecule) – The query molecule. The alignments will be done on the first frame only.
molsel (str | ndarray) – The atom selection for the query molecule to use. Can be an atom selection string, a boolean mask, or an integer index array.
maxalignments (int) – The maximum number of aligned structures to return.
opmid (str | None) – If an OPM ID is passed, the function skips searching the database.
macrotype (str) – Whether to align on “protein” or “nucleic” sequences.
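A boolean mask and an integer index array are two encodings of the same molsel selection. The NumPy sketch below converts either form to a common sorted index array to show the equivalence. The helper selection_to_indices is hypothetical; moleculekit's own handling may differ (for example in sorting or bounds checking), and string selections such as 'all' are evaluated by the Molecule itself, so they are deliberately rejected here.

```python
import numpy as np


def selection_to_indices(molsel, num_atoms):
    """Convert a boolean mask or an integer index array into sorted atom indices.

    String selections are not handled: in moleculekit they are evaluated
    against the Molecule (e.g. via Molecule.atomselect).
    """
    sel = np.asarray(molsel)
    if sel.dtype == bool:
        # A mask needs exactly one True/False entry per atom
        if sel.shape != (num_atoms,):
            raise ValueError("boolean mask must have one entry per atom")
        return np.flatnonzero(sel)
    if np.issubdtype(sel.dtype, np.integer):
        # This sketch only accepts non-negative, in-range indices
        if sel.size and (sel.min() < 0 or sel.max() >= num_atoms):
            raise IndexError("atom index out of range")
        return np.unique(sel)  # sorted and de-duplicated
    raise TypeError("expected a boolean mask or an integer index array")
```

With this sketch, a mask such as [True, False, True, False] and an index array such as [2, 0] both resolve to the indices [0, 2].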

Returns:

results – A list of at most maxalignments alignments. Each alignment may contain several HSPs (high-scoring segment pairs), each corresponding to a different sequence alignment of the query on the same hit protein.

Return type:

list

moleculekit.opm.blast_search_opm(query, sequences)

Search a query sequence against OPM protein sequences using BLAST+.

Builds a temporary BLAST protein database from the protein sequences of the OPM database and runs blastp for the query sequence. This requires the makeblastdb and blastp executables (from BLAST+) to be available on the system PATH.
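The database-building step described above can be sketched as writing the OPM protein chains to a FASTA file, indexing it with makeblastdb and searching it with blastp. This is a minimal sketch, not moleculekit's implementation: the "<pdbid>_<chain>" FASTA header scheme and the helper names are hypothetical, and it assumes the JSON output is requested with blastp's single-file JSON format (-outfmt 15).

```python
import json
import os
import subprocess
import tempfile


def write_opm_fasta(sequences, path):
    """Write every protein chain of an OPM sequences dict to FASTA; return the chain count."""
    count = 0
    with open(path, "w") as fh:
        for pdbid, entry in sequences.items():
            for chain, seq in entry.get("protein", {}).items():
                # Hypothetical header scheme: "<pdbid>_<chain>"
                fh.write(f">{pdbid}_{chain}\n{seq}\n")
                count += 1
    return count


def blast_commands(query_fasta, db_fasta, db_prefix, out_json):
    """Build the makeblastdb and blastp argument lists (not executed here)."""
    makedb = ["makeblastdb", "-in", db_fasta, "-dbtype", "prot", "-out", db_prefix]
    search = ["blastp", "-query", query_fasta, "-db", db_prefix,
              "-outfmt", "15", "-out", out_json]  # 15 = single-file BLAST JSON
    return makedb, search


def run_blast(query, sequences):
    """Build a throwaway protein database and search the query against it."""
    with tempfile.TemporaryDirectory() as tmp:
        db_fasta = os.path.join(tmp, "opm.fasta")
        query_fasta = os.path.join(tmp, "query.fasta")
        out_json = os.path.join(tmp, "hits.json")
        with open(query_fasta, "w") as fh:
            fh.write(f">query\n{query}\n")
        write_opm_fasta(sequences, db_fasta)
        for cmd in blast_commands(query_fasta, db_fasta, os.path.join(tmp, "opmdb"), out_json):
            subprocess.run(cmd, check=True, capture_output=True)
        with open(out_json) as fh:
            return json.load(fh)  # parsed before the temporary directory is removed
```

Using a temporary directory keeps the generated database files from accumulating between calls, at the cost of rebuilding the database on every search.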

Parameters:

query (str) – The query protein sequence to search for.
sequences (dict) – The OPM sequences database, as produced by generate_opm_sequences(), keyed by PDB id with a "protein" entry mapping chain ids to sequences.

Returns:

hits – The BLAST search hits, as parsed from the blastp JSON output.

Return type:

list
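To illustrate how hits and their HSPs relate, the sketch below groups a parsed blastp JSON report into hits, each carrying its list of HSPs. It assumes the single-file JSON layout produced by -outfmt 15 (BlastOutput2[0].report.results.search.hits, with per-HSP fields such as evalue, bit_score, query_from and hit_from); the helper extract_hits and its max_hits argument are hypothetical, and the exact structure that blast_search_opm returns may differ.

```python
def extract_hits(report, max_hits=None):
    """Group a blastp single-file JSON report (-outfmt 15) into hits with their HSPs."""
    # Assumed layout of -outfmt 15: BlastOutput2[0].report.results.search.hits
    search = report["BlastOutput2"][0]["report"]["results"]["search"]
    hits = search.get("hits", [])
    if max_hits is not None:
        hits = hits[:max_hits]  # by default blastp lists the most significant hits first
    grouped = []
    for hit in hits:
        hsps = [
            {
                "evalue": hsp["evalue"],
                "bit_score": hsp["bit_score"],
                "query_range": (hsp["query_from"], hsp["query_to"]),
                "hit_range": (hsp["hit_from"], hsp["hit_to"]),
            }
            for hsp in hit["hsps"]
        ]
        grouped.append({"title": hit["description"][0]["title"], "hsps": hsps})
    return grouped
```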

moleculekit.opm.generate_opm_sequences(opm_pdbs, outjson)

Extract protein and nucleic sequences from OPM PDB files into an xz-compressed JSON file.

DUM placeholder atoms are first removed from each input PDB, which is then loaded as a Molecule and split into its protein and nucleic components. The per-chain sequences are collected, dropping chains shorter than 5 residues or made up only of unknown (X) residues, and written to an xz-compressed JSON file keyed by the PDB file basename. This file is used to build the searchable database consumed by align_to_opm().
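The processing steps described above (removing DUM atoms, filtering chains, writing compressed JSON) can be sketched with the standard library alone. This is a minimal sketch, not moleculekit's implementation: it assumes DUM atoms can be recognised by the residue name in PDB columns 18-20, it works on plain sequence strings rather than Molecule objects, and the helper names are hypothetical.

```python
import json
import lzma


def strip_dum_lines(pdb_text):
    """Drop ATOM/HETATM records whose residue name (PDB columns 18-20) is DUM."""
    kept = []
    for line in pdb_text.splitlines(keepends=True):
        if line.startswith(("ATOM", "HETATM")) and line[17:20].strip() == "DUM":
            continue
        kept.append(line)
    return "".join(kept)


def filter_chain_sequences(chains, min_length=5):
    """Keep chains with at least min_length residues that are not all unknown (X)."""
    return {
        chain: seq
        for chain, seq in chains.items()
        if len(seq) >= min_length and set(seq) != {"X"}
    }


def write_xz_json(data, path):
    """Serialize data as JSON inside an xz (LZMA) compressed file."""
    with lzma.open(path, "wt") as fh:
        json.dump(data, fh)


def read_xz_json(path):
    """Load JSON back from an xz-compressed file."""
    with lzma.open(path, "rt") as fh:
        return json.load(fh)
```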

Parameters:

opm_pdbs (list) – Paths to the OPM PDB files to process.
outjson (str) – Path to the xz-compressed JSON file that the collected sequences are written to.

moleculekit.opm.get_opm_pdb(pdbid, keep=False, keepaltloc='A', validateElements=False)

Download a membrane system from the OPM database.

Parameters:

pdbid (str) – The 4-letter PDB code
keep (bool) – If False, removes the DUM atoms. If True, it keeps them.
keepaltloc (str) – Which alternative location (altloc) to keep, if any are present.
validateElements (bool) – Set to True to validate the elements read. Validation usually fails on OPM files because of their non-standard atom names.

Returns:

mol (moleculekit.molecule.Molecule) – The oriented molecule
thickness (float or None) – The total bilayer thickness (both leaflets), or None if it is not available.

Examples

>>> mol, thickness = get_opm_pdb("1z98")
>>> mol.numAtoms
7902
>>> thickness
28.2
>>> _, thickness = get_opm_pdb('4u15')
>>> thickness is None
True
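The returned thickness covers both leaflets of the bilayer. If an OPM file reports only a half-thickness in its header, the full value is twice that number. The sketch below assumes a REMARK line of the form "1/2 of bilayer thickness: 14.1" and returns None when no such line is present, which is one plausible reason for a None result. The regular expression and the helper bilayer_thickness are hypothetical and may not match how get_opm_pdb actually parses the file.

```python
import re

# Assumed OPM header line, e.g. "REMARK      1/2 of bilayer thickness:   14.1"
_HALF_THICKNESS = re.compile(r"1/2 of bilayer thickness:\s*([-+]?\d+(?:\.\d+)?)")


def bilayer_thickness(pdb_text):
    """Return the full bilayer thickness (twice the reported half-thickness), or None."""
    match = _HALF_THICKNESS.search(pdb_text)
    if match is None:
        return None
    return 2 * float(match.group(1))
```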