moleculekit.opm module
- moleculekit.opm.align_to_opm(mol, molsel='all', maxalignments=3, opmid=None, macrotype='protein')
Align a Molecule to proteins or nucleic acids in the OPM database by sequence search.
This function requires BLAST+ to be installed. The latest BLAST+ executables are available at https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ and, once installed, must be added to your PATH before starting Python so that the blastp and makeblastdb executables can be detected. Alternatively, install BLAST+ via conda: conda install blast -c bioconda
- Parameters:
mol (Molecule) – The query molecule. The alignments will be done on the first frame only.
molsel (str | ndarray) – The atom selection for the query molecule to use. Can be an atom selection string, a boolean mask, or an integer index array.
maxalignments (int) – The maximum number of aligned structures to return.
opmid (str | None) – If an OPM ID is passed, the function will skip searching the database.
macrotype (str) – Whether to align on “protein” or “nucleic”.
- Returns:
results – The alignments found (at most maxalignments). Each alignment may contain several HSPs (high-scoring segment pairs), which correspond to different sequence alignments of the query on the same hit protein.
- Return type:
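Because align_to_opm depends on external BLAST+ executables, it can help to confirm they are discoverable before calling it. The following is a minimal sketch using only the Python standard library; the helper name missing_blast_tools is hypothetical and not part of moleculekit.

```python
import shutil


def missing_blast_tools(tools=("blastp", "makeblastdb")):
    """Return the names in `tools` that cannot be found on the system PATH.

    Hypothetical helper for illustration; moleculekit does not provide it.
    """
    return [name for name in tools if shutil.which(name) is None]


missing = missing_blast_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("blastp and makeblastdb were found on PATH.")
```

If either executable is reported as missing, adjust PATH (or install BLAST+ via conda) and restart Python before retrying.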
- moleculekit.opm.blast_search_opm(query, sequences)
Search a query sequence against OPM protein sequences using BLAST+.
Builds a temporary BLAST protein database from the protein sequences of the OPM database and runs blastp for the query sequence. This requires the makeblastdb and blastp executables (from BLAST+) to be available on the system PATH.
- Parameters:
query (str) – The query protein sequence to search for.
sequences (dict) – The OPM sequences database, as produced by generate_opm_sequences(), keyed by PDB id with a "protein" entry mapping chain ids to sequences.
- Returns:
hits – The BLAST search hits, as parsed from the blastp JSON output.
- Return type:
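To make the shape of the sequences argument concrete, here is a rough sketch of how such a dictionary could be flattened into FASTA text, the usual input for makeblastdb. This is an illustration only: the function sequences_to_fasta, the pdbid_chain record naming, the "nucleic" key, and the toy entries are assumptions, and the actual internals of blast_search_opm may differ.

```python
def sequences_to_fasta(sequences):
    """Flatten {pdbid: {"protein": {chain: seq}}} into FASTA text.

    Hypothetical helper; the record id format "<pdbid>_<chain>" is an
    assumption made for this sketch.
    """
    records = []
    for pdbid, entry in sorted(sequences.items()):
        # Only the "protein" entry is used; other entries are ignored.
        for chain, seq in sorted(entry.get("protein", {}).items()):
            records.append(f">{pdbid}_{chain}\n{seq}")
    return ("\n".join(records) + "\n") if records else ""


# Toy database with made-up ids and sequences (not real OPM data).
toy_sequences = {
    "1abc": {"protein": {"A": "MKTAYIAK", "B": "GSHMLEDP"}},
    "2xyz": {"protein": {"A": "MSTNPKPQ"}, "nucleic": {"B": "ACGUACGU"}},
}

fasta = sequences_to_fasta(toy_sequences)
print(fasta, end="")
```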
- moleculekit.opm.generate_opm_sequences(opm_pdbs, outjson)
Extract protein and nucleic sequences from OPM PDB files into a JSON file.
Each input PDB is filtered of its DUM placeholder atoms, loaded as a Molecule, and split into protein and nucleic components. The per-chain sequences (dropping chains shorter than 5 residues or consisting only of unknown X residues) are collected and written to a JSON file keyed by the PDB file basename. This is used to build the searchable database consumed by align_to_opm().
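The chain-filtering rule described above can be sketched in a few lines. This is not the library's implementation; keep_chain, filter_chains, the "1abc" key, and the toy sequences are hypothetical names and data chosen for illustration.

```python
import json

MIN_CHAIN_LENGTH = 5  # chains shorter than this are dropped


def keep_chain(seq, min_length=MIN_CHAIN_LENGTH):
    """Keep a chain if it is long enough and not made only of unknown X residues."""
    return len(seq) >= min_length and set(seq) != {"X"}


def filter_chains(chains):
    """Apply keep_chain to a {chain_id: sequence} mapping."""
    return {chain: seq for chain, seq in chains.items() if keep_chain(seq)}


# Toy input: chain B is too short, chain C is all unknown residues.
toy_chains = {"A": "MKTAYIAKQR", "B": "GSH", "C": "XXXXXXX"}
kept = filter_chains(toy_chains)

# Serialize in a layout resembling the documented one (keyed by file basename).
print(json.dumps({"1abc": {"protein": kept}}, indent=2))
```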
- moleculekit.opm.get_opm_pdb(pdbid, keep=False, keepaltloc='A', validateElements=False)
Download a membrane system from the OPM.
- Parameters:
- Returns:
mol (moleculekit.molecule.Molecule) – The oriented molecule
thickness (float or None) – The bilayer thickness (both layers)
Examples
>>> mol, thickness = get_opm_pdb("1z98")
>>> mol.numAtoms
7902
>>> thickness
28.2
>>> _, thickness = get_opm_pdb('4u15')
>>> thickness is None
True
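Since thickness can be None, downstream code should guard against that case. The sketch below converts the returned thickness into lower and upper z bounds, assuming the oriented membrane is centered at z = 0 with its normal along the z axis; that centering is an assumption of this example and is not stated in the documentation above. The helper membrane_z_bounds is hypothetical.

```python
def membrane_z_bounds(thickness):
    """Return (z_min, z_max) for a bilayer of the given total thickness.

    Assumes the membrane is centered at z = 0 (an assumption of this sketch).
    Returns None when no thickness is available.
    """
    if thickness is None:
        return None
    half = thickness / 2.0
    return (-half, half)


print(membrane_z_bounds(28.2))
print(membrane_z_bounds(None))
```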