dgllife.utils.multiprocess_load_molecules
dgllife.utils.multiprocess_load_molecules(files, sanitize=False, calc_charges=False, remove_hs=False, use_conformation=True, num_processes=2)
Load molecules from files with multiprocessing. Each file can be in .mol2, .sdf, .pdbqt, or .pdb format.
- Parameters
files (list of str) – Each element is a path to a file storing a molecule, which can be in .mol2, .sdf, .pdbqt, or .pdb format.
sanitize (bool) – Whether to sanitize the RDKit molecule instances when initializing them. See https://www.rdkit.org/docs/RDKit_Book.html for details of sanitization. Defaults to False.
calc_charges (bool) – Whether to add Gasteiger charges via RDKit. Setting this to True forces sanitize to be True. Defaults to False.
remove_hs (bool) – Whether to remove hydrogens via RDKit. Note that removing hydrogens can be quite slow for large molecules. Defaults to False.
use_conformation (bool) – Whether to extract the molecular conformation (3D atom coordinates) from proteins and ligands. Defaults to True.
num_processes (int or None) – Number of worker processes to use. If None, the number of CPUs in the system is used. Defaults to 2.
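The parameter rules above can be mirrored in a small pre-flight check before handing files to the loader. This is an illustration only: preflight and SUPPORTED_EXTENSIONS are hypothetical names, not part of dgllife, and the sketch assumes the file format is determined by an exact, case-sensitive file extension. It encodes two documented rules: calc_charges=True forces sanitize=True, and num_processes=None falls back to the number of CPUs.

```python
import os

# Formats listed in the documentation. Matching on the exact,
# case-sensitive extension is an assumption of this sketch.
SUPPORTED_EXTENSIONS = ('.mol2', '.sdf', '.pdbqt', '.pdb')


def preflight(files, sanitize=False, calc_charges=False, num_processes=2):
    """Hypothetical helper, not part of dgllife.

    Mirrors two documented rules: calc_charges=True forces sanitize=True,
    and num_processes=None means "use the number of CPUs".
    Returns (unsupported_files, effective_sanitize, effective_num_processes).
    """
    # os.path.splitext keeps the leading dot, e.g. 'ligand.sdf' -> '.sdf'
    unsupported = [f for f in files
                   if os.path.splitext(f)[1] not in SUPPORTED_EXTENSIONS]
    # Charge calculation requires a sanitized molecule.
    effective_sanitize = sanitize or calc_charges
    if num_processes is None:
        # os.cpu_count() can itself return None, so fall back to 1 worker.
        num_processes = os.cpu_count() or 1
    return unsupported, effective_sanitize, num_processes


bad, sanitize, workers = preflight(
    ['ligand.sdf', 'pocket.pdb', 'notes.txt'],
    calc_charges=True, num_processes=None)
print(bad)       # ['notes.txt']
print(sanitize)  # True
```

Running such checks up front reports an unsupported path before any worker processes start; the real function may handle unsupported files differently.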
- Returns
The first element of each 2-tuple is an RDKit molecule instance. The second element is the 3D atom coordinates of the corresponding molecule if use_conformation is True and the coordinates have been successfully loaded; otherwise, it is None.
- Return type
list of 2-tuples
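The returned list is easiest to work with once entries without coordinates are separated out. A minimal sketch under stated assumptions: it uses stand-in values (strings in place of RDKit molecules, nested lists in place of coordinate arrays of assumed shape (number of atoms, 3)); split_results is a hypothetical helper, and its check for a None molecule is defensive rather than documented behavior.

```python
def split_results(results):
    """Hypothetical helper: separate (mol, coords) 2-tuples by coordinate availability.

    Returns (with_coords, without_coords): with_coords keeps the usable
    (mol, coords) pairs, and without_coords holds the input indices whose
    coordinates (or, defensively, the molecule itself) are None.
    """
    with_coords, without_coords = [], []
    for i, (mol, coords) in enumerate(results):
        if mol is None or coords is None:
            without_coords.append(i)
        else:
            with_coords.append((mol, coords))
    return with_coords, without_coords


# Stand-in data: strings in place of RDKit Mol objects, nested lists in
# place of (num_atoms, 3) coordinate arrays.
results = [
    ('mol_a', [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]),
    ('mol_b', None),
]
kept, missing = split_results(results)
print(len(kept), missing)  # 1 [1]
```

With the real function, results would be the list returned by a call such as multiprocess_load_molecules(files, use_conformation=True).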