dgllife.utils.multiprocess_load_molecules

dgllife.utils.multiprocess_load_molecules(files, sanitize=False, calc_charges=False, remove_hs=False, use_conformation=True, num_processes=2)

Load molecules from files in parallel using multiprocessing. Supported file formats are .mol2, .sdf, .pdbqt, and .pdb.
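Conceptually, loading with multiprocessing amounts to mapping a per-file loader over `files` with a worker pool and collecting one result per file. The sketch below illustrates that pattern with only the standard library. It is not dgllife's implementation: `load_one` and `parallel_load` are hypothetical names, and `load_one` parses nothing, returning the file name in place of an RDKit molecule. It uses `multiprocessing.pool.ThreadPool`, which shares the interface of `multiprocessing.Pool`, so the example runs anywhere; swapping in `Pool` gives process-based parallelism.

```python
import os
from functools import partial
from multiprocessing.pool import ThreadPool

# File extensions listed as supported in the description above.
SUPPORTED_EXTENSIONS = {".mol2", ".sdf", ".pdbqt", ".pdb"}


def load_one(path, use_conformation=True):
    """Hypothetical stand-in for a single-file loader.

    Returns a (molecule, coordinates) 2-tuple. The "molecule" is just the
    file name and the coordinates are always None, because nothing is parsed.
    A real loader would use use_conformation to decide whether to extract
    3D coordinates.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file format: {path}")
    return os.path.basename(path), None


def parallel_load(files, use_conformation=True, num_processes=2):
    """Apply load_one to every file in parallel; results keep input order."""
    # partial fixes the keyword options so every worker loads files the same way.
    loader = partial(load_one, use_conformation=use_conformation)
    # ThreadPool(processes=None) uses os.cpu_count() workers. With
    # multiprocessing.Pool instead, guard the call with `if __name__ == "__main__":`.
    with ThreadPool(processes=num_processes) as pool:
        return pool.map(loader, files)
```

`pool.map` returns results in the same order as its input and re-raises any worker exception in the caller, so in this sketch an unsupported extension surfaces as a `ValueError` from `parallel_load`.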

Parameters
  • files (list of str) – Each element is a path to a file storing a molecule in .mol2, .sdf, .pdbqt, or .pdb format.

  • sanitize (bool) – Whether to perform sanitization when initializing RDKit molecule instances. See https://www.rdkit.org/docs/RDKit_Book.html for details of the sanitization. Defaults to False.

  • calc_charges (bool) – Whether to add Gasteiger charges via RDKit. Setting this to True forces sanitize to be True. Defaults to False.

  • remove_hs (bool) – Whether to remove hydrogens via RDKit. Note that removing hydrogens can be slow for large molecules. Defaults to False.

  • use_conformation (bool) – Whether to extract the molecular conformation (3D atom coordinates) from proteins and ligands. Defaults to True.

  • num_processes (int or None) – Number of worker processes to use. If None, the number of CPUs in the system is used. Defaults to 2.
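Two parameters interact with others: calc_charges=True overrides sanitize, and num_processes=None means one worker per CPU. The hypothetical helper below, `resolve_loading_options`, encodes just those two documented rules to make the effective settings explicit; it is an illustration, not code taken from dgllife.

```python
import os


def resolve_loading_options(sanitize=False, calc_charges=False, num_processes=2):
    """Return the effective (sanitize, num_processes) implied by the documented rules.

    Hypothetical helper: it mirrors the parameter descriptions only.
    """
    # Adding Gasteiger charges forces sanitization on.
    if calc_charges:
        sanitize = True
    # None means "use the number of CPUs in the system".
    # os.cpu_count() can itself return None, so fall back to a single worker.
    if num_processes is None:
        num_processes = os.cpu_count() or 1
    return sanitize, num_processes
```

For example, `resolve_loading_options(calc_charges=True)` returns `(True, 2)` even though sanitize was left at its default of False.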

Returns

The first element of each 2-tuple is an RDKit molecule instance. The second element is the 3D atom coordinates of that molecule if use_conformation is True and the coordinates have been successfully loaded; otherwise, it is None.

Return type

list of 2-tuples
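Because the second element of a returned 2-tuple can be None, code that needs coordinates usually checks for it first. The sketch below, built around a hypothetical helper `split_loaded`, partitions a result list into the positions that have coordinates and those that do not. Plain strings and nested lists stand in for RDKit molecules and coordinate arrays, and mapping the positions back to `files` assumes the results are in the same order as the input files.

```python
def split_loaded(results):
    """Partition (molecule, coordinates) 2-tuples by whether coordinates exist.

    Hypothetical helper. Returns (with_coords, without_coords), two lists of
    positions in `results`, so entries can be traced back to the input list.
    """
    with_coords, without_coords = [], []
    for i, (_mol, coords) in enumerate(results):
        # Coordinates are None when use_conformation is False or loading them failed.
        if coords is not None:
            with_coords.append(i)
        else:
            without_coords.append(i)
    return with_coords, without_coords


# Placeholder data shaped like the documented return value.
results = [
    ("mol_a", [[0.0, 0.0, 0.0]]),
    ("mol_b", None),
    ("mol_c", [[1.0, 2.0, 3.0]]),
]
with_coords, without_coords = split_loaded(results)
```

Returning positions rather than filtered tuples keeps the link to the original file paths, which helps when reporting which files lacked coordinates.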