dgllife.utils.smiles_to_nearest_neighbor_graph
dgllife.utils.smiles_to_nearest_neighbor_graph(smiles, coordinates, neighbor_cutoff, max_num_neighbors=None, p_distance=2, add_self_loop=False, node_featurizer=None, edge_featurizer=None, canonical_atom_order=True, keep_dists=False, dist_field='dist', explicit_hydrogens=False, num_virtual_nodes=0)
Convert a SMILES string into a nearest neighbor graph and featurize it.
Unlike a bigraph or a complete graph, the nearest neighbor graph may not be symmetric: i being the closest neighbor of j does not imply that j is the closest neighbor of i.
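The asymmetry described above can be seen in a toy example. The following is a minimal sketch in plain Python, not dgllife's implementation: the helper `k_nearest_neighbors` and the three points are hypothetical, and `k=1` is assumed to play a role similar to `max_num_neighbors=1`.

```python
import math


def k_nearest_neighbors(points, k):
    """Map each point index to the indices of its k closest other points."""
    neighbors = {}
    for i, p in enumerate(points):
        # Sort every other point by Euclidean distance to point i.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors[i] = others[:k]
    return neighbors


# Three hypothetical points on a line, at x = 0, 1 and 3.
points = [(0.0,), (1.0,), (3.0,)]
nearest = k_nearest_neighbors(points, k=1)

# Point 1 is the closest neighbor of point 2,
# but the closest neighbor of point 1 is point 0, not point 2.
print(nearest)  # {0: [1], 1: [0], 2: [1]}
```

With a distance cutoff alone the neighbor relation is symmetric, because distances are symmetric. In this sketch the asymmetry appears only once the number of neighbors per point is capped.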
- Parameters
smiles (str) – String of SMILES
coordinates (numpy.ndarray of shape (N, D)) – The coordinates of atoms in the molecule. N for the number of atoms and D for the dimensions of the coordinates.
neighbor_cutoff (float) – If the distance between a pair of nodes is larger than neighbor_cutoff, they will not be considered as neighboring nodes.
max_num_neighbors (int or None) – If not None, then this specifies the maximum number of neighbors allowed for each atom. Default to None.
p_distance (int) – We compute the distance between neighbors using Minkowski (\(l_p\)) distance. When p_distance = 1, Minkowski distance is equivalent to Manhattan distance. When p_distance = 2, Minkowski distance is equivalent to the standard Euclidean distance. Default to 2.
add_self_loop (bool) – Whether to add self loops in DGLGraphs. Default to False.
node_featurizer (callable, rdkit.Chem.rdchem.Mol -> dict) – Featurization for nodes like atoms in a molecule, which can be used to update ndata for a DGLGraph. Default to None.
edge_featurizer (callable, rdkit.Chem.rdchem.Mol -> dict) – Featurization for edges like bonds in a molecule, which can be used to update edata for a DGLGraph. Default to None.
canonical_atom_order (bool) – Whether to use a canonical order of atoms returned by RDKit. Setting it to true might change the order of atoms in the graph constructed. Default to True.
keep_dists (bool) – Whether to store the distance between neighboring atoms in edata of the constructed DGLGraphs. Default to False.
dist_field (str) – Field for storing distance between neighboring atoms in edata. This comes into effect only when keep_dists=True. Default to 'dist'.
explicit_hydrogens (bool) – Whether to explicitly represent hydrogens as nodes in the graph. If True, it will call rdkit.Chem.AddHs(mol). Default to False.
num_virtual_nodes (int) – The number of virtual nodes to add. The virtual nodes will be connected to all real nodes with virtual edges. If the returned graph has any node/edge feature, an additional column of binary values will be used for each feature to indicate the identity of virtual node/edges. The features of the virtual nodes/edges will be zero vectors except for the additional column. Default to 0.
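To make the roles of p_distance and neighbor_cutoff concrete, here is a minimal sketch of the Minkowski distance in plain Python. It is an illustration only and not the code dgllife runs internally. The helper `minkowski_distance`, the two points, and the cutoff value are hypothetical.

```python
def minkowski_distance(a, b, p):
    """Minkowski (l_p) distance between two coordinate sequences of equal length."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Two hypothetical atom positions in 3D.
a = (0.0, 0.0, 0.0)
b = (1.0, 2.0, 2.0)

manhattan = minkowski_distance(a, b, p=1)  # |1| + |2| + |2| = 5
euclidean = minkowski_distance(a, b, p=2)  # sqrt(1 + 4 + 4) = 3

# A pair is rejected when its distance is larger than the cutoff.
neighbor_cutoff = 4.0  # hypothetical value
is_neighbor_p1 = manhattan <= neighbor_cutoff  # False: 5 > 4
is_neighbor_p2 = euclidean <= neighbor_cutoff  # True: 3 <= 4
```

The same pair of atoms can therefore fall inside or outside the cutoff depending on p_distance, so the two parameters are best chosen together.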
- Returns
Nearest neighbor DGLGraph for the molecule if smiles is valid and None otherwise.
- Return type
DGLGraph or None
Examples
>>> from dgllife.utils import get_mol_3d_coordinates, smiles_to_nearest_neighbor_graph
>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem

>>> smiles = 'CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C'
>>> mol = Chem.MolFromSmiles(smiles)
>>> # Generate and optimize a 3D conformer, then read its atom coordinates
>>> AllChem.EmbedMolecule(mol)
>>> AllChem.MMFFOptimizeMolecule(mol)
>>> coords = get_mol_3d_coordinates(mol)
>>> g = smiles_to_nearest_neighbor_graph(smiles, coords, neighbor_cutoff=1.25)
>>> print(g)
DGLGraph(num_nodes=23, num_edges=6,
         ndata_schemes={}
         edata_schemes={})
Quite often we want the distance between the end atoms of each edge. It can be stored in edata by setting keep_dists=True:
>>> g = smiles_to_nearest_neighbor_graph(smiles, coords, neighbor_cutoff=1.25, keep_dists=True)
>>> print(g.edata['dist'])
tensor([[1.2024],
        [1.2024],
        [1.2270],
        [1.2270],
        [1.2259],
        [1.2259]])
By default, hydrogens are not explicitly represented as nodes. To include them, set explicit_hydrogens=True and pass coordinates that cover the hydrogens as well:
>>> smiles = 'CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C'
>>> mol = Chem.MolFromSmiles(smiles)
>>> mol = Chem.AddHs(mol)
>>> AllChem.EmbedMolecule(mol)
>>> AllChem.MMFFOptimizeMolecule(mol)
>>> coords = get_mol_3d_coordinates(mol)
>>> g = smiles_to_nearest_neighbor_graph(smiles, coords, neighbor_cutoff=1.25,
...                                      explicit_hydrogens=True)
>>> print(g)
DGLGraph(num_nodes=41, num_edges=42,
         ndata_schemes={}
         edata_schemes={})