dgllife.utils.smiles_to_nearest_neighbor_graph

dgllife.utils.smiles_to_nearest_neighbor_graph(smiles, coordinates, neighbor_cutoff, max_num_neighbors=None, p_distance=2, add_self_loop=False, node_featurizer=None, edge_featurizer=None, canonical_atom_order=True, keep_dists=False, dist_field='dist', explicit_hydrogens=False, num_virtual_nodes=0)

Convert a SMILES string into a nearest neighbor graph and featurize it.

Unlike bigraphs and complete graphs, the nearest neighbor graph may not be symmetric: atom i being one of the closest neighbors of atom j does not necessarily imply that j is one of the closest neighbors of i.
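To see where the asymmetry comes from, here is a minimal NumPy sketch of cutoff-based nearest neighbor search with an optional per-atom neighbor limit and a Minkowski (l_p) distance. It is an illustration under assumptions, not DGL-LifeSci's implementation: the helper `nearest_neighbor_edges` is hypothetical, each edge is assumed to point from a neighbor to the atom whose neighborhood it belongs to, and ties are broken by atom index.

```python
import numpy as np


def nearest_neighbor_edges(coords, neighbor_cutoff, max_num_neighbors=None, p=2):
    """Hypothetical helper (not the library's code): return directed edges
    (src -> dst) where src is one of dst's nearest neighbors within
    neighbor_cutoff under the Minkowski l_p distance."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise Minkowski distances, shape (N, N).
    diff = np.abs(coords[:, None, :] - coords[None, :, :])
    dists = (diff ** p).sum(axis=-1) ** (1.0 / p)
    srcs, dsts = [], []
    for i in range(len(coords)):
        # Candidate neighbors of atom i: within the cutoff, excluding i itself.
        cand = [j for j in range(len(coords))
                if j != i and dists[i, j] <= neighbor_cutoff]
        # Closest first; Python's sort is stable, so ties keep index order.
        cand.sort(key=lambda j: dists[i, j])
        if max_num_neighbors is not None:
            cand = cand[:max_num_neighbors]
        srcs.extend(cand)
        dsts.extend([i] * len(cand))
    return srcs, dsts


# Three atoms on a line at x = 0, 1 and 1.5, each keeping one nearest neighbor.
srcs, dsts = nearest_neighbor_edges([[0.0], [1.0], [1.5]],
                                    neighbor_cutoff=2.0, max_num_neighbors=1)
print(list(zip(srcs, dsts)))  # [(1, 0), (2, 1), (1, 2)]
```

Atom 1 is the nearest neighbor of atom 0, but atom 0 is not the nearest neighbor of atom 1 (atom 2 is closer), so the edge 1 -> 0 has no reverse. With only a cutoff and no neighbor limit, this sketch yields a symmetric edge set, since distances are symmetric.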

Parameters
  • smiles (str) – SMILES string of the molecule.

  • coordinates (numpy.ndarray of shape (N, D)) – The coordinates of atoms in the molecule, where N is the number of atoms and D is the dimensionality of the coordinates.

  • neighbor_cutoff (float) – If the distance between a pair of nodes is larger than neighbor_cutoff, they will not be considered as neighboring nodes.

  • max_num_neighbors (int or None) – If not None, the maximum number of neighbors allowed for each atom. Defaults to None.

  • p_distance (int) – We compute the distance between neighbors using the Minkowski (\(l_p\)) distance. When p_distance = 1, the Minkowski distance is equivalent to the Manhattan distance. When p_distance = 2, it is equivalent to the standard Euclidean distance. Defaults to 2.

  • add_self_loop (bool) – Whether to add self loops to the DGLGraph. Defaults to False.

  • node_featurizer (callable, rdkit.Chem.rdchem.Mol -> dict) – Featurization for nodes like atoms in a molecule, which can be used to update ndata for a DGLGraph. Defaults to None.

  • edge_featurizer (callable, rdkit.Chem.rdchem.Mol -> dict) – Featurization for edges like bonds in a molecule, which can be used to update edata for a DGLGraph. Defaults to None.

  • canonical_atom_order (bool) – Whether to use the canonical atom order returned by RDKit. Setting it to True might change the order of atoms in the constructed graph. Defaults to True.

  • keep_dists (bool) – Whether to store the distance between neighboring atoms in the edata of the constructed DGLGraph. Defaults to False.

  • dist_field (str) – Field for storing the distance between neighboring atoms in edata. This takes effect only when keep_dists=True. Defaults to 'dist'.

  • explicit_hydrogens (bool) – Whether to explicitly represent hydrogens as nodes in the graph. If True, rdkit.Chem.AddHs(mol) is called. Defaults to False.

  • num_virtual_nodes (int) – The number of virtual nodes to add. The virtual nodes are connected to all real nodes with virtual edges. If the returned graph has any node/edge feature, an additional column of binary values is added to each feature to indicate whether a node/edge is virtual. The features of virtual nodes/edges are zero vectors except for the additional column. Defaults to 0.
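The node-feature layout described for num_virtual_nodes can be sketched in NumPy as follows. This follows the documented behavior (zero features plus one extra binary column flagging virtual nodes) but is not the library's code: the helper `pad_virtual_node_features` is hypothetical, placing the flag as the last column is an assumption, and virtual edges are not modeled.

```python
import numpy as np


def pad_virtual_node_features(feats, num_virtual_nodes):
    """Hypothetical helper: append virtual-node rows and a binary flag column
    (assumed to be the last column) to a real-node feature matrix."""
    feats = np.asarray(feats, dtype=np.float32)
    num_real, dim = feats.shape
    # Real nodes keep their features; their flag column is 0.
    real = np.concatenate([feats, np.zeros((num_real, 1), dtype=np.float32)], axis=1)
    # Virtual nodes get all-zero features except for a flag of 1.
    virtual = np.zeros((num_virtual_nodes, dim + 1), dtype=np.float32)
    virtual[:, -1] = 1.0
    return np.concatenate([real, virtual], axis=0)


out = pad_virtual_node_features([[1.0, 2.0], [3.0, 4.0]], num_virtual_nodes=1)
print(out.tolist())  # [[1.0, 2.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 1.0]]
```

The same pattern would apply to edge features, with the flag marking virtual edges instead of virtual nodes.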

Returns

Nearest neighbor DGLGraph for the molecule if smiles is valid and None otherwise.

Return type

DGLGraph or None

Examples

>>> from dgllife.utils import smiles_to_nearest_neighbor_graph, get_mol_3d_coordinates
>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> smiles = 'CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C'
>>> mol = Chem.MolFromSmiles(smiles)
>>> AllChem.EmbedMolecule(mol)
>>> AllChem.MMFFOptimizeMolecule(mol)
>>> coords = get_mol_3d_coordinates(mol)
>>> g = smiles_to_nearest_neighbor_graph(smiles, coords, neighbor_cutoff=1.25)
>>> print(g)
DGLGraph(num_nodes=23, num_edges=6,
         ndata_schemes={}
         edata_schemes={})

Quite often we want to use the distance between the end atoms of each edge. This can be achieved by setting keep_dists=True:

>>> g = smiles_to_nearest_neighbor_graph(smiles, coords, neighbor_cutoff=1.25, keep_dists=True)
>>> print(g.edata['dist'])
tensor([[1.2024],
        [1.2024],
        [1.2270],
        [1.2270],
        [1.2259],
        [1.2259]])
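Each row of g.edata['dist'] is the distance between the two end atoms of the corresponding edge, stored as a column of shape (E, 1) as the output above shows. As a sanity check, the same quantity can be recomputed from coordinates and an edge list. This NumPy sketch uses a hypothetical helper `edge_distances`, the default p = 2 (matching p_distance=2), and toy coordinates rather than the conformer above; for a real DGLGraph, the source and destination IDs could be taken from `g.edges()`.

```python
import numpy as np


def edge_distances(coords, src, dst, p=2):
    """Hypothetical helper: Minkowski distance between the end atoms of each
    edge, returned as an (E, 1) column like g.edata['dist']."""
    coords = np.asarray(coords, dtype=float)
    # Coordinate differences between the two end atoms of every edge.
    diff = np.abs(coords[np.asarray(src)] - coords[np.asarray(dst)])
    return ((diff ** p).sum(axis=1) ** (1.0 / p)).reshape(-1, 1)


coords = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 1.2]]
# Expect distances of 5.0, 5.0 and approximately 1.2.
print(edge_distances(coords, src=[0, 1, 0], dst=[1, 0, 2]).ravel())
```

Note that the edges 0 -> 1 and 1 -> 0 get the same distance, which is why the stored distances in the example above come in equal pairs whenever both directions of an edge are present.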

By default, hydrogens are not explicitly represented as nodes. To include them, set explicit_hydrogens=True and compute the coordinates for the molecule with hydrogens added, as follows.

>>> smiles = 'CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C'
>>> mol = Chem.MolFromSmiles(smiles)
>>> mol = Chem.AddHs(mol)
>>> AllChem.EmbedMolecule(mol)
>>> AllChem.MMFFOptimizeMolecule(mol)
>>> coords = get_mol_3d_coordinates(mol)
>>> g = smiles_to_nearest_neighbor_graph(smiles, coords, neighbor_cutoff=1.25,
...                                      explicit_hydrogens=True)
>>> print(g)
DGLGraph(num_nodes=41, num_edges=42,
         ndata_schemes={}
         edata_schemes={})