dgllife.utils.smiles_to_complete_graph

dgllife.utils.smiles_to_complete_graph(smiles, add_self_loop=False, node_featurizer=None, edge_featurizer=None, canonical_atom_order=True, explicit_hydrogens=False, num_virtual_nodes=0)

Convert a SMILES string into a complete DGLGraph and featurize it.

Parameters
  • smiles (str) – A SMILES string.

  • add_self_loop (bool) – Whether to add self loops to the DGLGraph. Defaults to False.

  • node_featurizer (callable, rdkit.Chem.rdchem.Mol -> dict) – Featurization function for nodes (atoms in a molecule); its output is used to update the ndata of the DGLGraph. Defaults to None.

  • edge_featurizer (callable, rdkit.Chem.rdchem.Mol -> dict) – Featurization function for edges (pairs of atoms in the molecule, bonded or not); its output is used to update the edata of the DGLGraph. Defaults to None.

  • canonical_atom_order (bool) – Whether to use the canonical atom order returned by RDKit. Setting it to True may change the order of atoms in the constructed graph. Defaults to True.

  • explicit_hydrogens (bool) – Whether to represent hydrogens explicitly as nodes in the graph. If True, rdkit.Chem.AddHs(mol) is called. Defaults to False.

  • num_virtual_nodes (int) – The number of virtual nodes to add. Each virtual node is connected to all real nodes with virtual edges. If the returned graph has any node or edge features, an additional binary column is added to each feature to indicate whether a node or edge is virtual. The features of virtual nodes and edges are zero vectors except for this additional column. Defaults to 0.
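The virtual-node layout described for num_virtual_nodes can be illustrated with a small pure-Python sketch. It is not dgllife's implementation: it assumes features are plain lists of floats (one row per node), that the indicator column is placed last, that virtual node ids follow the real ones, and that virtual edges run in both directions. The helpers pad_with_virtual_nodes and virtual_edges are hypothetical.

```python
def pad_with_virtual_nodes(real_feats, num_virtual_nodes):
    """Hypothetical helper: add virtual-node rows and a binary indicator column.

    Assumption: the indicator column is placed last. Real rows keep their
    values and gain a trailing 0.0; each virtual row is a zero vector of the
    same width with a trailing 1.0.
    """
    width = len(real_feats[0]) if real_feats else 0
    padded = [list(row) + [0.0] for row in real_feats]
    padded += [[0.0] * width + [1.0] for _ in range(num_virtual_nodes)]
    return padded


def virtual_edges(num_real_nodes, num_virtual_nodes):
    """Hypothetical helper: connect every virtual node to every real node.

    Assumptions: virtual node ids come after the real ones, and each
    connection is added in both directions.
    """
    src, dst = [], []
    for v in range(num_real_nodes, num_real_nodes + num_virtual_nodes):
        for u in range(num_real_nodes):
            src += [u, v]
            dst += [v, u]
    return src, dst


feats = [[6.0], [8.0], [6.0]]  # atomic numbers of the heavy atoms of 'CCO'
print(pad_with_virtual_nodes(feats, 1))
# [[6.0, 0.0], [8.0, 0.0], [6.0, 0.0], [0.0, 1.0]]
print(len(virtual_edges(3, 1)[0]))  # 6
```

Edge features would be padded the same way, with one row per virtual edge.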

Returns

Complete DGLGraph for the molecule if smiles is valid and None otherwise.

Return type

DGLGraph or None

Examples

>>> from dgllife.utils import smiles_to_complete_graph
>>> g = smiles_to_complete_graph('CCO')
>>> print(g)
DGLGraph(num_nodes=3, num_edges=6,
         ndata_schemes={}
         edata_schemes={})
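The edge count follows from the definition of a complete graph: every ordered pair of distinct nodes (i, j) gets an edge, plus (i, i) when self loops are requested. That gives n(n - 1) edges without self loops and n² with them, so the three heavy atoms of 'CCO' yield 6 edges. The pure-Python sketch below enumerates edges in the same i-then-j order that featurize_edges in the example below iterates over; the helper complete_graph_edges is hypothetical and not part of dgllife.

```python
def complete_graph_edges(num_nodes, add_self_loop=False):
    """Hypothetical helper: list the directed edges of a complete graph.

    Edges are emitted in row-major order (all destinations of node 0,
    then node 1, ...), skipping i == j unless add_self_loop is True.
    """
    src, dst = [], []
    for i in range(num_nodes):
        for j in range(num_nodes):
            if i != j or add_self_loop:
                src.append(i)
                dst.append(j)
    return src, dst


src, dst = complete_graph_edges(3)
print(len(src))  # 6, i.e. 3 * (3 - 1)
print(len(complete_graph_edges(3, add_self_loop=True)[0]))  # 9, i.e. 3 ** 2
print(len(complete_graph_edges(9)[0]))  # 72, i.e. 9 * 8 (CCO with explicit hydrogens)
```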

We can also initialize node/edge features when constructing graphs.

>>> import torch
>>> from rdkit import Chem
>>> from dgllife.utils import smiles_to_complete_graph
>>> from functools import partial
>>> def featurize_atoms(mol):
...     # One feature per atom: its atomic number
...     feats = [atom.GetAtomicNum() for atom in mol.GetAtoms()]
...     return {'atomic': torch.tensor(feats).reshape(-1, 1).float()}
>>> def featurize_edges(mol, add_self_loop=False):
...     # One feature per edge: the topological distance between its two atoms,
...     # enumerated over ordered atom pairs (i, j), skipping i == j
...     # unless self loops are added
...     feats = []
...     num_atoms = mol.GetNumAtoms()
...     distance_matrix = Chem.GetDistanceMatrix(mol)
...     for i in range(num_atoms):
...         for j in range(num_atoms):
...             if i != j or add_self_loop:
...                 feats.append(float(distance_matrix[i, j]))
...     return {'dist': torch.tensor(feats).reshape(-1, 1).float()}
>>> add_self_loop = True
>>> g = smiles_to_complete_graph(
...         'CCO', add_self_loop=add_self_loop, node_featurizer=featurize_atoms,
...         edge_featurizer=partial(featurize_edges, add_self_loop=add_self_loop))
>>> print(g.ndata['atomic'])
tensor([[6.],
        [8.],
        [6.]])
>>> print(g.edata['dist'])
tensor([[0.],
        [2.],
        [1.],
        [2.],
        [0.],
        [1.],
        [1.],
        [1.],
        [0.]])
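Chem.GetDistanceMatrix returns topological distances, i.e. the number of bonds on the shortest path between two atoms. The pure-Python sketch below recomputes the 'dist' column above with a breadth-first search. It assumes the atom order printed in g.ndata['atomic'] (C, O, C), in which the bonds are 0-2 and 1-2; that bond list is inferred from the printed output rather than taken from RDKit, and topological_distances is a hypothetical helper.

```python
from collections import deque


def topological_distances(num_atoms, bonds):
    """Hypothetical helper: all-pairs shortest-path lengths (in bonds) via BFS.

    Atoms unreachable from a start atom keep the value None.
    """
    neighbors = [[] for _ in range(num_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = []
    for start in range(num_atoms):
        row = [None] * num_atoms  # None marks atoms not reached yet
        row[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if row[v] is None:
                    row[v] = row[u] + 1
                    queue.append(v)
        dist.append(row)
    return dist


# Heavy atoms of 'CCO' in the order printed above: 0 = C, 1 = O, 2 = C.
dist = topological_distances(3, [(0, 2), (1, 2)])
# Flatten in the same i-then-j order as featurize_edges with add_self_loop=True.
flat = [float(dist[i][j]) for i in range(3) for j in range(3)]
print(flat)  # [0.0, 2.0, 1.0, 2.0, 0.0, 1.0, 1.0, 1.0, 0.0]
```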

By default, hydrogens are not represented explicitly as nodes. To include them, set explicit_hydrogens=True:

>>> g = smiles_to_complete_graph('CCO', explicit_hydrogens=True)
>>> print(g)
DGLGraph(num_nodes=9, num_edges=72,
         ndata_schemes={}
         edata_schemes={})