dgllife.utils.smiles_to_complete_graph
dgllife.utils.smiles_to_complete_graph(smiles, add_self_loop=False, node_featurizer=None, edge_featurizer=None, canonical_atom_order=True, explicit_hydrogens=False, num_virtual_nodes=0)
Convert a SMILES string into a complete DGLGraph, in which every pair of distinct atoms is connected by an edge in each direction, and featurize it.
- Parameters
smiles (str) – SMILES string of the molecule.
add_self_loop (bool) – Whether to add self loops to the DGLGraph. Defaults to False.
node_featurizer (callable, rdkit.Chem.rdchem.Mol -> dict) – Featurizer for nodes, such as atoms in a molecule. The dict it returns is used to update ndata of the DGLGraph. Defaults to None.
edge_featurizer (callable, rdkit.Chem.rdchem.Mol -> dict) – Featurizer for edges, such as bonds in a molecule. The dict it returns is used to update edata of the DGLGraph. Defaults to None.
canonical_atom_order (bool) – Whether to use the canonical atom order returned by RDKit. Setting it to True might change the order of atoms in the constructed graph. Defaults to True.
explicit_hydrogens (bool) – Whether to explicitly represent hydrogens as nodes in the graph. If True, rdkit.Chem.AddHs(mol) is called. Defaults to False.
num_virtual_nodes (int) – The number of virtual nodes to add. Each virtual node is connected to all real nodes with virtual edges. If the returned graph has any node/edge feature, an additional column of binary values is appended to each feature to indicate whether a node/edge is virtual. The features of virtual nodes/edges are zero vectors except for this additional column. Defaults to 0.
- Returns
Complete DGLGraph for the molecule if smiles is valid and None otherwise.
- Return type
DGLGraph or None
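Because the function returns None for an invalid SMILES string, code that converts many molecules usually has to separate failures from successes. The following is a minimal sketch of that pattern and is not part of dgllife: `build_graphs` and its `to_graph` parameter are hypothetical names, and `to_graph` stands in for any converter with the same return contract, such as `smiles_to_complete_graph`.

```python
from typing import Any, Callable, List, Optional, Tuple


def build_graphs(
    smiles_list: List[str],
    to_graph: Callable[[str], Optional[Any]],
) -> Tuple[List[Any], List[str]]:
    """Convert each SMILES string and separate successes from failures.

    `to_graph` is assumed to return a graph for a valid SMILES string and
    None otherwise, which is the contract documented above.
    """
    graphs, failed = [], []
    for smiles in smiles_list:
        g = to_graph(smiles)
        if g is None:
            failed.append(smiles)  # remember inputs that could not be converted
        else:
            graphs.append(g)
    return graphs, failed
```

With dgllife installed, passing `smiles_to_complete_graph` (or a `functools.partial` of it with featurizers bound) as `to_graph` would be one way to use this helper.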
Examples
>>> from dgllife.utils import smiles_to_complete_graph
>>> g = smiles_to_complete_graph('CCO')
>>> print(g)
DGLGraph(num_nodes=3, num_edges=6,
         ndata_schemes={}
         edata_schemes={})
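The edge count printed above follows from the graph being complete: every ordered pair of distinct nodes gets an edge, and each node gets one more edge if self loops are added. A small sketch of that arithmetic, using a hypothetical helper `expected_num_edges` that is not part of dgllife and that ignores virtual nodes:

```python
def expected_num_edges(num_nodes: int, add_self_loop: bool = False) -> int:
    """Number of directed edges in a complete graph over `num_nodes` nodes."""
    num_edges = num_nodes * (num_nodes - 1)  # one edge per ordered pair (i, j), i != j
    if add_self_loop:
        num_edges += num_nodes  # one extra edge (i, i) per node
    return num_edges


print(expected_num_edges(3))                      # 'CCO' without hydrogens: 6
print(expected_num_edges(3, add_self_loop=True))  # 9
print(expected_num_edges(9))                      # 'CCO' with explicit hydrogens: 72
```

Since the edge count grows quadratically with the number of atoms, complete graphs are best suited to small molecules.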
We can also initialize node/edge features when constructing graphs.
>>> import torch
>>> from rdkit import Chem
>>> from dgllife.utils import smiles_to_complete_graph
>>> from functools import partial

>>> def featurize_atoms(mol):
...     feats = []
...     for atom in mol.GetAtoms():
...         feats.append(atom.GetAtomicNum())
...     return {'atomic': torch.tensor(feats).reshape(-1, 1).float()}

>>> def featurize_edges(mol, add_self_loop=False):
...     feats = []
...     num_atoms = mol.GetNumAtoms()
...     atoms = list(mol.GetAtoms())
...     distance_matrix = Chem.GetDistanceMatrix(mol)
...     for i in range(num_atoms):
...         for j in range(num_atoms):
...             if i != j or add_self_loop:
...                 feats.append(float(distance_matrix[i, j]))
...     return {'dist': torch.tensor(feats).reshape(-1, 1).float()}

>>> add_self_loop = True
>>> g = smiles_to_complete_graph(
...     'CCO', add_self_loop=add_self_loop, node_featurizer=featurize_atoms,
...     edge_featurizer=partial(featurize_edges, add_self_loop=add_self_loop))
>>> print(g.ndata['atomic'])
tensor([[6.],
        [8.],
        [6.]])
>>> print(g.edata['dist'])
tensor([[0.],
        [2.],
        [1.],
        [2.],
        [0.],
        [1.],
        [1.],
        [1.],
        [0.]])
By default, hydrogens are not explicitly represented as nodes. To include them, set explicit_hydrogens=True as follows.
>>> g = smiles_to_complete_graph('CCO', explicit_hydrogens=True)
>>> print(g)
DGLGraph(num_nodes=9, num_edges=72,
         ndata_schemes={}
         edata_schemes={})
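The description of `num_virtual_nodes` says that each feature gains a binary indicator column and that virtual entries are zero apart from that column. The sketch below illustrates that layout on plain Python lists rather than tensors. `append_virtual_rows` is a hypothetical helper written only for illustration; it assumes virtual rows are placed after the real ones and is not dgllife's internal implementation.

```python
from typing import List


def append_virtual_rows(feats: List[List[float]], num_virtual: int) -> List[List[float]]:
    """Add an indicator column, then append zero rows flagged as virtual."""
    width = len(feats[0]) if feats else 0
    # Real rows keep their values and get 0.0 in the new last column.
    out = [row + [0.0] for row in feats]
    # Virtual rows are all zeros except for 1.0 in the new last column.
    out.extend([0.0] * width + [1.0] for _ in range(num_virtual))
    return out


atomic = [[6.0], [8.0], [6.0]]  # atomic numbers, as in the 'atomic' feature above
print(append_virtual_rows(atomic, 1))
# [[6.0, 0.0], [8.0, 0.0], [6.0, 0.0], [0.0, 1.0]]
```

The indicator column lets a model tell a virtual node from a real atom even though the rest of the virtual node's feature vector is zero.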
See also