dgllife.utils.SMILESToBigraph

class dgllife.utils.SMILESToBigraph(add_self_loop=False, node_featurizer=None, edge_featurizer=None, canonical_atom_order=True, explicit_hydrogens=False, num_virtual_nodes=0)

Convert SMILES strings into bi-directed DGLGraphs and featurize them.

Parameters

add_self_loop (bool) – Whether to add self loops to the DGLGraphs. Defaults to False.
node_featurizer (callable, rdkit.Chem.rdchem.Mol -> dict) – Featurization for nodes, such as atoms in a molecule. The returned dict can be used to update ndata of a DGLGraph. Defaults to None.
edge_featurizer (callable, rdkit.Chem.rdchem.Mol -> dict) – Featurization for edges, such as bonds in a molecule. The returned dict can be used to update edata of a DGLGraph. Defaults to None.
canonical_atom_order (bool) – Whether to use the canonical atom order returned by RDKit. Setting it to True might change the order of atoms in the constructed graph. Defaults to True.
explicit_hydrogens (bool) – Whether to explicitly represent hydrogens as nodes in the graph. If True, rdkit.Chem.AddHs(mol) is called. Defaults to False.
num_virtual_nodes (int) – The number of virtual nodes to add. Each virtual node is connected to all real nodes with virtual edges. If the returned graph has any node/edge features, each feature gets an additional binary column that marks virtual nodes/edges. The features of virtual nodes/edges are zero vectors except for that additional column. Defaults to 0.
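The feature layout described for num_virtual_nodes can be sketched in plain Python, without DGL or PyTorch. This is only an illustration of the description above, not the library's implementation: the helper name add_virtual_node_rows is hypothetical, and it assumes the indicator column is 0 for real nodes and 1 for virtual nodes.

```python
def add_virtual_node_rows(feats, num_virtual_nodes):
    """Sketch of the node-feature layout after adding virtual nodes.

    feats: list of per-node feature rows (lists of floats).
    Assumption: the extra indicator column is 0.0 for real nodes
    and 1.0 for virtual nodes.
    """
    if num_virtual_nodes == 0:
        # No virtual nodes requested: features are left unchanged.
        return [list(row) for row in feats]
    width = len(feats[0]) if feats else 0
    # Real nodes keep their features and get a 0.0 indicator.
    real = [list(row) + [0.0] for row in feats]
    # Virtual nodes are all zeros except for the 1.0 indicator.
    virtual = [[0.0] * width + [1.0] for _ in range(num_virtual_nodes)]
    return real + virtual


# Atomic numbers for three real atoms, one feature column each.
atomic = [[6.0], [8.0], [6.0]]
extended = add_virtual_node_rows(atomic, num_virtual_nodes=1)
print(extended)  # [[6.0, 0.0], [8.0, 0.0], [6.0, 0.0], [0.0, 1.0]]
```

Under these assumptions, a feature of shape (3, 1) becomes (4, 2) with one virtual node: one extra row for the virtual node and one extra column for the indicator.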

Examples

>>> import torch
>>> from rdkit import Chem
>>> from dgllife.utils import SMILESToBigraph

>>> # A custom node featurizer
>>> def featurize_atoms(mol):
...     feats = []
...     for atom in mol.GetAtoms():
...         feats.append(atom.GetAtomicNum())
...     return {'atomic': torch.tensor(feats).reshape(-1, 1).float()}

>>> # A custom edge featurizer
>>> def featurize_bonds(mol):
...     feats = []
...     bond_types = [Chem.rdchem.BondType.SINGLE, Chem.rdchem.BondType.DOUBLE,
...                   Chem.rdchem.BondType.TRIPLE, Chem.rdchem.BondType.AROMATIC]
...     for bond in mol.GetBonds():
...         btype = bond_types.index(bond.GetBondType())
...         # One bond between atom u and v corresponds to two edges (u, v) and (v, u)
...         feats.extend([btype, btype])
...     return {'type': torch.tensor(feats).reshape(-1, 1).float()}

>>> smi_to_g = SMILESToBigraph(node_featurizer=featurize_atoms,
...                            edge_featurizer=featurize_bonds)
>>> g = smi_to_g('CCO')
>>> print(g.ndata['atomic'])
tensor([[6.],
        [8.],
        [6.]])
>>> print(g.edata['type'])
tensor([[0.],
        [0.],
        [0.],
        [0.]])
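The four rows in g.edata['type'] follow from the bi-directed convention: each bond contributes two directed edges. A minimal plain-Python sketch of that bookkeeping is shown here. The helper bonds_to_directed_edges is hypothetical, the atom indices are illustrative rather than the ones RDKit assigns, and placing self loops after the bond edges is an assumption.

```python
def bonds_to_directed_edges(bonds, num_atoms, add_self_loop=False):
    """Expand undirected bonds into directed (src, dst) edge lists.

    bonds: list of (u, v) atom-index pairs, one per bond.
    Assumption: self loops, if requested, are appended after bond edges.
    """
    src, dst = [], []
    for u, v in bonds:
        # One bond (u, v) yields the two edges (u, v) and (v, u).
        src.extend([u, v])
        dst.extend([v, u])
    if add_self_loop:
        # One extra edge (i, i) per atom.
        src.extend(range(num_atoms))
        dst.extend(range(num_atoms))
    return src, dst


# A three-atom chain with two bonds, as in a molecule like 'CCO'.
bonds = [(0, 1), (1, 2)]
src, dst = bonds_to_directed_edges(bonds, num_atoms=3)
print(list(zip(src, dst)))  # [(0, 1), (1, 0), (1, 2), (2, 1)]
```

This is why an edge featurizer such as featurize_bonds emits each bond's feature twice: the edge feature tensor needs one row per directed edge, in the same order as the edges.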

__init__(add_self_loop=False, node_featurizer=None, edge_featurizer=None, canonical_atom_order=True, explicit_hydrogens=False, num_virtual_nodes=0)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__([add_self_loop, node_featurizer, …])

Initialize self.