N-gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules
Shengchao Liu, Mehmet Furkan Demirel, Yingyu Liang
University of Wisconsin-Madison
Machine Learning Progress
• Significant progress in machine learning
[Images: computer vision, machine translation, game playing, medical imaging]
ML for Molecules?
• Molecule property prediction
[Diagram: molecule → Machine Learning Model → Toxic or Not Toxic]
Challenge: Representations
• Input to traditional ML models: vectors
• How to represent molecules as vectors?
  • Fingerprints: Morgan fingerprints, etc.
  • Graph kernels: Weisfeiler-Lehman kernel, etc.
  • Graph Neural Networks (GNNs): Graph CNN, Weave, etc.
• Fingerprints/kernels: unsupervised, fast to compute
• GNNs: supervised end-to-end, more expensive, but powerful
Our method: N-gram Graphs
• Unsupervised
• Relatively fast to compute
• Strong prediction performance
  • Overall better than traditional fingerprints/kernels and popular GNNs
• Inspired by the N-gram approach in natural language processing
N-gram Approach in NLP
• An n-gram is a consecutive sequence of n words in a sentence
  • Example: "this molecule looks beautiful"
  • Its 2-grams: "this molecule", "molecule looks", "looks beautiful"
• The n-gram count vector c(n) is a numeric representation of the sentence
  • Coordinates correspond to all possible n-grams
  • The coordinate value is the number of times the corresponding n-gram appears in the sentence
  • Example: c(1) is just the histogram of the words in the sentence (a toy sketch follows)
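A minimal Python sketch (not from the slides) of the 2-grams and the n-gram count vector for the example sentence; indexing coordinates by tuples over the vocabulary is an illustrative choice:

```python
# Toy n-grams and n-gram count vector; the sentence is the slide's example.
from collections import Counter
from itertools import product

sentence = "this molecule looks beautiful".split()

def ngrams(words, n):
    """All consecutive n-word subsequences of the sentence."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams(sentence, 2))
# [('this', 'molecule'), ('molecule', 'looks'), ('looks', 'beautiful')]

# c(n): one coordinate per possible n-gram over the vocabulary V,
# holding its number of occurrences in the sentence (|V|^n coordinates).
vocab = sorted(set(sentence))
counts = Counter(ngrams(sentence, 2))
c2 = [counts[g] for g in product(vocab, repeat=2)]
```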
Dimension Reduction by Embeddings
• The n-gram count vector c(n) is high-dimensional: |V|^n for vocabulary V
• Dimension reduction by word embeddings: f(1) = W c(1)
  • The i-th column of W is the embedding vector for the i-th word in the vocabulary
  • f(1) is just the sum of the word vectors in the sentence!
For general n:
• Embedding of an n-gram: entrywise product of its word vectors
• f(n): sum of the embeddings of the n-grams in the sentence (sketch below)
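A minimal sketch of the reduction, with random vectors standing in for trained word embeddings; the names `word_vec` and `f_n` are illustrative:

```python
# f(1) = W c(1) is just the sum of the sentence's word vectors; for
# general n, each n-gram is embedded by the entrywise product of its
# word vectors, and f(n) sums these over the sentence.
import numpy as np

vocab = ["this", "molecule", "looks", "beautiful"]
r = 8                                        # toy embedding dimension
rng = np.random.default_rng(0)
W = rng.standard_normal((r, len(vocab)))     # i-th column: i-th word's vector
word_vec = {w: W[:, i] for i, w in enumerate(vocab)}

sentence = ["this", "molecule", "looks", "beautiful"]

def f_n(words, n):
    grams = [words[i:i + n] for i in range(len(words) - n + 1)]
    return sum(np.prod([word_vec[w] for w in g], axis=0) for g in grams)

f1 = sum(word_vec[w] for w in sentence)      # sum of word vectors
assert np.allclose(f_n(sentence, 1), f1)     # agrees with f(1) = W c(1)
```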
N-gram Graphs
• Sentence: linear graph on words
• Molecule: graph on atoms with attributes
Analogy:
• Atoms with different attributes: different words
• Walks of length n: n-grams
[Figure: a molecular graph and its 2-grams]
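A toy enumeration of n-grams as walks, on a hypothetical 4-atom fragment (the adjacency list below is made up, not the slide's molecule). Here a walk on n vertices may revisit vertices, matching the dynamic program sketched later:

```python
# Enumerate the n-grams (walks on n vertices) of a toy molecular graph.
adj = {                     # hypothetical fragment: atoms and bonds
    "C1": ["C2", "O"],
    "C2": ["C1", "N"],
    "O":  ["C1"],
    "N":  ["C2"],
}

def walks(adj, n):
    """All walks on n vertices; consecutive vertices must be adjacent."""
    ws = [[v] for v in adj]
    for _ in range(n - 1):
        ws = [w + [u] for w in ws for u in adj[w[-1]]]
    return ws

print(walks(adj, 2))        # the 2-grams: ['C1', 'C2'], ['C1', 'O'], ...
```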
N-gram Graph Algorithm
• Sentence: linear graph on words
• Molecule: graph on atoms with attributes
Given the embeddings for the atoms (vertex vectors):
• Enumerate all n-grams (walks of length n)
• Embedding of an n-gram: entrywise product of its vertex vectors
• f(n): sum of the embeddings of the n-grams
• Final N-gram Graph embedding f_G: concatenation of f(1), ..., f(n)
• Vertex vectors: trained by an algorithm similar to node2vec (see the sketch below)
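A minimal, self-contained sketch of the whole embedding for one toy molecule. The random vertex vectors stand in for the node2vec-style trained ones, and the exact walk definition (revisits allowed, both directions counted) is an assumption for illustration:

```python
# N-gram Graph embedding f_G for a toy molecule: enumerate walks,
# embed each by the entrywise product of its vertex vectors, sum per n,
# then concatenate f(1), ..., f(n_max).
import numpy as np

adj = {"C1": ["C2", "O"], "C2": ["C1", "N"], "O": ["C1"], "N": ["C2"]}
r, n_max = 8, 3                                # toy dimension and max n
rng = np.random.default_rng(0)
vertex_vec = {v: rng.standard_normal(r) for v in adj}

def walks(n):
    ws = [[v] for v in adj]
    for _ in range(n - 1):
        ws = [w + [u] for w in ws for u in adj[w[-1]]]
    return ws

def f(n):
    """f(n): sum over n-grams of the entrywise product of vertex vectors."""
    return sum(np.prod([vertex_vec[v] for v in w], axis=0) for w in walks(n))

f_G = np.concatenate([f(n) for n in range(1, n_max + 1)])
print(f_G.shape)                               # (r * n_max,) == (24,)
```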
N-gram Graphs as Simple GNNs
• Efficient dynamic-programming version of the algorithm
• Given vectors f_i for the vertices i and the graph adjacency matrix A
• Equivalent to a simple GNN without parameters! (sketch below)
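A sketch of the dynamic-programming view, continuing the toy setup above (reusing `adj`, `vertex_vec`, `n_max`, and `f_G`): the same f(n) values fall out of repeated adjacency multiplications and entrywise products, i.e. message passing with no learnable parameters:

```python
# Dynamic program: row u of F holds the summed embeddings of all walks
# ending at u; multiplying by A extends every walk by one vertex.
import numpy as np

nodes = list(adj)
m = len(nodes)
A = np.zeros((m, m))                           # adjacency matrix
for i, v in enumerate(nodes):
    for u in adj[v]:
        A[i, nodes.index(u)] = 1.0

V = np.stack([vertex_vec[v] for v in nodes])   # m x r vertex vectors

F = V.copy()
fs = [F.sum(axis=0)]                           # f(1): sum of vertex vectors
for _ in range(2, n_max + 1):
    F = (A @ F) * V                            # extend walks, entrywise product
    fs.append(F.sum(axis=0))                   # f(n): sum over all n-grams

assert np.allclose(np.concatenate(fs), f_G)    # matches the enumeration above
```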
Experimental Results
• 60 tasks on 10 datasets from [1]
• Methods:
  • Weisfeiler-Lehman kernel + SVM
  • Morgan fingerprints + Random Forest (RF) or XGBoost (XGB)
  • GNNs: Graph CNN (GCNN), Weave Neural Network (Weave), Graph Isomorphism Network (GIN)
  • N-gram Graphs + Random Forest (RF) or XGBoost (XGB)
• Vertex embedding dimension r = 100, with n up to 6 (an illustrative pipeline sketch follows the reference)
[1] Wu, Zhenqin, et al. "MoleculeNet: a benchmark for molecular machine learning." Chemical Science 9.2 (2018): 513-530.
Experimental Results
• N-gram+XGB: top-1 on 21 of the 60 tasks, and top-3 on 48
• Overall better than the other methods
Runtime
• Relatively fast
Theoretical Analysis
• Recall f(1) = W c(1)
  • W is the vertex embedding matrix
  • c(1) is the count vector
• With sparse c(1) and random W, c(1) can be recovered from f(1)
  • Well known in compressed sensing
• In general, f(n) = W(n) c(n) for some linear mapping W(n) depending on W
  • With sparse c(n) and random W, c(n) can be recovered from f(n)
• So f(n) preserves the information in c(n)
• Furthermore, we can prove that a regularized linear classifier on f(n) is competitive with the best linear classifier on c(n) (a toy recovery demo follows)
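An illustrative compressed-sensing demo of the recovery claim (not the paper's proof): with a random Gaussian W and a sparse count vector, orthogonal matching pursuit from scikit-learn recovers c(1) from f(1) = W c(1):

```python
# Recover a sparse count vector c from its low-dimensional embedding f = W c.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
d, r, k = 1000, 100, 5               # vocabulary size, embedding dim, sparsity
W = rng.standard_normal((r, d))      # random embedding matrix

c = np.zeros(d)
support = rng.choice(d, size=k, replace=False)
c[support] = rng.integers(1, 4, size=k).astype(float)   # sparse counts

f = W @ c                            # the observed embedding f(1)

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
omp.fit(W, f)
print(np.allclose(omp.coef_, c, atol=1e-6))   # True: c recovered from f
```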
THANK YOU!