N-gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules
Shengchao Liu, Mehmet Furkan Demirel, Yingyu Liang
University of Wisconsin-Madison

Source: pages.cs.wisc.edu/~yliang/ngram_graph_presentation.pdf


Page 1:

N-gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules

Shengchao Liu, Mehmet Furkan Demirel, Yingyu Liang

University of Wisconsin-Madison

Page 2:

Machine Learning Progress

• Significant progress in Machine Learning

(Figure panels: computer vision, machine translation, game playing, medical imaging)

Page 3:

ML for Molecules?

Page 4:

ML for Molecules?

• Molecule property prediction

(Figure: a molecule goes into a machine learning model, which predicts Toxic / Not Toxic)

Page 5:

Challenge: Representations

• Input to traditional ML models: vectors

• How to represent molecules as vectors?
  • Fingerprints: Morgan fingerprints, etc.
  • Graph kernels: Weisfeiler-Lehman kernel, etc.
  • Graph Neural Networks (GNN): Graph CNN, Weave, etc.

• Fingerprints/kernels: unsupervised, fast to compute

• GNN: supervised end-to-end; more expensive, but powerful

Page 6:

Our method: N-gram Graphs

• Unsupervised

• Relatively fast to compute

• Strong prediction performance
  • Overall better than traditional fingerprints/kernels and popular GNNs

• Inspired by the N-gram approach in Natural Language Processing

Page 7:

N-gram Approach in NLP

• 𝑛-gram is a consecutive sequence of 𝑛 words in a sentence

• Example: “this molecule looks beautiful”
  • Its 2-grams: “this molecule”, “molecule looks”, “looks beautiful”

Page 8:

N-gram Approach in NLP

β€’ 𝑛-gram is a consecutive sequence of 𝑛 words in a sentence

β€’ Example: β€œthis molecule looks beautiful”

β€’ Its 2-grams: β€œthis molecule”, β€œmolecule looks”, β€œlooks beautiful”

β€’ N-gram count vector 𝑐(𝑛) is a numeric representation vector

β€’ coordinates correspond to all 𝑛-grams

β€’ coordinate value is the number of times the corresponding 𝑛-gram shows up in the sentence

β€’ Example: 𝑐(1) is just the histogram of the words in the sentence
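
A small Python sketch of these definitions (the variable names are ours, not from the slides): list the 2-grams of the example sentence and build the count vectors 𝑐(1) and 𝑐(2) as sparse counters.

    from collections import Counter

    sentence = "this molecule looks beautiful".split()

    # 2-grams: consecutive pairs of words
    bigrams = list(zip(sentence, sentence[1:]))
    # -> [('this', 'molecule'), ('molecule', 'looks'), ('looks', 'beautiful')]

    # c(n) as a sparse count vector: one coordinate per n-gram,
    # holding how often that n-gram shows up in the sentence
    c2 = Counter(bigrams)
    c1 = Counter(sentence)  # c(1) is just the word histogram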

Page 9:

Dimension Reduction by Embeddings

• The N-gram vector 𝑐(𝑛) is high-dimensional: |𝑉|ⁿ coordinates for vocabulary 𝑉

• Dimension reduction by word embeddings: 𝑓(1) = 𝑊𝑐(1)

Page 10:

Dimension Reduction by Embeddings

• The N-gram vector 𝑐(𝑛) is high-dimensional: |𝑉|ⁿ coordinates for vocabulary 𝑉

• Dimension reduction by word embeddings: 𝑓(1) = 𝑊𝑐(1)
  • The 𝑖-th column of 𝑊 is the embedding vector for the 𝑖-th word in the vocabulary

• 𝑓(1) is just the sum of the word vectors in the sentence!
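
A quick numerical check of this identity (a toy sketch; the vocabulary, dimension 𝑟 = 5, and random 𝑊 are illustrative assumptions):

    import numpy as np

    vocab = ["this", "molecule", "looks", "beautiful"]
    rng = np.random.default_rng(0)
    W = rng.standard_normal((5, len(vocab)))  # column i: vector of word i

    sentence = "this molecule looks beautiful".split()
    c1 = np.array([sentence.count(w) for w in vocab])  # count vector c(1)

    f1 = W @ c1  # f(1) = W c(1)
    # ...which equals the sum of the word vectors in the sentence:
    assert np.allclose(f1, sum(W[:, vocab.index(w)] for w in sentence))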

Page 11:

Dimension Reduction by Embeddings

• The N-gram vector 𝑐(𝑛) is high-dimensional: |𝑉|ⁿ coordinates for vocabulary 𝑉

• Dimension reduction by word embeddings: 𝑓(1) = 𝑊𝑐(1)

For general 𝑛:

• Embedding of an 𝑛-gram: entrywise product of its word vectors

• 𝑓(𝑛): sum of the embeddings of the 𝑛-grams in the sentence
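
A minimal sketch of 𝑓(𝑛) for general 𝑛, reusing the toy vocab and 𝑊 from the previous snippet (the function name is ours):

    import numpy as np

    def f_n(sentence, vocab, W, n):
        """f(n): sum, over the n-grams of the sentence, of the
        entrywise product of their word vectors."""
        idx = [vocab.index(w) for w in sentence]
        return sum(np.prod(W[:, idx[i:i + n]], axis=1)
                   for i in range(len(idx) - n + 1))

    # e.g. f(2) of the example sentence:
    # f2 = f_n("this molecule looks beautiful".split(), vocab, W, 2)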

Page 12:

N-gram Graphs

• Sentence: linear graph on words

• Molecule: graph on atoms with attributes

Analogy:

• Atoms with different attributes: different words

• Walks of length 𝑛: 𝑛-grams

Page 13:

N-gram Graphs

• Sentence: linear graph on words

• Molecule: graph on atoms with attributes

Analogy:

• Atoms with different attributes: different words

• Walks of length 𝑛: 𝑛-grams

(Figure: a molecular graph and its 2-grams)
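
For instance, on a toy three-atom fragment (a hypothetical example, not the slide's figure), the 2-grams are exactly the walks over two adjacent vertices:

    # Adjacency list for a toy C-C-O fragment: atoms 0 and 2 bond to atom 1
    graph = {0: [1], 1: [0, 2], 2: [1]}

    # 2-grams = walks of length 2
    two_grams = [(u, v) for u in graph for v in graph[u]]
    # -> [(0, 1), (1, 0), (1, 2), (2, 1)]: each edge in both directions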

Page 14:

N-gram Graph Algorithm

• Sentence: linear graph on words

• Molecule: graph on atoms with attributes

Given the embeddings for the atoms (vertex vectors):

• Enumerate all 𝑛-grams (walks of length 𝑛)

• Embedding of an 𝑛-gram: entrywise product of its vertex vectors

• 𝑓(𝑛): sum of the embeddings of the 𝑛-grams

• Final N-gram Graph embedding 𝑓𝐺: concatenation of 𝑓(1), …, 𝑓(𝑇)

Page 15:

N-gram Graph Algorithm

• Sentence: linear graph on words

• Molecule: graph on atoms with attributes

Given the embeddings for the atoms (vertex vectors):

• Enumerate all 𝑛-grams (walks of length 𝑛)

• Embedding of an 𝑛-gram: entrywise product of its vertex vectors

• 𝑓(𝑛): sum of the embeddings of the 𝑛-grams

• Final N-gram Graph embedding 𝑓𝐺: concatenation of 𝑓(1), …, 𝑓(𝑇)

• Vertex vectors: trained by an algorithm similar to node2vec
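
A direct, brute-force Python sketch of this algorithm, under our reading of the slides (walks may revisit vertices, and each walk is counted in both directions; the authors' implementation may differ in such details):

    import numpy as np

    def walks(A, n):
        """All walks of n vertices (consecutive ones adjacent) in the
        0/1 adjacency matrix A."""
        if n == 1:
            return [[v] for v in range(A.shape[0])]
        return [w + [v] for w in walks(A, n - 1)
                for v in range(A.shape[0]) if A[w[-1], v]]

    def ngram_graph(F, A, T):
        """f_G: concatenation of f(1), ..., f(T). Column v of F is the
        vertex vector of v; f(n) sums the entrywise products of the
        vertex vectors over all n-grams."""
        return np.concatenate([
            sum(np.prod(F[:, w], axis=1) for w in walks(A, n))
            for n in range(1, T + 1)])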

Page 16:

N-gram Graphs as Simple GNNs

• There is an efficient dynamic-programming version of the algorithm

• Given vectors 𝑓𝑖 for the vertices 𝑖 and the graph adjacency matrix 𝐴

• Equivalent to a simple GNN without parameters!
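
One way to realize this as a dynamic program (a sketch consistent with the brute-force version above, assuming 𝑛-grams are walks that may revisit vertices): keep a matrix 𝐹(𝑛) whose column 𝑣 sums the embeddings of all 𝑛-grams ending at vertex 𝑣; extending every walk by one vertex is then a matrix product with 𝐴 followed by an entrywise product with the vertex vectors.

    import numpy as np

    def ngram_graph_dp(F, A, T):
        """N-gram Graph embedding via dynamic programming.
        F: (r, num_vertices), column v is the vector of vertex v.
        A: (num_vertices, num_vertices) 0/1 adjacency matrix."""
        Fn = F.copy()                  # F(1): 1-grams end at each vertex
        fs = [Fn.sum(axis=1)]          # f(1)
        for n in range(2, T + 1):
            # (Fn @ A)[:, v] sums F(n-1) over the neighbors of v;
            # the entrywise product with F extends each walk by v.
            Fn = (Fn @ A) * F          # F(n)
            fs.append(Fn.sum(axis=1))  # f(n)
        return np.concatenate(fs)      # f_G

Each iteration is a fixed message-passing step with no learned weights, which is why this reads as a simple GNN without parameters.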

Page 17:

Experimental Results

• 60 tasks on 10 datasets from [1]

• Methods:
  • Weisfeiler-Lehman kernel + SVM
  • Morgan fingerprints + Random Forest (RF) or XGBoost (XGB)
  • GNNs: Graph CNN (GCNN), Weave Neural Network (Weave), Graph Isomorphism Network (GIN)
  • N-gram Graphs + Random Forest (RF) or XGBoost (XGB)

• Vertex embedding dimension 𝑟 = 100 and 𝑇 = 6

[1] Wu, Zhenqin, et al. "MoleculeNet: a benchmark for molecular machine learning." Chemical Science 9.2 (2018): 513-530.

Page 18:

Experimental Results

• N-gram+XGB: top-1 on 21 of the 60 tasks, and top-3 on 48

• Overall better than the other methods

Page 19:

Runtime

• Relatively fast

Page 20:

Theoretical Analysis

• Recall 𝑓(1) = 𝑊𝑐(1)
  • 𝑊 is the vertex embedding matrix
  • 𝑐(1) is the count vector

• With sparse 𝑐(1) and random 𝑊, 𝑐(1) can be recovered from 𝑓(1)
  • Well-known in compressed sensing

Page 21:

Theoretical Analysis

• Recall 𝑓(1) = 𝑊𝑐(1)
  • 𝑊 is the vertex embedding matrix
  • 𝑐(1) is the count vector

• With sparse 𝑐(1) and random 𝑊, 𝑐(1) can be recovered from 𝑓(1)
  • Well-known in compressed sensing

• In general, 𝑓(𝑛) = 𝑇(𝑛)𝑐(𝑛) for some linear mapping 𝑇(𝑛) depending on 𝑊

• With sparse 𝑐(𝑛) and random 𝑊, 𝑐(𝑛) can be recovered from 𝑓(𝑛)

Page 22:

Theoretical Analysis

• Recall 𝑓(1) = 𝑊𝑐(1)
  • 𝑊 is the vertex embedding matrix
  • 𝑐(1) is the count vector

• With sparse 𝑐(1) and random 𝑊, 𝑐(1) can be recovered from 𝑓(1)
  • Well-known in compressed sensing

• In general, 𝑓(𝑛) = 𝑇(𝑛)𝑐(𝑛) for some linear mapping 𝑇(𝑛) depending on 𝑊

• With sparse 𝑐(𝑛) and random 𝑊, 𝑐(𝑛) can be recovered from 𝑓(𝑛)

• So 𝑓(𝑛) preserves the information in 𝑐(𝑛)

• Furthermore, one can prove that a regularized linear classifier on 𝑓(𝑛) is competitive with the best linear classifier on 𝑐(𝑛)
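
A quick numerical illustration of the 𝑛 = 1 recovery claim (not the paper's proof; the sizes and the use of scikit-learn's orthogonal matching pursuit are our choices):

    import numpy as np
    from sklearn.linear_model import OrthogonalMatchingPursuit

    rng = np.random.default_rng(0)
    r, V = 100, 1000                 # embedding dim r << vocabulary size V
    W = rng.standard_normal((r, V))  # random embedding matrix

    c = np.zeros(V)                  # sparse count vector c(1)
    support = rng.choice(V, size=5, replace=False)
    c[support] = rng.integers(1, 4, size=5)

    f = W @ c                        # compressed representation f(1)

    # f = W c is underdetermined, but sparsity makes recovery possible
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5,
                                    fit_intercept=False).fit(W, f)
    print(np.allclose(omp.coef_, c))  # True with high probability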

Page 23:

THANK YOU!