people.iiis.tsinghua.edu.cn/~jianli/courses/ATCS2018spring/DL6-embedding.pdf

Deep Learning - Embedding

Jian Li

IIIS, Tsinghua

Word2Vec

Word Embedding

• Map each word to a vector (in relatively low-dim space)
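A minimal sketch of this idea (the vocabulary, dimension, and names below are illustrative, not from the slides): an embedding is a lookup table in which each word indexes a row of a dense, learned matrix.

```python
import numpy as np

# Toy vocabulary; in practice it is built from the training corpus.
vocab = {"the": 0, "cat": 1, "sat": 2}
embedding_dim = 4                      # a relatively low-dimensional space
rng = np.random.default_rng(0)

# Embedding matrix: one row (vector) per word; the rows are learned during training.
E = rng.normal(size=(len(vocab), embedding_dim))

def embed(word):
    """Map a word to its vector by a simple row lookup."""
    return E[vocab[word]]

print(embed("cat"))                    # a dense 4-dimensional vector
```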

Word2Vec

Objective of Word2Vec
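The slide body is not reproduced in this extract; as a reference point, the standard skip-gram objective (Mikolov et al., 2013) maximizes the average log-probability of the context words within a window of radius c around each center word, with the conditional probability given by a softmax over the vocabulary:

```latex
\frac{1}{T}\sum_{t=1}^{T} \;\sum_{\substack{-c \le j \le c \\ j \neq 0}} \log p(w_{t+j} \mid w_t),
\qquad
p(w_O \mid w_I) = \frac{\exp\!\big({v'_{w_O}}^{\top} v_{w_I}\big)}{\sum_{w=1}^{V} \exp\!\big({v'_{w}}^{\top} v_{w_I}\big)}
```

Here v_w and v'_w denote the center-word and context-word vectors of word w.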

Training method 2
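The slide body is likewise not reproduced, and it is an assumption which training method it covers; in practice word2vec is trained with either hierarchical softmax or negative sampling. For reference, the skip-gram negative-sampling objective for a center word w_I and an observed context word w_O, with k noise words drawn from a noise distribution P_n(w), is:

```latex
\log \sigma\!\big({v'_{w_O}}^{\top} v_{w_I}\big)
+ \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)} \Big[\, \log \sigma\!\big(-{v'_{w_i}}^{\top} v_{w_I}\big) \Big]
```

where σ is the logistic sigmoid; this replaces the full softmax and makes training cheap.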

Notes

• Word2Vec: unsupervised learning
  • Huge amount of training data (no labels are needed)
• Can be incorporated into a deep learning pipeline
  • The corresponding layer is usually called the embedding layer
  • The vectors obtained from word2vec can be used to initialize the parameters of the NN (see the sketch below)
• We can also get an embedding by training a specified DNN (for a specific task)
  • Computational complexity is too high (much higher than word2vec)
  • The embedding may not be useful in other tasks
  • On the other hand, word2vec captures a lot of semantic information, which is useful in a variety of tasks
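A minimal sketch of the "initialize the embedding layer from word2vec" point above, assuming PyTorch and gensim (≥ 4) are available; the corpus and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn
from gensim.models import Word2Vec

# Train (or load) word2vec on an unlabeled corpus; no labels are needed.
sentences = [["the", "cat", "sat"], ["the", "dog", "ran"]]        # toy corpus
w2v = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# Copy the word2vec vectors into the embedding layer of a task-specific network.
emb = nn.Embedding(len(w2v.wv), w2v.vector_size)
with torch.no_grad():
    emb.weight.copy_(torch.tensor(w2v.wv.vectors))

# emb can now be plugged into a downstream DNN and fine-tuned (or kept frozen).
```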

According to Mikolov:

• CBOW (Continuous Bag of Words): use the context to predict the current word.
  -- several times faster to train than the skip-gram, slightly better accuracy for the frequent words
• Skip-gram: use the current word to predict the context.
  -- works well with a small amount of training data, represents well even rare words or phrases.
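In common implementations the CBOW/skip-gram choice above is a single flag; a hedged sketch using gensim (parameter values are illustrative):

```python
from gensim.models import Word2Vec

sentences = [["deep", "learning", "is", "fun"],
             ["word", "embeddings", "are", "useful"]]

# sg=0 selects CBOW (context -> current word); sg=1 selects skip-gram (current word -> context).
cbow = Word2Vec(sentences, sg=0, vector_size=100, window=5, min_count=1)
skipgram = Word2Vec(sentences, sg=1, vector_size=100, window=5, min_count=1)
```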

GloVe

GloVe: Global Vectors for Word Representation

(contrast with skip-gram, which works by predicting the context given a word)

Ratio Matters
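The ratio in question is the ratio of co-occurrence probabilities used to motivate GloVe (Pennington et al., 2014): for a probe word k and two target words i and j,

```latex
\frac{P(k \mid i)}{P(k \mid j)}, \qquad P(k \mid i) = \frac{X_{ik}}{X_i} = \frac{X_{ik}}{\sum_{m} X_{im}}
```

where X_{ik} counts how often k appears in the context of i. The ratio is large when k is related to i but not to j (the paper's example: k = solid for i = ice, j = steam), small in the reverse case, and close to 1 when k is related to both or to neither; GloVe is designed so that differences of word vectors capture these ratios.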

Derivation

Objective
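The derivation's end point, for reference, is the GloVe objective: a weighted least-squares regression on log co-occurrence counts,

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
```

where w_i and \tilde{w}_j are the word and context-word vectors, b_i and \tilde{b}_j are biases, and f downweights rare (and caps very frequent) co-occurrences.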

Reference

• Baroni, Marco, Georgiana Dinu, and Germán Kruszewski. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. ACL 2014.

• An early paper claims that the prediction formulation (like word2vec) is better than factorizing a co-occurrence matrix

• Levy, O., Goldberg, Y., & Dagan, I. (2015). Improving Distributional Similarity with Lessons Learned from Word Embeddings.

(comparing several methods: SGNS, GloVe, and SVD)

• A very reasonable blog post discussing the relations between the different models:

http://sebastianruder.com/secret-word2vec/index.html

• O. Levy, Y. Goldberg. Neural Word Embedding as Implicit Matrix Factorization. NIPS 2014. (on Blackboard)

• Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, Jie Tang. The 11th ACM International Conference on Web Search and Data Mining (WSDM 2018).

DeepWalk: embedding nodes in a social network

Embedding nodes (using pairwise relations)

Key Idea

Probability of u_k (the context word) conditioned on the vector of v_j (the center word)
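A minimal sketch of the DeepWalk recipe implied here, assuming networkx and gensim are available (the graph, walk length, and hyperparameters are illustrative): truncated random walks over the network are treated as sentences, and skip-gram then models the probability of a context node u_k given the vector of the center node v_j.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(G, num_walks=10, walk_length=40):
    """Generate truncated random walks; each walk plays the role of a sentence."""
    walks = []
    nodes = list(G.nodes())
    for _ in range(num_walks):
        random.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = list(G.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(random.choice(nbrs))
            walks.append([str(n) for n in walk])
    return walks

G = nx.karate_club_graph()                   # toy "social network"
walks = random_walks(G)
# Skip-gram over walks: context node u_k is predicted from the vector of center node v_j.
model = Word2Vec(walks, vector_size=64, window=5, min_count=0, sg=1)
node_vec = model.wv["0"]                     # embedding of node 0
```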

Multimodal representation learning: image captioning

Kiros et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

Overview

Map CNN codes and RNN codes to a common space

Details of SC-NLM. Please see the paper

Details

• LSTM notation used in this work

Details

• D: length of the CNN code (the CNN can be AlexNet, VGGNet, or ResNet)

• x: vector for the image (the CNN code mapped into the joint space via W_I)
• v: vector for the sentence (the LSTM encoding)
• W_T: word embedding matrix, precomputed using e.g. word2vec

Details

• Optimize a pairwise ranking loss (θ: the parameters to be learned, i.e. W_I and the LSTM parameters); the use of contrastive examples is similar to negative sampling in spirit

Max-margin formulation with margin \alpha: we incur a nonzero loss whenever s(x, v) is less than s(x, v_k) + \alpha.

\min_{\theta} \sum_{x}\sum_{k} \max\{0,\ \alpha - s(x, v) + s(x, v_k)\} + \sum_{v}\sum_{k} \max\{0,\ \alpha - s(v, x) + s(v, x_k)\}

• x: vector for the image
• v: vector for the sentence
• v_k: vector for a contrastive sentence
• x_k: vector for a contrastive image
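A hedged PyTorch sketch of this ranking loss, under the assumption that the contrastive sentences v_k and images x_k are simply the other items in a mini-batch and that s is cosine similarity (as in the paper):

```python
import torch
import torch.nn.functional as F

def pairwise_rank_loss(x, v, alpha=0.2):
    """x: (N, K) image embeddings, v: (N, K) embeddings of the matching sentences.
    Within the batch, the other sentences/images act as contrastive examples."""
    x = F.normalize(x, dim=1)            # cosine similarity = dot product of unit vectors
    v = F.normalize(v, dim=1)
    s = x @ v.t()                        # s[i, j] = s(x_i, v_j)
    pos = s.diag().unsqueeze(1)          # s(x_i, v_i), shape (N, 1)
    cost_s = F.relu(alpha - pos + s)     # alpha - s(x, v) + s(x, v_k): contrastive sentences
    cost_i = F.relu(alpha - pos.t() + s) # alpha - s(v, x) + s(v, x_k): contrastive images
    mask = torch.eye(s.size(0), dtype=torch.bool, device=s.device)
    cost_s = cost_s.masked_fill(mask, 0.0)   # drop the matching-pair terms themselves
    cost_i = cost_i.masked_fill(mask, 0.0)
    return cost_s.sum() + cost_i.sum()

# Example with random stand-ins for W_I @ (CNN code) and the LSTM sentence code.
x = torch.randn(8, 128)
v = torch.randn(8, 128)
print(pairwise_rank_loss(x, v))
```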
