Word2vec

Word2vec is a group of related models used to produce word embeddings. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words. Word2vec takes a large corpus of text as input and produces a vector space, typically of several hundred dimensions, in which each unique word in the corpus is assigned a corresponding vector. Word vectors are positioned so that words sharing common contexts in the corpus lie close to one another in the space.
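To make the "shallow, two-layer network" description concrete, here is a minimal pure-Python sketch of one word2vec variant, skip-gram with negative sampling. It is an illustration under stated assumptions, not the original implementation: the toy corpus, the small dimension (`dim=10` rather than several hundred), all hyperparameters, and the function names `skipgram_pairs`, `train_sgns` and `cosine` are hypothetical, and negatives are drawn uniformly instead of from the smoothed unigram distribution the original tool uses. The first layer is the matrix of input word vectors `w_in`, the second the matrix of output context vectors `w_out`; training pushes each word's vector toward the output vectors of words that actually occur near it and away from randomly sampled words.

```python
import math
import random


def skipgram_pairs(tokens, window=2):
    """Pair every token with each neighbour at most `window` positions away."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-x))


def train_sgns(tokens, dim=10, window=2, negatives=3, epochs=50, lr=0.05, seed=0):
    """Skip-gram with negative sampling (simplified sketch: uniform negatives)."""
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    idx = {w: i for i, w in enumerate(vocab)}
    # Layer 1: input word vectors (small random init). Layer 2: output context vectors (zeros).
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]
    pairs = skipgram_pairs(tokens, window)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for center, context in pairs:
            v = w_in[idx[center]]
            grad_v = [0.0] * dim
            # One true context word (label 1) plus random "negative" words (label 0);
            # a sampled negative that equals the true context word is skipped.
            negs = [rng.randrange(len(vocab)) for _ in range(negatives)]
            targets = [(idx[context], 1.0)] + [(n, 0.0) for n in negs if n != idx[context]]
            for t, label in targets:
                u = w_out[t]
                score = sigmoid(sum(a * b for a, b in zip(v, u)))
                g = lr * (label - score)  # gradient step on the logistic loss
                for k in range(dim):
                    grad_v[k] += g * u[k]  # accumulate the update for the input vector
                    u[k] += g * v[k]       # update the output vector in place
            for k in range(dim):
                v[k] += grad_v[k]
    # The input vectors are kept as the word embeddings.
    return {w: w_in[idx[w]] for w in vocab}


def cosine(a, b):
    """Cosine similarity: the usual measure of 'closeness' between word vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


corpus = ("the cat drinks milk . the dog drinks milk . "
          "the cat chases the mouse . the dog chases the mouse .").split()
vectors = train_sgns(corpus)
print(len(vectors), len(vectors["cat"]))  # 8 10
print(round(cosine(vectors["cat"], vectors["dog"]), 3))
```

On a corpus this small the similarity values are noisy, so the sketch is about the mechanism rather than the numbers. Because `cat` and `dog` occur in identical contexts here, their input vectors receive nearly the same training signal, which is what tends to pull words with shared contexts together in the space.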

Word2vec was created by a team of researchers led by Tomas Mikolov at Google. The algorithm has since been analysed and explained by other researchers. Embedding vectors created with Word2vec have a number of advantages over earlier algorithms such as latent semantic analysis (LSA).

Favorite sites

Wikipedia (en) Word2vec
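GitHub - facebook research - Pre-trained word vectors (language models for 90 languages, pre-trained and released by Facebook)
[Recommended] Summary of word2vec-related theory ¹
[Recommended] Recent trends in deep learning-based natural language processing ²
Deep learning chatbots, PART 1 – INTRODUCTION (Korean translation)
Deep learning chatbots, PART 2 – IMPLEMENTING A RETRIEVAL-BASED MODEL IN TENSORFLOW (Korean translation)
SlideShare - Natural language processing techniques in machine learning (I)
5-1. Processing natural language (NLP) with TensorFlow – Word Embedding (word2vec)
Extracting product-related keywords with Word2Vec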

References

¹ Word2vec_related_theorem_-_Beomsu_Kim.pdf
² Recent_Trends_in_Deep_Learning_Based_Natural_Language_Processing_-_ko.pdf
