Tales of Data Science

A tale about LDA2vec: when LDA meets word2vec


UPD: Following a very useful comment by Oren, I see that I went too far in describing the differences between word2vec and LDA; in fact, they are not so different from an algorithmic point of view. So I have corrected this post. Errare humanum est, stultum est in errore perseverare, you know. I also strongly recommend reading this presentation by Yoav Goldberg.

A few days ago I discovered lda2vec (by Chris Moody), a hybrid algorithm that combines the best ideas of the well-known LDA (Latent Dirichlet Allocation) topic modeling algorithm with those of word2vec, a somewhat less well-known tool for language modeling.

You can also read this text in Russian, if you like.

And now I’m going to tell you a tale about lda2vec and my attempts to try it out and compare it with a simple LDA implementation (I used the gensim package for this). So, once upon a time…

