LDA
Created: 2022-05-03 09:59
#note
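LDA is a Bayesian version of PLSA (Probabilistic Latent Semantic Analysis): it places Dirichlet priors on the topic mixture of each document and on the word distribution of each topic.
Latent Dirichlet Allocation works as follows: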
- Specify the number of topics;
- Each word is randomly assigned to one of the topics (technically, the topic proportions follow a Dirichlet distribution, i.e. the proportions assigned across the topics add up to 1);
- Topic assignments for each word are updated iteratively, based on the prevalence of the word across the topics and the prevalence of the topics in the document. These updates work on raw word counts (bag of words); TF-IDF weighting is not part of the LDA model itself.
- Stop when iterations begin to have little impact on the probabilities assigned to each word.
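
The steps above can be sketched with collapsed Gibbs sampling, which is one common way to fit LDA (the note does not say which inference method is meant, so this choice is an assumption). The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the fixed iteration count are all illustrative. A real implementation would check convergence instead of running a fixed number of sweeps. Documents are assumed to be lists of integer word ids.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative sketch, not optimized)."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # topic counts per document
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                              # total words per topic
    assignments = []

    # Step 2: assign every word occurrence to a random topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    # Step 3: repeatedly resample each assignment from its conditional distribution.
    # Assumption: a fixed number of sweeps stands in for a convergence check (step 4).
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Weight = (topic prevalence in the document)
                #        * (word prevalence in the topic).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed estimates: theta = topics per document, phi = words per topic.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(n + beta) / (topic_total[k] + vocab_size * beta) for n in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi


if __name__ == "__main__":
    # Tiny made-up corpus: word ids 0-1 and 2-3 tend to co-occur.
    toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0]]
    theta, phi = lda_gibbs(toy_docs, num_topics=2, vocab_size=4)
    print(theta)
    print(phi)
```

Each row of `theta` and `phi` sums to 1, which mirrors the Dirichlet constraint mentioned in the second step. The counts used throughout are plain integer word counts.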
There is also a non-parametric version of LDA, which uses Hierarchical Dirichlet Processes to infer the number of topics from the data instead of fixing it in advance (paper).
References
Code
Tags
#topic_modeling