Random Notes On Clustering

Random notes on clustering

Created: 2022-05-21 17:36
#note

Top2Vec embedds documents and words together. Document are assigned to a single topic;
BERTopic embedds documents, reduces the dimensionality and clusters documents. Then it uses TF-IDF to extract topic representation from each cluster. Documents are assigned to a single topic;
Machine Learning/Unsupervised/Clustering/WEClustering is similar to BERTopic but it clusters the embeddings, it defines a topic as a cluster obtained in the previous step and then it clusters the documents based on the class TF-IDF (sum of the TF-IDFs of the words of the embedding that are in a cluster);
Combined Topic Modelling uses Autoencoder on embeddings which have been concatenated to BoW.