Paper Review TO-DO

Created: 2022-08-08 09:05
#to-do

Align with Maryam on the approaches used and the metrics used to evaluate them.

WEClustering with sentence transformer and HDBSCAN: BERT with sentence transformer -> I think that the method used by Maryam is more similar to BERTopic than to Machine Learning/Unsupervised/Clustering/WEClustering -> check this
Evaluation metrics:
- PMI
- inter/intra cluster
- topic coherence (paper)
- topic diversity and predictive accuracy (paper)
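Reference sketch for the PMI item above: document-level PMI between two words, assuming co-occurrence is counted per document (a sliding window is another common choice). The function name `pmi` and the toy corpus are illustrative only and are not taken from our datasets.

```python
import math

def pmi(word_a, word_b, documents):
    """Pointwise mutual information from document-level co-occurrence.

    PMI(a, b) = log(P(a, b) / (P(a) * P(b))), where each probability is the
    fraction of documents containing the word(s).
    Returns None when the words never co-occur (PMI would be -infinity).
    """
    n_docs = len(documents)
    doc_sets = [set(doc) for doc in documents]
    count_a = sum(word_a in d for d in doc_sets)
    count_b = sum(word_b in d for d in doc_sets)
    count_ab = sum(word_a in d and word_b in d for d in doc_sets)
    if count_ab == 0:
        return None
    p_a = count_a / n_docs
    p_b = count_b / n_docs
    p_ab = count_ab / n_docs
    return math.log(p_ab / (p_a * p_b))

# Toy corpus (hypothetical), already tokenized.
docs = [
    ["hotel", "room", "clean"],
    ["hotel", "room", "noisy"],
    ["beach", "sun"],
    ["beach", "hotel"],
]
hotel_room = pmi("hotel", "room", docs)   # positive: the words attract
beach_hotel = pmi("beach", "hotel", docs) # negative: rarer together than chance
```

Topic coherence scores such as NPMI build on this by averaging a normalized PMI over the word pairs of each topic.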
ALBERT and XLNet
Other approaches:
- NMF (here and here)
- LDA2Vec
- Neural topic modeling here
Test Top2Vec with deep-learn
Add section about tests on Top2Vec with several embedding models and chunking, with graphic comparison
update results table
Test n-gram vs no n-gram
divide results into qualitative and quantitative results
improve plots
write sections on paper
datasets statistics
Study IRBO formula
tests on raw data:
- tourpedia
  - bert
  - roberta
  - sentence-tr
  - top2vec
- easytour:
  - bert
  - roberta
  - sentence-tr
  - top2vec
add embedding coherence
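
Reference sketch for the IRBO item in the list above. IRBO is commonly defined as one minus the average pairwise rank-biased overlap (RBO) between the top-word lists of the topics, so 1 means fully distinct topics and 0 means identical ones. The sketch assumes equal-length lists without duplicates and uses the extrapolated RBO form; the function names and the default `p=0.9` are assumptions for illustration, to be checked against the formula actually used in the paper.

```python
from itertools import combinations

def rbo_ext(list_a, list_b, p=0.9):
    """Extrapolated rank-biased overlap of two equal-length ranked lists.

    RBO_ext = A_k * p^k + (1 - p) / p * sum_{d=1..k} A_d * p^d,
    where A_d is the overlap of the two top-d prefixes divided by d.
    """
    k = len(list_a)
    if k == 0 or k != len(list_b):
        raise ValueError("lists must be non-empty and of equal length")
    seen_a, seen_b = set(), set()
    weighted_sum = 0.0
    agreement = 0.0
    for depth in range(1, k + 1):
        seen_a.add(list_a[depth - 1])
        seen_b.add(list_b[depth - 1])
        agreement = len(seen_a & seen_b) / depth
        weighted_sum += agreement * p ** depth
    return agreement * p ** k + (1 - p) / p * weighted_sum

def irbo(topics, p=0.9):
    """Inverted RBO: 1 minus the mean pairwise RBO over all topic pairs."""
    pairs = list(combinations(topics, 2))
    if not pairs:
        raise ValueError("need at least two topics")
    return 1.0 - sum(rbo_ext(a, b, p) for a, b in pairs) / len(pairs)

# Hypothetical topics, each a ranked list of top words.
disjoint_topics = [["hotel", "room", "bed"], ["beach", "sun", "sea"]]
identical_topics = [["hotel", "room", "bed"], ["hotel", "room", "bed"]]
partial_topics = [["hotel", "room", "bed"], ["hotel", "beach", "sun"]]
```

Unlike plain topic diversity, IRBO is rank-aware: a word shared at the top of two topics lowers the score more than one shared near the bottom.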

We should not preprocess data before feeding it to BERT models:

Evaluation parameters:

Calinski-Harabasz -> survey (1000+ citations), new survey; most of the papers are about clustering, but some are about topic modeling (most of those have fewer than 30 citations and concern the medical field)
Davies-Bouldin -> number of topics optimization (26 citations), LDA; similar to Calinski-Harabasz but less used
Perplexity -> needs held-out likelihood from a probabilistic topic model (not labels)
Inter/intra cluster distance -> topic modeling for Twitter (32 citations); there are also other papers
Topic diversity now works properly and could be interesting for our paper
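
Reference sketch for topic diversity, which is usually defined as the fraction of unique words among the top-k words of all topics. The function name `topic_diversity`, the default `top_k=10`, and the example topics are assumptions for illustration; this is not necessarily the implementation used in our experiments.

```python
def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic.

    1.0 means no word is shared between topics; values near 0 mean the
    topics are largely redundant.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("topics must contain at least one word")
    return len(set(top_words)) / len(top_words)

# Hypothetical topics, each a ranked list of top words.
example_topics = [
    ["hotel", "room", "clean"],
    ["hotel", "beach", "sun"],
]
diversity = topic_diversity(example_topics, top_k=3)  # 5 unique words out of 6
```

The score depends on `top_k`, so the same value should be used for every model in the results table.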

Devised models: