Various Results
Created: 2022-08-17 16:09
#note
Easytour
LDA
Raw data: (LDA does not handle stop words, so the data has to be preprocessed first)
Number of topics ->
Diversity ->
NPMI ->
UCI ->
UMASS ->
C_V ->
Processed data:
Number of topics -> 20
Diversity -> 0.088
NPMI -> -0.004
UCI -> -0.06
UMASS -> -1.65
C_V -> 0.39
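The raw-data note above says LDA needs preprocessed input because it does not handle stop words. This note does not record the preprocessing pipeline that was actually used. The sketch below is a minimal illustration of the idea, assuming lowercasing, alphabetic tokenization, stop-word removal and a minimum token length. `STOP_WORDS`, `preprocess` and `min_len` are hypothetical names, and the stop-word list is a tiny stand-in rather than the one used for these results.

```python
import re

# Hypothetical, deliberately tiny stop-word list; the list actually used for
# these experiments is not recorded in this note.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "was",
              "were", "it", "for", "on", "with", "we", "this", "that", "at"}


def preprocess(doc: str, min_len: int = 3) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) >= min_len]


docs = ["The hotel was clean and the staff were friendly.",
        "We walked to the beach in ten minutes."]
print([preprocess(d) for d in docs])
# [['hotel', 'clean', 'staff', 'friendly'], ['walked', 'beach', 'ten', 'minutes']]
```

Without this step, frequent function words tend to dominate LDA's word-topic counts, which is why the raw-data LDA rows are left empty.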
NMF
Raw data:
Number of topics -> 20
Diversity -> 0.896
NPMI -> -0.09
UCI -> -3.89
UMASS -> -3.92
C_V -> 0.54
Processed data:
Number of topics -> 20
Diversity -> 0.79
NPMI -> 0.03
UCI -> -1.20
UMASS -> -3.79
C_V -> 0.55
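"Diversity" ranges from 0.088 to 0.896 across the models in this note. A common definition of topic diversity is the fraction of unique words among the top-k words of all topics, where 1.0 means no topic shares a top word with another. The note does not say which k (or which library) produced these numbers, so the sketch below assumes that definition with a hypothetical default of `topk=10`.

```python
def topic_diversity(topics: list[list[str]], topk: int = 10) -> float:
    """Fraction of unique words among the top-k words of all topics (1.0 = no overlap)."""
    top_words = [w for topic in topics for w in topic[:topk]]
    if not top_words:
        raise ValueError("need at least one topic with at least one word")
    return len(set(top_words)) / len(top_words)


# Two toy topics that share one of their three top words ("hotel").
topics = [["beach", "sea", "hotel"], ["hotel", "room", "staff"]]
print(topic_diversity(topics, topk=3))  # 5 unique words out of 6
```

Under this definition, a value around 0.09 (LDA on processed Easytour data) would mean most top words repeat across topics, while around 0.9 (NMF on raw Easytour data) would mean almost none do.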
Top2Vec
Raw data:
Number of topics -> 47
Diversity -> 0.30
NPMI -> 0.21
UCI -> -6.26
UMASS -> -5.13
C_V -> 0.37
Processed data:
Number of topics -> 53
Diversity -> 0.31
NPMI -> 0.04
UCI -> 0.19
UMASS -> -1.63
C_V -> 0.46
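UCI and NPMI are both built on pointwise mutual information between pairs of a topic's top words. UCI averages $\log\frac{P(w_i,w_j)+\epsilon}{P(w_i)P(w_j)}$ over pairs, and NPMI divides each PMI by $-\log P(w_i,w_j)$ so it lies in $[-1, 1]$. The note does not state how the probabilities were estimated. Many implementations count co-occurrences in sliding windows, whereas this sketch swaps in plain boolean document co-occurrence for simplicity. `uci_and_npmi` and `eps=1e-12` are illustrative choices, not the settings behind the numbers above.

```python
import math
from itertools import combinations


def uci_and_npmi(topic: list[str], docs: list[list[str]], eps: float = 1e-12):
    """Average pairwise PMI (UCI-style) and NPMI for one topic's top words.

    Probabilities are boolean document co-occurrence frequencies in `docs`
    (a list of token lists). Assumes the topic has at least two words and
    every topic word occurs in at least one document.
    """
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)

    def prob(*words: str) -> float:
        # Fraction of documents containing all of the given words.
        return sum(all(w in s for w in words) for s in doc_sets) / n

    pmis, npmis = [], []
    for wi, wj in combinations(topic, 2):
        pi, pj, pij = prob(wi), prob(wj), prob(wi, wj)
        pmis.append(math.log((pij + eps) / (pi * pj)))
        if pij == 0:
            npmis.append(-1.0)  # never co-occur: NPMI lower bound
        elif pij == 1:
            npmis.append(1.0)   # co-occur in every document: upper bound
        else:
            npmis.append(math.log(pij / (pi * pj)) / -math.log(pij))
    return sum(pmis) / len(pmis), sum(npmis) / len(npmis)


docs = [["beach", "sea", "sun"], ["beach", "sea"], ["hotel", "room"], ["hotel", "staff"]]
uci, npmi = uci_and_npmi(["beach", "sea", "hotel"], docs)
print(round(uci, 3), round(npmi, 3))
```

In this toy corpus "hotel" never co-occurs with "beach" or "sea". Each such pair therefore contributes $\log(\epsilon / \ldots)$ to the UCI average, which shows how a few non-co-occurring pairs can pull UCI far below zero while NPMI stays bounded.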
BERTopic (with sentence transformers)
Raw data:
Number of topics -> 64
Diversity -> 0.199
NPMI -> -0.06
UCI -> -1.87
UMASS -> -3.16
C_V -> 0.37
Processed data:
Number of topics -> 64
Diversity -> 0.299
NPMI -> -0.04
UCI -> -3.29
UMASS -> -7.19
C_V -> 0.47
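UMass is reported for every model, but the note does not show how it was computed. Under the usual definition, the top words $w_1, \dots, w_M$ are taken in order of topic probability, and each pair with $m > l$ is scored as $\log\frac{D(w_m, w_l) + 1}{D(w_l)}$, where $D$ counts documents (typically from the training corpus). The original formulation sums these scores. The sketch below averages them instead, as some libraries do, so its scale may differ from the numbers in this note. `umass` is an illustrative helper name.

```python
import math


def umass(topic: list[str], docs: list[list[str]]) -> float:
    """Mean over ordered pairs (m > l) of log((D(w_m, w_l) + 1) / D(w_l)).

    `topic` lists top words from most to least probable; D counts documents.
    Assumes at least two topic words, each occurring in some document.
    """
    doc_sets = [set(d) for d in docs]

    def count(*words: str) -> int:
        # Number of documents containing all of the given words.
        return sum(all(w in s for w in words) for s in doc_sets)

    scores = [math.log((count(topic[m], topic[l]) + 1) / count(topic[l]))
              for m in range(1, len(topic)) for l in range(m)]
    return sum(scores) / len(scores)


docs = [["beach", "sea", "sun"], ["beach", "sea"], ["hotel", "room"], ["hotel", "staff"]]
print(round(umass(["beach", "sea", "hotel"], docs), 3))  # -0.327
```

The +1 smoothing keeps the logarithm finite when two words never co-occur. The score can still be negative for each such pair, which is why UMass values are usually below zero, as they are throughout this note.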
BERTopic (with plain BERT)
Raw data:
Number of topics -> 8
Diversity -> 0.12
NPMI -> -0.03
UCI -> -0.20
UMASS -> -0.44
C_V -> 0.35
Processed data:
Number of topics ->
Diversity ->
NPMI ->
UCI ->
UMASS ->
C_V ->
RoBERTa
Raw data:
Number of topics -> 36
Diversity -> 0.10
NPMI -> -0.03
UCI -> -0.49
UMASS -> -0.98
C_V -> 0.33
Processed data:
Number of topics -> 43
Diversity -> 0.26
NPMI -> 0.05
UCI -> -0.73
UMASS -> -3.41
C_V -> 0.54
CTM
Raw data:
Number of topics -> 20
Diversity -> 0.24
NPMI -> -0.17
UCI -> -5.17
UMASS -> -5.87
C_V -> 0.41
Processed data:
Number of topics -> 20
Diversity -> 0.25
NPMI -> -0.005
UCI -> -1.69
UMASS -> -4.33
C_V -> 0.59
ETM (300 epochs, embeddings learnt by the model and 20 topics)
Raw data:
Number of topics -> 20
Diversity -> 0.27
NPMI -> -0.001
UCI -> -0.06
UMASS -> -1.06
C_V -> 0.32
Processed data:
Number of topics -> 20
Diversity -> 0.442
NPMI -> 0.05
UCI -> 0.34
UMASS -> -1.98
C_V -> 0.50
Tourpedia
LDA
Raw data: (LDA does not handle stop words, so the data has to be preprocessed first)
Number of topics ->
Diversity ->
NPMI ->
UCI ->
UMASS ->
C_V ->
Processed data:
Number of topics -> 20
Diversity -> 0.17
NPMI -> -0.07
UCI -> -2.12
UMASS -> -4.84
C_V -> 0.46
NMF
Raw data:
Number of topics -> 20
Diversity -> 0.632
NPMI -> -0.20
UCI -> -6.35
UMASS -> -9.20
C_V -> 0.45
Processed data:
Number of topics -> 20
Diversity -> 0.52
NPMI -> -0.18
UCI -> -5.95
UMASS -> -8.95
C_V -> 0.41
Top2Vec
Raw data:
Number of topics -> 122
Diversity -> 0.30
NPMI -> -0.30
UCI -> -8.51
UMASS -> -10.98
C_V -> 0.47
Processed data:
Number of topics -> 101
Diversity -> 0.26
NPMI -> -0.39
UCI -> -1.82
UMASS -> -2.30
C_V -> 0.40
BERTopic (sentence transformers)
Raw data:
Number of topics -> 141
Diversity -> 0.32
NPMI -> -0.22
UCI -> -7.49
UMASS -> -13.18
C_V -> 0.35
Processed data:
Number of topics -> 149
Diversity -> 0.32
NPMI -> nan
UCI -> nan
UMASS -> nan
C_V -> 0.37
BERTopic (plain BERT)
Raw data:
Number of topics -> 49
Diversity -> 0.24
NPMI -> -0.13
UCI -> -4.52
UMASS -> -8.89
C_V -> 0.40
Processed data:
Number of topics ->
Diversity ->
NPMI ->
UCI ->
UMASS ->
C_V ->
RoBERTa
Raw data:
Number of topics -> 16
Diversity -> 0.29
NPMI -> -0.18
UCI -> -5.52
UMASS -> -9.97
C_V -> 0.37
Processed data:
Number of topics -> 59
Diversity -> 0.32
NPMI -> -0.17
UCI -> -7.01
UMASS -> -13.93
C_V -> 0.40
CTM
Raw data:
Number of topics -> 20
Diversity -> 0.20
NPMI -> -0.21
UCI -> -6.11
UMASS -> -8.82
C_V -> 0.49
Processed data:
Number of topics -> 20
Diversity -> 0.24
NPMI -> -0.27
UCI -> -8.05
UMASS -> -12.22
C_V -> 0.51
ETM
Raw data:
Number of topics -> 20
Diversity -> 0.16
NPMI -> -0.004
UCI -> -0.08
UMASS -> -1.96
C_V -> 0.46
Processed data:
Number of topics -> 20
Diversity -> 0.18
NPMI -> -0.0005
UCI -> -0.64
UMASS -> -3.51
C_V -> 0.52