2nd phase
Created: 2022-06-11 10:14
#note
New features:
- user actions are profiled;
- initial questions to define 80 different types of users (distinction on two levels) -> we already have an initial list of recommended items.
Will users be able to follow/be friends with other users (as in any social network)?
Clustering
Survey of clustering/topic modelling methods for Italian texts on Airbnb and Tripadvisor data -> Airbnb has more topics than Tripadvisor, where reviews are more similar to each other. Goal: compare the approaches and find the best one for each dataset, given its unique aspects.
- Add more evaluation methods (Evaluate Clustering);
- Other approaches:
  - GANs;
  - Graphs list of papers;
  - non-parametric models -> DeepDPM.pdf
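
As one possible extra evaluation method for the list above, here is a minimal sketch of the silhouette coefficient in plain Python. It assumes documents are already embedded as numeric vectors and uses Euclidean distance; both are assumptions for illustration (cosine distance may suit text embeddings better), and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (between -1 and 1)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the closest other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Values close to 1 suggest compact, well-separated clusters; values below 0 suggest many points sit closer to a different cluster than their own.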
Right now, non-parametric models based on embeddings seem to be the best choice -> will this hold for the other datasets?
Recommender system
Ideas:
- Based on the clustering methods I tried -> Top2Vec seems to be the best model;
- Multi-layered graphs -> one for users, one for documents and one for words (check entity2rec and path recommendations);
- Hybrid approach -> several smaller methods combined.
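
A rough sketch of what the hybrid idea could look like: each smaller method returns its own item scores, the scores are min-max normalised so they are comparable, and a weighted sum produces the final ranking. The recommender names, weights and scores are made up for illustration; weighted score fusion is only one of several ways to combine methods.

```python
def min_max(scores):
    """Rescale a {item: score} dict to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def hybrid_rank(score_sets, weights, top_k=3):
    """Combine several recommenders' scores with a weighted sum."""
    combined = {}
    for name, scores in score_sets.items():
        if not scores:
            continue
        for item, s in min_max(scores).items():
            combined[item] = combined.get(item, 0.0) + weights[name] * s
    # Highest combined score first; ties broken by item id for determinism
    return sorted(combined, key=lambda item: (-combined[item], item))[:top_k]


# Hypothetical outputs of two smaller methods
score_sets = {
    "topic": {"a": 0.9, "b": 0.1, "c": 0.5},
    "graph": {"b": 5.0, "c": 4.0, "d": 1.0},
}
ranking = hybrid_rank(score_sets, {"topic": 0.5, "graph": 0.5})
```

Items missing from one method simply get no contribution from it, which favours items that several methods agree on.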
Real time clustering
Goal: find a way to update clusters in real time given new data.
BERTopic can track how topics evolve over time in the dataset.
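
To make the real-time goal concrete, here is a minimal sketch of sequential (online) k-means: each new point is assigned to its nearest centroid, which is then moved towards the point by a running-mean step. This is a generic technique swapped in for illustration, not how BERTopic works; the class name is hypothetical and the initial centroids are assumed to come from an earlier offline clustering run.

```python
import math


class OnlineCentroids:
    """Sequential k-means: centroids are running means of assigned points."""

    def __init__(self, centroids):
        self.centroids = [list(c) for c in centroids]
        # Each initial centroid counts as one observation
        self.counts = [1] * len(self.centroids)

    def update(self, point):
        """Assign point to its nearest centroid and shift that centroid."""
        k = min(
            range(len(self.centroids)),
            key=lambda i: math.dist(point, self.centroids[i]),
        )
        self.counts[k] += 1
        rate = 1.0 / self.counts[k]  # running-mean step size
        self.centroids[k] = [
            c + rate * (x - c) for c, x in zip(self.centroids[k], point)
        ]
        return k


model = OnlineCentroids([(0.0, 0.0), (10.0, 10.0)])
cluster = model.update((2.0, 0.0))  # assigned to the first centroid
```

This keeps the number of clusters fixed, so it would not cover new topics appearing over time; that case would need a non-parametric or periodically refitted model.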
Real time recommendations
Sequential RS -> predict documents that could interest the user based on their interactions in a given session.
Candidate models: RNNs, Transformers, word2vec-like algorithms.
Do we have enough data for the sessions?
ig2Vec -> given an interaction with a document, retrieve users similar to the owner of the liked document (check FAISS), then filter and rank them.
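
The retrieve, filter and rank steps can be sketched with a brute-force cosine search, which is a stand-in for what an index such as FAISS would do at scale. The user ids, vectors and the `excluded` filter (e.g. users already followed) are hypothetical, and user embeddings are assumed to exist already.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_users(owner_id, user_vectors, excluded=(), top_k=2):
    """Rank users by similarity to the owner of the liked document."""
    owner_vec = user_vectors[owner_id]
    skip = set(excluded) | {owner_id}  # filter step
    scored = [
        (cosine(owner_vec, vec), uid)
        for uid, vec in user_vectors.items()
        if uid not in skip
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # rank step
    return [uid for _, uid in scored[:top_k]]


# Hypothetical user embeddings
user_vectors = {
    "owner": [1.0, 0.0],
    "u1": [0.9, 0.1],
    "u2": [0.0, 1.0],
    "u3": [1.0, 0.5],
    "u4": [1.0, 0.05],
}
candidates = similar_users("owner", user_vectors, excluded=["u4"])
```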
A session can be seen as a language modelling problem in which we want to predict the next item (like in this paper from Amazon Science).
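
The simplest instance of this language-modelling view is a bigram (first-order Markov) model over items: count which item follows which across sessions and predict the most frequent successors. This is only a baseline sketch with made-up sessions, not the method from the paper mentioned above; RNN or Transformer models replace the count table with a learned model.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count item -> next-item transitions across all sessions."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, last_item, top_k=2):
    """Return the most frequent successors of the last item seen."""
    counts = transitions.get(last_item)
    if not counts:
        return []  # unseen item: no prediction (cold start)
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_k]]


# Hypothetical sessions, each an ordered list of document ids
sessions = [["a", "b", "c"], ["a", "b", "d"], ["b", "c"], ["a", "c"]]
transitions = fit_transitions(sessions)
next_items = predict_next(transitions, "b")
```

A baseline like this also gives a quick way to check whether there is enough session data: if most items have very few observed successors, heavier sequence models are unlikely to do better.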
If I can obtain good results with BERTopic and the Concept library, I could use them to choose the images to show in the feed -> given a recommended post, search for the images most similar to that post and show them.
Short-term preferences can be captured by a model like BERT4Rec, but I need something for long-term preferences too.
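
One simple option to explore for the long-term side: keep a per-user profile vector updated as an exponential moving average of session vectors, and blend it with the current session when scoring items. This is an assumption for illustration, not a tested design; `decay`, `alpha` and the vectors are hypothetical, and the session vector is assumed to come from whatever short-term model is used.

```python
def update_profile(profile, session_vector, decay=0.9):
    """Exponential moving average: old sessions fade, recent ones weigh more."""
    return [decay * p + (1 - decay) * s for p, s in zip(profile, session_vector)]


def blended_score(item_vec, session_vec, profile, alpha=0.7):
    """Mix short-term (session) and long-term (profile) affinity for an item."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return alpha * dot(item_vec, session_vec) + (1 - alpha) * dot(item_vec, profile)


# Hypothetical 2-d vectors
profile = update_profile([1.0, 0.0], [0.0, 1.0])
score = blended_score([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

A higher `alpha` makes recommendations follow the current session more closely; a higher `decay` makes the long-term profile change more slowly.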
References
Code
Tags
#clustering #dl #real_time #session_based