Collaborative Topic Modeling

Created: 2022-05-03 10:41
#paper

Main idea

Collaborative Topic Modeling (CTM) is a recommender system for text-based items built upon Probabilistic Matrix Factorization (PMF) and LDA.

CTM improves on plain PMF because it can make out-of-matrix predictions, i.e. it can derive a latent vector of qualities for items that have no ratings yet.

In depth

The latent qualities vector for a document $i$, $Q_i$, is represented as $Q_i = \theta_i + \epsilon_i$, where $\theta_i$ is the ($K \times 1$) vector of topic proportions for item $i$ obtained from traditional LDA estimates, and $\epsilon_i$ is a ($K \times 1$) offset vector that adjusts the topic proportions by taking the ratings into account.
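A minimal sketch of how this decomposition gives out-of-matrix predictions. It assumes the usual PMF prediction rule (rating = inner product of user and item vectors) and that an item without ratings gets a zero offset, so its vector falls back to the topic proportions. The function name and all numbers are made up for illustration.

```python
import numpy as np

def predict_rating(p_u, theta_i, eps_i=None):
    """Predicted rating P_u^T Q_i, with Q_i = theta_i + eps_i."""
    # Assumption: for an unrated item no offset can be learned,
    # so eps_i is taken to be zero and Q_i = theta_i.
    if eps_i is None:
        eps_i = np.zeros_like(theta_i)
    q_i = theta_i + eps_i
    return float(p_u @ q_i)

p_u = np.array([1.0, 0.5, 2.0])       # hypothetical user vector, K = 3
theta_i = np.array([0.2, 0.3, 0.5])   # topic proportions (sum to 1)
eps_i = np.array([0.1, -0.1, 0.0])    # hypothetical learned offset

in_matrix = predict_rating(p_u, theta_i, eps_i)   # item with ratings
out_of_matrix = predict_rating(p_u, theta_i)      # item without ratings
```

Plain PMF has no counterpart of the second call, since it has nothing to build an item vector from when the item has no ratings.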

The generative process for CTM is shown in the following image:
CTM_algorithm.png

Where $\sigma^2_P$ and $\sigma^2_Q$ represent the variances we impose a priori on the distribution of the elements of the vectors in $P$ and $Q$. Similarly, $\sigma^2_u$ represents the variance we impose a priori on the distribution of the ratings. $Dir(\alpha)$ is the Dirichlet distribution.
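The generative process can also be read as a sampling procedure. The sketch below assumes the standard formulation (Gaussian user vectors, Dirichlet topic proportions plus a Gaussian offset for items, LDA-style word draws, Gaussian ratings); the image above remains the reference. All sizes and hyperparameter values are arbitrary toy choices, and `beta` (the topic-word distributions) is drawn at random here only to have something to sample from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes and hyperparameters (assumptions, not values from the paper)
n_users, n_items, K, vocab_size, doc_len = 4, 3, 5, 20, 30
sigma_P, sigma_Q, sigma = 1.0, 0.1, 0.5             # standard deviations
alpha = np.ones(K)                                  # Dirichlet parameter
beta = rng.dirichlet(np.ones(vocab_size), size=K)   # K x V topic-word distributions

# Users: P_u ~ N(0, sigma_P^2 I)
P = rng.normal(0.0, sigma_P, size=(n_users, K))

# Items: theta_i ~ Dir(alpha), eps_i ~ N(0, sigma_Q^2 I), Q_i = theta_i + eps_i
theta = rng.dirichlet(alpha, size=n_items)
eps = rng.normal(0.0, sigma_Q, size=(n_items, K))
Q = theta + eps

# Words: for each position draw a topic z ~ Mult(theta_i), then a word w ~ Mult(beta_z)
docs = []
for i in range(n_items):
    z = rng.choice(K, size=doc_len, p=theta[i])
    w = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])
    docs.append(w)

# Ratings: R_ui ~ N(P_u^T Q_i, sigma^2)
R = rng.normal(P @ Q.T, sigma)
```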

So we are interested in learning the parameters $\theta_i$, $Q_i$ and $P_u$. We can do this using Maximum Likelihood, where the likelihood of our data is defined as:
$$p(P,Q,\theta,R|\sigma_P,\sigma_Q,\sigma,\beta,\alpha) = p(P|\sigma^2_P)\,p(Q|\theta,\sigma^2_Q)\,p(\theta,w|\alpha,\beta)\,p(R|P,Q,\sigma^2)$$

As is often the case, it is more convenient to work with the log-likelihood.
Working on each component independently:

First Component:
CTM_log1.png

Second Component:
CTM_log2.png

Third Component:
CTM_log3.1.png
Which, after setting $\alpha = 1$ for convenience, becomes:
CTM_log3.2.png

Fourth Component:
CTM_log4.png

Then, by reassembling everything:
CTM_final_log.png
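As a rough numerical counterpart, here is a sketch of the four components as one function. It assumes the usual forms (zero-mean Gaussian prior on P, Gaussian prior on Q centred on $\theta$, LDA word likelihood with $\alpha = 1$ so the Dirichlet term is constant, Gaussian rating likelihood) and drops all additive constants; the images above remain the reference. The `mask` argument (1 where a rating is observed) and the toy values are illustrative assumptions.

```python
import numpy as np

def log_likelihood(P, Q, theta, beta, docs, R, mask, sigma_P, sigma_Q, sigma):
    """Log-likelihood up to additive constants, with alpha = 1."""
    # First component: Gaussian prior on the user vectors
    ll_P = -0.5 / sigma_P**2 * np.sum(P**2)
    # Second component: Gaussian prior on Q centred on theta
    ll_Q = -0.5 / sigma_Q**2 * np.sum((Q - theta)**2)
    # Third component: sum over words of log(sum_k theta_ik * beta_k,w)
    ll_w = sum(np.sum(np.log(theta[i] @ beta[:, w])) for i, w in enumerate(docs))
    # Fourth component: Gaussian likelihood of the observed ratings
    resid = (R - P @ Q.T) * mask
    ll_R = -0.5 / sigma**2 * np.sum(resid**2)
    return ll_P + ll_Q + ll_w + ll_R

# Toy example: 1 user, 1 item, K = 2 topics, vocabulary of 2 words
P = np.array([[1.0, 0.0]])
theta = np.array([[0.5, 0.5]])
Q = theta.copy()                       # zero offset
beta = np.full((2, 2), 0.5)            # uniform topic-word distributions
docs = [np.array([0, 1])]              # word ids of the single document
R = np.array([[1.0]])
mask = np.array([[1.0]])

value = log_likelihood(P, Q, theta, beta, docs, R, mask, 1.0, 1.0, 1.0)
```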

To estimate the parameters, we obtain $\theta$ using standard LDA and then optimize for [P, Q] via gradient ascent (the original paper used coordinate ascent for the optimization, but this is computationally heavier).
The gradient of the log-likelihood with respect to [P,Q] is:
CTM_gradient.png
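A minimal sketch of the gradient-ascent step with $\theta$ held fixed. The gradients are derived from the Gaussian terms that involve P and Q, assuming the standard forms of those terms, so check them against the image above before relying on them. The function names, learning rate, step count, and toy data are illustrative assumptions.

```python
import numpy as np

def objective(P, Q, theta, R, mask, sigma_P, sigma_Q, sigma):
    """Terms of the log-likelihood that depend on P and Q (theta fixed)."""
    resid = (R - P @ Q.T) * mask
    return (-0.5 / sigma_P**2 * np.sum(P**2)
            - 0.5 / sigma_Q**2 * np.sum((Q - theta)**2)
            - 0.5 / sigma**2 * np.sum(resid**2))

def fit(theta, R, mask, sigma_P=1.0, sigma_Q=1.0, sigma=1.0,
        lr=0.02, n_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    n_users, K = R.shape[0], theta.shape[1]
    P = 0.1 * rng.standard_normal((n_users, K))
    Q = theta.copy()                   # start from a zero offset
    history = []
    for _ in range(n_steps):
        resid = (R - P @ Q.T) * mask   # errors on observed ratings only
        grad_P = -P / sigma_P**2 + resid @ Q / sigma**2
        grad_Q = -(Q - theta) / sigma_Q**2 + resid.T @ P / sigma**2
        P = P + lr * grad_P            # ascent: move along the gradient
        Q = Q + lr * grad_Q
        history.append(objective(P, Q, theta, R, mask, sigma_P, sigma_Q, sigma))
    return P, Q, history

# Toy data: 2 users, 3 items, K = 2; theta would come from LDA in practice
theta = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
R = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]])
mask = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])   # 1 = rating observed

P, Q, history = fit(theta, R, mask)
```

The learned offset can be recovered afterwards as `Q - theta`.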

References

  1. Paper
  2. TowardsDataScience
  3. Probabilistic Matrix Factorization
  4. LDA

Code