Implementing Probabilistic Matrix Factorization

Here is a basic implementation of Probabilistic Matrix Factorization (PMF) using only NumPy. It does not rely on a probabilistic programming library; instead, it fits the user and item latent factors with stochastic gradient descent (SGD) on the observed entries of the rating matrix.

Step 1: Import Necessary Libraries

Python

import numpy as np

Step 2: Define PMF Class

This code defines a PMF class that factorizes a rating matrix with Stochastic Gradient Descent (SGD). Missing ratings are encoded as 0. The class initializes the user and item latent factor matrices with small random values, updates them once per observed entry in every epoch, and prints the Mean Squared Error (MSE) over the observed entries after each epoch to track training progress. It provides methods for training, predicting a single rating, computing the MSE, and generating the full predicted matrix.
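Written out, the per-entry update can be sketched as follows, assuming the usual L2-regularized squared-error loss. The symbols are notation introduced here: $u_i$ and $v_j$ are row $i$ of U and row $j$ of V, $\eta$ is learning_rate, and $\lambda$ is reg_param.

$$
e_{ij} = R_{ij} - u_i^\top v_j, \qquad
L_{ij} = \tfrac{1}{2}\, e_{ij}^2 + \tfrac{\lambda}{2}\left(\lVert u_i \rVert^2 + \lVert v_j \rVert^2\right)
$$

$$
u_i \leftarrow u_i + \eta\,(e_{ij}\, v_j - \lambda\, u_i), \qquad
v_j \leftarrow v_j + \eta\,(e_{ij}\, u_i - \lambda\, v_j)
$$

Each update is a step along the negative gradient of $L_{ij}$, since $\partial L_{ij} / \partial u_i = -e_{ij} v_j + \lambda u_i$ and symmetrically for $v_j$. A simultaneous gradient step evaluates both right-hand sides at the values of $u_i$ and $v_j$ from before the step.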

Python

class PMF:
    def __init__(self, R, num_factors, learning_rate=0.01, reg_param=0.01, num_epochs=100):
        self.R = R
        self.num_users, self.num_items = R.shape
        self.num_factors = num_factors
        self.learning_rate = learning_rate
        self.reg_param = reg_param
        self.num_epochs = num_epochs

    def train(self):
        # Initialize user and item latent factor matrices
        self.U = np.random.normal(scale=1./self.num_factors, size=(self.num_users, self.num_factors))
        self.V = np.random.normal(scale=1./self.num_factors, size=(self.num_items, self.num_factors))

        # Training using Stochastic Gradient Descent (SGD)
        for epoch in range(self.num_epochs):
            for i in range(self.num_users):
                for j in range(self.num_items):
                    if self.R[i, j] != 0:  # 0 marks a missing entry; observed ratings may be negative
                        prediction = self.predict(i, j)
                        error = self.R[i, j] - prediction

                        # Update latent factors; keep the pre-update user vector
                        # so that both steps use the same gradient point
                        u_old = self.U[i, :].copy()
                        self.U[i, :] += self.learning_rate * (error * self.V[j, :] - self.reg_param * u_old)
                        self.V[j, :] += self.learning_rate * (error * u_old - self.reg_param * self.V[j, :])

            mse = self.compute_mse()
            print(f'Epoch: {epoch+1}, MSE: {mse}')

    def predict(self, i, j):
        return np.dot(self.U[i, :], self.V[j, :])

    def compute_mse(self):
        # Mean squared error over the observed (non-zero) entries only
        xs, ys = self.R.nonzero()
        predicted = self.full_matrix()
        return np.mean((self.R[xs, ys] - predicted[xs, ys]) ** 2)

    def full_matrix(self):
        return np.dot(self.U, self.V.T)
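The training loop above scans every cell of R in a fixed order. As a hedged alternative, the sketch below visits only the observed entries and shuffles their order each epoch, which is closer to what "stochastic" usually means. The function names sgd_epoch and observed_mse, and the toy matrix, are hypothetical and not part of the class above.

```python
import numpy as np

def sgd_epoch(R, U, V, lr=0.01, reg=0.01, rng=None):
    """Run one in-place SGD pass over the non-zero entries of R in random order."""
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = R.nonzero()              # indices of observed entries
    order = rng.permutation(len(rows))    # shuffle the visiting order
    for k in order:
        i, j = rows[k], cols[k]
        error = R[i, j] - U[i] @ V[j]
        u_old = U[i].copy()               # keep the pre-update user vector
        U[i] += lr * (error * V[j] - reg * u_old)
        V[j] += lr * (error * u_old - reg * V[j])
    return U, V

def observed_mse(R, U, V):
    """Mean squared error over the non-zero entries of R."""
    rows, cols = R.nonzero()
    return np.mean((R[rows, cols] - (U @ V.T)[rows, cols]) ** 2)

# Tiny demonstration on a hypothetical 3x2 rating matrix (0 = missing)
rng = np.random.default_rng(0)
R_demo = np.array([[5.0, 0.0], [0.0, 3.0], [4.0, 1.0]])
U_demo = rng.normal(scale=0.1, size=(3, 2))
V_demo = rng.normal(scale=0.1, size=(2, 2))

mse_before = observed_mse(R_demo, U_demo, V_demo)
for _ in range(200):
    sgd_epoch(R_demo, U_demo, V_demo, lr=0.05, reg=0.01, rng=rng)
mse_after = observed_mse(R_demo, U_demo, V_demo)
print(mse_before, mse_after)
```

Precomputing the observed indices with nonzero() also avoids the cost of checking every cell when the matrix is sparse.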

Step 3: Generate Synthetic Data and Train PMF Model

This code generates synthetic data to simulate a recommendation system. It first draws true latent factor matrices for users and items, then builds a rating matrix from their product and adds Gaussian noise. Roughly 20% of the entries are then set to 0 to simulate missing values. The PMF model is trained on the remaining entries, and the reconstructed rating matrix is printed; it contains the model's predictions for every entry, including the missing ones.

Python

# Generate synthetic data
np.random.seed(0)
num_users = 100
num_items = 50
num_factors = 10

# True latent factors
U_true = np.random.normal(0, 1, (num_users, num_factors))
V_true = np.random.normal(0, 1, (num_items, num_factors))

# Generate the observed matrix
R = np.dot(U_true, V_true.T)

# Add noise
R += np.random.normal(0, 0.1, R.shape)

# Masking some entries to simulate missing values
mask = np.random.rand(*R.shape) < 0.8
R[~mask] = 0

# Train PMF model
pmf = PMF(R, num_factors=num_factors, learning_rate=0.01, reg_param=0.01, num_epochs=50)
pmf.train()

# Reconstructed matrix
R_reconstructed = pmf.full_matrix()

print("Original Matrix with Missing Values:\n", R)
print("Reconstructed Matrix:\n", R_reconstructed)

Output:

Original Matrix with Missing Values:
 [[ 2.2817193   2.44978245  0.         ...  0.          2.31801743 -6.87439354]
 [ 2.22888495  2.61800886  0.72803179 ... -0.8421724   1.15581792  0.        ]
 [-0.37044918  2.93442797  4.36216321 ...  0.          2.10912286 -7.26262222]
Reconstructed Matrix:
 [[ 1.42961502  1.86599571 -2.06899195 ...  1.05251486  2.02129157 -1.04044408]
 [ 0.04237691  2.69347594  0.80094165 ...  0.54012532  0.73083943  2.62283316]
 [ 2.16977192  3.51233722  4.08734245 ...  0.4136497   2.63007494  2.80993158]
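Comparing two printed matrices by eye says little about how well the missing entries are predicted. A minimal sketch of a numeric check is shown below; masked_rmse is a hypothetical helper, and the toy arrays are made-up values chosen only to exercise it.

```python
import numpy as np

def masked_rmse(R_target, R_pred, mask):
    """Root mean squared error over the entries where mask is True."""
    diff = (R_target - R_pred)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical toy values, not taken from the run above
R_target = np.array([[1.0, 2.0], [3.0, 4.0]])
R_pred = np.array([[1.0, 0.0], [3.0, 2.0]])
observed = np.array([[True, False], [True, False]])

rmse_observed = masked_rmse(R_target, R_pred, observed)    # entries seen in training
rmse_held_out = masked_rmse(R_target, R_pred, ~observed)   # entries hidden from training
print(rmse_observed, rmse_held_out)
```

To apply this to the synthetic data, one would need to keep a copy of the noisy matrix before the entries are zeroed (for instance R_full = R.copy(), a name introduced here) and then compare masked_rmse(R_full, R_reconstructed, mask) with masked_rmse(R_full, R_reconstructed, ~mask).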

Conclusion

This implementation is a basic approach to Probabilistic Matrix Factorization using NumPy and SGD. It learns user and item factors from the observed entries only and uses their product to fill in the missing values, which demonstrates the core idea of PMF in a compact form.

Probabilistic Matrix Factorization

Probabilistic Matrix Factorization (PMF) is a recommendation-system technique that treats user-item ratings probabilistically in order to learn latent factors for users and items from interaction data. It is particularly useful when that data is sparse, which makes it well suited to personalized recommendations.

This article explores the fundamentals of Probabilistic Matrix Factorization, its advantages, and how it is implemented in recommendation systems.
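For context, the probabilistic model usually associated with PMF can be sketched as below. This is the standard Gaussian formulation rather than something stated explicitly in this article; the variances $\sigma^2$, $\sigma_U^2$, $\sigma_V^2$ and the indicator $I_{ij}$ (1 if rating $R_{ij}$ is observed, 0 otherwise) are notation introduced here.

$$
p(R \mid U, V, \sigma^2) = \prod_{i,j} \left[ \mathcal{N}\!\left(R_{ij} \mid u_i^\top v_j,\ \sigma^2\right) \right]^{I_{ij}}, \qquad
u_i \sim \mathcal{N}(0,\ \sigma_U^2 \mathbf{I}), \quad
v_j \sim \mathcal{N}(0,\ \sigma_V^2 \mathbf{I})
$$

$$
-\ln p(U, V \mid R) = \frac{1}{2\sigma^2} \sum_{i,j} I_{ij} \left(R_{ij} - u_i^\top v_j\right)^2
+ \frac{1}{2\sigma_U^2} \sum_i \lVert u_i \rVert^2
+ \frac{1}{2\sigma_V^2} \sum_j \lVert v_j \rVert^2 + \text{const}
$$

Multiplying by $\sigma^2$ gives a squared-error objective with L2 penalties $\lambda_U = \sigma^2 / \sigma_U^2$ and $\lambda_V = \sigma^2 / \sigma_V^2$, so minimizing a regularized squared error over the observed entries corresponds to a maximum a posteriori estimate of U and V under this model.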
