Recommendation systems are widely used in many domains: e-commerce (including large companies like Amazon), on-demand video streaming platforms such as YouTube and Netflix, and content recommendation on social networks like Facebook or Instagram. A recommender system aims to suggest products or services that a user is likely to enjoy, based on that user’s historical behavior. These systems have improved over the years thanks to advances in machine learning and deep learning.
In a general setting, some users have bought or used some items and have expressed their satisfaction with them as ratings, say from 1 to 5 (low to high satisfaction). The recommender system is designed to predict the rating a user would give to an item they have not used yet; it then recommends to that user the items with the highest predicted ratings.
There are two main categories of recommendation methods:
- Content-based filtering: To make recommendations for a user, the system relies on the user’s profile and on the items’ features. Some user features can be explicitly provided by the user; others can be implicitly deduced from the items they previously liked or used.
Here is a simple example. Suppose you’re a Netflix subscriber who has already watched several action and comedy movies, along with many movies featuring Henry Cavill or Robert Downey Jr. The system stores this information about you in a feature vector and looks for the movies whose feature vectors are most similar to yours according to a chosen similarity measure. As a result, the system will probably show you comedy and action movies similar to what you’ve watched before, and recommend movies featuring Henry Cavill or Robert Downey Jr. that you haven’t watched yet. This is roughly what content-based filtering is: describing the user’s profile and the items’ characteristics, then matching the two with a similarity function. Remember: here we don’t use information about other users to infer which items a user might like.
Content-based filtering often gives disappointing results, because a profile built only from a user’s existing interests cannot describe their tastes exactly. In addition, this method requires a lot of hand engineering to define item features (possible features for a movie: producer, country, actors, genre, story, etc.). However, content-based filtering is a scalable technique, since each recommendation takes into account only that user’s specific interests and needs no data about other users. This method can also capture niche interests that are particular to only a few users.
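The matching step described above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not part of the example itself: the five binary features, the movie names, the averaging of watched movies into a profile, and the choice of cosine similarity are all hypothetical. A real system would use far richer features.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

def build_profile(watched_vectors):
    """Assumption: the user profile is the average of the watched movies' features."""
    n = len(watched_vectors)
    return [sum(column) / n for column in zip(*watched_vectors)]

def recommend(profile, catalog, top_k=2):
    """Rank unwatched movies by similarity to the user profile."""
    scored = [(cosine_similarity(profile, feats), title)
              for title, feats in catalog.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]

# Hypothetical features: [action, comedy, drama, Henry Cavill, Robert Downey Jr.]
watched = [
    [1, 0, 0, 1, 0],  # an action movie with Henry Cavill
    [0, 1, 0, 0, 1],  # a comedy with Robert Downey Jr.
    [1, 1, 0, 0, 0],  # an action comedy
]
catalog = {  # movies the user has not watched yet (made-up names)
    "action_comedy_cavill": [1, 1, 0, 1, 0],
    "action_rdj": [1, 0, 0, 0, 1],
    "drama": [0, 0, 1, 0, 0],
}

profile = build_profile(watched)
print(recommend(profile, catalog, top_k=2))
# -> ['action_comedy_cavill', 'action_rdj']
```

Note that no other user appears anywhere in this sketch: the ranking depends only on this user’s own history and on the item features.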
- Collaborative filtering: This category of recommendation engines is very popular. It addresses some limitations of content-based filtering methods, such as the feature engineering task. The main idea is to make automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). For instance, if user A and user B both liked many of the same series {The Lord of the Rings, Vikings, Game of Thrones, The Last Kingdom} and user B also liked the series “The Witcher”, then it is likely that user A would like “The Witcher” too. The recommender system relies on this assumption to recommend “The Witcher” to user A.
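To make the user A / user B example concrete, here is a small sketch of the idea. The Jaccard overlap used as the similarity between users, the third user C, and the series “Peaky Blinders” are assumptions added for illustration; practical collaborative filtering systems generally rely on more refined similarity measures or on learned models.

```python
def jaccard(a, b):
    """Overlap between two sets of liked items, between 0 and 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_for(user, likes, top_k=1):
    """Score each item the user has not liked yet by the similarity
    of the other users who liked it, then return the best ones."""
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        sim = jaccard(likes[user], items)
        for item in items - likes[user]:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_k]

shared = {"The Lord of the Rings", "Vikings", "Game of Thrones", "The Last Kingdom"}
likes = {
    "A": set(shared),
    "B": shared | {"The Witcher"},
    "C": {"Vikings", "Peaky Blinders"},  # hypothetical third user
}

print(recommend_for("A", likes))
# -> ['The Witcher']
```

User B overlaps with user A far more than user C does, so the series that B liked outranks the one that C liked.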
In collaborative filtering, we usually represent the problem as a matrix whose rows are items and whose columns are users (or the reverse). The element (i, j) of the matrix is the rating that user j gave to item i; alternatively, it can simply take the value 1 if user j liked item i and 0 otherwise. The figure below illustrates this representation:
The goal of recommendation in this setting is to predict the unknown entries of the matrix R (blank values that we replaced by 0 in the figure); that is, assuming that similar users have similar tastes, we predict whether a user will like a particular item. A basic solution to this problem is matrix factorization. Find out more about it in this Google tutorial: https://developers.google.com/machine-learning/recommendation/collaborative/matrix
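As a rough illustration of matrix factorization, the sketch below approximates R by the product of two small factor matrices (one row of k numbers per item, one per user) and fits them to the known ratings only. It assumes one common training recipe, stochastic gradient descent on the squared error with L2 regularization; the toy ratings and the hyperparameters (k, epochs, lr, reg) are made up for the example and are not taken from the tutorial linked above.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def factorize(R, k=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Learn item and user factors so that dot(item, user) is close to
    every known rating. Entries equal to 0 are treated as unknown."""
    rng = random.Random(seed)
    n_items, n_users = len(R), len(R[0])
    item_f = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_items)]
    user_f = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_users)]
    observed = [(i, j, R[i][j])
                for i in range(n_items) for j in range(n_users) if R[i][j] > 0]
    for _ in range(epochs):
        for i, j, rating in observed:
            err = rating - dot(item_f[i], user_f[j])
            for f in range(k):
                a, b = item_f[i][f], user_f[j][f]
                # Gradient step on the squared error plus an L2 penalty.
                item_f[i][f] += lr * (err * b - reg * a)
                user_f[j][f] += lr * (err * a - reg * b)
    return item_f, user_f

def rmse(R, item_f, user_f):
    """Root mean squared error over the known (non-zero) ratings."""
    errors = [(R[i][j] - dot(item_f[i], user_f[j])) ** 2
              for i in range(len(R)) for j in range(len(R[0])) if R[i][j] > 0]
    return (sum(errors) / len(errors)) ** 0.5

# Toy data: rows are items, columns are users, 0 marks an unknown rating.
R = [
    [5, 4, 0, 1, 1],
    [4, 0, 5, 1, 0],
    [1, 1, 0, 5, 4],
    [0, 1, 2, 4, 5],
]

item_f, user_f = factorize(R)
print("RMSE on known ratings:", round(rmse(R, item_f, user_f), 3))
# Predicted rating for an unknown entry: item 0, user 2.
print("Prediction for R[0][2]:", round(dot(item_f[0], user_f[2]), 2))
```

Once the factors are learned, any blank entry of R can be filled in with the dot product of the corresponding item and user vectors, and the items with the highest predicted values are the ones to recommend.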
Collaborative filtering methods are effective and do not require domain knowledge, but they do not solve the cold-start problem, which consists of making recommendations for a new user who has not yet liked any existing item.