Product recommender systems

Updated: Jul 16, 2019

How do we know what the customer wants? And if we already know what she is going to buy anyway, why would we waste a marketing message on that product? Why not steer her toward something she didn’t plan to buy? Before stepping into the philosophical topic of changing the future by knowing it, let’s first review the basics of product recommender systems.

Selecting a product to recommend can be very simple, like a product popularity ranking or abandoned-cart products. It can be smarter, like collaborative filtering or content-based matching. It can also be very smart, like deep learning and behavioral modelling. What is important to notice, however, is that the smartest heavy machinery is not necessarily the best choice. The best choice is dictated by data availability. If we only have a few purchases and anonymous shoppers, there is really no need to go wild. The model can only be as good as our data are.
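To show how little machinery the simplest option needs, here is a minimal sketch of a popularity ranking. The purchase log, the product names and the `top_products` helper are made up for illustration.

```python
from collections import Counter

# Hypothetical purchase log: (customer_id, product_id) pairs.
purchases = [
    ("c1", "socks"), ("c2", "socks"), ("c3", "hat"),
    ("c1", "hat"), ("c4", "socks"), ("c2", "scarf"),
]

def top_products(purchases, k=2):
    """Rank products by purchase count, most purchased first."""
    counts = Counter(product for _, product in purchases)
    return [product for product, _ in counts.most_common(k)]

print(top_products(purchases))  # → ['socks', 'hat']
```

Every shopper gets the same list, which is exactly why it still works when shoppers are anonymous.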

Content-based recommenders

As the name suggests, content-based solutions focus on descriptions of products and users. Think of Pandora or LinkedIn. It is all about designing features that capture the nature of products and the nature of customers. After that, it’s all about matching. Getting the product features can be trivial (like LinkedIn, where job descriptions are given by employers) or trickier (like Pandora, where songs need to be labeled). In any case, the product cold start is solved, because even brand-new products have features and can be matched to buyers. New buyers are more challenging, especially if the required customer features are activity based. Bottom line: it’s the apples-to-apples principle, matching like with like.
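The matching step can be sketched as a similarity score between a customer profile and product feature vectors. This is only one possible way to do it. The three-dimensional feature space, the songs and the profile below are invented, and cosine similarity is our assumption rather than something any particular service is known to use.

```python
import numpy as np

# Hypothetical feature space: [rock, jazz, acoustic].
items = {
    "song_a": np.array([1.0, 0.0, 0.0]),
    "song_b": np.array([0.0, 1.0, 1.0]),
    "song_c": np.array([1.0, 0.0, 1.0]),
}
# Hypothetical customer profile expressed in the same feature space.
user_profile = np.array([0.9, 0.0, 0.4])

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_items(profile, items):
    """Return item names sorted from best to worst match."""
    scores = {name: cosine(profile, vec) for name, vec in items.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(rank_items(user_profile, items))  # → ['song_c', 'song_a', 'song_b']
```

Note that a brand-new song only needs a feature vector to be ranked; no purchase history is involved.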

“The smartest heavy machinery is not necessarily the best choice. The best choice is dictated by the data since the model can be only as good as our data are.”

Collaborative filtering

This one is all about customer-product interactions. Think along the lines of restaurant review sites, fashion retail sites, or Netflix (in the old days). Algorithms are based on the user-item matrix, meaning who bought/reviewed/graded what. The philosophy is that similar people like similar things, so we don’t need to calculate the similarity of customers through their features, nor the similarity of products; we just capture who likes what. There are simple solutions like Nearest Neighbour, which just looks for a subset of similar people (or products), and there are better ones based on dimensionality reduction. The star player is Matrix Factorisation. The bold statement “better ones” comes from the fact that this group of solutions is intended for businesses with a high number of products and a sparse interaction matrix.
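As a rough illustration of the Nearest Neighbour idea, the sketch below finds the single most similar other customer in a tiny user-item matrix and suggests what that customer bought. The matrix is invented and the helper name `recommend_from_neighbour` is ours. It also assumes every customer has at least one interaction, otherwise the cosine similarity is undefined.

```python
import numpy as np

# Hypothetical user-item matrix: rows = customers, columns = products, 1 = bought.
R = np.array([
    [1, 1, 0, 0],   # customer 0
    [1, 1, 1, 0],   # customer 1
    [0, 0, 1, 1],   # customer 2
], dtype=float)

def recommend_from_neighbour(R, user):
    """Return (nearest neighbour, products the neighbour bought that `user` has not)."""
    norms = np.linalg.norm(R, axis=1)
    sims = R @ R[user] / (norms * norms[user])  # cosine similarity to every row
    sims[user] = -np.inf                        # never pick the customer themself
    neighbour = int(np.argmax(sims))
    candidates = (R[neighbour] > 0) & (R[user] == 0)
    return neighbour, np.flatnonzero(candidates).tolist()

print(recommend_from_neighbour(R, user=0))  # → (1, [2])
```

No customer or product features appear anywhere; the only input is who bought what.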

Think of tens of thousands of products and customers who each buy 1-10 products over their entire history. Plain similarity metrics and classifiers struggle with data that sparse. We are far from claiming that collaborative filtering is a magic wand, but it’s safe to say that it’s your best bet.

The idea is to take this big matrix of user-item interactions and approximate it as a product of a user matrix and an item matrix. The number of latent variables (the shared dimension of the new matrices) is a matter of judgment. Choosing size 1 would mean a popularity contest, in the sense of a single score assigned to each customer and each product. Dimension 2 already creates some segmentation, in the sense of two subsets with different preferences. Pushing it too far, however, is just overfitting, meaning we lose the generalization power.
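The factorisation can be sketched with a truncated SVD on a toy matrix. Treat this as an illustration of the shapes involved, not as a production recipe. The ratings are invented, `factorise` is a hypothetical helper, and the SVD treats the zeros as real observations, whereas practical recommenders usually fit only the observed entries (for example with alternating least squares or gradient descent).

```python
import numpy as np

# Hypothetical ratings: rows = customers, columns = products, 0 = no interaction.
R = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

def factorise(R, k):
    """Rank-k factorisation R ≈ P @ Q.T via truncated SVD."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    user_factors = U[:, :k] * np.sqrt(s[:k])     # one k-vector per customer
    item_factors = Vt[:k, :].T * np.sqrt(s[:k])  # one k-vector per product
    return user_factors, item_factors

for k in (1, 2, 4):
    P, Q = factorise(R, k)
    error = np.linalg.norm(R - P @ Q.T)
    print(k, P.shape, Q.shape, round(float(error), 3))
```

The reconstruction error on the data we already have can only shrink as `k` grows, and at full rank it drops to essentially zero. That is exactly the trap: a perfect fit to the past says nothing about products the customer has not seen yet.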

The downsides are that these matrices can be terabytes in size, followed by no room for content/context, and crowned by no dynamics incorporated. Taking it step by step, terabytes are an obvious problem: a straightforward decomposition needs the whole matrix in RAM. The content/context issue is reflected in the fact that we often have some information about customers or products, like age, gender, or product price, and strictly speaking collaborative filtering does not care about those, even though we intuitively know they are important. Finally, collaborative filtering is all about customers’ overall preferences throughout history. A customer’s fashion style evolves and a customer’s needs are seasonal, and there is no room for dynamics in these solutions.
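To put a number on the size issue, here is a back-of-the-envelope calculation. The customer and product counts are hypothetical, as are the 4-byte values and indices. The sparse estimate assumes a compressed sparse row (CSR) layout, which stores only the non-zero entries.

```python
def dense_bytes(n_users, n_items, bytes_per_value=4):
    """Memory for a dense matrix holding every customer-product cell."""
    return n_users * n_items * bytes_per_value

def sparse_csr_bytes(n_users, nnz, bytes_per_value=4, bytes_per_index=4):
    """Memory for CSR storage: one value and one column index per non-zero
    entry, plus one row pointer per customer (and one extra)."""
    return nnz * (bytes_per_value + bytes_per_index) + (n_users + 1) * bytes_per_index

# Hypothetical shop: 10 million customers, 50,000 products, 10 purchases each.
n_users, n_items = 10_000_000, 50_000
nnz = n_users * 10

print(dense_bytes(n_users, n_items) / 1e12, "TB dense")
print(sparse_csr_bytes(n_users, nnz) / 1e9, "GB sparse")
```

Under these assumed numbers the dense matrix comes to about 2 TB while the sparse form stays under 1 GB, so how the matrix is stored matters as much as how big the business is.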

Hybrid models

This article strives to structure different recommendation systems, and the content/collaborative/hybrid split is one choice, but it means that the hybrid group lumps together factorization machines, deep learning, and Bayesian behavioral modelling, all of which are wide research fields on their own. What puts them together is that they all incorporate customer features, product features, and customer-product interactions. What also joins them is that they all need a lot (and I mean a lot) of data to train. And that’s fine, think Amazon or Netflix (modern era). Whether you phrase the model as a multilabel classifier or fit it with Markov Chain Monte Carlo, it is the most reason-based, reality-dictated kind of model. If you are hoping to understand the details of these models through blogs, just don’t. Hire a data scientist.
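To give a feel for how features and interactions end up in one model, here is a sketch of how a second-order factorization machine scores a single customer-product pair. It shows the scoring formula only, with no training. The feature layout (one-hot customer, one-hot product, a scaled price) and the random parameters are assumptions made for illustration.

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Second-order factorization machine score for one feature vector x."""
    linear = w0 + w @ x
    # Pairwise term sum_{i<j} <V[i], V[j]> * x_i * x_j, computed in O(n * k)
    # through the standard "square of sum minus sum of squares" identity.
    xv = x @ V
    pairwise = 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))
    return float(linear + pairwise)

# Hypothetical input: one-hot customer (3), one-hot product (2), scaled price (1).
x = np.array([0, 1, 0,  1, 0,  0.3])

# Untrained, randomly initialised parameters, purely for illustration.
rng = np.random.default_rng(0)
n_features, k = x.size, 2
w0 = 0.1
w = rng.normal(size=n_features)
V = rng.normal(scale=0.1, size=(n_features, k))

print(fm_score(x, w0, w, V))
```

With only the two one-hot blocks this reduces to something very close to matrix factorisation; appending columns such as price is what lets content and context into the model.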

Closing word

This article is meant as a textbook-style overview of how out-of-the-box algorithms are structured. There are many, many hacks and variations that will move solutions into gray areas. The point is that there is no single state-of-the-art, “you have to use it” solution. Be aware of your data, your problem setup, and your scalability and transparency requirements. Be aware that your data only contains the past, which you are trying to change, so unless you have room for thousands of A/B tests you should look into reinforcement learning. A good solution is usually a combination of everything you ever thought of. Even though people are creatures of habit, capturing that in a model remains a worthy challenge.