
Fast recommendation system vs slow recommendation system

Liudmyla Kyrashchuk Data Scientist
2 weeks, 1 day ago · 4 min read Rust Recommendation system Redis

In our experience, one of the ten most frequently asked questions about recommendation systems is "How do I make a recommendation system fast?". There are many possible answers, and they often depend on when you ask the question: before or during product development.

Before development, the answers lie in the choice of algorithms, programming languages and databases. During development, you first need to diagnose the problem in your specific case; the fix may be code optimization or a switch to a more suitable architecture.

Before building the recommender

Algorithms

A slow recommender can be seen as an engine that relies on historic data and cannot adjust quickly to changes in the user's behaviour. It may use algorithms that depend heavily on huge, sparse matrices of user-item interactions generated from the historic data. As a result, the time needed to calculate recommendations from the raw data, or to refresh them, can be measured in hours or even days. In some algorithms (like item-based collaborative filtering) the whole rating database is searched to produce a single recommendation, which leads to a scalability problem. Although historic data is very important for serving recommendations, a more lightweight approach can be used alongside it. For example, it is possible to implement online item-based collaborative filtering that updates incrementally and calculates item similarity only over a narrow set of the most plausible items, like TencentRank (implemented by the Tencent Group). The recommender can also be based on data from the current and most recent sessions rather than the whole history.
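To make the idea of online updates concrete, here is a minimal sketch of an incrementally maintained item-to-item model. It is not the algorithm used by Tencent or by us; the class name `OnlineItemCF` and its methods are hypothetical, and it assumes binary interactions (a session either touched an item or it did not). Each event only updates counters for items already seen in the same session, and similarity is computed only against items that actually co-occurred with the query item.

```python
from collections import defaultdict
from math import sqrt


class OnlineItemCF:
    """Hypothetical sketch: item similarity from incrementally updated counts."""

    def __init__(self):
        # item -> number of sessions that interacted with it
        self.item_count = defaultdict(int)
        # item -> neighbour item -> number of sessions containing both
        self.pair_count = defaultdict(lambda: defaultdict(int))
        # session -> set of items seen so far in that session
        self.seen = defaultdict(set)

    def update(self, session_id, item):
        """Process one interaction; cost depends on the session size only."""
        items = self.seen[session_id]
        if item in items:
            return  # binary interactions: repeated events change nothing
        self.item_count[item] += 1
        for other in items:
            self.pair_count[item][other] += 1
            self.pair_count[other][item] += 1
        items.add(item)

    def similar(self, item, k=5):
        """Cosine similarity over binary vectors, restricted to co-occurring items."""
        neighbours = self.pair_count.get(item, {})
        scores = {
            other: count / sqrt(self.item_count[item] * self.item_count[other])
            for other, count in neighbours.items()
        }
        # Highest score first; ties broken by item name for determinism.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


model = OnlineItemCF()
for session_id, item in [("s1", "a"), ("s1", "b"), ("s2", "a"),
                         ("s2", "b"), ("s2", "c"), ("s3", "a")]:
    model.update(session_id, item)
print(model.similar("a"))
```

Because nothing is recomputed from the raw history, a new interaction is reflected in the similarities immediately. A production system would also need to bound memory, for example by expiring old sessions.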

Programming languages

Python is the most popular language for machine learning and data science: it speeds up and eases the process of developing new models and testing new ideas. However, it may not be the most suitable language for streaming applications.

Everything begins with serving the requests. Python is generally slower than compiled languages such as Scala, Java, Rust or Go (check out gorse, an offline recommender system written in Go). Personally, we like Rust (we have explained elsewhere why we think Rust is the language of the future) and are currently rewriting our recommender in it to speed it up even more.

During the development

My recommender system is slow - what should I do?

First of all, thoroughly diagnose the problem. There are tools to profile your code so that you get quantitative information about time consumed by each component. That gives you two things:

We will discuss the most frequent scenarios and give some advice on how to address such problems.
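As an illustration of the profiling step, the sketch below uses Python's built-in `cProfile` and `pstats` modules. The functions `load_candidates`, `score` and `recommend` are hypothetical stand-ins for real recommender components; the sleep simulates a slow data-access step.

```python
import cProfile
import io
import pstats
import time


def load_candidates():
    # Hypothetical stand-in for a slow data-access step.
    time.sleep(0.05)
    return list(range(1000))


def score(candidates):
    # Hypothetical stand-in for a cheap scoring step.
    return sorted(candidates, reverse=True)[:10]


def recommend():
    return score(load_candidates())


def profile_call(func):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


result, report = profile_call(recommend)
print(report)
```

In the report, the cumulative-time column should show that nearly all of the time is spent in `load_candidates`, which tells you where optimization effort is likely to pay off.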

It seems that some chunks of my code are slow

Thanks to the profiling you did earlier, you know exactly where the issue is.

Firstly, it might just be some code that can be rewritten in a more performant form. You know what to do then :).

Secondly, it can also be some library code. In that case there is no universal solution. Sometimes you can simply find a more efficient implementation. Sometimes you can glue a couple of other libraries together to get efficient code with the same functionality. Sometimes you may be forced to write it from scratch (maybe even in another language?) if it’s worth it.

Finally, it can be a database issue as well. Maybe your database is not optimized for the queries you perform most often. Sometimes adding indices solves the problem. Sometimes it does not, and in that case you may consider using another database. For example, while building a real-time recommender for Sephora we used Redis. First of all, it is a key-value store, so a lookup by key takes constant time on average, instead of linear time (for an unindexed table) or logarithmic time (for an indexed table). Secondly, it keeps data in memory instead of on disk, which greatly increases read and write speed.
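The difference between the three lookup costs can be illustrated in plain Python. This is only an in-process analogy, not Redis itself: a list scan plays the role of an unindexed table, binary search over sorted keys plays the role of an index, and a `dict` plays the role of a key-value store. The data and function names are made up for the example.

```python
from bisect import bisect_left

# Hypothetical precomputed recommendations: (user_id, item_ids) rows,
# sorted by user_id.
rows = [(user_id, [user_id + 1, user_id + 2]) for user_id in range(0, 100_000, 2)]
keys = [key for key, _ in rows]  # sorted keys, playing the role of an index
store = dict(rows)               # hash map, playing the role of a key-value store


def lookup_scan(rows, user_id):
    """Unindexed table: inspect rows one by one, O(n)."""
    for key, items in rows:
        if key == user_id:
            return items
    return None


def lookup_index(rows, keys, user_id):
    """Indexed table: binary search over sorted keys, O(log n)."""
    pos = bisect_left(keys, user_id)
    if pos < len(keys) and keys[pos] == user_id:
        return rows[pos][1]
    return None


def lookup_kv(store, user_id):
    """Key-value store: hash lookup, O(1) on average."""
    return store.get(user_id)


print(lookup_scan(rows, 99_998), lookup_index(rows, keys, 99_998), lookup_kv(store, 99_998))
```

All three functions return the same answer; they differ only in how much work they do to find it, which is what matters once the table holds millions of users.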

My API is blazing fast when I test it, but very slow in production

Again, there might be a couple of reasons.

Firstly, when you run the whole recommender system locally, you do not experience any delay related to sending requests over the Internet. Once you deploy it, the system’s components may end up in different places, e.g. the recommendation API server in one location and the database in another, 200 kilometers away. The requests need time to travel over the network, and this may be the main cause of the latency you observe. In such cases, the components should be moved to a common location. For example, Microsoft Azure came up with Proximity Placement Groups to address this issue (they ensure that the virtual machines within a group are located in the same data center).

Another possibility is that your API gets choked by the massive number of requests it receives in the production environment. Again, there are a couple of things you can try to boost speed.

Firstly, another web framework could be used. While working on Sephora’s recommendation engine we achieved a significant speed-up just by switching from Flask + Gunicorn to Sanic.

Secondly, incoming requests can be gathered in a queue and then processed in batches. One may object that this will make users wait longer for their responses, because of additional time spent waiting for requests to form a batch. Note however that since we are trying to respond to a massive number of requests, the user would have to wait anyway due to the limited number of workers. Moreover, they would have to wait even longer because computing responses to e.g. 1000 requests separately can be far more time consuming than computing them in a batch.
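A minimal sketch of this queue-and-batch pattern, using only `asyncio` from the standard library, could look as follows. The names `MicroBatcher` and `score_batch`, and the limits of 32 requests and 10 ms, are assumptions made for the example; in a real service `score_batch` would be a single vectorized model call.

```python
import asyncio


def score_batch(user_ids):
    """Hypothetical model call that scores many users in one pass."""
    return [f"recs-for-{user_id}" for user_id in user_ids]


class MicroBatcher:
    def __init__(self, max_batch=32, max_wait=0.01):
        self.max_batch = max_batch  # largest batch sent to the model
        self.max_wait = max_wait    # seconds to wait for a batch to fill
        self.queue = asyncio.Queue()
        self.batch_sizes = []       # kept only to inspect the behaviour

    async def submit(self, user_id):
        """Called once per request; returns when its batch has been scored."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((user_id, future))
        return await future

    async def run(self):
        """Worker loop: collect up to max_batch requests or wait max_wait."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait
            while len(batch) < self.max_batch:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            results = score_batch([user_id for user_id, _ in batch])
            self.batch_sizes.append(len(batch))
            for (_, future), result in zip(batch, results):
                future.set_result(result)


async def demo(n_requests=100):
    batcher = MicroBatcher()
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(n_requests)))
    worker.cancel()
    return results, batcher.batch_sizes


results, batch_sizes = asyncio.run(demo())
print(len(results), batch_sizes)
```

The `max_wait` parameter bounds the extra latency a single request can incur while waiting for a batch to fill, so it is the knob to tune against the throughput gained from larger batches.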

The third thing that could be done is horizontal scaling. You could wrap your recommender in a Docker container and then create multiple instances of it using Kubernetes. Finally, you employ a load balancer to direct traffic to them in an intelligent way. Kubernetes makes it possible to add more recommender containers when experiencing heavy traffic and reduce this number when they are no longer needed.
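In practice Kubernetes and the load balancer handle the routing for you, but the underlying idea can be sketched in a few lines. The class below is a hypothetical illustration of one possible policy (send each request to the replica with the fewest requests in flight); it is not how any particular load balancer is implemented, and the replica names are made up.

```python
class LeastLoadedBalancer:
    """Route each request to the replica with the fewest in-flight requests."""

    def __init__(self, replicas):
        self.in_flight = {name: 0 for name in replicas}

    def acquire(self):
        """Pick a replica for a new request and mark the request as in flight."""
        # Ties are broken by replica name to keep the choice deterministic.
        name = min(self.in_flight, key=lambda n: (self.in_flight[n], n))
        self.in_flight[name] += 1
        return name

    def release(self, name):
        """Mark a request on the given replica as finished."""
        self.in_flight[name] -= 1

    def scale_to(self, replicas):
        """Add or remove replicas, as an autoscaler would under changing load."""
        self.in_flight = {name: self.in_flight.get(name, 0) for name in replicas}


balancer = LeastLoadedBalancer(["rec-0", "rec-1"])
first = balancer.acquire()   # both idle, so the tie goes to "rec-0"
second = balancer.acquire()  # "rec-0" is busy, so "rec-1" is chosen
balancer.scale_to(["rec-0", "rec-1", "rec-2"])
third = balancer.acquire()   # the new, idle replica takes the next request
print(first, second, third)
```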

Our recommender and lessons learned

Our approach was to start with a simple framework in the language best known throughout the company (Python) and then add complexity step by step, checking at each step how much the recommendation quality improved relative to how much speed was lost.

Lessons learned:
