Before you build the recommender, the answers lie in the choice of algorithms, programming language and database. During development, you first need to diagnose the problem in your specific case; the fix may be code optimization or a switch to a more suitable architecture.
Before building the recommender
A slow recommender can be seen as an engine that relies on historic data and can’t adjust quickly to changes in the user’s behaviour. It may use algorithms that depend heavily on huge, sparse matrices of user-item interactions generated from the historic data. As a result, the time needed to calculate recommendations from the raw data, or to refresh them, can be measured in hours or even days. In some algorithms (like item-based collaborative filtering) the whole rating database is searched to produce a single recommendation, which leads to a scalability problem. Although historic data is very important in serving recommendations, a more lightweight approach can be used alongside it. For example, it’s possible to implement online item-based collaborative filtering that updates incrementally and calculates item similarity only over a narrow set of the most plausible items, like TencentRank (implemented by the Tencent Group). The recommender can also be based on data from the current and most recent sessions rather than the whole history.
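To make the idea of online updates more concrete, here is a minimal sketch of an incremental item-item model. It is not TencentRank or any production algorithm: the class name `OnlineItemCF`, the session-based input and the cosine-style score over co-occurrence counts are all assumptions chosen for illustration. The point is only that each new session touches a handful of counters, and similarity is computed over the items actually seen together with the query item, never over the whole interaction matrix.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


class OnlineItemCF:
    """Incremental item-item co-occurrence model (illustrative sketch only)."""

    def __init__(self):
        # item -> number of sessions that contained it
        self.item_counts = defaultdict(int)
        # item -> other item -> number of sessions that contained both
        self.pair_counts = defaultdict(lambda: defaultdict(int))

    def update(self, session_items):
        """Fold one session into the counts; cost depends only on session length."""
        items = set(session_items)
        for item in items:
            self.item_counts[item] += 1
        for a, b in combinations(sorted(items), 2):
            self.pair_counts[a][b] += 1
            self.pair_counts[b][a] += 1

    def similar(self, item, top_n=5):
        """Rank only the items that co-occurred with `item` at least once."""
        scores = {
            other: count / sqrt(self.item_counts[item] * self.item_counts[other])
            for other, count in self.pair_counts.get(item, {}).items()
        }
        # Highest score first; ties broken by item name for determinism.
        return sorted(scores, key=lambda other: (-scores[other], other))[:top_n]


model = OnlineItemCF()
for session in (["a", "b", "c"], ["a", "b"], ["c", "d"]):
    model.update(session)  # can be called as events arrive
print(model.similar("a"))
```

A real system would also need to expire old counts and cap the number of neighbours kept per item, which this sketch leaves out.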
Python is the most popular language for machine learning and data science, and it speeds up and eases the process of developing new models and testing new ideas. However, it may not be the most suitable language for streaming applications.
Everything begins with serving the requests. Python is slower than compiled languages like Scala, Java, Rust or Golang (check out gorse, an offline recommender system written in Go). Personally, we like Rust (here you can check why we think Rust is the language of the future) and are currently rewriting our recommender in it to speed it up even more.
During the development
My recommender system is slow – what should I do?
First of all, thoroughly diagnose the problem. There are tools to profile your code so that you get quantitative information about the time consumed by each component. That gives you two things:
- You will save a lot of time that would otherwise be spent on fixing things that were not broken.
- You will be able to compare your initial code with the optimized version and tell whether (and by how much) the optimization helped. You will also be able to run some experiments and, in the end, pick the solution that reached the highest speed.
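As one possible way to get such numbers in Python, the sketch below uses the standard library’s `cProfile` and `pstats` modules. The functions `load_candidates`, `score` and `recommend` are hypothetical stand-ins for real recommender components, and `profile_call` is a small helper introduced here for illustration.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


# Hypothetical stand-ins for real recommender components.
def load_candidates(n):
    return list(range(n))


def score(candidates):
    return sorted(candidates, key=lambda x: -(x % 7))


def recommend(n=10_000, top_n=5):
    return score(load_candidates(n))[:top_n]


result, report = profile_call(recommend)
print(report)
```

Running the same helper before and after a change gives you comparable reports, so you can tell whether the optimization actually moved the time spent in the component you targeted.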
We will discuss the most frequent scenarios and give some advice on how to address such problems.
It seems that some chunks of my code are slow
Thanks to the profiling that you did before, you know exactly where the issue is.
Firstly, it might be just some code that can be rewritten in a more performant form. You know what to do then :).
Secondly, it can also be some library code. In that case there is no universal solution. Sometimes you can just find a more efficient implementation. Sometimes you can glue a couple of other libraries together to get efficient code with the same functionality. Sometimes you may be forced to write it from scratch (maybe even in another language?), if it’s worth it.
Finally, it can be a database issue as well. Maybe your database is not optimized for the queries you perform most often. Sometimes adding indices solves the problem. Sometimes it does not, and in that case you may consider using another database. For example, while building a real-time recommender for Sephora we used Redis. First of all, it is a key-value store, which reduces lookup times from linear (in the case of a table without an index) or logarithmic (an indexed table) to roughly constant. Secondly, it keeps data in memory instead of on disk, which greatly increases read/write speed.
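The difference between the three lookup strategies can be illustrated without any database at all. The sketch below is only an analogy in plain Python, on hypothetical toy data: a list scan stands in for a table without an index, a binary search over sorted keys for an indexed table, and a dictionary for a key-value store. It says nothing about the internals of any particular database.

```python
import bisect

# Hypothetical toy data: (user_id, recommendations) rows, sorted by user_id.
rows = [(user_id, [f"item-{user_id % 10}"]) for user_id in range(0, 100_000, 2)]


def scan_lookup(rows, user_id):
    """Table without an index: check rows one by one, O(n)."""
    for key, value in rows:
        if key == user_id:
            return value
    return None


sorted_keys = [key for key, _ in rows]  # plays the role of the index


def index_lookup(rows, sorted_keys, user_id):
    """Indexed table: binary search over sorted keys, O(log n)."""
    pos = bisect.bisect_left(sorted_keys, user_id)
    if pos < len(sorted_keys) and sorted_keys[pos] == user_id:
        return rows[pos][1]
    return None


store = dict(rows)  # hash map, the same idea as a key-value store


def kv_lookup(store, user_id):
    """Key-value store: hash lookup, O(1) on average."""
    return store.get(user_id)


print(scan_lookup(rows, 4242), index_lookup(rows, sorted_keys, 4242), kv_lookup(store, 4242))
```

All three functions return the same answer; only the amount of work needed to find it differs, and that difference grows with the size of the data.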
My API is blazing fast when I test it, but very slow in production
Again, there might be a couple of reasons.
Firstly, when you are running the whole recommender system locally, you do not experience any delay related to sending requests over the Internet. Then, when you deploy it, it might happen that different components of the system are located in different places, e.g. the recommendation API server is in one place and the database in another, 200 kilometers away. The requests need time to travel over the network, and this may be the main cause of the latency you experience. In such cases, the components should be moved to a common location. For example, Microsoft Azure came up with Proximity Placement Groups to address this issue (they ensure that your virtual machines within the group are located in the same data center).
Another possibility is that your API gets choked by the massive number of requests it receives in the production environment. Again, a couple of things could be tried here to boost speed.
Firstly, a different web framework could be used. While working on Sephora’s recommendation engine we achieved a significant speed-up just by switching from Flask + Gunicorn to Sanic.
Secondly, incoming requests can be gathered in a queue and then processed in batches. One may object that this will make users wait longer for their responses, because of the additional time spent waiting for requests to form a batch. Note, however, that since we are trying to respond to a massive number of requests, the user would have to wait anyway due to the limited number of workers. Moreover, they would have to wait even longer, because computing responses to e.g. 1000 requests separately can be far more time-consuming than computing them in a batch.
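The queue-and-batch idea can be sketched with nothing but `asyncio` from the standard library. Everything named here is hypothetical: `MicroBatcher`, its `max_batch` and `max_wait` parameters, and the `score_batch` function that stands in for a batched model call. It is a minimal illustration of the technique, not the setup used in any particular production system.

```python
import asyncio


class MicroBatcher:
    """Collects requests for a short window and scores them in one call."""

    def __init__(self, batch_fn, max_batch=32, max_wait=0.01):
        self.batch_fn = batch_fn    # hypothetical: list of requests -> list of responses
        self.max_batch = max_batch  # upper bound on batch size
        self.max_wait = max_wait    # seconds to wait for a batch to fill up
        self.queue = asyncio.Queue()

    async def submit(self, request):
        """Called by each request handler; returns once its batch is processed."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((request, future))
        return await future

    async def run(self):
        """Background task: build batches and hand results back to the callers."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until there is work
            deadline = loop.time() + self.max_wait
            while len(batch) < self.max_batch:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            try:
                responses = self.batch_fn([request for request, _ in batch])
            except Exception as exc:
                # Propagate the failure to every caller in this batch.
                for _, future in batch:
                    future.set_exception(exc)
                continue
            for (_, future), response in zip(batch, responses):
                future.set_result(response)


async def demo():
    calls = []  # records the size of each batch, for inspection only

    def score_batch(user_ids):  # hypothetical batched model call
        calls.append(len(user_ids))
        return [f"recs-for-{user_id}" for user_id in user_ids]

    batcher = MicroBatcher(score_batch, max_batch=4, max_wait=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(u) for u in range(8)))
    worker.cancel()
    return results, calls


results, calls = asyncio.run(demo())
print(results, calls)
```

The `max_wait` parameter bounds the extra latency a single request can pay for batching, which is the trade-off discussed above: a small fixed delay in exchange for far fewer model calls under heavy load.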
The third thing that could be done is horizontal scaling. You could wrap your recommender in a Docker container and then create multiple instances of it using Kubernetes. Finally, you employ a load balancer to direct traffic to them in an intelligent way. Kubernetes makes it possible to add more recommender containers when experiencing heavy traffic and reduce this number when they are no longer needed.
Our recommender and lessons learned
Our approach was to start with a simple framework in the language best known throughout the company (Python) and then, each time we added complexity, to weigh the performance gain against the loss of speed.
- Use an in-memory key-value database (like Redis), especially if persistence is an important part of the application. In the best-case scenario, run Redis on the same server as the application, or as close to it as possible. The main disadvantage of Redis, though, is the cost of serializing and deserializing data. As a possible solution, data can be stored in the application’s own memory.
- Group Redis requests and send them in a pipeline.
- In some cases batch updates are better than streaming updates.
- Leave heavy computation to a single worker and run it only occasionally.
- And last but not least, faster but simpler recommendations are in general better than more sophisticated but slower ones.
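The advice about grouping Redis requests can be sketched as follows, assuming the redis-py client library. The function name `fetch_user_vectors`, the `user:<id>` key layout and the JSON encoding of values are hypothetical choices made for this example. The function takes the client as an argument, so it does not need a running server to be imported.

```python
import json


def fetch_user_vectors(client, user_ids):
    """Fetch many keys in one round trip using a Redis pipeline.

    `client` is assumed to be a redis-py client, e.g. redis.Redis(host="localhost").
    The "user:<id>" key layout and the JSON encoding are hypothetical.
    """
    pipe = client.pipeline(transaction=False)  # batching only, no MULTI/EXEC
    for user_id in user_ids:
        pipe.get(f"user:{user_id}")  # queued locally, nothing is sent yet
    raw_values = pipe.execute()  # all queued GETs are sent together
    # The json.loads call is the deserialization cost mentioned in the list above.
    return {
        user_id: json.loads(raw) if raw is not None else None
        for user_id, raw in zip(user_ids, raw_values)
    }
```

Compared with calling `client.get` in a loop, this pays the network latency once per group of keys instead of once per key, which matters most when Redis is not on the same machine as the application.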