Federated learning in recommendations

Federated learning (FL) was first introduced to the machine learning community by Google in 2016. Since then, slowly but steadily, other companies have been catching up to this technology and starting to offer their own solutions based on it.

What is FL?

Image source: "Federated Learning: Collaborative Machine Learning without Centralized Training Data"

Federated learning is a fairly new approach in machine learning based on the idea of data decentralization. The company stores the global model on its server and sends the model weights and a training program to users' devices, which train on their own data locally. Once trained, each device sends its weight update back to the server, where it is averaged with the other users' updates. The company then updates the global model on the server with that averaged update, and the cycle repeats. On the client side, the model doesn't make any predictions locally; it is only trained on the device, and the device communicates with the server only while idle and connected to power. Additionally, the company doesn't need to store the users' data in its databases, so it's a win-win situation.

Federated learning can be:

  • Horizontal, where clients' datasets share the same set of features
  • Vertical, where different clients can have different sets of features

In their article from 2016, Google proposed Federated Averaging as the optimization algorithm, which has the following steps (a minimal sketch follows the list):

  1. Sample a random set of clients
  2. Compute updates for each client in parallel locally and send them to the server
  3. Average the clients’ updates
  4. Update the global model with the averaged update
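Here is a minimal sketch of one such round in plain Python/NumPy. The `clients` objects and their `local_update` method are hypothetical stand-ins for on-device training; this illustrates the algorithm, not a production implementation.

```python
import random
import numpy as np

def federated_averaging_round(global_weights, clients, client_fraction=0.1):
    """One round of Federated Averaging.

    Each element of `clients` is assumed to expose a hypothetical
    local_update(weights) -> (new_weights, num_examples) method that
    trains on the client's private data, which never leaves the device.
    """
    # 1. Sample a random set of clients.
    m = max(1, int(client_fraction * len(clients)))
    sampled = random.sample(clients, m)

    # 2. Each sampled client computes its update locally (in practice,
    #    in parallel and on-device) and sends back only the new weights.
    updates, sizes = [], []
    for client in sampled:
        new_weights, num_examples = client.local_update(global_weights)
        updates.append(new_weights)
        sizes.append(num_examples)

    # 3. Average the clients' updates, weighted by local dataset size.
    total = sum(sizes)
    averaged = sum(w * (n / total) for w, n in zip(updates, sizes))

    # 4. The averaged weights become the new global model.
    return averaged
```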

While this approach brings better data protection and puts less client data into the company's databases, there are still some challenges:

Security. These include client-side attacks that try to confuse the global model, either by hindering convergence or by steering training towards adversarial targets, as well as GAN-based attacks from the model or client side. Another question to consider is the privacy of the updates themselves, both as data (differential privacy) and as a computation problem (using secure multi-party computation (MPC), homomorphic encryption (HE), and trusted execution environments (TEEs)).
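To illustrate the differential-privacy side, below is a minimal sketch of the standard clip-and-add-Gaussian-noise step applied to a client update before it leaves the device. The function and parameter names are illustrative, not taken from a specific library.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip the update's L2 norm and add Gaussian noise -- the basic
    building block of differentially private federated averaging.
    clip_norm and noise_multiplier are illustrative defaults."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise
```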

Machine learning. Because a device must be idle and connected to power while sending updates to the server, updates are often expected to be unbalanced, non-IID (not independent and identically distributed), temporarily unavailable, and geographically biased. One of the key challenges is optimization on such data while parallelizing the process across clients. Others include hyperparameter tuning, architecture selection, and use in unsupervised settings.
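To see what non-IID client data looks like in practice, here is a common simulation trick: partitioning a labeled dataset across clients with a Dirichlet prior over label proportions (smaller `alpha` means more skew). This is a benchmark device, not a description of RecoAI's setup.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.5, rng=None):
    """Split example indices across clients so that each label's mass
    follows a Dirichlet(alpha) draw -- a standard way to simulate
    non-IID federated data in experiments."""
    rng = rng or np.random.default_rng()
    clients = [[] for _ in range(num_clients)]
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cut_points)):
            client.extend(part.tolist())
    return clients
```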

Bias and fairness. In settings where the data is decentralized, measuring fairness and bias can be a challenge, as the company doesn't get the information necessary to audit the algorithm.

Deployment. Maintaining the process is another challenge. Upload is slower than download for most devices, so new ways of compressing and communicating model updates are in development. You can read about two of them here.
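One family of such techniques shrinks the upload by sending only a sparse version of the update. The sketch below shows illustrative top-k sparsification; it is an assumption-laden example of the general idea, not one of the specific methods linked above.

```python
import numpy as np

def sparsify_update(update, keep_fraction=0.01):
    """Keep only the largest-magnitude entries of a weight update and
    upload them as (index, value) pairs to cut client-to-server
    traffic. keep_fraction is an illustrative knob."""
    flat = update.ravel()
    k = max(1, int(keep_fraction * flat.size))
    top_idx = np.argpartition(np.abs(flat), -k)[-k:]
    return top_idx, flat[top_idx]

def densify_update(indices, values, shape):
    """Server-side reconstruction of the sparse update."""
    flat = np.zeros(int(np.prod(shape)))
    flat[indices] = values
    return flat.reshape(shape)
```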

Federated learning in RecoAI

While Google recently announced that they are rolling out a trial of Federated Learning of Cohorts (FLoC) for their advertising, we firmly believe this technology can also be used in recommendations.

There are two aspects of where and how it can be used:

  • Recommendation generation. In this setting, recommendations would be created in the browser on the fly, although this breaks the promise that the model won't predict locally. There is an interesting trade-off between privacy and generalization here: the client generates recommendations and sends updates based only on the observed effects of those recommendations. On the practical side, there are already Python libraries such as TensorFlow Federated and PySyft for building neural networks in a federated learning framework, but challenges such as cold-start users and choosing an appropriate architecture will still need to be solved.
  • Improving the ranking model. This is a more standard federated learning scenario: the main model can be trained in a horizontal fashion and optimized towards better ranking (a client-side sketch follows this list).
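For the ranking scenario, a client-side local update might look like the hypothetical sketch below: a pairwise logistic (BPR-style) step on the user's own interactions, returning new weights and an example count that plug into the `federated_averaging_round` sketch above. The names and the loss choice are assumptions for illustration.

```python
import numpy as np

def local_ranking_update(weights, positives, negatives, lr=0.01, epochs=1):
    """Hypothetical on-device update for a linear ranking model.

    positives/negatives are feature vectors of items the user did and
    did not interact with; the loss is the pairwise logistic (BPR-style)
    objective -log(sigmoid(w . (x_pos - x_neg))).
    """
    w = weights.copy()
    pairs = [(p, n) for p in positives for n in negatives]
    for _ in range(epochs):
        for x_pos, x_neg in pairs:
            diff = x_pos - x_neg
            score = w @ diff
            # Gradient of -log(sigmoid(score)) with respect to w.
            grad = -diff / (1.0 + np.exp(score))
            w -= lr * grad
    # Only the new weights and a count leave the device, never raw data.
    return w, len(pairs)
```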

Why does it matter? At RecoAI we believe that we can create a better product by offering privacy to the user. We should be able to provide basic components of comfort, like deleting an event from the recommendations, opting out of sending data or receiving recommendations, and making us "forget" some viewed or bought products. The core of RecoAI is written in Rust. Rust compensates for the slowness of Python and the heavy memory usage of Java, but in this context its ability to compile naturally to WebAssembly is the key: being able to prepare recommendations in the browser for a specific user can take security and privacy to another level.
