As we outlined in our previous article about churn modeling, the loss of customers, subscribers or employees is one of the biggest challenges a business faces. Hence the need for data science to help understand the reasons for churn and estimate the probability that it will occur. In this article, we show how we approached the problem when working for one of our clients.
The churn prediction use case
Our client wanted to better understand the churn behavior of their employees working in fields with especially high turnover rates: gastronomy, hospitality or retail. Employees in these fields work in shifts with non-fixed schedules, meaning that their working hours may change from day to day or week to week, which can result in shifts of differing lengths. While an employee's timetable is usually planned a month in advance, changes are common, leading to unplanned shifts. Our goal was to determine how these changes, shift lengths, pay and demographic factors affect the probability of churn.
We were provided with the complete work history of employees engaged in the years 2015-2018. For every employee, we had access to their age, sex, education level and the city they worked in, along with their shift history, including planned shift dates and hours and the hours they actually worked.
Defining the churn
As Greg mentioned in his introduction to churn modeling, churn events can be defined in multiple ways, and the choice of representation is probably the most important decision in the project. After gathering our client's requirements and taking their specific field of activity into consideration, we decided to set each employee's job end date to their last shift date and to mark the 30 days leading up to this event as days with a high probability of churn. This choice stemmed mostly from the fact that we had no data on when employees actually gave notice or were dismissed.
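A minimal pandas sketch of this labeling rule, assuming a shift log with hypothetical `employee_id` and `date` columns (one row per worked shift) and reading "30 days leading to this event" as the 30 calendar days ending on, and including, the last shift date:

```python
import pandas as pd

CHURN_WINDOW_DAYS = 30  # length of the high-churn-probability window (assumed inclusive of the end date)


def label_churn_window(shifts: pd.DataFrame, window_days: int = CHURN_WINDOW_DAYS) -> pd.DataFrame:
    """Mark shifts within `window_days` calendar days before (and including) each employee's last shift."""
    out = shifts.copy()
    out["date"] = pd.to_datetime(out["date"])
    # The job end date is taken to be the employee's last shift date.
    out["job_end_date"] = out.groupby("employee_id")["date"].transform("max")
    days_to_end = (out["job_end_date"] - out["date"]).dt.days
    # Days 0 .. window_days - 1 before the end date form the churn window.
    out["churn"] = (days_to_end < window_days).astype(int)
    return out


# Hypothetical example: two employees with a handful of shifts each.
shifts = pd.DataFrame({
    "employee_id": ["a", "a", "a", "b", "b"],
    "date": ["2017-01-02", "2017-02-20", "2017-03-01", "2018-05-01", "2018-05-10"],
})
labeled = label_churn_window(shifts)
print(labeled[["employee_id", "date", "churn"]])
```

Like the definition above, the sketch treats every employee's last shift as a churn event; employees still active at the end of the observation period would arguably need separate handling.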
After defining the churn event and cleaning the data, we performed correlation tests on the readily available variables (mostly demographic ones) and concluded that linear regression models would not give satisfactory results. Additionally, we had to incorporate the work schedule data in a way that accounted for varying shift lengths and deviations from the planned schedules.
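The article does not specify which correlation tests were used. As one plausible screening step, the sketch below computes the Pearson correlation of each one-hot encoded demographic variable with the binary churn label, which for a 0/1 target equals the point-biserial correlation. The column names and the random data are hypothetical.

```python
import numpy as np
import pandas as pd


def churn_correlations(df: pd.DataFrame, target: str = "churn") -> pd.Series:
    """Pearson correlation of each feature with a binary target, strongest (in absolute value) first."""
    # One-hot encode categorical columns so every feature is numeric.
    features = pd.get_dummies(df.drop(columns=[target]), dtype=float)
    correlations = features.corrwith(df[target].astype(float))
    return correlations.sort_values(key=np.abs, ascending=False)


# Hypothetical demographic table with a random churn label.
rng = np.random.default_rng(0)
n = 1_000
demo = pd.DataFrame({
    "age": rng.integers(18, 60, n),
    "sex": rng.choice(["f", "m"], n),
    "education": rng.choice(["primary", "secondary", "higher"], n),
    "churn": rng.integers(0, 2, n),
})
corrs = churn_correlations(demo)
print(corrs)
```

Weak correlations of this kind are a hint, not proof, that a linear model will underperform: linear correlation cannot capture non-linear effects or interactions between variables, which tree-based models can pick up.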
Inspiration for the data representation came from the winning paper of the WSDM Cup 2018 Churn Challenge, “Predicting Customer Churn: Extreme Gradient Boosting with Temporal Data”. We also had explainability in mind, as our client was interested in better understanding the phenomenon, not only in receiving a probability for each employee. Hence we decided to forgo sequential models, since the outputs of recurrent neural networks tend to be hard to explain. Our attention turned to tree-based models (Random Forests, Gradient Boosting Machines), and this choice ultimately determined our data transformation method.
Each employee in the raw dataset became represented by multiple rows, one for each day that person worked. Each row contains a static part, which does not change from row to row (demographic information and job specification), and a dynamic part containing aggregations of the current shift's data and of previous periods.
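A minimal pandas sketch of this representation, assuming a hypothetical shift log with `employee_id`, `date`, `planned_hours` and `actual_hours` columns, where a missing `planned_hours` value marks an unplanned shift. The "previous periods" here are simply the last `window` worked days (a row-based window that includes the current day); the aggregations and window lengths are illustrative, not the ones used in the project.

```python
import pandas as pd


def build_daily_rows(shifts: pd.DataFrame, employees: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    """One row per worked day: dynamic shift aggregations plus static employee attributes."""
    df = shifts.sort_values(["employee_id", "date"]).copy()
    # Dynamic part, current shift: deviation from plan and an unplanned-shift flag.
    df["hours_diff"] = df["actual_hours"] - df["planned_hours"].fillna(0)
    df["unplanned"] = df["planned_hours"].isna().astype(int)
    g = df.groupby("employee_id")
    # Dynamic part, previous periods: rolling sums over the last `window` worked days.
    df[f"hours_last_{window}"] = g["actual_hours"].transform(
        lambda s: s.rolling(window, min_periods=1).sum())
    df[f"unplanned_last_{window}"] = g["unplanned"].transform(
        lambda s: s.rolling(window, min_periods=1).sum())
    df["shifts_so_far"] = g.cumcount() + 1
    # Static part: demographic and job attributes repeated on every row.
    return df.merge(employees, on="employee_id", how="left")


# Hypothetical example data.
shifts = pd.DataFrame({
    "employee_id": ["a", "a", "a", "b"],
    "date": pd.to_datetime(["2017-01-02", "2017-01-03", "2017-01-05", "2017-01-02"]),
    "planned_hours": [8.0, None, 6.0, 4.0],
    "actual_hours": [8.0, 5.0, 7.5, 4.0],
})
employees = pd.DataFrame({"employee_id": ["a", "b"], "age": [23, 41], "city": ["Berlin", "Warsaw"]})
rows = build_daily_rows(shifts, employees, window=2)
print(rows)
```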
Graphic description: Simplified data transformation process for one employee, with the last 2 days marked as having a high churn probability.
This data transformation approach produces a large dataset: the number of rows equals the total number of days worked by all employees. We were prepared for a dataset of this size, but if storage or memory becomes an issue, the transformation can be performed at a coarser granularity (e.g. keeping only every second workday in the final dataset), or some of the rows can be removed at random, which also acts as a form of regularization.
Models and explanations
With the transformed dataset in hand, we trained a machine learning model for churn prediction. Our choice of model was fairly straightforward, as we had to take into consideration:
- Model explainability
- Model efficiency
- Model robustness on larger datasets
We opted for the gradient boosted tree model implemented in the LightGBM library. The model was trained, including hyperparameter optimization, to achieve the best AUC score. We then used the eli5 library to extract feature importances for all training features used by the tree models.
For ease of use, we exposed the trained model and the explanation component through a web API and packaged the whole project into a Docker image, which allowed for smooth integration with the client's internal system.
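The article does not describe the API itself, so the following is only a dependency-free sketch of the idea using Python's standard `http.server`. The `/predict` path, the JSON shape, the function names and the `ConstantModel` placeholder are all hypothetical; any object with a scikit-learn-style `predict_proba` (such as a trained `LGBMClassifier`) could be plugged in.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict_payload(model, feature_names, importances, payload):
    """Turn a JSON payload into churn probabilities plus global feature importances."""
    # Order each record's values to match the feature order the model was trained on.
    rows = [[record[name] for name in feature_names] for record in payload["rows"]]
    probabilities = model.predict_proba(rows)
    return {
        "churn_probability": [float(p[1]) for p in probabilities],
        "feature_importance": importances,
    }


def make_handler(model, feature_names, importances):
    """Build a request handler class bound to a trained model (hypothetical /predict endpoint)."""

    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/predict":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            body = json.dumps(predict_payload(model, feature_names, importances, payload)).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return PredictHandler


def serve(model, feature_names, importances, port=8000):
    """Start a blocking HTTP server; in a Docker image this could be the container's entry point."""
    HTTPServer(("0.0.0.0", port), make_handler(model, feature_names, importances)).serve_forever()


class ConstantModel:
    """Placeholder with a scikit-learn-style predict_proba, for trying the API without a trained model."""

    def predict_proba(self, rows):
        return [[0.8, 0.2] for _ in rows]


response = predict_payload(ConstantModel(), ["age"], {"age": 1.0}, {"rows": [{"age": 30}, {"age": 40}]})
print(response)
```

A production service would more likely use a web framework such as Flask or FastAPI; the standard library simply keeps the sketch free of extra dependencies.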
Our client tested the solution thoroughly and reported that the results were both consistent with their intuition on the subject and, at times, surprising, providing a novel view of the issue. We were happy to hear that our model provided accurate results with explained predictions, and that the methods we used allowed the client to integrate the solution quickly and put it into production.
- person by Adrien Coquet from the Noun Project
- chef by Adrien Coquet from the Noun Project
- schedule by Adnen Kadri from the Noun Project