While the loss of customers is a challenge that every business has to face, it’s good to know that there are data science techniques that can help reduce this kind of loss. This phenomenon, in which a group of customers or subscribers leaves a given supplier within a defined time period, is often referred to as “churn”, and its size as the “churn rate”. The longer a customer keeps buying your products or services, the better for your business, so you want to keep as many of them as possible for as long as possible.
You’ll probably agree with me that businesses want not only to keep existing clients but also to turn as many of them as possible into loyal and frequent buyers. Usually, companies have great ideas on how to encourage clients to keep buying their products: it may be a sale, a promotion, a giveaway, a free trial… It’s great when you know what to do, but there is one missing piece: which customers are likely to leave, and when? This question is so complex that it can actually be divided into a few smaller tasks. In this blog post, I’ll describe them one by one.
Define “customer”
This may sound trivial, but you should know your customer. If you are trying to model customer churn, you should know who or what exactly will be “churned”:
- A customer at a given point in time?
- A subscription plan?
- Maybe a “non-churned” period for one of these?
There is no universal answer here – you should pick the definition that best fits your (or shareholders’) problem.
Define “churn”
OK, we may have some definition of churn already, but is it really helpful? How do you define the correct time period? You have to face the truth: eventually every customer or subscriber will churn if we wait long enough. Is this churn the same as “regular” customer rotation (e.g. when a contract simply runs out)? Again, there is no silver bullet here; choose the time period that best fits your business needs. Alternatively, you can try to predict churn indirectly, by predicting the “time to event”. As a result, you’ll get the amount of time after which given customers may withdraw from your services. Later in this article, I’ll show some approaches to churn modeling which deal with this problem.
Be aware of censored data
Churn prediction is, by definition, a time-based problem. For each customer, subscriber, etc. (essentially, any “record” in your data source) you will have a different amount of information, and you’ll only have information about events that have already happened. For example, you may have 2 years of history for seasoned customers and only 2 weeks (or days) for the newest ones. In other words, you only have access to data from the past, and the timespan of this “past” may differ among your customers. Moreover, your data about all these groups will be limited to a given moment in time: the moment of data extraction at best. This is your “now” in such a scenario, and whatever happens after this point is “censored”.
Because we only recorded event data from the observed past (from when we first saw the customer up until “now”), we don’t know the actual time from the last seen event (e.g. a purchase or transaction) to the next one, which has yet to be observed. In that sense, all we know is a lower bound on that time, and this lower bound is what we can use for training. The model we choose for churn prediction should take this into account, which brings us to the next part.
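To make the idea of censoring more concrete, here is a minimal sketch of how each record could be turned into a pair of "observed duration" and "was the churn event observed?". The function name `to_duration`, the customer names, and all dates are made up for illustration, and I assume that a churn date is already available whenever churn was observed.

```python
from datetime import date

def to_duration(first_seen, churned_on, extraction_date):
    """Return (duration_in_days, event_observed) for one customer.

    churned_on is None when no churn was seen before the data extraction.
    In that case the duration is only a lower bound (right-censored).
    """
    if churned_on is not None and churned_on <= extraction_date:
        return (churned_on - first_seen).days, True
    return (extraction_date - first_seen).days, False

# Hypothetical records: name -> (first_seen, churned_on)
now = date(2020, 1, 1)  # the moment of data extraction, our "now"
customers = {
    "seasoned_churned": (date(2018, 1, 1), date(2019, 1, 1)),
    "seasoned_active": (date(2018, 1, 1), None),
    "newcomer": (date(2019, 12, 18), None),
}

durations = {
    name: to_duration(first_seen, churned_on, now)
    for name, (first_seen, churned_on) in customers.items()
}
```

Here the newcomer ends up with a duration of 14 days and the flag set to `False`: we only know that the true time to churn is at least 14 days.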
Choose your model carefully
Once you have answered all the aforementioned questions (and probably a few more, depending on your business problem), you can think about the right model. Models come in different flavors, and choosing the best one may not be an easy task. However, if you have really thought through the previous questions, this choice might be a bit easier. Let’s consider some common approaches to churn modeling.
Approach 1: Sliding box model
This approach avoids modeling time to event (churn) directly and focuses on predicting whether an event occurred in a predefined time frame, our “box”. Deciding how big this time frame should be is somewhat arbitrary; once again, think about which size best suits your business needs. This approach is fairly easy to explain and allows us to use common classification algorithms, including state-of-the-art boosting methods. It has also proven useful in situations where you have well-defined groups of customers and want to target them with specific campaigns. But this comes at a price: what you get as an output is the probability of N days without a (churn) event, and translating it into actionable “insights” may be cumbersome. Choosing the right size of the “box” is often not easy, either.
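As a rough sketch of how the training labels for such a model could be built, the snippet below slides a cutoff date through one customer's purchase history and marks each box as churned (1) when no purchase falls inside it. The helper names `box_label` and `sliding_boxes`, the 30-day box, and the "number of past purchases" feature are assumptions made only for this example. Boxes that reach past the extraction date are skipped, because their outcome is not fully observed yet.

```python
from datetime import date, timedelta

def box_label(event_dates, cutoff, box_days, extraction_date):
    """Return 1 (churned), 0 (active), or None if the box is not fully observed."""
    box_end = cutoff + timedelta(days=box_days)
    if box_end > extraction_date:
        return None  # the box extends past "now", so the label is unknown
    active = any(cutoff < d <= box_end for d in event_dates)
    return 0 if active else 1

def sliding_boxes(event_dates, first_cutoff, step_days, box_days, extraction_date):
    """Build (cutoff, n_past_events, label) rows; step_days must be positive."""
    rows = []
    cutoff = first_cutoff
    while True:
        label = box_label(event_dates, cutoff, box_days, extraction_date)
        if label is None:
            break
        # Features may only use what was known at the cutoff date.
        n_past = sum(1 for d in event_dates if d <= cutoff)
        rows.append((cutoff, n_past, label))
        cutoff += timedelta(days=step_days)
    return rows

# Hypothetical purchase history of a single customer
purchases = [date(2019, 1, 5), date(2019, 1, 20), date(2019, 2, 10)]
rows = sliding_boxes(purchases, date(2019, 1, 1), 30, 30, date(2019, 5, 1))
```

Rows like these, stacked for all customers, can then be fed into any standard classifier. Note that every feature is computed from events up to the cutoff only, so no information from inside the box leaks into the inputs.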
Approach 2: Learning-to-rank approach
In sliding box models, we defined churn in a binary way. However, we may want some “grades” here, that is, to know which customers are more churned than others. In such a scenario you can rank customers according to their risk of churn. This approach induces an order: if we know that for one customer there were at least 5 days until an event, and for another there were 3 days to an event, we can compare the two (after all, 3 < 5). The customer with fewer days to the event is more “churned” than the other. In the simplest scenario, the ranking is defined by all such pairwise comparisons. This approach also has some pitfalls: the training dataset (and training time) grows quadratically, because the dataset consists of pairwise combinations of all the observations (if you choose a more complex approach, it may grow even faster). Moreover, the results are somewhat “relative”. We may only be able to answer whether a customer is more churned than someone else; deciding whether an individual customer is predicted as churned or not may be an entirely different issue.
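The pairwise comparisons described above can be sketched as follows. The rule used here, that a pair is only comparable when the customer with the shorter time actually had an observed event, is my assumption, borrowed from the way concordance measures usually treat censored data. The function name `comparable_pairs` and the sample records are hypothetical.

```python
from itertools import permutations

def comparable_pairs(records):
    """records: dict of name -> (days, event_observed).

    Return (sooner, later) pairs where `sooner` is known to be "more churned":
    its event was observed, and it happened before the other customer's
    observed or censored time.
    """
    pairs = []
    for (a, (days_a, event_a)), (b, (days_b, _)) in permutations(records.items(), 2):
        if event_a and days_a < days_b:
            pairs.append((a, b))
    return pairs

# Hypothetical customers: days until the event, or days observed so far
records = {
    "A": (3, True),   # event after 3 days
    "B": (5, False),  # at least 5 days, no event seen yet
    "C": (4, False),  # at least 4 days, no event seen yet
    "D": (7, True),   # event after 7 days
}
pairs = comparable_pairs(records)
max_pairs = len(records) * (len(records) - 1) // 2  # grows quadratically
```

Customer A can be compared with everyone else, but B and D cannot be ordered: B was still active after 5 days, so its event may come before or after day 7. With n customers there can be up to n(n-1)/2 such pairs, which is where the quadratic growth of the training set comes from.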
Approach 3: Survival analysis
This kind of tool may not be an obvious choice, since it was developed for medical purposes: to predict whether patients from a given population will survive over a certain timespan. Yet what is traditionally a tool for health professionals can also be used to predict customer churn. Although it may sound morbid, the churn event replaces “death” in such scenarios. Because of their origin, survival analysis methods can handle censored data quite well. On the other hand, they suffer from problems of their own: in many scenarios they may have lower predictive power than modern classification algorithms. There are some attempts to combine the two, e.g. survival analysis with boosted trees, with promising results, but the real breakthrough is yet to come.
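To illustrate how survival analysis makes use of censored records, here is a small pure-Python sketch of the Kaplan-Meier estimator, one of the classic tools from this family. It is my own choice of example rather than a recommendation for any specific model, and the sample durations are made up. Censored customers never count as events, but they stay in the "at risk" group until the moment we lose sight of them.

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] for each distinct time with at least one observed event.

    durations: observed time per customer (e.g. in days)
    events:    True if churn was observed, False if the record is censored
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        churned = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)  # churned or censored at t
        if churned:
            survival *= 1 - churned / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical data: five customers, two of them censored
durations = [3, 5, 5, 8, 10]
events = [True, True, False, True, False]
curve = kaplan_meier(durations, events)
```

For this toy data the estimated probability of "surviving" (not churning) drops to about 0.8 after day 3, 0.6 after day 5, and 0.3 after day 8. The customer censored at day 5 lowers the number at risk for later steps without being counted as churned.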
Now you have some knowledge of the common issues associated with churn prediction and of popular predictive methods for such problems. This list is by no means complete; each of these methods could be the subject of dozens of articles. I hope that this brief introduction has given you some insight and allowed you to see your business problems from a new perspective. In the next part, I will cover a specific case study in more detail. Stay tuned.
Interested in doing such an analysis for your business? We are here to help!
Contact me to get answers to your questions: greg@logicai.io