Leveraging machine learning to optimise class schedules for an international language school

Projects rarely fall neatly into pure classification or regression problems. More often, the problem is tied to a business process. In this case, we combined a predictive task with optimisation.


Help students by helping the school

Wall Street English is one of the largest providers of English language education in the world. One of the factors behind their success is their unique teaching method, which combines offline and online activities to maximise learning efficiency. This is the story of how we helped them improve their teaching system even further.

When optimisation meets predictions

Wall Street English students learn by completing the online parts of the course. At the end of each unit, they attend offline classes to reinforce their newly gained knowledge. The aim of this project was to use machine learning to predict when a given student will finish an online unit and be ready for an offline class. With this information, optimisation comes into play: offline classes can be booked so that their average occupancy is higher and the waiting time between finishing an online unit and attending an offline class is shorter.

System overview

Our system has several tasks, so it was only logical to divide it into separate modules. The first module connects to a database and uses Quill to handle SQL queries and retrieve information about students’ learning history. The second module is built around EventAI, our own feature engineering framework; it takes all the retrieved information and transforms it into meaningful features, ready to be used by a machine learning algorithm. The third and fourth modules are responsible for training machine learning models and making predictions. The fifth and final module takes the predicted unit finish times and creates an optimised schedule for offline classes. We discuss these modules in greater detail later in the article.
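To make the module split concrete, here is a minimal sketch of how five such stages might be wired together in Scala. Every type and method name is hypothetical (the article does not describe the real interfaces), and the toy implementation exists only to exercise the wiring, not to model the real Quill, EventAI or classifier code.

```scala
import java.time.LocalDate

// Hypothetical records passed between modules.
final case class LearningEvent(studentId: Int, name: String, date: LocalDate, passed: Boolean)
final case class FeatureRow(studentId: Int, features: Vector[Double], daysToFinish: Option[Int])
final case class Prediction(studentId: Int, daysToFinish: Int)
final case class ClassBooking(dayOffset: Int, studentIds: Seq[Int])

// The modules as abstract stages: 1 = data access, 2 = feature engineering,
// 3 = training, 4 = prediction, 5 = schedule optimisation.
trait Pipeline {
  def extract(schoolId: Int): Seq[LearningEvent]
  def engineer(events: Seq[LearningEvent]): Seq[FeatureRow]
  def train(labelled: Seq[FeatureRow]): FeatureRow => Int
  def schedule(predictions: Seq[Prediction]): Seq[ClassBooking]

  // Module 4: apply the trained model to students whose finish time is unknown.
  def predict(model: FeatureRow => Int, current: Seq[FeatureRow]): Seq[Prediction] =
    current.map(r => Prediction(r.studentId, model(r)))

  // Rows with a known label train the model; rows without one are predicted and scheduled.
  final def run(schoolId: Int): Seq[ClassBooking] = {
    val rows = engineer(extract(schoolId))
    val (labelled, current) = rows.partition(_.daysToFinish.isDefined)
    schedule(predict(train(labelled), current))
  }
}

// Toy stand-in: every stage is deliberately trivial.
object ToyPipeline extends Pipeline {
  private val day0 = LocalDate.of(2022, 1, 1)

  def extract(schoolId: Int): Seq[LearningEvent] = Seq(
    LearningEvent(1, "Lesson", day0, passed = true),
    LearningEvent(2, "Lesson", day0.plusDays(2), passed = true),
    LearningEvent(3, "Lesson", day0.plusDays(5), passed = false))

  // One feature per student (number of passed lessons); student 1 plays the "historical" role.
  def engineer(events: Seq[LearningEvent]): Seq[FeatureRow] =
    events.groupBy(_.studentId).toSeq.sortBy(_._1).map { case (id, evs) =>
      FeatureRow(id, Vector(evs.count(_.passed).toDouble), if (id == 1) Some(4) else None)
    }

  // "Model": predict the mean historical label for everyone.
  def train(labelled: Seq[FeatureRow]): FeatureRow => Int = {
    val mean = labelled.flatMap(_.daysToFinish).sum / labelled.size
    _ => mean
  }

  // Naive schedule: one class per predicted day.
  def schedule(predictions: Seq[Prediction]): Seq[ClassBooking] =
    predictions.groupBy(_.daysToFinish).toSeq.sortBy(_._1)
      .map { case (d, ps) => ClassBooking(d, ps.map(_.studentId)) }
}
```

Keeping each stage behind a small function signature is one way to test modules in isolation and to swap, say, the naive scheduler for an optimiser without touching the rest.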

Handle SQL with Quill

The role of the first module was to connect to a database and retrieve all the required information about a teaching centre, its teachers and its students, especially the students’ learning history. We decided to use Quill, as it provides a Quoted Domain Specific Language for expressing queries in Scala. As a result, we obtained the information we needed in the form of type-safe case classes. Because Quill checks queries against these case classes at compile time, this gave us confidence that our queries were well-formed and that we were extracting the right information.

Feature engineering made easy

The second part of our system is responsible for processing students’ information in order to predict, as accurately as possible, when a given student will finish a unit of online material. To do that, we used EventAI, our own data processing framework for feature engineering.

We chose EventAI for two main reasons. First, it’s fast and really easy to use. Second, it allowed us to save a lot of time and avoid mistakes during the feature engineering process. For this project we had to create new features from students’ data; fortunately, declaring features in EventAI is concise and simple, yet flexible. Below is an example of a feature declaration in EventAI:

Here we defined a single feature: the average time between lessons over the last month. It selects only events called Lessons and takes into account only the lessons that the student has passed, with the time window set to the last month.
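For readers without EventAI, the same feature can be computed in plain Scala. This is a minimal sketch, not EventAI’s API or its exact aggregation semantics: it assumes a hypothetical `Event` record, measures gaps in hours, and treats "last month" as the calendar month before the reference time.

```scala
import java.time.{Duration, LocalDateTime}

// Hypothetical event record; EventAI's own types are not shown here.
final case class Event(name: String, timestamp: LocalDateTime, passed: Boolean)

object AvgTimeBetweenLessons {
  /** Average gap, in hours, between consecutive passed "Lesson" events in the month
    * before `asOf`; None when fewer than two such lessons fall in the window. */
  def compute(events: Seq[Event], asOf: LocalDateTime): Option[Double] = {
    val windowStart = asOf.minusMonths(1)
    val times = events
      .filter(e => e.name == "Lesson" && e.passed)              // only passed lessons
      .map(_.timestamp)
      .filter(t => !t.isBefore(windowStart) && t.isBefore(asOf)) // last-month window
      .sortWith(_.isBefore(_))
    if (times.size < 2) None
    else {
      val gapsInHours = times.zip(times.tail).map { case (a, b) => Duration.between(a, b).toMinutes / 60.0 }
      Some(gapsInHours.sum / gapsInHours.size)
    }
  }
}
```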

As you can see from this example, declaring features in EventAI is very simple. There are already many predefined aggregations (here we used the Average time between aggregation), and if a more complicated function is needed, we can create one with little effort.

One of the key aspects of efficient feature engineering is deciding which features the model needs to make accurate predictions. To capture seasonality and preferences in students’ behaviour, we extracted the day of the week and the day of the year on which students prefer to study online. We also included the time since the last activity, the time since the first-ever activity, and similar features.
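A sketch of these time-based features in plain Scala follows. It is one possible reading of the text, assuming a hypothetical `Activity` record and interpreting "preferred day" as the most frequent value in the student’s history (ties broken by the smallest value); only activities strictly before the reference date are used, to avoid leaking future information.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

final case class Activity(date: LocalDate) // hypothetical record

object TimeFeatures {
  // Most frequent value; ties go to the smallest value. Caller guarantees xs is non-empty.
  private def mode(xs: Seq[Int]): Int =
    xs.groupBy(identity).toSeq.sortBy { case (v, vs) => (-vs.size, v) }.head._1

  def compute(history: Seq[Activity], asOf: LocalDate): Map[String, Double] = {
    val past = history.map(_.date).filter(_.isBefore(asOf))
    if (past.isEmpty) Map.empty
    else Map(
      "preferredDayOfWeek" -> mode(past.map(_.getDayOfWeek.getValue)).toDouble, // 1 = Monday ... 7 = Sunday
      "preferredDayOfYear" -> mode(past.map(_.getDayOfYear)).toDouble,
      "daysSinceLastActivity" -> ChronoUnit.DAYS.between(past.maxBy(_.toEpochDay), asOf).toDouble,
      "daysSinceFirstActivity" -> ChronoUnit.DAYS.between(past.minBy(_.toEpochDay), asOf).toDouble
    )
  }
}
```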

There was one more thing we had to define: our target feature. In our case, it was the time to the next activity before an offline class. Fortunately, EventAI allows defining future windows for processing events, so all we had to do was set the time window to the next 12 months.
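The idea of a future window can be illustrated with a small label function. This is a hypothetical sketch, not EventAI code: it assumes the relevant activity is available as a list of dates and returns the number of days from the reference date to the next activity inside the following 12 months, or `None` if there is none.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

object TargetLabel {
  /** Days from `asOf` to the next activity within the following 12 months (a "future window"),
    * or None when no activity falls inside that window. */
  def daysToNextActivity(activityDates: Seq[LocalDate], asOf: LocalDate): Option[Long] = {
    val windowEnd = asOf.plusMonths(12)
    val future = activityDates.filter(d => d.isAfter(asOf) && !d.isAfter(windowEnd))
    if (future.isEmpty) None
    else Some(ChronoUnit.DAYS.between(asOf, future.minBy(_.toEpochDay)))
  }
}
```

Unlike ordinary features, which look backwards from the reference date, the label looks forwards, which is why the window has to be defined separately.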

With all the features defined, EventAI uses Spark to process the data and saves the result for the next step.

Predict the future

Predicting human behaviour is never easy: every person is different, and many factors can influence what we do next. Nevertheless, we took our shot at it. With the meaningful data extracted in the previous step, we decided to use decision tree classification. Our approach was as follows: first, decide the period for which we want to predict online unit finish times; then, train a classifier in which each day within the prediction period is a separate class. To complicate things a bit more, we wanted to predict when a given student would finish not only the next unit but also the two or three units after it. To do this, we created separate classifiers: one to predict the next unit, one to predict the second-next unit, and so on.
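The "each day is a class" framing comes down to how the training labels are built. The sketch below shows one way to derive them, assuming hypothetical names and that the actual finish dates of a student’s upcoming units are known for historical data: the classifier for the k-th next unit gets the 0-based day index within the period, or no day class if the unit is finished outside it.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

object DayClassLabels {
  /** Label for the classifier predicting the k-th next unit (k = 1, 2, 3, ...): the 0-based day index,
    * within [periodStart, periodStart + periodDays), on which that unit is finished; None otherwise. */
  def label(unitFinishDates: Seq[LocalDate], periodStart: LocalDate, periodDays: Int, k: Int): Option[Int] = {
    require(k >= 1, "k counts units ahead, starting at 1")
    unitFinishDates
      .filter(!_.isBefore(periodStart))
      .sortBy(_.toEpochDay)
      .lift(k - 1)                                           // k-th upcoming completion, if any
      .map(d => ChronoUnit.DAYS.between(periodStart, d).toInt)
      .filter(_ < periodDays)                                // finished after the period: no day class
  }

  /** One label per horizon k = 1..maxK, i.e. one target for each separate classifier. */
  def labelsPerHorizon(unitFinishDates: Seq[LocalDate], periodStart: LocalDate, periodDays: Int, maxK: Int): Map[Int, Option[Int]] =
    (1 to maxK).map(k => k -> label(unitFinishDates, periodStart, periodDays, k)).toMap
}
```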

This already gave us decent accuracy, but we wouldn’t be ourselves if we didn’t want to do even better. We therefore created one more classifier for each unit, this time with just two classes: whether or not a student will finish the unit within the prediction period. After combining all the classifiers, we were finally satisfied with the results and ready to move on to the final step.
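The article does not say exactly how the binary and per-day classifiers were combined. One plausible rule, sketched below as an assumption, is to consult the binary classifier first and only return the most probable day from the day classifier when the student is predicted to finish within the period. The threshold of 0.5 is an illustrative default, not a value from the project.

```scala
object CombinedPrediction {
  /** Hypothetical combination rule: if the binary classifier's probability of finishing within the
    * period is below `threshold`, predict "not in this period" (None); otherwise return the index of
    * the most probable day from the day-class classifier. */
  def predictDay(pWithinPeriod: Double, dayProbs: IndexedSeq[Double], threshold: Double = 0.5): Option[Int] =
    if (pWithinPeriod < threshold || dayProbs.isEmpty) None
    else Some(dayProbs.indices.maxBy(i => dayProbs(i)))
}
```

A gate like this lets the specialised binary model handle the "will it happen at all?" question, which a many-class day classifier can answer poorly when most probability mass is spread thinly across days.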

Optimisation with simulated annealing

So far, we had extracted the important information, trained our models and predicted when students would finish their online units and be ready for offline classes. Now it was time to put everything together and create a schedule that maximises class occupancy and minimises the waiting time between finishing a unit and attending a class. There are many ways to solve this problem; we took inspiration from nature and used simulated annealing. Simulated annealing is a probabilistic technique for approximating the global optimum of a function, and it is especially useful when the search space is discrete, as it is for a schedule.

In our case, we wrote a cost function that captured what we wanted to optimise. We also had to account for all the constraints on a solution: students and teachers can’t be in two places at the same time, classes have a limited capacity, students have preferences regarding class times, and so on. Then we wrote modifying functions that change the schedule: adding, moving and deleting classes, adding, moving and removing students from classes, etc. Because simulated annealing occasionally accepts a change that increases the cost function, it can escape local minima, and this allowed us to find a better overall solution.
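A heavily simplified simulated annealing scheduler is sketched below to show the mechanics. It is not the production cost function: it assumes a single move type (move one student to another feasible day), ignores teachers and time preferences, and models occupancy by charging a fixed `classCost` per class opened, where a day with n students needs ceil(n / capacity) classes. The Metropolis acceptance rule and geometric cooling are standard choices; the parameter values are illustrative.

```scala
import scala.util.Random

object AnnealingScheduler {
  /** readyDay(i): predicted day on which student i finishes the unit (0-based).
    * A solution assigns every student a class day in [readyDay(i), horizon). */
  final case class Problem(readyDay: Vector[Int], horizon: Int, capacity: Int, classCost: Double)

  /** Total waiting days plus classCost per class opened; fuller classes mean fewer classes and lower cost. */
  def cost(p: Problem, assignment: Vector[Int]): Double = {
    val waiting = assignment.indices.map(i => assignment(i) - p.readyDay(i)).sum
    val classes = assignment.groupBy(identity).values.map(g => (g.size + p.capacity - 1) / p.capacity).sum
    waiting + p.classCost * classes
  }

  def solve(p: Problem, steps: Int = 20000, t0: Double = 5.0, cooling: Double = 0.9995, seed: Long = 42L): Vector[Int] = {
    require(p.readyDay.nonEmpty, "no students to schedule")
    require(p.readyDay.forall(d => d >= 0 && d < p.horizon), "every student must be ready within the horizon")
    val rnd = new Random(seed)
    var current = p.readyDay          // start: everyone attends on their ready day
    var currentCost = cost(p, current)
    var best = current
    var bestCost = currentCost
    var temp = t0
    for (_ <- 1 to steps) {
      // Neighbour move: send one random student to another feasible day.
      val i = rnd.nextInt(current.size)
      val day = p.readyDay(i) + rnd.nextInt(p.horizon - p.readyDay(i))
      val candidate = current.updated(i, day)
      val candidateCost = cost(p, candidate)
      val delta = candidateCost - currentCost
      // Metropolis rule: always accept improvements, accept worse moves with probability exp(-delta / temp).
      if (delta <= 0 || rnd.nextDouble() < math.exp(-delta / temp)) {
        current = candidate
        currentCost = candidateCost
        if (currentCost < bestCost) { best = current; bestCost = currentCost }
      }
      temp *= cooling                 // geometric cooling: worse moves become rarer over time
    }
    best
  }
}
```

Early on, the high temperature lets the search accept moves such as delaying one student, which briefly increases the cost but can later allow two half-empty classes to merge. Keeping track of the best solution seen means the result is never worse than the starting schedule.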

Putting it all together

Our system was almost complete; building an easy-to-use API was the final touch. We created an API that accepts requests specifying the English school ID, the prediction dates and other parameters. We wanted the system to be as fast as possible, so we used Akka actors to pass state between modules, allowing independent modules to run in parallel. This gave the system a significant speed boost.
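The system described here used Akka actors; the sketch below swaps in standard-library `Future`s to avoid an Akka dependency while showing the same idea: a request handler that starts independent modules concurrently and continues only once both results are in. The request fields, stage names and return value are hypothetical stand-ins.

```scala
import java.time.LocalDate
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical request shape: school ID plus the prediction period.
final case class ScheduleRequest(schoolId: Int, from: LocalDate, to: LocalDate) {
  require(!to.isBefore(from), "prediction period must not end before it starts")
}

object ParallelStages {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Stand-ins for two independent modules.
  def loadTeachers(req: ScheduleRequest): Future[Seq[String]] =
    Future(Seq(s"teacher-${req.schoolId}"))
  def predictStudents(req: ScheduleRequest): Future[Map[Int, LocalDate]] =
    Future(Map(1 -> req.from.plusDays(3)))

  // Both futures are started before being combined, so the two stages run concurrently;
  // the final step runs only once both results are available.
  def handle(req: ScheduleRequest): Future[String] = {
    val teachersF = loadTeachers(req)
    val predictionsF = predictStudents(req)
    for {
      teachers <- teachersF
      predictions <- predictionsF
    } yield s"${teachers.size} teacher(s), ${predictions.size} prediction(s)"
  }
}
```

Creating the futures outside the `for` comprehension matters: writing `loadTeachers(req)` and `predictStudents(req)` directly inside it would start the second stage only after the first had finished.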

Key takeaway points

It was an interesting and challenging project, and we learned a lot while developing the solution. We’d like to summarise our experience with the most important takeaways:

  1. Feature engineering is crucial. In our opinion, this was the most important step. Your model will only be as good as the features you declare. Using EventAI for feature engineering allowed us to save a lot of time and avoid hard-to-catch mistakes.
  2. Parallelise your system. Creating separate modules for each step and using Akka actors to handle communication between them significantly improved execution times. Think outside the box: not everything has to be executed in a fixed order.
  3. User experience matters. A concise, easy-to-use API makes a huge difference. After all, it’s your client who is going to use the system you developed, not you.
  4. Don’t forget about documentation. When the system was complete, we handed it over to the client’s IT department. Providing them with well-written, extensive documentation sped up the deployment process and minimised the time spent explaining how the system works.

If you are interested in how we can help you, just drop us a message at hello@logicai.io.

Topics: Scala

Data Scientist


Other stories in category

Predicting student progress, optimizing class schedules


Predicting any kind of human behaviour is difficult, and predicting student progress in online/offline courses is no exception. Our client, an international language school, wanted to improve the scheduling of their offline classes by predicting students’ progress in the online part of the course.

LogicAI Team

16 Nov 2022
