Using big data to help hotels stop losses in time

Editor’s note: This article is from the WeChat official account “HyperAI super nervous” (ID: HyperAI). Author: Come little nervous.

Content summary:

Big data is now used in all walks of life, and the hotel industry is no exception. Making full use of it lets hotels predict changes in market demand, back their decisions with analysis, and improve their operations.

Major OTA (Online Travel Agency) platforms have made travel much easier: hotel rooms, tickets to scenic spots and more can be booked with the touch of a finger.

Dozens of hotel and homestay booking platforms at home and abroad

To attract more bookings, these platforms encourage merchants to set more relaxed cancellation policies, such as free cancellation at any time or free cancellation within a limited period.

Booking, the world’s largest online hotel booking website, is popular with travellers thanks to its free cancellation.

For users, “free cancellation” is very attractive, but for hotels it is a big headache. When an order is cancelled at the last minute, the hotel usually suffers the following losses:

  • The cancelled room cannot be resold in time, and the hotel loses revenue;

  • The hotel lowers its price to sell the cancelled room, reducing profit;

  • To resell these rooms as quickly as possible, the hotel has to spend more on promotion and distribution channels.

Since users can stand the hotel up at any time, is there any way for the hotel to minimize its losses?

Manuel Banza, a Portuguese business analyst (BA, a position roughly equivalent to a product manager at an IT company) with more than 5 years of experience in hotel management, used publicly available data from European hotel booking platforms to identify the characteristics of users who are more likely to cancel, helping hotels stop losses in time.

Finding patterns in nearly 120,000 hotel booking records

As a data science enthusiast, Manuel Banza approached the problem with data science and machine learning.

He first conducted a comprehensive analysis of the “Hotel booking demand” data set. The data set contains 32 dimensions of data for regular hotels and resort hotels, including:

The user’s nationality, booking time, length of stay, the number of adults, children and babies, whether the order was finally cancelled, the number of orders the user had cancelled before this one, etc.

Hotel Booking Demand data set

Issuing agency: University of Lisbon, Portugal

Size: 119,390 records in total, 32 dimensions

Data format: csv

Data size: 16.9 MB (compressed file 1.3 MB)

Address: https://hyper.ai/datasets/14866

A sample of the data

From the statistics, Manuel Banza found that a large share of hotel bookings are cancelled each year.

In 2018, 49.8% of the orders placed on the OTA platform Booking were cancelled; on HRS Group, the proportion was as high as 66%. Overall, the average cancellation rate across multiple platforms in 2018 reached 39.6%.

Proportion of cancelled orders by various booking channels

Next, the author conducted an exploratory analysis of the data and found the following:

  • Compared with resort hotels, bookings at regular hotels are more likely to be cancelled;

  • The cancellation rate is higher in spring and summer, and lowest in winter;

  • Among the various booking channels, OTA platforms receive the most orders and also have the most cancelled orders;

  • The earlier a user books, the greater the uncertainty and the higher the probability of cancellation.

The author stated that booking lead time is one of the most important indicators when analyzing hotel revenue performance. The analysis shows that reservations made more than 1 year in advance have the highest cancellation probability, 57.14%, while reservations made within a week of check-in have the lowest, 7.73%.
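The grouping behind such figures can be sketched in a few lines of Python. This is only an illustration, not the author's code: the bucket edges other than "within a week" and "more than 1 year" are assumptions, the records are made up, and each record is assumed to be a pair of lead time in days and a 0/1 cancellation flag.

```python
from collections import defaultdict

# Hypothetical bucket edges in days. The article only names "within a week"
# and "more than 1 year"; the two middle buckets are assumptions.
BUCKETS = [
    (0, 7, "0-7 days"),
    (8, 90, "8-90 days"),
    (91, 365, "91-365 days"),
    (366, None, "over 1 year"),
]


def bucket_label(lead_time):
    """Return the label of the bucket that contains lead_time (in days)."""
    for low, high, label in BUCKETS:
        if lead_time >= low and (high is None or lead_time <= high):
            return label
    raise ValueError(f"negative lead time: {lead_time}")


def cancellation_rate_by_bucket(bookings):
    """bookings: iterable of (lead_time_days, is_canceled) pairs, flag is 0 or 1."""
    totals = defaultdict(int)
    cancelled = defaultdict(int)
    for lead_time, is_canceled in bookings:
        label = bucket_label(lead_time)
        totals[label] += 1
        cancelled[label] += is_canceled
    return {label: cancelled[label] / totals[label] for label in totals}


# Made-up toy records, not taken from the data set.
toy = [(2, 0), (5, 0), (6, 1), (30, 0), (45, 1), (200, 1), (300, 0), (400, 1)]
rates = cancellation_rate_by_bucket(toy)
for label, rate in rates.items():
    print(f"{label}: {rate:.0%}")
```

On the real data set the same function would be fed one pair per booking, for example from the lead_time column and the cancellation flag.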

The number of days booked in advance (horizontal axis) is positively correlated with the probability of order cancellation (vertical axis)

It seems that the earlier the plan is made, the harder it is to keep up with changes.

Machine learning model: predict who is most likely to stand the hotel up

After a comprehensive analysis of the data set, the author began to build a model for predicting order cancellation.

Step 1: Data cleaning

First, deal with the missing values in the data set. If the variable is numeric, its missing values are replaced with the mean of that feature; if the variable is categorical, they are replaced with a constant.

Then delete reservation_status (the reservation status, which records whether the order was ultimately cancelled), because it gives away the value that the machine learning model will predict.
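The cleaning step described above might look roughly like the following sketch. It is not the author's code: it assumes each booking is a dict with None for missing values, that the 0/1 label is kept in a separate column here called is_canceled, and that "missing" is the fill constant for categorical columns. The toy rows are invented.

```python
from statistics import mean

TARGET = "is_canceled"        # assumed name of the 0/1 label column
LEAKY = "reservation_status"  # the column the article removes


def clean(rows, numeric_cols, categorical_cols, fill_value="missing"):
    """Impute missing values (None) and leave out the leaky column.

    rows: list of dicts, one per booking. Returns (features, labels).
    """
    # Mean of each numeric column, computed over the observed values only.
    means = {
        col: mean(r[col] for r in rows if r[col] is not None)
        for col in numeric_cols
    }
    features, labels = [], []
    for row in rows:
        cleaned = {}
        for col in numeric_cols:
            cleaned[col] = means[col] if row[col] is None else row[col]
        for col in categorical_cols:
            cleaned[col] = fill_value if row[col] is None else row[col]
        # LEAKY and TARGET are never copied into the feature dict.
        features.append(cleaned)
        labels.append(row[TARGET])
    return features, labels


# Invented toy rows, not taken from the data set.
toy_rows = [
    {"lead_time": 10, "country": "PRT", LEAKY: "Canceled", TARGET: 1},
    {"lead_time": None, "country": None, LEAKY: "Check-Out", TARGET: 0},
    {"lead_time": 30, "country": "ESP", LEAKY: "Check-Out", TARGET: 0},
]
X, y = clean(toy_rows, numeric_cols=["lead_time"], categorical_cols=["country"])
print(X[1])
print(y)
```

Leaving the status column out of the features matters because a model that sees it could read the answer directly instead of learning from the booking details.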

Step 2: Choose the best model

Before testing which algorithm works best on the data, split the data set at an 8:2 ratio: 80% of the data is used to train the model, and the remaining 20% serves as the validation set.
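A minimal sketch of such an 8:2 split is shown below. The function name, the fixed seed and the shuffling are assumptions made for illustration; the article does not say how the author performed the split.

```python
import random


def train_validation_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and split it into (train, validation)."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in records: ten integers instead of real bookings.
bookings = list(range(10))
train, validation = train_validation_split(bookings)
print(len(train), len(validation))
```

Shuffling before cutting avoids a validation set made up only of the most recent or last-listed bookings.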

In data science terms, predicting order cancellation is a supervised classification problem, specifically binary classification. The author therefore trained and compared several existing tools that support binary classification, including LightGBM, CatBoost, XGBoost and H2O, and finally selected CatBoost, which gave the best experimental results.

The CatBoost prediction results revealed the following points:

  • If the user’s nationality is Portuguese, the order is more likely to be cancelled. However, hotels generally do not learn every guest’s nationality in advance, especially for group bookings, and when an order is cancelled most hotels default the nationality to the country where the hotel is located. This finding is therefore for reference only and may not be accurate;
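The compare-and-select procedure can be illustrated without any of those libraries. In the sketch below the candidates are deliberately trivial stand-ins (single-feature threshold rules), not LightGBM, CatBoost, XGBoost or H2O; accuracy as the selection metric is also an assumption, since the article does not name one. All rows are invented.

```python
def accuracy(model, rows):
    """Share of (features, label) rows that model classifies correctly."""
    hits = sum(model(features) == label for features, label in rows)
    return hits / len(rows)


def fit_threshold_rule(train_rows, feature):
    """Stand-in 'training': choose the cut-off on one feature that gives
    the best training accuracy, then predict 1 at or above that cut-off."""
    best_cut, best_acc = None, -1.0
    for cut in sorted({f[feature] for f, _ in train_rows}):
        acc = accuracy(lambda f, c=cut: int(f[feature] >= c), train_rows)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return lambda f: int(f[feature] >= best_cut)


def select_best(candidates, train_rows, validation_rows):
    """Fit every candidate on train_rows and rank by validation accuracy."""
    scores = {
        name: accuracy(fit(train_rows), validation_rows)
        for name, fit in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Invented rows: (features, is_canceled).
train_rows = [
    ({"lead_time": 3, "previous_cancellations": 0}, 0),
    ({"lead_time": 20, "previous_cancellations": 1}, 0),
    ({"lead_time": 150, "previous_cancellations": 0}, 1),
    ({"lead_time": 300, "previous_cancellations": 0}, 1),
    ({"lead_time": 400, "previous_cancellations": 1}, 1),
    ({"lead_time": 5, "previous_cancellations": 0}, 0),
]
validation_rows = [
    ({"lead_time": 2, "previous_cancellations": 0}, 0),
    ({"lead_time": 200, "previous_cancellations": 0}, 1),
    ({"lead_time": 100, "previous_cancellations": 1}, 0),
    ({"lead_time": 500, "previous_cancellations": 0}, 1),
]
candidates = {
    "lead_time_rule": lambda rows: fit_threshold_rule(rows, "lead_time"),
    "previous_cancellations_rule": lambda rows: fit_threshold_rule(
        rows, "previous_cancellations"
    ),
}
best_name, scores = select_best(candidates, train_rows, validation_rows)
print(best_name, scores)
```

In a real comparison each entry in candidates would wrap the training call of one library, and the validation rows would be the 20% held out earlier, so every model is judged on data it has not seen.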

  • Compared with users who have made at least one special request, users who have not made any special requests are more likely to cancel their orders;

  • The lower the value of lead_time (the number of days between the reservation time and the check-in time), the lower the possibility that the reservation will be cancelled (this point of prediction is consistent with the results of previous data analysis).

Eurostar Museum, a popular hotel in Portugal featuring archaeological exhibitions

It is listed on multiple OTA platforms and supports online booking and free cancellation

The performance of the CatBoost model on the validation set:

Performance on the entire “hotel reservation demand” data set:

Hotel: before you cancel, let me try to win you back

Using this predictive model, the hotel can know in advance which users may cancel orders and take timely remedial measures.

For example, the hotel can contact users who are likely to cancel ahead of time and encourage them to cancel as early as possible, leaving the hotel more time to resell the room.

Alternatively, the hotel can contact users who are inclined to cancel, introduce the hotel’s advantages, and offer some incentive for staying, in order to win them back.

Machine learning lets the hotel make the first move.
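Turning predictions into a contact list could be as simple as the sketch below. The booking ids, the probabilities and the 0.5 threshold are all hypothetical; a hotel would take the probabilities from its trained model and tune the threshold to how many guests its staff can realistically contact.

```python
def bookings_to_contact(predictions, threshold=0.5, limit=None):
    """Return booking ids whose predicted cancellation probability is at
    least threshold, highest risk first.

    predictions: dict mapping booking id -> probability in [0, 1].
    """
    risky = [
        (prob, booking_id)
        for booking_id, prob in predictions.items()
        if prob >= threshold
    ]
    risky.sort(reverse=True)  # highest probability first
    ids = [booking_id for _, booking_id in risky]
    return ids if limit is None else ids[:limit]


# Hypothetical model output, not real predictions.
predicted = {"B-101": 0.91, "B-102": 0.12, "B-103": 0.67, "B-104": 0.48}
print(bookings_to_contact(predicted))
```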

News source:

https://www.linkedin.com/pulse/u-hotel-booking-cancellations-using-machine-learning-manuel-banza