Data Set Link: https://www.kaggle.com/datasets/ahsan81/hotel-reservations-classification-dataset

Predicting Canceled Reservations for Hotels

In much of the hotel segment, it is common for a large number of travelers to book ahead of their stay. However, with the rise of last-minute travel and of third-party booking platforms that offer easy online cancelation policies, it is reasonable to investigate how reliable hotel reservations really are.

According to industry sources, cancelation rates on hotel reservations hover between 10% and 20%. This is a noticeable rate, and since canceled reservations often result in hotel rooms going unfilled or being sold at a last-minute discount, it is clear that a significant amount of profit can be swept away by canceled reservations.

Project Proposal

While industry-wide data can be an interesting discussion topic, it can be hard for individual franchisees to predict exactly how much of their customer base will cancel their reservations on a given night. What hotels really need is a way to predict whether a given customer in their reservation system might cancel their booking. Using a dataset of ~36,000 bookings, I propose to train a model to predict reservation cancelations based on data provided during the booking process. With this model, hoteliers would have a better estimate of the true number of booked rooms on a given night, and would thus be better able to sell to last-minute travelers.

In [11]:
import pandas as pd

df = pd.read_csv('Hotel Reservations.csv')

df.head()
Out[11]:
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
| 2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled |
| 3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled |
| 4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
In [10]:
df.shape
Out[10]:
(36275, 19)

The data set that I propose to use contains 19 columns: a booking identifier, 17 predictor variables that can all be derived from booking information, and the booking_status label. The dataset comes with the following data dictionary.

Data Dictionary

  • Booking_ID: Unique identifier of each booking
  • no_of_adults: Number of adults
  • no_of_children: Number of children
  • no_of_weekend_nights: Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel
  • no_of_week_nights: Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
  • type_of_meal_plan: Type of meal plan booked by the customer
  • required_car_parking_space: Does the customer require a car parking space? (0 - No, 1 - Yes)
  • room_type_reserved: Type of room reserved by the customer. The values are ciphered (encoded) by INN Hotels.
  • lead_time: Number of days between the date of booking and the arrival date
  • arrival_year: Year of arrival date
  • arrival_month: Month of arrival date
  • arrival_date: Day of the month of the arrival date
  • market_segment_type: Market segment designation.
  • repeated_guest: Is the customer a repeated guest? (0 - No, 1 - Yes)
  • no_of_previous_cancellations: Number of previous bookings that were canceled by the customer prior to the current booking
  • no_of_previous_bookings_not_canceled: Number of previous bookings not canceled by the customer prior to the current booking
  • avg_price_per_room: Average price per day of the reservation, in euros; room prices are dynamic.
  • no_of_special_requests: Total number of special requests made by the customer (e.g., high floor, view from the room)
  • booking_status: Flag indicating if the booking was canceled or not.
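As a rough illustration of how the columns in the data dictionary might be turned into model inputs, the sketch below drops the identifier, one-hot encodes the three text-valued columns, and maps booking_status to 0/1. The helper name `prepare_features` and the choice of one-hot encoding are assumptions made for this example, not part of the dataset or a settled design. The small `preview` frame simply retypes the five rows shown by `df.head()` so the sketch runs without the CSV file.

```python
import pandas as pd

# The five preview rows from df.head(), retyped by hand so the sketch is self-contained.
preview = pd.DataFrame({
    "Booking_ID": ["INN00001", "INN00002", "INN00003", "INN00004", "INN00005"],
    "no_of_adults": [2, 2, 1, 2, 2],
    "no_of_children": [0, 0, 0, 0, 0],
    "no_of_weekend_nights": [1, 2, 2, 0, 1],
    "no_of_week_nights": [2, 3, 1, 2, 1],
    "type_of_meal_plan": ["Meal Plan 1", "Not Selected", "Meal Plan 1",
                          "Meal Plan 1", "Not Selected"],
    "required_car_parking_space": [0, 0, 0, 0, 0],
    "room_type_reserved": ["Room_Type 1"] * 5,
    "lead_time": [224, 5, 1, 211, 48],
    "arrival_year": [2017, 2018, 2018, 2018, 2018],
    "arrival_month": [10, 11, 2, 5, 4],
    "arrival_date": [2, 6, 28, 20, 11],
    "market_segment_type": ["Offline", "Online", "Online", "Online", "Online"],
    "repeated_guest": [0, 0, 0, 0, 0],
    "no_of_previous_cancellations": [0, 0, 0, 0, 0],
    "no_of_previous_bookings_not_canceled": [0, 0, 0, 0, 0],
    "avg_price_per_room": [65.00, 106.68, 60.00, 100.00, 94.50],
    "no_of_special_requests": [0, 1, 0, 0, 0],
    "booking_status": ["Not_Canceled", "Not_Canceled", "Canceled",
                       "Canceled", "Canceled"],
})


def prepare_features(bookings):
    """Hypothetical helper: split a bookings frame into features X and a 0/1 target y."""
    # Assumption for this sketch: 1 means the booking was canceled.
    y = bookings["booking_status"].map({"Not_Canceled": 0, "Canceled": 1})
    # Booking_ID is only an identifier, and booking_status is the label itself.
    X = bookings.drop(columns=["Booking_ID", "booking_status"])
    # One-hot encode the text-valued columns; the remaining columns are already numeric.
    categorical = ["type_of_meal_plan", "room_type_reserved", "market_segment_type"]
    X = pd.get_dummies(X, columns=categorical)
    return X, y


X, y = prepare_features(preview)
print(X.columns.tolist())
print(y.tolist())
```

On the full data set the same helper could be called as `prepare_features(df)`. One-hot encoding is only one reasonable option here; tree-based models, for example, may work just as well with ordinal codes.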

A note about industry impact

Given the obvious usefulness of models such as these, it is highly likely that much more sophisticated models already exist and are in use in hotels across America. Given the low-capital franchise model that many hotels and their competitors use, it is in fact likely that this is a primary business function of most big-name hotel brands. For this reason, it would be disingenuous to claim that this project would make "progress on your real-world problem" in the context of large hotels.

One possibility, however, is to provide this software as open source, allowing mom-and-pop shops to train a model on their own client data without paying for industry software licenses or joining a franchise program. This is my main interest in the project.

Specific Solution

We will classify individual reservations by how likely they are to cancel, probably using a tiered system (0-10% chance, 11-30% chance, etc.). Using these classifications, we can integrate over a random set of bookings (representative of the set of bookings on a given night) and predict the likely total number of cancelations.
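To make the tiering and aggregation idea concrete, here is a minimal sketch in plain Python. It assumes each booking already has a predicted cancelation probability and that bookings cancel independently of one another; both are simplifying assumptions for this example. The function names, the third tier boundary, and the three sample probabilities are all hypothetical. Summing the probabilities gives the expected number of cancelations, and a small dynamic program gives the probability of each possible total.

```python
def assign_tier(p, upper_bounds=(0.10, 0.30, 1.00)):
    """Return the index of the first tier whose upper bound is >= p.

    The first two bounds follow the 0-10% and 11-30% tiers in the text;
    the final catch-all tier is an assumption for this sketch.
    """
    for tier, bound in enumerate(upper_bounds):
        if p <= bound:
            return tier
    raise ValueError(f"probability {p} is above every tier bound")


def expected_cancelations(probs):
    """Expected number of cancelations: the sum of per-booking probabilities."""
    return sum(probs)


def cancelation_count_distribution(probs):
    """Probability of exactly k cancelations for k = 0..len(probs).

    Assumes bookings cancel independently of each other.
    """
    dist = [1.0]  # with no bookings, zero cancelations is certain
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)   # this booking is kept
            nxt[k + 1] += mass * p     # this booking is canceled
        dist = nxt
    return dist


# Hypothetical predicted probabilities for three bookings on one night.
night_probs = [0.05, 0.20, 0.50]
tiers = [assign_tier(p) for p in night_probs]
expected = expected_cancelations(night_probs)
distribution = cancelation_count_distribution(night_probs)
print(tiers, expected, distribution)
```

If only the tier of each booking is known rather than its exact probability, one option would be to substitute a representative probability per tier (for example the tier midpoint) before aggregating.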