Data Set Link: https://www.kaggle.com/datasets/ahsan81/hotel-reservations-classification-dataset

Predicting Canceled Reservations for Hotels

In much of the hotel industry, it is common for a large share of travelers to book ahead of their stay. However, with the rise of last-minute travel and of third-party booking platforms that offer easy online cancellation, it is reasonable to ask how reliable hotel reservations really are.

According to industry sources, cancellation rates on hotel reservations hover between 10% and 20%. This is a noticeable rate: canceled reservations often leave rooms unfilled or sold at a last-minute discount, so a significant share of profit can be lost to cancellations.

Project Proposal

While industry-wide data makes for an interesting discussion topic, it is hard for individual franchisees to predict exactly how much of their customer base will cancel on a given night. What hotels really need is a way to predict whether a given customer in their reservation system will cancel their booking. Using a dataset of ~36,000 bookings, I propose to train a model that predicts reservation cancellations from data provided during the booking process. With this model, hoteliers would have a better estimate of the true number of occupied rooms on a given night, and would thus be better able to sell rooms to last-minute travelers.

In [11]:

import pandas as pd

df = pd.read_csv('Hotel Reservations.csv')

df.head()

Out[11]:

	Booking_ID	no_of_adults	no_of_weekend_nights	no_of_week_nights	type_of_meal_plan	room_type_reserved	lead_time	arrival_year	arrival_month	arrival_date	market_segment_type	avg_price_per_room	no_of_special_requests	booking_status
0	INN00001	2	1	2	Meal Plan 1	Room_Type 1	224	2017	10	2	Offline	65.00	0	Not_Canceled
1	INN00002	2	2	3	Not Selected	Room_Type 1	5	2018	11	6	Online	106.68	1	Not_Canceled
2	INN00003	1	2	1	Meal Plan 1	Room_Type 1	1	2018	2	28	Online	60.00	0	Canceled
3	INN00004	2	0	2	Meal Plan 1	Room_Type 1	211	2018	5	20	Online	100.00	0	Canceled
4	INN00005	2	1	1	Not Selected	Room_Type 1	48	2018	4	11	Online	94.50	0	Canceled

In [10]:

df.shape

Out[10]:

(36275, 19)

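The dataset that I propose to use has 19 columns: a booking identifier, 17 predictor variables, and the target (booking_status). All of the predictors can be derived from booking information, and the preview above shows only a subset of the columns. The dataset comes with the following data dictionary.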

Data Dictionary

Booking_ID: Unique identifier of each booking
no_of_adults: Number of adults
no_of_children: Number of children
no_of_weekend_nights: Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel
no_of_week_nights: Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
type_of_meal_plan: Type of meal plan booked by the customer
required_car_parking_space: Does the customer require a car parking space? (0 - No, 1 - Yes)
room_type_reserved: Type of room reserved by the customer. The values are ciphered (encoded) by INN Hotels.
lead_time: Number of days between the date of booking and the arrival date
arrival_year: Year of arrival date
arrival_month: Month of arrival date
arrival_date: Day of the month of the arrival date
market_segment_type: Market segment designation.
repeated_guest: Is the customer a repeated guest? (0 - No, 1 - Yes)
no_of_previous_cancellations: Number of previous bookings that were canceled by the customer prior to the current booking
no_of_previous_bookings_not_canceled: Number of previous bookings not canceled by the customer prior to the current booking
avg_price_per_room: Average price per day of the reservation; prices of the rooms are dynamic. (in euros)
no_of_special_requests: Total number of special requests made by the customer (e.g. high floor, view from the room, etc.)
booking_status: Flag indicating if the booking was canceled or not.
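
As a rough illustration of how the dictionary above might translate into model inputs, the sketch below drops the identifier, turns booking_status into a 0/1 target, and one-hot encodes the text-valued columns. The helper name `prepare_features` and the choice of one-hot encoding are assumptions for illustration, not a settled part of the proposal; the demo frame reuses a few columns from the five rows shown in Out[11].

```python
from typing import Tuple

import pandas as pd

# Columns that the data dictionary describes as text-valued categories.
CATEGORICAL_COLUMNS = ["type_of_meal_plan", "room_type_reserved", "market_segment_type"]


def prepare_features(bookings: pd.DataFrame) -> Tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: split bookings into features X and a 0/1 target y."""
    # 1 = canceled, 0 = not canceled (labels as they appear in the preview).
    y = (bookings["booking_status"] == "Canceled").astype(int)
    # Booking_ID is a unique identifier, so it is dropped along with the target.
    X = bookings.drop(columns=["Booking_ID", "booking_status"])
    # One-hot encode whichever categorical columns are present in this frame.
    present = [c for c in CATEGORICAL_COLUMNS if c in X.columns]
    X = pd.get_dummies(X, columns=present)
    return X, y


# Demo frame: a subset of columns from the five rows shown in Out[11].
sample = pd.DataFrame({
    "Booking_ID": ["INN00001", "INN00002", "INN00003", "INN00004", "INN00005"],
    "lead_time": [224, 5, 1, 211, 48],
    "market_segment_type": ["Offline", "Online", "Online", "Online", "Online"],
    "avg_price_per_room": [65.00, 106.68, 60.00, 100.00, 94.50],
    "booking_status": ["Not_Canceled", "Not_Canceled", "Canceled", "Canceled", "Canceled"],
})

X, y = prepare_features(sample)
print(X.columns.tolist())
print(y.tolist())
```

On the full dataset the same call would be `prepare_features(df)`; whether one-hot encoding is the right treatment for every categorical column is a modeling decision still to be made.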

A note about industry impact

Given the obvious usefulness of models like this, it is highly likely that far more sophisticated models already exist and are in use at hotels across America. Given the low-capital franchise model that many hotels and hotel competitors use, it is in fact likely that this kind of prediction is a core business function at most big-name hotel brands. For this reason, it would be disingenuous to claim that this project would make "progress on your real-world problem" in the context of large hotels.

One possibility, however, is to release this software as open source, allowing mom-and-pop hotels to train a model on their own client data without paying for industry software licenses or joining a franchise program. This is my main interest in the project.

Specific Solution

We will classify each reservation by its likelihood of cancellation, probably using a tiered system (0-10% chance, 11-30% chance, etc.). Using these classifications, we can aggregate over a random set of bookings (representative of the bookings on a given night) and predict the likely total number of cancellations.
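
One way to picture the tiering and aggregation step is sketched below. It is only an illustration, not the project's implementation: the function names, the upper tier boundaries (only 0-10% and 11-30% are named above), and the sample probabilities are all hypothetical, and it assumes that bookings cancel independently of one another. Under that assumption, the expected number of cancellations is the sum of the per-booking probabilities, and the full distribution of the total follows a Poisson binomial.

```python
from typing import List

# Tier upper bounds and labels. Only the first two tiers come from the
# proposal; the remaining two are placeholder assumptions.
TIER_UPPER_BOUNDS = [(0.10, "0-10%"), (0.30, "11-30%"), (0.60, "31-60%"), (1.00, "61-100%")]


def assign_tier(p: float) -> str:
    """Return the label of the first tier whose upper bound is >= p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    for upper, label in TIER_UPPER_BOUNDS:
        if p <= upper:
            return label
    raise ValueError(f"no tier covers probability {p}")


def expected_cancellations(probs: List[float]) -> float:
    """Expected number of cancellations: the sum of per-booking probabilities."""
    return sum(probs)


def cancellation_count_distribution(probs: List[float]) -> List[float]:
    """Return dist where dist[k] = P(exactly k cancellations).

    Assumes bookings cancel independently (a Poisson binomial distribution),
    built up one booking at a time.
    """
    dist = [1.0]  # with no bookings, zero cancellations is certain
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # this booking is kept
            nxt[k + 1] += mass * p      # this booking is canceled
        dist = nxt
    return dist


# Hypothetical model outputs for one night's bookings.
night_probs = [0.05, 0.20, 0.50, 0.80]

tiers = [assign_tier(p) for p in night_probs]
expected = expected_cancellations(night_probs)
dist = cancellation_count_distribution(night_probs)
prob_at_least_two = sum(dist[2:])

print(tiers)
print(f"expected cancellations: {expected:.2f}")
print(f"P(at least 2 cancellations): {prob_at_least_two:.3f}")
```

A hotelier could use the tail probability (for example, the chance of at least two cancellations) to decide how many extra rooms to offer to last-minute travelers. If cancellations turn out to be correlated, for instance through group bookings, the independence assumption would need revisiting.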