In much of the hotel industry, it is common for a large number of travelers to book ahead of their stay. However, with the rise of last-minute travel and of third-party booking platforms that offer easy online cancellation, it is worth investigating how reliable hotel reservations actually are.
According to industry sources, cancellation rates on hotel reservations hover between 10% and 20%. This is a noticeable rate. Canceled reservations often leave rooms unfilled or force them to be sold at a last-minute discount, so a significant amount of profit can be lost to cancellations.
While industry-wide data can be an interesting discussion topic, it is hard for individual franchisees to predict exactly how much of their customer base will cancel on a given night. What hotels really need is a way to predict whether a given customer in their reservation system is likely to cancel their booking. Using a dataset of ~36,000 bookings, I propose to train a model that predicts reservation cancellations from data provided during the booking process. With this model, hoteliers would have a better estimate of the true number of rooms that will be occupied on a given night, and would thus be better able to sell to last-minute travelers.
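As a minimal sketch of what such a model could look like, assuming scikit-learn is available and that the column names match the CSV loaded below, a baseline could one-hot encode the categorical booking fields, standardize the numeric ones, and fit a logistic regression whose predicted probability serves as each booking's cancellation risk. The helper names `build_cancellation_model` and `train_and_evaluate`, the choice of logistic regression, and the 25% stratified hold-out split are illustrative assumptions, not a settled design.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Non-numeric booking fields in the Hotel Reservations CSV.
CATEGORICAL = ["type_of_meal_plan", "room_type_reserved", "market_segment_type"]


def build_cancellation_model(numeric_cols):
    """Hypothetical helper: one-hot encode categoricals, standardize numerics,
    then fit a logistic regression that outputs P(canceled)."""
    preprocess = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("num", StandardScaler(), numeric_cols),
    ])
    return Pipeline([
        ("prep", preprocess),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


def train_and_evaluate(df: pd.DataFrame, test_size=0.25, seed=0):
    """Hypothetical helper: fit on a stratified training split and report
    ROC AUC on the held-out bookings."""
    y = (df["booking_status"] == "Canceled").astype(int)  # 1 = canceled
    X = df.drop(columns=["Booking_ID", "booking_status"])  # ID is not a predictor
    numeric_cols = [c for c in X.columns if c not in CATEGORICAL]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    model = build_cancellation_model(numeric_cols).fit(X_train, y_train)
    p_cancel = model.predict_proba(X_test)[:, 1]  # column 1 = class "canceled"
    return model, roc_auc_score(y_test, p_cancel)
```

On the full dataset, `train_and_evaluate(df)` would return the fitted pipeline and its held-out ROC AUC; the pipeline's `predict_proba` output is the kind of per-booking probability that could then be binned into risk tiers. Logistic regression is chosen here only as an interpretable starting point; tree ensembles are a common alternative for tabular data like this.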
```python
import pandas as pd

# Load the booking records and preview the first five rows
df = pd.read_csv('Hotel Reservations.csv')
df.head()
```
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
| 2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled |
| 3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled |
| 4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
```python
df.shape
```

```
(36275, 19)
```
The dataset I propose to use contains 36,275 bookings and 19 columns: a `Booking_ID` identifier, the `booking_status` target, and 17 predictor variables, all of which can be derived from booking information. The dataset comes with the following data dictionary.
Data Dictionary
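A small pandas sketch, assuming the column names shown in `df.head()` above, can confirm how the 19 columns break down and report the dataset's own cancellation share. The helpers `split_columns` and `cancellation_rate` are hypothetical names; treating every non-numeric column as categorical (and `arrival_month` etc. as numeric) is a simplifying assumption.

```python
import pandas as pd

ID_COL = "Booking_ID"          # unique reservation identifier, not a predictor
TARGET_COL = "booking_status"  # "Canceled" / "Not_Canceled"


def split_columns(df: pd.DataFrame):
    """Separate identifier and target from the predictors, then group the
    predictors into non-numeric (categorical) and numeric columns."""
    features = df.drop(columns=[ID_COL, TARGET_COL])
    # exclude="number" also catches string columns regardless of pandas version
    categorical = features.select_dtypes(exclude="number").columns.tolist()
    numeric = features.select_dtypes(include="number").columns.tolist()
    return features, categorical, numeric


def cancellation_rate(df: pd.DataFrame) -> float:
    """Share of bookings whose status is 'Canceled'."""
    return float((df[TARGET_COL] == "Canceled").mean())
```

Calling `split_columns(df)` on the full dataset should yield the 17 predictors, and `cancellation_rate(df)` gives a quick check of how this dataset's cancellation share compares with the 10-20% industry figure cited above.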
Given how obviously useful models like this are, it is highly likely that far more sophisticated ones already exist and are in use at hotels across America. Given the low-capital franchise model that many hotel brands and their competitors use, forecasting of this kind is likely a primary business function of most big-name hotel brands. For this reason, it would be disingenuous to claim that this project would make "progress on your real-world problem" in the context of large hotel chains.
One possibility, however, is to release this software as open source, allowing mom-and-pop hotels to train a model on their own client data without paying for industry software licenses or joining a franchise program. This is the main interest I have in the project.
We will classify individual reservations as likely or unlikely to cancel, probably using a tiered system (0-10% chance, 11-30% chance, etc.). Using these predictions, we can aggregate over a set of bookings (representative of the bookings on a given night) to estimate the expected total number of cancellations and the likelihood of any particular total.
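A minimal plain-Python sketch of both steps, assuming the model outputs a calibrated cancellation probability per booking and that bookings cancel independently of one another. The tier edges above 30% and all function names (`tier`, `expected_cancellations`, `cancellation_distribution`, `prob_at_least`) are hypothetical. Summing the per-booking probabilities gives the expected number of cancellations; a short dynamic program over the bookings gives the full Poisson-binomial distribution of the nightly total.

```python
from bisect import bisect_left

# Upper edge of each tier. 0-10% and 11-30% follow the proposal; the
# remaining edges are hypothetical placeholders.
TIER_EDGES = [0.10, 0.30, 0.60, 1.00]
TIER_LABELS = ["0-10%", "11-30%", "31-60%", "61-100%"]


def tier(p):
    """Map a predicted cancellation probability in [0, 1] to a tier label."""
    return TIER_LABELS[bisect_left(TIER_EDGES, p)]


def expected_cancellations(probs):
    """Expected number of cancellations: by linearity, the sum of probabilities."""
    return sum(probs)


def cancellation_distribution(probs):
    """P(exactly k cancellations) for k = 0..n (Poisson-binomial), assuming
    each booking cancels independently of the others."""
    dist = [1.0]  # before any booking is considered: zero cancellations, surely
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # this booking is kept
            nxt[k + 1] += mass * p      # this booking cancels
        dist = nxt
    return dist


def prob_at_least(dist, k):
    """P(at least k cancellations) from a distribution over counts."""
    return sum(dist[k:])


if __name__ == "__main__":
    night = [0.05, 0.20, 0.45, 0.80]  # hypothetical model outputs for one night
    print([tier(p) for p in night])   # ['0-10%', '11-30%', '31-60%', '61-100%']
    print(expected_cancellations(night))
    print(prob_at_least(cancellation_distribution(night), 2))
```

The tiers are convenient for display at the front desk, but the nightly total is computed from the raw probabilities, since binning them first would discard information. The independence assumption is the weakest part of this sketch: group bookings or events such as bad weather can make cancellations move together, which would widen the true distribution.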