Machine Learning to Predict Uber Surge Price

Uber, the ride-sharing tech giant, aims to provide the best customer experience possible. Yet the price of a ride is never fixed: for the same starting point and destination, customers see the fare vary from one request to the next. Uber's prices depend on the supply of and demand for rides at a given time, which in turn depend on many factors, from the weather to a large number of people requesting a ride at once. When demand for rides increases, the price increases as well; according to Uber, this ensures that people who need a ride can get one. The intention is that riders who need a ride "more" than others will pay the extra surge price, whereas those who are in less of a rush can wait for a lower price. The problem for riders is that they can end up paying far more for an Uber ride than they would at the regular fare. Even though the datasets below also contain data for Lyft, we will focus on Uber.

Citations

Delaney, D. (2016, May 29). Surge pricing: Why a rainy day ride will cost you more. Tennessean.com. https://www.tennessean.com/story/money/2016/05/29/surge-pricing-why-rainy-day-ride-cost-you-more/84922758/

Helling, B. (2023). Surge Pricing: What It Is & How It Works For Riders & Drivers. Ridester.com. https://www.ridester.com/surge-pricing/

How Surge Pricing Works. (n.d.). Uber.com. Retrieved February 27, 2023, from https://www.uber.com/us/en/drive/driver-app/how-surge-works/

Datasets from Kaggle

In [1]:
import pandas as pd

rides_df = pd.read_csv('cab_rides.csv')
In [2]:
rides_df.head()
Out[2]:
   distance  cab_type  time_stamp     destination    source            price  surge_multiplier  id                                    product_id    name
0  0.44      Lyft      1544952607890  North Station  Haymarket Square  5.0    1.0               424553bb-7174-41ea-aeb4-fe06d4f4b9d7  lyft_line     Shared
1  0.44      Lyft      1543284023677  North Station  Haymarket Square  11.0   1.0               4bd23055-6827-41c6-b23b-3c491f24e74d  lyft_premier  Lux
2  0.44      Lyft      1543366822198  North Station  Haymarket Square  7.0    1.0               981a3613-77af-4620-a42a-0c0866077d1e  lyft          Lyft
3  0.44      Lyft      1543553582749  North Station  Haymarket Square  26.0   1.0               c2d88af2-d278-4bfd-a8d0-29ca77cc5512  lyft_luxsuv   Lux Black XL
4  0.44      Lyft      1543463360223  North Station  Haymarket Square  9.0    1.0               e0126e1f-8ca9-4f2e-82b3-50505a09db9a  lyft_plus     Lyft XL

Data Dictionary (cab_rides.csv)

  • distance: distance between source and destination
  • cab_type: Uber or Lyft
  • time_stamp: epoch time when the data was queried (the 13-digit sample values suggest milliseconds)
  • destination: destination of the ride
  • source: the starting point of the ride
  • price: price estimate for the ride in USD
  • surge_multiplier: the multiplier by which the price was increased; default is 1
  • id: unique identifier
  • product_id: Uber/Lyft identifier for the cab type
  • name: visible type of the cab, e.g. Uber Pool, UberXL
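Since the analysis focuses on Uber, a first step could be to keep only the Uber rows and turn time_stamp into a readable datetime. The following is a minimal sketch on a tiny hand-built frame, not the notebook's actual preprocessing: the Uber row is made up for illustration, and treating time_stamp as milliseconds is an assumption based on the 13-digit values in the sample output.

```python
import pandas as pd

# Two rows shaped like cab_rides.csv. The Lyft row comes from the sample
# output above; the Uber row is hypothetical and only there for illustration.
sample_rides = pd.DataFrame({
    "cab_type": ["Lyft", "Uber"],
    "time_stamp": [1544952607890, 1543284023677],
    "source": ["Haymarket Square", "Haymarket Square"],
    "price": [5.0, 11.0],
    "surge_multiplier": [1.0, 1.0],
})

# Keep only Uber rides; .copy() avoids modifying a view of the original frame.
uber_rides = sample_rides[sample_rides["cab_type"] == "Uber"].copy()

# Assumption: 13-digit time_stamp values are milliseconds since the Unix epoch.
uber_rides["ride_time"] = pd.to_datetime(uber_rides["time_stamp"], unit="ms")

print(uber_rides[["source", "ride_time", "surge_multiplier"]])
```

The same two lines of filtering and conversion should apply unchanged to rides_df once the full file is loaded.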
In [3]:
weather_df = pd.read_csv('weather.csv')
In [4]:
weather_df.head()
Out[4]:
   temp   location            clouds  pressure  rain    time_stamp  humidity  wind
0  42.42  Back Bay            1.0     1012.14   0.1228  1545003901  0.77      11.25
1  42.43  Beacon Hill         1.0     1012.15   0.1846  1545003901  0.76      11.32
2  42.50  Boston University   1.0     1012.15   0.1089  1545003901  0.76      11.07
3  42.11  Fenway              1.0     1012.13   0.0969  1545003901  0.77      11.09
4  43.13  Financial District  1.0     1012.14   0.1786  1545003901  0.75      11.49

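Data Dictionary (weather.csv)

  • temp: temperature in °F
  • location: location name
  • clouds: cloud cover (scale not stated)
  • pressure: pressure in mb
  • rain: rain in inches for the last hour
  • time_stamp: epoch time when the row was collected (the 10-digit sample values suggest seconds)
  • humidity: relative humidity (listed as %, though the sample values are fractions such as 0.77)
  • wind: wind speed in mph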
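To relate rides to weather, each ride needs the weather reading for its pickup location that is closest in time. One way to do this, sketched below, is pandas' merge_asof with a per-location match. This is an illustration under assumptions rather than the notebook's own join: the ride rows are hypothetical, ride timestamps are assumed to be in milliseconds and weather timestamps in seconds, and ride `source` names are assumed to match weather `location` names.

```python
import pandas as pd

# Hypothetical rides (time_stamp assumed to be in milliseconds).
rides = pd.DataFrame({
    "source": ["Back Bay", "Fenway"],
    "time_stamp": [1545003950000, 1545003800000],
    "surge_multiplier": [1.0, 1.0],
})

# Weather rows taken from the sample output (time_stamp assumed to be in seconds).
weather = pd.DataFrame({
    "location": ["Back Bay", "Fenway"],
    "time_stamp": [1545003901, 1545003901],
    "temp": [42.42, 42.11],
    "rain": [0.1228, 0.0969],
})

# Put both timestamps on the same scale (integer seconds).
rides["time_s"] = rides["time_stamp"] // 1000
weather = weather.rename(columns={"time_stamp": "weather_time_s"})

# merge_asof requires both frames to be sorted by their time key.
rides = rides.sort_values("time_s")
weather = weather.sort_values("weather_time_s")

# For each ride, take the nearest-in-time weather row at the same location.
merged = pd.merge_asof(
    rides,
    weather,
    left_on="time_s",
    right_on="weather_time_s",
    left_by="source",
    right_by="location",
    direction="nearest",
)

print(merged[["source", "time_s", "weather_time_s", "temp", "rain", "surge_multiplier"]])
```

On the full data it may be worth passing a `tolerance` (in seconds here) so that a ride is not paired with a weather reading taken hours away.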

These two datasets will be used to build a machine learning model that predicts the surge multiplier, specifically from the various weather conditions. Since the surge multiplier can be treated as a set of ordered levels, ordinal regression/classification seems to be the best fit. This project will allow users to look at an area and see whether a surge is currently in effect. Users can then avoid the surge price by either waiting to request an Uber until the surge goes down or walking a short distance to a pickup point that is not in a surge zone.
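As a rough illustration of what ordinal classification could look like here, the sketch below uses a threshold decomposition: with K ordered surge levels, it fits K-1 binary models for "surge is above level k" and combines their probabilities. This is one possible approach, not necessarily the model this project will end up with. The data is synthetic (a made-up rule tying rain to surge), the three surge levels are placeholders, and scikit-learn's LogisticRegression is an assumed choice of base classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data for illustration only: one weather feature (rain) and a
# made-up rule that maps heavier rain to a higher surge level.
rng = np.random.default_rng(0)
rain = rng.uniform(0.0, 1.0, size=300)
X = rain.reshape(-1, 1)

levels = np.array([1.0, 1.25, 1.5])        # placeholder ordered surge levels
y = np.digitize(rain, bins=[0.4, 0.75])    # ordinal labels 0, 1, 2

# One binary classifier per threshold: model k estimates P(y > k).
models = [
    LogisticRegression().fit(X, (y > k).astype(int))
    for k in range(len(levels) - 1)
]

def predict_surge(X_new):
    """Combine the threshold models into per-level probabilities and pick the most likely level."""
    p_greater = np.column_stack([m.predict_proba(X_new)[:, 1] for m in models])
    probs = np.column_stack([
        1.0 - p_greater[:, 0],                # P(y = 0)
        p_greater[:, 0] - p_greater[:, 1],    # P(y = 1)
        p_greater[:, 1],                      # P(y = 2)
    ])
    return levels[probs.argmax(axis=1)]

# Predicted surge level for a dry reading and a very rainy one.
print(predict_surge(np.array([[0.05], [0.95]])))
```

The appeal of this decomposition is that it respects the ordering of the surge levels while reusing any ordinary binary classifier; in the real project X would hold the merged weather features instead of a single synthetic column.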