The aim is to determine the winner of a UFC fight before the fight occurs. Such predictions can be helpful in betting and in future roster decisions.
The goal of this project is to use the available data (height, reach, age, number of wins, stance, etc.), taking the difference in each attribute between the two fighters and looking for a correlation with the outcome.
If this works, we will have a program that can predict the winner of a fight before it even happens. While this may be a dark secret sauce for gamblers and bettors, it could also provide data to matchmakers who want to create fights with the closest odds (fights that are not no-brainers). However, it is not the be-all and end-all for decision makers, as many things affect fights, such as mindset, what is happening on the day, and other factors that cannot be measured as easily as height and number of wins.
The data comes from Kaggle. We are actually given three datasets. The first contains fighter profiles: information that usually does not change, or that is specific to a fighter rather than to a particular fight.
The second dataset is a collection of UFC fights, giving us information about how each fight went, its length, etc. The prefixes R and B (Red and Blue, matching the values in the Winner column) represent the two fighters.
import pandas as pd
# pd.read_csv already returns a DataFrame, so no extra conversion is needed
df_fighter_info = pd.read_csv('raw_fighter_details.csv')
df_fighter_info
| | fighter_name | Height | Weight | Reach | Stance | DOB | SLpM | Str_Acc | SApM | Str_Def | TD_Avg | TD_Acc | TD_Def | Sub_Avg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Tom Aaron | NaN | 155 lbs. | NaN | NaN | Jul 13, 1978 | 0.00 | 0% | 0.00 | 0% | 0.00 | 0% | 0% | 0.0 |
1 | Papy Abedi | 5' 11" | 185 lbs. | NaN | Southpaw | Jun 30, 1978 | 2.80 | 55% | 3.15 | 48% | 3.47 | 57% | 50% | 1.3 |
2 | Shamil Abdurakhimov | 6' 3" | 235 lbs. | 76" | Orthodox | Sep 02, 1981 | 2.45 | 44% | 2.45 | 58% | 1.23 | 24% | 47% | 0.2 |
3 | Danny Abbadi | 5' 11" | 155 lbs. | NaN | Orthodox | Jul 03, 1983 | 3.29 | 38% | 4.41 | 57% | 0.00 | 0% | 77% | 0.0 |
4 | Hiroyuki Abe | 5' 6" | 145 lbs. | NaN | Orthodox | NaN | 1.71 | 36% | 3.11 | 63% | 0.00 | 0% | 33% | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3591 | Zhang Tiequan | 5' 8" | 155 lbs. | 69" | Orthodox | Jul 25, 1978 | 1.23 | 36% | 2.14 | 51% | 1.95 | 58% | 75% | 3.4 |
3592 | Alex Zuniga | NaN | 145 lbs. | NaN | NaN | NaN | 0.00 | 0% | 0.00 | 0% | 0.00 | 0% | 0% | 0.0 |
3593 | George Zuniga | 5' 9" | 185 lbs. | NaN | NaN | NaN | 7.64 | 38% | 5.45 | 37% | 0.00 | 0% | 100% | 0.0 |
3594 | Allan Zuniga | 5' 7" | 155 lbs. | 70" | Orthodox | Apr 04, 1992 | 3.93 | 52% | 1.80 | 61% | 0.00 | 0% | 57% | 1.0 |
3595 | Virgil Zwicker | 6' 2" | 205 lbs. | 74" | NaN | Jun 26, 1982 | 3.34 | 48% | 4.87 | 39% | 1.31 | 30% | 50% | 0.0 |
3596 rows × 14 columns
# pd.read_csv already returns a DataFrame, so no extra conversion is needed
df_fight_info = pd.read_csv('preprocessed_data.csv')
df_fight_info
| | Winner | title_bout | B_avg_KD | B_avg_opp_KD | B_avg_SIG_STR_pct | B_avg_opp_SIG_STR_pct | B_avg_TD_pct | B_avg_opp_TD_pct | B_avg_SUB_ATT | B_avg_opp_SUB_ATT | ... | B_Stance_Open Stance | B_Stance_Orthodox | B_Stance_Sideways | B_Stance_Southpaw | B_Stance_Switch | R_Stance_Open Stance | R_Stance_Orthodox | R_Stance_Sideways | R_Stance_Southpaw | R_Stance_Switch |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Red | False | 0.000000 | 0.0 | 0.420000 | 0.49500 | 0.330 | 0.36000 | 0.500000 | 1.000000 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
1 | Red | False | 0.500000 | 0.0 | 0.660000 | 0.30500 | 0.300 | 0.50000 | 1.500000 | 0.000000 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
2 | Red | False | 0.015625 | 0.0 | 0.450000 | 0.42750 | 0.250 | 0.20000 | 0.148468 | 0.098389 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
3 | Blue | False | 0.015625 | 0.0 | 0.450000 | 0.42750 | 0.250 | 0.20000 | 0.148468 | 0.098389 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
4 | Blue | False | 0.125000 | 0.0 | 0.535625 | 0.57875 | 0.185 | 0.16625 | 0.125000 | 0.187500 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5897 | Red | False | 0.015625 | 0.0 | 0.450000 | 0.42750 | 0.250 | 0.20000 | 0.148468 | 0.098389 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
5898 | Red | False | 0.015625 | 0.0 | 0.450000 | 0.42750 | 0.250 | 0.20000 | 0.148468 | 0.098389 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
5899 | Red | False | 0.015625 | 0.0 | 0.450000 | 0.42750 | 0.250 | 0.20000 | 0.148468 | 0.098389 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
5900 | Red | False | 0.015625 | 0.0 | 0.450000 | 0.42750 | 0.250 | 0.20000 | 0.148468 | 0.098389 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
5901 | Red | False | 0.015625 | 0.0 | 0.450000 | 0.42750 | 0.250 | 0.20000 | 0.148468 | 0.098389 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
5902 rows × 160 columns
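Since the plan is to work with the difference between the two fighters, here is a minimal sketch of how the paired B_ and R_ columns could be turned into difference features. It assumes every `B_<name>` column has a matching `R_<name>` column (the `R_avg_*` columns are hidden behind the `...` in the output above, so their names are an assumption). The helper `add_difference_features` and the toy values are made up for illustration only.

```python
import pandas as pd

def add_difference_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return one diff_<name> column (Blue minus Red) per numeric B_/R_ pair."""
    # Restrict to numeric columns so labels like Winner are skipped
    numeric = df.select_dtypes(include="number")
    diffs = {}
    for col in numeric.columns:
        if not col.startswith("B_"):
            continue
        name = col[2:]            # e.g. "avg_KD"
        red_col = "R_" + name     # assumed counterpart column
        if red_col in numeric.columns:
            diffs["diff_" + name] = numeric[col] - numeric[red_col]
    return pd.DataFrame(diffs, index=df.index)

# Toy rows with made-up values, not taken from the real dataset
toy = pd.DataFrame({
    "Winner": ["Red", "Blue"],
    "B_avg_KD": [0.0, 0.5],
    "R_avg_KD": [0.25, 0.0],
    "B_avg_SIG_STR_pct": [0.5, 0.75],
    "R_avg_SIG_STR_pct": [0.25, 0.5],
})

diff_features = add_difference_features(toy)
print(diff_features)
```

On the real data this would be called as `add_difference_features(df_fight_info)`. Whether to subtract Red from Blue or the other way round is arbitrary, as long as it stays consistent with how the Winner label is encoded.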
Some potential problems arise because we are dealing with many different factors: different fighters prefer different fighting styles. Some can win with little striking but numerous takedowns, and vice versa. Many fighters also have psychological weaknesses against particular opponents, are affected by mind games, etc. While it is relatively easy to predict the outcome when a wrestler faces another wrestler, it is much harder when a boxer faces a wrestler.
We will most likely use regression on features such as sig_strikes, avgtd, height, reach, etc. to predict the winner, using the difference between the corresponding B_ and R_ variables.
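As a rough illustration of the regression idea, the sketch below fits a logistic regression on difference features using plain gradient descent in NumPy. Logistic regression is one possible choice and is an assumption here, since the type of regression has not been decided. The functions `fit_logistic` and `predict_proba`, the encoding 1 = Blue wins, and the toy numbers are all hypothetical. In practice a library implementation such as scikit-learn's `LogisticRegression` would probably be used instead.

```python
import numpy as np

def _with_intercept(X):
    """Prepend a column of ones so the model learns an intercept."""
    X = np.asarray(X, dtype=float)
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression weights with batch gradient descent."""
    Xb = _with_intercept(X)
    y = np.asarray(y, dtype=float)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))   # predicted P(Blue wins)
        grad = Xb.T @ (p - y) / len(y)      # gradient of the mean log loss
        w -= lr * grad
    return w

def predict_proba(w, X):
    """Return the predicted probability that Blue wins for each row."""
    return 1.0 / (1.0 + np.exp(-_with_intercept(X) @ w))

# Made-up difference features (Blue minus Red): [reach diff, sig. strike pct diff]
X = np.array([
    [ 2.0,  0.10],
    [ 1.0,  0.05],
    [ 3.0,  0.20],
    [-2.0, -0.10],
    [-1.0, -0.05],
    [-3.0, -0.15],
])
y = np.array([1, 1, 1, 0, 0, 0])  # assumed encoding: 1 = Blue won, 0 = Red won

w = fit_logistic(X, y)
probs = predict_proba(w, X)
print(np.round(probs, 3))
```

A predicted probability near 0.5 would correspond to the closely matched fights mentioned earlier, while values near 0 or 1 would indicate a lopsided matchup. On real data the model should be evaluated on fights held out from training, not on the rows it was fitted to.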