Project Proposal (UFC Data)¶

Motivation¶

To determine the winner of a UFC fight before the fight occurs. Such a prediction can be helpful in betting and in future roster decisions.

Solution¶

The goal of this project is to use the available data (height, reach, age, number of wins, stance, etc.). We will take the difference in each of these variables between the two fighters and look for a correlation with the fight's outcome.
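As a rough illustration of the difference idea, the sketch below builds one "Red minus Blue" column per feature on a tiny made-up table. The column names (`R_height`, `B_height`, `R_reach`, `B_reach`), the numbers, and the helper `add_differences` are all hypothetical and are not taken from the Kaggle files.

```python
import pandas as pd

# Toy data with hypothetical column names; the real columns may differ
fights = pd.DataFrame({
    "R_height": [180.0, 175.0],
    "B_height": [175.0, 183.0],
    "R_reach": [188.0, 178.0],
    "B_reach": [180.0, 190.0],
})

def add_differences(df, features):
    """Return a frame with one `<feature>_diff` column (Red minus Blue) per feature."""
    diffs = {f"{name}_diff": df[f"R_{name}"] - df[f"B_{name}"] for name in features}
    return pd.DataFrame(diffs, index=df.index)

diff_features = add_differences(fights, ["height", "reach"])
print(diff_features)
```

A positive value means the Red fighter has the larger measurement, so a single column carries the matchup information that would otherwise be split across two.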

Impact¶

If this works, we will have a program that can predict the winner of a fight before it happens. While this may be a secret sauce for gamblers and bettors, it could also provide data to matchmakers who want to create fights with the closest odds (fights that are not no-brainers). However, it should not be the sole basis for decisions, as many things affect fights, such as mindset, what is happening on the day, and other factors that cannot be measured as easily as height or number of wins.

Dataset¶

We use a dataset from Kaggle. We are actually given three datasets. The first covers fighter profiles only, containing information that usually does not change or that is specific to the fighter (rather than to a particular fight).

  • height
  • age (stored as date of birth, DOB)
  • weight
  • stance
  • reach
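The fighter table printed in this notebook stores several of these fields as text (a height such as `5' 11"`, a date of birth such as `Jun 30, 1978`), so they would need converting to numbers before any differences can be taken. The following is a minimal sketch, assuming those formats hold for every non-missing row; the helper names `height_to_inches` and `age_on` are hypothetical.

```python
import re
from datetime import date, datetime

def height_to_inches(height):
    """Convert a string like 5' 11" to total inches; return None if missing."""
    if not isinstance(height, str):
        return None  # NaN or another missing value
    match = re.fullmatch(r"\s*(\d+)'\s*(\d+)\"\s*", height)
    if match is None:
        return None  # unexpected format
    feet, inches = map(int, match.groups())
    return feet * 12 + inches

def age_on(dob, on):
    """Age in whole years on the date `on`, from a DOB string like 'Jun 30, 1978'."""
    if not isinstance(dob, str):
        return None
    born = datetime.strptime(dob, "%b %d, %Y").date()
    # Subtract one year if the birthday has not yet occurred in that year
    return on.year - born.year - ((on.month, on.day) < (born.month, born.day))

print(height_to_inches("5' 11\""))               # 71
print(age_on("Jun 30, 1978", date(2020, 1, 1)))  # 41
```

With pandas these could be applied column-wise, for example `df_fighter_info['Height'].map(height_to_inches)`. Age is computed relative to a chosen date because a fighter's age at the time of a fight, not today, is what matters for prediction.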

The second dataset is a collection of UFC fights. It gives us information about how each fight went, its length, etc. R and B denote the two fighters (the Red and Blue corners).

  • wins (number of wins in the fighter's career)
  • losses (number of losses in the fighter's career)
  • win (winner of the sample fight)
  • current streak (either a winning or a losing streak)
  • Significant Strikes
  • Total Time Fighting (total time a fighter has fought in the UFC)
  • Take Downs (TD)
  • win_by (how the win was achieved: KO, Submission, Decision, Doctor Stoppage, etc.)
In [13]:
import pandas as pd

# Load the fighter profiles; read_csv already returns a DataFrame
df_fighter_info = pd.read_csv('raw_fighter_details.csv')
df_fighter_info
Out[13]:
fighter_name Height Weight Reach Stance DOB SLpM Str_Acc SApM Str_Def TD_Avg TD_Acc TD_Def Sub_Avg
0 Tom Aaron NaN 155 lbs. NaN NaN Jul 13, 1978 0.00 0% 0.00 0% 0.00 0% 0% 0.0
1 Papy Abedi 5' 11" 185 lbs. NaN Southpaw Jun 30, 1978 2.80 55% 3.15 48% 3.47 57% 50% 1.3
2 Shamil Abdurakhimov 6' 3" 235 lbs. 76" Orthodox Sep 02, 1981 2.45 44% 2.45 58% 1.23 24% 47% 0.2
3 Danny Abbadi 5' 11" 155 lbs. NaN Orthodox Jul 03, 1983 3.29 38% 4.41 57% 0.00 0% 77% 0.0
4 Hiroyuki Abe 5' 6" 145 lbs. NaN Orthodox NaN 1.71 36% 3.11 63% 0.00 0% 33% 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3591 Zhang Tiequan 5' 8" 155 lbs. 69" Orthodox Jul 25, 1978 1.23 36% 2.14 51% 1.95 58% 75% 3.4
3592 Alex Zuniga NaN 145 lbs. NaN NaN NaN 0.00 0% 0.00 0% 0.00 0% 0% 0.0
3593 George Zuniga 5' 9" 185 lbs. NaN NaN NaN 7.64 38% 5.45 37% 0.00 0% 100% 0.0
3594 Allan Zuniga 5' 7" 155 lbs. 70" Orthodox Apr 04, 1992 3.93 52% 1.80 61% 0.00 0% 57% 1.0
3595 Virgil Zwicker 6' 2" 205 lbs. 74" NaN Jun 26, 1982 3.34 48% 4.87 39% 1.31 30% 50% 0.0

3596 rows × 14 columns

In [57]:
# Load the per-fight data; read_csv already returns a DataFrame
df_fight_info = pd.read_csv('preprocessed_data.csv')
df_fight_info
Out[57]:
Winner title_bout B_avg_KD B_avg_opp_KD B_avg_SIG_STR_pct B_avg_opp_SIG_STR_pct B_avg_TD_pct B_avg_opp_TD_pct B_avg_SUB_ATT B_avg_opp_SUB_ATT ... B_Stance_Open Stance B_Stance_Orthodox B_Stance_Sideways B_Stance_Southpaw B_Stance_Switch R_Stance_Open Stance R_Stance_Orthodox R_Stance_Sideways R_Stance_Southpaw R_Stance_Switch
0 Red False 0.000000 0.0 0.420000 0.49500 0.330 0.36000 0.500000 1.000000 ... 0 1 0 0 0 0 1 0 0 0
1 Red False 0.500000 0.0 0.660000 0.30500 0.300 0.50000 1.500000 0.000000 ... 0 1 0 0 0 0 1 0 0 0
2 Red False 0.015625 0.0 0.450000 0.42750 0.250 0.20000 0.148468 0.098389 ... 0 1 0 0 0 0 0 0 1 0
3 Blue False 0.015625 0.0 0.450000 0.42750 0.250 0.20000 0.148468 0.098389 ... 0 0 0 1 0 0 0 0 0 1
4 Blue False 0.125000 0.0 0.535625 0.57875 0.185 0.16625 0.125000 0.187500 ... 0 1 0 0 0 0 1 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5897 Red False 0.015625 0.0 0.450000 0.42750 0.250 0.20000 0.148468 0.098389 ... 0 1 0 0 0 0 0 0 1 0
5898 Red False 0.015625 0.0 0.450000 0.42750 0.250 0.20000 0.148468 0.098389 ... 0 1 0 0 0 0 0 0 1 0
5899 Red False 0.015625 0.0 0.450000 0.42750 0.250 0.20000 0.148468 0.098389 ... 0 1 0 0 0 0 1 0 0 0
5900 Red False 0.015625 0.0 0.450000 0.42750 0.250 0.20000 0.148468 0.098389 ... 0 1 0 0 0 0 1 0 0 0
5901 Red False 0.015625 0.0 0.450000 0.42750 0.250 0.20000 0.148468 0.098389 ... 0 1 0 0 0 0 1 0 0 0

5902 rows × 160 columns

Potential Problems¶

One potential problem is that many different factors are in play: different fighters prefer different fighting styles. Some can win with few strikes but numerous takedowns, and vice versa. Many fighters also have psychological weaknesses against particular opponents, are affected by mind games, etc. While it is relatively easy to predict the outcome when a wrestler faces another wrestler, it is much harder when a boxer faces a wrestler.

Method¶

We will most likely use regression on data such as sig_strikes, avgtd, height, reach, etc. to predict the winner, using the difference between the respective B_ and R_ variables as inputs.
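Since the winner is a two-valued outcome (Red or Blue), one common form of regression for this setting is logistic regression; that choice is an assumption here, not something the proposal fixes. The sketch below fits it by plain gradient descent on synthetic data standing in for difference features. The data, the weights used to generate it, and the helper `fit_logistic` are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for standardized difference features (Red minus Blue)
n_fights = 400
X = rng.normal(size=(n_fights, 2))

# Synthetic labels: Red (1) wins more often when the differences favour Red
true_w = np.array([1.5, 1.0])
p_red = 1.0 / (1.0 + np.exp(-(X @ true_w)))
y = (rng.random(n_fights) < p_red).astype(float)

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(Red wins)
        w -= lr * (X.T @ (p - y) / len(y))       # gradient w.r.t. weights
        b -= lr * np.mean(p - y)                 # gradient w.r.t. bias
    return w, b

w, b = fit_logistic(X, y)
pred_red = (X @ w + b) > 0  # a score above 0 means P(Red wins) > 0.5
accuracy = np.mean(pred_red == (y == 1))
print(w, b, accuracy)
```

The accuracy printed here is measured on the same rows used for fitting, so it is optimistic; on the real fights a held-out test split would be needed. In practice a library implementation would likely replace the hand-written loop.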
