Soccer (Predicting the Premier League)

This project aims to determine whether a season's Premier League title winner can be predicted by applying ML techniques to certain match statistics. The model will then be used to predict this season's (2022-2023) Premier League winner. ML techniques are already widely used in sports analytics and can also play a large role in sports betting as it becomes more mainstream.

Machine_Learning_in_Sports_Analytics

Machine_Learning_in_Sports_Betting

In [1]:
import pandas as pd
In [2]:
cols = ['Date','HomeTeam', 'AwayTeam','FTHG','FTAG','FTR','HTGS','ATGS','HTGC','ATGC','HTP','ATP']
df = pd.read_csv('final_dataset.csv', usecols=cols)
df
Out[2]:
Date HomeTeam AwayTeam FTHG FTAG FTR HTGS ATGS HTGC ATGC HTP ATP
0 19/08/00 Charlton Man City 4 0 H 0 0 0 0 0.000000 0.000000
1 19/08/00 Chelsea West Ham 4 2 H 0 0 0 0 0.000000 0.000000
2 19/08/00 Coventry Middlesbrough 1 3 NH 0 0 0 0 0.000000 0.000000
3 19/08/00 Derby Southampton 2 2 NH 0 0 0 0 0.000000 0.000000
4 19/08/00 Leeds Everton 2 0 H 0 0 0 0 0.000000 0.000000
... ... ... ... ... ... ... ... ... ... ... ... ...
6835 13/05/18 Newcastle Chelsea 3 0 H 36 62 47 35 1.078947 1.842105
6836 13/05/18 Southampton Man City 0 1 NH 37 105 55 27 0.947368 2.552632
6837 13/05/18 Swansea Stoke 1 2 NH 27 33 54 67 0.868421 0.789474
6838 13/05/18 Tottenham Leicester 5 4 H 69 52 32 55 1.947368 1.236842
6839 13/05/18 West Ham Everton 3 1 H 45 43 67 55 1.026316 1.289474

6840 rows × 12 columns

  • Date = Match Date (dd/mm/yy) (Type: string)
  • HomeTeam = Home Team (Type: string)
  • AwayTeam = Away Team (Type: string)
  • FTHG = Full-Time Home Team Goals (Type: numpy.int64)
  • FTAG = Full-Time Away Team Goals (Type: numpy.int64)
  • FTR = Full-Time Result (H = Home Win, D = Draw, A = Away Win, NH = Not Home Win) (Type: string)
  • HTGS = Home Team Goals Scored (Type: numpy.int64)
  • ATGS = Away Team Goals Scored (Type: numpy.int64)
  • HTGC = Home Team Goals Conceded (Type: numpy.int64)
  • ATGC = Away Team Goals Conceded (Type: numpy.int64)
  • HTP = Home Team Points (Type: numpy.float64)
  • ATP = Away Team Points (Type: numpy.float64)
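
Since Date is read in as a string, it may help to parse it and attach a season label before any per-season analysis. The following is a minimal sketch, not part of the original notebook: the `sample` frame is a tiny stand-in built from rows shown in Out[2], `add_season` and the `Season` column are hypothetical names, and the July cutoff for assigning a match to a season is an assumption. It also assumes every date in the file follows the dd/mm/yy format listed above, which should be checked against the full CSV.

```python
import pandas as pd

# Tiny stand-in for final_dataset.csv, using dates and teams shown in Out[2].
sample = pd.DataFrame({
    "Date": ["19/08/00", "19/08/00", "13/05/18"],
    "HomeTeam": ["Charlton", "Chelsea", "Tottenham"],
    "AwayTeam": ["Man City", "West Ham", "Leicester"],
})


def add_season(frame: pd.DataFrame) -> pd.DataFrame:
    """Parse dd/mm/yy dates and label each match with its season's start year."""
    out = frame.copy()
    out["Date"] = pd.to_datetime(out["Date"], format="%d/%m/%y")
    year = out["Date"].dt.year
    # Assumption: a season runs roughly August to May, so a match played
    # from July onward belongs to the season starting in that calendar year,
    # and a match played earlier belongs to the season that started the year before.
    out["Season"] = year.where(out["Date"].dt.month >= 7, year - 1)
    return out


labelled = add_season(sample)
print(labelled[["Date", "HomeTeam", "Season"]])
```

With this labelling, the match on 13/05/18 is assigned to the season that started in 2017, so `labelled.groupby("Season")` would group matches season by season.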

A model will be trained on the statistics and outcome of each match in a season, together with that season's final ranking. After being trained on every season dating back to 2000, the model should be able to predict this year's results.
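
The season ranking used as a training target can be derived from the match results themselves. The sketch below is one possible way to do that, not the notebook's own method: `season_table` is a hypothetical helper, the three fixtures between teams "A", "B" and "C" are made-up toy data, and ties are broken by goal difference and then goals scored. Because the FTR values shown in Out[2] only distinguish H from NH, the sketch works out wins and draws from FTHG and FTAG instead, awarding 3 points for a win and 1 for a draw.

```python
import pandas as pd


def season_table(matches: pd.DataFrame) -> pd.DataFrame:
    """Rank teams by points, then goal difference, then goals scored."""
    # One row per team per match: goals for (GF) and goals against (GA).
    home = pd.DataFrame({
        "Team": matches["HomeTeam"],
        "GF": matches["FTHG"],
        "GA": matches["FTAG"],
    })
    away = pd.DataFrame({
        "Team": matches["AwayTeam"],
        "GF": matches["FTAG"],
        "GA": matches["FTHG"],
    })
    rows = pd.concat([home, away], ignore_index=True)

    # 3 points for a win, 1 for a draw, 0 for a loss.
    rows["Pts"] = (rows["GF"] > rows["GA"]) * 3 + (rows["GF"] == rows["GA"]) * 1

    table = rows.groupby("Team", as_index=False)[["GF", "GA", "Pts"]].sum()
    table["GD"] = table["GF"] - table["GA"]
    table = table.sort_values(["Pts", "GD", "GF"], ascending=False)
    table = table.reset_index(drop=True)
    table["Rank"] = table.index + 1
    return table


# Toy fixtures (illustrative only, not taken from the dataset).
toy = pd.DataFrame({
    "HomeTeam": ["A", "B", "C"],
    "AwayTeam": ["B", "C", "A"],
    "FTHG": [2, 1, 0],
    "FTAG": [0, 1, 1],
})
table = season_table(toy)
print(table[["Rank", "Team", "Pts", "GD"]])
```

Applied to the real data, the function would be called once per season (for example on each group of a per-season `groupby`), and the team with `Rank == 1` would be that season's title winner.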

In [ ]: