This project aims to determine whether a season's Premier League title winner can be predicted with ML techniques that analyze certain match statistics. The model will then be used to predict the winner of the current (2022-2023) Premier League season. ML techniques are already widely used in the sports analytics world and could also play a large role in sports betting as it becomes more mainstream.
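import pandas as pd
# Keep only the match date, teams, full-time score and result, and the per-team goals and points columns
cols = ['Date','HomeTeam', 'AwayTeam','FTHG','FTAG','FTR','HTGS','ATGS','HTGC','ATGC','HTP','ATP']
df = pd.read_csv('final_dataset.csv', usecols=cols)
# Display the loaded DataFrame
df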
| | Date | HomeTeam | AwayTeam | FTHG | FTAG | FTR | HTGS | ATGS | HTGC | ATGC | HTP | ATP |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 19/08/00 | Charlton | Man City | 4 | 0 | H | 0 | 0 | 0 | 0 | 0.000000 | 0.000000 |
1 | 19/08/00 | Chelsea | West Ham | 4 | 2 | H | 0 | 0 | 0 | 0 | 0.000000 | 0.000000 |
2 | 19/08/00 | Coventry | Middlesbrough | 1 | 3 | NH | 0 | 0 | 0 | 0 | 0.000000 | 0.000000 |
3 | 19/08/00 | Derby | Southampton | 2 | 2 | NH | 0 | 0 | 0 | 0 | 0.000000 | 0.000000 |
4 | 19/08/00 | Leeds | Everton | 2 | 0 | H | 0 | 0 | 0 | 0 | 0.000000 | 0.000000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6835 | 13/05/18 | Newcastle | Chelsea | 3 | 0 | H | 36 | 62 | 47 | 35 | 1.078947 | 1.842105 |
6836 | 13/05/18 | Southampton | Man City | 0 | 1 | NH | 37 | 105 | 55 | 27 | 0.947368 | 2.552632 |
6837 | 13/05/18 | Swansea | Stoke | 1 | 2 | NH | 27 | 33 | 54 | 67 | 0.868421 | 0.789474 |
6838 | 13/05/18 | Tottenham | Leicester | 5 | 4 | H | 69 | 52 | 32 | 55 | 1.947368 | 1.236842 |
6839 | 13/05/18 | West Ham | Everton | 3 | 1 | H | 45 | 43 | 67 | 55 | 1.026316 | 1.289474 |
6840 rows × 12 columns
A model will be created that takes the statistics and outcome of every match in a season as input and is given that season's final ranking as the training target. After being trained on every season dating back to the year 2000, the model should be able to predict this year's results.
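The season ranking used as the training target is not a column in the dataset, so it has to be derived from the match results. The following is a minimal sketch of one way to do that, not necessarily the approach this project ends up using. The helper names `season_start_year` and `league_table` are hypothetical, and the sketch assumes seasons run from August to May, 3 points for a win and 1 for a draw, with ties broken by goal difference and then goals scored. The sample matches are the five opening-day rows shown in the table above.

```python
from collections import defaultdict
from datetime import datetime


def season_start_year(date_str):
    """Map a 'dd/mm/yy' date to the year its season began.

    Assumption: a season starts in the second half of a calendar year
    and ends in the first half of the next one.
    """
    d = datetime.strptime(date_str, "%d/%m/%y")
    return d.year if d.month >= 7 else d.year - 1


def league_table(matches):
    """Rank teams from (home, away, home_goals, away_goals) tuples.

    Assumption: 3 points for a win, 1 for a draw; ties are broken by
    goal difference, then goals scored, then team name.
    """
    points = defaultdict(int)
    goal_diff = defaultdict(int)
    goals_for = defaultdict(int)
    for home, away, hg, ag in matches:
        goals_for[home] += hg
        goals_for[away] += ag
        goal_diff[home] += hg - ag
        goal_diff[away] += ag - hg
        if hg > ag:
            points[home] += 3
        elif hg < ag:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    # Every team appears in goals_for, even if it has scored no goals.
    return sorted(
        goals_for,
        key=lambda t: (-points[t], -goal_diff[t], -goals_for[t], t),
    )


# The five opening-day matches from the table above.
opening_day = [
    ("Charlton", "Man City", 4, 0),
    ("Chelsea", "West Ham", 4, 2),
    ("Coventry", "Middlesbrough", 1, 3),
    ("Derby", "Southampton", 2, 2),
    ("Leeds", "Everton", 2, 0),
]
print(league_table(opening_day))
print(season_start_year("13/05/18"))
```

To apply this to the full dataset, one option is to group the rows of `df` by `season_start_year` applied to `Date`, then pass each group's `HomeTeam`, `AwayTeam`, `FTHG`, and `FTAG` values to `league_table`.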