Fantasy Premier League Prediction Model

Motivation:

Problem

Fantasy Premier League (FPL), the fantasy sports game run by England's top-flight soccer league, is hard to play well. Many factors affect a player's score, not every manager is numerically inclined, and chance plays a large part in the outcome.

Solution

FPL has an API that exposes every statistic the game tracks and calculates for each player, team, and fixture in the league. The same data is also available to managers through the official FPL interface when they select their teams. The goal of this project is to identify a relationship between FPL's advanced statistics and points scored, which is what ultimately matters in fantasy sports.

Impact

If successful, this work may yield a program that can pick up on patterns in obscure, hard-to-comprehend data. This should help FPL managers choose entire rosters, or simply decide between two similar players.

One negative side effect of this program could be that it is too formulaic. Each year, some players perform unexpectedly well, and the model may not be able to capture those more obscure, random occurrences.

Dataset

Detail

We will use Fantasy Premier League's live-updating API to observe the following features for each player:

  • web_name
  • element_type (represents position)
  • form
  • now_cost
  • points_per_game
  • total_points
  • minutes
  • starts
  • goals_scored
  • assists
  • clean_sheets
  • goals_conceded
  • yellow_cards
  • red_cards
  • bonus
  • bps
  • influence
  • creativity
  • threat
  • ict_index
  • expected_goals
  • expected_assists
  • expected_goals_conceded

That's just for starters.

In addition to these, each team has a whole host of stats that are likely relevant. Those numbers are also available via the API and would likely be used as well. Our project seeks to use the features above to estimate the points output of a player.
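As a rough illustration of how team-level numbers could be attached to player rows, the sketch below joins a toy team table onto a toy player table. It assumes the player `team` column holds a team id that matches an `id` column in the team data, and that the team data carries `name` and `strength` fields; those column names, the helper `attach_team_info`, and all values are hypothetical stand-ins, not real FPL data.

```python
import pandas as pd


def attach_team_info(players: pd.DataFrame, teams: pd.DataFrame) -> pd.DataFrame:
    """Left-join selected team columns onto player rows (players.team == teams.id)."""
    team_cols = teams.rename(
        columns={"id": "team", "name": "team_name", "strength": "team_strength"}
    )[["team", "team_name", "team_strength"]]
    # validate="many_to_one" raises if a team id appears twice in the team table
    return players.merge(team_cols, on="team", how="left", validate="many_to_one")


# Toy stand-ins for the player and team tables (made-up values)
players = pd.DataFrame(
    {
        "web_name": ["Player A", "Player B", "Player C"],
        "team": [1, 1, 2],
        "total_points": [90, 6, 40],
    }
)
teams = pd.DataFrame({"id": [1, 2], "name": ["Team X", "Team Y"], "strength": [4, 3]})

merged = attach_team_info(players, teams)
print(merged)
```

A left join keeps every player row even if a team id has no match, which makes missing team data visible as NaN rather than silently dropping players.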

Potential Problems

While most of the numbers above are simple counting stats that can be directly observed, such as goals_scored and yellow_cards, others are calculated specifically for FPL, such as creativity or bps (which stands for bonus points system). Understanding how those statistics are calculated, and what effect they could have on the model, is crucial. There are also a lot of inputs, so there could be confounding variables at play.
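One simple first check on a large set of inputs is to look for pairs of features that carry nearly the same information. The sketch below is only a starting point, not a full treatment of confounding: it flags feature pairs whose absolute Pearson correlation exceeds a threshold. The helper `highly_correlated_pairs`, the 0.9 threshold, and the synthetic data are all assumptions for illustration. The `pd.to_numeric` step is there on the assumption that some numeric fields may arrive as strings.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.9):
    """Return (feature_a, feature_b, |r|) for pairs with |Pearson r| >= threshold."""
    numeric = df.apply(pd.to_numeric, errors="coerce")  # strings -> numbers
    corr = numeric.corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = float(corr.loc[a, b])
            if r >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Synthetic data (not real FPL values): starts is built from minutes,
# yellow_cards is drawn independently.
rng = np.random.default_rng(0)
n = 200
minutes = rng.integers(0, 3421, n)
toy = pd.DataFrame(
    {
        "minutes": minutes.astype(str),  # stored as strings on purpose
        "starts": np.round(minutes / 90),
        "yellow_cards": rng.integers(0, 10, n),
    }
)

pairs = highly_correlated_pairs(toy)
print(pairs)
```

Pairs flagged this way are candidates for dropping or combining before fitting, since near-duplicate inputs make individual feature effects hard to interpret.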

The code below imports raw data from the API and builds a DataFrame of basic statistics for each player in the league:

In [2]:
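import requests
import pandas as pd
import numpy as np

# Fetch the FPL bootstrap data
url = 'https://fantasy.premierleague.com/api/bootstrap-static/'
r = requests.get(url, timeout=30)
r.raise_for_status()  # fail early on an HTTP error instead of parsing a bad body
data = r.json()  # named `data` so it does not shadow the standard `json` module

# One row per player, then keep only the basic columns
elements_df = pd.DataFrame(data['elements'])
relevant_columns = ['web_name', 'team', 'element_type', 'now_cost', 'goals_scored', 'assists', 'clean_sheets', 'goals_conceded', 'yellow_cards', 'red_cards', 'total_points']
basic_elements_df = elements_df[relevant_columns]

basic_elements_df.head()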
Out[2]:
   web_name  team  element_type  now_cost  goals_scored  assists  clean_sheets  goals_conceded  yellow_cards  red_cards  total_points
0     Xhaka     1             3        49             3        5             9              23             3          0            90
1    Elneny     1             3        41             0        0             0               2             0          0             6
2   Holding     1             2        42             0        0             0               0             0          0             7
3    Partey     1             3        47             2        0             9              11             2          0            58
4  Ødegaard     1             3        70             8        7             9              22             3          0           134
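The raw codes in this output can be made easier to read. The sketch below is a minimal example under two assumptions that are not stated in the notebook: that element_type codes 1 to 4 correspond to goalkeeper, defender, midfielder, and forward, and that now_cost is expressed in tenths of a million. The `POSITION_LABELS` dict is a hypothetical helper. The three rows are copied from the output above.

```python
import pandas as pd

# Assumed mapping from element_type code to a position label
POSITION_LABELS = {1: "GKP", 2: "DEF", 3: "MID", 4: "FWD"}

# Three rows copied from the head() output above
sample = pd.DataFrame(
    {
        "web_name": ["Xhaka", "Holding", "Ødegaard"],
        "element_type": [3, 2, 3],
        "now_cost": [49, 42, 70],
    }
)

readable = sample.assign(
    position=sample["element_type"].map(POSITION_LABELS),
    cost_millions=sample["now_cost"] / 10,  # assumed unit: tenths of a million
)
print(readable)
```

Keeping the original element_type and now_cost columns alongside the derived ones makes it easy to verify the mapping against the API before relying on it.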

Method:

This is a regression problem: the output, expected_points, is continuous. The goal is to distill the wealth of data available to FPL managers into a simpler form. One advantage of this approach is an intuitive output, since each feature either raises or lowers the number of points a player can be expected to score.
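To make the "each feature raises or lowers the output" idea concrete, here is a minimal sketch using ordinary least squares on synthetic data. Linear regression is an assumption here, since the text only says "regression". The feature values, the coefficients in `true_coefs`, and the noise level are invented for the toy example and are not fitted to FPL data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
feature_names = ["goals_scored", "assists", "yellow_cards"]

# Synthetic feature matrix (not real FPL data)
X = np.column_stack(
    [rng.integers(0, 20, n), rng.integers(0, 15, n), rng.integers(0, 10, n)]
).astype(float)

# Invented coefficients used to generate the toy target
true_coefs = np.array([5.0, 3.0, -1.0])
y = 2.0 + X @ true_coefs + rng.normal(0.0, 1.0, n)

# Ordinary least squares with an explicit intercept column
X_design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, coefs = beta[0], beta[1:]

# The sign of each coefficient shows whether the feature raises or lowers the prediction
for name, c in zip(feature_names, coefs):
    direction = "raises" if c > 0 else "lowers"
    print(f"{name}: {c:+.2f} ({direction} expected points)")
```

In a linear model the fitted coefficient is the predicted change in points per one-unit change in that feature, holding the others fixed. That interpretation weakens when features are strongly correlated with each other.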