This project simulates full English Premier League (EPL) seasons using a Poisson-based hierarchical goal model. It applies Monte Carlo methods to evaluate how team strength, random variation, and marginal changes in strategy translate into league outcomes and economic value. The simulation framework estimates expected values, variances, and counterfactual impacts under different assumptions of performance and fortune.
The English Premier League allocates hundreds of millions of pounds annually based on final table positions. While team quality plays a large role, randomness and scheduling quirks can significantly affect outcomes. This project uses statistical modeling to quantify these dynamics: how much success is driven by strength, how much by luck, and where small changes can yield disproportionate returns.
At the heart of the simulation is a probabilistic match outcome engine built on team-specific attack and defense strengths. Match results are generated stochastically across full seasons, with outcomes translated into revenue via placement-based prize mapping.
This project estimates team-specific attacking and defensive abilities in the English Premier League using historical match outcomes. We model the number of goals each team scores using a Poisson log-linear model, with parameters estimated via maximum likelihood.
We use match-level data from the 2021-22, 2022-23 and 2023–24 EPL seasons. Each record includes:
- Home and away team names
- Match date
- Final score (home goals, away goals)
The data is preprocessed into a long format, with one row per team per match (i.e., each match yields two observations — one for home team goals, one for away).
data2023.24 <- read.csv('https://www.football-data.co.uk/mmz4281/2324/E0.csv')
data2022.23 <- read.csv('https://www.football-data.co.uk/mmz4281/2223/E0.csv')
We assume that the number of goals team i scores against team j follows a Poisson distribution:
Where:
-
$\alpha_i$ : Latent attack strength of team i -
$\delta_j$ : Latent defense weakness of team j -
$\lambda_{ij}$ : Expected goals scored by team i vs. j
Each match contributes two observations:
- Home team goals:
$\lambda_{\text{home}} = \exp(\alpha_{\text{home}} - \delta_{\text{away}})$ - Away team goals:
$\lambda_{\text{away}} = \exp(\alpha_{\text{away}} - \delta_{\text{home}})$
We estimate the parameters
Where:
-
$y_m$ : Observed goal count -
$\lambda_m = \exp(\alpha_i - \delta_j)$ : Modeled rate -
$M$ : Total number of team-match observations
The negative log-likelihood is minimized using scipy.optimize.minimize
(e.g., L-BFGS-B).
out<-optim(theta0,likelihood.function,control= list(fnscale=-1), method="BFGS" )
The model is overparameterized: adding a constant to all
This centers team attack strengths relative to a league-average baseline.
alphaOut[20] <- -1*sum( alphaOut[1:19] ) #sum to zero constraint
deltaOut[20] <- -1* sum( deltaOut[1:19] )
Interpretation
-
$\alpha_i > 0$ : Team i scores more than average (strong offense) -
$\delta_j < 0$ : Team j concedes fewer goals than average (strong defense)
These parameters provide a quantitative summary of team quality, inferred from observed match outcomes.
Each team’s simulated final league position is mapped to a revenue effect in GBP millions, centered relative to 17th place (the baseline). These estimates approximate broadcasting, merit, and exposure-based earnings.
# Hard code value of earnings and position
revenue_by_position <- data.frame(
position = 1:20,
revenue_million_gbp = c(
149.6, 145.9, 142.1, 138.4, 73.7, 70.0, 55.2, 33.5, 29.8, 26.0,
22.3, 18.6, 14.9, 11.2, 7.5, 3.7, 0, -88.7, -92.5, -96.2))
Interpretation:
The financial structure of the EPL is nonlinear. A shift from 17th to 18th (relegation) costs ~£89 million, while a Top 4 finish is worth ~£140 million more than just avoiding relegation. As such, financial outcomes vary far more than ranks do—and it's this revenue distribution we model and analyze.
We simulate each EPL season using Monte Carlo methods, where match outcomes are generated stochastically based on Poisson-distributed goals.
For each match between Team A and Team B:
- Simulate goals scored by Team A:
$$\text{Goals}_A \sim \text{Poisson}(\lambda_A), \quad \lambda_A = \exp(\alpha_A - \delta_B)$$ - Simulate goals scored by Team B:
$$\text{Goals}_B \sim \text{Poisson}(\lambda_B), \quad \lambda_B = \exp(\alpha_B - \delta_A)$$
Where:
-
$( \alpha_i ):$ attack strength of team$( i )$ -
$( \delta_i ):$ defense strength of team$( i )$ -
$( \lambda ):$ expected goals for each side, always positive due to exponential transformation
We simulate all 380 matches of the season using this method for 1000 seasons.
Each simulated season proceeds as follows:
- Generate all Possible Matches
# Simulate each match
ScoresMatrix <- matrix(nrow = nrow(allMatches), ncol = 4)
for (ii in 1:nrow(allMatches)) {
ScoresMatrix[ii, 1:2] <- allMatches[ii, ]
ScoresMatrix[ii, 3:4] <- draw.score(allMatches[ii, "home"], allMatches[ii, "away"])}
colnames(ScoresMatrix) <- c("home.team", "away.team", "home.score", "away.score")
No Repeats
-
20*38 = 760, but do not recount because teams play each other
-
760/2 = 380 total matches (with no repeats)
- Generate match Goals using Poisson sampling
# Function to simulate a single match
draw.score <- function(team1, team2) {
c(
rpois(1, exp(alphaList[team1] - deltaList[team2])),
rpois(1, exp(alphaList[team2] - deltaList[team1])))}
- Assign points based on outcomes (3 for win, 1 for draw, 0 for loss)
- Construct the league table using total points, goal difference, and goals scored
- Map final ranks to revenue using the revenue table
- Repeat for a thousand seasons to build full distributions for each team
# Simulate 1000 seasons
season_sims <- monte.carlo.sim(
fun = simulate_season,
fun.arg = list(alphaList = alphaList, deltaList = deltaList, team_names = team_names),
nSims = 1000)
This process creates thousands of alternate versions of the same season under probabilistic laws. The resulting simulations show what’s possible, and the aggregation of many simulations shows what is likely.
We report the expected revenue (in GBP millions) for each team across all simulations.
Expected revenue better reflects a team’s value under uncertainty than rank alone. It captures how often a team reaches high-paying vs. low-paying outcomes—making it a better metric for strategic decision-making.
We analyze the revenue distribution for each team to understand how fragile or stable their expected outcomes are.
Some teams show wide revenue swings due to their proximity to important thresholds such as Top 4 qualification or relegation. These teams are more exposed to randomness and may benefit from risk-mitigation strategies (e.g., increasing squad depth, stabilizing tactics, or reducing match-to-match variance).
The left plot shows the standard deviation of revenue across all simulated seasons — this represents the actual volatility in outcomes that a team might experience. In contrast, the right plot shows confidence intervals around the average revenue, indicating the precision of our estimate of each team’s expected revenue.
Importantly, the CI bars on the right do not reflect the full spread of revenue outcomes — they are much narrower and only quantify uncertainty in the mean, not variability in individual season results. The SD plot on the left captures that total variability directly.
To assess the financial value of good fortune, we simulate the impact of converting one random loss into a win. 20 teams × 1,000 simulations = 20,000 “flip scenarios” and 1000 baseline simulated seasons
-
Method:
- Identify one random loss per team per simulationed season
- flip the score (convert to a win (+3 points))
- Rerank season, recompute revenue
- Compare to baseline revenue
-
Output:
- Average revenue change due to a single "lucky" win
This analysis quantifies the marginal financial value of luck. For teams near key thresholds (e.g., 4th vs. 5th, or 17th vs. 18th), a single lucky result can be worth tens of millions of pounds. Teams with steep revenue gradients around their most likely finish (e.g., high mid-table or relegation zone) tend to benefit most from small improvements.
There’s also a strategic angle to the lucky win: there is a chance that the win occurs against a direct rival in the table — a team close in points and position. Flipping that match not only gains 3 points but potentially removes 3 points from a close competitor, increasing the likelihood of leapfrogging them in the standings.
Given 380 matches in the season and 19 possible opponents per team, the probability of your lucky win being against a direct competitor (e.g., within 1–2 places in the table) is non-trivial. Roughly:
- Each team plays 38 matches
- Suppose 1-2 of those are realistically close rivals over the season
- Then the chance that your lucky win occurs against one of those is about 5%-20% (depending on total wins and losses, and win/loss against rivals)
This makes the expected revenue gain from a flipped result not just a reflection of your own improvement — but potentially a double swing in revenue if it knocks a nearby team down a position.
We simulate two hypothetical improvements for each team:
- A 10% increase in expected goals scored
- A 10% reduction in expected goals conceded
To simulate marginal improvements in team performance, we implement a function that computes the required increase in a team’s attack strength parameter, αᵢ
, to produce a 10% increase in average expected goals across opponents.
Under a Poisson goals model, the expected number of goals scored by team
We solve for a shifted value
This is implemented as a nonlinear root-finding problem in
This required shift is team-specific. The magnitude of
This transformation enables fair and interpretable simulations of performance gains, aligning latent skill improvements to a consistent increase in expected output while accounting for the surrounding competitive environment.
These shifts are applied per team using a custom root-solving function.
# Define the alphaShift function
alphaShift <- function(team) {
# Ensure alphaIn is numeric
alphaIn <- as.numeric(team_estim[team, "alpha"])
# Ensure otherTeams are correct
otherTeams <- setdiff(rownames(team_estim), team)
# Define the function to solve for the 10% increase (vectorized)
fn <- function(x) {
# We are comparing two values - return the difference as a vector
result <- mean(exp(x - as.numeric(team_estim[otherTeams, "delta"]))) -
1.1 * mean(exp(alphaIn - as.numeric(team_estim[otherTeams, "delta"])))
# Return the result as a vector with the expected structure
return(c(result))}
# Use multiroot to solve for the value of alphaIn that increases the parameter by 10%
sol <- multiroot(fn, alphaIn, rtol = 1e-12, atol = 1e-8)
return(sol$root)}
For each team, we run three simulation sets:
-
Baseline: simulate a thousand of seasons with original
$( \alpha )$ ,$( \delta )$ (already done) -
Attack Boost: apply
$( \Delta \alpha_i )$ , re-simulate -
Defense Boost: apply
$( \Delta \delta_i )$ , re-simulate
For each intervention, we compute the change in expected revenue relative to the baseline:
Where
- 1,000 baseline simulations and 500 simulations for each of 20 teams with the alpha boost, a total of 21000 simulated full seasons (attack and defense)
- Average the thousand revenue changes for each team
This allows teams to compare the financial ROI of improving attack vs. defense. Some clubs benefit more from scoring upgrades; others from tightening defense. This data-driven approach helps guide investment priorities, recruitment, and tactical decisions.
In the plot, each point represents a team:
- Teams above the diagonal line benefit more from defensive improvements (reducing goals conceded).
- Teams below the line benefit more from offensive upgrades (increasing goals scored).
- Teams on or near the line show roughly equal financial returns from both types of improvements.
- Premier League outcomes translate into steep, nonlinear revenue differences.
- Expected rank is uniform; expected revenue is not—and it's what truly matters.
- Randomness is financially meaningful: small shifts in match results can change revenue by millions.
- Teams differ not just in strength, but in volatility. Simulation helps quantify both.
- A simulation-based approach provides a strategic planning tool: to evaluate improvement paths, quantify luck, and allocate resources more effectively.