Unraveling the Strategy of Soccer Substitutions Using Data

Jesse, Maaike, Joan
June 12th, 2023 · 6 min read

In soccer, substitutions are a manager’s main tool for directly influencing a match. By replacing players on the field with fresh players from the bench, the technical staff can introduce new tactics, make crucial adjustments, and possibly shift the momentum in their favour. In this post, we study how COVID altered substitution dynamics, assess the general impact of substitutions on goal-scoring, and explore whether the impact of a substitution can be predicted.

Our analysis draws on publicly available data from fbref.com. We consider the top men’s European leagues that allowed five substitutions per match throughout the 2020/21 and 2021/22 seasons: Spain’s LaLiga, Italy’s Serie A, France’s Ligue 1, Germany’s Bundesliga, Portugal’s Primeira Liga, and the Netherlands’ Eredivisie. For each match, the dataset includes three parts:
- a report of the main events (substitutions, bookings and goals),
- a lineup sheet with player information and positions,
- a shot record with xG data.

Are substitutions done differently pre- vs. post-COVID?

When competitions resumed after the COVID-19 outbreak, the rules governing substitutions changed significantly. The allowance was expanded from three to five substitutions per match. However, these five changes still have to be made in at most three freely chosen substitution windows. This rule change raises the question of how managers have adapted their approach. Here, we investigate two key aspects: the type and the timing of substitutions.

To analyze the type of substitutions, we classify each one as follows:
- neutral, if the outgoing and incoming players play in similar positions;
- offensive, if a more attacking player replaces a more defensive one;
- defensive, if a more defensive player replaces a more attacking one.

Offensive and defensive substitutions typically signal a deliberate tactical choice. Neutral substitutions are typically driven by injuries, fatigue and bookings.
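The post does not say how positions are compared. The sketch below shows one way the rule could be coded. It assumes a hypothetical coarse ordering of fbref-style position codes (GK < DF < MF < FW). The names `POSITION_RANK` and `substitution_type` are ours, for illustration only.

```python
# Hypothetical coarse ordering, from most defensive to most attacking.
# The post does not specify how positions are ranked.
POSITION_RANK = {"GK": 0, "DF": 1, "MF": 2, "FW": 3}


def substitution_type(outgoing_pos: str, incoming_pos: str) -> str:
    """Classify a substitution by comparing the players' position ranks."""
    out_rank = POSITION_RANK[outgoing_pos]
    in_rank = POSITION_RANK[incoming_pos]
    if in_rank > out_rank:
        return "offensive"  # a more attacking player comes on
    if in_rank < out_rank:
        return "defensive"  # a more defensive player comes on
    return "neutral"        # same coarse position


print(substitution_type("DF", "FW"))
```

With a finer position scheme (e.g. separate ranks for defensive and attacking midfielders), only the lookup table would need to change.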

Table 1 shows how the increased number of substitution opportunities affected the type of substitutions. The overall number of substitutions rose after the rule change. Even so, the absolute number of non-neutral (i.e., tactical) subs per match initially remained relatively stable, between 0.70 and 0.77. The percentage of tactical subs, however, decreased. This was mainly due to more substitutions motivated by factors such as fatigue and injuries. Only in the 2021/22 season, once clubs had had time to adapt to the new rule, did the absolute number of tactical changes clearly exceed pre-COVID levels, at 1.1 per match.

| Season | Offensive subs | Defensive subs | Neutral subs | Non-neutral subs per match | Subs per match |
|---|---|---|---|---|---|
| 2018/19 | 13.4% | 11.2% | 75.4% | 0.72 | 2.93 |
| 2019/20 pre-COVID | 12.8% | 11.4% | 76.2% | 0.70 | 2.87 |
| 2019/20 post-COVID | 8.6% | 9.4% | 82.0% | 0.77 | 4.29 |
| 2020/21 | 8.5% | 8.1% | 82.8% | 0.72 | 4.17 |
| 2021/22 | 12.9% | 11.3% | 75.6% | 1.1 | 4.33 |
Table 1. Substitution summary per season.
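To make the columns of Table 1 concrete, here is a hypothetical helper that derives one row from a season’s labelled substitutions. The name `season_summary` is ours, not the authors’. We read "per match" as per team and match. That reading is consistent with 2.93 staying below the three-sub limit, but it is an assumption.

```python
from collections import Counter


def season_summary(sub_types, n_team_matches):
    """Summarise one season's substitutions in the shape of a Table 1 row.

    sub_types: one label per substitution ("offensive", "defensive", "neutral").
    n_team_matches: number of (team, match) pairs. We assume "per match"
    means per team and match.
    """
    counts = Counter(sub_types)
    total = len(sub_types)
    row = {
        f"{t}_pct": round(100 * counts[t] / total, 1)
        for t in ("offensive", "defensive", "neutral")
    }
    non_neutral = counts["offensive"] + counts["defensive"]
    row["non_neutral_per_match"] = round(non_neutral / n_team_matches, 2)
    row["subs_per_match"] = round(total / n_team_matches, 2)
    return row
```

The columns are linked: non-neutral subs per match equal (offensive % + defensive %) × subs per match. For 2018/19, for example, (13.4% + 11.2%) × 2.93 ≈ 0.72.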

Table 2 shows the average minute at which each substitution is made. For the seasons with five subs allowed, we count only the substitution windows used in the second half, as first-half subs are typically motivated by extraordinary circumstances. With the rule change, substitutions happen slightly later and are more concentrated in time, with smaller standard deviations. The shift shrinks for later substitutions: the first sub moves from minute 60 to 65, but the third only from 83 to 85.

| Rule | First sub | Second sub | Third sub |
|---|---|---|---|
| 3-subs | 60 ± 10 | 74 ± 8 | 83 ± 7 |
| 5-subs | 65 ± 7 | 77 ± 7 | 85 ± 5 |
Table 2. Time of substitution (average minute ± standard deviation) for the 3-subs and 5-subs rule.
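Here is a sketch of how the Table 2 statistics could be computed for the five-sub seasons. It makes two assumptions: substitution minutes are integers, and substitutions made in the same minute count as one window. The function name `window_times` is hypothetical.

```python
import statistics


def window_times(team_matches, n_windows=3):
    """Mean and standard deviation of the minute of the 1st, 2nd, ... window.

    team_matches: one list of substitution minutes per (team, match).
    Substitutions in the same minute are merged into one window (assumption),
    and only second-half windows (minute > 45) are kept, as in Table 2.
    """
    by_index = [[] for _ in range(n_windows)]
    for minutes in team_matches:
        windows = sorted({m for m in minutes if m > 45})
        for i, minute in enumerate(windows[:n_windows]):
            by_index[i].append(minute)
    # stdev needs at least two observations per window index
    return [(statistics.mean(ms), statistics.stdev(ms))
            for ms in by_index if len(ms) >= 2]
```

For example, a team that brings on two players in minute 60 and one each in minutes 75 and 85 contributes 60, 75 and 85 to the first, second and third windows.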

How does the generated xG evolve after substitutions?

The second halves of soccer matches typically feature more goal-scoring opportunities, accounting for approximately 55% of goals and xG. Substitutions are commonly believed to contribute to this increase. It is hard to prove that they cause it, however, rather than play simply becoming more open as teams push for a favorable result while time runs out.

To shed light on this, we analyzed the scoring intensity after each substitution instead of comparing the first and second halves. Figure 1 shows, league by league, the average xG per 90 minutes that teams generated after each of the three substitution windows in the 2020/21 and 2021/22 seasons. The increase in xG after the first and second windows suggests that substitutions generally have a positive effect on scoring chances. After the third and last window, however, the xG rate decreases. These substitutions usually happen in the final minutes of the match, when the opposition has already adjusted tactically to the earlier subs. They are also frequently used to waste time.
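One way to compute the xG rates behind Figure 1 is to work per window index. For each index, sum the xG a team generates until its next window (or the end of the match), then divide by the total minutes in those intervals. The sketch below makes two assumptions the post does not specify: matches end at minute 90, and a shot in the same minute as a window belongs to the period after it. The function name is hypothetical.

```python
import bisect
from collections import defaultdict


def xg_per90_after_windows(team_matches, match_end=90):
    """Pooled xG per 90 minutes after the 1st, 2nd, 3rd... substitution window.

    team_matches: list of (window_minutes, shots) per (team, match), where
    shots is a list of (minute, xg). Each window's interval runs until the
    next window or `match_end`. Shots in stoppage time (minute > match_end)
    still count towards the last window.
    """
    xg_sum = defaultdict(float)
    minutes = defaultdict(float)
    for windows, shots in team_matches:
        windows = sorted(windows)
        bounds = windows + [match_end]
        for k in range(1, len(windows) + 1):
            minutes[k] += max(bounds[k] - bounds[k - 1], 0)
        for minute, xg in shots:
            # Number of windows made at or before this shot's minute.
            k = bisect.bisect_right(windows, minute)
            if k >= 1:
                xg_sum[k] += xg
    return {k: 90 * xg_sum[k] / m for k, m in minutes.items() if m > 0}
```

Pooling (total xG divided by total minutes) weights long intervals more heavily than averaging per-match rates would. That is a design choice, not necessarily the one used for Figure 1.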

Figure 1. Average xG generated after each substitution window, grouped by league.

What is the performance of starting vs. substitute players?

To assess player performance, we use the detailed shot data on fbref.com. It includes xG and Shot-Creating Actions (SCAs), the two offensive actions (such as passes, take-ons or drawn fouls) directly leading to a shot. As an estimate of a player’s total goal-scoring contribution, we compute his expected value (xV): the xG of his own shots plus the xG of the shots for which he made an SCA. xV is admittedly a more limited evaluation than advanced metrics that require full event data. Nonetheless, it supports a useful comparison. Figure 2 shows the average xV generated by midfielders and forwards as a function of minutes played, separately for the starting XI and substitutes.
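Here is a minimal sketch of the xV computation. It assumes each shot record carries the shooter, the shot’s xG and the players credited with its SCAs. This record layout is hypothetical, since fbref presents the data as tables.

```python
from collections import defaultdict


def expected_value(shots):
    """xV per player: xG of own shots plus xG of shots created through SCAs.

    shots: iterable of (shooter, xg, sca_players), where sca_players lists
    the players credited with the (up to two) SCAs for that shot.
    """
    xv = defaultdict(float)
    for shooter, xg, sca_players in shots:
        xv[shooter] += xg          # credit for taking the shot
        for player in sca_players:
            xv[player] += xg       # credit for each shot-creating action
    return dict(xv)
```

Under this reading, a player credited with an SCA on his own shot (e.g. a take-on followed by the shot) receives that shot’s xG twice.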

Figure 2. xV generated through a match at the average player rate of xV per minute. For bench players, minutes are counted from the minute they enter the pitch.

Figure 2 reveals a finding that might seem counterintuitive at first glance: substitutes tend to generate more xV per minute than players in the starting XI. We would expect the better players to start the match, but the context in which substitutes come on affects their xV. Substitutes usually enter in the second half, when starters are more fatigued and the match becomes more open, which increases the number of goal-scoring opportunities. Additionally, xV per minute does not decay with time. Players generate xV at a consistent rate throughout the match, with similar productivity in the first and last minutes. This suggests that any decline in physical performance is offset by the generally higher goal-scoring intensity later in the match.
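The per-minute comparison in Figure 2 can be sketched in three steps. First, pool xV and minutes per role, which is one possible reading of "average player rate". Second, count a substitute’s minutes from when he comes on. Third, scale the rate linearly to draw a cumulative curve. The input layout and function names are assumptions.

```python
from collections import defaultdict


def xv_per_minute_by_role(appearances):
    """Pooled xV per minute for each role ("starter" or "sub").

    appearances: (role, minute_on, minute_off, xv) per player and match.
    Minutes are counted from when the player enters the pitch.
    """
    xv_total = defaultdict(float)
    minutes = defaultdict(float)
    for role, minute_on, minute_off, xv in appearances:
        xv_total[role] += xv
        minutes[role] += minute_off - minute_on
    return {role: xv_total[role] / m for role, m in minutes.items() if m > 0}


def cumulative_curve(rate, max_minutes=90):
    """xV accumulated after t minutes on the pitch at a constant rate."""
    return [rate * t for t in range(max_minutes + 1)]
```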

Can we predict whether a substitution will be useful?

To predict the impact of substitutions, we look at the expected goals difference (xGD). This metric captures the scoring chances generated by both teams and thus indicates match momentum. We classify substitutions as useful or not based on the change in xGD after the substitution is made:
- positive, if the xGD increased by 0.5 or more;
- negative, if it decreased by at least 0.5;
- neutral otherwise.

We describe each substitution using three types of features:
- match-specific: the goal difference and xGD when the substitution is made;
- substitution-specific: the type of substitution (offensive, defensive or neutral) and its minute;
- team-specific: each team’s Elo rating and home/away status.
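The labelling rule and feature groups above translate directly into code. This sketch takes the xGD change as given, because the post does not state over which interval it is measured. One-hot encoding the substitution type is also our assumption, and all names are hypothetical.

```python
SUB_TYPES = ("offensive", "defensive", "neutral")


def label_substitution(xgd_change, threshold=0.5):
    """Label a substitution from the change in xGD measured after it."""
    if xgd_change >= threshold:
        return "positive"
    if xgd_change <= -threshold:
        return "negative"
    return "neutral"


def substitution_features(goal_diff, xgd, sub_type, minute,
                          own_elo, opp_elo, is_home):
    """Feature vector: match-, substitution- and team-specific features."""
    one_hot = [1.0 if sub_type == t else 0.0 for t in SUB_TYPES]
    return [float(goal_diff), float(xgd),        # match-specific
            *one_hot, float(minute),             # substitution-specific
            float(own_elo), float(opp_elo),      # team-specific
            1.0 if is_home else 0.0]
```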

Figure 3. Methodology for predicting whether a substitution will be useful.

After experimenting with a set of models, we achieved the best performance with a k-nearest neighbors (kNN) model. We train on data from the seasons before the current one and tune the value of k using 10-fold cross-validation. On the ongoing 2022/23 season, the model achieves an accuracy of 74%. Simply predicting the most common class (neutral) would yield an accuracy of 55%.

The Barcelona vs Atlético de Madrid game on 23rd April 2023 shows the predicted effect of the subs in practice. In the 61st minute, Xavi made a double substitution, with Pedri and Eric Garcia coming on for Marcos Alonso and Ferran Torres. This defensive change was predicted to have a negative impact, although the scoreline did not change after the substitutions. Cholo Simeone tried various substitutions to come back, but only the offensive double change in the 59th minute was predicted to be favorable. Indeed, most of Atlético’s xG was generated after this sub. They had 0.3 xG (on 6 shots) before the 58th minute and 1.0 afterwards (on 7 shots), about half of which came from a single Griezmann attempt with an xG of 0.51. The neutral substitutions in the 67th and 79th minutes were predicted to be inconsequential for the result of the match.

Figure 4. Predictions of useful subs in the Barcelona vs Atlético de Madrid game on 23rd April 2023.
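The kNN setup described in this section can be sketched in plain Python. It trains on earlier seasons, tunes k with 10-fold cross-validation, and compares the result against a majority-class baseline. The post does not specify feature scaling, distance or voting. Z-score scaling, Euclidean distance and majority voting are our assumptions, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def standardize(train, test):
    """Z-score each feature with the training mean/std (assumed scaling)."""
    cols = list(zip(*train))
    means = [sum(c) / len(c) for c in cols]
    # A constant feature gets std 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]

    def scale(rows):
        return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

    return scale(train), scale(test)


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training points closest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test points whose label kNN predicts correctly."""
    tr, te = standardize(train_X, test_X)
    hits = sum(knn_predict(tr, train_y, x, k) == y for x, y in zip(te, test_y))
    return hits / len(test_y)


def tune_k(X, y, ks, folds=10, seed=0):
    """Pick the k with the best mean accuracy over `folds`-fold CV.

    Requires len(X) >= folds so that no fold is empty.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    fold_of = {i: n % folds for n, i in enumerate(idx)}
    best_k, best_acc = None, -1.0
    for k in ks:
        scores = []
        for f in range(folds):
            tr = [i for i in idx if fold_of[i] != f]
            te = [i for i in idx if fold_of[i] == f]
            scores.append(accuracy([X[i] for i in tr], [y[i] for i in tr],
                                   [X[i] for i in te], [y[i] for i in te], k))
        mean_acc = sum(scores) / folds
        if mean_acc > best_acc:  # ties keep the k listed first
            best_k, best_acc = k, mean_acc
    return best_k


def majority_baseline(train_y, test_y):
    """Accuracy of always predicting the most common training label."""
    top = Counter(train_y).most_common(1)[0][0]
    return sum(label == top for label in test_y) / len(test_y)
```

In this setup, one would call `tune_k` on the pre-2022/23 seasons and evaluate the chosen k with `accuracy` on 2022/23. The result can then be compared with `majority_baseline` on the same split.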

Conclusion

In this study, we evaluated substitutions and their impact on the game. Since the COVID outbreak, regulatory changes around substitutions have given managers more flexibility and strategic options. We have seen how these changes affected the number of tactical interventions per match and the timing of substitutions.

One notable effect of substitutions that emerged from our analysis is their impact on goal scoring. Substitutes generate xV at a higher rate than starters, showing that they contribute substantially to scoring opportunities. Furthermore, xG increases noticeably after the first and second substitution windows, underscoring the positive effect of fresh legs and tactical adjustments on a team’s scoring potential.

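Building on these insights, we trained machine learning models to predict the outcome of substitutions. Our best model, a kNN classifier, correctly predicted the impact of almost three out of every four substitutions in the 2022/23 season. That is 74% accuracy, against 55% for always predicting the most common class.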

Overall, our analysis underscores the significance of substitutions and their ability to shape the outcome of a soccer match. By embracing a data-driven approach, teams can extract the maximum potential from their squad.


Further Reading

This blog post is based on Joan Hernanz i Ibáñez’s bachelor’s thesis. The full text is available online.

Furthermore, the articles below also delve into the statistical analysis of substitutes and provide further insights into the impact they can have in soccer:
