In 16 LaLiga seasons, Messi has racked up 2,162 goal-scoring attempts with a cumulative xG value of 339.59. Thus, the xG model estimates that Messi should have scored about 340 goals from these chances, based on the type and the location of each shot. However, Messi has scored more goals, a lot more. In fact, in this time he has netted 444 goals. Given the xG value of each of his shots, the probability of scoring 444 goals or more is 0.0000000015%. It is nearly impossible to shoot that efficiently.
Numerous articles have been written about his great finishing skills. And yet, we have barely scratched the surface of understanding quite how Messi does this. We know that the majority of Messi’s goals come from his left foot, that he is average at scoring headers and that he is deadly from outside the penalty area. However, these observations can only partially explain Messi’s outstanding conversion rates. Is he simply better at converting shots, or are there specific shot contexts in which he sets himself apart from the rest?
One of Messi’s trademark finishes against Athletic Bilbao on 27-04-2013: he dribbles into the box, cuts in from the right and meticulously deposits the ball in the far corner with his left foot.
In this blog post, we analyze Messi’s finishing skills by comparing an xG model trained on Messi’s shots with an xG model trained on the shots of all other players. Specifically, we address the following question: In which contexts is Messi more or less effective at converting shots than the average player?
A Bayesian Approach
Given Messi’s ability to convert shots in a crowded penalty area (illustrated by the example above), we believe that it is essential to model the location of the defense and the space available when analyzing Messi’s shooting skills. In our previous blog post, we showed how to build such a model using StatsBomb’s freeze frame data and a gradient boosted trees learner. Using this model, we can accurately estimate the xG value of Messi’s shots and quantify how this value is affected by characteristics such as the distance to the goal, the number of defenders in the area between the shot and the goal, and so on. But the actual question remains: How can we compare Messi’s scoring (or any player’s) to that of other players? Two factors complicate solving this problem:
- To make a meaningful comparison, you need data about many shots. Considering the nature of soccer, most players have taken a relatively small number of shots.
- Even if you have a sufficient number of shots for a player, it goes without saying that you have to take sufficient shots in each context. For example, one can have taken many shots but few headers. This would prevent the model from learning the correct expectancy for the headed shots of the given player.
Fortunately, we can use an approach that addresses both of these problems: Bayesian inference [1]. Bayesian inference provides two advantages. First, it expresses its uncertainty about how important a variable is in the form of a credible interval. Second, the model’s uncertainty is directly tied to how much evidence exists and how strong the evidence is. For example, if a player frequently scores headed goals, the model will become confident that this player is better at heading.
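The link between evidence and uncertainty can be illustrated with a much simpler model than the one we use in this post. The sketch below assumes a Beta prior on a single conversion rate (say, for headers) and updates it with a player's shots; the prior parameters and the shot counts are hypothetical, and the Beta-binomial model is only a stand-in for the logistic regression described next.

```python
import math


def beta_posterior(prior_a, prior_b, goals, shots):
    """Update a Beta(prior_a, prior_b) belief about a conversion rate with observed shots."""
    a = prior_a + goals
    b = prior_b + (shots - goals)
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


# Hypothetical prior: headers convert at about 10%, worth roughly 20 shots of evidence
prior_a, prior_b = 2.0, 18.0

few = beta_posterior(prior_a, prior_b, goals=2, shots=5)      # little evidence
many = beta_posterior(prior_a, prior_b, goals=40, shots=100)  # same 40% rate, more evidence
print(few, many)
```

Both hypothetical players convert 40% of their headers, but only the second one has enough shots to pull the estimate far from the prior and to shrink the posterior standard deviation.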
In this problem, we apply Bayesian inference to train a basic logistic regression model, for which we selected the 15 most important shot features as predictors. First, we train a model using 18,882 open play shots by LaLiga players excluding Messi. This model’s coefficients (i.e., the parameters) tell us how much the xG of a typical LaLiga player increases or decreases when the distance to goal increases, when the shot is a header, etc.
Second, we use the coefficients of this first model as the initial coefficients of a second model which we train on the 1,713 shots by Messi. This has the effect of updating the model’s coefficients based on the characteristics of Messi’s shots. Intuitively, three things can happen. If there is sufficient evidence that Messi differs from the typical player in some respect, then the coefficient for this variable will get a different value in the Messi model than in the typical player model. If there is some, but not overwhelming, evidence that Messi is different, then the Messi model will increase its uncertainty about a coefficient’s value (i.e., widen its credible interval). Finally, if Messi and a typical player perform similarly, then the coefficient will remain unchanged.
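The two-stage idea can be sketched in a few lines. This is not our actual pipeline: it assumes a single standardized feature plus an intercept, synthetic shots, Gaussian priors, and a Laplace approximation (a MAP estimate found by Newton's method) in place of full posterior sampling. The helper names and the prior width of 1.0 in the second stage are hypothetical choices for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_map(X, y, prior_mean, prior_sd, n_iter=25):
    """MAP estimate and approximate posterior sd for a 2-parameter logistic
    regression with independent Gaussian priors (Laplace approximation)."""
    w = list(prior_mean)
    for _ in range(n_iter):
        # Gradient of the log posterior and its negative Hessian (2x2)
        g = [-(w[j] - prior_mean[j]) / prior_sd[j] ** 2 for j in range(2)]
        h00, h01, h11 = 1 / prior_sd[0] ** 2, 0.0, 1 / prior_sd[1] ** 2
        for (x0, x1), t in zip(X, y):
            p = sigmoid(w[0] * x0 + w[1] * x1)
            g[0] += (t - p) * x0
            g[1] += (t - p) * x1
            v = p * (1 - p)
            h00 += v * x0 * x0
            h01 += v * x0 * x1
            h11 += v * x1 * x1
        det = h00 * h11 - h01 * h01
        # Newton step: w += H^{-1} g
        w[0] += (h11 * g[0] - h01 * g[1]) / det
        w[1] += (-h01 * g[0] + h00 * g[1]) / det
    # Posterior sd from the inverse Hessian of the last iteration
    sd = [math.sqrt(h11 / det), math.sqrt(h00 / det)]
    return w, sd


def simulate(n, intercept, slope, rng):
    """Synthetic shots: feature vector (1, standardized distance) and a goal flag."""
    X, y = [], []
    for _ in range(n):
        d = rng.gauss(0.0, 1.0)
        X.append((1.0, d))
        y.append(1 if rng.random() < sigmoid(intercept + slope * d) else 0)
    return X, y


rng = random.Random(0)
X_pop, y_pop = simulate(5000, -2.2, -0.8, rng)  # "typical" players
X_pl, y_pl = simulate(400, -1.4, -0.8, rng)     # a better finisher, same distance effect

# Stage 1: population model with a weak prior
w_pop, sd_pop = fit_map(X_pop, y_pop, prior_mean=[0.0, 0.0], prior_sd=[10.0, 10.0])
# Stage 2: player model centered on the population coefficients (prior width is an assumption)
w_pl, sd_pl = fit_map(X_pl, y_pl, prior_mean=w_pop, prior_sd=[1.0, 1.0])

print("population:", w_pop, sd_pop)
print("player:    ", w_pl, sd_pl)
```

In this toy setup the player's intercept moves away from the population value because the data support it, while the distance coefficient stays close to the prior. With fewer player shots, both coefficients would stay nearer the population values and keep wider uncertainty.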
Where Messi Excels
The figure below shows the coefficients [2] of the features belonging to the population and Messi models. The coefficients stand for the effect of a feature on goal expectancy. Positive values increase and negative values decrease the scoring probability of a shot.

Here are our key takeaways on Messi:
- Sharpshooter: Two features describe the effect of the shot’s distance on scoring: “Distance to goal” and “Distance to keeper”. Both features are obviously highly correlated, since distant shots are typically further away from the goalkeeper too. However, while the distance to goal affects the scoring expectancy of Messi and the population equally, Messi is significantly less affected by a large distance to the goalkeeper. This means that the model assumes Messi to be much more effective at converting typical long range shots where the goalkeeper is on his line.
- Don’t give him space: This is a phrase that coaches often use when their team faces Messi, and rightly so. The free projection feature shows that when Messi is given a patch of goal that is not blocked by a defender or the goalkeeper, he is better than any other player at depositing the ball in that free patch. To verify this, we first assigned each shot to a bin according to the estimated size of its free projection area and then measured the proportion of shots in each bin that actually ended up in that free projection area. The result is visualized in the figure below. Moreover, as the “# of defenders in the shot angle” feature shows, he is penalized less even when there are many players between him and the goal.

- He is just better: Lastly, the model intercept says it all: he is just better. There is a significant difference in goal expectancy when Messi is the one taking the shot.
If we map that back to the earlier example, we get an xG value of 0.1226 for Messi, while an average player gets an xG value of only 0.0336 in the exact same situation. That is mainly due to the different intercept, which accounts for 52% of the difference, but also due to the large distance to the goalkeeper (24%), the fact that Barcelona was behind at that point of the game (6%), the small free projection (5%) and the many defenders in the shot angle (4%). This is visualized in the figure below, which shows the weight of the five most distinguishing features between Messi and the average player for this particular shot.
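One way to arrive at such a breakdown is to split the gap between the two models on the log-odds scale, where a logistic regression is additive. The sketch below assumes that scale; the feature values and coefficients are hypothetical, not the fitted values behind the percentages above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def explain_difference(x, coef_player, coef_population):
    """Split the log-odds gap between two logistic models into per-feature shares."""
    contrib = {name: (coef_player[name] - coef_population[name]) * value
               for name, value in x.items()}
    total = sum(contrib.values())
    shares = {name: c / total for name, c in contrib.items()}
    return total, shares


# Hypothetical standardized feature values for one shot; "intercept" is always 1
shot = {"intercept": 1.0, "distance_to_keeper": 1.5, "defenders_in_angle": 2.0}
# Hypothetical coefficients for the player model and the population model
player = {"intercept": -1.8, "distance_to_keeper": -0.2, "defenders_in_angle": -0.3}
population = {"intercept": -2.4, "distance_to_keeper": -0.4, "defenders_in_angle": -0.35}

total, shares = explain_difference(shot, player, population)
xg_player = sigmoid(sum(player[k] * v for k, v in shot.items()))
xg_population = sigmoid(sum(population[k] * v for k, v in shot.items()))
print(xg_player, xg_population, shares)
```

The shares only read as percentages when the per-feature contributions point in the same direction; with mixed signs they can exceed 100% or turn negative.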

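The binning check described under “Don’t give him space” can be sketched as follows. The bin edges, the encoding of the free projection size as a fraction of the goal, and the example shots are all hypothetical.

```python
def hit_rate_by_bin(shots, edges):
    """shots: iterable of (free_projection_size, ended_in_free_area) pairs.
    edges: increasing bin edges; bin i covers [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    hits = [0] * (len(edges) - 1)
    for size, hit in shots:
        for i in range(len(edges) - 1):
            if edges[i] <= size < edges[i + 1]:
                counts[i] += 1
                hits[i] += int(hit)
                break
    # None marks bins without any shots
    return [h / c if c else None for h, c in zip(hits, counts)]


# Hypothetical shots: (free projection size, did the ball end up in the free area?)
shots = [(0.05, False), (0.08, True), (0.25, True),
         (0.30, False), (0.32, True), (0.70, True)]
rates = hit_rate_by_bin(shots, edges=[0.0, 0.2, 0.5, 1.0])
print(rates)
```

Running the same function once on Messi's shots and once on everyone else's gives two curves that can be compared bin by bin.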
Conclusion
After an unfortunate year that culminated in the humiliating 8-2 defeat to Bayern Munich in Lisbon, it seemed for a while that Messi was about to leave Camp Nou. Luckily for Barcelona, they can rely on Messi’s goal-scoring prowess in what might become a difficult season of transition. With his ability to convert the slightest chance and to score from any distance, Messi comes with a guarantee for goals.
This blog post is the second part of a summary of Anıl Cem Arslan’s master’s thesis. The first part about encoding the position of defenders in an xG model can be found here. Data provided by StatsBomb.

Footnotes
[1] We are not the first to apply Bayesian inference to xG. Marek Kwiatkowski used the same approach to quantify the finishing skill of players. Marek looks at finishing skill as an overall additive boost to the predictor, while we look at finishing skill as an additive boost to specific feature coefficients (i.e., a boost in specific shot contexts).
[2] Highest density intervals (94%) of the posterior distributions.