Introducing Atomic-SPADL: A New Way to Represent Event Stream Data

Jesse, Tom, Pieter
May 5th, 2020 · 4 min read

Being machine learning and artificial intelligence researchers, we know that how we represent the data is extremely important. On top of making analysis easier, having the right representation can really facilitate learning. That is one of the reasons we developed the SPADL representation for football event stream data and released a public set of tools that converts event stream data from various providers into the SPADL format.

Those of you who have dug around our repositories will have noticed that we have been working on another, alternative representation called Atomic-SPADL. We have gotten a couple of questions about this representation. So while we are aware of the risks of having multiple, competing representations, which is best explained by the XKCD cartoon below, we believe that discussing representations is extremely important and does not receive the attention it warrants in sports analytics. Therefore, we will briefly explain the Atomic-SPADL representation, why we developed it, and how it affects our VAEP model.

Handling the result of actions

When building models to value actions, a major point of debate is how to handle the results of actions. In other words, should our model distinguish between a failed and a successful pass or not? On the one hand, an action should be valued on all its properties, and whether or not the action was successful (e.g., did a pass reach a teammate, was a shot converted into a goal) plays a crucial role in how useful the action was. That is, if you want to measure a player’s contribution during a match, successful actions are important. This is the viewpoint of SPADL and VAEP.

On the other hand, including the result of an action intertwines the contribution of the player who started the action (e.g., provides the pass) and the player who completes it (e.g., receives the pass). Perhaps a pass was not successful because of its recipient’s poor touch or because he was not paying attention. It would seem unfair to penalize the player who provided the pass in such a circumstance. Hence, it can be useful to generalize over possible results of an action to arrive at an action’s “expected value”. This is exactly the purpose of the expected goals metric which values shots by generalizing over both of their possible results.

Atomic-SPADL handles the result as a separate action

To accommodate this alternative viewpoint, we introduce Atomic-SPADL, which removes the “result” attribute from SPADL and adds a few new action and event types. In this representation, all actions are “atomic” in the sense that they are always completed successfully without interruption. Consequently, while SPADL treats a pass as one action consisting of both the initiation and the reception of the pass, Atomic-SPADL treats giving and receiving a pass as two separate actions. Because not all passes successfully reach a teammate, Atomic-SPADL introduces an “interception” action if the ball was intercepted by the other team or an “out” event if the ball went out of play. We similarly divide shots, free kicks, and corners into two separate actions. In practice, this representation helps to separate the contribution of the player who initiates the action (e.g., gives the pass) from that of the player who completes it (e.g., receives the pass).

As a simple example, we first give five consecutive actions in SPADL and then the same actions in the Atomic-SPADL representation. The figure below illustrates the only goal of the France vs Belgium semi-final at the 2018 FIFA World Cup. The first three actions (pass, dribble, shot) detail France’s first attempt to score, which they missed. The next two actions detail the subsequent corner from which they headed the ball in.

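| # | Time | Type | x | y | dx | dy | Player | Team | Bodypart |
|---|--------|----------|--------|-------|-------|--------|-----------|--------|------|
| 1 | 50m0s | Pass | 88.24 | 55.95 | 5.29 | -12.05 | Matuidi | France | Foot |
|   | 50m0s | Receival | 93.53 | 43.90 | 0.00 | 0.00 | Giroud | France | Foot |
| 2 | 50m1s | Dribble | 93.53 | 43.90 | 0.00 | -1.72 | Giroud | France | Foot |
| 3 | 50m2s | Shot | 93.53 | 42.18 | 1.76 | -1.72 | Giroud | France | Foot |
|   | 50m2s | Out | 95.29 | 40.46 | 0.00 | 0.00 | Giroud | France | Foot |
| 4 | 50m33s | Corner | 105.00 | 0.86 | -3.53 | 29.27 | Griezmann | France | Foot |
|   | 50m34s | Receival | 101.47 | 30.13 | 0.00 | 0.00 | Umtiti | France | Foot |
| 5 | 50m35s | Shot | 103.24 | 28.41 | 1.76 | 2.93 | Umtiti | France | Head |
|   | 50m35s | Goal | 105.00 | 31.33 | 0.00 | 0.00 | Umtiti | France | Head |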

The table shows this same phase, but in the Atomic-SPADL format. Notice how both the pass and the corner kick now have a subsequent “receival” action by the player successfully receiving the ball. In addition, the results of the shots have been explicitly added to the event stream in the form of “out” and “goal” events. Finally, we removed the distinction between “corner_crossed” and “corner_short” (as encoded in SPADL), because at the exact moment the player kicks the ball, we cannot know whether the corner will be crossed or taken short. Hence, we merge these two actions into the simpler “corner” action.
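
To make the splitting rule concrete, here is a minimal Python sketch of how one SPADL-style action could be turned into atomic actions. It is not the converter from our socceraction package: the function name `to_atomic`, the dictionary keys (including `next_player`), and the simplified result labels (“success”, “intercepted”, “out”) are all assumptions made for illustration. The coordinates are taken from the table above.

```python
# Hypothetical, simplified result labels; real SPADL uses its own vocabulary.
FOLLOW_UP = {
    ("pass", "success"): "receival",
    ("pass", "intercepted"): "interception",
    ("pass", "out"): "out",
    ("shot", "success"): "goal",
    ("shot", "out"): "out",
}

# Pass-like set pieces are split in the same way as passes.
PASS_LIKE = {"pass", "corner", "freekick"}

# Corner variants collapse into one type: the intent is unknown at the kick.
MERGED_TYPES = {"corner_crossed": "corner", "corner_short": "corner"}


def to_atomic(action):
    """Split one SPADL-style action (a dict) into a list of atomic actions."""
    action_type = MERGED_TYPES.get(action["type"], action["type"])
    initiation = {
        "type": action_type,
        "player": action["player"],
        "x": action["start_x"],
        "y": action["start_y"],
        "dx": round(action["end_x"] - action["start_x"], 2),
        "dy": round(action["end_y"] - action["start_y"], 2),
    }
    family = "pass" if action_type in PASS_LIKE else action_type
    follow_up_type = FOLLOW_UP.get((family, action["result"]))
    if follow_up_type is None:
        # Actions such as dribbles get no separate completion event here.
        return [initiation]
    follow_up = {
        "type": follow_up_type,
        # A receival or interception belongs to another player when one is given;
        # otherwise the follow-up is attributed to the initiating player.
        "player": action.get("next_player", action["player"]),
        "x": action["end_x"],
        "y": action["end_y"],
        "dx": 0.0,
        "dy": 0.0,
    }
    return [initiation, follow_up]


# Matuidi's pass to Giroud and Giroud's missed shot, using the table's coordinates.
matuidi_pass = {
    "type": "pass", "player": "Matuidi", "next_player": "Giroud",
    "start_x": 88.24, "start_y": 55.95, "end_x": 93.53, "end_y": 43.90,
    "result": "success",
}
giroud_shot = {
    "type": "shot", "player": "Giroud",
    "start_x": 93.53, "start_y": 42.18, "end_x": 95.29, "end_y": 40.46,
    "result": "out",
}

for spadl_action in (matuidi_pass, giroud_shot):
    for atomic_action in to_atomic(spadl_action):
        print(atomic_action)
```

In this sketch the pass becomes a “pass” followed by a “receival” at the pass’s end location, and the missed shot becomes a “shot” followed by an “out” event, mirroring the corresponding rows of the table.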

Advantages

Empirically, we have noticed two benefits of using the Atomic-SPADL representation. First, the standard SPADL representation tends to assign shots a value that is the difference between the shot’s true outcome and its xG score. Hence, goals or a run of misses can have an outsized effect on a player’s VAEP score, particularly for players who do not take a lot of shots. In contrast, Atomic-SPADL assigns shots a value closer to their xG score, which often better matches domain experts’ intuitions on action values.
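
A small worked example may help show why “outcome minus xG” is so volatile for low-volume shooters. The xG values below are made up for illustration, and the function only reproduces the tendency described above; it is not the VAEP model.

```python
# Hypothetical xG values for a player who took only four shots.
shot_xg = [0.05, 0.10, 0.08, 0.12]


def outcome_minus_xg(xg_values, goals):
    """Sum of (outcome - xG) over shots, with outcome 1 for a goal and 0 otherwise."""
    return sum(goal - xg for xg, goal in zip(xg_values, goals))


no_goals = outcome_minus_xg(shot_xg, [0, 0, 0, 0])
one_goal = outcome_minus_xg(shot_xg, [0, 1, 0, 0])
total_xg = sum(shot_xg)

print(f"total xG:               {total_xg:.2f}")
print(f"outcome - xG, no goals: {no_goals:.2f}")
print(f"outcome - xG, one goal: {one_goal:.2f}")
```

Converting a single shot moves the player’s total by a full goal, which is large compared with the total xG of these four shots. A value that stays close to xG does not depend on which of the shots happened to go in.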

Second, Atomic-SPADL leads to more robust action values and player ratings. A good rating system should capture the true quality of all players. Although some fluctuation in performance is possible across games, over the course of a season a few outstanding performances (possibly stemming from a good deal of luck) should not dramatically alter the assessment of a player. In our prior work comparing VAEP to xT, one advantage of xT was that it produced more stable ratings. Using Atomic-SPADL helps alleviate this weakness.

Following our earlier methodology to determine the robustness of a player rating system, we split one season of data into two random, disjoint subsets. Subsequently, we computed each player’s average rating separately for both subsets and evaluated the Pearson correlation between the two. Our traditional approach (VAEP + standard SPADL) resulted in a Pearson correlation of only 0.25 (compared to xT’s 0.89), while using Atomic-SPADL resulted in a huge bump in robustness to a Pearson correlation of 0.65. This suggests that using the Atomic-SPADL format allows VAEP to get closer to xT’s robustness, while keeping its important benefits such as capturing the risk-reward trade-off and reasoning over a rich information set of the action context.
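
The robustness check itself is easy to sketch. The Python below is an illustration under stated assumptions rather than our evaluation code: it assumes the season is split by game, that ratings come as (player, game, rating) tuples, and it runs on synthetic data, so the correlation it prints says nothing about VAEP or xT.

```python
import random
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_half_correlation(ratings, seed=0):
    """ratings: list of (player, game, rating) tuples for one season.

    Assumption: the two random, disjoint subsets are formed by splitting games.
    """
    rng = random.Random(seed)
    games = sorted({game for _, game, _ in ratings})
    rng.shuffle(games)
    first_half = set(games[: len(games) // 2])

    # player -> [ratings in the first subset, ratings in the second subset]
    per_player = defaultdict(lambda: [[], []])
    for player, game, rating in ratings:
        per_player[player][0 if game in first_half else 1].append(rating)

    # Average rating per subset, keeping only players present in both subsets.
    pairs = [
        (sum(a) / len(a), sum(b) / len(b))
        for a, b in per_player.values()
        if a and b
    ]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Synthetic season: each player has a fixed skill plus per-game noise.
data_rng = random.Random(42)
synthetic = []
for p in range(50):
    skill = data_rng.gauss(0, 1)
    for game in range(30):
        synthetic.append((f"player_{p}", game, skill + data_rng.gauss(0, 2)))

print(f"split-half Pearson correlation: {split_half_correlation(synthetic):.2f}")
```

The noisier the per-game ratings are relative to the differences between players, the lower this correlation becomes, which is why it is a useful proxy for robustness.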

Code and example notebooks are available in a separate ‘atomic’ branch of our socceraction package.
