Introducing Atomic-SPADL: A New Way to Represent Event Stream Data

As machine learning and artificial intelligence researchers, we know that how data is represented is extremely important. Besides making analysis easier, the right representation can really facilitate learning. That is one of the reasons we developed the SPADL representation for football event stream data and released a public set of tools that converts event stream data from various providers into the SPADL format.

Those of you who have dug around our repositories will have noticed that we have been working on an alternative representation called Atomic-SPADL, and we have received a couple of questions about it. We are aware of the risks of having multiple, competing representations, which are best explained by the XKCD cartoon below. Still, we believe that discussing representations is extremely important and does not receive the attention it warrants in sports analytics. Therefore, we will briefly explain the Atomic-SPADL representation, why we developed it, and how it affects our VAEP model.

Handling the result of actions

When building models to value actions, a much-debated point is how to handle the results of actions. In other words, should our model distinguish between a failed and a successful pass or not? On the one hand, an action should be valued on all its properties, and whether or not the action was successful (e.g., did a pass reach a teammate, was a shot converted into a goal) plays a crucial role in how useful the action was. That is, if you want to measure a player’s contribution during a match, successful actions are important. This is the viewpoint of SPADL and VAEP.

On the other hand, including the result of an action intertwines the contribution of the player who started the action (e.g., provides the pass) and the player who completes it (e.g., receives the pass). Perhaps a pass was not successful because of its recipient’s poor touch or because he was not paying attention. It would seem unfair to penalize the player who provided the pass in such a circumstance. Hence, it can be useful to generalize over possible results of an action to arrive at an action’s “expected value”. This is exactly the purpose of the expected goals metric which values shots by generalizing over both of their possible results.

Atomic-SPADL handles the result as a separate action

To accommodate this alternative viewpoint, we introduce Atomic-SPADL, which removes the “result” attribute from SPADL and adds a few new action and event types. In this representation, all actions are “atomic” in the sense that they are always completed successfully without interruption. Consequently, while SPADL treats a pass as one action consisting of both the giving and the receiving of the pass, Atomic-SPADL sees giving and receiving a pass as two separate actions. Because not all passes successfully reach a teammate, Atomic-SPADL introduces an “interception” action if the ball was intercepted by the other team or an “out” event if the ball went out of play. We similarly divide shots, free kicks, and corners into two separate actions. In practice, this representation helps to distinguish the contribution of the player who initiates the action (e.g., gives the pass) from that of the player who completes the action (e.g., receives the pass).
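The splitting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the conversion code from our package: the `Action` and `AtomicAction` classes, their field names, the result labels ("success", "fail", "out") and the `to_atomic` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Action types that end with another player touching the ball (assumed labels).
PASS_LIKE = {"pass", "corner", "freekick"}


@dataclass
class Action:
    """Simplified SPADL-style action; field names and labels are hypothetical."""
    type: str                          # e.g. "pass", "corner", "shot", "dribble"
    result: str                        # "success", "fail" or "out" (assumed labels)
    player: str
    team: str
    next_player: Optional[str] = None  # who touches the ball next, if anyone
    next_team: Optional[str] = None


@dataclass
class AtomicAction:
    """Atomic action: same idea, but without a result attribute."""
    type: str
    player: str
    team: str


def to_atomic(action: Action) -> List[AtomicAction]:
    # The initiating action is always kept, stripped of its result.
    atomic = [AtomicAction(action.type, action.player, action.team)]

    if action.result == "out":
        # The "out" event is attributed to the player who initiated the action.
        atomic.append(AtomicAction("out", action.player, action.team))
    elif action.type == "shot":
        if action.result == "success":
            atomic.append(AtomicAction("goal", action.player, action.team))
    elif action.type in PASS_LIKE and action.next_player is not None:
        # A completed pass is followed by a receival, a failed one by an interception.
        follow_up = "receival" if action.result == "success" else "interception"
        atomic.append(AtomicAction(follow_up, action.next_player, action.next_team))
    return atomic


completed_pass = Action("pass", "success", "Matuidi", "France", "Giroud", "France")
print([a.type for a in to_atomic(completed_pass)])
```

Other shot outcomes (for example a save) are deliberately left out of this sketch, since the text above only discusses the “goal” and “out” cases.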

As a simple example, we first give five consecutive actions in SPADL and then the same actions in the Atomic-SPADL representation. The figure below illustrates the only goal in the France vs Belgium semi-final of the 2018 FIFA World Cup. The first three actions (pass, dribble, shot) detail France’s first attempt to score, which missed. The next two actions detail the subsequent corner, from which they headed the ball in.

Time	Type	x	y	dx	dy	Player	Team	Bodypart
50m0s	Pass	88.24	55.95	5.29	-12.05	Matuidi	France	Foot
50m0s	Receival	93.53	43.90	0.00	0.00	Giroud	France	Foot
50m1s	Dribble	93.53	43.90	0.00	-1.72	Giroud	France	Foot
50m2s	Shot	93.53	42.18	1.76	-1.72	Giroud	France	Foot
50m2s	Out	95.29	40.46	0.00	0.00	Giroud	France	Foot
50m33s	Corner	105.00	0.86	-3.53	29.27	Griezmann	France	Foot
50m34s	Receival	101.47	30.13	0.00	0.00	Umtiti	France	Foot
50m35s	Shot	103.24	28.41	1.76	2.93	Umtiti	France	Head
50m35s	Goal	105.00	31.33	0.00	0.00	Umtiti	France	Head

The table above shows the same phase in the Atomic-SPADL format. Notice how both the pass and the corner kick are now followed by a “receival” action by the player who successfully receives the ball. In addition, the results of the shots have been explicitly added to the event stream in the form of “out” and “goal” events. Finally, we removed the distinction between “corner_crossed” and “corner_short” (as encoded in SPADL), because at the exact moment the player kicks the ball, we have no idea whether the corner will be crossed or taken short. Hence, we merge these two actions into the simpler “corner” action.
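The dx and dy columns are not spelled out in the text, but the numbers in the table suggest they are the displacement of the ball during an action, so that each follow-up event starts at (x + dx, y + dy). The short check below tests that reading on the rows of the table. The interpretation itself, the `rows` layout and the tolerance are assumptions made for this sketch.

```python
# Rows copied from the Atomic-SPADL table: (type, x, y, dx, dy).
rows = [
    ("Pass", 88.24, 55.95, 5.29, -12.05),
    ("Receival", 93.53, 43.90, 0.00, 0.00),
    ("Dribble", 93.53, 43.90, 0.00, -1.72),
    ("Shot", 93.53, 42.18, 1.76, -1.72),
    ("Out", 95.29, 40.46, 0.00, 0.00),
    ("Corner", 105.00, 0.86, -3.53, 29.27),
    ("Receival", 101.47, 30.13, 0.00, 0.00),
    ("Shot", 103.24, 28.41, 1.76, 2.93),
    ("Goal", 105.00, 31.33, 0.00, 0.00),
]

TOLERANCE = 0.02  # the table values are rounded to two decimals

mismatches = []
for current, following in zip(rows, rows[1:]):
    kind, x, y, dx, dy = current
    if dx == 0.0 and dy == 0.0:
        continue  # events without displacement (receival, out, goal) are skipped
    end_x, end_y = x + dx, y + dy
    next_kind, next_x, next_y, _, _ = following
    # Assumed reading: the next event starts where this action ends.
    if abs(end_x - next_x) > TOLERANCE or abs(end_y - next_y) > TOLERANCE:
        mismatches.append((kind, next_kind))

print(mismatches)
```

Under this reading, every action that moves the ball is followed by an event located at its end point, which is what lets a “receival”, “out” or “goal” event stand in for the removed result attribute.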

Advantages

Empirically, we have noticed two benefits of using the Atomic-SPADL representation. First, the standard SPADL representation tends to assign shots a value that is the difference between the shot’s true outcome and its xG score. Hence, a single goal or a handful of misses can have an outsized effect on a player’s VAEP score, particularly for players who do not take a lot of shots. In contrast, Atomic-SPADL assigns shots a value closer to their xG score, which often better matches domain experts’ intuitions on action values.
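A toy calculation makes the difference concrete. The numbers below are invented for illustration and are not output of the VAEP model; the two helper functions are deliberately crude stand-ins for the tendencies described above (outcome minus xG versus roughly xG).

```python
# Hypothetical player with only three shots, each with an xG of 0.10.
shots = [
    {"xg": 0.10, "goal": True},
    {"xg": 0.10, "goal": False},
    {"xg": 0.10, "goal": False},
]


def outcome_minus_xg(shot):
    # Stand-in for the tendency under standard SPADL: true outcome minus xG.
    return (1.0 if shot["goal"] else 0.0) - shot["xg"]


def xg_only(shot):
    # Stand-in for the tendency under Atomic-SPADL: a value close to xG.
    return shot["xg"]


result_based_total = sum(outcome_minus_xg(s) for s in shots)
xg_based_total = sum(xg_only(s) for s in shots)

# Same shots, but the first one misses as well.
all_missed = [dict(s, goal=False) for s in shots]
result_based_if_missed = sum(outcome_minus_xg(s) for s in all_missed)

print(round(result_based_total, 2), round(xg_based_total, 2), round(result_based_if_missed, 2))
```

In this toy setting, one shot going in or not moves the result-based total by a full goal, while the xG-based total does not depend on the outcome at all.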

Second, Atomic-SPADL leads to more robust action values and player ratings. A good rating system should capture the true quality of all players. Although some fluctuation in performance across games is to be expected, a few outstanding performances over the course of a season (possibly owing to a good deal of luck) should not dramatically alter the assessment of a player. In our prior work comparing VAEP to xT, one advantage of xT was that it produced more stable ratings. Using Atomic-SPADL helps alleviate this weakness.

Following our earlier methodology to determine the robustness of a player rating system, we split one season of data into two random disjoint subsets. Subsequently, we computed each player’s average rating separately for both subsets and evaluated the Pearson correlation between the two. Our traditional approach (VAEP + standard SPADL) resulted in a Pearson correlation of only 0.25 (compared to xT’s 0.89), while using Atomic-SPADL resulted in a huge bump in robustness to a Pearson correlation of 0.65. This suggests that using the Atomic-SPADL format allows VAEP to get closer to xT’s robustness, while keeping its important benefits such as capturing the risk-reward trade-off and reasoning over a rich information set of the action context.
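The robustness check can be sketched as follows. This is a minimal sketch, not the code behind the numbers above: it assumes the split is made at the level of games and that ratings come as (player, game_id, rating) tuples, and the function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def split_half_correlation(records, seed=0):
    """records: iterable of (player, game_id, rating) tuples (assumed layout)."""
    records = list(records)
    game_ids = sorted({game_id for _, game_id, _ in records})
    rng = random.Random(seed)
    rng.shuffle(game_ids)
    # Two random, disjoint subsets of games (splitting by game is an assumption).
    first_half = set(game_ids[: len(game_ids) // 2])

    # Per player: [sum, count] of ratings in each of the two subsets.
    totals = defaultdict(lambda: [[0.0, 0], [0.0, 0]])
    for player, game_id, rating in records:
        subset = 0 if game_id in first_half else 1
        totals[player][subset][0] += rating
        totals[player][subset][1] += 1

    averages_a, averages_b = [], []
    for (sum_a, n_a), (sum_b, n_b) in totals.values():
        if n_a and n_b:  # only players who appear in both subsets
            averages_a.append(sum_a / n_a)
            averages_b.append(sum_b / n_b)
    return pearson(averages_a, averages_b)
```

A rating system that gave every player exactly the same rating in every game would score 1.0 on this check; the more a handful of games drives a player’s average, the lower the correlation between the two subsets.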

Code and example notebooks are available in a separate ‘atomic’ branch of our socceraction package.
