Data-Based Analysis in Football: The xG Model as an Example

“Football is all about goals. And luck.” This quote is often used when fans or coaches try to explain why their team is not very successful at the moment. And the assumption is not wrong at all! In 2018, researchers at TU Munich found that 47% of all goals scored in the 2016/17 season in both the German Bundesliga and the English Premier League had been affected by some form of luck [1]. But in a business worth a few billion dollars (see our article “Data in Professional Football”), luck, whether good or bad, is never a good advisor. To gain insight into a team’s or player’s performance without the luck factor, a number of indicators have been developed. These Key Performance Indicators (KPIs) rate the quality of a team or a player with respect to a chosen aspect. Different KPIs can, for example, rate the distance covered during a match or the pitch control (learn more in an upcoming article). Others capture stats like pass quality or the exploitation of opportunities.

The latter can be assessed with the expected goals method, xG for short. This statistical model predicts the probability of a shot resulting in a goal, based on historical data. The parameters used for this model include, among others, the distance of the shot from the goal, the player’s angle to the goal posts and the number of players standing in the ball’s flight path. The German Bundesliga uses an algorithm built by Amazon Web Services (AWS), which analysed 40,000 shots on goal to train its AI [2].

The xG model rates the quality of a shot on goal. For example, suppose team A has had 15 shots with an xG of 0.1 each (for instance, shots from a central position 20 m from goal) and team B has had 2 shots with an xG of 0.75 each (2 penalties). Team A might well win 5:2 if they have a world-class shooter from outside the box. Or they might lose 0:2 if every shot was taken by a player whose long-range shooting skills are modest, or if every shot was blocked. According to the xG model, the game would end in a draw, namely 1.5 (15 × 0.1) to 1.5 (2 × 0.75), even though team A is well ahead on chances at 15:2. This shows that the result of a football game depends on many parameters, such as the players’ quality or even luck.

An especially crazy situation can be found in the 2019/20 season of the German 3. Liga, where 1. FC Magdeburg would lead the table after 28 games and be close to promotion if the xG model were taken as the basis for all results. In reality, they sit in 15th place and have to fight against relegation, whereas SpVgg Unterhaching face the complete opposite: they are in 2nd place, while the xG model has them in 15th [3].
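To make the arithmetic of the team A versus team B example concrete, here is a minimal Python sketch. It is not the Bundesliga/AWS model: it simply takes the per-shot xG values from the example as given, sums them per team, and then derives how likely each number of goals is. It assumes every shot is an independent event, which is a simplification of my own. The function names (`expected_goals`, `goal_distribution`, `outcome_probabilities`) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def expected_goals(shot_xg: List[float]) -> float:
    """Total xG of a team: the sum of its per-shot scoring probabilities."""
    return sum(shot_xg)


def goal_distribution(shot_xg: List[float]) -> List[float]:
    """Probability of scoring exactly k goals (index k of the result).

    Assumption: shots are independent of each other.
    """
    dist = [1.0]  # before any shot: 0 goals with certainty
    for p in shot_xg:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1.0 - p)  # this shot misses
            new[goals + 1] += prob * p      # this shot scores
        dist = new
    return dist


def outcome_probabilities(
    xg_a: List[float], xg_b: List[float]
) -> Tuple[float, float, float]:
    """Return (P(A wins), P(draw), P(B wins)) under the same assumption."""
    dist_a = goal_distribution(xg_a)
    dist_b = goal_distribution(xg_b)
    win_a = draw = win_b = 0.0
    for goals_a, prob_a in enumerate(dist_a):
        for goals_b, prob_b in enumerate(dist_b):
            joint = prob_a * prob_b
            if goals_a > goals_b:
                win_a += joint
            elif goals_a == goals_b:
                draw += joint
            else:
                win_b += joint
    return win_a, draw, win_b


# Values taken from the example in the text.
team_a = [0.1] * 15   # 15 long-range shots, xG 0.1 each
team_b = [0.75] * 2   # 2 penalties, xG 0.75 each

print(f"xG team A: {expected_goals(team_a):.2f}")
print(f"xG team B: {expected_goals(team_b):.2f}")

win_a, draw, win_b = outcome_probabilities(team_a, team_b)
print(f"P(A wins)={win_a:.3f}  P(draw)={draw:.3f}  P(B wins)={win_b:.3f}")
```

Both xG totals come out at 1.5, yet the two goal distributions have different shapes: team B can score at most two goals, while team A can in principle score anywhere between 0 and 15. This is one way to see why an xG "draw" still leaves a lot of room for very different real results.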

[1] https://www.sueddeutsche.de/muenchen/fussball-tore-zufall-forschung-professor-fc-bayern-dusel-1.4008709

[2] https://www.bundesliga.com/de/bundesliga/news/neue-echtzeit-spielanalysen-der-dfl-und-amazon-web-services-11247

[3] https://blaugelbedatenwelt.com/2020/06/01/3-liga-wahre-tabelle-xpts-und-die-xg-xa-elf-des-tages-28-spieltag/ (graphic)