May 19, 2014

A Shooting Model – An Exp(G)lanation and Application

If you’re a follower of the growing analytics movement in football, you’ll probably have heard the term ‘expected goals model’. It may be shortened to ExpG, xG or something similar, but whatever it’s called, it’s basically a shots model.

What’s a shots model, then?

A shots model puts a theoretical goal value on a shot that a team or player takes during a game.

The value of a shot is either 0 (no goal) or 1 (a goal), isn’t it? So what’s the point?

Yep, the final outcome is always 0 or 1. However, to analyse teams (and players) properly, it’s not really good enough to say x scored 55 goals this season and y scored 60. That still raises the question of why. Did x have fewer shots? Did y have easier chances?

Ok, you can count up shots, but how do you quantify how easy a chance is?

We can begin to assign an ‘expected goal’ value to each shot by looking back into history. Not all shots are the same, so you have to group similar ones together. How often, over time, is the same type of chance converted? Working this out gives us an ‘expected goal’ value. Personally, I’m not sure I like the term ‘expected’ goal – it’s more of an average benchmark against which to compare each team’s or player’s shots. However, expected goals is what it’s mostly called, so I’ll stick with it.

Surely there are too many variables to just lump shots together like this?

Each expected goals model that you come across will have different inputs. All of them use the location of where the shot was taken from. However, each model will be different in how detailed its location data is. Some models will take into account how the ball was delivered to the person making the shot (through ball, cross etc), and some will take into account whether the shot was made by foot or by header. Some models take into account all shots (on target, off target, blocked) and some take into account ball placement (in the corner, straight down the middle). What’s missing from all models in the public domain is defensive positioning – where are defenders in relation to the shooter? However, some modellers talk of adding ‘game state’ into the mix as a proxy for this – conversion rates vary depending on the scoreline (the theory being teams change the balance of their attack/defence based on what the score is).

So what does your model use?

My current model uses only shots on target. The only other driver is where on the pitch the shot was taken from. I split shots into 46 ‘bins’. Direct free kicks and penalties have their own bins:

Is that it? How much data have you got?

Yep, that’s it – I want to keep the model as simple to understand as possible while still having something that works. I have over 13,000 shots on target recorded, covering the last four Premier League seasons.
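A minimal Python sketch of the two inputs described above: keep only shots on target, then map each one to a location bin, with penalties and direct free kicks getting bins of their own. The real 46-bin layout isn’t reproduced here, so the coordinate system, the `Shot` class, the 6-metre `CELL` grid and the bin names are all hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass
class Shot:
    x: float                  # metres from the goal line (hypothetical coordinates)
    y: float                  # metres from the pitch's centre line, left negative
    on_target: bool
    kind: str = "open_play"   # "open_play", "penalty" or "direct_free_kick"
    goal: bool = False

CELL = 6.0  # hypothetical cell size in metres; NOT the author's 46-bin layout

def bin_key(shot: Shot) -> str:
    """Map a shot to a location bin; set-piece shots get their own bins."""
    if shot.kind == "penalty":
        return "PEN"
    if shot.kind == "direct_free_kick":
        return "DFK"
    row = int(shot.x // CELL)
    col = int((shot.y + 34.0) // CELL)  # shift so the left touchline is column 0
    return f"R{row}C{col}"

def on_target_only(shots):
    """The model described above ignores off-target and blocked shots."""
    return [s for s in shots if s.on_target]

shots = [Shot(5.0, 1.0, True), Shot(11.0, 0.0, True, "penalty"), Shot(20.0, -3.0, False)]
print([bin_key(s) for s in on_target_only(shots)])  # ['R0C5', 'PEN']
```

Swapping in a different grid only means changing `bin_key`; everything downstream just counts shots and goals per label.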

How do you apply the data?

As I’ve said, those 13,000-odd shots have been put into the 46 location bins plus the direct free kick and penalty bins. That gives me an average goal value for each bin: the number of goals scored from it divided by the number of shots on target taken from it. For example, a shot on target from bin 14 is worth a theoretical 0.59 goals, while one from bin 5 is worth a theoretical 0.91 goals. I can then tot up a team’s or player’s shots over a season to find the average benchmark for those shots.
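The averaging step can be sketched in a few lines of Python. This is a hedged illustration, not the author’s code: it assumes each historical shot on target is stored as a `(bin_id, scored)` pair, computes goals divided by shots for every bin, and then sums those values over a team’s shots. The function names and the toy data are hypothetical.

```python
from collections import defaultdict

def bin_values(history):
    """Average goal value per bin: goals from the bin / shots on target from the bin."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for bin_id, scored in history:
        attempts[bin_id] += 1
        goals[bin_id] += int(scored)
    return {b: goals[b] / attempts[b] for b in attempts}

def expected_goals(shot_bins, values):
    """Tot up the benchmark value of every shot on target a team or player took."""
    return sum(values[b] for b in shot_bins)

def expected_goal_difference(for_bins, against_bins, values):
    """ExpGD: expected goals for minus expected goals against."""
    return expected_goals(for_bins, values) - expected_goals(against_bins, values)

# Toy data only: four shots on target from each of two bins.
history = [(14, True), (14, False), (14, True), (14, False),
           (5, True), (5, True), (5, True), (5, False)]
values = bin_values(history)                            # {14: 0.5, 5: 0.75}
print(expected_goals([14, 5, 5], values))               # 2.0
print(expected_goal_difference([5, 5], [14], values))   # 1.0
```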

Using a mathematical technique called linear regression, I can see how well the model fits each of the 80 team performances over the last four seasons. I plot the theoretical (or expected) goal difference for each team against the actual goal difference it recorded that season. In other words, what’s the correlation between shot-on-target location (for and against) and goal difference? A perfect correlation would return an r2 value of 1; no relationship at all would return an r2 value of 0. As it turns out, the r2 value returned is a healthy 0.878:

I’m still not sure where this is going?

Prozone’s Omar Chaudhuri has shown on his blog 5addedminutes why goal difference is a good indicator of the sustainability of a team’s long term results and why sometimes, the league table does lie.

If the relationship between Expected Goal Difference (ExpGD) and Actual Goal Difference is strong, and the relationship between Actual Goal Difference and Actual Points is strong, then it follows that ExpGD matters when it comes to placings and points in the league table.

Man United’s ExpGD was in the 20s in the first three years of this model. This year it was under 12. Man City’s ExpGD in the first year of this model was barely over 2; it ballooned into the 30s and 40s over the last three years, and they’ve won the league twice in that time.

Liverpool’s ExpGD started in the late teens in the first year and has risen every year since, reaching the 30s this season. Arsenal are the opposite of Liverpool, posting ever-decreasing ExpGDs. Arsenal keep having to battle right to the end to secure Champions League football, whereas previously it was almost a given. These baseline numbers matter – they are a great explanation of what actually happens over time.

If we looked at these teams’ possession figures, they wouldn’t tell us anywhere near the same story. I tested the correlation between every team’s possession % and its goal difference over the same period in the Premier League. The r2 value was a much lower 0.546.
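The r2 figures quoted in this post (0.878 for ExpGD, 0.546 for possession) come from a straight-line least-squares fit, where r2 is the square of the correlation between the two variables. A minimal Python sketch of that calculation follows; it is a generic implementation rather than the tool the author used, and the example numbers are made up.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = a + b*x.

    Returns (a, b, r_squared); for a one-variable fit, r_squared is the
    square of the Pearson correlation between xs and ys.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r-squared is undefined when either variable is constant")
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b, sxy * sxy / (sxx * syy)

# Made-up ExpGD (x) and actual GD (y) values for four teams.
a, b, r2 = linear_fit([1, 2, 3, 4], [1, 3, 2, 4])
print(round(a, 2), round(b, 2), round(r2, 2))  # 0.5 0.8 0.64
```

On Python 3.10 or later, `statistics.linear_regression` returns the slope and intercept directly, and squaring `statistics.correlation` gives the same r2.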

Is the model any good at predicting what’s going to happen the following season?

If I plot a team’s ExpGD one year against the one it’s posted the following year, this is what it looks like:

Again the relationship looks decent, with an r2 of 0.7082. However, there are some whopping outliers, Man City’s hugely positive change from 2010/11 to 2011/12 being the main one.

How about predicting what’s actually going to happen the following year?

Ok, I can also plot ExpGD one year to ActualGD the next and it looks like this:

Again, that Man City improvement in 2011/12 alone makes the figures look worse than the overall trend might actually be. I expect to be able to show better predictive value for ExpGD as more seasons of data are added to the model.

As the model currently stands, this is how it would predict next season’s league table, using either last season’s numbers alone (left) or the 2011-14 numbers (right):

For the newly promoted teams, I just used the average ExpGD of previously promoted teams. I also used the formula in the Omar Chaudhuri piece I mentioned up top to convert GD to points.
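The forecasting recipe in the last two paragraphs can be sketched as follows. This is an illustration under stated assumptions, not the author’s spreadsheet: it assumes the multi-season forecast is a simple average of each team’s most recent ExpGDs (one season for the left-hand table, three for the right), gives promoted teams the average ExpGD of previously promoted teams, and converts GD to points with a linear rule. The linear form and its `intercept` and `slope` are placeholders standing in for the Chaudhuri formula, whose coefficients aren’t given here; the team names and numbers are hypothetical.

```python
from statistics import mean

def forecast_exp_gd(history, promoted, past_promoted_exp_gds, seasons=1):
    """Forecast next season's ExpGD for every team.

    history: {team: [ExpGD per season, oldest first]} for teams staying up.
    seasons: how many recent seasons to average (1 = last season only).
    Promoted teams get the mean ExpGD of previously promoted teams.
    """
    forecast = {team: mean(values[-seasons:]) for team, values in history.items()}
    promoted_value = mean(past_promoted_exp_gds)
    for team in promoted:
        forecast[team] = promoted_value
    return forecast

def predicted_table(forecast, intercept, slope):
    """Convert GD to points with points = intercept + slope * GD, best first.

    intercept and slope are placeholders (assumption): substitute the
    coefficients of whichever GD-to-points formula you use.
    """
    points = {team: intercept + slope * gd for team, gd in forecast.items()}
    return sorted(points.items(), key=lambda item: item[1], reverse=True)

# Hypothetical inputs for three teams.
history = {"Team A": [10.0, 20.0, 30.0], "Team B": [5.0, 5.0, 5.0]}
last_season = forecast_exp_gd(history, ["Team C"], [-20.0, -10.0], seasons=1)
three_seasons = forecast_exp_gd(history, ["Team C"], [-20.0, -10.0], seasons=3)
print(predicted_table(last_season, intercept=50.0, slope=0.5))
```

Keeping the GD-to-points step separate means the same ExpGD forecast can be re-ranked under any conversion formula.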

Got any comments, questions or criticisms? I’m sure you have. My aim is to try and make the analytics movement as open as possible. I’m still learning myself every day and continually try to make things as understandable as possible. Inevitably things still turn out too mathy, but it comes with the territory.

Get in touch on Twitter @footballfactman or comment below.

Categories: Sports · Tagged: expected goals, expected goals model, Man City, repeatability

48 responses to “A Shooting Model – An Exp(G)lanation and Application”

Vladimir

May 19, 2014

Excellent piece. I think the huge problem in predicting what is going to happen next season is coach and player changes. For instance, Man United will probably play different football with a new coach and new players next season.
Anton

May 19, 2014

Nice and explanatory! Wrong name on the y-axis in the ExpGD y1 vs ExpGD y2 chart.
differentgame

May 19, 2014

Good spot! Will sort it later
differentgame

May 19, 2014

Yep, lots may change, but I don’t think positionally those predictions will be far off in general…
Aaron

May 19, 2014

Paul, another great piece!
I, too, have recently embarked on building an ExpG model, specifically with a view to investigating the appropriateness of using ExpG as a measure of phases of pressure during a game (perhaps something similar has already been done – I’m slowly working my way through the great work in the public domain).
Can you recommend useful data sources, or are endless hours of data entry something we enthusiasts just have to grin and bear?
Many thanks
schmeet1

May 19, 2014

Hi.

Great post and well explained.
Out of interest, where is the location data coming from?

Also, could you briefly explain why Shots Off Target are not counted.

Cheers.
ajhsportanalyst

May 20, 2014

Hi

Very interesting article, particularly how the data started to spread as you went from ExpGD v ActGD into trying to predict for future seasons.

Out of interest, how did you work out the theoretical goal value for each of the different bins?

Thanks
differentgame

May 20, 2014

Simply the number of goals from that bin divided by the number of shots in that bin.
differentgame

May 20, 2014

Yep, it’s a grin-and-bear-it job. Squawka, StatsZone, WhoScored etc.
differentgame

May 20, 2014

See reply to other comment!
Predicting, Prospecting and Expecting Goals | differentgame

May 28, 2014

[…] whistles due to the quality of data they collect. Details of my main Expected Goals model are here. From what I can glean from the bits ‘n’ pieces Prozone leak from time to time, the […]
Safe hands? Is your keeper performing as well as expected? | differentgame

June 11, 2014

[…] I’ve previously discussed, my main Expected Goals (ExpG) Model is based entirely on where shots on target were taken from. […]
Barry, McCarthy and Gibson – The Long and Short of it | differentgame

July 15, 2014

[…] analysis is the kind of stuff you’ll find here on this site. As the season progresses we’ll be using many more tools to try and provide […]
Tim Howard – | differentgame

July 21, 2014

[…] 6 yards directly in front of goal is harder to save than a speculative pop from 30 yards out. The model uses shot location to establish the difficulty of a save. Howard’s performance against Belgium was ranked only 10th during the World Cup by this […]
Another Look at Lukaku | differentgame

July 30, 2014

[…] is that Lukaku, despite having an excellent per 90 minute strike rate, is still not meeting his expected goal […]
Leicester 2 Everton 2: The Warm Down | differentgame

August 17, 2014

[…] ExpG model says Leicester created the best chances in total. There was only an 11% chance Everton would score […]
soccerlogic

August 25, 2014

Like it! Very interesting. Any chance of sharing the data (your effort would be gratefully acknowledged)? I would like to fit a non-linear model to get what I believe would be additional useful insights on the subject. Thanks!
Old Man Eto’o – Do the numbers still add up? | differentgame

August 27, 2014

[…] scored 9 goals in the minutes equivalent of about 14.5 games. This more or less matched the 8.37 expected goals tally for the shots he took. There is no sustainability issue around finishing skills here. I ran […]
Are Aston Villa really the worst team in the league? | differentgame

October 12, 2014

[…] ago? Maybe, but the underlying numbers tell a different story. Villa currently sit bottom of my Expected Goal Difference […]
Tim Howard – Leaving the Comfort Zone | differentgame

February 2, 2015

[…] % of any first choice keeper in the Premier League this season. Adjusting for overall shot quality (using this method) he fares a touch better and ranks 18th out of […]
Using R for Football Data Analysis – Monte Carlo | Stat Attack

February 24, 2015

[…] 4. Some knowledge of what expected goals is. Great explanation by @footballfactman here. […]
Left-Wing Soccer – 100 blogs to follow in 2015

March 1, 2015

[…] Must read: A shooting model – Exp(G)lanation and Application (click here to read). […]
The Premier League. Should he stay or should he go? | differentgame

April 15, 2015

[…] contract until 2018. Early in the season, Villa had been flying high and second in the league. The underlying numbers told a different story: Villa were actually the worst team in the league. Five months later, […]
Ed

June 9, 2016

Why would you not include the ‘shots’ that are off target? If I shoot and miss the target from one specific spot in the box, and then have another 10 shots from that same spot for the rest of the season, scoring 7 with the other 3 saved, that means I’ve had 11 shots and only scored 7. On target, off target, it doesn’t matter: you have to consider the off-target shot, otherwise you’re overstating your conversion ratio. If it is definitely a shot, then it MUST be considered. Data collation leaves room for interpretation, and that’s where the variability will lie.
Sabermetrics in football – Scientificast

July 11, 2016

[…] “Bin” model by Michael Clay, half of the pitch is divided into 46 areas, and scoring from box 14 has a theoretical probability of 59% […]
On the data collection process – Minor League Soccer

October 25, 2016

[…] of analytics stuff online, specifically Paul Riley‘s blog differentgame – especially this post. Reading it made something click heavily which got me to spring into action – so a massive […]
On the data collection process – Minor League Soccer

November 17, 2016

[…] analytics stuff online, specifically Paul Riley‘s blog differentgame – especially this post. Reading it made something click heavily which got me to spring into action – so a massive […]
To Hull and Back | differentgame

December 5, 2016

[…] ranked at that time according to the volume and quality of chances created versus those conceded (xGD), then Hull were 17th […]
Fraser Forster’s Failings | differentgame

December 31, 2016

[…] looks poor, non? If I throw all those SoTs through my xG simulator, it tells me that there’s only around a 5% chance that it looks this bad. The model […]
Expected Goals using machine learning – Cricket Savant

January 22, 2017

[…] angles so should therefore have a similar xG. This is the concept of binning as described in this model. The number of goals divided by the number of shots from inside a particular bin gives us the xG […]
estilosdejuego

March 25, 2017

Hi there! Thanks a lot for sharing the info! I am working on an observational sheet and, in order to implement some functions for the data analysis, I would like to practise a bit with some data that has already been analysed (maybe I didn’t express myself well enough). Could you re-upload the sample?

Thanks in advance, Daniel.
Is Manolo Gabbiadini The Striker Southampton Have Been Looking For? – POTP

March 27, 2017

[…] expected goals value in this piece are done using a similar method to @footballfactman detailed here. I split the pitch into pretty identical zones but used all shots rather than just shots on target. […]
The Data Scout Miniblog – Leon Goretzka | differentgame

April 19, 2017

[…] sides, Schalke’s attack is fairly inefficient and they aren’t as high in the table as xG suggests they should […]
Introducing: Window Shopping – POTP

May 31, 2017

[…] Goals (xG) – For this I took the same method as @FootballFactMan as he explains here. The pitch is split into the same zones, the difference is I’ve used all shots rather than […]
The Big Data tsunami hits the world of soccer as we know it

October 7, 2017

[…] goals are the likelihood for a particular attempt to score, according to a particular model. It is only one of the many metrics considered in match analysis, which is progressively turning […]
Football and Analytics: the Big Data tsunami is coming

October 7, 2017

[…] expected, that is, the probability that a particular attempt will be converted into a goal according to an ad hoc calculation model. It is only one of the many indices considered today in the analysis […]
Expected Goals. An introduction for Dummies – Calcio Studiato

March 26, 2018

[…] however, if you want to get an idea of what’s out there, you can take a look here and here. They are in English, and their authors, respectively Michael Caley and Paul Riley, are considered […]
Sabermetric statistics: an overview and a possible application in Futsal – Mister C

September 19, 2018

[…] of 4 Premier League seasons and developed a model for the qualitative evaluation of shots (you can find his study here […]
Football in evolution: the analytics tsunami is coming

June 13, 2019

[…] expected, that is, the probability that a particular attempt will be converted into a goal according to an ad hoc calculation model. It is only one of the many indices considered today in the analysis […]
Analysing different expected goals models – 必高交易学院

June 26, 2019

[…] Paul Riley’s model is a good example: in building an xG model, it uses a slightly more advanced method of analysing shot location data. […]
Artificial Intelligence and its role in Peru’s success in football – Voxiva Perú

July 5, 2019

[…] https://differentgame.wordpress.com/2014/05/19/a-shooting-model-an-expglanation-and-application/ […]
“Shots on goal are an even more important indicator than the goals themselves.” How the xG model changed football | INFOS.BY

April 17, 2020

[…] Full description … […]
“Made a fortune by regularly beating the bookmakers.” An unusual application of the xG model | INFOS.BY

April 22, 2020

[…] Full description … […]
DATA ANALYSIS IN FOOTBALL: THE ENGINEERING OF THE SHOT ON GOAL – Ingegneria del Calcio

May 12, 2020

[…] To introduce the concept, I chose the study by Paul Riley (DIRECT LINK), who was among the first to provide a complete definition of “expected goals” (DIRECT LINK). […]
Analysis of expected goals models | What are expected goals? | فوتبالی

September 14, 2020

[…] Paul Riley’s model is a good example of models that, when building an xG model, use a slightly more advanced method of analysing shot location. […]
The Big Data tsunami hits the world of soccer as we know it

January 7, 2021

[…] goals are the likelihood for a particular attempt to score, according to a particular model. It is only one of the many metrics considered in match analysis, which is progressively turning […]
Analysis of expected goals models | What are expected goals? | مجله فوتبالی

March 21, 2021

[…] Paul Riley’s model is a good example of models that, when building an xG model, use a slightly more advanced method of analysing shot location. […]
StatsBomb

March 1, 2022

[…] meaningful in the long term. The graphic below shows the increase in total shots conceded and Expected Goals Against last season. This is almost always bad news: Thanks to the work of Colin […]