Expected Goals models aren’t new. They’ve been around for a long time now. In October 2012, my blog here at differentgame was in its infancy and I was writing for other sites trying to get exposure. Being an Everton fan, I was looking at them in particular and why, despite being the shot kings of Europe, they weren’t converting their chances efficiently. I went over to fledgling stats site Squawka and had a poke around. They had this great graphic of where Everton were scoring their goals from:
I couldn’t believe how all the little balls were clustered together. I started to click down the list checking each team. The patterns were pretty standard. It’s all common sense of course – the closer and more central you get to goal the easier it is to score. I’d just never thought about it properly before and here it was visualised for the first time (well, that I’d seen anyway).
I’d been reading the work of Mark Taylor, James Grayson and Omar Chaudhuri for a while, so it didn’t take long to realise the potential of recording this info for a shots model. It took me ages, despite the fact I was only putting shots into a few baskets – some pretty big swathes of pitch – while separating out pens and free kicks. I called it the Shot Position Average Model (SPAM) and put it up on the site. The graphic of it ended up looking like this:
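For anyone curious what a zone-average model like SPAM boils down to, here’s a minimal sketch. The zone names, the shots and the resulting rates are all made up for illustration – the real model’s baskets and data are the author’s own:

```python
from collections import defaultdict

def fit_zone_rates(shots):
    """shots: iterable of (zone, scored) pairs, e.g. ("central_box", True)."""
    counts = defaultdict(lambda: [0, 0])  # zone -> [goals, attempts]
    for zone, scored in shots:
        counts[zone][0] += int(scored)
        counts[zone][1] += 1
    return {zone: goals / attempts for zone, (goals, attempts) in counts.items()}

def expected_goals(shots, rates):
    """Sum each shot's zone-average conversion rate."""
    return sum(rates[zone] for zone, _ in shots)

# Made-up shot history: zones and outcomes are illustrative only.
history = [("central_box", True), ("central_box", False), ("central_box", True),
           ("wide_box", False), ("outside_box", False), ("outside_box", True)]
rates = fit_zone_rates(history)
match_shots = [("central_box", False), ("outside_box", False)]
print(round(expected_goals(match_shots, rates), 3))
```

That’s the whole trick: every shot is worth the historical conversion rate of its basket, and a team’s expected goals is just the sum over its shots.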
As far as I’m aware, it’s the first time anyone had put their specific ‘expected goals model’ inputs onto a public forum. Of course, it transpired that the likes of Opta and Prozone had been making shot models for a long time before me, with far fancier methods and far more data inputs than I’d used. It turns out those involved in the betting industry had their own ways of recording shot strength too.
I built on the original SPAM to get to the Chance Creation Model, which took into account not only shot position but also which area of the pitch the ball arrived to the shooter from. It quickly became clear that balls teed up for the shooter from inside the box, or from through balls, were converted at much better rates than crosses from wide areas.
Colin Trainor and Constantinos Chappas had quickly joined the march to their own expected goals models, too – taking into account many more factors than I had originally. I decided to bite the bullet and leave them to it for a while. They were doing great work and collecting data from other leagues too – not just the Premier League stuff that I was concentrating on. I started to look at possible goalkeeper models as no-one apart from Colin had done much with them.
My goalkeeper stuff is still based around a shot model, but at a much more detailed level of shot position. It’s not x,y-coordinate specific, but it’s close enough for a part-time fanalyst to gain some further insight.
Previously, I’ve always argued for using ALL shots in an expected goals model. However, I can’t record a keeper’s save if the shot’s been ballooned over the bar, so I simply had to use shots on target alone. I’ve been using it to measure a keeper’s shot-stopping ability, and to try and put some stabilisers on the usual save% volatility that we see year on year. I think it does a pretty good job over time, but it’s clear there are some circumstances it doesn’t always account for. Still, I use what I have access to, and there are other areas, such as positioning, that I can continue to look at as I carry on the work.
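A rough sketch of that idea – not the author’s exact model – looks like this: rate each shot on target the keeper faced by a league-average scoring rate for its position, then compare expected goals conceded with actual goals conceded. The zones and rates below are invented for the example:

```python
def goals_saved_above_average(shots_on_target, zone_rates):
    """shots_on_target: list of (zone, scored) shots faced by the keeper."""
    expected = sum(zone_rates[zone] for zone, _ in shots_on_target)  # xG conceded
    conceded = sum(1 for _, scored in shots_on_target if scored)     # actual goals
    return expected - conceded

# Illustrative, made-up league-average conversion rates per zone.
league_rates = {"six_yard": 0.55, "central_box": 0.35, "outside_box": 0.12}
faced = [("six_yard", True), ("central_box", False),
         ("central_box", False), ("outside_box", False)]
print(round(goals_saved_above_average(faced, league_rates), 2))
```

A positive number means the keeper conceded fewer goals than an average keeper would have from the same shots on target; a negative number means the opposite. Over a handful of games this bounces around, which is exactly the save% volatility the model is trying to smooth out.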
What has become apparent is that this version of an expected goals model – i.e. using just shots on target and more specific locations – gets closer to what’s actually happening on the pitch than anything I’ve ‘developed’ before. It really does seem that this whole shot-modelling shebang is overwhelmingly Kirsty and Phil – Location, Location, Location.
I’ve called this new thing Shot on Target Position Average Model (SOTPAM). I’ve plotted the following graph using data from the three full seasons I have for the Premier League. It shows the actual goal difference for each of the 60 teams on one axis and what SOTPAM says ‘should’ have been the goal difference on ‘t’other:
But what about SOTPAM’s predictive qualities? I only have two full seasons to go on but here’s how it looks for the 34 of 40 teams not relegated in that time, plotting SOTPAM goal difference for one season against actual goal difference the next:
It’s less good so far, non? It’s actually had one poor year (mainly due to Man City’s dramatic transformation in their league title winning year) and one amazing year. So far this season it’s doing pretty well.
I’ve messed about with the sample – both including data in the model from the year I’m ‘predicting’ and taking it out of the model too. It really makes no difference (0.5 goals at most over a season) – the averages from the shot zones I use don’t vary much at all from year to year.
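A toy version of that check, with entirely made-up shots: fit the zone averages with and without the season being ‘predicted’, then compare the expected-goals totals the two fits give for that season’s shots. With stable zone averages, the gap stays small:

```python
def zone_rates(shots):
    """Conversion rate per zone from (zone, scored) pairs."""
    totals = {}
    for zone, scored in shots:
        goals, attempts = totals.get(zone, (0, 0))
        totals[zone] = (goals + int(scored), attempts + 1)
    return {zone: g / n for zone, (g, n) in totals.items()}

# Invented data: two past seasons plus the season being 'predicted'.
past = [("central_box", True), ("central_box", False), ("central_box", True),
        ("outside_box", False), ("outside_box", False), ("outside_box", True)]
target_season = [("central_box", False), ("outside_box", False)]

rates_in = zone_rates(past + target_season)   # target season included in the fit
rates_out = zone_rates(past)                  # target season held out

xg_in = sum(rates_in[z] for z, _ in target_season)
xg_out = sum(rates_out[z] for z, _ in target_season)
print(round(abs(xg_in - xg_out), 3))
```

The tiny sample here exaggerates the gap; with full seasons of shots behind each zone, adding or removing one season barely moves the averages, which is the half-a-goal result described above.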
At present my original SPAM has been less able than SOTPAM to explain any given season but more stable for prediction in this short period:
SPAM is no better than TSR. In time, with enough seasons of data behind it, SOTPAM (and any other expected goals models fanalysts develop) could outstrip TSR, but let’s not get carried away just yet. While I remember, it’s worth noting that SOTPAM excludes pens, direct free kicks and own goals, but I’ve counted all those things for Actual GD.
One could also add game state into the mix as a proxy for defensive pressure to give a better outcome/explanation for games. However, as we don’t know how score effects will progress in any one particular game weeks or months down the line, it’s pretty useless if you’re looking to do some pre-season prediction.
Score effects definitely matter, but at this stage, for where I’m at and for what I want to do in the near future, it’s easier and more sensible to leave it out of the equation. If you don’t agree with any of this by the way, feel free to berate me in the comments section or on Twitter…
My next stop will be to start using SOTPAM on individual strikers, which to me is infinitely more interesting than teams.