Predicting, Prospecting and Expecting Goals

So I’ve started to go into the uses of Expected Goals Models here at Differentgame and today I’m looking at predicting striker goal returns. Once again I’m using a recent Prozone presentation (if you’ve got half an hour to spare) to introduce an idea. This time it’s Omar Chaudhuri (@omarchaudhuri) putting some context into using data to scout a striker:

Prozone obviously have their own Expected Goals model, which I imagine has all sorts of fancy bells ‘n’ whistles due to the quality of data they collect. Details of my main Expected Goals model are here. From what I can glean from the bits ‘n’ pieces Prozone leak from time to time, the difference in final outcomes between my model and theirs is fairly small. Shot volume and location are the key inputs to any of these models.

I wanted to start by looking at a couple of the seemingly freak occurrences in the Premier League this year. First up is Michu. Hailed as a £2m super-buy in his first season at Swansea, he saw the goals dry up this year. Although he’s been restricted by injury, the Spaniard’s return of 2 goals from 17 league appearances was a huge nosedive.

Can Expected Goals (ExpG) tell us something about this? I’ve previously looked at the pretty linear relationship between expected goals and actual goals over time in a piece on Rooney, Suarez and Van Persie. To be the best, a player ‘only’ needs to finish just above average; they simply need to do it consistently.

What this tells us is that any major outperformance of ExpG by a player in a season will likely be followed by a regression; you just have to wait for it. According to the model, Michu could have expected to score 12 goals from open play in his first season at Swansea. He actually ended up scoring 18, half as many again as expected. It was a fairly huge over-achievement.

If I do all the things Omar talks about in the video above (take out penalties, and standardise things by using ‘per 90 mins’ rather than per game to make things fair), I can knock up rolling plots for players to compare their actual goal and ExpG tallies. If I stop at the end of the 2012/13 season, I can simply do a linear forecast for each tally to see what each would have predicted for Michu this year:

[Image: MichuForecastv2.jpg]

If I had forecast based on what he did do in his first season, it would have Michu on around 26 open play goals by now. The ball on the chart represents where he actually is: 20 goals. As Omar explained, for the vast majority of players what actually happens one season has little relationship to what happens the next. Forecasting using ExpG would have got me a lot nearer: it predicted that Michu wouldn’t trouble the electronic scoreboard at all. As we know, he barely did.
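For anyone wanting to try this kind of linear forecast themselves, here is a minimal Python sketch. It is not the code behind the plots in this post: the per-match numbers are made up purely for illustration, the helper names (`cumulative`, `linear_fit`, `forecast`) are my own, and an ordinary least-squares line through the running tallies is only one reasonable reading of "linear forecast".

```python
from itertools import accumulate


def cumulative(values):
    """Running total of a sequence, e.g. goals match by match."""
    return list(accumulate(values))


def linear_fit(xs, ys):
    """Ordinary least-squares straight line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(xs, ys, future_x):
    """Extend the fitted line out to future_x."""
    slope, intercept = linear_fit(xs, ys)
    return slope * future_x + intercept


# Hypothetical per-match data with penalties already removed (NOT real figures).
minutes = [90, 90, 75, 90, 60, 90]
goals = [1, 0, 2, 0, 1, 1]
expg = [0.4, 0.3, 0.5, 0.2, 0.3, 0.4]

# Use cumulative "90s played" as the x-axis so the comparison is per 90 mins.
nineties = cumulative(m / 90 for m in minutes)
cum_goals = cumulative(goals)
cum_expg = cumulative(expg)

# Project both tallies a further 30 full matches ahead.
target = nineties[-1] + 30
goal_forecast = forecast(nineties, cum_goals, target)
expg_forecast = forecast(nineties, cum_expg, target)

print(f"Forecast from actual goals: {goal_forecast:.1f}")
print(f"Forecast from ExpG:         {expg_forecast:.1f}")
```

With a hot streak like the invented one above, the forecast based on actual goals runs well ahead of the ExpG-based one, which is exactly the gap the Michu chart shows.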

What of another player who bombed this season? What about Roberto Soldado? A monster year at Valencia followed by a shocker at Spurs. Here’s his plot:

[Image: SoldadoForecastv2]

Again, basing a forecast on what did happen previously sees me wide of the mark. Basing the forecast on ExpG gets me a lot nearer where Soldado is actually at today. Remember, if we’ve seen that Rooney et al are pretty much in line with their ExpG tallies over the last few seasons, why would we expect the likes of Michu and Soldado to continue to smash defences up in the Premier League?

Unlike Michu or Soldado, Suarez is a player who was under-performing his ExpG for a long time. Using his ExpG tally at the end of last season, again we could have got nearer to predicting his current actual goal total than if we used his actual past performance.

[Image: SuarezForecastv2]

However, the prediction was still 9 goals short. Also, if you study the plot a bit closer you’ll see that in some cases it takes more than a season for the red and blue lines to get close together. Suarez joined Liverpool in a January transfer window. The lines were close until the end of that first season, before 18 months of below-average finishing kicked in. A prediction using ExpG after those first few months would have been off too. It’s taken an almost superhuman (Messi and Ronaldo) effort on the Uruguayan’s part to overtake his ExpG in recent times. More on Suarez later…

So far I’ve looked at players who have over or underperformed ExpG. What of those who stay roughly on track? Have they continued to do so? I’ll start with Christian Benteke:

[Image: BentekeForecastv2]

Forecasting on ExpG gets it bang on in Benteke’s case. How about a player who changed teams last summer? Here’s Romelu Lukaku:

[Image: LukakuForecastv2]

Lukaku has very slightly under-performed ExpG during his Premier League career. His actual goal rate was a slightly better predictor for this season than his ExpG numbers. It was close though, and in general ExpG is the winner in the examples here.

If I use all the data I have for these players and assume they follow ExpG lines, this is how the model predicts each player will do next season:

[Image: strikerpreds]

The Suarez prediction here looks undercooked. The model simply assumes that Suarez will regress right back after his recent huge over-performance. No model is infallible: models are simply a guideline, and one would expect a human hand on the tiller picking up where things don’t look quite right. That said, he’s recently undergone surgery, will have next to no rest during the summer and will have European football to contend with next season too.

Since around Christmas time this year, Suarez’s ExpG numbers have taken a real nosedive. ExpG suggests that Suarez should be a regular 20 goal a season striker – not a 25-30 goal a season one. Remember that’s based on open play goals only and you could probably add on a free kick goal or two to that.

Luis Suarez – Photo by Jimmy Baikovicius @flickr

Michu is also still suffering here from that massive first season in English football. The other reason is that his ExpG per game is getting lower due to Bony’s arrival and having to play a different role.

In conclusion, if you’re shopping around for a striker, high ExpG numbers per 90 mins, not high actual goal tallies per 90 mins, should generally be on your checklist. Of the strikers here, Lukaku leads the way with 0.58 ExpG per 90, narrowly beating Suarez’s 0.57. Benteke’s ExpG per 90 stands at 0.41, Soldado’s at 0.34 and Michu’s at 0.33.
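As a quick illustration of the ‘per 90’ standardisation, here is a small Python sketch. The `per_90` helper is my own naming and is simply the usual total-times-90-over-minutes calculation; the rates in the dictionary are the ExpG per 90 figures quoted above, while the totals and minutes behind them are not given in this post, so none are invented here.

```python
def per_90(total, minutes):
    """Scale a season tally (goals or ExpG) to a per-90-minutes rate."""
    if minutes <= 0:
        raise ValueError("minutes played must be positive")
    return total * 90 / minutes


# ExpG per 90 figures as quoted in the paragraph above.
expg_per_90 = {
    "Lukaku": 0.58,
    "Suarez": 0.57,
    "Benteke": 0.41,
    "Soldado": 0.34,
    "Michu": 0.33,
}

# Rank the strikers from highest to lowest ExpG per 90.
ranked = sorted(expg_per_90.items(), key=lambda item: item[1], reverse=True)

for name, rate in ranked:
    print(f"{name}: {rate:.2f} ExpG per 90")
```

Ranking on the per 90 rate rather than on raw tallies stops a player who has simply been on the pitch more from looking like the better prospect.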

Later in the summer I’ll be looking at more individual players in much greater detail, incorporating both player role and team styles into the mix. There’ll be more context and nuance than the few examples given above which will hopefully provide a slightly better platform for future prediction.

Many thanks to @jameswgrayson for technical help with the plots. Visit James’ Blog here.

As usual, any questions/comments get in touch below or on Twitter @footballfactman.




