There have been many interesting articles in the analytics community recently, most of which can be seen here. Much has been said about crosses and headers and why style matters. It’s all been discussed to death on Twitter too.
It got us thinking here at Differentgame that by using the current version of the shots model we’ve been missing a trick. The shot model only takes into account the final position of the shot and not how the ball is received. Obviously, this makes a huge difference.
A nice through ball on the deck for a one on one with the keeper isn’t the same as dealing with a huge hanging cross into the mixer under pressure from a 6ft+ galoot sticking his arms ‘n’ elbows in your mush.
In the last week we’ve seen some “key pass” statistics used as justification for spending millions on a player or as a reason why another should be kept. We’ll be straight. We detest the “key pass” statistic. Lumping every key pass together is like lumping all shots together – plain wrong. A “key pass” is a pass that leads to a shot on goal, by the way.
There’s one good thing about key passes, though. If we collect the data, it could be a great proxy for what was missing from the original shots model – how the ball was received.
So we decided to conduct a little experiment that had two purposes. First up, we could show just why lumping all key passes together is bad by putting up some numbers. Second, we could see whether key passes work as a proxy for how the ball was received. As we didn’t want it to take forever, we deliberately chose a fairly small sample (gasp, choke, proper stats people): Euro 2012. It’s small enough not to take forever, just big enough to get some decent results (based on previous shots model experience) and, besides, we’ve been dying to look at tournament shot stats for yonks.
We looked at how Squawka break down the pitch for key passes and saw that it tallied quite well with our findings from this on assists. So we then recorded each key pass by the area it came from and by our usual final shot position. Initially we looked at key passes into the main danger zone – the central area inside the box. The graph shows how many key passes from each area into the danger zone it took to score a goal during Euro 2012:
Surprisingly, we see that the fewest key passes per goal were needed from deep wide areas. That’s probably a sample size issue, as there were only 4 goals from there. 3 were deep crosses in England’s group of shame (Lescott, Carroll and Mellberg goals) and the other was a precision through ball by Pirlo for Di Natale to steer home. Although there are sample size issues here, there is one good thing in favour of a key pass from here – the ball is in front of the attacker.
We see that through balls from the central area of the field in the final third are the best kind of key pass. Again, the ball is in front of the attacker. This is perhaps one reason why balls played to the danger zone from within the penalty area itself aren’t scored as often. A lot of these were cut-backs from wide areas of the box, meaning a) hitting the ball more against the grain and b) the likelihood of more defenders being in the way.
And finally to the humble cross from high wide areas. It took twice as many of these to produce a goal as it took central through balls.
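For anyone wanting to try this at home, the figure behind each bar is nothing fancier than key passes divided by goals for that origin area. Here’s a minimal sketch of that tally. The function name and the counts are made up purely for illustration (they are not our Euro 2012 numbers); we’ve only picked them so that crosses come out at twice the central through ball figure, as described above.

```python
def key_passes_per_goal(key_passes, goals):
    """Return how many key passes it took, on average, to produce one goal."""
    if goals == 0:
        return None  # no goals from this area, so the rate is undefined
    return key_passes / goals


# Hypothetical tallies, NOT the Euro 2012 figures: (key passes, goals)
# for passes into the danger zone, keyed by where the pass came from.
tallies = {
    "central through ball": (40, 5),
    "high wide cross": (80, 5),
}

rates = {
    area: key_passes_per_goal(passes, goals)
    for area, (passes, goals) in tallies.items()
}

for area, rate in rates.items():
    print(f"{area}: {rate:.1f} key passes per goal")
```

A lower number is better, and with only a handful of goals in a category (like the 4 from deep wide areas) the number will jump around a lot.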
Next we move onto key passes to the wide area of the box:
Again we see how few key passes from deep wide areas it took to score a goal. Again, the ball would be in front of the attacker. We also now see that key passes from high wide areas arrive at a better angle than the central through ball. A central through ball to here means the ball is running away from goal, and it took 21 attempts to score a goal like this.
Finally we grouped all key passes to outside the box together to see how many were needed to score:
Having established that, we considered how we could use the info. Well, we did what we did with our shots model. We mapped the averages here onto the four semi-finalists’ key passes to test the expected number of goals. For example, if Italy took 150 shots in the tournament and all of them were from key passes to outside the box, they’d expect to score 5 goals.
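The mapping is simple enough to sketch in a few lines. This is a rough illustration rather than our actual spreadsheet: the category labels and function name are invented for the example, the 21 is the central through ball figure quoted earlier, and the 30 is just what the worked example above implies (150 shots for 5 goals).

```python
# Key passes needed per goal, by (origin, destination).
# 21 is the figure quoted above for central through balls into the wide box;
# 30 is inferred from the worked example (150 shots -> 5 goals).
# The labels themselves are assumptions made for this sketch.
PASSES_PER_GOAL = {
    ("central", "wide box"): 21,
    ("any", "outside box"): 30,
}


def expected_goals(key_pass_counts, passes_per_goal=PASSES_PER_GOAL):
    """Sum each category's key passes divided by its passes-per-goal rate."""
    return sum(
        count / passes_per_goal[category]
        for category, count in key_pass_counts.items()
    )


# The hypothetical Italy from the example: 150 shots, all from key passes
# to outside the box.
italy_example = {("any", "outside box"): 150}
print(expected_goals(italy_example))  # 5.0
```

A real team would have counts spread across every category, and the function just adds up each category’s share.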
Between them, Italy, Spain, Germany and Portugal scored 26 goals from key passes. The model predicted over 24 goals. If we then topped this up from the shots model with the shots they didn’t create themselves (blocks, rebounds, botched clearances etc.), we’d predict 32 goals. They scored 34 between them. The shots model alone also came up with 32 goals predicted.
On this small sample size, adjusting the model to take into account how the ball arrived did no better than the shots model alone. It didn’t do any worse either. We’ll be testing it further in future!