Ok, so we all make mistakes.
Thanks to a couple of questions from readers (one in the comments section, one from @11tegen11) we spotted some flaws in our previous graphics. In trying to answer the questions we noticed that the curved lines in our previous plots were set at 3 standard deviations from the mean rather than 2.
We’re mathematical and statistical laymen ourselves, so we’ll do our best to explain. Or indeed let Wiki do it: “The reported margin of error is typically about twice the standard deviation – the half-width of a 95 percent confidence interval. In science, researchers commonly report the standard deviation of experimental data, and only effects that fall much farther than one standard deviation away from what would have been expected are considered statistically significant – normal random error or variation in the measurements is in this way distinguished from causal variation.”
Basically, for us to be confident there’s some ability or skill above the norm in what we’re measuring, we need to see some data points fall outside the curved lines. Thanks to some cut-and-paste and not-double-checking issues, we had set the lines at three times the standard deviation rather than twice. With that corrected, some keepers are a lot closer to the curved lines in each plot, and a couple even fall outside them. Perhaps there’s a greater difference between keepers at this level than we first thought.
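For anyone who wants to draw this kind of plot themselves, here is a rough sketch in Python of how curved lines like these can be calculated. It assumes every shot is an independent save/no-save trial, so the standard deviation of a save % over n shots is sqrt(p(1-p)/n). That binomial shortcut is an assumption made for illustration, not a description of the exact calculation behind the plots. The function name `funnel_limits` and the 50% league average are made up for the example.

```python
import math

def funnel_limits(league_rate, shots_faced, n_sd=2.0):
    """Lower and upper save-% limits for a keeper who has faced `shots_faced` shots.

    Assumes a binomial model: standard deviation = sqrt(p * (1 - p) / n).
    `n_sd` is how many standard deviations from the league average the lines sit.
    """
    sd = math.sqrt(league_rate * (1.0 - league_rate) / shots_faced)
    lower = max(0.0, league_rate - n_sd * sd)  # a save % cannot go below 0
    upper = min(1.0, league_rate + n_sd * sd)  # ...or above 1
    return lower, upper

# Hypothetical league average of 50% (the "coin flip" case), not real data.
for shots in (25, 100, 400):
    print(shots, funnel_limits(0.5, shots, n_sd=2.0), funnel_limits(0.5, shots, n_sd=3.0))
```

The lines narrow as the number of shots grows, which is what gives the plot its funnel shape, and the 3 standard deviation lines always sit outside the 2 standard deviation ones. That is why the mistake made every keeper look more ordinary than he should have.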
As a reminder we’ve split shots faced into 4 different Zones. The Zones are determined by grouping areas together that have similar average save %s. This gives us bigger sample sizes while maintaining data quality (there are actually 46 smaller shot Zones that we record):
Whether a keeper saves a shot on target from here is pretty much like flipping a coin. The bold straight line is the league average save %. Everyone bar Michel Vorm is firmly within one standard deviation of the mean (first set of curved lines), which, going back to the Wiki definition, means there’s normal variation here and nothing special going on. Vorm’s numbers aren’t ‘statistically significant’, but footage of him facing shots in this Zone is something we’ll definitely be looking at.
Zone 2 is where we originally found most variation between Premier League keepers. We focused on Wojciech Szczesny for this and felt shot-stopping from here had a lot to do with decision making. Here’s the updated plot:
Szczesny looks statistically significant here, being further than 2 standard deviations from the mean (second set of curved lines). Ali Al-Habsi too. At the other end of the scale, Simon Mignolet is very close to being significantly better than his peers. Forget about the Boxing Day gaffe and look here for further bits on the Belgian.
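To check a single keeper against the lines without drawing the plot, you can count how many standard deviations his save % sits from the league average. The sketch below uses the same binomial assumption (each shot an independent trial). The function names and the numbers in the example are hypothetical and do not come from our data.

```python
import math

def save_z_score(saves, shots_faced, league_rate):
    """Number of standard deviations a keeper's save % is from the league average.

    Assumes a binomial model, so the standard deviation of a save % over
    `shots_faced` shots is sqrt(p * (1 - p) / shots_faced).
    """
    sd = math.sqrt(league_rate * (1.0 - league_rate) / shots_faced)
    return (saves / shots_faced - league_rate) / sd

def outside_limits(z, n_sd=2.0):
    """True if the keeper falls outside the curved lines set at `n_sd` standard deviations."""
    return abs(z) > n_sd

# Made-up keeper: 62 saves from 100 shots against a 50% league average.
z = save_z_score(62, 100, 0.5)
print(round(z, 2), outside_limits(z, n_sd=2.0), outside_limits(z, n_sd=3.0))
```

A keeper like this made-up one sits outside the lines at 2 standard deviations but inside them at 3, which is exactly the kind of case our earlier plots hid.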
In Zone 3 there’s quite a difference between the keepers too. Although veteran Mark Schwarzer and Asmir Begovic come close, no one actually looks significantly better than the next:
Tim Krul is interesting here. To maximise sample size we have included the wide areas for both sides of the box. However, his performance from shots faced on the left side as we look at it leaves a lot to be desired. If the left side was looked at in isolation, alarm bells would be ringing for Newcastle’s coaching team:
Anyway, onto Zone 4. Many people have pointed out to us that David De Gea’s weakness when he arrived in the Premier League was thought to have been shots from distance. Well, it isn’t. The Spaniard is close to being 2 standard deviations above the mean here. If he continues in this vein over the next season or two he’ll be looking significantly better than his peers. As an aside, next time you watch John Ruddy or Jussi Jaaskelainen, have a look at their aggressive positioning when facing these shots. They often don’t give themselves much reaction time, something De Gea does do by staying close to his line.
Next time out we’ll be re-plotting the top strikers of the last few years to determine who (if anyone) is showing ability significantly above league average.