On Goalkeeping Data, Scouting and Evidence Based Coaching

This week I finally got around to something I’d been meaning to do for ages: pull all the data from games to see what a goalkeeper spends his time doing during a match.

These are the on-ball actions of keepers in the Premier League so far this season: ‘Aerial’, ‘Ball Recovery’, ‘Ball touch’, ‘Challenge’, ‘Claim’, ‘Clearance’, ‘Cross not claimed’, ‘Error’, ‘Free Kick’, ‘Keeper pick-up’, ‘Keeper Sweeper’, ‘Pass’, ‘Punch’, ‘Save’, ‘Smother’, ‘Take on’.

I picked a random Everton game, which turned out to be v Man City on New Year’s Day. I grabbed the data and got the video ready.

The first on-ball event listed for Pickford was a ‘Keeper pick-up’. I roll the video. He actually mishandles a cross, drops it and then gathers safely:

This isn’t pristine data direct from source, it’s swiped. Where’s the ‘Cross not claimed’ event? Does the ‘proper’ source data actually have more than ‘Keeper pick-up’?

Next is a little ‘Throw-out’ (a sub-type of ‘Pass’ in the data set) to Yerry Mina, which all looks fine on the video.

Next come eight passes taking us to 6 minutes 23 seconds in. They’re all pretty standard too, no video needed.

On 10 minutes 12 seconds we have an ‘Aerial’ and ‘Ball touch’. Looks like another mishandled cross to me that gets dropped. Yet still no sign of the ‘Cross not claimed’ event in the data:

Four more passes and there’s over 20 minutes gone.

Not long after, however, we get something interesting in the data. Yay! This is what the ‘Keeper Sweeper’ event looks like, swiftly followed by a ‘Clearance’ that Peter Kay would be proud of:

After seven more passes (including a goal kick), Pickford finally makes a ‘Save’ 28 minutes into the game. He then makes another ‘Save’ here a minute later followed by another ‘Keeper pick-up’:

Four more passes for wor Jordan over the next 6 minutes. And then another ‘Save’ followed by another ‘Keeper pick-up’ in the data. If you can see any ‘Keeper pick-up’ here you’ve got better eyes than me:

At this stage, I’m starting to question the data.

Intrigued, I search further back this season for a ‘Cross not claimed’ for Pickford. That being a cross not caught. According to the data, Pickford hasn’t fumbled one all season. Hmm. Do they even exist?!

I go away and eventually find one. Sorry Bernd Leno. But at least you’ve proved they exist in the data set. This is an ‘Aerial’ and ‘Cross not claimed’:

So I’m getting the ‘Aerial’ thing as a duel with an opponent going for the same ball. But Leno isn’t even trying to catch the ball here. He’s trying to punch it.

In the Pickford vids up above he’s actually trying to catch the thing. And doesn’t. Yet those aren’t labelled as ‘Cross not claimed’. Bizarre.

At this stage, I’m almost ready to throw the data in the bin.

I hadn’t cherry picked this game to highlight this point about bad data, I just stumbled across it. It’s literally the first game I looked at for the real point of this article. Maybe I just picked a bad ‘un and the rest are fine…

Obviously, there are implications here if you want to use the data to compare keepers across the board against various skill sets. Considering the definitions of these same actions by other data and video providers appear to be different (judging by the amount of missing information), the whole thing is a minefield.

The data is supposed to help filter down the number of players scouts need to watch. Clearly, it doesn’t, because to do it properly you have to watch every damn minute again.


The real point of this article was to make a small point about evidence-based coaching. If 75% of the game for a keeper these days is distribution, how much time do coaches spend on it in training?
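As a rough sanity check on that 75% figure, here’s a minimal sketch that tallies the Pickford events exactly as they were walked through above (up to the third ‘Save’ and the ‘Keeper pick-up’ after it). It assumes the ‘Throw-out’ counts as a ‘Pass’, as the data set treats it, and it covers only part of one game using labels this article has already questioned, so treat it as an illustration rather than a finding.

```python
from collections import Counter

# Pickford's on-ball events in the order the walkthrough lists them.
# The single 'Throw-out' is counted as a 'Pass', as in the data set.
events = (
    ["Keeper pick-up"]
    + ["Pass"] * 1                  # the throw-out to Yerry Mina
    + ["Pass"] * 8                  # eight passes up to 6:23
    + ["Aerial", "Ball touch"]      # 10:12
    + ["Pass"] * 4                  # four more passes
    + ["Keeper Sweeper", "Clearance"]
    + ["Pass"] * 7                  # seven more, including a goal kick
    + ["Save", "Save", "Keeper pick-up"]
    + ["Pass"] * 4                  # four more passes
    + ["Save", "Keeper pick-up"]
)

counts = Counter(events)
total = len(events)
pass_share = counts["Pass"] / total

# Print each event type with its count and share of all on-ball events.
for name, n in counts.most_common():
    print(f"{name:15s} {n:3d}  {n / total:5.1%}")
```

On those counts, passes make up 24 of 34 listed events, about 71%, which is at least in the same ballpark as the 75% figure.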

I recently spoke with Ostersunds FK keeper coach, David Preece, about making sessions more like matches.

“If we do too many game-realistic drills in training, Aly (Keita) doesn’t feel like he’s had enough of a work-out. It’s because it takes time to prepare and set it all up. At the end he’s like: ‘I’ve only made 20 to 30 saves!’ He wants to make more. Finding a balance is difficult.”

As a data guy, I’m not advocating spending 75% of training time on a keeper’s distribution. Goals win games, so making saves is always going to weigh more heavily in terms of action importance. Working on the expected save data all these years makes me think saves are even more important (and difficult) than most people think they are.

Head of Goalkeeping at the FA and England, Tim Dittmer, said this:

“Actions from the game represent different ‘ratings’ in terms of impact and importance. This would help guide us to what training looks like and how much we do of certain topics and actions.”

Having been a thorough skeptic in the past about how much it matters that a keeper is good with the ball at their feet, I’ve mellowed on it. Especially with this year’s law change allowing keepers to play goal kicks to players inside their own box.

I send Tim a picture of a tweet I sent:

“Funny you should ask. We did a session that looked just like this today.”

I try and pick holes. Was it a standalone keeper session, or did it include outfield players as well? If it didn’t, isn’t that a bit sub-optimal?

Then comes the touché:

“The session had both keepers and outfield players included and two coaches took it. Your ‘sub-optimal’ comment would suggest that anything other than 11 v 11 on a full-size pitch falls into this category.”

It’s pretty clear that practical obstacles to evidence-based goalkeeper coaching exist even in the professional game at the highest levels. Realistically, you do the best you can, and plan sessions to suit the resources available and keep your goalkeepers happy and on board.

Tim continues:

“Of the 75% stat of the game being distribution, only 20% of these are regarded as ‘under pressure’. Firstly, what should that mean to the amount of meaningful time we spend on this area and secondly, what are the types of cues, triggers and pressure these actions need to be practiced within?”
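To put Tim’s two numbers side by side, here’s a small sketch of the arithmetic. It assumes the 20% applies to distribution actions only, which is how the quote reads, and the function name is just illustrative.

```python
def split_actions(distribution_share: float, pressured_share: float) -> dict:
    """Split all on-ball actions into pressured distribution,
    unpressured distribution, and everything else."""
    pressured = distribution_share * pressured_share
    unpressured = distribution_share * (1 - pressured_share)
    other = 1 - distribution_share
    return {
        "distribution under pressure": pressured,
        "distribution not under pressure": unpressured,
        "everything else": other,
    }

# 75% of actions are distribution; 20% of those are 'under pressure'.
shares = split_actions(0.75, 0.20)
for label, share in shares.items():
    print(f"{label:32s} {share:.0%}")
```

On those figures, distribution under pressure is 0.75 × 0.20 = 15% of everything a keeper does on the ball, against 60% for distribution with no pressure at all.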

This is the point to me. Ask questions of what you’re doing and why.

Feedback from other coaches was muted and mixed. Every single one I spoke to refused to make even a rough guess at what percentage of their training time goes on distribution. It doesn’t really work like that, apparently.

I still feel like there are too many sessions apart from the rest of the team. Too many sessions not game-specific enough because of the resources available or because keepers themselves get restless.

As David said, maintaining a balance between what’s right, good, available and acceptable to players makes coaching a difficult business. It needs collaboration across the club. How do we stop making goalkeeping so much of a lonely business?

Maybe everyone’s too busy making fancy-looking videos for YouTube that the rest of the Goalkeeper’s Union snipe away at on Twitter. Get together and make things better. I’m not even a coach and two of the best national team goalkeeper coaches and coach educators in the world give their time to me….













Why Possession Value Is Bollocks

Expected goals. Expected assists. Expected passes.

How likely is a shot from here to be a goal? How likely is a pass from here to there to turn into a goal if the guy receiving the ball shoots? How likely is a pass from here to there to be successful?

All models have flaws and these guys are no exception. But they all have one thing in common.

They pretty much work as intended.


Because they are fairly simple by design.

They are fairly simple by design because they are singular actions.

On top of this, 99 times out of 100 we can be sure of a player’s intention when making the play or, in the case of expected assists, infer it afterwards because a shot happened regardless of the initial intention.

We know what we are trying to measure here, and it’s not difficult to do if we have the data.

The trouble is, these models are a bit boring now.

The new guy on the block in Analytics Town is the Possession Value type metric. Our chums at Opta define the idea thusly:

– OptaPro’s Possession Value (PV) framework establishes the probability of a team scoring from an individual possession.

– The framework assigns credit to individual players based on positive and negative contributions, covering key on-the-ball events.

Does it work?

When looking at preliminary results, we noticed a major negative influence on the scores for players who are often involved in attacking plays.

We believe it is crucial to assign blame and/or credit only where it’s due. Therefore, in our framework, the punishment for the loss of value of the possession is capped at 0.025 (the average value of a possession).

That’ll be a no.
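For anyone wondering what that cap amounts to in practice, here’s a minimal sketch of my reading of it. The only number taken from the quote is 0.025; the function, its name and the idea of crediting an action with the change in possession value are my assumptions for illustration, not the framework’s actual code.

```python
LOSS_CAP = 0.025  # from the quote: the average value of a possession

def capped_credit(value_before: float, value_after: float,
                  cap: float = LOSS_CAP) -> float:
    """Credit an action with the change in possession value,
    but never charge more than `cap` for a loss (hypothetical helper)."""
    delta = value_after - value_before
    return max(delta, -cap)

# Losing the ball when the possession was worth 0.10: the loss is capped.
big_loss = capped_credit(0.10, 0.0)
# Losing the ball when the possession was worth 0.02: charged in full.
small_loss = capped_credit(0.02, 0.0)
# Improving the possession from 0.02 to 0.06: credited in full.
gain = capped_credit(0.02, 0.06)

print(big_loss, small_loss, gain)
```

Under this reading, giving the ball away in a position the model rates at 0.10 costs the same as giving it away at 0.025, so the model’s own valuation gets overridden whenever the result looks unfair to attackers.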

I am not against the idea of expanding analytics into more exciting, more complex football actions. But let’s build on solid foundations.

Let’s look at the idea itself again:

– OptaPro’s Possession Value (PV) framework establishes the probability of a team scoring from an individual possession.

Is this fairly simple by design? Well, no. The actions are no longer singular: multiple actions take place in a possession.

Where does a possession begin and end? Up for debate. Where does the following Everton possession end from the initial kick off v Brighton yesterday?

Is it when Djibril Sidibe’s launch down the touchline gets cut out by Dan Burn? If so, does Alex Iwobi’s touch that finds Bernard start a new possession? Or does no team really have the ball under control here and it’s a possession for neither?

My view is that the whole passage is Everton’s possession until about 18-19 seconds when Brighton (to me) gain full control of the ball back. I’m sure some of you will disagree. I know for sure that the main data suppliers all do.

Immediately we have problems with definitions of what a possession actually is.
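To show how much the definition matters, here’s a rough sketch that counts possessions in one sequence of touches under two different rules. The touch sequence is made up, loosely following the shape of the passage described above rather than any supplier’s event data, and the “controlled” flag and both rules are assumptions of mine.

```python
from typing import List, Tuple

Touch = Tuple[str, bool]  # (team, was the touch controlled?)

def count_any_touch(touches: List[Touch]) -> int:
    """Rule A: a new possession starts whenever a different team touches the ball."""
    count, current = 0, None
    for team, _controlled in touches:
        if team != current:
            count += 1
            current = team
    return count

def count_controlled(touches: List[Touch]) -> int:
    """Rule B: a new possession starts only when a different team
    makes a controlled touch; loose touches change nothing."""
    count, current = 0, None
    for team, controlled in touches:
        if controlled and team != current:
            count += 1
            current = team
    return count

# Hypothetical sequence, not real event data.
touches = [
    ("Everton", True),    # kick-off
    ("Everton", True),    # pass
    ("Everton", True),    # long ball down the touchline
    ("Brighton", False),  # cut out, but not brought under control
    ("Everton", False),   # loose touch
    ("Everton", True),    # ball under control again
    ("Brighton", True),   # Brighton gain full control
]

print(count_any_touch(touches), count_controlled(touches))
```

Same touches, four possessions under one rule and two under the other, before anyone has calculated a single probability.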

– OptaPro’s Possession Value (PV) framework establishes the probability of a team scoring from an individual possession.

Is the intent of every possession to score a goal? If not, then why are we measuring all possessions against it? How many players in the clip are thinking: “If I do this, we’re more likely to score.”?

What’s the actual thought process from the Everton players there? Is it more like: “Ooh fuck, I’m under pressure here, let’s just keep it and if it’s too risky we’ll send it long, so if we lose the ball further up the pitch, there’s less danger”?

If it is then what’s the point of measuring it against the probability of scoring?

At each stage of a possession, the intent in moving the ball is different. It can move from retention, to advancement, to retreat, to deliberately giving the ball away in a less dangerous area.

Possessions are a building block to get from A to B to C to D etc on the pitch and then create openings for goal opportunity.

So measure them in separate blocks.

Measure attackers v each other in the final third.

Measure centre backs on ball advancement to the next thirds of the pitch.

Measure midfielders on how brave they are in not going backwards all the time to alleviate pressure.
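As a sketch of what measuring in blocks could look like, here’s one way to bucket actions by pitch third and flag whether a pass moves the ball into a more advanced third. It assumes a lengthwise coordinate running from 0 (own goal line) to 100 (opponent’s goal line) and equal thirds; both are my assumptions, and the function names are illustrative.

```python
THIRD_ORDER = {"defensive": 0, "middle": 1, "final": 2}

def pitch_third(x: float) -> str:
    """Map a lengthwise coordinate (0 = own goal line, 100 = opponent's)
    to the third of the pitch it falls in."""
    if not 0 <= x <= 100:
        raise ValueError("x must be between 0 and 100")
    if x < 100 / 3:
        return "defensive"
    if x < 200 / 3:
        return "middle"
    return "final"

def advances_a_third(start_x: float, end_x: float) -> bool:
    """True if the ball ends in a more advanced third than it started in."""
    return THIRD_ORDER[pitch_third(end_x)] > THIRD_ORDER[pitch_third(start_x)]

# A centre back's pass from deep into midfield advances a third...
print(advances_a_third(20, 45))
# ...a midfielder going backwards within the middle third does not.
print(advances_a_third(50, 40))
```

Each block then gets its own yardstick: centre backs on how often they advance the ball a third, attackers against each other once it’s in the final third.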

A goal may be the ultimate team goal, but it is not the ultimate goal of every player. Stop creating models that pretend it is. Possessing one has no value.




