A lot of people want to get into the sports analytics industry, but it’s a long row to hoe from a traditional training in statistics to being a productive member of a sporting club. Employment paths for statisticians and data scientists traditionally lead to fields like finance, medical research, and marketing. Sports data is different: a lot of it comes from adversarial situations in a continually changing environment. Succeeding means working within a complex network of players, coaches, sports scientists, opponents, plans and counter-plans.
Here’s the type of conversation that I hear between beginner sports statisticians (S) and experienced analysts / coaches (C). We all have to learn about the importance of context.
S: We’ve had a pretty good season, but our pass completion rate is in the bottom 20% of teams. I’ve done a regression and if we just improved that stat by 2% we would be the best team in the league.
C: Let me have a look at that data. Being a good team means that we play less in our defensive half, where it’s easier to complete a pass. If you adjust for that, I bet we look better.
S next day: OK, that made some difference. But when I isolate just passes in our defensive zone, we’re still below average. In midfield we’re well below average for passes that find a target. We have to fix this!
C: But we encourage our players to take risks. As long as they are making good decisions about when to attempt a pass that might lose possession, we come out ahead despite the raw success ratio being low. Have a look at whether our completed midfield passes lead to more attacks.
S next week: It took a while but I filtered down to just our successful midfield passes. We’re still only a touch above average using a metric of goals per chain from a completed midfield pass.
C: Did you correct for expected goals?
S: Huh?
C: We’re getting down to a small sample if you’re looking at just goals. Get a more reliable measure of attacking quality by looking at the expected number of goals from those opportunities.
S next fortnight: YOU WERE WRONG OLD MAN! I adapted an Expected Goals formula for our data and we get about the number we expected. We MUST complete more passes coming through midfield to set up goals.
C: What did you do with the turnover data?
S: We already know we’re turning over too many passes. Stop changing the subject.
C: I mean, what happens to the ball when we don’t complete the pass? It goes into dispute, or the opponent gets clean possession. Have a look at those chains of play.
S mutters under breath
S next month: Hey, I’ve got something interesting. Did you know that when we lose the ball passing forward in midfield, our opponents hardly ever score on the counter-attack? Our equity* from those plays is the best in the league.
C: Yeah, makes sense. We’ve designed our offensive structure with men covering the most productive routes out of defence, and we train them to anticipate the turnover. We don’t over-commit to speculative attacks.
S: Why didn’t you just say that two months ago? Oh wait … how do I categorise defensive structures from our crappy tracking data?
C: Now you’re thinking like an analyst, not just a statistician.
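For anyone who wants to try the coach's first suggestion, here is a rough Python sketch of adjusting pass completion for field position. Everything in it is an assumption made for illustration: the pass log is invented, the three zone labels are arbitrary, and the helper names (`make_passes`, `league_zone_rates`, `raw_and_expected`) are hypothetical rather than taken from any real tracking feed or library. The idea is to compare a team's raw completion rate with the rate a league-average team would post if it attempted the same mix of passes by zone.

```python
from collections import defaultdict


def make_passes(team, zone, completed, attempted):
    """Build a made-up pass log: one (team, zone, completed) tuple per attempt."""
    return [(team, zone, i < completed) for i in range(attempted)]


# Invented numbers. "Us" plays mostly in midfield and the forward zone,
# "Them" plays mostly in its defensive zone, where passes are easier.
passes = (
    make_passes("Us", "defensive", 18, 20)
    + make_passes("Us", "midfield", 42, 60)
    + make_passes("Us", "forward", 10, 20)
    + make_passes("Them", "defensive", 54, 60)
    + make_passes("Them", "midfield", 21, 30)
    + make_passes("Them", "forward", 5, 10)
)


def league_zone_rates(passes):
    """Completion rate in each zone, pooled across all teams."""
    done = defaultdict(int)
    tried = defaultdict(int)
    for _team, zone, completed in passes:
        tried[zone] += 1
        done[zone] += int(completed)
    return {zone: done[zone] / tried[zone] for zone in tried}


def raw_and_expected(passes, team):
    """Return (raw rate, rate expected from the team's own zone mix)."""
    rates = league_zone_rates(passes)
    own = [(zone, completed) for t, zone, completed in passes if t == team]
    raw = sum(completed for _zone, completed in own) / len(own)
    # Expected rate: give every attempt the league-average rate for its zone.
    expected = sum(rates[zone] for zone, _completed in own) / len(own)
    return raw, expected


for team in ("Us", "Them"):
    raw, expected = raw_and_expected(passes, team)
    print(f"{team}: raw {raw:.2f}, expected from zone mix {expected:.2f}")
```

In this toy data the two teams differ by ten points of raw completion rate, yet each one exactly matches what its zone mix predicts, so the whole gap comes from where the passes were attempted. Real data will not be this tidy, and as the conversation shows, this adjustment is only the first layer of context.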