This Month In Analytics
Published: 27th March 2025
– Updated: 27th March 2025
Baseball analysis and the Sapir-Whorf hypothesis. Arsenal’s results defying expected goal models. And the future of analytics research. Today on the blog, Richard Whittall looks back at the last month in the world of sports analytics.
Perhaps one of the best and most important articles in the analytics world this past month came from a sabermetric enthusiast, Ken Arneson. His "Ten Things I Believe About Baseball Without Evidence" includes some very *ahem* inside-baseball concepts about pitch quality and decision making at the plate, but anyone claiming to be a serious football analyst, or a serious football bettor, should read it.
Arneson's first belief is that sabermetric research is influenced by the Sapir-Whorf hypothesis, the idea that the specific language we speak shapes how we see the world. And the language of most sports data is SQL:
"The predominant technology we use to perform such analysis is SQL, which is the primary language used to query relational databases. SQL and relational databases are technologies which are built upon set theory. A set is basically an unordered collection of objects."
"And this is where I believe that a technological Sapir-Whorf hypothesis applies to baseball."
"Practically all of our analysis of baseball statistics treats its data [as] an unordered collection of baseball events: pitches, plate appearances, games, series. Standard baseball analysis (the public kind anyway, who knows what is being done inside these organisations) treats its data that way because that's the way SQL treats its data. The available technology guides our conceptualisation of the world."
The consequence, Arneson argues, is that baseball analysts will sometimes treat discrete events as if they were fundamentally "unordered," when, as in pitching sequences, they are demonstrably anything but.
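To make the point concrete, here is a minimal Python sketch; it is not taken from Arneson's article, and the pitch sequences and the helper names `pitch_mix` and `preceded_by` are hypothetical. An order-blind count of pitch types, the kind of summary a simple grouped count returns, cannot tell two plate appearances apart, while a feature built from consecutive pitches can.

```python
from collections import Counter

# Hypothetical pitch sequences for two plate appearances:
# the same three pitches, thrown in a different order.
pa_one = ["fastball", "fastball", "changeup"]
pa_two = ["changeup", "fastball", "fastball"]


def pitch_mix(sequence):
    """Set-style view: counts of each pitch type, ignoring order."""
    return Counter(sequence)


def preceded_by(sequence, target, previous):
    """Ordered view: how many times `target` immediately follows `previous`."""
    return sum(
        1
        for prev, curr in zip(sequence, sequence[1:])
        if prev == previous and curr == target
    )


# The order-blind summary treats the two plate appearances as identical...
print(pitch_mix(pa_one) == pitch_mix(pa_two))  # True
# ...but a sequence-aware feature separates them.
print(preceded_by(pa_one, "changeup", "fastball"))  # 1
print(preceded_by(pa_two, "changeup", "fastball"))  # 0
```

The same idea carries over to football: results, or game states, stored as an unordered collection lose exactly the information needed to ask what happens after a particular run of events.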
I think this carries a host of interesting implications for football analysis, and for bettors. We know, for example, that game state (whether a team is tied, one goal up or one goal down) affects team behaviour within a single 90-minute match. But are there effects that carry over from game to game? Do teams, as a rule, change behaviour after a two-game losing streak? Do teams show immediate adjustments after a home stretch that settles after