We are now in the fifth year of the so-called “Statcast Era” (which is still the most misleading way to say “since 2015” in professional sports), and the use of the new data is becoming ubiquitous. We use it often here at Views, citing now-familiar metrics like exit velocity, spin rate, and barrels to help paint a complete picture of a player’s profile. Despite being mocked fairly regularly, much of the Statcast data is useful for analysis. It really is.
However, it’s important not to get complacent in our use of data, and Jonathan Judge of Baseball Prospectus ably reminded everyone last week of Statcast’s limitations. The study is free to read even if you don’t have a Prospectus subscription (you should, they’re excellent), and I recommend you check it out for yourself. I did want to pull out a few top-level highlights, though, because I think it’s an important reminder that even the most neutral-sounding data points are often biased in hidden ways.
- Park factors matter for exit velocity, too: This might seem counterintuitive (a ball is hit hard or it isn’t, right?), but the overall point of this study, and its accompanying series, is to determine the ways in which different stadiums change exit velocity/launch angle data. Elevation and other externalities (such as whether a stadium stores its baseballs in a humidor) may change the way balls are hit, and the reality is that, as Judge puts it, “different stadiums may have different Statcast installations in different orientations in varying states of operation.” It’s a significant challenge, and while the league has made improvements in this regard, it’s an important factor to keep in the back of your mind as you digest the slew of Statcast data thrown at you each day in broadcasts and online.
- Even seemingly small park factors make a big difference: The study found park effects of 0.5 mph in either direction to be common, and found that, for individual players, the effect can be as high as 2 mph. That can equal 10 extra feet of distance. There is a lot of noise over the course of a full MLB season (players don’t play in just one stadium, of course), but this is worth keeping in mind.
- Not all batted balls are created equal: This is, I think, the most important reminder from the study. Given the number of batted-ball events that occur each season, it’s inevitable that some will not be properly recorded for one reason or another. When insufficient data is recorded, MLB’s “no nulls” policy means that an MLB statistician estimates the exit velocity/launch angle based on similar plays. MLB does not, however, denote which plays are manually inputted and which are recorded. As you can imagine, with no way to tell what is real and what is not, that creates a real analytical problem for the public.
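To put the second bullet in perspective, here is a quick back-of-the-envelope sketch in Python. It reads the figures above as roughly 2 mph of exit velocity being worth about 10 feet of distance and treats that as a constant rate. That linear scaling is my assumption for illustration, not something the study claims, and the names `FEET_PER_MPH` and `distance_shift_ft` are made up for this example.

```python
# Rule of thumb read from the bullet above: ~2 mph of exit velocity ~ 10 feet.
# Treating this as a constant linear rate is an assumption for illustration only.
FEET_PER_MPH = 10.0 / 2.0


def distance_shift_ft(exit_velo_offset_mph: float) -> float:
    """Approximate change in batted-ball distance for an exit-velocity offset."""
    return exit_velo_offset_mph * FEET_PER_MPH


if __name__ == "__main__":
    # 0.5 mph is the common park effect; 2.0 mph is the high end for one player.
    for offset in (0.5, 2.0):
        print(f"{offset:+.1f} mph -> about {distance_shift_ft(offset):+.1f} ft")
```

Under that assumption, even the "common" half-mile-per-hour park effect is worth a couple of feet, which is not nothing on a ball headed for the warning track.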
Check out the full study for a more in-depth (and, to my delight, significantly wonkier) explanation of the findings, and stay tuned for updates in the series. Also, as a brief aside, the introduction of new park elements to Statcast is another way to peel back a layer of analysis and describe what we see on the field. Exciting!
I quite like the Statcast revolution, and I find the data to be useful from an analytical perspective. As a Yankee fan, you should too–good batted ball profiles are how they found Luke Voit and DJ LeMahieu, after all. But it’s important to always recognize the limitations of the data we use here at 314ft.
This new series at Baseball Prospectus is a good reminder that even though Statcast data is useful and informative, it does not tell us everything.