Our friends at Brock for Broglio.com give us a look at when sample sizes become relevant.
Here is a link to the article. The article analyzes when a sample becomes large enough to tell us something about a player's underlying skills. Of course the authors are not saying that these thresholds are immutable, only that at some point the numbers pass from being too small to be useful at all to at least giving us an idea of where the underlying skills will settle.
A few of the figures are telling. The figure for HR rate is 300 PA. I suspect that there are confounding factors at play here. For example, 300 PA may be enough for a player who hits only a few home runs, but for someone like Ryan Howard there will be far more variability even in 300 PA. We can be fairly sure that Juan Pierre's HR rate will be within a narrow range after 300 PA. But for a guy like Howard it could be 13% or it could be 25%, and we really wouldn't be sure exactly where it would regress to, if at all.
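The intuition about Pierre and Howard can be checked with a simple binomial model. This is a minimal sketch, assuming every PA is an independent trial with a fixed true HR rate; the two rates used (0.005 and 0.07 HR per PA) are hypothetical stand-ins, not either player's actual numbers, and the function name `binomial_sd` is just illustrative. The normal approximation behind the "two SD" range is crude at very low rates, so treat the low-power band as rough.

```python
import math

def binomial_sd(rate: float, n: int) -> float:
    """SD of an observed rate over n trials, assuming each PA is an
    independent trial with the same true rate (a simplification)."""
    return math.sqrt(rate * (1.0 - rate) / n)

# Hypothetical true HR/PA rates, for illustration only.
LOW_POWER_RATE = 0.005   # a Pierre-type hitter
HIGH_POWER_RATE = 0.07   # a Howard-type slugger
PA = 300

for label, rate in [("low power", LOW_POWER_RATE), ("high power", HIGH_POWER_RATE)]:
    sd = binomial_sd(rate, PA)
    # About 95% of observed rates land within roughly two SDs of the true rate;
    # the lower bound is clamped at zero because a rate cannot be negative.
    low, high = max(0.0, rate - 2 * sd), rate + 2 * sd
    print(f"{label}: HR/PA {rate:.3f}, SD {sd:.4f}, ~95% range {low:.3f}-{high:.3f}")
```

With these inputs the high-power hitter's SD comes out around 0.015 against roughly 0.004 for the low-power hitter, so in absolute terms his observed HR rate wanders over a much wider band after the same 300 PA.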
It is often said that even a full season is not enough to tell us what a player can do. For OBP and SLG it takes 500 plate appearances for the numbers even to become relevant, which is fairly close to a full season for many players, so the old adage apparently carries some truth. The reason is that small differences in counts have a large effect on the rates. To take a simple example, over 500 PA ten hits are worth 20 points of OBP (10/500 = .020). Yet if you treat each PA as a coin flip weighted by the player's true OBP, one standard deviation of pure chance over 500 PA is itself about ten times on base, so a ten-hit gap sits within ordinary noise. And a difference of ten hits is less than one per week. So the article really shows us that, all else being equal, you cannot judge OBP or SLG in much less than a full season, and this comports with logical analysis.
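The arithmetic in the paragraph above can be reproduced in a few lines. This is a sketch under a simple binomial assumption (each PA an independent trial with a fixed on-base probability); the true OBPs looped over are hypothetical, and `times_on_base_sd` is an illustrative name, not anything from the article.

```python
import math

PA = 500
HIT_GAP = 10

# Ten extra times on base over 500 PA shifts OBP by 10/500 = .020 ("20 points").
obp_shift = HIT_GAP / PA

def times_on_base_sd(obp: float, pa: int) -> float:
    """SD of times on base over pa trials, treating each PA as an independent
    Bernoulli trial with success probability obp (a simplifying assumption)."""
    return math.sqrt(pa * obp * (1.0 - obp))

for true_obp in (0.300, 0.350, 0.400):  # hypothetical true-talent OBPs
    sd = times_on_base_sd(true_obp, PA)
    print(f"true OBP {true_obp:.3f}: SD = {sd:.1f} times on base "
          f"= {sd / PA * 1000:.0f} points of OBP")

print(f"a {HIT_GAP}-hit gap = {obp_shift * 1000:.0f} points of OBP")
```

Across that range of OBPs the chance-only SD stays between roughly 10 and 11 times on base, which is about the same size as the ten-hit gap.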
Within the context the article presents, namely that the figures are not meant as true evaluations but only as markers of when the numbers become meaningful, the research holds up; it is not meant to be an in-depth study of the actual figures.
The pitching analysis is even more interesting. Essentially it confirms that for most pitchers a full season tells us very little. BB rate and K/BB rate do not become relevant at all until 500 batters faced (BF). Even after 700 BF you won't know much beyond what type of contact hitters make against a pitcher (GB, LD or FB), plus a general idea of his K/BB rate.
It is a good read and is typical of their good work.

1 Brian Joura // Apr 21, 2008 at 10:49 am
David Luciani wrote an article a couple of years ago on contact rate and how he could use spring training stats to adjust his forecasts for some players, because most guys got enough ABs in the spring to make meaningful comparisons. I think he started with an estimate, such as player A having a contact rate of 70%. From there he figured out the standard deviation, and for each AB threshold he had a 95% range around that estimate within which the player's observed rate could still plausibly fall. Only when the player moved outside that 95% range (in either direction) would he change his forecast.
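The procedure described above can be sketched roughly as follows. This is not Luciani's actual method, only a minimal reconstruction from the description, assuming a normal approximation to the binomial and treating each AB as an independent trial; the 70% forecast comes from the comment, while the AB thresholds, the 44-for-50 example, and the function names `contact_band` and `should_revise` are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the standard normal

def contact_band(expected_rate: float, at_bats: int) -> tuple[float, float]:
    """Approximate 95% range for an observed contact rate over at_bats,
    using the normal approximation to the binomial."""
    sd = math.sqrt(expected_rate * (1.0 - expected_rate) / at_bats)
    return expected_rate - Z_95 * sd, expected_rate + Z_95 * sd

def should_revise(expected_rate: float, contacts: int, at_bats: int) -> bool:
    """True when the observed rate falls outside the 95% band in either direction."""
    low, high = contact_band(expected_rate, at_bats)
    observed = contacts / at_bats
    return observed < low or observed > high

# Hypothetical player forecast at a 70% contact rate.
forecast = 0.70
for ab in (25, 50, 100, 200):
    low, high = contact_band(forecast, ab)
    print(f"{ab:>3} AB: keep forecast if contact rate is between {low:.3f} and {high:.3f}")

print(should_revise(forecast, contacts=44, at_bats=50))  # 88% observed -> True
```

The band narrows as ABs accumulate (at 50 AB it runs from about .573 to .827), which is why a large enough deviation can justify revising a forecast well before a "stabilization" threshold is reached.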
If Luciani's premise is correct, it seems like you could tell that a player was over- or underperforming his projection much earlier than this article claims.
To use an extreme example, we had to know well before 300 plate appearances in 1996 that Brady Anderson's HR rate had really risen above his forecast coming into the season.