*We continue our Thanksgiving week of pleasant stories, today with a guest post by Jim Fedako.*

“‘When I use a word,’ Humpty Dumpty said in rather a scornful tone, ‘it means just what I choose it to mean – neither more nor less.’” (Through the Looking-Glass)

Ah, the Sports Illustrated (SI) Curse.

Billy Batter has an almost unbroken string of hits late one August. The editors at SI, taking a short break from opining on all things woke, notice his performance and choose him as their next cover story. While stats and story seem to show Billy soaring to epic heights, his hitting slumps, relatively, as soon as the magazine hits the stands.

Fans and commentators alike blame souring slugging on the Curse, the one that seemingly takes down every athlete who graces an SI cover. Some say the Curse is real, others that it has a psychological explanation. The more scientifically-minded observers claim, instead, statistics has the explanation – the regression to the mean. This assumes an existential mean with a gravitational power that pulls deviant series of events back toward that persistent mean.

Select a sample of data. If the sample mean strays far from the actual mean, the means of subsequent samples will likely be closer to the actual mean. So we say the subsequent samples regressed to the mean. This pseudo-force is regularly observed in situations where a mean is known or can be closely calculated – series of dice rolls on a craps table, for example. However, means in a sporting context are something completely different.
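The dice case can be sketched in a few lines. This is only an illustration under stated assumptions: a fair six-sided die (so the actual mean is known to be 3.5), samples of ten rolls, and an arbitrary cutoff of 4.5 for what counts as a sample that "strays far." The function names, cutoff, and seed are made up for the example.

```python
import random

def sample_mean(rng, n_rolls):
    """Mean of n_rolls throws of a fair six-sided die (assumed fair)."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

def followup_after_extremes(n_pairs=20000, n_rolls=10, threshold=4.5, seed=0):
    """Draw pairs of independent samples and keep the pairs whose first
    sample mean is at or above the (arbitrary) threshold.

    Returns the average first-sample mean, the average follow-up mean,
    and the number of pairs kept.
    """
    rng = random.Random(seed)
    firsts, seconds = [], []
    for _ in range(n_pairs):
        first = sample_mean(rng, n_rolls)
        second = sample_mean(rng, n_rolls)
        if first >= threshold:
            firsts.append(first)
            seconds.append(second)
    avg_first = sum(firsts) / len(firsts)
    avg_second = sum(seconds) / len(seconds)
    return avg_first, avg_second, len(firsts)

if __name__ == "__main__":
    avg_first, avg_second, kept = followup_after_extremes()
    print(f"pairs kept: {kept}")
    print(f"average of the selected 'hot' samples: {avg_first:.3f}")
    print(f"average of the samples that followed:  {avg_second:.3f}")
```

Nothing pulls the second sample anywhere. The follow-up samples land near 3.5 only because every sample does; the first ones looked unusual because they were selected for being unusual. The sketch works because the die's mean is fixed and known in advance, which is exactly what is in question for a hitter.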

If asked to defend the regression to the mean explanation of the SI Curse, Humpty Dumpty would likely say, “When I use the word ‘mean,’ it means just what I choose it to mean – neither more nor less.” And that’s a lot of means.

However, what is the mean the regression spies in the distance, pulling athletic performance its way? Is it a never-changing, existential attribute of every major league hitter? Sort of like a video game where each player has a halo that blinks their lifetime mean. Or is it a loose concept understood in its looseness – “The mean is simply the expected performance of the hitter based on some recent period of at-bats.” In this, we know what the speaker is saying, or trying to say. But this is far from a rigorous definition of a mean.

Consider an example: Sammy Slugger recently graduated high school with a batting average of .350 during his senior year. He went straight to the minors, struggling his first few months to achieve .245. Was this reduction in hitting prowess a regression to his mean? Or was it the effect of a young professional struggling against other desperate pros? If that is true, and it’s likely, what is his existential mean – the lifetime mean inferred but never defined?

Now, assume Sammy works hard and reaches .290 by the end of the season. Was this a regression back toward his senior year mean, or lifetime mean, or some other mean? What is to be said when his batting average continues improving during his second year in the minors, only to dip after being called up to the majors?

Regressions seem to be going in all directions, searching for a mean that appears to be elusive and ever changing. Means in a sporting context are asserted to be both existential and persistent – an attribute assigned at birth, one that accounts for all at-bats from dust to dust, so to speak. So we would never know a specific player’s mean, nor be able to identify a regression toward it, until a player draws his last breath. Unless, of course, all deviations from recent averages are assumed waiting to be regressed.

But this is neither satisfying nor strict.

The regression to the mean explanation of the Sports Illustrated Curse is nothing but a tautology. It is neither real nor distinct. Oh, sure, it’s a concept, but that’s it. Your only challenge is to say, “The mean, and its associated regressions, mean what I mean them to mean – neither more nor less.”

Note: None of the above is a refutation of the observed exceptional week followed by a more standard one. It is just calling into question the inexact language used as if it were a rigorous definition. The use of the regression to the mean in terms of the Curse is analogous to a colloquial use of probability. Both may convey meaning, but neither is technically correct.

Agreed.

In other words, regression to the mean is not “an explanation” – i.e. it does not describe a cause of differences.

Because it assumes that variation is random, when in real life there is (strictly) no randomness, and most variation is systemic – due to real differences in causes that affect outcomes.

In a sense, regression to the mean is just an artefact of biased sampling procedures. In the best real science, RTTM is not a problem, because real science does not use statistical sampling to infer causality.

https://iqpersonalitygenius.blogspot.com/2017/12/regression-to-mean-and-iq.html

Very nice!

Now, how about helping me understand something? (I know.. 12 labors and a rock etc); however..

Many governments run lotteries. One of these, in Canada, is called ‘6/49’ – details here:

https://www.lottodatabase.com/lotto-database/canadian-lotteries/lotto-649/details

Basically you pick 6 positive integers under 50 and win cash if the numbers drawn match three or more of your choices. The website cited above includes a basic freq analysis suggesting that the numbers 1 11 13 14 18 28 have been drawn less often than the others. (eyeball estimate: 480 times for these vs 500 for the rest).

Since each number has an equal chance of being chosen why wouldn’t buying tickets using only these low frequency numbers pay off? i.e. what’s the causal link to regression to the mean here?
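One way to poke at this question is a small simulation. It is a sketch, not a model of the real lottery: it assumes each draw picks 6 of the 49 numbers uniformly and independently of every earlier draw, it ignores the bonus number, and the 4,000-draw history length, the seed, and the function names are invented for the example.

```python
import random
from collections import Counter

def draw_counts(rng, n_draws):
    """Count how often each number 1..49 appears across n_draws draws,
    each draw being 6 distinct numbers chosen uniformly (assumed)."""
    counts = Counter({n: 0 for n in range(1, 50)})
    for _ in range(n_draws):
        counts.update(rng.sample(range(1, 50), 6))
    return counts

def cold_number_check(n_draws=4000, seed=1):
    """Find the six least-drawn numbers in a simulated history, then see
    how often those same numbers come up in a fresh run of draws.

    Rates are per number per draw; under the assumptions the long-run
    rate for any number is 6/49.
    """
    rng = random.Random(seed)
    history = draw_counts(rng, n_draws)
    cold = sorted(history, key=history.get)[:6]
    future = draw_counts(rng, n_draws)
    past_rate = sum(history[n] for n in cold) / (6 * n_draws)
    future_rate = sum(future[n] for n in cold) / (6 * n_draws)
    return cold, past_rate, future_rate

if __name__ == "__main__":
    cold, past_rate, future_rate = cold_number_check()
    print("cold numbers in the history:", sorted(cold))
    print(f"their past rate:   {past_rate:.4f}")
    print(f"their future rate: {future_rate:.4f}")
    print(f"6/49:              {6 / 49:.4f}")
```

Under these assumptions the "cold" numbers are cold only in hindsight. Their future rate sits near 6/49 like every other number's, neither lower nor higher, because the draws carry no memory of the history. The overall frequencies drift toward equality only because new draws dilute the old gap, not because anything makes up for it.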

Of course we can’t know the mean of an entire career until the career is over. I always took “regression to the mean” to mean the mean up to that time in that season. So if Sammy’s batting .350 so far during his senior season, then he has a better week, the mean he’d be regressing to after that would be .350. Whatever. I don’t even watch baseball.

Fedako, dude, that is one *mean* post.

What is the difference to 100% for ‘unemployment claims unexpectedly plunge’, in words? And sufficiently precise?

The average (mean) isn’t real. It’s imaginary. You can think of it, and calculate it, but you can’t see, hear, touch or taste it. The average never actually occurs in reality.

Suppose Billy Batter is hitting .333. He cannot hit his average in his next at bat, because you can’t get 1/3 of a hit. Billy might get a hit, a whole hit, or strike out. In either case his average will change. He may be regressing towards something, or progressing, but not to his average, which he can never achieve, not in any one at bat.
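The arithmetic can be checked directly. The 100-hits-in-300-at-bats record below is an assumed example of a .333 hitter, not a figure from the post, and `next_averages` is a name made up for the sketch.

```python
from fractions import Fraction

def next_averages(hits, at_bats):
    """Return the batting average after one more at-bat, first if it is
    a hit and then if it is an out, as exact fractions."""
    after_hit = Fraction(hits + 1, at_bats + 1)
    after_out = Fraction(hits, at_bats + 1)
    return after_hit, after_out

if __name__ == "__main__":
    hits, at_bats = 100, 300  # assumed record for a .333 hitter
    current = Fraction(hits, at_bats)
    after_hit, after_out = next_averages(hits, at_bats)
    print(f"current:   {float(current):.4f}")
    print(f"after hit: {float(after_hit):.4f}")
    print(f"after out: {float(after_out):.4f}")
```

With that record, a hit moves the average to 101/301 and an out moves it to 100/301, so either way it leaves .333 exactly.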