Update: So that it does not get lost over the weekend, part two of this post will run on Sunday.
This is part one of a two-part post, but not called that (because people tend to read only the first parts). Part two is juicier and the one I want people to see—actual logical probability data analysis! But to get there we need this seemingly odd stuff first. “Did Briggs not understand the question?” Yes, he did.
The following is an edited email from reader Kip Hansen (the A and B are mine):
I keep a record of the temperatures each day on my porch, using my handy-dandy Max-Min thermometer. I write them in my journal. I know that my accuracy cannot exceed 0.5 degrees, because my thermometer is marked only in single degrees, and I estimate the temps up or down to the whole number as I see fit.
I then figure my average temperature for the day, the mathematical mean between the Max and Min. I know that both numbers, Max and Min…
Then I create a weekly average…then I create a monthly average…
A Can I claim an accuracy (represented by an error range) greater than that of the original measurement? B Can I claim that through averaging, the errors “average out”, reducing the width of what would be error bars in a graphical representation?
A No. Or maybe yes.
Thanks for writing, Kip!
Okay, let’s talk. I love these questions because they highlight nicely the difference between classical and logical probability and statistics. Foremost is the change in language: if you grew up with classical definitions, they are hard to escape. I mean, when words and phrases are used in ways not consonant with classical language the very natural tendency is to interpret the new words and phrases as if they belonged to the classical language. Be on guard for this.
Logical probability answer to A
Because it doesn’t alter the spirit, and because whole numbers are easier to work with, suppose your thermometer measured only to the nearest degree Fahrenheit. For the same reason, suppose you only measured Max temperatures, and were interested in means of the Maxes. This is (you’ll see) an equivalent problem, but it starts easier and carries less verbal baggage (it can get confusing talking of means of means).
If you took observations for just one day, the mean of the Max would be whatever the observation was. Given this observation as fixed (this clause crucial), you know the mean with certainty, and to infinite precision. That does not imply that you know the temperature to infinite precision: the temperature you know only to +/- 1, given the assumption that the error bound is strict (we trust the manufacturer implicitly).
The mean is a function. Assuming—or given: it’s always given “something”!—no errors in calculation, you will always know what the mean was exactly, and to infinite precision. Assume the first three days—leaving off the °F, which is ever present!—saw 70, 71, 71, which makes the mean precisely 70.66666…etc., with so many sixes only God can see the end of them.
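To see that the mean of the registered numbers is known exactly, here is a minimal sketch (the readings 70, 71, 71 are from the example above; using Python's exact rational arithmetic so no rounding sneaks in):

```python
from fractions import Fraction

# The three registered Max temperatures (degrees F), taken as fixed.
observations = [70, 71, 71]

# The mean as an exact rational number: 212/3, no rounding anywhere.
mean = Fraction(sum(observations), len(observations))

print(mean)         # 212/3: known to "infinite precision"
print(float(mean))  # a decimal approximation, printed only for convenience
```

The point of `Fraction` is that the mean of fixed whole-number observations is a perfectly definite rational number; only converting to decimal forces us to truncate the sixes somewhere.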
This accuracy of the observed mean is thus infinitely greater than the accuracy of the instrument. So one answer to A is yes.
The proof of this was backward, and relied on hiding the assumption of potential measurement error. If there were no measurement error, then the precision of the measurement would match that of the mean. It’s only because of measurement error that the precision of the mean is greater. And I cheated (with the initial “Given this observation…”), because there’s something else hidden, too, which is important, but not to the conclusion about precision.
The measurements were not strictly 70, 71, and 71, but 70 +/- 1, 71 +/- 1, and 71 +/- 1. Now what does “70 +/- 1” mean? One interpretation is that the instrument could only register 69, 70, or 71 when the temperature really was 70, or would be 70 by rounding. Assume this. We’re still stuck because now we need to know something of the characteristics of the instrument. Does it register 70 preferentially? Or any of the three numbers equally? Since we have no other information, assume the latter.
So the mean after just the first day could have been 69 with probability 1/3, 70 with probability 1/3, and 71 with probability 1/3. That’s the answer: there’s no longer a unique number (which only comes by using whatever registered on the instrument as the basis for calculation). After the second day, the number of possibilities for the mean increases: could be (69+70)/2, (69+71)/2, (69+72)/2, and so on for the other six, all with 1/9 probability (of course, some of these 9 are duplicates with correspondingly higher probability). Again, no unique answer. In that sense, the precision is less!
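The enumeration just described can be checked by brute force. A sketch, under the assumptions above (registered reading r stands for true temperature r-1, r, or r+1, each with probability 1/3; the first two days registered 70 and 71):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def possible_truths(reading):
    # Instrument model assumed above: each registered reading stands for
    # one of three true temperatures, equally probable.
    return [reading - 1, reading, reading + 1]

readings = [70, 71]  # the first two days' registered Maxes

# Enumerate all 3 x 3 combinations of true temperatures, each with
# probability 1/9, and accumulate the probability of each possible mean.
dist = Counter()
for combo in product(*(possible_truths(r) for r in readings)):
    mean = Fraction(sum(combo), len(combo))
    dist[mean] += Fraction(1, 9)

for m in sorted(dist):
    print(float(m), dist[m])
```

The duplicates mentioned above show up automatically: for example, (69+72)/2 and (70+71)/2 both give 70.5, so that mean carries probability 3/9. The probabilities sum to 1 exactly, which is the sense in which they are known with full precision even though the mean itself is not a unique number.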
But—the big but!—the probabilities are still known with higher precision. And did you notice that the problem is discrete? We’re talking quantum probability here! (As we almost always should be.) I hope you can see how this applies to the original question: the day’s mean is (Min + Max)/2, which is discovered as above.
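The same enumeration applies directly to Kip's daily mean (Min + Max)/2. A sketch with hypothetical readings (Min registered 55, Max registered 70 — numbers invented for illustration; same instrument model as before):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical day's registered readings; each stands for reading-1,
# reading, or reading+1, each with probability 1/3.
min_reading, max_reading = 55, 70

dist = Counter()
for lo, hi in product(range(min_reading - 1, min_reading + 2),
                      range(max_reading - 1, max_reading + 2)):
    daily_mean = Fraction(lo + hi, 2)
    dist[daily_mean] += Fraction(1, 9)

# The day's mean is not one number but a discrete distribution over
# (55+70)/2 plus or minus 1, in half-degree steps.
for m in sorted(dist):
    print(float(m), dist[m])
```

Again the problem is discrete: the possible daily means fall on a grid of half-degrees, each carrying a known, exact probability.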
Next: what about monthly means? How do we analyze this in practice?