What Do Means Mean?

Update: So that it does not get lost over the weekend, part two of this post will run on Sunday.

This is part one of a two-part post, but not called that (because people tend only to read the first parts). Part two is juicier and the one I want people to see: actual logical probability data analysis! But to get there we need this seemingly odd stuff first. “Did Briggs not understand the question?” Yes, he did.

The following is an edited email from reader Kip Hansen (the A and B are mine):

I keep a record of the temperatures each day on my porch, using my handy-dandy Max-Min thermometer. I write them in my journal. I know that my accuracy cannot exceed 0.5 degrees, because my thermometer is marked only in single degrees, and I estimate the temps up or down to the whole number as I see fit.

I then figure my average temperature for the day, the mathematical mean between the Max and Min. I know that both numbers, Max and Min…

Then I create a weekly average…then I create a monthly average…

A Can I claim an accuracy, represented by an error range, greater than that of the original measurement? B Can I claim that through averaging, the errors “average out”, reducing the width of what would be error bars in a graphical representation?

A No. Or maybe yes.

B No.

Thanks for writing, Kip!

Okay, let’s talk. I love these questions because they highlight nicely the difference between classical and logical probability and statistics. Foremost is the change in language: if you grew up with classical definitions, they are hard to escape. I mean, when words and phrases are used in ways not consonant with classical language the very natural tendency is to interpret the new words and phrases as if they belonged to the classical language. Be on guard for this.

Logical probability answer to A

Because it doesn’t alter the spirit, and because whole numbers are easier to work with, suppose your thermometer measured only to the nearest degree Fahrenheit. For the same reason, suppose you only measured Max temperatures, and were interested in means of the Maxes. This is (you’ll see) an equivalent problem, but it starts easier and carries less verbal baggage (it can get confusing talking of means of means).

If you took observations for just one day, the mean of the Max would be whatever the observation was. Given this observation as fixed (this clause crucial), you know the mean with certainty, and to infinite precision. That does not imply that you know the temperature to infinite precision: that you only know +/- 1, given the assumption that the error bound is strict (we trust implicitly the manufacturer).

The mean is a function. Assuming (or given: it’s always given “something”!) no errors in calculation, you will always know what the mean was exactly, and to infinite precision. Assume the first three days (leaving off the °F, which is ever present!) saw 70, 71, 71, which makes the mean precisely 70.66666…etc., with so many sixes, only God can see the end of them.

This accuracy of the observed mean is thus infinitely greater than the accuracy of the instrument. So one answer to A is yes.
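A minimal sketch of that point in Python, assuming nothing beyond the three registered readings. Exact rational arithmetic gives the observed mean as a fraction, with no rounding anywhere in the calculation. The variable names are mine, chosen for illustration.

```python
from fractions import Fraction

# The three registered Max readings, in degrees Fahrenheit.
readings = [70, 71, 71]

# Exact mean of the registered values: no floating-point rounding involved.
observed_mean = Fraction(sum(readings), len(readings))
print(observed_mean)  # 212/3
```

The fraction 212/3 is the endless 70.666… written in finite form. It says nothing yet about what the temperature really was, only what the mean of the registered numbers is.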

The proof of this was backward, and relied on hiding the assumption of potential measurement error. If there were no measurement error, then the precision of the measurement would match that of the mean. It’s only because of measurement error that the precision of the mean is greater. And I cheated (with the initial “Given this observation…”), because there’s something else hidden, too, which is important, but not to the conclusion about precision.

The measurements were not strictly 70, 71, and 71, but 70 +/- 1, 71 +/- 1, and 71 +/- 1. Now what does “70 +/- 1” mean? One interpretation is that the instrument could only register 69, 70, or 71 when the temperature really was 70, or would be 70 by rounding. Assume this. We’re still stuck because now we need to know something of the characteristics of the instrument. Does it register 70 preferentially? Or any of the three numbers equally? Since we have no other information, assume the latter.

So the mean after just the first day could have been 69 with probability 1/3, 70 with probability 1/3, and 71 with probability 1/3. That’s the answer: there’s no longer a unique number (which only comes by using whatever registered on the instrument as the basis for calculation). After the second day, the number of possibilities for the mean increases: could be (69+70)/2, (69+71)/2, (69+72)/2, and so on for the other six, all with 1/9 probability (of course, some of these 9 are duplicates with correspondingly higher probability). Again, no unique answer. In that sense, the precision is less!
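The counting can be checked by brute force. The sketch below assumes, as in the paragraph above, that a registered value could stand for any of the three neighboring whole degrees with equal probability. The function name `mean_distribution` and its `error` parameter are illustrative, not part of any library.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def mean_distribution(readings, error=1):
    """Exact distribution of the mean, assuming each registered reading r
    could stand for r-error, ..., r+error with equal probability."""
    candidates = [range(r - error, r + error + 1) for r in readings]
    combos = list(product(*candidates))  # every possible set of values
    counts = Counter(Fraction(sum(c), len(c)) for c in combos)
    total = len(combos)
    return {m: Fraction(k, total) for m, k in sorted(counts.items())}

day_one = mean_distribution([70])       # first day registered 70
two_days = mean_distribution([70, 71])  # second day registered 71

for mean, prob in two_days.items():
    print(float(mean), prob)
```

For one day this returns 69, 70, and 71 at 1/3 each. For two days the nine combinations collapse to five distinct means (69.5, 70, 70.5, 71, 71.5) with probabilities 1/9, 2/9, 1/3, 2/9, 1/9, which is where the duplicates mentioned above show up.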

But—the big but!—the probabilities are still known with higher precision. And did you notice that the problem is discrete? We’re talking quantum probability here! (As we almost always should be.) I hope you can see how this applies to the original question: the day’s mean is (Min + Max)/2, which is discovered as above.

Next: what about monthly means? How do we analyze this in practice?


  1. DAV

    An old meanie once told me the means are determined by the ends which sounds a lot like (Min + Max)/2.

  2. Ray

    I’m going to be mean and nit pick. Half way between the maximum and minimum temperature is the median temperature, not the average. To obtain the average you have to integrate the temperature with respect to time or you can approximate the integration like you learned in numerical analysis.

  3. Briggs


    Not quite a nit pick. I changed the problem, as I said. We’re trying to characterize the uncertainty we have in the average (mean) Max temperature in the presence of measurement error. It’s the measurement error which makes the average (mean) uncertain.

    We know what the numerical average of the observed data is, true. But we do not know with certainty, because of the measurement error, what the actual mean was. That’s our goal.

    Measurement error, measurement error, measurement error, measurement error.

  4. Nullius in Verba

    Mean is the sum of values divided by the number of values. Median is the value half-way down the sorted list, or midway between the two choices if there are an even number. (min+max)/2 matches either if there are two elements, but neither if there are more. You would have to invent a new term (like the “mid-range”) for it, if you wanted to use the same formula with a bigger sample.

    The article is confusing about whether it is the mean of the observed values, or the mean of the true values that is being asked for.

    The theory is that there is a true value of the temperature at each measurement time, with some joint distribution over time, and then there are limited-accuracy measurements of the true temperature made, the observed temperatures, with an error distribution that is some function of the true values. The question being asked is what does the mean of the observed values tell you about the mean of the true values (what the distribution of their difference is), and the answer is, it depends on the measurement error distribution.

    A common approximation is that the true values are uniformly distributed over some large range, and the measurement rounds the value to the nearest whole number, meaning the measurement error distribution for every measurement is uniform on the interval (-1/2, 1/2). They are also often assumed to be independent. Given this approximation, you can calculate the distribution of the difference in means (observed mean minus true mean) and you get a uniform distribution for 1 sample, a triangle distribution for two samples, and a series of ever-finer polygonal distributions approaching a Gaussian for higher numbers of samples. With these assumptions, the spread of the observed-minus-true-mean distribution gets ever narrower, and its mean is always exactly zero.

    But these assumptions are only approximate. If the distribution of the true temperatures is not uniform, neither will be the individual measurement errors. There are psychological effects around the boundary – it can depend on the height of the observer, as to where their eye line is when judging which line is closer. It depends on how often you buy a new thermometer, or recruit a new observer. It depends on how accurately the thermometer is calibrated, and whether it ‘ages’ and drifts over time. It can depend on the observer’s mood, or whether a higher temperature would constitute a record. It is literally a question of whether they are a “glass-half-full” sort of person.

    So in general, the individual measurement errors are not quite uniform, their individual means not quite zero, and the different measurements not quite independent. The result is that the error in the true mean shrinks more-or-less as expected for a while, but then comes up against these barriers and shrinks no more. Then adding more samples does *not* improve the accuracy.

    And indeed, it is easy to postulate circumstances where this applies right from the start. Suppose for example that the true temperature every day was always exactly the same. It never changed. Then you would always round in the same direction, and the error in the mean would be fixed, and would never shrink.

    In the absence of any information about the value and error distributions, you cannot rule such a situation out, and so you cannot claim that the accuracy will be better or that the error bars will shrink. If the measurement errors are independent, then the spread will shrink. If the mean of the measurement errors is also zero, this will improve the accuracy. But in practice it’s virtually certain that these assumptions are not exactly true.

    But this is referring to the distribution of the difference of observed and true values. The mean of the observed values is always (post-experiment) a simple number, known exactly. The mean of the true values has its own distribution that shrinks or not depending on whether true temperatures day to day are independent, but there’s no sense in which these are “more accurate”, they are what they are, and they’re not what the questioner was asking. The difference between them is what the question is about.
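The two scenarios in the comment above can be sketched with a small simulation. The assumptions are mine, for illustration only: in the first scenario true temperatures are drawn uniformly from 50 to 90 °F, independently each day; in the second the true temperature is fixed at 70.3 °F every day. In both, the reading is the true value rounded to the nearest whole degree. The function names are hypothetical.

```python
import random
import statistics

def mean_error(n_days, draw_true, trials=2000, seed=0):
    """Simulate (observed mean - true mean) and return its (mean, std dev).

    draw_true(rng) returns one day's true temperature; each reading is the
    true value rounded to the nearest whole degree (an assumed error model).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        truths = [draw_true(rng) for _ in range(n_days)]
        readings = [round(t) for t in truths]
        diffs.append(statistics.fmean(readings) - statistics.fmean(truths))
    return statistics.fmean(diffs), statistics.pstdev(diffs)

def varying(rng):
    return rng.uniform(50.0, 90.0)  # hypothetical range of true temperatures

def constant(rng):
    return 70.3  # hypothetical: the same true temperature every day

results = {n: (mean_error(n, varying), mean_error(n, constant))
           for n in (1, 4, 16, 64)}
for n, (vary, const) in results.items():
    print(n, vary, const)
```

Under the first set of assumptions the spread of the difference falls roughly as 1/sqrt(12 n) while its mean stays near zero. Under the second, every reading is 70, so the difference sits at about -0.3 with no spread at all, however many days are averaged.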

  5. Scotian

    Temperature is a bad example to use Briggs since it is an intensive parameter and thus the average is meaningless. There is no physically justifiable way of doing the calculation. Why not a geometric average over the arithmetic or a correction for heat capacity? You are averaging readings on a thermometer and pretending that it tells you something useful. It doesn’t. Yes I know that the climatologists are hoping that the temperature like result tells them something about trends, but I have my doubts. Sure I know what you are really trying to do, but I just think this is a bad example. Try height instead since it also varies throughout the day as well as from person to person and the average would mean something in that case.

  6. DAV

    “Temperature is a bad example to use Briggs since it is an intensive parameter and thus the average is meaningless.”

    So, saying humans have an average body temperature of 98.6F is meaningless?

  7. 98.6 for a human isn’t totally insane. You are dealing with what is effectively water. Air temp on the other hand is pretty meaningless. The specific heat for the body is fairly well bounded. The specific heat of air on the other hand…

  8. DAV

    If air temperature is meaningless you gotta wonder why so many are interested in it. It’s in every weather report.

    What seems to be forgotten here is that the post is about averaging measurements and in particular averaging the assumed values of a maximum measurement.

    “The specific heat of air on the other hand…”

    I would think it depends a lot on the density of the air which also seems well bounded.

  9. Scotian

    You are quite right DAV it is, but as a typical temperature as measured by standard methods it is a good guide as to whether you have a fever or not. Scientific terms are often misused, e.g. the constant misstatements that refer to the voltage through an object.

  10. DAV

    “it is a good guide as to whether you have a fever or not”

    Which implies that the amount of meaning depends upon context. It’s may also handy to know what the average (or expected, if you like) daily temperature will be in mid-July. knowing that it may be closer to 80F than 32F can be carry meaning.

  11. DAV

    Broken English provided at no additional cost.

  12. Scotian

    No one said that air temperature is meaningless DAV. The problem occurs when you try to calculate an average. Nobody has forgotten anything – reread my post. The specific heat of air is a strong function of humidity which is not well bounded although the absolute value is low.

  13. DAV


    No one is calculating the average temperature of the air. What is being asked (and answered) concerns the average of measurements. Not the same thing.

    “No one said that air temperature is meaningless”

    Actually, someone did.

  14. Scotian

    To quote myself, DAV: “Temperature is a bad example to use…”. This is my main point and to say that any numbers can be averaged misses the point. Feynman had a similar objection when he was reviewing public school textbooks that asked students to average the temperatures of stars. It is important to use proper examples that don’t mislead.

    He said “pretty meaningless”. Context is everything.

    Every time I write DAV I think of HAL in the movie 2001: A Space Odyssey.

  15. DAV

    Only bad if you assume the air temperature is what is being averaged otherwise it’s totally irrelevant that measurements are of temperature.

    HAL is my older brother. If you must know, DAV are my initials and the beginning of my first name — apparently, not accidentally.

  16. Scotian

    I have no idea what you are trying to say. One of us is not being clear, DAV.

  17. DAV

    The subject appears to be dealing with errors in an instrument and not what the instrument is measuring.

  18. Scotian

    If that is all you are saying, DAV, then you should go back and read my original post. When you do you will see that I have already said exactly that, but encouraged Briggs to use a better example.

    Briggs, I am beginning to see the logic of your preferred method of explaining to exhaustion. But I will never change and will continue to insist that people need to carefully read, and if necessary reread, my comment posts to understand exactly what I am saying. It is clear.

  19. DAV

    From what I think is your first post: “Temperature is a bad example to use Briggs since it is an intensive parameter and thus the average is meaningless. There is no physically justifiable way of doing the calculation.”

    This leads me to believe you misunderstood the subject. Temperatures are not being averaged. Temperature readings are. The subject is dealing with the readings and their precision.

  20. The 98.6°F is one of those statistical artifacts.

    When originally measured from a sample population on the Celsius scale to the closest degree, the data was arithmetically averaged to the closest degree — 37°C.

    I leave the conversion of that number to Fahrenheit as an exercise to those with short-term memories.
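For those short-term memories, a sketch of the conversion. It also converts the half-degree rounding interval around 37 °C, which is my own addition, to illustrate why the decimal in 98.6 suggests more precision than a figure rounded to the nearest Celsius degree carries.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

body_f = c_to_f(37)    # the rounded average itself
low_f = c_to_f(36.5)   # lower edge of "rounds to 37"
high_f = c_to_f(37.5)  # upper edge of "rounds to 37"
print(body_f, low_f, high_f)
```

On that reading, 37 °C to the nearest degree corresponds to anything from about 97.7 to 99.5 °F.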


  21. Briggs


    Warnings often go unheeded. Maybe larger government is a good thing. On the off chance it will be read, here it is again:

    Foremost is the change in language: if you grew up with classical definitions, they are hard to escape. I mean, when words and phrases are used in ways not consonant with classical language the very natural tendency is to interpret the new words and phrases as if they belonged to the classical language. Be on guard for this.

  22. JH

    Briggs, don’t philosophical discussions need to have correct definitions of terms? Knowing the definition of precision in this case will help show that you do understand what you are criticizing.

  23. BioBob

    Should you be tempted to imagine that you are actually comparing many daily air temps, consider that each day’s temperature SET is NOT drawn from the same population as ANY OTHER day’s temperature SET and therefore supposedly only non-parametric stats may be employed.

    The probability that any particular day’s temperature population is the same as another is very low, depending on the precision of your measurements.

    Fun and games with chaos !

  24. Kip Hansen

    It is my stupid question in the first place.

    Here’s what I have discovered so far: if I had wanted a practical everyday language sort of answer, I should have asked a “practician”, not a statistician. Nullius in Verba seemed to be homing in on answering my lamentably pedestrian question, but somehow ended up losing me.

    I understand that if I measure the temperature on my porch simultaneously with 12 different thermometers, each measurement with an assumed random error of + 0.5 or – 0.5, then when I average these results, I will stand a good chance of having an answer that is closer to the true temperature (with a narrower range of error) than any one of the individual measurements at that instant.

    Major Briggs? Someone, anyone, please? Yes or no (without too many well….maybe’s) 🙂

    Now, if you had asked me the following: So, given that, if I measure the temperature at noon every day this week, and average the 7 numbers, do I arrive at a number (result) that is closer to the true average temperature (with a narrower range of error) than any one of the individual day measurements? I would have said no, you can not get a meaningful practical result that is any closer than the original error measurement which was X+0.5 to X-0.5–a one degree spread.

    Any takers? (or am I getting sillier and sillier?)

    I have learned though that I must narrow the definition of the “average” to be something more along the lines of the “average of the noontime temperatures as observed on Kip’s porch the second week of October 2013 with a certain uncalibrated “CorrectTemp” thermometer scaled at single degrees Fahrenheit…..” (the details might go on for a while).


    Kip Hansen

  25. Briggs


    See part two. Assume instead of “30 days” (the words I used) there were “30 thermometers”, and the results are exactly the same.

    Of course, for ease of explanation I used +/- 1, but you can see how it works for +/- 0.5.

  26. Kip Hansen

    Nullis — R U Out There? Ground Control to Major Nullis….

    I think I’m getting somewhere. Major Briggs (our erstwhile Space Oddity statistician — h/t David Bowie) says “After the second day, the number of possibilities for the mean increases: could be (69+70)/2, (69+71)/2, (69+72)/2, and so on for the other six, all with 1/9 probability (of course, some of these 9 are duplicates with correspondingly higher probability). Again, no unique answer. In that sense, the precision is less!”

    There is no unique answer to the question of mean given measurement error — but there are precise probabilities about what the mean might have been.

    After ONE day, given observed “70”, measurement error +/- 1, the mean is one of the following, equally probable 69, 70, or 71. To correctly display this on an x-y axis graph, x=time, y=degrees F, we would draw a vertical line at x=today from 69 to 71 (OK, now I’m cheating — strictly, given our givens, we would put dots at 69, 70 and 71).

    Have I got this right so far?

  27. Briggs


    Well, to display for one day you’d have spikes at 69, 70, & 71 (x-axis) all at 1/3 probability (y-axis).

    Or if you want +/- 0.5, then 69.5, 70, 70.5, etc. See also the comments I made to JH in part two. Other assumptions about measurement error characteristics are possible.

  28. Kip Hansen

    Major Briggs, regarding “Assume instead of ‘30 days’”: it is my cherished understanding (sympathy, please) that I can only get rid of that pesky +/- 1 measurement error by averaging when making multiple measurements of exactly one same thing at one point in time. This then implies that each daily measurement is distinctly different, in time certainly, and can have no effect on the next day’s measurement, including a reduction of measurement error through division or probability.

    My ONE day mean is certainly 69, or 70, or 71, equally probable. My ONE day actual real world temperature is unknown, but assumed to be in there somewhere.

    The mean of my second day measurement, which was observed to be 67, is certainly 66 or 67 or 68, equally probable.

    The mean of the TWO DAYS (66+69)/2, (66+70)/2, (66+71)/2, (67+69)/2, (67+70)/2, (67+71)/2 (68+69)/2, (68+70)/2, (68+71)/2, all with 1/9 probability (of course, some of these 9 are duplicates with correspondingly higher probability) — 67.5, 68, 68.5, 68, 68.5, 69, 69, 69, 69.5. This gives us equal probability spread of 68 or 68.5 or 69, with 1/2 as much probability for each of 67.5 or 69.5.

    This gives me a mean that is 2 whole degrees wide at equal probabilities, which is the same as the ONE day spread. (Do I dare ! ?)

    Now, if I slavishly go ahead and do the fourth day, using daily observations with a 5 degree spread over those days for reality, will I find the SAME 2 degree spread of mean with equal probability?
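The two-day tally in the comment above can be checked mechanically. This sketch works under the same assumption as the post (each registered reading could stand for any of three neighboring whole degrees, equally likely) and builds the distribution one day at a time, so it extends to a third or fourth day without listing every combination by hand. Note that (68+69)/2 is 68.5, so the code’s tally puts 3/9 on 68.5 and 2/9 on each of 68 and 69. The four-day readings are hypothetical, chosen only to span five degrees; the function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

def sum_distribution(readings, error=1):
    """Distribution of the sum of the possible values, added day by day."""
    dist = {0: Fraction(1)}
    step = Fraction(1, 2 * error + 1)  # equal weight on each candidate value
    for r in readings:
        nxt = defaultdict(Fraction)
        for total, p in dist.items():
            for value in range(r - error, r + error + 1):
                nxt[total + value] += p * step
        dist = dict(nxt)
    return dist

def mean_distribution(readings, error=1):
    """Distribution of the mean: the sum distribution rescaled by 1/n."""
    n = len(readings)
    sums = sum_distribution(readings, error)
    return {Fraction(s, n): p for s, p in sorted(sums.items())}

two = mean_distribution([70, 67])           # the two days in the comment
four = mean_distribution([70, 67, 72, 69])  # hypothetical four days

for mean, prob in two.items():
    print(float(mean), prob)
print(float(min(four)), float(max(four)), max(four.values()))
```

On these assumptions the range of possible means stays 2 degrees wide no matter how many days are added, but the probabilities are not equal across it: they pile up toward the middle as days accumulate.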

  29. Kip Hansen

    Briggs — If I were the professor and you were the student, I’d make you do at least the third day…so give me an hour or so and I’ll get back with a three day example and answer my own question, I think.

    I predict that there might be some simple twist of fate (long division) that will make my answer different.

  30. Briggs


    Have you seen part two yet?

  31. Kip Hansen

    Correction — for 2 days that gives me 2 equally probable points with a 1 degree spread — narrower than 2 degrees. (Note to students: If your answer is what you wanted to see, double check yourself!)

  32. Kip Hansen

    Alas, now with three days, we find we have 3 points with equal probability, each of 2/9–>69.3s, 69.6s, and 70. There are also two points (69 and 70.3s) each with a 1/9th probability, and two more (68.6s and 70.6s) at 1/27th.

    So, our two edges are still 2 degrees apart, and our area of equal probability is still 2/3 of the whole width. The difference is that the numerical difference between the lowest and highest figure of the equal probability area (which I suppose has a special name) is NARROWER: now only 0.666s (whereas at 2 days, it was 1.0 degree wide).

    Am I getting anywhere?

  33. Kip Hansen

    Major — I am trying to wrap my head around the introduction first — Part Two warns me of dire consequences if I fail to do so….

    I admit to have thrashed my way through Dicing With Death twice so far, and can apply it generally to large scale ideas, but I’m having trouble with this one!

  34. Kip Hansen

    Between now and tomorrow, I am going to hand-compute the values for four days and see if such punishment enlightens me….I have read Part Two, couldn’t resist even if it meant a ruler to the knuckles, but what I am hoping to get for myself is an answer I could take into the engineering world with confidence.
