Update: I see that I failed below to demonstrate the ubiquity of the problem. So your homework is to search “testing trend time series” and similar terms and discover it for yourself. Any kind of hypothesis test used on a time series counts.
My impetus was in reading an article about a paper some colleagues and I wrote about atmospheric ammonia. The author wrote, “The statistical correlation between hourly ammonia concentrations between measurement stations is weak due to large variability in local agricultural practice and in weather conditions. If data are aggregated to longer time scales, correlations between stations clearly increase due to the removal of noise at the hourly timescale.”
There’s the belief in “noise”, which does not exist, and there’s also the second (bigger) mistake, which is measuring the correlation of time series after smoothing, which increases the correlation (in absolute value), as has been proved here and in Uncertainty many, many, many times. This happens even for two strings of absolutely unrelated, made-up numbers. Try it yourself.
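You can try it yourself in a few lines. Here is a minimal sketch (all numbers made up, as advertised): generate pairs of totally unrelated random series, smooth each with a running mean, and compare the absolute correlations before and after. The window and series lengths are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def running_mean(x, w):
    # Simple running-mean smoother with window w
    return np.convolve(x, np.ones(w) / w, mode="valid")

# Two strings of absolutely unrelated, made-up numbers, many times over
n_trials, n, w = 200, 500, 50
raw, smoothed = [], []
for _ in range(n_trials):
    x = rng.normal(size=n)
    y = rng.normal(size=n)
    raw.append(abs(np.corrcoef(x, y)[0, 1]))
    smoothed.append(abs(np.corrcoef(running_mean(x, w), running_mean(y, w))[0, 1]))

print(f"mean |correlation|, raw series:      {np.mean(raw):.3f}")
print(f"mean |correlation|, after smoothing: {np.mean(smoothed):.3f}")
```

The smoothed series are strongly autocorrelated, which drastically reduces the effective number of independent points, so the absolute correlation between two unrelated series inflates.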
So you just look for mentions of “noise” in stock prices, and so on and see if I’m right about the scale of the problem.
Two weeks ago the high temperature on the wee island upon which I live was 82F (given my extreme sloth, I am making all details up).
Now for the non-trick question: What was the high temperature experienced by those who went out and about on that day?
If you are a subscriber to the signal+noise form of time series modeling, then your answer might be 78F, or perhaps 85F, or even some other figure altogether. But if you endorse the signal form of time series modeling, you will say 82F.
Switch examples. Three days back, the price of the Briggs Empire stock closed at $52 (there is only one share). Query: what was the cost of the stock at the close of the day?
Signal+noise folks might say $42.50, whereas signal people will say $52.
Another example. I was sitting at the radio AM DXing, pulling in a station from Claxton, Georgia, WCLA 1470 AM. The announcer came on and through the heavy static I thought I heard him give the final digit of a phone number as “scquatch”, or perhaps it was “hixsith”.
Here are two questions: (1) What number did I hear? (2) What number did the announcer say?
The signal+noise folks will hear question (1) but give the answer to (2) (they will answer (2) twice), whereas the signal folks will answer (1) with “scquatch or hixsith”, and answer (2) by saying, “Hey signal+noise guys, a little hand here?”
We have three different “time series”: temperature, stock price, radio audio. It should be obvious that everybody experiences the “numbers” or “values” of each of these series as they happen. If it is 82F outside, you feel the 82F and not another number (and don’t give me grief about fictional “heat indexes”); if the price is $52, that is what you will pay; if you hear “scquatch”, that is what you hear. You do not experience some other value to which ignorable noise has been added.
For any time series (and “any” includes our three), some thing or things caused each value. A whole host of physical states caused the 82 degrees; the mental and monetary states of a host of individuals caused the $52; a man’s voice plus antenna plus myriad other physical states (ionization of certain layers of the atmosphere, etc.) caused “scquatch” to emerge from the radio’s speakers.
In each case, if we knew—really knew—what these causes were, we would not only know the values, which we already knew because we experienced them, but we could predict with certainty what the coming values would be. Yet this list of causes will really only be available in artificial circumstances, such as simulations.
Of the three examples, there was only one in which there was a true signal hidden by “noise”, where noise is defined as that which is not signal. Temperature and stock price were pure signal. But all three are routinely treated in time series analysis as if they were composed of signal+noise. This mistake is caused by the Deadly Sin of Reification.
No model of any kind is needed for temperature and stock price; yet models are often introduced. You will see, indeed it is vanishingly rare not to see, a graph of temperature or price over-plotted with a model, perhaps a running-mean or some other kind of smoother, like a regression line. Funny thing about these graphs, the values will be fuzzed out or printed in light ink, while the model appears as bold, bright, and thick. The implication is always that the model is reality and values a corrupted form of reality. Whereas the opposite is true.
The radio audio needs a model to guess what the underlying reality was given the observed value. We do not pretend in these models to have identified the causes of the reality (of the values), only that the model is conditionally useful in putting probabilities on possible real values. These models are seen as correlational, and nobody is confused. (Actual models, depending on the level of sophistication, may have causal components, but since the number of causes will be great in most applications, these models are still mostly correlational.)
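A toy version of such a model: put a probability on each digit the announcer might have said, given the garbled string that was heard. Everything here is an assumption made for illustration — the uniform prior over digits and the use of string similarity as a stand-in likelihood are both inventions, not anyone’s real audio model.

```python
from difflib import SequenceMatcher

digits = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def likelihood(heard, word):
    # Toy likelihood: string similarity between what was heard and a digit name
    return SequenceMatcher(None, heard, word).ratio()

def posterior(heard):
    # Uniform prior over digits; normalize the toy likelihoods
    like = [likelihood(heard, d) for d in digits]
    total = sum(like)
    return {d: l / total for d, l in zip(digits, like)}

post = posterior("hixsith")
best = max(post, key=post.get)
print(f"most probable digit: {best} (p = {post[best]:.3f})")
```

The point is the shape of the answer: the model returns probabilities over possible real values, not a claim to have found the causes of the static.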
We agreed there will be many causes of temperature and stock price values. One of the causes of temperature is not season—how could the words “autumn” cause a temperature?—though we may condition on season (or date) to help us quantify our uncertainty in values. Season is not a cause, because we know there are causes of season, and that putting “season” (or date) into a model is only a crude proxy for knowledge of these causes.
Given an interest in season, we might display a model which characterizes the average (or some other measure) of uncertainty we might have in temperature values by season (or date), and from this various things might be learned. We could certainly use such a model to predict temperature. We could even say that our 82F was a value so many degrees higher or lower than some seasonal measure. But that will not make the 82F less real.
That 82F was not some “real” seasonal value corrupted by “noise”. It cannot be because season is not a cause: amount of solar insolation, atmospheric moisture content, entrainment of surrounding air, and on and on are causes, but not season.
Meteorologists do attempt a run at causes in their dynamic models, measuring some causes directly and others by proxy and still others by gross parameterization, but these dynamical models do not make the mistake of speaking of signal+noise. They will say the temperature was 82F because of this-and-such. But this will never be because some pure signal was overridden by polluting noise.
The gist is this. We do not need statistical models to tell us what happened, to tell us what values were experienced, because we already know these. Statistical models are almost always nothing but gross parameterization and are thus only useful in making predictions; they should only be used to guess the unknown. We certainly do not need them to tell us what happened, and this includes saying whether a “trend” was observed. We need only define “trend” and then just look.
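“Define ‘trend’ and then just look” really is this simple. Here is one hypothetical definition applied to some made-up daily highs; any other definition (more up-moves than down-moves, say) would do, so long as it is stated first.

```python
# Made-up daily high temperatures, in the spirit of the post's examples
temps = [78, 80, 79, 82, 81, 83, 85]

# One possible definition: the series "trended up" if the last value
# exceeds the first. Pick a definition, then just look.
def trended_up(series):
    return series[-1] > series[0]

print(trended_up(temps))  # no hypothesis test required
```

No model, no p-value: the observed values either satisfy the stated definition or they do not.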
Why carp about this? Because the signal+noise view brings in the Deadly Sin of Reification (especially in stock prices, where everybody is an after-the-fact expert), and that sin leads to the worse sin of over-certainty. And we all know where that leads.
“But, Briggs. What if we measured temperature with error?”
Great question. Then we are in the radio audio case, where we want to guess what the real values were given our observation. There will be uncertainty in these guesses, some plus-or-minus to every supposed value. This uncertainty must always be carried “downstream” in all analyses of the values, though it very often isn’t. Guessing temperatures by proxy is a good example.
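One hedged sketch of what “carried downstream” means in practice: if each measurement comes with a known plus-or-minus, simulate many plausible true series consistent with it and report the spread of any derived quantity, not a bare point value. The figures below (measurements, the 0.5-degree spread) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical measurements with an assumed +/- (std of 0.5 degrees)
measured = np.array([81.2, 82.0, 79.5, 83.1, 80.7])
sigma = 0.5

# Carry the uncertainty downstream: draw plausible true values and
# look at the spread of the derived quantity (here, the mean)
draws = measured + rng.normal(0.0, sigma, size=(10_000, measured.size))
means = draws.mean(axis=1)

lo_q, hi_q = np.quantile(means, [0.025, 0.975])
print(f"point estimate: {measured.mean():.2f}")
print(f"95% interval:   [{lo_q:.2f}, {hi_q:.2f}]")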
I have more on this topic in Uncertainty: The Soul of Modeling, Probability & Statistics.
Categories: Class - Applied Statistics, Statistics
But the statistical model is where the street-light is shining…
Why would so-called signal+noise or signal folks make claims as stated in this post? Who are those people? As usual, I don’t get it!
Predicting stock price is difficult because it’s hard to distinguish signal from noise. But, is it really pure signal!? Can anyone know the “true” signal?
Life sure would be so much easier if we knew all causes for everything. My lovely children would respond to such a deepity with “No kidding, Mom!”
CONSIDER: “If it is 82F outside, you feel the 82F and not another number (and don’t give me grief about fictional “heat indexes”); if the price is $52, that is what you will pay; if you hear “scquatch”, that is what you hear. You do not experience some other value to which ignorable noise has been added.”
If the above statement used “measure” instead of “experience” it would be ok.
In reality one’s “experience” is highly subjective and skewed, often in ways we do not perceive. One illustration is the now famous cases of backwards messages in music. The satanic message that isn’t there—until one reads the words—is available at the link, below.
Listen to the backwards message without reading the words displayed; you will hear/perceive gibberish. Then, listen while reading the words attributed to the gibberish. After that you cannot not hear the nonexistent backwards message.
So much of our “experience” in so many things is like that (and worse) … though we pretend otherwise without consciously recognizing we are pretending.
Which brings us to:
““But, Briggs. What if we measured temperature with error?” Great question. Then we are in the radio audio case, where we want to guess what the real values were given our observation.”
The failure [or potential failure, depending…] there is failing to recognize that our “observation” itself is very likely distorted (if not outright false). Unless we can reconcile the types of errors (like recognizing, in the case of Zeppelin’s Stairway to Heaven, that there is no backwards message really there no matter how clearly we hear/perceive it), we will perpetuate whatever delusions we apply, with the smug conceit that our personal “observation” is something it isn’t: objective.
Listen first, then, listen and read, then listen again: http://www.youtube.com/watch?v=IXpEtF4i1oI
A For Your Information, in case you were not aware of it. The thermodynamic temperature is a measure of the mean internal kinetic energy of a defined sample of matter, and *only* its kinetic energy. Why? This follows from the kinetic theory of [collections of] matter. Take 32 grams of molecular oxygen. There are physical and chemical properties extrinsic to this sample and there are physical and chemical properties intrinsic to it. By definition, if the sample is pure oxygen-16, we have one mole of oxygen molecules, which in turn is by definition 6.022 × 10^23 of them, also called Avogadro’s number of molecules. That’s a large number, so given the limits of our measuring abilities’ resolution, accuracy, and precision, we cannot know the precise momentum each molecule carries, its mass, its velocity, its translational vector of motion, how fast it is spinning about some axis of rotation, and the other properties needed to know the kinetic energy possessed by one molecule, let alone all of them. So we can only guess and make statistical summary statements about the collection. We know, for instance, that all of these molecules possess kinetic energy because we (and the collection) exist in conditions where the thermodynamic temperatures are well above absolute zero (whether Rankine or Kelvin). Heat, classically speaking, is thus a statement about kinetic energy. Heat is not the same thing as light. You can transform light into heat and heat into light; but don’t conflate them.
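The “statistical summary” the comment describes can be made concrete. For an ideal gas, the mean translational kinetic energy per molecule is (3/2)·k·T; per mole that is (3/2)·R·T, since R = k·N_A. A quick check at an assumed 300 K:

```python
# Mean translational kinetic energy per molecule: (3/2) * k_B * T
k_B = 1.380649e-23    # J/K, Boltzmann constant (exact in SI since 2019)
N_A = 6.02214076e23   # 1/mol, Avogadro's number (exact)
T = 300.0             # K, an assumed temperature for illustration

ke_per_molecule = 1.5 * k_B * T
ke_per_mole = ke_per_molecule * N_A   # equals (3/2) * R * T

print(f"mean KE per molecule at {T:.0f} K: {ke_per_molecule:.3e} J")
print(f"mean KE per mole at {T:.0f} K:     {ke_per_mole:.1f} J")
```

No single molecule need have this energy; it is a summary over Avogadro’s number of them, which is exactly the comment’s point.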
Quoting Cmdr. Briggs:
“But, Briggs. What if we measured temperature with error?”
“Great question. Then we are in the radio audio case, where we want to guess what the real values were given our observation. There will be uncertainty in these guesses, some plus-or-minus to every supposed value. This uncertainty must always be carried ‘downstream’ in all analyses of the values, though it very often isn’t.”
For example, all surface temperature measurements before the digital era are recorded as whole degrees (F in the US), yet in reality they are RANGES, Briggs’ 82° really being any value from 81.5° to 82.5° (one or the other of the exact .5s being excluded). Thus ALL calculations done with these temperature records must treat the values as 1°-wide ranges, not discrete values (which they never were).
That’s a good point, in some contexts. But if you include the rounding rule as part of the measurement process, then the measurement is still without error, and we are in the signal and not signal+noise situation.
Cmdr. Briggs ==> quite right — not a signal+noise — but neither is the signal a discrete value.
Some in CliSci, well, almost all really, ignore the fact that the record of the signal is a record of ranges, simply denoted with their central value (a whole degree). If the +/- .5° is ever mentioned, it is mentioned as if it consists of error that will “average out”.
A great deal of fuss is subsequently made when, with the matter of the ranges ignored, long-term averages (often, averages of averages of averages) show changes, up or down, all well within the already ignored +/- .5°, one degree-wide range.
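The range argument above can be sketched numerically. All the data here are invented: draw some hypothetical “true” temperatures, round them to whole degrees as the old records did, and carry the 1°-wide ranges through an average. The worst-case range of the mean stays a full degree wide, even though in any one draw the independent rounding errors largely cancel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" daily highs for a year, recorded as whole degrees F
true_temps = rng.uniform(60.0, 90.0, size=365)
recorded = np.round(true_temps)

# Each recorded value stands for a 1-degree-wide range: value +/- 0.5
lo_mean = (recorded - 0.5).mean()
hi_mean = (recorded + 0.5).mean()

print(f"mean of recorded values:   {recorded.mean():.2f}")
print(f"worst-case range of mean:  [{lo_mean:.2f}, {hi_mean:.2f}]")
print(f"actual error in this draw: {abs(recorded.mean() - true_temps.mean()):.4f}")
```

Whether the worst-case range or the typical cancellation is the right thing to report depends on whether the rounding errors can be argued to be independent, which is exactly the point under dispute in the thread.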
Interesting that you start with an example featuring Claxton, Georgia. As you know, Claxton is the Fruitcake Capital of the World. This little south Georgia town is known far and wide for its fruitcakes, sold primarily in the Christmas season.
In contrast to California fruitcakes, Claxton fruitcakes are almost edible, making perfect Christmas gifts for those unappreciative friends, or family. They can be regifted in perpetuity.
In this case a season is a cause. Indeed, the Christmas season is the cause of lots of heartaches and bellyaches.