Class 76: That’s Not Noise. That’s Signal!

The signal+noise notion of time series is usually wrong. Most time series are pure signal, no noise. And since they are accidental series, models of them are correlational not causal.

Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty

HOMEWORK: Find your own time series with the Deadly Sin of Reification

Lecture

I still remember my old grandpa adjusting the tinfoil on the rabbit ear antenna in the basement of his Dearborn house, in order to better tune in the Red Wings. I don’t know if the tinfoil helped or hindered, but he swore it worked. And the picture did often come in. Other times it looked like this:

Incidentally, it’s hard today to find pictures of old analog TV poor reception. I found this wandering around on the interwebs, and don’t know whether it’s genuine. It’s close enough for us.

There is a signal behind the noise. There is meant to be a man standing behind an object (a car?). We see that, but we also see noise on top. Noise is something added to the signal. It is extra to the signal. And it is different than the signal.

This kind of situation in which there is signal + noise is common. It happens for instance when you are at the edge of cellphone reception and the signal breaks. But it’s not as common as implied by a great amount of formal work in time series. Because for many accidental time series, there is no noise, only signal. But people treat the series as if there is both.

This leads to error and over-certainty. And is in fact yet another instance of the Deadly Sin of Reification.

Let’s first prove that (modifying an old article I wrote on this). We begin with two time series of ice extent at either end of the world, as claimed by NASA:

We aren’t here interested in any criticisms of these numbers; we’ll take them as they come, and today will assume they are immaculate and error-free.

The numbers are in blue (by “deviation” they mean difference from some constant). The Deadly Sin of Reification comes in two vivid colors, green and red. The green and the red are there because somebody thought the blue numbers were signal+noise. Which they are not. They are signal alone.

The temptation is to suppose, which many do, that the green or red lines are the “true” data, and that the blue are “random” deviations from this truth. This is not so. Yet the eyes are instantly drawn to red and green. They become real to us. We can’t help but interpret the real numbers through the lens of the Deadly Sin. We begin to tell stories about the data informed by the red and green.

Those red and green become causes. That’s the worst part. It is now difficult to free yourself from the idea that a linear cause has been applied, one that gives equal force to every point in time. Such a cause may even exist, but it can’t be seen in the data. It has to be inferred. Worse, those green lines are highly dependent on the starting and stopping points, which must then mean the implied cause changes by date.
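The dependence of a fitted line on its start and stop dates is easy to see for yourself. Here is a minimal sketch, not the NASA data and not anybody's official method: the series is a made-up random walk, and the helper names `ols_slope` and `make_series` and the window choices are all assumptions for illustration. It fits an ordinary least-squares line to the same numbers over different windows and prints the slopes.

```python
import random

def ols_slope(ys):
    """Least-squares slope of ys against the time index 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def make_series(n=120, seed=1):
    """Hypothetical series: a random walk, invented for illustration only."""
    rng = random.Random(seed)
    value, out = 0.0, []
    for _ in range(n):
        value += rng.gauss(0, 1)
        out.append(value)
    return out

series = make_series()

# Same numbers, different start and stop points, different "trend".
windows = [(0, 120), (0, 60), (60, 120), (30, 90)]
slopes = {w: ols_slope(series[w[0]:w[1]]) for w in windows}
for (start, stop), slope in slopes.items():
    print(f"window {start:3d}-{stop:3d}: slope = {slope:+.3f}")
```

If the line were a cause, the cause would have to change every time the analyst moved the window. The numbers themselves never changed.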

The effects of this and why people do this we’ll discuss below. For now, let’s think about those blue lines.

Two weeks ago the high temperature on the remote archipelago on which I live was 22F, in civilized units.

Now for the non-trick question: What was the high temperature experienced by those people and objects who were out on that day?

If you are a subscriber to the signal+noise form of time series modeling, then your answer might be 18F, or perhaps 25F, or even some other figure altogether. But if you endorse the signal form of time series modeling, you will say 22F.

Another example. Last Friday, three days back, IBM common stock closed at \$303.20. Query: what was the cost of the stock at the close of that day?

Signal+noise folks might say \$312.50, or perhaps some other number, whereas signal people will say \$303.20.

Another example. I was sitting at the radio AM DXing, pulling in a station from Claxton, Georgia, WCLA 1470 AM. The announcer came on and through the heavy static I thought I heard him give the final digit of a phone number as “scquatch”, or perhaps it was “hixsith”.

Here are two questions: (1) What number did I hear? (2) What number did the announcer say?

The signal+noise folks will hear question (1) but give the answer to (2) (they will answer (2) twice), whereas the signal folks will answer (1) with “scquatch or hixsith”, and answer (2) by saying, “Hey signal+noise guys, a little hand here?”

We have three different “time series”: temperature, stock price, radio audio. It should be obvious that everybody experiences the “numbers” or “values” of each of these series as they happen. If it is 22F outside, you feel the 22F and not another number (and don’t give me grief about fictional “cold indexes”); if the price is \$303.20, that is what you will pay; if you hear “scquatch”, that is what you hear. You do not experience some other value to which ignorable noise has been added.

For any time series (and “any” includes our three), some thing or things caused each value. A whole host of physical states caused the 22 degrees; the mental and monetary states of a host of individuals caused the \$303.20; a man’s voice plus antenna plus myriad other physical states (ionization of certain layers of the atmosphere, etc.) caused “scquatch” to emerge from the radio’s speakers.

In each case, if we knew—really knew—what these causes were, we would not only know the values, which we already knew because we experienced them, but we could predict with certainty what the coming values would be. Yet this list of causes will really only be available in artificial circumstances, such as simulations.

Of the three examples, there was only one in which there was a true signal hidden by “noise”, where noise is defined as that which is not signal. Temperature and stock price were pure signal. But all three are routinely treated in time series analysis as if they were composed of signal+noise. This mistake is caused by the Deadly Sin of Reification.

No model of any kind is needed for temperature and stock price; yet models are often introduced. You will see, indeed it is vanishingly rare not to see, a graph of temperature or price over-plotted with a model, perhaps a running-mean or some other kind of smoother, like a regression line. Funny thing about these graphs, the values will be fuzzed out or printed in light ink, while the model appears as bold, bright, and thick. The implication is always that the model is reality and values a corrupted form of reality. Whereas the opposite is true.

The radio audio needs a model to guess what the underlying reality was given the observed value. We do not pretend in these models to have identified the causes of the reality (of the values), only that the model is conditionally useful in putting probabilities on possible real values. These models are seen as correlational, and nobody is confused. (Actual models, depending on the level of sophistication, may have causal components, but since the number of causes will be great in most applications, these models are still mostly correlational.)

We agreed there will be many causes of temperature and stock price values. One of the causes of temperature is not season—how could the word “winter” cause a temperature?—though we may condition on season (or date) to help us quantify our uncertainty in values. Season is not a cause, because we know there are causes of seasons, and that putting “season” (or date) into a model is only a crude proxy for knowledge of these causes. Same thing for date in stock prices.

Given an interest in season, we might display a model which characterizes the average (or some other measure) of uncertainty we might have in temperature values by season (or date), and from this various things might be learned. We could certainly use such a model to predict temperature. We could even say that our 22F was a value so many degrees higher or lower than some seasonal measure. But that will not make the 22F less real.

That 22F was not some “real” seasonal value corrupted by “noise”. It cannot be because season is not a cause: amount of solar insolation, atmospheric moisture content, entrainment of surrounding air, and on and on are causes, but not season.

Meteorologists do attempt a run at causes in their dynamic models, measuring some causes directly and others by proxy and still others by gross parameterization, but these dynamical models do not make the mistake of speaking of signal+noise. They will say the temperature was 82F because of this-and-such. But this will never be because some pure signal was overridden by polluting noise.

The gist is this. We do not need statistical models to tell us what happened, to tell us what values were experienced, because we already know these. Statistical models are almost always nothing but gross parameterization and are thus only useful in making predictions, thus they should only be used to guess the unknown. We certainly do not need them to tell us what happened, and this includes saying whether a “trend” was observed. We need only define “trend” and then just look.
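To make "define trend and then just look" concrete, here is a small sketch. The two definitions below are only examples of what somebody might mean by "trend", and the daily highs are invented for illustration; neither the function names nor the numbers come from any real record. No model is fit. We only inspect the values that were observed.

```python
def went_up(values):
    """One possible definition of 'trend': the last value exceeds the first."""
    return values[-1] > values[0]

def fraction_of_rises(values):
    """Another definition: the share of steps on which the value increased."""
    rises = sum(1 for a, b in zip(values, values[1:]) if b > a)
    return rises / (len(values) - 1)

# Hypothetical daily highs in F, invented for illustration.
highs = [22, 25, 19, 24, 27, 26, 30]

print(went_up(highs))            # did the series end higher than it began?
print(fraction_of_rises(highs))  # on what share of days did it rise?
```

Different definitions can give different answers for the same series, which is why the definition has to be stated first. Once it is stated, the answer is a matter of looking, not of estimating.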

Why carp about this? Because, as said, the signal+noise view brings in the Deadly Sin of Reification (especially in stock prices, where everybody is an after-the-fact expert), and that sin leads to the worse sin of over-certainty. And we all know where that leads.

“But, Briggs. What if we measured temperature with error?”

Great question. Then we are in the radio audio case, where we want to guess what the real values were given our observation. There will be uncertainty in these guesses, some plus-or-minus to every supposed value. This uncertainty must always be carried “downstream” in all analyses of the values, though it very often isn’t. Guessing temperatures by proxy is a good example. We will do these kinds of time series at a later date.
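One way to carry the plus-or-minus downstream is by simulation. The sketch below assumes, purely for illustration, that each reading has an independent Gaussian measurement error with a known standard deviation. That assumption, the function name `mean_with_uncertainty`, and the readings are all hypothetical. It draws many sets of plausible real values and reports the spread of the resulting averages instead of a single number.

```python
import random
import statistics

def mean_with_uncertainty(observed, error_sd, draws=5000, seed=0):
    """Average of the readings plus a rough 90% interval for the real average.

    Assumes independent Gaussian measurement error with known error_sd;
    this is an illustrative assumption, not a fact about any instrument.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(draws):
        # One plausible set of real values given the readings.
        plausible = [obs + rng.gauss(0, error_sd) for obs in observed]
        means.append(statistics.fmean(plausible))
    means.sort()
    lo = means[int(0.05 * draws)]
    hi = means[int(0.95 * draws) - 1]
    return statistics.fmean(observed), lo, hi

# Hypothetical readings in F, each taken to be good to about +/- 2 degrees.
readings = [22.0, 25.0, 19.0, 24.0, 27.0]
point, lo, hi = mean_with_uncertainty(readings, error_sd=2.0)
print(f"average reading {point:.1f}F, plausible real average {lo:.1f}F to {hi:.1f}F")
```

Any later analysis that uses the average should use the interval too, not just the point. When there is no measurement error the interval collapses to the point, and we are back to pure signal.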

The Sin vanishes above, for instance, if the red and green lines are used in making skillful predictions, or in pointing to verifiable causes of the kind the lines imply: such as strictly linear (ignoring the start and stop date effects), or whatever causes are supposed behind that red line.

Otherwise, those red and green lines are the Sin. They imply more is known than is. Maybe it doesn’t seem so to you, because “obviously” those lines are “close enough”. Close enough to what?

Your homework is to find examples of the Sin. The best place to look is at “research shows” regressions.
