
# Just What Is Signal And What Is Noise?

From the paper

There are two senses of Signal and Noise (SN). The first is obvious: a signal, some known phenomenon, is corrupted by outside forces, where these caused corruptions are the noise. The second is statistical, where what may or may not be a signal is identified as a signal and what may or may not be noise is labeled noise, and where causal notions are oft confused. Models, physical or mixed physical-probability, can be used for either sense.

As you might guess, there are plenty of abuses of the second kind, mostly in economics, but not a few in physics, either, such as in the paper “Reconciling the signal and noise of atmospheric warming on decadal timescales” by Roger Neville Jones and James Henry Ricketts (as recommended by Deborah Mayo; the paper is hosted at her site).

Before the mistakes, the right way. Old people will remember tuning a television to a weak station when broadcasts were analog and before cable. Kids can try this at home with their AM radios, stations which are still allowed to broadcast an analog signal. The signals in these cases are known, and the noises which corrupt them are only partially known. If you’ve listened to the radio in the vicinity of a thunderstorm you’ve heard some powerful noise indeed.

Now the causes of each departure from the signal are not known precisely, but their character has come to be known through their regularity in typical circumstances. This has allowed filters—which are mixed physical-probability models, if you like—to be built which attempt to remove the noise from the signal. These filters do a fine upstanding job in the presence of a strong signal, and do less well when the signal is weak, which is why you see or hear the noise, and which you know is noise because you know what a TV picture looks like or a human voice sounds like.

I often use the examples of license plate and bar code recognition systems, which are the same thing as known-signal-mixed-noise as TV broadcasts. Models are built which incorporate knowledge of the signal—what it looks like, what are its properties—and which incorporate typical knowledge of the noise. That these models/machines don’t always work proves the noise isn’t always typical. No model is perfect.

Very well: enter pure statistical models where the characteristics of the signal and (usually) of the noise are not known but only guessed at. There has to be some kind of guess or these algorithms can’t get off the ground (even smoothers make a guess). A typical, and often unwarranted, guess is that the signal is linear (a straight line) and the noise additive (linear departures from the line).
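To make the "linear signal plus additive noise" guess concrete, here is a minimal sketch, not taken from the paper. It fits a straight line by ordinary least squares to a random walk that has no linear signal built in. The function names, the seed, and the series length are all assumptions chosen for illustration.

```python
import random

def fit_line(ys):
    """Ordinary least squares fit of ys against t = 0, 1, 2, ...; returns (intercept, slope)."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def random_walk(n, seed):
    """Cumulative sum of zero-mean steps: no straight-line signal is built in."""
    rng = random.Random(seed)
    level, walk = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    return walk

series = random_walk(200, seed=1)
intercept, slope = fit_line(series)
# Whatever the line misses is what the model labels "noise".
residuals = [y - (intercept + slope * t) for t, y in enumerate(series)]

print(f"fitted 'signal': {intercept:.2f} + {slope:.3f} * t")
print(f"sum of 'noise' residuals: {sum(residuals):.2e}")
```

The fit always returns a line, and the residuals always sum to (numerically) zero, whether or not anything linear caused the data. The tidy split into signal and noise is a property of the method, not a discovery about the series.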

The picture above is Figure 2 from the paper in question. The caption for the figure reads:

Mean global anomalies of surface temperature with internal trends. The annual anomalies (dotted lines) from five records (HadCRU, C&W, BEST, NCDC, GISS) are taken from a 1880-1899 baseline. Internal trends (dashed lines) are separated by step changes detected by the bivariate test at the p<0.01 error level…

The jagged lines are temperatures estimated (without any error bounds!) from various sources, and temperatures are caused by multifarious atmospheric phenomena. Dynamicists have a handle on some of these causes, at least the big ones. They do not know all of the causes; nobody does. That is because these temperatures represent, or are purported to represent, global averages. That means there are innumerable causes of these numbers.

The straight lines on the plot are the result of SN models.

What the authors are implying by their use of statistical SN models is that the causes are linear and awfully prone to abrupt switches. If the models are right there must have been some mysterious atmospheric force which caused the temperature from 1880 to 1920 to be a strict unbending straight line, but a force which changed its mind all at once in 1920 to be another strict unbending straight line. The force changed its mind several other times, too, as the other changes imply.

What made the force change its mind? Hey, don’t ask. The statistical model is silent on this, just as statistical models are silent on all causes. This being so we have to ask ourselves: what kind of force is so scatterbrained? Answer: the force surely isn’t, and the model used by the authors is wild overreach. The mere presence of the lines fools the eye into thinking signals are there, so the mistake is self-reinforcing.
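The point that a step model will always find something can be sketched in a few lines. This is not the bivariate test the paper uses, and it fits flat levels rather than sloped segments; it is a simplified stand-in under those assumptions. It searches for the single best "step change" in a random walk that contains no step by construction. The names `sse` and `best_step` are hypothetical helpers.

```python
import random

def sse(values):
    """Sum of squared deviations of values from their own mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_step(ys, min_len=5):
    """Try every breakpoint; return (index, total SSE) of the best two-level fit."""
    candidates = range(min_len, len(ys) - min_len + 1)
    return min(((k, sse(ys[:k]) + sse(ys[k:])) for k in candidates),
               key=lambda pair: pair[1])

# A random walk: accumulated zero-mean steps, no abrupt regime change built in.
rng = random.Random(2)
level, series = 0.0, []
for _ in range(150):
    level += rng.gauss(0.0, 1.0)
    series.append(level)

one_level = sse(series)
break_at, two_level = best_step(series)

print(f"single mean SSE: {one_level:.1f}")
print(f"best 'step' at t={break_at}, SSE: {two_level:.1f}")
```

Splitting a series in two and fitting each half separately can never fit worse than a single mean, so the search always proposes a step. A significance test may reject some proposals, but the fit itself says nothing about what, if anything, caused the switch.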

This kind of model, which wasn’t needed and for which there was no call, also replaces the data with non-data, the straight lines. Thus the model is a kind of smoother. The data disappears and becomes the model. The Deadly Sin of Reification has struck! Why modern science is so intent on replacing actual data with wild models will be a subject for historians to answer. I think it’s because quantification is addictive. Add that to the background hum of scientism and you get these kinds of things.

I have a whole section on abuses with time series in my book Uncertainty: The Soul of Modeling, Probability & Statistics.

Update: Besides re-touting the book in which I go into these things in greater detail, I want to emphasize (making the false assumption that those estimated temperatures are accurate and without uncertainty) that those temperatures are the signal. There is no noise. We experience the temperature that is. There is no “real” temperature “behind” the one actually felt. The data is the data.

Bonus!

From reader John Baglien comes this example of trend fiddling.

Thought you might enjoy one of the more egregiously bad examples of analysis I have encountered: http://www.nature.com/articles/srep25061#f1. The scatter plots show enormous variation, but by comparing trends before and after the onset of the industrial revolution, they have concluded that the warming trend since roughly the end of the little ice age is greater than it was before it ended. Wow! They also declare “unreliable” about 150 years of data from Lake Suwa, which allows them to draw a seriously cherry-picked trend from 1900 to present that is totally contradicted by the data they excluded. Then they use simple bimodal comparison of freeze or no freeze, or ice breakup before or after an arbitrary date to classify the number of “extreme” events that occurred. I’m no statistician, but the incompetence of this analysis is mind-boggling.

Categories: Statistics

### 22 replies »

1. Why replace data with models? You always get the answer you want.

Skeptical Science says only “deniers” use that stepped graph, which is why I use it whenever possible. I proudly wear the label, since it means I deny that models are the reality.

John B’s example shows why the global warmists need plenty of various recipes for cherry pie. (I do love scatter-graph interpretations. Reminds me of Rorschach tests!)

2. Robert says:

Regarding your book, which I would love to read but frankly don’t want to spend 60 dollars on — when is the paperback version coming out?

3. Briggs says:

Robert,

I apologize for the cost. It is too high, I agree. We tried to have it changed, but it’s set at the corporate level.

4. Nate says:

I’m confused as to what is even being measured here. “Anomaly” from what? The starting point? Doesn’t picking your starting point affect this a good deal then?
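On the baseline question, a small sketch may help. An anomaly is a reading minus the mean over a chosen baseline period (the figure caption says the paper uses 1880-1899). The temperatures below are invented purely for illustration and are not from any of the five records; `anomalies` is a hypothetical helper.

```python
# Hypothetical annual mean temperatures in degrees C, invented for illustration.
temps = {1880: 13.7, 1881: 13.8, 1882: 13.6,
         1950: 13.9, 1951: 14.0,
         2000: 14.3, 2001: 14.4}

def anomalies(series, baseline_years):
    """Subtract the mean over baseline_years from every value in series."""
    baseline = sum(series[y] for y in baseline_years) / len(baseline_years)
    return {year: value - baseline for year, value in series.items()}

early = anomalies(temps, [1880, 1881, 1882])   # early baseline
mid = anomalies(temps, [1950, 1951])           # mid-century baseline

for year in temps:
    print(year, round(early[year], 2), round(mid[year], 2))
```

Changing the baseline moves every anomaly up or down by the same constant. It changes where zero sits on the axis, but not the difference between any two years.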

5. Ken says:

Data/numbers analysis aside, “they” will point out physical evidence such as record declines in Arctic ice…and lately…coral bleaching from warming. Here’s an interesting link about coral bleaching from NOAA:

This caught my eye there:

“In January 2010, cold water temperatures in the Florida Keys caused a coral bleaching event that resulted in some coral death. Water temperatures dropped 12.06 degrees Fahrenheit lower than the typical temperatures observed at this time of year.”

For some reason the alarmists are pointing to coral bleaching as evidence of warming, nary a peep out of them about warming causing cooling or whatever the above was about (and presumably similar has/is happened/ing elsewhere)….

6. In NMR signal processing the signal is distinguished from noise by averaging many times. The signal will build up coherently, the noise not. The signal/noise increases by the square root of N, where N is the number of averages. Now, with respect to temperature data, what would be the equivalent? Averaging over a number of similar universes, as perhaps might be done in Quantum Leap? Averaging over different locations? Suggestions wanted.
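The square-root rule described in this comment can be checked with a short simulation. This is a sketch under stated assumptions: the signal is a known sine wave, the noise is independent zero-mean Gaussian, and the same signal is present in every repeat. None of this models an actual NMR instrument, and `rms_error_of_average` is a hypothetical name.

```python
import math
import random

def rms_error_of_average(n_repeats, rng, n_points=400, sigma=1.0):
    """RMS error between a known signal and the average of n_repeats noisy copies."""
    signal = [math.sin(2 * math.pi * t / n_points) for t in range(n_points)]
    sums = [0.0] * n_points
    for _ in range(n_repeats):
        for t in range(n_points):
            sums[t] += signal[t] + rng.gauss(0.0, sigma)
    squared = sum((s / n_repeats - v) ** 2 for s, v in zip(sums, signal))
    return math.sqrt(squared / n_points)

rng = random.Random(0)
err_1 = rms_error_of_average(1, rng)
err_100 = rms_error_of_average(100, rng)

print(f"1 scan: {err_1:.3f}")
print(f"100 scans: {err_100:.3f}")
print(f"improvement: {err_1 / err_100:.1f}x (square root of 100 is 10)")
```

The improvement depends on being able to repeat the same experiment with the same signal. A temperature series gives one realization per place and time, which is the difficulty the question is pointing at.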

AIA — Broken Record —

We live in a time when plotting all of the data is now possible. Once upon a time, it wasn’t. I couldn’t manually plot 100 Million data points. I don’t have to manually plot them anymore. The most misleading thing about the plots at the top of this graph is the -0.4 to +1 spread of the data. The range of the data for each year on that graph goes from -80C to +50C. Plot all of the data and you scratch your head wondering WTF is wrong with these highly intelligent, enlightened individuals. Anomaly charts shown without the absolute chart right next to it are liars who don’t know they are lying. The problem with having such charts next to one another is people might start questioning what is going on.

The other problem with such charts is that they are BORING. OMG they are so boring. Unfortunately, most of life can be borderline boring. I don’t have mongol hordes beating at the wall to make it exciting. The horse shit has to be mucked out of the stalls. The garbage has to go to the dump. The dishes have to be washed (or new stacks of paper plates bought).

Rational behavior is boring.

8. Matt Czu says:

Bob K: In X-ray crystallography I was using much longer exposure times (i.e. repeating scans many more times: instead of several seconds, for 15-30-60 minutes long). Rationale was exactly the same ‘square root’ clinch. So it works!
As to “GW” T data, this would mean of course quite a challenge. Fast readout should be possible. But then the problem of defining the acceptable and exact time and the like (interval, place and height etc.) arises. A few minutes average temp would be nonsense, so perhaps collecting “day average” temps would be of use. But the minimum – maximum T differences would well make an average useless at first glance. Perhaps doing this for 500 years would give a clearer picture, if at all. But even then speaking of a ‘global average’ would well mean nothing.
Nee, I quit thinking on this. It is a mess anyway. It is NOT a laboratory scale, anyway.

9. Matt Czu says:

I just tried reading this Nature(!) article highlighted in the J. Baglien citation: simply incredible! I struggled for 10 mins or so, but then I gave up. How in the hell do ‘scientists’ relate two places at very different geographic sites and with virtually not a single common reference (apart from seeing ice somewhere forming or melting)? Feeding all this and all with meaningless uncertified data and plots and so on. Nature accepts this??
Happy that I no longer subscribe to it.

10. Gary says:

One man’s noise is another man’s signal. E.g., temperature and precipitation both affect the growth of tree rings, yet either may be ignored when it suits the researcher. And of course, a myriad of other things (herbivory, nutrients, wind, etc.) influence cambial growth. So noise is relative to one’s perspective. Wouldn’t it be fairer to say that true “noise” results from measurement error and analytical procedures and all the rest is just unaccounted for signals?

11. Dodgy Geezer says:

Look at how they collect their data – and, more importantly, how they adjust it later. The only signal there is the one they put in.

Use satellites. Less opportunity for fiddling the data….

12. Milton Hathaway says:

BobK – We used to refer to the type of averaging that I believe you are describing as “synchronous averaging”. This type of averaging needs a synchronizing ‘signal’ in order to work as you describe. An example: assume that the climate never changed, and you collected a century of daily temperature readings. You divide the data into 100 year sets, and add all the sets to each other, day by day. You would then have an improved SNR for, say, the ‘underlying’ temperature on day 100. The amount of improvement depends on the nature of the ‘noise’. One case where the improvement might be poor is if the noise is not zero-mean (say if, over time, the mercury bulb slipped relative to the etched temperature scale).
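The thought experiment above (an unchanging climate, a century of daily readings, folded day by day) can be sketched as follows. The seasonal curve, the noise level, and the size of the instrument drift are all invented for illustration, and the names `seasonal`, `fold_by_day`, and `mean_abs_error` are hypothetical.

```python
import math
import random

DAYS, YEARS = 365, 100

def seasonal(day):
    """Hypothetical unchanging 'underlying' temperature for a day of the year."""
    return 10.0 + 12.0 * math.sin(2 * math.pi * day / DAYS)

def fold_by_day(readings):
    """Average readings[year][day] across all years, day by day."""
    return [sum(year[d] for year in readings) / len(readings) for d in range(DAYS)]

def mean_abs_error(estimate):
    """Average distance between the folded estimate and the true seasonal curve."""
    return sum(abs(estimate[d] - seasonal(d)) for d in range(DAYS)) / DAYS

rng = random.Random(0)
noise = [[rng.gauss(0.0, 3.0) for _ in range(DAYS)] for _ in range(YEARS)]

# Case 1: zero-mean noise around the underlying temperature.
zero_mean = [[seasonal(d) + noise[y][d] for d in range(DAYS)] for y in range(YEARS)]
# Case 2: the same noise plus an instrument offset creeping up to about 2 degrees.
drifting = [[seasonal(d) + noise[y][d] + 2.0 * y / YEARS for d in range(DAYS)]
            for y in range(YEARS)]

err_zero_mean = mean_abs_error(fold_by_day(zero_mean))
err_drifting = mean_abs_error(fold_by_day(drifting))

print(f"zero-mean noise, error after folding: {err_zero_mean:.2f}")
print(f"drifting instrument, error after folding: {err_drifting:.2f}")
```

Folding works here only because the calendar supplies the synchronizing signal and the underlying curve is assumed never to change. The drifting case shows the caveat in the comment: averaging shrinks zero-mean noise but leaves a bias untouched.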

We often did not have these handy synchronizing signals available. Once I had the brilliant idea of creating a synchronizing signal from the measured signal itself. Initial results were very promising. Then I tried the procedure on measured data that contained no signal, just noise. The procedure created a signal anyway! It was very disturbing, because the signal that was created from nothing had all the earmarks of a valid signal. (In hindsight it should have been obvious that this would happen, so I don’t often tell this story.)
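One way this can happen is sketched below. It is only a guess at the kind of procedure described, not a reconstruction of it: each segment of pure noise is aligned on its own largest value before averaging. The name `self_synchronized_average`, the segment length, and the segment count are assumptions.

```python
import random

def self_synchronized_average(segments):
    """Rotate each segment so its own maximum lands in the middle, then average."""
    length = len(segments[0])
    centre = length // 2
    sums = [0.0] * length
    for seg in segments:
        shift = centre - seg.index(max(seg))
        for i, value in enumerate(seg):
            sums[(i + shift) % length] += value
    return [s / len(segments) for s in sums]

rng = random.Random(0)
length, count = 50, 400
# Pure noise: there is no signal in any of these segments.
noise_only = [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(count)]

plain = [sum(seg[i] for seg in noise_only) / count for i in range(length)]
aligned = self_synchronized_average(noise_only)

print(f"plain average, largest excursion: {max(abs(v) for v in plain):.2f}")
print(f"self-aligned average, value at centre: {aligned[length // 2]:.2f}")
```

The plain average stays near zero, as it should. The self-aligned average grows a sharp peak at the alignment point, because the alignment step selects the largest noise value from every segment and stacks them in one place.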

To answer your question about how to apply this to world temperature data, I think that one would have to find, or create, a suitable synchronizing signal. Here’s an idea: we magically get the countries of the world to control their CO2 emissions on a carefully chosen predefined schedule. Run all the carbonated beverage canners on full tilt and build up a huge surplus, then open the cans all at once? You get the idea. You’d want to design the schedule to be as orthogonal as possible to other potential synchronizing signals.

Gary – one possible flaw in your definition of noise is that often unwanted signals come from the desired signal. For example, if the desired signal is sinusoidal in nature and nonlinearities are present, a sinusoidal with half the period can appear in the measured data. This unwanted signal is still often called “noise”, or maybe “correlated noise”.
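The half-period component mentioned here follows from the identity sin²(x) = (1 − cos 2x)/2, and a short sketch can confirm it numerically. The "sensor" below, with output equal to input plus 0.2 times input squared, is a made-up nonlinearity chosen for illustration; `amplitude` is a hypothetical helper that measures one frequency component.

```python
import cmath
import math

def amplitude(samples, cycles):
    """Amplitude of the sinusoidal component completing `cycles` periods in the record."""
    n = len(samples)
    total = sum(v * cmath.exp(-2j * math.pi * cycles * t / n)
                for t, v in enumerate(samples))
    return 2 * abs(total) / n

n, cycles = 1024, 8
clean = [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]
# Hypothetical mildly nonlinear sensor: output = input + 0.2 * input**2.
distorted = [x + 0.2 * x * x for x in clean]

print(f"clean, fundamental: {amplitude(clean, cycles):.3f}")
print(f"clean, half period: {amplitude(clean, 2 * cycles):.3f}")
print(f"distorted, fundamental: {amplitude(distorted, cycles):.3f}")
print(f"distorted, half period: {amplitude(distorted, 2 * cycles):.3f}")
```

The half-period component is absent from the clean sinusoid and appears only after the nonlinearity, so it is generated by the desired signal itself rather than by any outside disturbance.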

Briggs – would you say your new book has a lot of real-life horror stories? I’ve been on the fence about buying it, but I really like horror stories. I seem to absorb content a lot better in the form of “if you do A, this bad thing B will happen, and here’s an example of just that”.

13. SteveK says:

I’m not a stats guy but if I look at any stock chart I have a heck of a time knowing the trend – because the trend depends on the time frame. I can manipulate “the trend” by picking the right data.

Zoom in close, you’re in an uptrend. Widen the view, you’re in a downtrend. Widen even more, you’re in a steep downtrend. Take in all data and the trend is slightly up.
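The window effect is easy to reproduce. The sketch below uses an invented price series (a long rise, a sharp fall, a small recent recovery) and fits a least-squares slope to the most recent 10 points, the most recent 50, and the whole series. The prices and the `slope` helper are hypothetical, and only three of the zoom levels described above are shown.

```python
def slope(ys):
    """Least-squares slope of ys against t = 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx

# Hypothetical prices, invented for illustration.
prices = (
    [100 + i for i in range(100)]        # long rise
    + [200 - 2 * i for i in range(40)]   # sharp fall
    + [120 + i for i in range(10)]       # small recent recovery
)

for window in (10, 50, len(prices)):
    print(f"last {window:3d} points: slope {slope(prices[-window:]):+.2f}")
```

The same data give an uptrend, a downtrend, or a mild uptrend depending only on where the window starts. The "trend" is a property of the chosen window as much as of the series.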

14. However you look at the data, or Lake Suwa, we can at least see it’s getting warmer out there.

JMJ

15. Dodgy Geezer says:

…However you look at the data, or Lake Suwa, we can at least see it’s getting warmer out there….

Warmer since when?

2015/6 is an El Nino – warmer than 2010. But not as warm as 1997/98. The 2000 plateau is definitely warmer than 1970 (which was a cold spell), but similar in temp to the mid 1940s. The 1890s were really cold, but the 1880s peaked at similar temperatures to today.

If we look at the entire planet’s history, we see long cold periods (glacials), interspersed with short warm periods (interglacials). These interglacials grow rapidly from a cold beginning to a peak, and then gradually fall back. During this fall there can be wide temperature excursions: temperatures either rising or falling for periods of 10,000 years are not uncommon. During the last several thousand years we have had the Medieval warm period, the Roman warm period, and the Minoan warm period – all occasions when it was hotter than today.

It is therefore not surprising to find it getting hotter today. It would not be surprising for it to get hotter naturally for a continuous period of 10,000 years. If anything, it is rather surprising that the temperature has remained so equable since around 1920….

16. Actually, JMJ, unless you’re Michael Mann (and he took it back when people mentioned if we didn’t need data, we didn’t need him), you cannot tell it’s getting warmer. Local climates, the only ones we can observe directly, show no trends in most cases. Only after homogenization, statistical manipulation and making a colorful graph (which is how Mann knew warming was “serious”) can we “see” it’s getting warmer. The difference between hottest years average anomaly—no real temperatures used—is less than 1/2 a degree C. I defy any human being to stand in a room and tell me the temperature has changed by 1/2 a degree C. For that matter, I doubt most can do 2 or 3 degrees difference. What we “know” is the pretty graphs and the government say it’s getting hotter. That would be the noise, not the signal, by the way.

Milton: Very useful explanation. Thank you.

Dodgy: Agreed.

17. Joy says:

JMJ,
Whatever ‘we’ thinks, rest assured I do not think the same necessarily.
Have you ever read or listened to Gavin Schmidt’s thought on modelling the climate or modelling in general?
I am really curious to know the answer to this and I promise not to reply.

18. Doug M says:

Believe it or not, some people still receive over-the-air television.

This is how my girlfriend watches TV. It is a digital transmission and a digital antenna, but we still have the joy of playing with the antenna. She makes little tin foil hats that go on the antenna for some channels and off the antenna for others. And there is still that moment of telling everyone to hold still because we have a good signal.

19. Doug M: I have over-the-air television with a converter box (and three TV’s that still have built-in VCRs!). I have an outside antenna from the analog days. Inside, I have a connection that allows me to disconnect from the outdoor antenna since one channel rarely comes in if I’m connected to the outside antenna. I also do the “toss the antenna around” move to try and get a better picture. Actually, most of the time, I have really good pictures and once I figured out how to toss the antenna and/or disconnect the outside one, things work well. The biggest difference now is you don’t get “snow” with a bad signal, you get pixelation.
I would note that when I first moved to Wyoming, there were two television stations. One shared two networks. I now get 9 channels, including the CW! It’s a quantum leap!

20. When doing calorimetry, the only temperature that matters is the one of the local sample and that’s determined strictly by the local material composition and the energy content of the system. You can’t average out of sample temperatures with the in-sample ones and expect the analysis to mean anything when each sample isn’t one of the same kind of experiment or system. This is akin to averaging the numerical labels on ping-pong balls and trying to predict the result of the next trial, in my opinion.

Now about weather and climate, climate is a summary statement of the previously realized local weather. Climate isn’t what I have to survive in and/or adapt to; for it is the weather that I live in as a realized state.