There are two senses of Signal and Noise (SN). The first is obvious: a signal, some known phenomenon, is corrupted by outside forces, where these caused corruptions are the noise. The second is statistical, where what may or may not be a signal is identified as a signal and what may or may not be noise is labeled noise, and where causal notions are oft confused. Models, physical or mixed physical-probability, can be used for either sense.
As you might guess, there are plenty of abuses of the second kind, mostly in economics, but not a few in physics, either, such as in the paper “Reconciling the signal and noise of atmospheric warming on decadal timescales” by Roger Neville Jones and James Henry Ricketts (as recommended by Deborah Mayo; the paper is hosted at her site).
Before the mistakes, the right way. Old people will remember tuning a television to a weak station when broadcasts were analog and before cable. Kids can try this at home with their AM radios; AM stations are still allowed to broadcast an analog signal. The signals in these cases are known, and the noises which corrupt them are only partially known. If you’ve listened to the radio in the vicinity of a thunderstorm you’ve heard some powerful noise indeed.
Now the causes of each departure from the signal are not known precisely, but their character has come to be known through their regularity in typical circumstances. This has allowed filters—which are mixed physical-probability models, if you like—to be built which attempt to remove the noise from the signal. These filters do a fine upstanding job in the presence of a strong signal, and do less well when the signal is weak, which is why you see or hear the noise, and which you know is noise because you know what a TV picture looks like or a human voice sounds like.
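The idea behind such filters can be shown in miniature. This is only a toy sketch, not the circuitry in any real receiver: a known sinusoidal "signal" is corrupted by additive noise whose level I have invented, and a simple moving-average (low-pass) filter partially recovers it. The filter works because we assume the signal varies slowly relative to the noise.

```python
import numpy as np

# Toy illustration: a known signal plus invented "typical" noise,
# partially cleaned up by a moving-average filter.
rng = np.random.default_rng(42)
t = np.linspace(0, 4 * np.pi, 400)
signal = np.sin(t)                           # the known signal
noisy = signal + rng.normal(0, 0.5, t.size)  # signal corrupted by noise

window = 21
kernel = np.ones(window) / window            # moving-average (low-pass) filter
filtered = np.convolve(noisy, kernel, mode="same")

# The filter helps when the signal is strong relative to the noise;
# crank up the noise level and it helps much less.
err_noisy = np.mean((noisy - signal) ** 2)
err_filtered = np.mean((filtered - signal) ** 2)
print(err_filtered < err_noisy)
```

Notice the filter is built from knowledge (or assumption) of what the signal looks like; when that knowledge is wrong, the "cleaning" manufactures artifacts instead of removing them.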
I often use the examples of license plate and bar code recognition systems, which are the same kind of known-signal-mixed-with-noise problem as TV broadcasts. Models are built which incorporate knowledge of the signal—what it looks like, what its properties are—and which incorporate typical knowledge of the noise. That these models/machines don’t always work proves the noise isn’t always typical. No model is perfect.
Very well: enter pure statistical models where the characteristics of the signal and (usually) of the noise are not known but only guessed at. There has to be some kind of guess or these algorithms can’t get off the ground (even smoothers make a guess). A typical, and often unwarranted guess, is that the signal is linear (a straight line) and the noise additive (linear departures from the line).
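That typical guess can be written down in a few lines. This is a minimal sketch of the assumption itself, on invented data: declare the "signal" to be a straight line, fit it by least squares, and call whatever is left over "noise." Nothing in the procedure checks whether linearity was warranted.

```python
import numpy as np

# The standard statistical-SN guess: signal = straight line,
# noise = additive departures from it. Data here are invented,
# and conveniently happen to suit the guess.
rng = np.random.default_rng(0)
t = np.arange(50, dtype=float)
y = 0.1 * t + rng.normal(0, 1.0, t.size)

slope, intercept = np.polyfit(t, y, deg=1)   # the assumed "signal"
residuals = y - (slope * t + intercept)      # everything else is labeled "noise"
print(round(slope, 2))
```

Feed the same procedure a curve, a cycle, or a random walk and it will still hand back a slope and a pile of "noise"; the linearity is imposed, not discovered.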
The picture above is Figure 2 from the paper in question. The caption for the figure reads:
Mean global anomalies of surface temperature with internal trends. The annual anomalies (dotted lines) from five records (HadCRU, C&W, BEST, NCDC, GISS) are taken from a 1880-1899 baseline. Internal trends (dashed lines) are separated by step changes detected by the bivariate test at the p<0.01 error level…
The jagged lines are temperatures estimated (without error bounds!) from various sources, and temperatures are caused by multifarious atmospheric phenomena. Dynamicists have a handle on some of these causes, at least the big ones. They do not know all of the causes; nobody does. That is because these temperatures represent, or are purported to represent, global averages. That means there are innumerable causes of these numbers.
The straight lines on the plot are the result of SN models.
What the authors are implying by their use of statistical SN models is that the causes are linear and awfully prone to abrupt switches. If the models are right there must have been some mysterious atmospheric force which caused the temperature from 1880 to 1920 to be a strict unbending straight line, but a force which changed its mind all at once in 1920 to be another strict unbending straight line. The force changed its mind several other times, too, as the other breaks imply.
What made the force change its mind? Hey, don’t ask. The statistical model is silent on this, just as statistical models are silent on all causes. This being so we have to ask ourselves: what kind of force is so scatterbrained? Answer: the force surely isn’t, and the model used by the authors is wild overreach. The mere presence of the lines fools the eye into thinking signals are there, so the mistake is self-reinforcing.
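The kind of model being criticized can be caricatured in code. This is not the authors’ bivariate test, just a brute-force sketch on invented data: pick the breakpoint that minimizes squared error when separate straight lines are fit on each side, as if the underlying force snapped from one unbending line to another.

```python
import numpy as np

# Invented data with a built-in jump at t = 40, plus noise.
rng = np.random.default_rng(1)
t = np.arange(100, dtype=float)
y = np.where(t < 40, 0.02 * t, 0.05 * t + 0.5) + rng.normal(0, 0.2, t.size)

def sse_with_break(b):
    """Total squared residuals from two independent line fits split at b."""
    total = 0.0
    for seg_t, seg_y in ((t[:b], y[:b]), (t[b:], y[b:])):
        coef = np.polyfit(seg_t, seg_y, 1)
        total += np.sum((seg_y - np.polyval(coef, seg_t)) ** 2)
    return total

# Brute-force search for the "best" breakpoint.
best = min(range(5, 95), key=sse_with_break)
print(best)
```

The procedure will always return a "best" breakpoint, whether or not any switch-flipping force exists; run it on smoothly curving data and it will still announce a break somewhere.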
This kind of model, for which there was no call, also replaces the data with non-data: the straight lines. Thus the model is a kind of smoother. The data disappears and becomes the model. The Deadly Sin of Reification has struck! Why modern science is so intent on replacing actual data with wild models is a question for historians to answer. I think it’s because quantification is addictive. Add that to the background hum of scientism and you get these kinds of things.
I have a whole section on abuses with time series in my book Uncertainty: The Soul of Modeling, Probability & Statistics.
Update: Besides re-touting the book in which I go into these things in greater detail, I want to emphasize that (making the false assumption those estimated temperatures are accurate and without uncertainty) those temperatures are the signal. There is no noise. We experience the temperature that is. There is no “real” temperature “behind” the one actually felt. The data is the data.
From reader John Baglien comes this example of trend fiddling.
Thought you might enjoy one of the more egregiously bad examples of analysis I have encountered: http://www.nature.com/articles/srep25061#f1. The scatter plots show enormous variation, but by comparing trends before and after the onset of the industrial revolution, they have concluded that the warming trend since roughly the end of the little ice age is greater than it was before it ended. Wow! They also declare “unreliable” about 150 years of data from Lake Suwa, which allows them to draw a seriously cherry-picked trend from 1900 to present that is totally contradicted by the data they excluded. Then they use a simple bimodal comparison of freeze or no freeze, or ice breakup before or after an arbitrary date, to classify the number of “extreme” events that occurred. I’m no statistician, but the incompetence of this analysis is mind-boggling.