Suicides increase due to reading atrocious global warming research papers

I had the knife at my throat after reading a paper by Preti, Lentini, and Maugeri in the Journal of Affective Disorders (2007, vol. 102, pp. 19–25; thanks to Marc Morano for the link to World Climate Report, where this work was originally reported). The study had me so depressed that I seriously thought of ending it all.

Before I tell you what the title of their paper is, take a look at these two pictures:

[Figure: yearly mean temperature in Italy, 1974 to 2003]
[Figure: number of suicides in Italy, 1974 to 2003]

The first is the yearly mean temperature in Italy from 1974 to 2003: perhaps a slight decrease to 1980-ish, increasing after that. The second picture shows the suicide rates for men (top) and women (bottom) over the same time period. Ignore the solid line on the suicide plots for a moment and answer this question: what do these two sets of numbers, temperature and suicide, have to do with one another?

If you answered “nothing,” then you are not qualified to be a peer-reviewed researcher in the all-important field of global warming risk research. By failing to see any correlation, you have proven yourself unimaginative and politically naive.

Crack researchers Preti and his pals, on the other hand, were able to look at this same data and proclaim nothing less than “Global warming possibly linked to an enhanced risk of suicide.” (Thanks to BufordP at FreeRepublic for the link to the on-line version of the paper.)

How did they do it, you ask? How, when the data look absolutely unrelated, were they able to show a connection? Simple: by cheating. I’m going to tell you how they did it later, but how—and why—they got away with it is another matter. It is the fact that they didn’t get caught which fills me with despair and gives rise to my suicidal thoughts.

Why were they allowed to publish? People—and journal editors are in that class—are evidently so hungry for a fright, so eager to learn that their worst fears of global warming are being realized, that they will accept nearly any evidence which corroborates this desire, even if this evidence is transparently ridiculous, as it is here. Every generation has its fads and fallacies, and the evil supposed to be caused by global warming is our fixation.

Below is how they cheated. The subject is somewhat technical, so don’t bother unless you want particulars. I will go into some detail because it is important to understand just how bad something can be but still pass for “peer-reviewed scientific research.” Let me say first that if one of my students tried handing in a paper like Preti et al.’s, I’d gently ask, “Weren’t you listening to anything I said the entire semester?”

Demonstration of how smoothing causes inflated certainty (and egos?)

I’ve had a number of requests to show how smoothing inflates certainty, so I’ve created a couple of easy simulations that you can try in the privacy of your own home. The computer code is below; I’ll explain it later.

The idea is simple.

  1. I am going to simulate two time series, each of 64 “years.” The two series have absolutely nothing to do with one another, they are just made up, wholly fictional numbers. Any association between these two series would be a coincidence (which we can quantify; more later).
  2. I am then going to smooth these series using off-the-shelf smoothers. I am going to use two kinds:
    1. A k-year running mean; the bigger k is, the more smoothing there is;
    2. A simple low-pass filter with k coefficients; again the bigger k is, the more smoothing there is.
  3. I am going to let k = 2 for the first simulation, k = 3 for the second, and so on, until k = 12. This will show that increasing smoothing dramatically increases confidence.
  4. I am going to repeat the entire simulation 500 times for each k (and for each smoother) and look at the results of all of them (if we did just one, it probably wouldn’t be interesting).

Neither of the smoothers I use is in any way complicated. Fancier smoothers would just make the data smoother anyway, so we’ll start with the simplest. Make sense? Then let’s go!
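If you want to play along before reading on, here is a minimal sketch of one such simulation in Python. The names x0, x1, s0, and s1 follow the text below; the standard normal noise and the seed are my assumptions for illustration, since any made-up numbers will do.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, just so the run is repeatable

def running_mean(x, k):
    """k-year running mean; the output is k - 1 'years' shorter than the input."""
    return np.convolve(x, np.ones(k) / k, mode="valid")

x0 = rng.standard_normal(64)  # 64 "years" of pure noise
x1 = rng.standard_normal(64)  # a second, completely unrelated series

s0 = running_mean(x0, 12)     # the smoothed versions, as in the plot below
s1 = running_mean(x1, 12)
print(len(x0), len(s0))       # 64 53: eleven "years" sacrificed to the smoother
```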

Here, just so you can see what is happening, are the first two series, x0 and x1, plotted together (just one simulation out of the 500). On top of each is the 12-year running mean. You can see the smoother really does smooth the bumps out of the data, right? The last panel of the plot shows the two smoothed series, now called s0 and s1, next to each other. They are shorter because you have to sacrifice some years when smoothing.

[Figure: smoother 1 series]

The thing to notice is that the two smoothed series eerily look like they are related! The red line looks like it trails after the black one. Could the black line be some physical process that is driving the red line? No! Remember, these numbers are utterly unrelated. Any relationship we see is in our heads, or was caused by us through poor statistical methodology, and not in the data. How can we quantify this? Through this picture:

[Figure: smoother 1 p-values]

This shows boxplots of the classical p-values in a test of correlation between the two smoothed series. Notice the log-10 y-axis. A dotted line has been drawn to show the magic value of 0.05. P-values less than this wondrous number are said to be publishable, and fame and fortune await you if you can get one of these. The boxplots show the spread of the results: the solid line in the middle of each box is the median, meaning 50% of the 500 simulations gave p-values below it and 50% above. The top and bottom of the box mark the values above which (top) and below which (bottom) 25% of the simulations fall. The outermost top line marks the value exceeded by 5% of the p-values, while the bottommost line marks the value undercut by 5% of them. Think about this before you read on. The colors of the boxplots have been chosen to please Don Cherry.
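Put another way, each boxplot is just five percentiles of the 500 p-values for that k. A line of code makes the mapping explicit; the uniform stand-in numbers here are only so the snippet runs on its own (honest p-values from a true null are uniform on 0 to 1, which is exactly the point):

```python
import numpy as np

pvals = np.random.default_rng(1).uniform(size=500)  # stand-in: what honest p-values look like
# bottom whisker, box bottom, median, box top, top whisker
print(np.percentile(pvals, [5, 25, 50, 75, 95]))
```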

Now, since we did the test 500 times, and since the two series are truly unrelated, we’d expect about 5% of the p-values to be less than the magic number of 0.05. That means that the bottommost line of the boxplots should sit somewhere near the dotted line. If any more of the boxplot than that sticks below the dotted line, then the conclusion you make based on the p-value is too certain.
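Here is a sketch of that whole check, under the same assumptions as before (normal noise, a Pearson correlation test; the original plots may have used a different test, so treat the exact numbers as illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def running_mean(x, k):
    """k-year running mean, as in the earlier sketch."""
    return np.convolve(x, np.ones(k) / k, mode="valid")

for k in range(2, 13):
    pvals = np.array([
        stats.pearsonr(
            running_mean(rng.standard_normal(64), k),
            running_mean(rng.standard_normal(64), k),
        )[1]                      # index 1 is the p-value
        for _ in range(500)
    ])
    # With honest p-values this fraction would hover near 5%; watch it climb with k.
    print(f"k={k:2d}: {np.mean(pvals < 0.05):5.1%} of p-values below 0.05")
```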

Are we too certain here? Yes! Right from the start, at the smallest values of k, and hence with almost no smoothing, we are already way too sure of ourselves. By the time we reach k = 10 (a 10-year window being a commonly used choice with actual data), we are finding spurious “statistically significant” results 50% of the time! The p-values are awfully small, too, which many people incorrectly take as a measure of the “strength” of the significance. Well, we can leave that error for another day. The bottom line, however, is clear: smooth, and you are way too sure of yourself.

Now for the low-pass filter. We start with a data plot and then overlay the smoothed data on top. Then we show the two series (just 1 out of the 500, of course) on top of each other. They look like they could be related too, don’t they? Don’t lie. They surely do.

[Figure: smoother 2 series]

And to prove it, here are the boxplots again. About the same results as for the running mean.

[Figure: smoother 2 p-values]
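For the curious, here is one way this second smoother might be coded. The text doesn’t pin down the filter coefficients, so the normalized Hamming window below is my assumption; any set of positive weights summing to one tells the same qualitative story.

```python
import numpy as np

def low_pass(x, k):
    """Simple low-pass filter with k coefficients.
    Assumption: normalized Hamming-window weights (not specified in the post)."""
    w = np.hamming(k)                          # k positive taps, heaviest in the middle
    return np.convolve(x, w / w.sum(), mode="valid")

# Drop-in replacement for running_mean in the earlier sketches:
s0 = low_pass(np.random.default_rng(1).standard_normal(64), 12)
```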

What can we conclude from this?

The obvious.

BORING DETAILS FOLLOW

Anybody see this one?

The book is The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives by Stephen T. Ziliak and Deirdre N. McCloskey. From the description at Amazon:…