
# How to Cheat: Stats 101 Chapter 14

I’ve decided to jump ahead a few chapters. Chapters 10–13 are very important and cover material that comprises 90% of all the actual statistics practiced by civilians: topics like “testing” and regression, how they are done in classical and Bayesian statistics, why these methods are too sure of themselves, and why observable statistics is the only proper way.

But I can tell that I am testing the patience of my audience, so I will leave these more technical chapters for the book itself.

Thus, here I return to something eminently practical: HOW TO CHEAT WITH STATISTICS.

It is important these days for people to know how to get away with as much as they possibly can. This chapter shows you how to do it.

There are no cheap methods like data fudging or just plain lying—those techniques are for pikers. No: what I give you is genuine, sophisticated gold. Tricks you can actually use and get away with. Tricks that work.

I must be out of my mind to give these secrets away for free, but it is a measure of how much I love you, my audience, my faithful readers.

Only an excerpt is in this posting. To get the whole Chapter, you’ll have to download it. Here is the link.

#### 2. Conditioning

A typical academic study is one, say, that gathers two groups of college kids, maybe about 50 in each set, and has them do some task or asks them to rate something. Another study gathers data from a small area, say a neighborhood in a city, where the sample size may be as high as a few hundred, and asks sociological and economic questions of the people that live there. A medical study might try two treatments in two groups of a hundred or so people. When the data from these studies are in, the results are compiled and papers are published. Claims are made in these papers. The college kids paper will say that people act one way and not another; the city paper will say that poor people have less money; and the medical paper will claim treatment A is better than treatment B.

We already know that if all these researchers wanted to do was to say something about their datasets, then they do not need statistics or probability models. They can look at their data and say, yes, more people got better under treatment A than under treatment B. They would be finished. Evidently, the creators of these studies do not want to make statements only about the past data; they want to imply their findings are more widely applicable.

By far the majority of these kinds of studies, published in academic journals, concern humans. As of this writing, there are over 6.6 billion humans alive, about 100 billion are dead, and God only knows how many more are yet to live. Incidentally, whatever you do, do not mention these facts in your results (unless, of course, you happen to be writing about demography); it will weaken your argument.

Are the results from the college kids study applicable to all humans? All those that lived in the past, those that will live in the future, even those that live now but not in the town in which the college lies? Those who are in their 50s? Their 80s? Those who are less than 10? Poorer people and those with enough money to “get a degree”? (Kids go to college to “get a degree” nowadays, and not usually for anything else. Well, maybe socialization. These are rational choices given the way things are.) Kids at other universities? Let’s be clear: the researchers will gather data on their 100 kids, create a probability model, and since they have read this book, they will not just make a statement about the parameters, but calculate the probability distribution of future observables. The only problem is, to whom do we apply this probability distribution?

Before we answer that, think about the medical trial, which was conducted at a hospital in a city on the East Coast of the United States of America. The physicians also use their data to create a probability distribution of future patients. But who exactly are these patients? People who live in other cities on the East Coast? Anywhere in the USA? Canada, too? Or only cities of a certain size? Or do the future patients merely have to “look like” the patients in the old data; that is, be of the same ages, sex ratio, weights, economic condition, have eaten the same things in their lifetimes, traveled to the same places, engaged in the same activities, and so on? Would the distribution have applied to the people who used to be alive, and will it apply to people not yet born, indefinitely into the future?

Nobody knows the answers to these questions, which is highly in your favor, especially if you have just completed a study using data “at hand”, that is, that was easy for you to collect. You certainly want to imply that your results are as broadly applicable as possible because this makes you more of an expert than somebody who merely claims to know the habits of a small group of college kids in the year 2008 only, in city C and who are unmarried, between 19 and 22 years old, and whose parents are upper middle class, etc. Openly stressing these limitations might be noble and correct, but it will not get you far. State your results in terms of all people. For example, say “People choose option A over B which gives weight to our theory of psychology.” Do not say, “College kids in our freshman psychology class, who might not be anything like the rest of the population, carried out an experiment for us (and surely they took this task seriously) and…”

Same thing in the medical trial. Emphasize your small p-value, and spend more time talking about how the two groups of patients (those that received treatment A and those that got B) were not different from one another. Tell how there were roughly equal numbers of men and women in both treatments, and the same with age, weight, etc. This is an excellent strategy because it is useful information: if the two groups did differ, then your results may be biased. It is also a wonderful distraction because it allows you to ignore or downplay the discussion of how your results might only be useful for a small subset of patients.

#### 5. Publishable p-values

Most journals, say in medicine or those serving fields ending with “ology”, are slaves to p-values. Papers have a difficult, if not impossible, time getting published unless authors can demonstrate for their study a p-value that is publishable, that is, one less than 0.05. Sometimes, the data are not cooperative and the p-value that you get from using a common statistic is too large to see the light of print. This is bad news, because if you are an academic, you must publish papers or else you can’t get grants, and if you don’t get grants, then you do not bring money into your university, and if you don’t bring money into your university, you do not get tenure, and if you do not get tenure, then you are out the door and you feel shame.

So small p-values are important. I of course advise against using classical statistics methods, but if you are forced to (and some journal editors insist on it), then all is not lost if an initial large p-value is found. In fact, I would go so far as to say that if you cannot find a publishable p-value in any situation, then you are not trying hard enough. There are several ways to lower your p-value.

The best known is to increase your sample size. This one is a lock. Let’s take a look at the t-test statistic from Chapter 10 to see why.

(see the book)

There is a mathematical phrase that begins “without loss of generality” which I now invoke by letting, for ease of notation, $n_A = n_B = n$ and $s_A^2 = s_B^2 = s^2$, so that $t(x)$ becomes

(see the book)
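The formulas themselves are in the book and are not reproduced in this excerpt. As a sketch only, assuming the Chapter 10 statistic is the usual two-sample t with unpooled variances (that choice, and the symbols $\bar{x}_A$, $\bar{x}_B$, $s_A^2$, $s_B^2$ for the sample means and variances, are assumptions here), the general form and its simplification would read:

$$
t(x) = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{s_A^2/n_A + s_B^2/n_B}}
\quad\longrightarrow\quad
t(x) = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{2 s^2/n}} = \sqrt{\frac{n}{2}}\;\frac{\bar{x}_A - \bar{x}_B}{s}.
$$

With equal group sizes and equal variances the pooled-variance version reduces to the same expression, so the simplification does not depend on which of the two was meant.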

Remember that we want a large statistic, a large t, the larger the better, because larger ts mean smaller p-values. Do you see the trick? A larger n means a larger t! All you have to do is to increase your sample size and just wait for the small p-values to start rolling in. This trick always works in any classical situation, even when the difference $\bar{x}_A - \bar{x}_B$ is too small to be of interest to anybody. This is why having a small p-value is called attaining statistical significance and not practical or useful significance.
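To watch the sample-size trick at work, here is a minimal Python sketch. It assumes the equal-groups form of the statistic, $t = (\bar{x}_A - \bar{x}_B)/(s\sqrt{2/n})$, and approximates the two-sided p-value with the normal distribution rather than the t distribution (reasonable only for largish n). The numbers `diff` and `s` and the function names are hypothetical, chosen for illustration.

```python
import math

def t_equal_groups(diff, s, n):
    """t statistic when both groups have size n and common sd s (assumed form)."""
    return diff / (s * math.sqrt(2.0 / n))

def two_sided_p_normal(t):
    """Two-sided p-value using the normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))

# Hypothetical numbers: a difference in means of 0.1 against a standard deviation of 1.
diff, s = 0.1, 1.0

for n in (10, 100, 1000, 10000):
    t = t_equal_groups(diff, s, n)
    print(f"n = {n:>6}   t = {t:6.2f}   p = {two_sided_p_normal(t):.4f}")
```

The observed difference never changes; only n does, and the p-value falls steadily as n grows.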

Incidentally, this trick also works in Bayesian statistics in the sense that the posterior distribution of $\mu_A - \mu_B$ will have most probability above or below zero. But it fails miserably in modern observable statistics because a trivial difference in $\mu_A - \mu_B$ won’t make a tinker’s dam worth of difference in the probability distribution of future observables.
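A rough sketch of why the observable view is unimpressed: take the probability that a new observation from group A exceeds a new one from group B. The code below is a plug-in approximation, not the full predictive calculation from the book. It assumes normal observables with a common standard deviation and treats the estimated parameters as known, which is why n does not appear at all. The values of `diff` and `s` and the function name are hypothetical.

```python
import math

def prob_a_beats_b(diff, s):
    """P(Y_A > Y_B) for independent normal observables whose means differ
    by `diff` and which share standard deviation `s` (plug-in assumption)."""
    # Y_A - Y_B is normal with mean diff and standard deviation s * sqrt(2).
    z = diff / (s * math.sqrt(2.0))
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical numbers: the same trivial difference, 0.1 against a standard deviation of 1.
diff, s = 0.1, 1.0
print(round(prob_a_beats_b(diff, s), 3))
```

With these numbers the probability comes out only a little over one half, however small the p-value has been driven by piling on data.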

The next trick, if you cannot increase your sample size, is to change your statistic. This comes from the useful loophole in classical theory that there is no rule which specifies which statistic you must use in any situation. Thus, through some creativity and willingness to spend time with your statistical software, you can create small p-values where other people see only despair. This isn’t so easy to do in R because you have to know the names of the alternate statistics, but it’s cake in SAS, which usually prints out dozens of statistics in standard cases, which is one reason SAS is worth its exorbitant price. Look around at the advertising brochures of statistical software and you will see that they openly boast of the large number of tests on offer.

For example, for use in “testing differences between proportions”, just off the top of my head I can think of the z statistic, the proportions test with and without correction for continuity (two or three to choose from here), chi-squared test, Fisher’s exact test, McNemar’s test, logistic regression. There are dozens more and teams of academic statisticians constantly add to the pile. Don’t believe it? Here’s a small table of these tests for the TSD/Sex data from Chapter 11.

| Test | p-value |
|---|---|
| Prop test | 0.78 |
| Fisher’s | 0.70 |
| Logistic Reg. | 0.52 |
| chi-squared | 0.50 |
| z test | 0.49 |
| McNemar’s | 0.24 |

That I was only able to get to 0.24 just means I didn’t try hard enough. Which is the correct p-value? They all are; that’s the beauty of this trick. Not one of these p-values is more “right” than any other one. Each is valid. If all you know is classical statistics, let this knowledge sink in. It should prove to you that p-values are not what you probably thought they were.

For “testing differences between means”, there is the t-test (a couple of versions of this, actually), Wilcoxon rank-sum test (also called Mann-Whitney), sign tests, Spearman correlation tests, Kendall’s tau, Kruskal-Wallis test, Kolmogorov-Smirnov test, permutation test, Friedman two-way analysis of variance (I’m running out of breath) and many more. Here are some of those tests for the advertising data:

| Test | p-value |
|---|---|
| Spearman | 0.87 |
| Perm. | 0.20 |
| t-test | 0.19 |
| Wilcox | 0.14 |
| Kol.-Smi. | 0.08 |

Nearly there!

Please remember that in this example, like the previous one, the data are the same; the only thing that changes is the classical statistical test.
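The same point can be sketched in Python with a single permutation test into which different statistics are plugged. This is a stand-in technique, not the classical tests listed in the table above: the difference in means plays the role of the t-test, the difference in medians is a cruder location test, and the maximum distance between empirical distribution functions is the Kolmogorov-Smirnov statistic. The advertising data are not reproduced in this excerpt, so both groups below are hypothetical, as are the function names.

```python
import random
import statistics

def perm_p_value(a, b, stat, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the statistic `stat(a, b)`."""
    rng = random.Random(seed)
    observed = abs(stat(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel the observations at random
        if abs(stat(pooled[:len(a)], pooled[len(a):])) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

def mean_diff(a, b):
    return statistics.mean(a) - statistics.mean(b)

def median_diff(a, b):
    return statistics.median(a) - statistics.median(b)

def ks_distance(a, b):
    """Largest gap between the two empirical distribution functions."""
    points = sorted(set(a) | set(b))
    return max(
        abs(sum(x <= p for x in a) / len(a) - sum(x <= p for x in b) / len(b))
        for p in points
    )

# Hypothetical measurements for two groups.
group_a = [12.1, 9.8, 11.4, 10.3, 13.7, 9.1, 10.9, 12.6, 11.0, 10.2]
group_b = [10.4, 8.9, 9.7, 11.8, 9.3, 10.1, 8.5, 9.9, 10.8, 9.0]

results = {
    name: perm_p_value(group_a, group_b, stat)
    for name, stat in [("mean diff", mean_diff), ("median diff", median_diff), ("KS distance", ks_distance)]
}
for name, p in results.items():
    print(f"{name:<12} {p:.3f}")
```

The data never change between rows of the output; only the `stat` argument does.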

The key to this deceit is to never admit what you did. When it comes time to write up your result, boldly and authoritatively state, “We used Johnston’s (Johnston, 1983) frammilax test for differences in means.” Tossing in a citation always cows potential critics; tossing in two or more guarantees editorial acquiescence. Do not tell the reader that you went through a dozen tests to find the lowest p-value. Act as if “Johnston’s test” was what you had in mind all along.

This technique is unavailable in Bayesian or observable statistics. True, you can change your default prior distribution on the parameters or even change the model (see below), but editors in most fields are still suspicious of modern methods and tend to be conservative and will likely insist on a well-known default. There will be more room for creativity in, say, ten years when modern methods become familiar.

Our last option, if you cannot lower your p-value any other way, is to change what is accepted as publishable. So, instead of a p-value of 0.05, use 0.10 and just state that this is the level you consider as statistically significant. I haven’t seen any other number besides 0.10, however, so if your p-value is larger than this the best you can do is to claim that your results are “suggestive” or “in the expected direction.” Don’t scoff, because this sometimes works.

You can really only get away with this in secondary and tertiary journals (which luckily are increasing in number) or in certain fields where the standard of evidence is low, or when your finding is one which people want to be true. This worked for second-hand smoking studies, for example, and currently works for anything negatively associated with global warming.

Categories: Philosophy

### 5 replies »

1. Ray says:

When I had my first statistics course in the early 1960s, the professor suggested we read the Huff book. I bought and still have a copy. Cheating has become more blatant the past 40 years. Sometimes I think they just make up statistics, aka lie.

2. Oh Matt–

We all know we can become even bolder!

There’s an even better way to cheat. Insist the thing you are trying to prove is true. Pick a p value of 0.01, by some measure. Use a small sample, and then, show the data is ‘not inconsistent with’ the hypothesis.

Then, imply that you’ve now found evidence to support your hypothesis. In the press release suggest it’s proven!

3. Even better, tailor your press release to fit the biases of a special interest group, and then have them issue it. This is actually the preferred method today. It gives arms-length plausible deniability just in case your actual peers question you, and strongly discourages them from doing so for fear of incurring the wrath of the special interest group. If this method is mastered, you don’t even need to do the actual sampling or twisted analysis! Just state your a priori conclusion, plug into the PR system, and then reap the rewards: fame, grants, Nobel prizes, etc.

4. “Virtual Reliability Statistics or How to Cheat With Reliability Statistics” http://www.fieldreliability.com/VirtStat.htm, may entertain you for a short time. This web page shows examples I’ve seen or done. I will put a link to Briggs’ chapter 14 on that web page so that reliability people will have more ways to cheat.

Your readers may be interested in “Virtual Reliability” too, http://www.fieldreliability.com/Virtrel.htm. Virtual Reliability is the common practice of believing what you want about reliability, regardless of reality.