From reader Ernst, this interesting probability/statistics question. First, NNT = number needed to treat, which is, stealing from this site to save time, “The Number Needed to Treat (NNT) is the number of patients you need to treat to prevent one additional bad outcome (death, stroke, etc.).”

NNT = round( 1/(control_rate – treatment_rate) ).

The rates are for the “bad” outcome, e.g. staying sick vs. being cured. I’m not a big fan of NNT, though some people like it. I prefer the cure rates for both groups (the opposite of these rates), cast as predictive probabilities, of course.
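To make the formula concrete, a toy calculation; the 0.20 and 0.15 rates are invented for illustration only, not from any real trial:

```r
# Hypothetical bad-outcome rates, purely for illustration
control_rate   <- 0.20
treatment_rate <- 0.15
nnt <- round(1 / (control_rate - treatment_rate))
nnt   # 20: treat 20 patients to prevent 1 additional bad outcome
```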

Now to the question (lightly edited):

The thought experiment:

4 decks of cards, shuffled together into one pile. The ace of hearts is the only card that responds to treatment by real physical means. NNT = 52.

Coin toss used to sort the 208 cards into two groups/stacks (placebo/intervention).

My focus and confusion is regarding the coin toss when used to “sort” the 4 aces of hearts. I assume all other cards/tosses are irrelevant since they don’t respond to treatment. This is where I begin to stumble. It is not clear to me how many aces of hearts are actually needed to be “confident” that the aces of hearts are reasonably balanced, to the point where the NNT = 52 is “trustworthy.”

If all 4 aces of hearts end up in the placebo group the trial fails. If all aces of hearts end up in the intervention group the NNT is distorted by a factor of 2.

Lacking any better ideas, I asked myself if estimating a proper balance of aces of hearts is essentially the same as asking how many coin tosses are needed to judge a coin “fair” within a margin of error and confidence level. If this approach is the same thing as judging a coin fair, then it seems to me that sample size might be driven by the logic of coin tossing as described here:

https://en.wikipedia.org/wiki/Checking_whether_a_coin_is_fair

I don’t pretend to understand the link fully, but it seemed to imply extremely large sample sizes, since it would only be applied to the aces of hearts. Thus the gist of the idea: to properly measure an NNT of 52 an extreme sample is required, not the typical textbook sizes I seem to run across, around 380.

If I managed to go off the deep end here into the abyss of layman confusion, then any critique/clarification would be much welcomed.

Lots of ideas here, so let’s keep them straight. There are 4 aces of hearts (AHs) and 208 total cards. In any cut, we can get 0 AHs, 1 AH, etc., up to all 4 AHs in the control pile, and the opposite in the drug/treatment pile. Ignore the chance of these cuts for a moment and let’s instead look at how the data would be analyzed in each case.

There are two ways to look at this: (1) all AHs get better, i.e. are cured, whether they get the placebo/control or the intervention/treatment/drug, or (2) all AHs get better only if they are in the drug group; any AHs in the control aren’t cured.

**(1) All AHs get better regardless of group.**

Let a = total number of AHs, m = number of AHs in control group, C = control bad outcome rate, D = drug bad outcome rate, and n = total number of cards.

Then

NNT = 1/(C-D),

with

C = (n/2 – m) / (n/2),

D = (n/2 – (a-m)) / (n/2),

since we know there are n/2 in each pile of cards by design.

Subtracting gives C – D = (a – 2m)/(n/2) = (2a – 4m)/n, so

NNT = n / (2a – 4m), m = 0, 1, …, a.

This is rounded in practice. For other readers, yes, NNT can be negative. Incidentally, if you have a pre-specified desired NNT, and a and m are known in any cut, then you can solve for n:

n = NNT x (2a -4m), m = 0, 1, …, a.
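As a quick check of that back-solving, plug in this post’s card numbers, assuming for illustration a cut with m = 1:

```r
# Back-solve for n with the card-trial numbers (m = 1 assumed for illustration)
a <- 4; m <- 1; NNT <- 52
n <- NNT * (2*a - 4*m)
n                 # 208, i.e. four 52-card decks
n / (2*a - 4*m)   # recovers the desired NNT of 52
```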

In any case, while we know a, we don’t know m in any cut. But we can figure the chances, which we’d like for other reasons anyway. That calculation is also useful for seeing how decisions would change with a.

The chance of getting m AHs, when there are a total, in a “hand” of n/2 cards can be calculated. Think of it this way. You have the deck of n cards with a AHs in it, and you deal a “hand” of n/2. The probability is:

choose(a,m) x choose(n-a, n/2 – m) / choose(n, n/2).

The reason is that there are choose(a, m) ways to get m AHs out of a. The remaining number of cards is n – a, and we want n/2 – m of them. And there are choose(n, n/2) ways of getting hands of size n/2. I don’t see any obvious algebraic simplifications, though in code there are if the numbers get big. Here it is in R (mixing code and output):

```
a <- 4        # number of AHs
n <- 52 * a   # 52 cards per deck
p   <- numeric(a + 1)
nnt <- numeric(a + 1)
for (m in 0:a) {
  p[m + 1]   <- choose(a, m) * choose(n - a, n/2 - m) / choose(n, n/2)
  nnt[m + 1] <- n / (2*a - 4*m)
}
cbind(m = 0:a, nnt, p)
     m nnt          p
[1,] 0  26 0.06069282
[2,] 1  52 0.24998233
[3,] 2 Inf 0.37864970
[4,] 3 -52 0.24998233
[5,] 4 -26 0.06069282
# sanity check
sum(p)
[1] 1
```
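Incidentally, this probability is just the hypergeometric distribution, which R has built in as dhyper; that is one way to sidestep big-number worries when the counts grow:

```r
# Check the by-hand formula against R's built-in hypergeometric density
a <- 4; n <- 52 * a
m <- 0:a
p_manual <- choose(a, m) * choose(n - a, n/2 - m) / choose(n, n/2)
p_hyper  <- dhyper(m, a, n - a, n/2)  # same distribution, built in
all.equal(p_manual, p_hyper)          # TRUE
```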

As you can see, the highest chance is for an infinite NNT, which happens when the bad-outcome rate is the same for both control and treatment. That infinity bars any kind of “expected value” calculation we might make.

**(2) All AHs get better only when in drug group.**

The probabilities of AH distribution per pile, control or drug, remain the same. The only thing that changes is the NNT calculation.

C = (n/2) / (n/2),

D = (n/2 – (a-m)) / (n/2),

since again there are n/2 cards in each pile by design. Subtracting gives C – D = (a – m)/(n/2) = 2(a – m)/n, so

NNT = n / (2(a – m)), m = 0, 1, …, a.

We only have to make one obvious change in the code, which now gives (recalling m is the number of AHs in the control/placebo group):

```
m nnt p
[1,] 0 26.00000 0.06069282
[2,] 1 34.66667 0.24998233
[3,] 2 52.00000 0.37864970
[4,] 3 104.00000 0.24998233
[5,] 4 Inf 0.06069282
```
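For concreteness, here is one way to write the modified version in full; only the nnt line differs from the case (1) code:

```r
# Case (2): AHs are cured only in the drug group
a <- 4; n <- 52 * a
m <- 0:a
p   <- choose(a, m) * choose(n - a, n/2 - m) / choose(n, n/2)
nnt <- n / (2 * (a - m))   # the one changed line vs. case (1)
cbind(m, nnt, p)
```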

**Homework**

Now everything we did is as if an oracle told us in advance there would be a AHs; we just didn’t know where they’d be. In most applications, we don’t know a. But the above analysis can still work if we put a probability on a, which, in theory anyway, can be 0, 1, 2, …, n. The distribution you put on a will be entirely *ad hoc*, unless you can argue from some outside premises which values of a are more likely than others.

Anyway, put a distribution on a, then re-do the analysis. Hint: there will be an outer loop over a.
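To get the loop structure started (a skeleton only, not a solution; the binomial prior on a is an entirely ad hoc placeholder):

```r
# Skeleton for the homework. The binomial prior on a is an ad hoc
# placeholder: each card "responsive" with probability 4/208.
n   <- 208
pa  <- dbinom(0:n, n, 4/n)   # prior on a
tot <- 0                     # sanity accumulator: joint weights should sum to 1
for (a in 0:n) {             # outer loop over the now-uncertain a
  for (m in 0:a) {           # inner loop over cuts, as before
    pm  <- choose(a, m) * choose(n - a, n/2 - m) / choose(n, n/2)
    tot <- tot + pa[a + 1] * pm
    # ...re-do the NNT calculation here, weighted by pa[a + 1] * pm...
  }
}
tot   # should be (numerically) 1
```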

Many would analyze the “data” from trials like this, with C and D groups, using testing and p-values (bleck). Or they could form a predictive probability (fun). How would those analyses change with m? Use both assumptions (1) and (2).
