Hypothesis testing has told us what probably isn’t true. It can’t see all evidence. Here’s how to do it right.
HOMEWORK: Given below; see end of lecture.
Lecture
This lecture, when I write it in full, will appear in the updated version of So, You Think You’re Psychic?
Forgive the lack of a complete written lecture. If you read last week’s lecture, this is simplicity itself.
If a person succeeds in a psychic test, it may be he got lucky. If we consider the performance of multiple tests, on this individual or across several, it becomes more difficult to conclude the supposed psi power exists. All we do here is extend the calculations we made last week to reproducibility.
In which lies much of the solution to the reproducibility crisis.
Please review last week’s lecture first or this one will make no sense. Recall we ended with two probabilities for each of our three psi hypotheses, given 4 “hits” in an experiment, and based on two initial beliefs, my skeptical one and your agnostic, maximum entropy one.
Recall this is the chance (in percentage form) for each hypothesis $\Psi_i$ (= 1/52, 2/52, or 3/52) of getting X hits (shown only up to 5; but recall it goes to 52).
X (hits) | $\Pr(X \mid \Psi_1)$ (%) | $\Pr(X \mid \Psi_2)$ (%) | $\Pr(X \mid \Psi_3)$ (%) |
---|---|---|---|
0 | 36 | 13 | 5 |
1 | 37 | 27 | 14 |
2 | 19 | 28 | 23 |
3 | 6 | 18 | 23 |
4 | 1 | 9 | 17 |
5 | 0 | 3 | 10 |
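For readers who want to rebuild the table, here is a minimal sketch. It assumes each experiment is 52 independent guesses with a fixed per-guess success chance, so the number of hits is binomial; the names `binom_pmf` and `hit_table` are mine, not from last week's lecture.

```python
from math import comb

N_GUESSES = 52
# Per-guess success chance under each hypothesis, as given in the text.
HYPOTHESES = {"Psi_1": 1 / 52, "Psi_2": 2 / 52, "Psi_3": 3 / 52}


def binom_pmf(k, n, p):
    """Probability of exactly k hits in n independent guesses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hit_table(max_hits=5):
    """Rows of percentages: one row per hit count, one column per hypothesis."""
    return [
        [100 * binom_pmf(k, N_GUESSES, p) for p in HYPOTHESES.values()]
        for k in range(max_hits + 1)
    ]


for hits, row in enumerate(hit_table()):
    print(hits, " | ".join(f"{value:5.1f}" for value in row))
```

Rounding each printed value to a whole percent should give the rows above.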
The chance, for each hypothesis, of getting exactly 4 hits was about 0.015, 0.09, and 0.17 (the 4-hit row of the table above, before rounding).
Here is the chance (again in percentage form) of getting exactly 4 hits, for each hypothesis, in at least one experiment over a total of m experiments:
m (experiments) | At least one X=4, $\Psi_1$ (%) | At least one X=4, $\Psi_2$ (%) | At least one X=4, $\Psi_3$ (%) |
---|---|---|---|
10 | 14 | 61 | 85 |
100 | 77 | 100 | 100 |
1,000 | 100 | 100 | 100 |
10,000 | 100 | 100 | 100 |
100,000 | 100 | 100 | 100 |
1,000,000 | 100 | 100 | 100 |
There’s a 14% chance that if you test one person with no psychic power 10 times, or 10 individuals with no powers once each, at least one of the experiments will score exactly 4 hits. If you test 100 times (across or within people), there’s a 77% chance. Once you get to 1,000, not really a large number, it’s almost certain at least one person will do well on the test, even without psychic powers.
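The table above can be sketched in a few lines. Two assumptions are mine: the experiments are independent, and the per-experiment chance is the binomial probability of exactly 4 hits in 52 guesses, which is what the quoted 0.015, 0.09, and 0.17 match. The function names are illustrative only.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k hits in n independent guesses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def at_least_once(p_single, m):
    """Chance an event with per-experiment probability p_single
    shows up in at least one of m independent experiments."""
    return 1 - (1 - p_single) ** m


# Per-experiment chance of exactly 4 hits under Psi_1, Psi_2, Psi_3.
p_four = [binom_pmf(4, 52, i / 52) for i in (1, 2, 3)]

for m in (10, 100, 1000):
    row = [100 * at_least_once(p, m) for p in p_four]
    print(m, " | ".join(f"{value:5.1f}" for value in row))
```

The complement trick is the whole story: the chance of never seeing the result in m tries is (1 - p)^m, and that shrinks fast as m grows.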
Here’s where it gets a little trickier, because counting becomes difficult. Suppose we conduct 100 experiments: 10 on each of 10 people, all 100 on one person, etc. We entertain all three hypotheses, and we’ll use your maximum entropy prior, which assigns equal probability to each hypothesis.
Now suppose that in these 100 experiments, each of 52 guesses, we had just 1 person score a 4. The rest scored 0s, 1s, 2s, and 3s, in any combination. That’s a lot of different combinations! For instance, one combination is that 99 experiments got exactly 0 hits and 1 experiment got 4 hits. And so on, through all the ways we can realize 100 experiments with just 1 score of 4 hits. That turns out to be multinomial (which we covered a long while back).
We next want the probability, given all this and for each hypothesis, that we get this outcome. That is about 0.41 for $\Psi_1$, 3.5 x 10^-6 for $\Psi_2$, and 3.6 x 10^-18 for $\Psi_3$.
Finally, given your E_2 (max entropy), we want the probability each hypothesis is true, given all this information. That is about 0.99999 for $\Psi_1$, 8.5 x 10^-6 for $\Psi_2$, and 8.7 x 10^-18 for $\Psi_3$.
These are the posteriors for your generous starting belief.
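The step from outcome probabilities to posteriors is just Bayes's rule. Here is a minimal sketch, assuming the equal-weight prior and taking as inputs the rounded likelihoods quoted above; the function name `posteriors` is hypothetical.

```python
def posteriors(likelihoods, priors=None):
    """Posterior probability of each hypothesis via Bayes's rule.

    likelihoods: probability of the observed outcome under each hypothesis.
    priors: prior probability of each hypothesis; equal weights if omitted.
    """
    if priors is None:
        priors = [1 / len(likelihoods)] * len(likelihoods)
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]


# Rounded outcome probabilities quoted in the text for Psi_1, Psi_2, Psi_3.
quoted_likelihoods = [0.41, 3.5e-6, 3.6e-18]
post = posteriors(quoted_likelihoods)
for name, value in zip(("Psi_1", "Psi_2", "Psi_3"), post):
    print(name, f"{value:.3g}")
```

With equal priors the prior cancels, so each posterior is that hypothesis's likelihood divided by the sum of all three. A skeptical prior goes in through the `priors` argument.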
“But Briggs, this sounds like P-values in a way. What if our guy got two 4s, or some other combination, like one 10, maybe. Then what?”
Then you make that calculation. You take the exact results you had and calculate the probability of those. You had so many 0s, so many 1s, and so on, one 4 and one 10. You use the exact numbers in the experiments and put those into the multinomial calculation. Then you calculate the posterior belief in each hypothesis as usual. There is nothing P-value-like about any of this.
Each time we use exactly the evidence we have and none other. For purposes of illustration I had to pick something, so I imagined the scenario with just one 4.
For instance, suppose in 100 experiments we had experiments with 0 hits 34 times, 1 hit 37 times, 2 hits 25 times, 3 hits 1 time, 4 hits 1 time, 10 hits 1 time, and every other hit count 0 times (a result I got from running rmultinom in R, and adding in the one with 10 hits). The chance of this outcome for each of the hypotheses is 3.6 x 10^-11, 1.9 x 10^-22, and 4.2 x 10^-48 respectively.
Note each of these outcomes is not too likely. But that’s only because there are a huge number of possible outcomes of 100 experiments. We don’t really care about the absolute probabilities of the hits anyway, only about their relative contributions to the posteriors. Because we want, we always want, you should always want, the probability of each hypothesis given all information.
Here the posteriors (again beginning with max entropy) are to order of magnitude 1, 10^-12 and 10^-37. Meaning 1 experiment with 10 hits is not exciting evidence for psi abilities. You’d conclude no powers here.
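Here is a rough sketch of the exact-counts calculation, done in logs so the tiny numbers don't underflow. The assumptions are mine: hits per experiment are binomial with 52 guesses, the prior is equal across hypotheses, and the counts are copied as listed in the text. As listed they total 99 experiments, so the output need not match the quoted figures digit for digit. All function names are hypothetical.

```python
from math import comb, exp, lgamma, log

N_GUESSES = 52
HYPOTHESES = [1 / 52, 2 / 52, 3 / 52]
# hits -> number of experiments with that many hits, as listed in the text
# (these counts sum to 99).
COUNTS = {0: 34, 1: 37, 2: 25, 3: 1, 4: 1, 10: 1}


def log_binom_pmf(k, n, p):
    """Log probability of exactly k hits in n independent guesses."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)


def log_likelihood(counts, p, n=N_GUESSES):
    """Log multinomial probability of the observed tally of hit counts."""
    total = sum(counts.values())
    log_coef = lgamma(total + 1) - sum(lgamma(c + 1) for c in counts.values())
    return log_coef + sum(c * log_binom_pmf(k, n, p) for k, c in counts.items())


def posteriors_from_logs(log_liks):
    """Equal-prior posteriors; subtracting the max keeps exp() in range."""
    top = max(log_liks)
    weights = [exp(ll - top) for ll in log_liks]
    total = sum(weights)
    return [w / total for w in weights]


log_liks = [log_likelihood(COUNTS, p) for p in HYPOTHESES]
for ll, post in zip(log_liks, posteriors_from_logs(log_liks)):
    print(f"log10 likelihood {ll / log(10):8.2f}   posterior {post:.3g}")
```

The multinomial coefficient is the same under every hypothesis, so it cancels in the posteriors. Only the relative sizes of the likelihoods matter.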
Suppose instead of 10 hits the person scored a perfect 52 hits on 1 test, with the other experiments the same as just noted. Then the probabilities of this outcome are (to the nearest order of magnitude) 10^-93, 10^-91, 10^-109 respectively (now we can see why Jaynes advocated casting probabilities in dB). The posteriors are then 0.02, 0.98, and 10^-18 respectively.
We would be nearly convinced psi is real to the tune of $\Psi_2 = 2/52$ with one perfect score if we ran 100 experiments.
“But Briggs, this doesn’t sound like it would work if we tested people with varying powers.”
Right, we’d have to have a per-person, within-a-group model, which is more work, but it’s no problem. That’s the point. It’s always the same. We calculate the probability of the hypothesis given the evidence we have, and nothing else.