Class 73: How To Tell If You Have Bad Dice

This is the most in-depth, complicated Class yet. But it has everything. Every probability problem turns out to be just like asking Pr(Dice loaded|Evidence). I hope you stick with this one. It’s not easy.

Video

https://youtu.be/q9jTL8yZdZA

Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty

HOMEWORK: Given below; see end of lecture. Do the code!

Lecture

An important question from a long-time reader, about how you can tell if the dice you’re using, and possibly risking money on, are shaky. The question is not only about dice, of course, but about all probability and the hope of discovering cause.

Willis Eschenbach: This is my third time asking, and I’m beginning to think you’re avoiding me. I have a single die. I want to determine if it is “loaded” to prefer one face. I can roll it as many times as I want. What mathematical statistic can I use to determine if it is loaded? I say the p-value, but you say that’s useless … so what is useful?

The answer is: Pr(“loaded” | Evidence accepted).

That is the solution to every probability problem, where “loaded” is swapped for whatever you want to know.

Willis is not satisfied by the answers I have been giving, because, I’m guessing, he like many wants a simple formula that can be applied to situations like this. That’s what P-values do for you: give you a straight and simple procedure and answer.

The wrong answer, a fallacious answer, but an answer. One that, “because math”, looks like Science. And here I am, repeatedly telling Willis that he has to decide for himself. That sounds like an attempt to flee responsibility, an avoidance of Science.

Alas, it is the right answer. There is no unique formula. The answer to Willis’s question is “It depends” on what he means by the evidence in Pr(“loaded” | Evidence). And must be that, because, you will recall, all probability is conditional on the evidence we consider, and that decision is not probability.

If we can keep those two truths in mind, we have mastered all probability.

To see this, let’s take a look at Wokepedia’s entry on the Law of large numbers.

In probability theory, the law of large numbers is a mathematical law that states that the average of the results obtained from a large number of independent random samples converges to the true value, if it exists.[1] More formally, the law of large numbers states that given a sample of independent and identically distributed values, the sample mean converges to the true mean.

The law of large numbers is important because it guarantees stable long-term results for the averages of some random events.[1][2] For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game.

This was written by someone, or some people, who believe probability is a real thing, a property, perhaps even a causal force. It is wrong: all of it. Those who followed the Class from the beginning ought to be able to say why. If you haven’t, let’s see why, and see why Willis won’t get what he’s looking for in simple formulae.

What the law of large numbers really says is that averages of certain numbers tend to a limit. A “limit” always speaks of “the long run”, the time which, Keynes reminds us, is when we shall all be dead. Here’s another series that goes to a limit:

$$\sum_{k=0}^n a r^k.$$

Both a and r are constants here, mere numbers. This sum (a kind of average) goes to a/(1-r) at the limit, as long as the absolute value of r is less than one. No one would pause over this, or give it any sort of mystical interpretation. What we might do, if we ran across this series in any decision we needed to make, is to say “a/(1-r)” is close enough for whatever finite n we have, for our n will always be finite.

There is no answer for how big n has to be for the limit approximation to be “good enough” (the limit is the approximation here). That depends on what we’re doing with this equation and its approximation (the a/(1-r)). My “good enough” might not be your “good enough.” To say there ought to be a rule, in Science, that “n greater than m count as good enough” would be silly.
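Here is a quick R sketch of that convergence; the values a = 1 and r = 0.9 are mine, for illustration only:

a = 1; r = 0.9
a / (1 - r)         # the limit: 10
sum(a * r^(0:10))   # n = 10:  6.86
sum(a * r^(0:50))   # n = 50:  9.95
sum(a * r^(0:100))  # n = 100: 9.9998

Whether n = 50 is “good enough” depends, as said, on what you are doing with the approximation.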

And would be taken as silly. Unless we were to give a probabilistic sheen to the (partial) sum. Then, suddenly, it seems we have invoked Powers of Nature! That is what the Wokepedia writer(s) did. Re-read their casino example. Hidden—occult—powers are called forth here. Spins from the future look back to correct the spins of the present, maybe. Or a demon keeps track of the partial sum of spins, so that he knows which way to shade future spins (out at infinity), to make the sum converge to the proper profit-making number. It’s possible, anyway.

Or maybe the wheel knows the individual probabilities (of each slot), which ensures each current spin does what is required of it. The wheel, like all things, is conscious. Probability thus becomes a measure of consciousness.

Or the entire thing is incoherent and none of these interpretations are right. Here is what is.

Take the evidence (not for roulette wheels, but a die, as in Willis’s example), E_1 = “An object has six sides, differently labeled, that when tossed must show one of these sides.” Let S be “side.”

Then Pr(S=i|E_1) = 1/6, for i = 1, 2,…,6.

Suppose you watch the die and record its falls for some period of time. You will record counts m_i for each i (with m being the vector of counts). Now what is the answer?

Pr(S=i|E_1, m) = ?

It’s 1/6. It does not matter if, for n spins, m_1 = n, and m_2 = 0, m_3 = 0, …, m_6 = 0. It doesn’t matter what you observed, Pr(S=i|E_1, m) remains 1/6 forevermore. Because there is nothing in E_1 that tells how to tie m to S.

Suppose instead your evidence is E_2 = “For some large but finite n, there is a device which will record counts n_i = m_i for states i = 1…6, with n = n_1 + n_2 + … + n_6.”

Then

Pr(S = i | E_2, n=1) = 1/6, for all i.

Same probability. Though now after observing m counts, probabilities of future tosses can be different. For instance,

Pr(S = 1 | E_2, n=m) = p_1,
Pr(S = 2 | E_2, n=m) = p_2,
…
Pr(S = 6 | E_2, n=m) = p_6,

where all that is required is that the sum of p_i = 1. The complexities of the inferred model which ties the m to E_2 are not written out here, you understand, but models like this were discussed in origins of parameters.
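To make this concrete, here is a minimal sketch of one such model. It is my choice for illustration (a multinomial with a flat Dirichlet prior over the p_i), not necessarily the exact model from that discussion; under it the predictive probabilities become (m_i + 1)/(n + 6):

# Posterior predictive for side i under E_2, assuming (my assumption here)
# a flat Dirichlet(1,...,1) prior: Pr(S = i | E_2, m) = (m_i + 1)/(n + 6)
m = c(21, 0, 0, 0, 0, 0)  # suppose all n = 21 tosses showed side 1
(m + 1) / (sum(m) + 6)    # side 1: 22/27 ~ 0.81; each other side: 1/27 ~ 0.04

Unlike with E_1, the observations move the probabilities.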

At what point of departure from p_i = 1/6, if any, do you declare “This die is loaded”?

Well, what does “loaded” mean? One definition is that one side, or sides, are preferentially (or detrimentally) weighted. Call the evidence that this is so E_3. Then

Pr(S = 1 | E_3) = q_1,
Pr(S = 2 | E_3) = q_2,
…
Pr(S = 6 | E_3) = q_6,

where the q_i have to sum to 1, but where we require that at least one q_i is different from the others.

Understand very carefully that E_3 is not that the probabilities equal q (as a vector), but that the die itself has whatever properties required that cause certain sides to come up preferentially. We deduce q from those properties, whatever they might be.

What happens if we add m, i.e. our observations? This:

Pr(S = 1 | E_3, m) = q_1,
Pr(S = 2 | E_3, m) = q_2,
…
Pr(S = 6 | E_3, m) = q_6,

i.e. nothing happens to the probabilities, because our evidence E_3, which is a causal piece of evidence, specifies the probabilities, which are deduced. Just like with E_1, the probabilities don’t and can’t change with observation. They only do with E_2.

If there are other definitions of “loaded”, they are handled in the same way. For the moment, E_3 is sufficient.

What Willis seems to want is this:

Pr(E_3|m,B),

which is the probability the “loaded” hypothesis is true given our observations m, and background information B. We’ve always had this B, or forms of it, but I’ve been suppressing its notation. At the least, B is the implicit and tacit evidence of what all the symbols and words mean and things like that. But here it means there are other hypotheses beside E_3 under consideration. After all, if E_3 were the only hypothesis we considered, then it is and must be (locally) true. Pause and reflect.

Another hypothesis might be that the die is “fair”. Now what could this possibly mean? Be careful.

It must mean that the die is perfectly symmetric along any dimension, weight etc., that is in any way causally relevant to the sides that come up in tosses.

But it must also mean that the manner of tossing is in some way symmetric along all possible dimensions. And that the milieu in which the tosses happen—the table top, friction, moisture, and so on—is also symmetric along all possible dimensions. Dice don’t toss themselves, nor are they tossed nowhere.

Let’s call this perfect symmetry hypothesis—perfect symmetry in the die, the manner of toss, and the milieu—E_4. Obviously, this is a causal hypothesis. It says the causes and conditions work toward generating “fair” outcomes.

If we knew each of these causes and conditions—call that happy state of knowledge C for knowing all causes and conditions—then our probabilities would be for any single toss:

Pr(S = 1 | C, m) = 1,
Pr(S = 2 | C, m) = 0,
…
Pr(S = 6 | C, m) = 0,

where here we suppose our knowledge C told us S = 1 had to happen, from which we deduce the other states were (locally) impossible (C changes from throw to throw). It’s also true that if we know the causes, all previous knowledge in m is irrelevant. Do not think such a situation is impossible. It works perfectly well in all areas of life. Including coin flips where you know the causes!

But we also know:

Pr(S = 1 | E_4, m) = 1/6,
Pr(S = 2 | E_4, m) = 1/6,
…
Pr(S = 6 | E_4, m) = 1/6,

where the perfect symmetry says each side is equally likely, but where just knowing symmetry doesn’t tell us the cause in any particular throw. And again m is irrelevant. Recall the Gambler’s Fallacy. The gambler has seen 6 come up “too many” times lately, and says that another number is “due”. If E_4 holds, this gambler will not fare well.

“But Briggs, if I see nothing but 6s, for instance, then I know there’s a problem!”

Indeed. But that does not change that Pr(S = i | E_4, m) = 1/6 for any i = 1…6. If you see only 6s, then that is evidence that E_4 might be false, but it does not change the probabilities conditional on E_4. Nor on E_1, nor etc. You simply must remember what side of the equation you are on.

Again, Willis seems to want

Pr(E_3|m,B),

or perhaps

Pr(E_2|m,B).

But what is B? Well, what rivals to E_3 or E_2 are we considering? The first hypothesis E_1 now seems redundant in the face of E_4. Plus there is no way to link new evidence to that hypothesis. The probabilities given E_1 are always 1/6, regardless of any observation. If we know the causes C, then we know the causes.

Knowing the causes was one of the unsatisfactory answers I gave Willis. If you crack open a die and see the “load”, or learn about it via some other means (somebody talked), then Pr(E_3|mB,see load) = 1.

Now here’s where it gets tricky, because not only do you have to remember what side of the equation you’re on, but what precise information you’re conditioning on, and what you’re not conditioning on.

Call the following evidence about physics K (I don’t want to use P to confuse it with probability). K = “E_4 is impossible given any real die, any real manner of throwing, any real milieu”. Let B = “Either E_3 or E_4 is true” (or possibly with E_2 for E_3). Then

Pr(E_3 | m,B,K) = 1

and of course

Pr(E_4 | m,B,K) = 0.

We’re done! It doesn’t matter what m is, either. That’s the answer. Because K is, given our (again) background knowledge about physics, surely true. Perfect symmetry in cause and condition of dice throws evades us. It cannot be reached. Thus every toss is necessarily “unfair”, if we condition on K.

This, though true, doesn’t seem especially helpful. But it is. It tells us (what I told Willis) that since we know K is true, or if we condition on K, then because we know causes and conditions won’t be symmetric, some form of E_3 has to be true.

Of course, that doesn’t mean that we can’t by assiduous care make those causes and conditions mostly symmetric. Call this diligent assiduity knowledge (which incorporates K) J, which is also causal knowledge. Thus we might arrive at

Pr(S = 1 | E_3, m, J) = 1/6 + epsilon,
Pr(S = 2 | E_3, m, J) = 1/6 – epsilon,
…
Pr(S = 6 | E_3, m, J) = 1/6 – epsilon,

or something similar (and the same with E_2), where the probabilities are “close enough” to 1/6 for it not to matter to us in any way that we deem relevant.

As above, be careful: E_3 & J (together) is a causal hypothesis: it says something about the die itself, and how and where it is tossed. It has information about the properties of the toss from which we deduce the probabilities. The probabilities themselves are not the hypothesis. Though many lapse into the shorthand of saying they are, which can be most dangerous, because it gives the false idea probabilities are real things.

Again, “close enough” for me won’t necessarily be the same as “close enough” for you, or the casino. In casinos J will be true. J won’t be true for (I almost said “dime store”) cheap dice; I recall old videos showing that certain 20-sided dice geeks play with had odd measurements, and those departures from symmetry gave them large “epsilons”.

“Yeah, yeah, yeah. I get it, Briggs. But that’s a lot of words, and you still haven’t told me what to do. I’m seeing nothing but 6s. What test do I use?”

You’re seeing nothing but 6s and you need a test? What for? Peer review?

No, no. Don’t object. I take your point. Look: we know J is true in casinos, and maybe with expensive dice and settings. We know it can be false with cheap dice, or cheesy settings (the manner and milieu still count). Even when J is false, K remains true, because perfect dice and situations aren’t going to exist. Imaginary dice—for probability is not limited to real things—can be perfect. But we’re not interested in that here.

We can now see that E_2 must be our model, given J or its contrary, for real dice. E_1 is out, E_4 is impossible, and E_3 says we know what the load is. But it doesn’t do us much good to know that. So we are left with two different hypotheses we haven’t really considered up to now. Call them E_g and E_b, for good and bad.

E_g means the properties of the dice toss are “close enough” to symmetric that the probabilities we deduce from those properties are “close enough” to 1/6 per side that we call the die good. The contrary, E_b, is that we are “too far” from symmetry, so that the probabilities we deduce from those properties differ from 1/6 “too much”, and the die is bad.

Both E_g and E_b are causal hypotheses. The probabilities are not the hypotheses: those probabilities are deduced from the properties of the dice. A “loaded” die is obviously a “bad” die.

Most would say a die which only shows 6 is bad. If we use E_2 & J as our model, then after m we might compute something like

Pr(S = 1 | E_2, J, m) = epsilon,
Pr(S = 2 | E_2, J, m) = epsilon,
…
Pr(S = 6 | E_2, J, m) = 1 – 5*epsilon.

But that’s not directly helpful, because it doesn’t say whether these departures are good or bad. We need a definition of good or bad for that.

At last we come to our answer. (Gasp.)

We need the idea of “good” and “bad”, where bad is some “departure from uniformity”. We have E_g and E_b. Both of these hypotheses incorporate J, which itself incorporates K. So we don’t need to keep writing them.

Here is one good deduced distribution from a die’s properties: (1/6,1/6,1/6,1/6,1/6,1/6). That, of course, is uniformity. But since J is true, we might accept as good something like the deduced-from-properties (1/6-a,1/6+a,1/6,1/6,1/6,1/6), for some “small” a. Now a should be very small indeed; casinos are not likely to tolerate any “noticeable” a. But, given J, we can see there are a myriad of other possible good or acceptable hypotheses: deduced-from-properties (1/6,1/6+a,1/6-a,1/6,1/6,1/6), and many more besides.

Notice always it is the properties of the dice that are the hypotheses, and not the probabilities we deduce from those properties. Though it can be helpful shorthand to speak only of the probabilities, it is always dangerous, as mentioned above.

The same is true for bad distributions, which are, mathematically, good distributions gone wrong. One such deduced-from-properties is (1/6-b,1/6+b,1/6,1/6,1/6,1/6), for some not “small” b. Consider b an a gone bad, or grown up. And so on.

We really have quite a large and complex problem on our hands, because of course the number of ways dice can go bad is very large indeed. We want the probability any good hypothesis is true, which gives us the probability any bad hypothesis is true.

ALERT! As far as I know, nobody has worked out the solutions for this at continuity. The problem is ripe for “research” (it pains me to use this word). Remember, the limit is an approximation to the real finite situation we’ll find ourselves in. It is not the other way around: the finite is not an approximation to the limit, which doesn’t exist in actuality.

Let’s suppose a = 0; i.e. we’re a casino and intolerant of departures from uniformity, even though we acknowledge the die-manner-milieu guarantees perfect symmetry is impossible. Still, a may be so small that we can treat it as 0. So let’s try that.

That gives the deduction:

E_g = (1/6,1/6,1/6,1/6,1/6,1/6)

What to pick for E_b? We’re still left with a lot of hypotheses. Many of them, in the context of J and casinos, are absurd, though. No loaded die which gives, say, nearly all 6s will escape notice. Casinos are interested in bad dice manufacturing, cheaters, and flawed tables. Willis wasn’t interested in the latter two, only bad dice.

One way a die goes bad is to edge the weight of 6 up, which subsequently downweights its opposite (1). Thus one bad-dice hypothesis is a 6-load, from which we deduce these probabilities:

E_b = (1/6-a,1/6,1/6,1/6,1/6,1/6+a)

for some a > 0 (but with the obvious limitations; remember, that a is deduced from the loaded die’s properties!). For fun, let’s try a = 0.001. That may not seem large, but a casino using dice from a bad set like this would notice, and would lose money (rather, make less).

We have our B, which says either E_g or E_b is true. And nothing else. Remember, it is we who bring cause to data. If all we wanted were the probabilities of future rolls, then we already had E_2 above. E_2 is rather an admission of “loading” if we join it to J. So we’re not using just plain E_2, but E_2 & J. The same is true for both E_g and E_b, at the casino, as we saw.

We still need one more thing from B, which, in the face of J, is some idea before we take data of how likely either of these hypotheses is. Fifty-fifty is absurd. A lot goes into J, so the casinos are pretty sure what they’re getting is good, even before testing. How about Pr(E_g|B) = 0.999? Which makes Pr(E_b|B) = 0.001 (recalling B incorporates J). That even sounds too high for E_b at casinos, but is good enough for illustration.

If you think J is false, then something more like 50-50 could be better for E_g and E_b. It is whatever evidence you think counts that matters.

Now we take some data. We rule out bad manner and milieu: not that we have evidence against them in the roll; we just refuse to consider them here. Suppose we get counts (for sides 1…6)

m = (1,2,3,4,5,6)

That is, n = 21.

And finally, at long last, comes the conditional answer to Willis’s question (recalling the answer is necessarily and always conditional):

Pr(E_b|mB) = Pr(m|E_b B) x Pr(E_b| B) / [Pr(m|E_b B) x Pr(E_b| B) + Pr(m|E_g B) x Pr(E_g| B)].

With our m and “prior”, and assuming a multinomial distribution (implied from the premises assumed), we get

Pr(E_b|mB) = 0.00103.
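For those who want to check, here is a minimal R sketch of that calculation; dmultinom supplies Pr(m|·), and the multinomial coefficients cancel in the ratio anyway:

# Two hypotheses: E_g (uniform) and E_b (the 6-load with a = 0.001)
m  = c(1, 2, 3, 4, 5, 6)                                            # n = 21
pg = dmultinom(m, prob = rep(1/6, 6))                               # Pr(m|E_g B)
pb = dmultinom(m, prob = c(1/6 - 0.001, rep(1/6, 4), 1/6 + 0.001))  # Pr(m|E_b B)
pb * 0.001 / (pb * 0.001 + pg * 0.999)  # = 0.00103; with m*10, 0.0013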

Rather dull that the answer turns out to be Bayes’s theorem, yes? But also not in the least surprising. We proved it within the first few weeks of Class.

Now this 0.00103 is a departure, and an increase, from the 0.001 we began with. Is it a large enough increase to “do something”? That depends on the casino and the troubles they face. There is no universal cutoff, and there ought never to be one.

Still, we’ve only taken n = 21 observations. Doesn’t seem like a lot, which is why the departure isn’t very big. Let’s multiply by 10, so that n = 210 and suppose we saw:

m = (10,20,30,40,50,60)

Thus

Pr(E_b|mB) = 0.0013.

That may or may not seem like a large departure from 0.001, depending on your decision. A casino might think it large. You playing a board game at home with the family won’t care.

With this weird m, though, it seems we have a third possible hypothesis (relabeling the first bad one E_b1; and recall this is shorthand):

E_b2 = (0.048, 0.095, 0.143, 0.190, 0.238, 0.286).

We can either now ignore E_b1, or still consider it. Let’s try ignoring it at first, reusing the prior so that Pr(E_b2|B) = 0.001, and leaving Pr(E_g|B) = 0.999.

Then we get

Pr(E_b2|m(n=210)B) ~ 1.

And we would have calculated

Pr(E_b2|m(n=21)B) = 0.015.

I hope the notation is obvious.
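A minimal R sketch of these two calculations, using the exact E_b2 = (1:6)/21:

# E_b2 deduced from the observed frequencies
eb2 = (1:6)/21                            # = (0.048, 0.095, ..., 0.286)
post_b2 = function(m, prior_b = 0.001) {
  pg = dmultinom(m, prob = rep(1/6, 6))   # Pr(m|E_g B)
  pb = dmultinom(m, prob = eb2)           # Pr(m|E_b2 B)
  pb * prior_b / (pb * prior_b + pg * (1 - prior_b))
}
post_b2(c(1, 2, 3, 4, 5, 6) * 10)  # ~ 1
post_b2(c(1, 2, 3, 4, 5, 6))       # ~ 0.015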

A ready hypothesis for any m is always the E_m from which we deduce probabilities equal to the observed frequencies. That will always give the highest Pr(m|E_mB) among ordinary hypotheses (there are always hypotheses that say we would have got exactly what we did get, and these always give Pr(m|E_oB) = 1, where “o” means oracle).

Evidently, we wouldn’t have, absent m, given much weight to E_m—which we’d reflect in B in the “prior”. I use scare quotes to show that there is nothing wrong with considering post-hoc hypotheses, but you have to remember just what information you are conditioning on at all times. The time any information arrives is itself irrelevant, but where you put it is everything. In B there is no m, i.e. no observations, and we must fashion our “prior” supposing we do not know m.

Let’s suppose we would have considered that E_m a one in a million shot, and the other “priors” are adjusted accordingly. Thus

Pr(E_g|B) = 0.999
Pr(E_m|B) = 10^-6
Pr(E_b|B) = 1 – .999 – 10^-6

With our large m(n=210), we get (modifying Bayes in the obvious way; i.e. adding in the extra hypothesis in the denominator):

Pr(E_g|m(n=210)B) = 2 in a million
Pr(E_m|m(n=210)B) = 0.999998
Pr(E_b|m(n=210)B) = 2 in a billion.

Even if we started with Pr(E_m|B) = 1 in a billion, we’d still get Pr(E_m|m(n=210)B) = 0.998. If we started with Pr(E_m(n=210)|B) = 1 in a trillion, we’d get Pr(E_m|m(n=210)B) = 0.387, but then Pr(E_g|m(n=210)B) = 0.612.
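The same sketch, extended to the three hypotheses with the priors above:

# Three hypotheses: E_g (uniform), E_m (observed frequencies), E_b (6-load)
m    = c(1, 2, 3, 4, 5, 6) * 10        # n = 210
like = c(g = dmultinom(m, prob = rep(1/6, 6)),
         m = dmultinom(m, prob = (1:6)/21),
         b = dmultinom(m, prob = c(1/6 - 0.001, rep(1/6, 4), 1/6 + 0.001)))
prior = c(g = 0.999, m = 1e-6, b = 1 - 0.999 - 1e-6)
like * prior / sum(like * prior)       # ~ 2e-6, 0.999998, 2e-9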

It all comes down to this: what hypotheses are you considering? Here we can easily consider E_g as a reasonable approximation to dice throws we know won’t be perfectly symmetric, and where we know we won’t know the exact causes and conditions of each throw (to be able to predict perfectly).

But what range of E_b ought we to consider? If we have J in E_g, we can give Pr(E_g|B) = 0.999, or something similar. Thus, we can collectively give all E_bj a prior of 0.001; i.e. sum_j Pr(E_bj|B) = 0.001. Bayes’s theorem is easily modified by including these E_bj in the denominator. Not all E_bj need have the same weight, of course.

The problem becomes combinatoric, which might be happily approximated by assuming continuity. We want the deduced probabilities from any E_bj to have the form (p_1, p_2, p_3, p_4, p_5, p_6), where sum p = 1, and where the p_i are not all equal to 1/6. Real dice will only allow discrete and finite departures from 1/6, but if we allow, say, departure steps of 0.001 (or maybe even 0.0001), then the problem becomes large fast.

I did that. Let p = (p_1, p_2, p_3, p_4, p_5, p_6), a vector, where always sum_i p_i = 1 and where 0 < p_i < 1 for all i. Let a = 0.01. Subject to these constraints, we generate all possible vectors q in increments of a, excluding uniformity. This set of q all indicate “loaded” conditions.

For instance, one such q = (0.01, 0.01, 0.01, 0.01, 0.01, 0.95). That would be the deduction for a die loaded toward 6 alone. And so on for all the other possibilities.

If we let a = 0.01, there are about 72 million possible q. If we instead let a = 0.02, we’ll get less resolution, and have about 2 million possible q, but we greatly speed up the calculations. You have to decide what kind of dice you think you’ll be dealing with. How far from symmetry they’ll be, I mean.
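Those counts follow from stars and bars: with steps of size a there are 1/a units to split into six strictly positive parts, which gives choose(1/a - 1, 5) vectors:

choose(1/0.01 - 1, 5)  # 71,523,144: about 72 million at a = 0.01
choose(1/0.02 - 1, 5)  # 1,906,884:  about 2 million at a = 0.02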

There is no one right universal p-value-like answer. It does not exist. What does exist is the conditional answer:

Pr(E_g|mB) = Pr(m|E_g B) x Pr(E_g| B) / [Pr(m|E_g B) x Pr(E_g| B) + sum_j Pr(m|E_bj B) x Pr(E_bj| B)].

That’s the probability the die is good. For bad, take 1 – Pr(E_g|mB).

With m = (1,2,3,4,5,6) x 2, and with a= 0.01, Pr(E_b|mB) = 0.00063.

With m = (1,2,3,4,5,6) x 2, and now a = 0.02, then Pr(E_b|mB) = 0.00074.

So the choice of a here doesn’t matter much to the probability. Whether that difference is “large” is entirely up to you and the decision you would make. You might even change the prior so that more extreme p are downweighted.

With m = (1,2,3,4,5,6) x 10, and a = 0.02, then Pr(E_b|mB) = 0.99998. So the weird m would indeed indicate a bad die.

The answer turns out to be as promised: Pr(“loaded” | Evidence accepted).


Here is the code:

# Incidentally, some of this is from Grok. See the video. It seems to give the right answer, but only the Lord knows. There are also obvious simplifications that can be made, but I left it like this so it's easier to read.

# install.packages('RcppAlgos')  # run once if not already installed
library(RcppAlgos)

strict_simplex_grid <- function(a = 0.02, n = 6) {
  total_units <- as.integer(round(1 / a))
  if (abs(total_units * a - 1) > 1e-12) stop("1 must be integer multiple of a")

  sum_y <- total_units - n
  if (sum_y < 0) stop("a is too large: cannot have n strictly positive parts")

  # Stars and bars: choose positions of (n-1) bars
  bars <- comboGeneral(sum_y + n - 1, n - 1)

  # Build the y vector (gaps between bars)
  y <- matrix(0L, nrow = nrow(bars), ncol = n)
  y[,1] <- bars[,1] - 1
  if (n > 2) {
    for (j in 2:(n-1)) y[,j] <- bars[,j] - bars[,j-1] - 1
  }
  y[,n] <- sum_y + n - 1 - bars[,n-1]

  q <- (y + 1) * a
  q_df <- as.data.frame(q)
  colnames(q_df) <- paste0("q", 1:n)

  cat(sprintf("Generated %d vectors on the %.3f-spaced simplex (dim %d)\n",
              nrow(q_df), a, n))
  return(q_df)
}

# Example
q_df <- strict_simplex_grid(a = 0.02, n = 6) # about 2 million
#q_df <- strict_simplex_grid(a = 0.01, n = 6) # about 72 million

# in case we generate E_g exactly, take it out of E_b
w = q_df == 1/6
i = which(rowSums(w) == ncol(q_df))   # rows identical to uniformity
if(length(i) > 0) q_df = q_df[-i, ]   # drop those rows (q_df[-i] would drop columns)

n = nrow(q_df)  # number of bad hypotheses E_bj
b = 0.999
prior = c(b, rep((1 - b)/n, n))
# sum(prior) # should = 1
# Pr(E_g|B) = b
# Pr(E_bj|B) = (1-b)/n for all j

m = c(1,2,3,4,5,6)
m = m * 10  # or * 1 or whatever
eg = c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)
pg = dmultinom(x=m, prob=eg)

mmnorm <- function(x, m){
  # Pr(m | E_bj): multinomial probability of counts m given probability vector x
  dmultinom(x = m, prob = x)
}
pb = apply(q_df,1,mmnorm,m=m)

# Pr(sum(E_bj)|mB); i.e. probability of bad die
(1 - pg*prior[1]/(pg*prior[1] + sum(pb*prior[-1])))
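Run as given (with m multiplied by 10 and a = 0.02), the last line should come to about 0.99998; with m = (1,2,3,4,5,6) x 2 instead, about 0.00074, matching the numbers above.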
