Anon sent a question about Bertrand’s Paradox. The paradox is supposed to show something has gone wrong with our thinking in probability. And it has, but not in the way its proponents imagine.
There’s a video at Aeon which explains a simplified version of the paradox. Watch it if you can’t follow my written explanation. Here’s their introduction:
The unresolved probability paradox that goes to the heart of scientific objectivity
The principle of indifference states that, without any evidence, all potential outcomes should be considered equally probable. For example, if there’s a 10-horse race then, without any additional information, one should assume that each horse has a 1-in-10 chance of winning. It’s an important epistemological principle at the foundation of probability that might seem as safe and sound as it is obvious. But, as this video from Wireless Philosophy (Wi-Phi) lays out, a paradox first described by the French mathematician Joseph Bertrand in 1889 can make starting from a position of true indifference impossible. And, because probability is at the core of almost every scientific field, this paradox has rippled through science for more than a century, leaving in its wake disagreements, workarounds and, so far, no clear solution.
The simplification goes like this. Imagine a factory that makes boxes, anywhere from 1 foot to 3 feet in length. This is our evidence, from which we must deduce our probabilities. Call it “E” for short.
What is the probability a box’s length is between 1 and 2 ft?
Pr(length between 1 and 2 ft| E) = 1/2.
We get that by reasoning that the length can be 1 to 3, 2 is halfway, and there we go. The video, and others, say the 1/2 deduction comes from an implicit premise, the “principle of indifference”. Meaning, here, that we are indifferent (whatever that might mean) to boxes of any length, so that we know not only E but E augmented by the principle of indifference.
Easy enough, no? Now the twist. What is the probability a box’s area is between 1 and 4 square feet? Well, a box of length 1 ft has 1 ft^2 area. And a box 2 ft in length has 4 ft^2 area. So maybe
Pr(area between 1 and 4 ft^2| E*) = 1/2,
where the * in E indicates we’re guessing.
Alas, no. For consider the areas of boxes can be between 1 ft^2 and 9 ft^2, and the distance between 1 and 9 is 8. And 4 is only 3/8 of the way from 1 to 9. Here, in case you can’t see that:
1 2 3 4 5 6 7 8 9.
So the principle of indifference answer is
Pr(area between 1 and 4 ft^2| E) = 3/8.
Yet asking the chance a box is between 1 and 2 ft in length is identical, as it must be, to asking the chance a box is between 1 and 4 ft^2 in area. But the probabilities aren’t the same.
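Here is a minimal sketch of the two continuum calculations, assuming a flat density is laid first over lengths and then over areas. The helper name uniform_prob is mine, for illustration only; it is not from the video.

```python
from fractions import Fraction

def uniform_prob(lo, hi, a, b):
    """Probability of [a, b] under a flat density on [lo, hi] (assumed model)."""
    return Fraction(b - a, hi - lo)

# Indifference applied to length: lengths run from 1 to 3 ft.
p_length = uniform_prob(1, 3, 1, 2)

# Indifference applied to area: areas run from 1 to 9 sq ft.
p_area = uniform_prob(1, 9, 1, 4)

print(p_length, p_area)  # 1/2 3/8
```

Same boxes, same question, two flat densities, two answers.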
Conclusion? Probability is broken! All is subjective!
What most don’t notice is there is yet another hidden or tacit premise that causes all to go awry. The video maker almost noticed it: came damn close. But just as he keyed in on it, he was distracted to other matters.
The problem, or tacit premise, is infinity. Not only is infinity unimaginably huge and mysterious, it is itself of different sizes. Which size are we using in the box example? Don’t know. It’s never specified. But it’s sort of assumed (another tacit premise) that it’s not the so-called counting infinity, nor is it power sets of real numbers, or still others, but the infinity of the continuum. One of many infinities, and a common one in math.
Infinity is like a teeming metropolis, in the sense that the road you ride to get there takes you to different neighborhoods. Infinity is not a point, in this sense, but a place. Length Road will lead you to the downscale 1/2 neighborhood, whereas Area Boulevard brings you to the tonier 3/8. The trick to get the same answers is to take the right road.
Now if you don’t understand that, think of it this way. No possible factory can cut an infinite number of lengths, no matter which infinity you’re using. All possible box lengths are finite and discrete in actuality. And we can certainly only measure to finite and discrete levels.
Let’s take a side-on close up of a box’s material that is three feet in length, here drawn using my masterful Gimp skills.
For whatever reasons of physics, mechanics, and materials, the factory can only cut lengths of 1 ft, 2 ft, or 3 ft. That is, the box that comes out can be measured down to some finite, discrete level, which here turns out to be 1 foot chunks. There is nothing special in this number; I could have made it thousandths of an inch, or millionths, or whatever, as we’ll see. But whole numbers are easy to work with.
We need to augment our E with knowledge of our finite discrete limitations. Call that evidence, for shorthand, F.
What is the probability a box’s length is between 1 and 2 ft?
Pr(length between 1 and 2 ft| F) = 2/3.
What is the probability a box’s area is between 1 and 4 square feet?
Pr(area between 1 and 4 ft^2| F) = 2/3.
Let’s walk through the answer using a modified picture of our box (I am available for all art awards).
The thin blue lines indicate the only possible box length cuts. We can, due to the limitations noted, only cut a length of 1 ft, or a length of 2 ft, or a length of 3 ft. Between 1 and 2 ft (inclusive) is 2 out of 3. And the probability is 2/3.
The thin red lines indicate the only possible box area cuts. We can, due to the same limitations, only cut an area of 1 ft^2, or an area of 4 ft^2, or an area of 9 ft^2. Between 1 and 4 ft^2 (inclusive) is again 2 out of 3. And the probability is again 2/3.
Both match, and there is no crisis.
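The counting under F can be sketched in a few lines. This assumes each allowed cut gets equal weight, as in the text; count_prob is a made-up helper name.

```python
from fractions import Fraction

def count_prob(outcomes, lo, hi):
    """Fraction of equally weighted outcomes falling in [lo, hi], inclusive."""
    hits = [x for x in outcomes if lo <= x <= hi]
    return Fraction(len(hits), len(outcomes))

lengths = [1, 2, 3]                # the only cuts allowed under F
areas = [x * x for x in lengths]   # 1, 4, 9: exactly one area per length

p_length = count_prob(lengths, 1, 2)
p_area = count_prob(areas, 1, 4)

print(p_length, p_area)  # 2/3 2/3
```

Because every area comes from exactly one length, the same boxes are counted both times.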
You will have noticed that neither probability is 1/2. This is crucial. Here’s a third box, which is probably more realistic upon viewing our box in a microscope.
There was nothing in our premises that said the cuts must be perfect and uniform. Real materials have flaws and inconsistencies. In this case, we can get lengths of 2 ft or 1 ft, maybe with some sanding at the end. Three ft is easy enough, too. But boxes are out for all but two lengths.
Looks like we can get a box with length 2.5 ft, with corresponding area 6.25 ft^2. And we can get a box with length 3 ft and area 9 ft^2. No other boxes look possible—without surgery.
What is the probability a box’s length is between 1 and 2 ft?
Pr(length between 1 and 2 ft| F’) = 0.
What is the probability a box’s area is between 1 and 4 square feet?
Pr(area between 1 and 4 ft^2| F’) = 0.
Neither can be done.
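The same counting works for the flawed material. This sketch assumes the only makeable boxes are the two read off the picture (2.5 ft and 3 ft) and gives them equal weight; count_prob is again a hypothetical helper.

```python
from fractions import Fraction

def count_prob(outcomes, lo, hi):
    """Fraction of equally weighted outcomes falling in [lo, hi], inclusive."""
    hits = [x for x in outcomes if lo <= x <= hi]
    return Fraction(len(hits), len(outcomes))

lengths = [Fraction(5, 2), Fraction(3)]   # 2.5 ft and 3 ft, the only boxes possible
areas = [x * x for x in lengths]          # 25/4 and 9 sq ft

p_length = count_prob(lengths, 1, 2)
p_area = count_prob(areas, 1, 4)

print(p_length, p_area)  # 0 0
```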
Well, this is a crude picture, which we can refine by sharpening our saws and perfecting our sanding. In the end, though, we will still be left, no matter what, with a box that is constructed out of finite discrete measurable parts.
That being so, there will never be an inconsistency in probabilities. This is easy to prove. Suppose, without (as they say) loss of generality, the box comes in chunks of 1/n, and that all chunks are uniform. Let’s skip writing the units, as they don’t really matter.
We can get boxes of length 1/n, area (1/n)^2, or length 2/n, area (2/n)^2, and so on, up to length 1, area 1^2 (multiplying this by a constant, like 3 to match the first example, doesn’t change anything). As long as the length and area in the probability question are in this set, there will never be a paradox. (You can’t ask for lengths .3/n or 1.7/n or whatever, because these can’t be made.)
Call our new information G.
What is the probability a box’s length is between 1/n (the minimum) and 1/2 (the middle), taking n even so that a length of 1/2 can be made?
Pr(length between 1/n and 1/2| G) = 1/2.
What is the probability a box’s area is between (1/n)^2 and (1/2)^2, which is the equivalent question (as it must be) in whatever units we use?
Pr(area between (1/n)^2 and (1/2)^2| G) = 1/2.
All found by simple counting.
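The counting can be checked for any finite n. This is only a sketch, assuming uniform chunks of 1/n with equal weight on each makeable box; grid_probs is a name I made up.

```python
from fractions import Fraction

def grid_probs(n):
    """Return (length answer, area answer) for boxes built from chunks of 1/n."""
    lengths = [Fraction(k, n) for k in range(1, n + 1)]
    areas = [x * x for x in lengths]          # one area per length
    half = Fraction(1, 2)
    p_length = Fraction(sum(x <= half for x in lengths), n)
    p_area = Fraction(sum(a <= half * half for a in areas), n)
    return p_length, p_area

for n in (2, 10, 1000):
    p_length, p_area = grid_probs(n)
    print(n, p_length, p_area)
```

For even n both answers are 1/2. For odd n the length 1/2 cannot be made, so both answers fall a little short of 1/2, but they still agree with each other.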
Now let n grow, and grow as large as you like, except don’t let it hit infinity. Let it be 10 to the 10 a million times, and raise all that to the power of 10 to the 10 a million times, and keep doing that 10 to the 10 a million times, and then multiply all this by 2.
This, you will agree, is a very large number. More than there are particles in the universe. No boxes can be made this big. But, mathematically, there is no difficulty. And there is no paradox. All the probabilities work and match.
As large as this number is, infinity is still, well, infinitely far away. To get there and save probability, you have to take a road that allows the length and area to stick together. Allow them to separate, even for an instant, and the whole thing is wrecked.
Incidentally 1: Jaynes provides an elegant example of the path to infinity in his Probability Theory: The Logic of Science, and I describe this situation in more detail in my own Uncertainty.
Incidentally 2: most problems in statistics, like parameterized models, priors, and all that, suffer the same kind of travels to infinity made in Bertrand’s paradox.
A HA! So the “Core Scientific Principles” depend on the “Language We Choose” (re David Hume and Jonathan Weisberg conclude). So the Wakandians and Wokelings were correct after all.
Just going to top up my meds and have another read. Great artwork too!
This is what happens when someone who is “good” at math attempts to explain it to someone who actually needs to have it explained to him.
Other folks have problems with infinities, too. Curious to hear your take, if you care to share it.
Fantastic explanation! thanks.
Is this an example of where Monte Carlo simulations can be misleading? There’s a part of me that wants to say: in the video they simulated it with finite steps!! But the infinity is snuck into the assumptions of setting up the monte carlo. The model tells us what we told it to tell us!
Sorry! I watched a different video where they simulate the original circle version of the problem.
This rolls over into the problems with differing types of averages, measurements that are really ranges and not integers, and spoils all those supposed global single-number values to several decimal points.
“Not only is infinity unimaginably huge and mysterious, it is itself of different sizes.”
An insane premise right up there with singularity and black holes…mental detritus.
Probability is going the way of all those useless Skinner Box experiments; what’s
next Dr. Who’s infinite improbability? I regret that last comment someone’s gonna
run with it and tranny-splain why it’s all so like some hermaphroditic creator god.
Isn’t the issue here that in the finite examples we are using a “naive” finite probability of one over the cardinality, while in the infinite case we use 1 over the total measure (i.e. length for the sides and area for the whole square)? The assumption is that measure can be used just like cardinality can, but that’s clearly not true.
Since the squaring function is injective the image is always going to have the same cardinality of the domain, and more to the point if we double the cardinality of the domain we will always double the cardinality of the image. (All of this remains true in the infinite case as long as we are aware that “doubling” infinity doesn’t change its cardinality.)
But the squaring function clearly doesn’t work the same with respect to measures. If we double the length of the domain we will multiply the area of the image by 4.
So how I see it is:
-Using the principle of indifference on a finite set of events of size n to say, absent more information, that each event has probability 1/n: Perfectly fine and doesn’t lead to problems.
-Using the principle of indifference on an infinite probability space of measure m to say that the probability follows a distribution function that assigns a probability density of 1/m to each point: pretty sketchy, leads to contradictions, and I don’t think it is even very intuitive.
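A rough sketch of the contrast between counting and measuring under the squaring map. The helper names image_count and image_length are invented here, and the intervals are assumed to start at 0 to keep the arithmetic plain.

```python
def image_count(points):
    """Number of distinct squares of a finite set of non-negative points."""
    return len({x * x for x in points})

def image_length(lo, hi):
    """Length of [lo**2, hi**2], the image of [lo, hi] under squaring (0 <= lo <= hi)."""
    return hi * hi - lo * lo

small = list(range(0, 10))   # 10 points
large = list(range(0, 20))   # 20 points: the count is doubled
print(image_count(small), image_count(large))   # 10 20

# Doubling the interval from [0, 1] to [0, 2] quadruples the image's length.
print(image_length(0, 1), image_length(0, 2))   # 1 4
```

Counts pass through the squaring map unchanged, but lengths do not, which is where 1/n and 1/m part ways.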
I’m feeling extremely tempted to buy you a graphics pad and pen. Then I realize it’s you, so that wouldn’t help.
How is the Principle of Indifference different from brute ignorance? Why is brute ignorance “at the core of almost every scientific field”?
What we have here is the deification of probability as a kludge to excuse ignorance. Throw some bones, read the entrails, call it science.
Matt – might I suggest draw.io / diagrams.net
Very easy to diagram out a simple picture and then export to SVG.
Also this is available with it:
So my intuitive thought that I need to know how many boxes are being made before trying to determine the probability is proven true.
There’s nothing wrong with the principle of indifference as a starting point. For example, suppose that we know that there is something that we want to measure, but all we can tell is that it is between 1 and 2 (inclusive). (Maybe this is a length we are measuring and that’s the best that we can get with the precision of our instrument.) If we want to use it in later calculations we need an approximation. If the only information we have about it is that it is in between 1 and 2 then the most reasonable approximation is to take a value of 1.5. This minimizes the maximum possible error to .5. If we had chosen 1 or 2 as an approximation we would have opened up the possibility of an error of 1. Now if the value actually is 1 then clearly using a value of 1 rather than 1.5 leads to less error, but if there is no more information beyond it being in between 1 and 2 then there is no reason to favor 1 over 2 or vice versa, hence an approximation 1.5 to minimize potential error makes the most sense.
If we are in a situation where all we know is that “something happened, and it happened in one of exactly n ways” then it makes sense to assign a probability of 1/n to each way it could have happened. In most cases we actually know more than the number of possibilities, but if that is really all we know then there is no way to favor any possibility over any other and the safest starting point is to treat all of them as equally likely.
(There is a difference between the two examples in that lengths actually exist while probabilities do not. But the overall idea is the same in terms of how the reasoning is done since probabilities are numerical expressions of our uncertainty.)
Where things get iffy is how we should consider “equally likely” when there are infinitely many possibilities. Weird things happen when we start going to infinity and we shouldn’t rely on our intuition for what formulas seem the most obvious or straightforward.
The historical version of Bertrand’s paradox is to ask “if we fix an equilateral triangle inscribed in a circle, what is the probability of getting a chord whose length is greater than the sides of the triangle?” By using three different derivations the probabilities of 1/2, 1/4 and 1/3 are reached.
-If we say that a chord can be chosen “randomly” by first selecting a radius for it to be perpendicular to and then choose a point for it to intersect that radius, then there is a probability of 1/2.
-If we say that a chord can be chosen “randomly” by selecting its midpoint (a given midpoint determines exactly one chord, so picking the midpoint picks the chord), then the probability is 1/4.
-If we say that a chord can be chosen “randomly” by selecting its first point on the circle and then selecting the second, then the probability is 1/3.
The most obvious way to resolve this is to say that there are no “random” chords in a circle without qualification. We can “randomly” produce a chord with some selection process, but there are multiple such processes which lead to different probabilities.
If we are (rightfully!) distrustful of infinity we could express each of these selection processes in terms of limits of finite processes. For example, in the last approach we might start by picking n points equally spaced along the circle (where n > 1.) Once we have selected a first point, there will be n-1 possible points which can be chosen second and we can assign a probability of 1/(n-1) to each of them (or a probability of 2/(n(n-1)) to each possible chord; the 2 coming from the fact that the chord from A to B is the same as the chord from B to A.) We can then count the number of chords that are longer than the side of an equilateral triangle to get the probability for that finite case, and take the limit as n goes to infinity and get 1/3, like the probability given in the third case.
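The finite version of this endpoint process can be counted exactly. A sketch, assuming n equally spaced points with the first point fixed and the second drawn with probability 1/(n-1); chord_prob is a hypothetical name.

```python
from fractions import Fraction

def chord_prob(n):
    """Chance the chord to the second point is longer than the triangle's side."""
    # A chord spanning d steps has length 2*R*sin(pi*d/n). It beats the
    # triangle's side R*sqrt(3) exactly when 1/3 < d/n < 2/3, i.e. n < 3*d < 2*n.
    longer = sum(1 for d in range(1, n) if n < 3 * d < 2 * n)
    return Fraction(longer, n - 1)

for n in (6, 9, 10, 1000):
    print(n, chord_prob(n))
```

The finite answers wobble with n (1/5 at n = 6, 1/4 at n = 9) but settle toward 1/3 as n grows, and the comparison uses only integers, so no rounding is involved.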
If we wanted to instead use the midpoint for our probability calculation, we would need to have a process where we put a lattice of equally spaced points in the circle and then derive the chord from that point. If you do this for each set of finite points and take the limit as the number of points go to infinity (or equivalently let the distance between points go to 0) then the probability will go to 1/4. However, we can see that this is a very different type of selection. Essentially the last one was talking about selecting from points with equally spaced lengths where here we are choosing points equally spaced over area.
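The lattice version of the midpoint process can be sketched the same way. It assumes a square lattice of spacing 1/m laid over a circle of radius 1, with equal weight on every lattice point inside; midpoint_prob is a made-up name.

```python
def midpoint_prob(m):
    """Share of lattice midpoints whose chord is longer than the triangle's side."""
    inside = 0
    longer = 0
    for i in range(-m, m + 1):
        for j in range(-m, m + 1):
            r2 = i * i + j * j            # squared distance from centre, in units of 1/m
            if r2 <= m * m:               # lattice point lies in the circle
                inside += 1
                # The chord is longer than the side exactly when the midpoint
                # is closer than half a radius to the centre: 4*r2 < m*m.
                if 4 * r2 < m * m:
                    longer += 1
    return longer / inside

for m in (10, 50, 200):
    print(m, midpoint_prob(m))
```

The centre point itself does not pick out a unique chord, but it is one point among many and does not affect the limit, which drifts toward 1/4.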
So we have the same dimension problem with respect to measures as before. You might object when we consider the remaining approach, where we choose a random radius and then a random point along that radius. Isn’t this an equally spaced length, and so shouldn’t we get the answer of 1/3 again, instead of 1/2? Well, not really. We are taking both an equally spaced angle and an equally spaced length. Essentially we are spacing things over an area, but using polar coordinates. If any approach is wrong, it’s this one, since a “polar area” of an angle times a length doesn’t have geometric meaning here. I’ve seen things go similarly bad when people try to “randomly distribute” points on a sphere by naively using uniform distributions on spherical coordinates. But my understanding is that Jaynes reaches the exact opposite conclusion by showing that the other two approaches don’t satisfy translational invariance.
So I’m honestly not sure if there is a “correct” answer in either the classic case or the area of the square case. But at minimum it should tell us that if there even is a correct answer in cases where we compare “random” choices without an explicit distribution, then it is never obvious which it is. So we should explicitly justify a limiting process every time, or use ideas like geometric invariance every single time, and never assume that we know what it means to “randomly” pick something in an infinite case otherwise. But, of course, no one actually does this.
I would become the new Jackson Pollock.
I’ll do the circles later. Same thing though. Rudolph is on the way.
I neglected to say there are other justifications beside the principle of indifference. Namely, the statistical syllogism. You can search for that here. It’s in the books too.
Don’t need frequentism here. Don’t need real boxes or real factories. You could start with “Elves make tusuches” or whatever. It’s a matter of logic not physics, that’s all.
All that video proves is that innumeracy is no barrier to an academic career in philosophy, at least in Canada.
I assume that Bertrand himself was not such a fool, and that his paradox is not as shallow as portrayed there.
That’s where you’re wrong. It is just as it is portrayed. Look it up.
Also Bertrand does circles (you’ll see), but there is no difference. Indeed, see Jaynes (chapter 15, I think). The math gets subtle and that’s when people forget about the infinities.
If a process has a finite number of possible outcomes n the no info freq p is 1/n. Whether we measure outcomes by length, area, or placement makes no difference. Your solution simply assumes a uniform process and counts cases – so no paradox can arise. Bertrand gets his by convoluting processes producing differential measured and denominated cases. He’s just wrong.
From question on Gab.
Boxes are not “random”.
Factory makes boxes. It’s your uncertainty in the box length/area that’s in question. Not theirs. They know what they’re doing.
Isn’t this explicable in terms of something like fractal dimension?
To give another possible way through, we should remember that 10 is halfway between 1 and 100. Logarithmically, that is.
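One way to read this, offered only as a sketch: assume a density that is flat in the logarithm of the quantity rather than in the quantity itself. That is my assumption, not something the post or the video uses, and log_uniform_prob is a made-up name.

```python
import math

def log_uniform_prob(lo, hi, a, b):
    """Probability of [a, b] under a density flat in log(x) on [lo, hi] (assumed model)."""
    return (math.log(b) - math.log(a)) / (math.log(hi) - math.log(lo))

# 10 is the geometric midpoint of 1 and 100.
print(math.isclose(math.sqrt(1 * 100), 10.0))

# Box example: lengths 1 to 3 ft, areas 1 to 9 sq ft.
p_length = log_uniform_prob(1, 3, 1, 2)
p_area = log_uniform_prob(1, 9, 1, 4)
print(math.isclose(p_length, p_area))
```

Under that assumption the length and area answers agree, because squaring only doubles every logarithm. Whether a log-flat density is justified is a separate question.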
Incitadus, actually there are many infinities. For example, the natural numbers are countably infinite, as are the ring of integers, but one could posit that the ring of integers is twice the infinity of the natural numbers. Rational numbers are also countably infinite. There is a bijective mapping between the natural numbers and the values of the function f(n)=1/n for all n>0, when n is a natural number. The real numbers include the rational and irrational numbers, of which rational numbers are countably infinite and irrational numbers are uncountably infinite. Thus, there are different infinities, and we haven’t even begun to discuss complex numbers or quaternions.