Class 69: Why Parameters Don’t Exist

Parameters don’t exist because distributions don’t exist, and distributions don’t exist because probability doesn’t exist. You cannot have “estimates” of “true” values when there are no values.

Video

https://youtu.be/9nh7ngONLoI

Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty

HOMEWORK: Given below; see end of lecture.

Lecture

I received the question below at YouTube, and various other questions elsewhere which are similar. So before we get to control, I thought it best to clear this up. Parameters do not exist. They are not part of Nature. Nature does not “draw” from distributions. Distributions do not exist. Probability does not exist.

Recall first that we used a normal distribution to characterize our uncertainty in unknown GPAs, and that we allowed one central parameter for Whites and another for non-Whites, and we let them share a spread parameter. Our justification for this was “Everybody else does it.” A compelling argument to peer reviewers everywhere.

In that example, in the Wrong way, we spoke of “estimates” for the central parameters’ “true” values. (The Wrong way ignores the spread parameter, and indeed ignores the uncertainty in GPA altogether, which was our stated goal of doing the problem!)

In order to have an “estimate” there must be a true value. In the Right way, I claimed there was no true value, and that the parameters were a useful fiction in order to create a model we knew would be, at best, an approximation, and maybe even a good approximation.

We knew it was only an approximation for two great reasons: (1) GPAs can only exist on some finite discrete set, and even if we don’t know what it is, it is necessarily discrete and finite, whereas normals give values on the continuum, which is infinite, and (2) even if this were not so, the probability leakage (the model gave positive probabilities to impossible events, like GPA > 4.15) proved that the model was only an approximation, and thus there could be no true values of the parameters, at least for this model.
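For those who like to see leakage as a number, here is a minimal sketch. The center (3.0) and spread (0.6) are hypothetical values chosen only for illustration, not the ones from the GPA lecture; the 4.15 ceiling is the one mentioned above.

```python
from math import erf, sqrt

def normal_cdf(x, center, spread):
    """Cumulative probability of a normal model at x."""
    return 0.5 * (1 + erf((x - center) / (spread * sqrt(2))))

# Hypothetical parameter values, for illustration only.
center, spread = 3.0, 0.6

# Probability the normal model assigns to the impossible event GPA > 4.15.
leak_high = 1 - normal_cdf(4.15, center, spread)
print(f"Probability assigned to GPA > 4.15: {leak_high:.4f}")
```

Any positive number here is probability given to something that cannot happen, whatever parameter values are plugged in.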

Recall, finally, it is the custom to put “hats” on the parameter “estimates” to indicate they are “estimates.”

Which, at last, brings us to our question:

Would it be reasonable to say those “hat parameters” are estimates of the “true parameters” because they are calculated using only a subset of the all the possible data you could collect, where the “true parameters” are the ones that are calculated with the full dataset. I say “true parameters” in quotes because I agree that they do not inhere in the data, but because the “true” refers to a full/complete dataset rather than just a subset.

An excellent question. My heart soars like a hawk.

I have received a similar question for models used in surveys (like polls, etc.), in which we take only a small sample of a larger and fixed and, most importantly, non-infinite population. Can’t we say the parameter estimate used in a poll/survey model is a guess (however construed) of the true value of that parameter, which we would know if we could sample the entire population?

Let’s take the poll/survey first, because it’s easiest and comes closest to intuitions of the questioners.

Our poll or survey will be this: whether any adult’s (18 or older) nose is larger than one majestic inch in length. We only care about the adults alive right now, in this instant, and not any before or after. Thus, we have a fixed, definite population. Call the size N. In the end, there will be M, and only M, folks who fit the bill (duck joke). The true proportion is thus M/N. Same kind of thing holds in election polls.

We now have several vantage points.

One is at the end, assuming we have measured everybody. What is the probability any person meets our criterion? One, if we know and condition on the measurements (we know each person’s measure), or M/N if we condition only on the totals. Now you can call that M/N a parameter, but that is an abuse of terms. It is one counted measured observed number divided by another. M people exist with the criterion, and N-M exist without it. Probability isn’t even needed here unless we ignored all measurements except the totals.

Another vantage point is before we take any measurements. Surprisingly, in the RIGHT way we can still make predictions, deductions based on knowing N and what people might not or might meet the criterion. If all we know (and this will sound familiar to those who have followed from the beginning) is that there are between 0 and N people who might meet the criterion, then each possibility has equal chance. That is

Pr(M = i | N) = 1/(N+1) for i = 0,1,2,…,N.
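A small sketch of that deduction, using exact fractions. N = 300 is borrowed from the concrete example later in this lecture; the function name is made up for illustration. The last line follows from the same premises: before any measurement, the chance the first person measured meets the criterion is the sum over i of Pr(M = i | N) times i/N.

```python
from fractions import Fraction

def prior_over_totals(N):
    """Pr(M = i | N) for i = 0..N, knowing only that 0 <= M <= N."""
    return [Fraction(1, N + 1) for _ in range(N + 1)]

N = 300
prior = prior_over_totals(N)
print(prior[0], sum(prior))  # 1/301 1

# Chance the first person measured meets the criterion, before any data.
first_match = sum(p * Fraction(i, N) for i, p in enumerate(prior))
print(first_match)  # 1/2
```

No parameter appears anywhere; every quantity is a count or a probability of a count.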

There is nothing here that looks like a parameter. Everything is observable, the probabilities are deduced. (Of course we can add information besides just N and the criterion, like what we know of the criterion’s essence, and so forth.)

The probabilities remain deduced if we take a sample from the N: call it W, which has so many who meet the criterion (w1) and so many who do not (w0). Then we can compute the probability how many of the remaining N-W observations will yet meet the criterion, or we can compute the probability of M (the total), or whatever else we want. See the lecture on the origins of parameters for full mathematical detail.
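Here is one way that calculation might be sketched, assuming the equal-chance premise on M given above and a sample drawn without replacement. Under those assumptions Pr(M = m | N, W, w1) is proportional to C(m, w1) C(N-m, w0). The sample numbers (W = 30, w1 = 20) and the function name are invented for illustration.

```python
from fractions import Fraction
from math import comb

def posterior_total(N, W, w1):
    """Pr(M = m | N, sample of size W with w1 matches).

    Assumes equal prior chance for each M in 0..N and sampling
    without replacement. Returns a dict mapping m to its probability.
    """
    w0 = W - w1
    # M must cover the w1 matches seen and leave room for the w0 non-matches.
    weights = {m: comb(m, w1) * comb(N - m, w0) for m in range(w1, N - w0 + 1)}
    total = sum(weights.values())
    return {m: Fraction(wt, total) for m, wt in weights.items()}

post = posterior_total(N=300, W=30, w1=20)

# Probability that at least half of the whole population meets the criterion.
prob_at_least_half = sum(p for m, p in post.items() if m >= 150)
print(float(prob_at_least_half))
```

The output is a probability of an actual count, M, and not an “estimate” of anything.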

We’re always predicting a real thing in this case, an actual count. There are no parameters. Only by taking the process to the limit, i.e. supposing N goes to infinity, do parameters arise as approximations. They are not needed when N is finite, even if large. We can always model probabilities directly, without the aid of any parameters.

Here’s a concrete example. Suppose N = 300 and M = 200. These are real facts about the world. Given that information, the probability somebody in this group matches the criterion, and is of course unmeasured by you, is 2/3, as is clear. Everything is real and measurable.

But we can still do this the wrong way! That is, what is the value of the parameter now we have reached the physical limit? The “estimate” is still 2/3, as one hopes, but now it has a “confidence interval” of (0.61, 0.72), which makes no sense whatsoever. The parameter is supposed to be a property of the world, just as the real people are.
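For the curious, the interval quoted above can be reproduced with the usual normal-approximation (Wald) formula at 95%. That this is the formula behind the quoted numbers is an assumption on my part; it happens to give the same two decimals.

```python
from math import sqrt

def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion."""
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Everybody has been measured: N = 300, M = 200.
lo, hi = wald_interval(200, 300)
print(round(lo, 2), round(hi, 2))  # 0.61 0.72
```

The formula has no idea that the whole population has been counted, which is why it still reports uncertainty about a number that is known exactly.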

A bad habit the field has developed is, in models like this, calling the parameter “the probability” (as in, e.g., “let the probability of the criterion be theta”), which is false. As in not true. This has caused great confusion. This mistake lets people speak of “estimating the probability”. No. Probabilities are deduced on (subjectively) chosen information. There is no need to “estimate” them. Though there might arise approximations in their deductions (math isn’t always so easy).

Consider you have your “estimate” of the probability. And its “confidence interval” (or “credible”). Then what? What is the probability of the criterion? After all, that was your goal! You still don’t know. It might be the “estimate”, and then again it might not.

Even worse, there is no parameter “estimate” when we have taken no sample (as yet). But, as we did above, we can still calculate the probabilities of observables. It gets worse still.

Suppose we have taken one sample, just one, and it meets the criterion. Thus W = 1. Then the parameter “estimate” is 1. With a confidence interval (if it makes any sense to calculate it) of (1,1). Or suppose the one sample revealed a non-match, then the “estimate” is 0 with confidence interval (0,0). Both answers are absurd.
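A sketch of the contrast, under stated assumptions: the interval is the normal-approximation (Wald) one, and the deduced probability uses the equal-chance premise on M from earlier with sampling without replacement. N = 300 is borrowed from the concrete example; the function names are made up for illustration.

```python
from fractions import Fraction
from math import comb, sqrt

def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion."""
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def prob_next_matches(N, W, w1):
    """Deduced chance the next unmeasured person meets the criterion.

    Assumes equal prior chance for each M in 0..N, a sample of W
    (with w1 matches) drawn without replacement, and N > W.
    """
    w0 = W - w1
    weights = {m: comb(m, w1) * comb(N - m, w0) for m in range(w1, N - w0 + 1)}
    total = sum(weights.values())
    # Average the remaining proportion (m - w1)/(N - W) over the possible totals.
    return sum(Fraction(wt, total) * Fraction(m - w1, N - W)
               for m, wt in weights.items())

print(wald_interval(1, 1))            # (1.0, 1.0)
print(wald_interval(0, 1))            # (0.0, 0.0)
print(prob_next_matches(300, 1, 1))   # 2/3
print(prob_next_matches(300, 1, 0))   # 1/3
```

The deduced probabilities move sensibly after a single observation; the “estimate” leaps to certainty.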

Now for other models, like the normal, if there is any kind of probability leakage, we know the parameters have nothing to do with real life, so speaking of “estimates” of “true values” in those cases is just plain wrong. I think we’re all clear here.

But there is that other, subtler criticism mentioned briefly above. Suppose we think no probability leakage exists. In the end of time, whatever that might be for a process, we’ll have N actualities. N will be less than infinity. The resulting observations or states (if observed or not) will be on a finite discrete set, which will not be the continuum. Any continuous valued model will give probabilities to the continuum, which is to say, to impossible values.

A picture like this will obtain.

You can imagine a model in advance which assigns probabilities to each state, but then we’re back to the first situation. We could model directly without needing any parameters. Any model with continuous parameters we put on this will smooth over the bumps. It may be that you consider the departure of probabilities from parameters to be trivial or negligible. But trivial or negligible is, in logic, like “nearly a virgin.” The departure from the pure state proves the parameters are not what is claimed.

This does not, of course, mean that they cannot be used as approximations. A normal model is much simpler than, say, a multinomial model over all states. The only question is what price do we pay for that convenience?

Now it gets less easy. There remains the possibility that some things, perhaps space itself, are absolutely continuous, and so potential states can be modeled with continuous parameters and models. All actual states take definite values, finitely and discretely, as above. Anything to do with actual things and observations will fit our scheme above. But perhaps we’re interested only in potentialities. In the Aristotelian sense, I mean.

In order for parameters to have true values here, the model itself must be true. Not ad hoc, posited, or guessed at, but deduced from first principles, with no free “constants”. I know of no such model. But my ignorance is vast, so it’s possible one could exist. Because if the model is not deduced from positively certain axioms through an indubitable chain of reasoning, then there will be no way to prove that model in any finite set of observations, no matter how large. You will always be left with surmise, with residual uncertainty, no matter how small or “negligible”. That distance will always remain far from truth.
