What Is A Model? We need to know to test between good and bad science

Science Gone Wild with William M Briggs

Listen to the podcast at YouTube, BitChute, and Gab.

Before we describe what models are in science, it’s best to know, and to never forget, that all models only say what they are told to say.

Models are lists of statements of the form “If this, then that”. No matter how large they grow, or how sophisticated, or how mathematical, or how computerized, or how much data is put into them, or from what sources, their natures are not altered. They are always lists of “If this, then that.”

This applies to all models. It does not matter what names those models are given: artificial intelligence, statistical, probability, physical, meteorological, air transport, crop production, chemical, sociological, psychological, genetic, quantum mechanical, machine learning, and on and on. All are the same in essence: all models in every field have the same nature. And all say just what they are told to say—and nothing more.

This is not a limitation or a flaw. It is the way things are.

Here is a simple, common, and most useful model, used by casinos the world over: “If this die has six sides, then the probability any side is up in a throw is one in six.”

That model says exactly what we want it to say, and only what we told it to say. It is an accurate model, too. It matches reality well; indeed, it makes beautiful predictions, especially because casinos keep a watchful eye on dice throws. Vast sums of money are made using this model.
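As a minimal sketch, the dice model can be written as a single “If this, then that” rule and compared with throws. The function names, the seed, and the simulated throws are assumptions for illustration only; a casino would use recorded throws, not a simulation.

```python
import random
from collections import Counter
from fractions import Fraction


def dice_model(sides: int) -> dict[int, Fraction]:
    """If the die has `sides` sides, then each side has probability 1/sides."""
    return {face: Fraction(1, sides) for face in range(1, sides + 1)}


def observed_frequencies(throws: list[int]) -> dict[int, float]:
    """Relative frequency of each face in a list of observed throws."""
    counts = Counter(throws)
    return {face: counts[face] / len(throws) for face in sorted(counts)}


# Stand-in for reality: simulated throws (an assumption, not real casino data).
rng = random.Random(0)
throws = [rng.randint(1, 6) for _ in range(60_000)]

model = dice_model(6)
freqs = observed_frequencies(throws)
for face, prob in model.items():
    # Model probability next to the observed relative frequency.
    print(face, float(prob), round(freqs.get(face, 0.0), 3))
```

Note that nothing in `dice_model` mentions what makes a face come up. It only states the rule it was given, and the comparison with throws is what tells us whether the rule is any good.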

The model says nothing, not one thing, about what causes any side to be up on any toss. Efficient cause cannot be inferred from examining the model. No cause was built into the model. That is, none of the “If this, then that” statements (and there is only one) mentioned cause (except part of the formal cause, that the object has six sides). But the model is still good.

We conclude that models can be good and useful yet be silent on cause. The opposite is also true: a model can perform well in practice, but we cannot from that good performance conclude it has identified the cause of things. Ensuring cause has been identified is a much more difficult task, as anybody who has painfully designed an experiment only for it to go incomprehensibly wrong can attest.

Since all models only say what they are told to say, we can always create a model to say anything we want. We can have the model speak in the complex language of mathematics or physics, or we can have it discourse in arcane computer code; we can have it project in pictures, words, or numbers. We have the complete freedom to make any model say anything we want it to say.

We have the freedom to specify the “If this” parts of the model, from which we sometimes can deduce, and sometimes must guess, the “Then that” parts. Or we can work backward, starting from desirable “Then that”, and picking compatible “If this” parts.

We have the freedom to say which and how much and from where the “data” goes into the model, and what “If this, then that” they are married to. We have the freedom to embrace any simplifications or approximations we want. We can, and an increasing minority even do, cheat.

In short, we have complete freedom over all aspects of all models.

This freedom comes with a cost. Since any model can be made to say anything at all, it means models can’t really be trusted until they are tested against reality.

Models certainly cannot be trusted because of the authority of whoever builds, or rather creates, them. That is a fallacy. And they can’t be trusted because “We need to do something and there is nothing else.” That is also a fallacy: there are always other options.

Models can only be believed when they are tested independent of the information used to create them, and independent of the people who created them.

If the model works only for those who created it, or only with the information controlled by its creators, we likely have a case of confirmation bias. Everybody in science knows what confirmation bias is, but everybody also believes it always happens to the other guy. Again, models must be independently compared with Reality to prove themselves.

Here’s another simple thought experiment showing why all models must be independently tested.

A team of eminent engineers, all with many awards and high positions, claim to have discovered a new theory that can be used to build a machine to safely transport people (and only people) through space to distant habitable planets, by dematerializing them here, and rematerializing them there.

The trip is one way, though, because there is no machine built to the theory’s specifications on the other side. One day, if the machine works, and enough people can be transported successfully, an industrialized civilization capable of recreating the machine can grow, and people might be able to return. We also cannot communicate with the people on the distant planet, because again the machine is only one way, and the planets are thousands of light years away.

In our understanding and definition, there is no difference between “theory” and “model”.

So, will you step into the machine to be dematerialized?

Or will you first demand some kind of proof the machine works? That is, proof that the model matches reality. Or will you take the creators’ word for it, and be convinced by their impressive demonstrations of math and science, and their insistence that most of their colleagues (say, 97%) agree with them? If so, safe journey!

Even though it may not look like it, especially to non-mathematicians and non-coders, it is simplicity itself to create a complex model. An enormous list of equations, all accurate in themselves, can be written down with very little trouble. And, with even greater ease, we can say those equations apply to some physical thing, some measurable aspect of reality.

For it is we who say the “x” in our model means “quantity of ammonia”, “temperature”, “crop yield”, “income”, or whatever we like. It is we who say all those fancy calculations mean “If this, then x does that.”

This is always raw assertion. This assertion may have any number of good, and even excellent, reasons for believing it, or at least putting confidence in it. But the assertion about “x” cannot be believed finally until the model proves itself.

We must always remember the math and complex symbols inside a model are just that: math and symbols. They only take on meaning when we assign it to them. They have no meaning in themselves. Although you are likely tired of reading it by now, what this means is that the only true test of a model is to witness whether its boasts about those symbols come true.

Testing a model against reality is not easy. If a model says, “If this set of conditions holds, then x does that”, then we either wait until that set of conditions arises and then examine x, or we design an experiment, as carefully as possible, to bring about those conditions.

That takes time, and is costly.

Which is why this step is usually skipped. Instead, something like the certainty of the model’s “If this, then that” statements, or the model’s inner coherence, is offered as proof enough of the model’s worth. But this move does not work.

The ideal model is one which makes perfect predictions. This happens when every one of its “If this, then that” statements is itself true, and can be proved true, where that “every” is strict, and where the complete cause of “x” is within that list of statements. Since every “If this, then that” statement is true, and we know the complete cause, the model must predict perfectly.

Useful predictions can also happen when not every “If this, then that” statement is true; that is, when some of them are false or uncertain. Or when they all are true, but when we don’t know the complete cause. This applies to the dice throwing model above.

Usefulness is a subjective criterion, but that it is subjective is not necessarily a shortcoming. A model may be useful to one man, but useless to another. It depends on to what uses the model will be put.

For an extreme example, a model that is wrong all the time can be very useful to the man who successfully markets and sells that model. Whereas the model will have no value to those who buy it. This is why buyers must insist on independent demonstrations of model performance.

The more complex a model grows, the more difficult it is to witness enough or all the situations envisioned by the model.

A psychic (with her internal “mind” model) claims to be able to guess if you are thinking of an odd or an even number. You think of odd, she predicts odd: a success. Is her model correct; i.e., does the woman truly have psychic powers?

Maybe. But one test is clearly insufficient to tell. And even if she guessed right a large number of times, we would not have decisive evidence her model was correct. Winning poker players know why: a cause other than that asserted by the model (psychic ability) might also account for good results. This is why testing models against reality is so difficult.
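To put rough numbers on the psychic example, here is a small sketch of how often blind guessing alone would do as well. It assumes each guess is independent and right half the time; the function name and the 18-of-20 figure are invented for illustration.

```python
from math import comb


def chance_of_at_least(successes: int, trials: int, p: float = 0.5) -> float:
    """Probability of at least `successes` correct guesses in `trials`
    independent guesses, if each guess is right with probability p."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


print(chance_of_at_least(1, 1))    # one correct guess out of one
print(chance_of_at_least(18, 20))  # 18 or more correct out of 20
```

A single success happens half the time by luck, so it tells us almost nothing. A long run of successes makes blind luck implausible, but it still cannot separate psychic ability from any other cause (cues, cheating, a confederate) that would produce the same run.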

Again, the difficulty only grows with the model’s complexity. A model of guessing odd or even numbers is trivial. A model which purports to say what the temperature or level of some atmospheric chemical will be a year hence is hideously elaborate.

The more sophisticated the model, the greater its number of “If this, then that” statements. For testing, compromise in limiting that set to something manageable is almost always required. With that compromise should come an increasing uncertainty of the model’s value, though, even after it is tested. If the model cannot be tested everywhere, it should never be wholly trusted.

Sometimes something very like the opposite is true in practice. More complex models are trusted more, even when imperfectly tested.

We have to consider that those who create these complex models love those models. And why not? Creators spend vast quantities of time and energy and honest sweat in designing, shaping, and polishing those models, like Pygmalion adoring his statue. Their models are beautiful creations, at least in their eyes.

The people involved in these large modeling efforts are intelligent, even highly intelligent. It is always therefore an affront to cast doubt on their work, a kind of insult. Nobody likes to be suspected of error, least of all those who believe themselves our best thinkers. Harsh questions that could be asked about a model are, for this reason, sometimes not asked.

And, as mentioned above, the cost and time required to do robust testing balloon with model complexity. When model users feel model-based decisions have to be made, and perhaps the funds and will to do so are low, the temptation to bypass independent testing is often not resisted.

Even when this temptation is resisted, shortcuts in independent testing are too often taken. Or certain mistakes in verification creep in unnoticed.

Suppose we have a model that predicts the atmospheric deposition of some thing. An independent test is done, as is proper. Many points never before used in building the model, in any way, are predicted. These predictions turn out to be good, with good defined by the decisions model users make, and the gains and losses they experience with the model.

A quick judgement comparing the averages of the predictions and the observations appears to say the model is fine. But a closer examination reveals that the majority of points tested were of trivial size, concentrations that no one really cares about, because these happened to be the bulk of observations taken during the test. Those small amounts are what happened in the world.

A complete look, using better measures than just comparing averages, reveals larger values of deposition, the ones important to model users, are not predicted well; indeed, the error in the model is seen to grow as the value of the deposition grows.
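The trap described here can be sketched with a handful of made-up numbers. The deposition values, the threshold, and the choice of mean absolute error are all assumptions for illustration; they are not taken from any real verification.

```python
from statistics import mean

# Hypothetical deposition values (arbitrary units), invented for illustration.
observed = [1.0, 1.0, 2.0, 2.0, 1.0, 2.0, 1.0, 2.0, 20.0, 40.0]
predicted = [1.2, 0.9, 2.1, 1.8, 1.1, 2.2, 0.9, 1.9, 28.0, 32.0]

THRESHOLD = 10.0  # assumed cutoff above which model users care about deposition


def mean_abs_error(pairs):
    """Mean absolute error over (observed, predicted) pairs."""
    return mean(abs(obs - pred) for obs, pred in pairs)


pairs = list(zip(observed, predicted))
small = [(o, p) for o, p in pairs if o < THRESHOLD]
large = [(o, p) for o, p in pairs if o >= THRESHOLD]

# The two averages nearly agree, which is the "quick judgement".
print("mean observed: ", mean(observed))
print("mean predicted:", mean(predicted))

# Splitting by size shows where the error actually lives.
print("MAE, small values:", mean_abs_error(small))
print("MAE, large values:", mean_abs_error(large))
```

In this toy set the averages almost match because the many small values dominate and the two large errors cancel, while the error on the large values is far bigger than on the small ones.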

This means we have to do the independent tests at points where important decisions would be made by model users. We can’t rely on simple tests of goodness, because it might easily turn out that any model would have done well, and we just happened to test ours.

This brings in the last idea: skill. It is simple to grasp. Suppose we had a good guess of the average value of deposition over a time period of interest to model users. Maybe the mean of old observations is taken. We could take that average and use it as if it were a model. We can make predictions with it, just as we can with any model. All the predictions happen to be the same value, which is the old average.

Now the deposition model we supposed a moment ago is considered to be a sophisticated, physically and chemically accurate model. It appears to explain the physics and chemistry and other components of the atmosphere in a satisfying way. A large group of important users implement this sophisticated model. Much is invested in it. Important decisions are made using it. Lawmakers embrace the model and require it to be used.

But then suppose we compare that sophisticated model to the average model and the average model beats the sophisticated model, using the verification measures we thought important.

Which model is better? Well, as just said: the average model. The sophisticated model does not have skill with respect to the average model.

Since we would be better off using the average model, we should not use the sophisticated model, no matter how important it was thought to be, or how much was invested in it, or how “official” it is. Or even because it is believed the sophisticated model explains the physics and so forth better than the average model.

In truth, the sophisticated model, because it cannot, and did not, beat the simple average model, cannot explain the physics better. It can only appear that way to those who love the model. Because if it did explain the physics better, it would have also made better predictions. It is as simple as that.
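One common way to put a number on skill is a score of the form 1 minus the ratio of the model’s error to the reference model’s error. The sketch below uses mean squared error as the verification measure, which is an assumption; the text only requires “the verification measures we thought important”. All data are invented for illustration.

```python
from statistics import mean


def mse(observed, predicted):
    """Mean squared error between observations and predictions."""
    return mean((o - p) ** 2 for o, p in zip(observed, predicted))


def skill_score(observed, model_pred, reference_pred):
    """1 - MSE(model) / MSE(reference).

    Positive: the model beats the reference. Zero or negative: no skill.
    """
    return 1.0 - mse(observed, model_pred) / mse(observed, reference_pred)


# Hypothetical numbers, invented for illustration.
old_observations = [4.0, 6.0, 5.0, 5.0]        # used only to form the average
new_observations = [5.0, 4.0, 6.0, 5.0, 6.0]   # independent test points
sophisticated = [7.0, 2.0, 8.0, 3.0, 9.0]      # sophisticated model's predictions

# The "average model" predicts the old mean at every test point.
average_model = [mean(old_observations)] * len(new_observations)

print(skill_score(new_observations, sophisticated, average_model))
```

With these made-up numbers the score is negative: the sophisticated model has no skill with respect to the average model, so the average model should be preferred for these decisions.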

The final lesson is simple: put all models to the test.

Categories: Philosophy, Statistics

11 replies »

  1. That an equation accurately predicts Reality(tm) does not require Reality(tm) to go along with the gag. It was Hawking who pointed this out, iirc, though he was more formal about it. That an equation contains a variable does not require the universe to do so, or words to that effect. For example, Ptolemaic astronomers found their model could explain the retrograde motions of Mars, Jupiter, and Saturn by introducing epicycles. These were famously successful for many centuries (the settled, consensus science), but the actual solar system did not have epicycles in it. [Except the motions of the moons of Jupiter, which they seized upon avidly as proof of epicycles.] So, a model that predicts successfully need not mimic in its innards the workings of the system being modeled. Keep this in mind the next time someone says, “Turing test.”

  2. We have the freedom to say which and how much and from where the “data” goes into the model, and what “If this, then that” they are married to. … In short, we have complete freedom over all aspects of all models.

    Yes, we can, just like we have complete freedom to say anything we want. However, there are standards to judge whether assumptions (or what we say) are appropriate or correct or reasonable or rational or acceptable.

  3. A good model is whatever the Chinese Communist Party tells you is good, and is bad whenever it can be blamed on the enemy. Listen to this Expurt testimony! And if the Expurt should be found to be incriminating itself, it will conveniently not remember whatever it said, or modelled for, in the distant past so-so long ago… it’s a very busy man!
    https://www.zerohedge.com/covid-19/watch-fauci-blames-trump-chinas-covid-cover
    https://www.zerohedge.com/covid-19/faucis-7-hour-deposition-what-we-know-so-far

  4. “. . . a machine to safely transport people (and only people) through space to distant habitable planets, by dematerializing them here, and rematerializing them there.”

    This would make an excellent plot for a Twilight Zone episode. I can see the reason behind the “and only people” premise (so the travelers can’t build an identical machine from transported parts and use it to send back messages, breaking the untestability aspect of the thought experiment, right?), but that means the travelers will arrive naked. But this could be handled by writing foliage into the script, which could be used to strategically block camera views.

    The plot could focus on the untestability aspect of the device, a leap into the unknown. Because of that, the available pool of volunteers would be small, say a man on death row, a woman with a terminal disease, etc. The obligatory plot twist at the end would be mind-bending (it’s a cookbook!). It would be filmed in black & white for dramatic effect. The script almost writes itself! Gawd, I really miss that show.

    Back to reality . . . you hint that extrapolation (as opposed to interpolation) from the values tested as being potentially problematic. I think this dramatically understates how untrustworthy extrapolation really is. (On the other hand, perhaps the ultimate extrapolation is the most common one – we assume the laws of physics are the same throughout the universe.)

    YOS – excellent point, one that shouldn’t be omitted from such discussions. The equations become the reality because they make accurate predictions. Quantum mechanics, anyone?

  5. Obviously I agree fully with this – but ..

    Let’s agree that science progresses when an existing model is shown to offer less skill than another model – the current model works, but we find exceptions or conditions under which it can fail to predict reality, and therefore set off in search of a better model. e.g. Newton vs Maxwell/Einstein. Now, can we sensibly ask whether this can be run backwards? -i.e. to claim that models whose predictive output is non determinative must necessarily either lack or misuse embedded information?

  6. “Back to reality . . . you hint that extrapolation (as opposed to interpolation) from the values tested as being potentially problematic. I think this dramatically understates how untrustworthy extrapolation really is.”

    Extrapolation is not a valid mathematical operation. I used to do lots of computer programming and have many books on numerical methods. Some of these books contain a chapter on interpolation methods, but none of them has a chapter on extrapolation methods. As one of my math profs used to warn us, the only thing more dangerous than extrapolation is predicting the future. Predicting the future is exactly what you are doing with extrapolation. That’s not mathematics, that’s prognostication.

  7. how does your data relate to your imagination

    Tom Nelson’s upload #48 is a video of Dr. Carl-Otto Weiss’ presentation: “three main cycles determine the Earth’s temperature”.

    In that video, Dr. Weiss starts his factual basis with a diagram on 8000 years of solar activity (screenshot here) and he asserts [paraphrased]: the spikes in the diagram relate to solar activity measured in physical proxy material on planet Earth.

    The largest spikes are at 86 years, 207 years, 499 years and 978 years. Put that in a conceptual black box and ask: Why are these annual intervals authentically related to solar activity?

    In his presentation Dr. Weiss now shifts to climate cycles in Earth’s atmosphere. But let us stay in the whole solar system and ask: 86, 207, 499, 978 years of what?

    Astronomy answers this question with: 86 Earth years ? 271 Mercury orbital periods, 207 Earth years ? 97 Mars orbital periods, 499 Earth years ? 42 Jupiter orbital periods, 978 Earth years ? 456 Mars orbital periods (all periods are observable [tell your offspring how and what, as adamant as possible] in/around the same week in the respective year).

    Hence, the conceptual black box can be labelled so that your imagination is fulfilled. But any predictions derived from it are as wrong as your blackbox is wrong.

  8. Sorry, the question marks in the equations are wrong wordpress translations of Unicode 2245 “approximate equal to”.

  9. Models are hypothesis generating machines. No more, no less. Hypotheses must be tested before their validity can be established.

    This used to be so widely understood as not to need elaboration outside of introductory classes that discuss method. But as you note the complexity, cost, and effort required to produce modern hypothesis generating machines makes them so precious that their developers are loath to subject them to the rigors of testing.

    In one of my books I talk about pushing models until they break. That’s how you learn something. Once one model breaks, learn why it breaks and come up with a better model. But breaking requires testing, and challenging tests, not rigged ones.

    Until science remembers its methodological foundations it will have less and less utility, except from the perspective of modelers who can successfully market them to those who do not demand empirical validation.

  10. A “good” model of a physical system is based upon a solution to the Problem of Induction. The problem is how, in a logically justified way, to select the set of inferences that are made by this model from a larger set of possibilities. In the year 1963, this problem was solved by Ronald Arlie Christensen, then a PhD candidate in the theoretical physics program of the University of California, Berkeley. Christensen reveals the manner in which he solved this problem in the book that is entitled “Multivariate Statistical Modelling” (ISBN 0-938-87614-7), 1983. Empirical validation is an integral part of this method for the construction of a model of a physical system. The finished product expresses all of the available information but no more.
