Occam’s razor: The simplest hypothesis is usually the best. Simplest is easy to understand, and non-controversial, or relatively so. Consternation creeps in with usually.
Before that, first understand what an induction is. Not in its mathematical "if n, then n + 1" implementation, but in its logical form. The reason you do not leap from a fifty-story building to arrive at the ground in the quickest manner is the same reason you do not stick your bare hand into a roaring fire. Every time somebody has done these things, he has either died from rapid deceleration or been toasted like a marshmallow.
It is logically possible that when you leap you will not be turned to jelly, just as it is logically possible that when you reach into the fire your hand remains unscathed. It could be, for instance, that a great gust of wind blows upwards, slowing your downward fall just before you reach your final stop. Or a fluke of the fire causes the flame to part and the heat to be directed away from your digits.
We induce that leaps from high places or reaching into roaring flames will cause death or pain because they have always been seen to. When a contingent event—like these physical events—is always seen to happen, we say via induction that it always will.
If we only die most but not all times we leap from tall buildings, or are only burned most but not all times when we thrust our hands into a fire, then we are still leery of these activities, but we did not learn our fear from induction. We instead used probability, or non-deductive logic.
Inductions are just one-off deductions, so to speak. Like a deductive one, an inductive conclusion is always seen to follow from its premises, but unlike the deductive conclusion which must always follow, the inductive conclusion is not logically necessary.
And both of these differ from non-deductive conclusions, which only sometimes, or rarely, or usually—anything but never or always—follow from their premises. These are crucial distinctions, easily misunderstood and the cause of much confusion.
Now, we so often derive explanations for events that it is not just second, but first nature. Why did the car go when we pushed the gas pedal? Why does the basement flood when it rains? Why do pencils fall off tables? Why does your wife become argumentative with the waxing of the moon? For all these things and innumerable others we have developed explanations, theories, models (all synonymous).
We have discovered, through experience, that simpler models are usually—but not always—more useful, or more often correct, than complex models. Occam's razor is thus not an induction, but a belief, or rule of thumb, based on non-deductive reasoning. If it were an induction, we would have discovered that each and every time the simpler model was the better.
What does simpler mean? One with fewer premises. One without as many knobs to twist, contingencies that must be met, chains of complexities that must be followed. Occam is supposed to have said, “entities must not be multiplied beyond necessity.” But this begs the question of what is necessary.
If a model is not deductive or inductive, it is non-deductive; therefore its conclusion is only probable (a number strictly between 0 and 1) given its premises. If a model is deductive, it is always possible to strip away all unnecessary premises, here defined as those that do not change the validity of the conclusion. Necessary premises are those absolutely required to make the conclusion true or false (a probability of exactly 1 or 0).
If a model is inductive, it is usually clear which premises can be taken away so that the conclusion is still seen to be likely, but then we have to define likely. Inductive conclusions are contingent, so none have probabilities exactly equal to 1 or 0, only probabilities as close to 1 or 0 as you like. How far from 1 or 0 must the conclusion's probability move upon removal of a premise before we are convinced that premise was "necessary"?
Actually, any substantial, measurable change in the probability is enough. If removing the premise changes the conclusion's probability measurably, then that premise was necessary for the model.
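A toy sketch of this test (my own example, not from the post): take a conclusion, compute its probability with and without one premise, and see whether the number moves. Here the premises are "this is a six-sided die" plus, optionally, "face 6 is twice as likely as the others"; the conclusion is "a 6 shows".

```python
# Toy illustration: removing a premise changes the probability
# of the conclusion, which marks that premise as "necessary".
from fractions import Fraction

def prob_six(weighted: bool) -> Fraction:
    """P(a 6 shows) given the premise 'six-sided die', with or
    without the extra premise 'face 6 is twice as likely'."""
    weights = [1] * 5 + ([2] if weighted else [1])
    return Fraction(weights[-1], sum(weights))

with_premise = prob_six(weighted=True)      # 2/7
without_premise = prob_six(weighted=False)  # 1/6

# The probability moves measurably (2/7 vs 1/6), so the weighting
# premise was necessary to the model in the sense above.
print(with_premise, without_premise)
```

The change from 2/7 to 1/6 is the "substantial, measurable change" in question; a premise whose removal left the probability untouched would have been mere decoration.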
For non-deductive models, removing any premise just changes the probability of the conclusion. A removal can make the probability go up or down, but so what? The conclusion is still not certain and so still not guaranteed, so the model (with or without the premise) is still not wrong.
We can only gauge the goodness of the model by how close the probability of the conclusion was to the actual event (whether it happened or not), and only after the fact. This is tough luck for those who would criticize a non-deductive model (which is all known physical models, like GCMs, etc.). What the punter wants is to, before the fact, have a basis on which to criticize or judge a non-deductive model: and that is why he invokes Occam’s razor.
After the fact, models which have closer probabilities are said to be better. So Occam’s razor comes to this: experience has shown that non-deductive models with fewer premises often have probabilities closer to actual events. In other words, Occam’s razor is a meta-probability statement, no different than probability statements for ordinary models.
Categories: Philosophy, Statistics
What does simpler mean? If you hypothesize (assume) the best about everything people say and do, life would be much simpler and easier. ^_^
I will argue, with qualification, that SOME non-predictive models (models lacking predictive utility) ARE “good” — such occurs where a process has distinct discontinuities (e.g. “step functions”) that force a need for a different model.
A crude example might be a model for the behavior of a "fluid" under certain circumstances. When that fluid changes state (e.g. gas to liquid or solid) a given model's utility will, usually, fail and an entirely new model is needed (i.e. a new model for the given conditions is usually better…and simpler…than one that addresses wider circumstances). This is commonly observed with very complex systems; economics models, for example, are routinely constructed in this manner.
Arguably, such a family of models applicable to defined boundary conditions for a given activity might, under some definitions, qualify as a “model” (singular)…though this is generally considered a bit of a stretch of semantics.
Semantics aside, complex systems are (to my experience) more routinely modeled as “systems of models” applicable as relevant to defined boundary conditions. That’s a result of substantial experiment & observation to develop precise models for particular conditions.
Which makes me wonder how climate models are constructed — from what little insights I’ve gotten they seem designed as singular all-encompassing models for a very complex, and largely unknown, system of interrelationships. Is that true? If so, that alone is strongly indicative of a very primitive state of affairs.
This is typical illogical claptrap, as is a good deal of this type of "philosophy." The statement
“experience has shown that non-deductive models with fewer premises often have probabilities closer to actual events.”
is simply an unsupported general assertion designed to support a predetermined conclusion. And in any event, even if true, there is no reason to think it would apply to GCMs in particular.
It especially would not apply to the IPCC GCMs, which are wrongly framed in the first place: their forcings and feedbacks are not assigned from any empirical analysis of the various data time series (via, e.g., Fourier analysis), but are arbitrarily chosen by the modelers to illustrate the outcomes they had prejudged.
“Why does the basement flood when it rains? Why do pencils fall off tables? ”
For the same reason that dropped toast lands jelly side down — to be annoying. Same may be true of the wife and moon, but only you can answer that.
I've always treated Occam as a rule of thumb. Less complication means less likely to have a way of breaking it; easier to comprehend; simpler is just more desirable. I believe Occam's rule was to be applied only if both models are otherwise equal. Simpler meaning fewer assumptions.
My employer sent me to a class on problem solving – specifically, root cause analysis. What I got out of the class was that we should keep asking “why” and testing what we come up with. If B causes A, then what causes B? If C causes B, then does changing C reduce B? Etc.
The problem for me with such classes is that they infest my brain outside of work, too. Pulling up to a stop light behind another vehicle and noticing thick black choking smoke coming out of its tail-pipe, my brain started in on the series of "whys" and continued long after the light had turned green. Nothing I could think of would be guaranteed to change the situation with the stinky car, probably because the driver of that car had clearly already made his choices.
Finally I arrived at the only satisfactory “why” I could come up with: the root cause of smoky vehicle exhaust is clearly the design choice of having the tail pipe stick out the back of the car instead of the front.
What DAV said. I’ve always taken it as a policy statement. The less complicated explanation is less vulnerable to refutation – it puts fewer ducks up to be shot at.
I’ve noticed, though, that some people quote it in a way that means a simpler theory is more likely to be true and I see nothing in the Universe to guarantee that.
Interesting post Mr. Briggs. Based on what you said, why wouldn't it be okay to keep adding parameters to a model provided that the model reduces the error between observation and prediction?
I have a climate model that uses big foot sightings, military spending as a percentage of GDP, and marginal tax rates for married couples. I kept adding variables and adjusting weights until I got a really low error rate for predicting next year's temperature.
Excellent point. It is always possible to find a set of premises (i.e. variables) that will allow a model to fit past observed data perfectly. This is why the true test of a hypothesis/model/theory is how well it predicts data not yet seen. That is part of being what “better” is.
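The overfitting point in this exchange can be made concrete with a toy sketch (my own construction, not the commenter's actual model): a polynomial with as many parameters as past observations fits the past perfectly, yet a new observation exposes it, while a simpler model predicts well.

```python
# Toy overfitting demo: a 5-parameter curve fits 5 past points
# exactly but predicts unseen data far worse than a simple line.

def lagrange_fit(xs, ys):
    """Return (as a callable) the unique polynomial through the points."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

# Five past observations of a process that is roughly y = x, plus noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]

complex_model = lagrange_fit(xs, ys)  # 5 parameters: fits the past exactly
simple_model = lambda x: x            # no tuned parameters at all

# In-sample: the complex model is flawless in hindsight.
in_sample = max(abs(complex_model(x) - y) for x, y in zip(xs, ys))

# Out-of-sample: at a point "not yet seen", the complex model
# veers far away while the simple one stays close.
new_x, new_y = 6.0, 6.0
print(in_sample)  # ~0
print(abs(complex_model(new_x) - new_y), abs(simple_model(new_x) - new_y))
```

The exact numbers are incidental; the pattern is not. Fitting past data perfectly is cheap, which is why the comment's "true test" is prediction of data not yet seen.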
I am probably exposing myself to a ridicule. OK, I will take it with humility.
The rock bottom problem with induction (for me) relies on the following fact:
In order to create a model we have to assume some distribution. We then get busy collecting data about the process of interest. How large a set of data? It depends on the distribution. Isn't that a circular argument?
Isn’t it a weakness of induction?
That very fact makes me capitulate in the face of a serious skeptical argument. Don't get me wrong, I am an empiricist by profession.
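The circularity this commenter describes can be seen in the textbook sample-size formula for estimating a mean (my illustration; the 1.96 z-value, margins, and sigmas are assumed numbers, not from the comment):

```python
# The sample size you "need" depends on the spread you already
# assumed, which is part of what the data were meant to tell you.
from math import ceil

def n_for_mean(sigma: float, margin: float, z: float = 1.96) -> int:
    """Standard formula for the n needed to estimate a mean to within
    `margin` at ~95% confidence, *assuming* a known spread `sigma`."""
    return ceil((z * sigma / margin) ** 2)

# Assume the process is tame (sigma = 1): a modest sample "suffices".
print(n_for_mean(sigma=1.0, margin=0.5))

# Assume a wider spread (sigma = 5): you "need" 25 times the data.
print(n_for_mean(sigma=5.0, margin=0.5))
```

Same process, same desired precision, yet the required n jumps by a factor of 25 depending on what was assumed before any data arrived: the circularity in miniature.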
Is Occam’s razor a statement about probability? Perhaps it better describes a method of investigation — to begin with no more assumptions than necessary to perform a test. You ask what is necessary, but it is more relevant to know what is *more* than necessary. More than necessary requires more than one test. If you assume A and B then C, when A might have sufficed, your test will be inconclusive. You must still test for A alone.