Statistics

The Difference Between Variability & Uncertainty In Risk

Reader question:

Dear Dr. Briggs

I regularly read your blog and have purchased both your books. I especially liked your book on Uncertainty, although I need to study it a second time.

I recently came across this statement on an EPA site, and wondered immediately what you would make of it:

“Available software cannot distinguish between variability and uncertainty. Some factors, such as body weight and tap water ingestion, show well-described differences among individuals. These differences are called ‘variability’. Other factors, such as frequency and duration of trespassing, are simply unknown. This lack of knowledge is called ‘uncertainty’. Current Monte Carlo software treats uncertainty as if it were variability, which may produce misleading results.”

Source: [EPA link]

I am really interested to know your thoughts on this statement.

Kind regards,

[Anon]

This is from an EPA article on the “Use of Monte Carlo Simulation in Risk Assessments”. Before we get to that, if you haven’t already, read “The Gremlins Of MCMC [Markov Chain Monte Carlo]: Or, Computer Simulations Are Not What You Think“.

Don’t be lazy. Read it.

There aren’t any such things as “random numbers”, so MCMC models are just like all other models: they only say what they are told to say. If you feed a model “random normals”, or whatever, you are just giving it numbers, which are manipulated exactly as you say they should be manipulated. And the numbers you give the model are exactly the numbers you specify. There is nothing mystical to them.

In other words, attempts to feed models “random” numbers so that they behave like nature “picking” distributions or whatever aren’t anything like that at all. They are just models doing precisely what they are told to do by modelers. That modelers sometimes don’t know, or can’t anticipate, what the outcomes of their models are doesn’t mean the model isn’t doing what it was told.
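A minimal sketch of the point (my illustration, not from the post): two generators seeded identically produce identical "random normals", because the process is fully specified by the modeler.

```python
import random

# A seeded PRNG is a deterministic algorithm: same seed, same numbers.
# The seed value 42 is arbitrary; any seed behaves the same way.
gen_a = random.Random(42)
gen_b = random.Random(42)

draws_a = [gen_a.gauss(0, 1) for _ in range(5)]  # five "random normals"
draws_b = [gen_b.gauss(0, 1) for _ in range(5)]

# The two sequences are identical: the model got exactly the numbers
# the modeler specified. Nothing mystical about them.
assert draws_a == draws_b
```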

So much for the throat clearing. Let’s examine EPA’s terms. The article starts like this:

EPA’s current risk assessment methods express health risks as single numerical values, or “single-point” estimates of risk. This technique provides little information about uncertainty and variability surrounding the risk estimate.

Now risk is just this: the probability of a bad thing (death, disease) given certain premises or assumptions. Risk is not cause. If it were, then this probability would always be 0 or 1. Given those certain premises, we can come to, as they say, a “single-point” estimate.

For instance, the probability (risk) of dying a horrible death given you are walking blindfolded on a soaring cliff in a blizzard is high. If you quantify all the connections of those premises, you can come to a single-point estimate. If you don’t quantify them, we use fuzzy words like “high”.

If those premises vary, or are changed, then the risk changes. Change the premises, change the probability. Add the premise that you can fly, then the risk drops to zero (because cause). Add the premise that you are wearing waxed flat bottom shoes, then the “high” becomes “Egads”.
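The cliff example can be sketched as a toy function. Every number here is made up; the only point is the logic: change the premises, change the probability.

```python
# Toy model with invented numbers (mine, not the post's): the probability
# is conditional on whatever premises you feed it.
def risk(premises):
    if "can_fly" in premises:
        return 0.0   # a cause is among the premises: the risk vanishes
    p = 0.8          # "high": blindfolded, cliff, blizzard
    if "waxed_flat_shoes" in premises:
        p = 0.99     # "Egads"
    return p

base = {"blindfolded", "cliff", "blizzard"}
# Change the premises, change the probability:
# risk(base) != risk(base | {"can_fly"}) != risk(base | {"waxed_flat_shoes"})
```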

Take something closer to the EPA example: being fat and having, say, diabetes. Knowing only that you are fat, quantified in some way, such as BMI, we can quantify a risk of diabetes. As BMI varies, so does the risk. It’s still not cause, just correlation. This risk is a model. And all models only say what they are told to say.

Does the model apply to you? Only if you want it to. All we know, as said, is BMI and the probability of diabetes. We don’t even know what “data” informed this model. We just have the model.

But in this model, as BMI varies, so does the risk. This is one kind of variability the EPA mentions.
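A hypothetical sketch of such a model. The logistic form and both coefficients are invented for illustration and were fit to no data; they only demonstrate "as BMI varies, so does the risk."

```python
import math

# Hypothetical logistic model of Pr(diabetes | BMI). Intercept and slope
# are made-up values, not estimates from any real study.
def diabetes_risk(bmi, intercept=-6.0, slope=0.15):
    return 1 / (1 + math.exp(-(intercept + slope * bmi)))

# The model says only what it is told to say: the stated risk
# rises with BMI under these (assumed) premises.
```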

The unknowns, or unknown unknowns, are everything not in the model. Such as whether the model applies to you. If you knew it applied to you, that is because there must necessarily be other premises in that model that allowed you to say it applied to you. For instance, suppose you learn the data that informed the model was of 40-55 year olds, Americans, men. And you say to yourself, “That’s me!”

Well, all those things vary, too. That range of age is a second kind of variability. But not in the model. In the model, they are fixed. It says nothing about 56 year olds, or Russians, or women. If you decide to apply the model to yourself, and you are Canadian, male, and 56, you have created a new model. One that also includes the premise “And Canadian man, 56”. Maybe it’s a useful model, maybe not.

Now you want to know whether you will get diabetes. That has a cause, or causes. If all causes were in the model, and the model applied to you, again the probability would be 0 or 1. Supposing it isn’t, then at least one of the causes of diabetes is unknown to the model.

Those are the unknowns the EPA means.

I’ll end with my usual harangue. Nobody has a probability/risk of diabetes, or of anything. Probabilities are always conditional on the premises you assume. Once you make assumptions, i.e. create premises, you create the probability. Change the premises, change the probability.

Buy my new book and learn to argue against the regime: Everything You Believe Is Wrong.




  1. Briggs: you write “There aren’t any such things as ‘random numbers’…”

    OK. Computer modelers typically refer to pseudo-random number generators (of which there are many) that I assume you would also say are not random numbers either, am I correct?

    What would be the correct term, in your view, for selection of values from a defined normal distribution, as an example. Would it be merely a ‘perturbation’ around an expected value?

  2. Robin,

    “Random” only means unknown, and computers generate known numbers by a known process.

    You can easily generate numbers to fit any curve you like, e.g. normal, or whatever.
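For instance (a sketch of mine, not anything from the thread), feeding known, evenly spaced probabilities through the normal quantile function yields numbers that "fit" a normal curve by a wholly known process:

```python
from statistics import NormalDist, fmean, pstdev

# Nothing here is random: push evenly spaced, fully specified
# probabilities through the normal inverse CDF (quantile function).
n = 1000
us = [(i + 0.5) / n for i in range(n)]          # known probabilities
xs = [NormalDist(0, 1).inv_cdf(u) for u in us]  # known "normals"

# The numbers fit the curve: mean near 0, spread near 1, as a
# standard normal demands. Every one of them was specified exactly.
```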

  3. In quality control work, “random” referred to a kind of variation. Variation in a measured variable is due to a myriad of causes, which can be divided into two sorts: variation that can be assigned to a particular cause (“assignable variation”), and variation that is the net result of many small causes, none of which are dominant and none of which are economically worth identifying and controlling (“random variation”). Note that “random” does not mean “uncaused.” Quite the opposite!

    The example I always used was a pair of dice ordinaire. You roll them, they come up 12. Let’s say 12 represents a defective product. The forces of righteousness converge and lay their hands on the product crying “Be healed!” Your boss says “Find out what caused this unacceptable result” to which the rational, job-retaining response is “Yowza!”

    So what did I do different this time? I dunno, maybe I threw the dice too hard? Yeah, that’s it. So you test the hypothesis. You throw the dice Gently, and Lo! they do not come up 12. That proves it works. So you put gentle dice-throwing in the SOP and train the workers in gentle dice-throwing. This is called “preventive action.”

    Alas, after a while a 12 again appears! Again, corrective action is demanded. It can’t be hard dice-throwing. You’ve been real careful on that. It must be some other cause. Maybe I threw at a bad angle. Yeah, that’s it. So you get Mfg Eng to whomp up a fixture so the dice always hit the carpet at a perfect 45 deg angle. You throw the dice (gently) using the fixture and you don’t get a 12. That proves it works. This is called “process validation” (Except in medical devices, where three throws are required).

    Well, you know what happens next. Eventually, another 12 appears. The third time you get a 12, you take your Production Report Form and write “11”.

    Why do you do this thing? Because your momma didn’t raise no stupid kids, and it don’t take long to realize that whatever hoops you jump through to prevent more 12s, another 12 will occur for some other reason.

    But if a 12 really is unacceptable, what can we do? “Random” variation is a sign that the Cause is “baked into” the design of the process, so we must look for design causes, not operational ones. In this case, the root cause is that a 6 appears on one face of each die, so the solution is to white out one dot on one of the 6s, after which a 12 becomes impossible. (More elaborate: mill off one 6-face and rout out five spots instead; or use tetrahedral dice; or something else.) IOW, it was a material cause, not any one particular cause.

    Throwing a 13 is a different matter entirely. A particular cause can be assigned to such a result. (Measurement error is a possibility.)
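The design fix in this comment can be checked by brute enumeration, a sketch assuming standard six-sided dice:

```python
from itertools import product

# Ordinary dice: a 12 is possible. It is baked into the design.
ordinary = range(1, 7)
assert any(a + b == 12 for a, b in product(ordinary, ordinary))

# White out one dot on one die's 6-face (it becomes a second 5):
# a design change, after which a 12 is impossible.
modified = [1, 2, 3, 4, 5, 5]
assert not any(a + b == 12 for a, b in product(modified, ordinary))

# A 13 was never possible; that result demands an assignable cause.
assert not any(a + b == 13 for a, b in product(ordinary, ordinary))
```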

  4. Yes and thank you SSgt Briggs and YOS.

    A relatively trivial yet instructive true example:

    Some time ago in a very hot climate (fried eggs on the car hood type of heat – not recommended), I was monitoring the concrete works of a construction project on behalf of the owner. The contract was proceeding nicely, but well into it, the concrete test results started consistently failing. The owner wanted to stop the project; the contractor declared that nothing had changed. A dispute broke out. In these situations, 9 times in 10 the contractor is assumed to be at fault.

    On this project I happened to be tracking the variations (as variance) of the testing procedures and noticed they were getting larger and larger with time. Told my client to hold on – this time the contractor might be right – and not to stop the works.

    This was a French contractor; their concrete engineer was top drawer. All the equipment calibrations were good: production plants, testing machines, etc. Raw material variability hadn’t changed. The simple result – the testing company chose to use plastic molds for the test samples, and these became slightly warped with time and use, probably due to the excessive heat. I do mean slightly (millimetres), and not easily spotted by eye. Replaced them with new steel ones and boom! Everything immediately came back into line.

    These seemingly little events can have huge financial effects in construction works. Projects are stopped routinely by engineers who really do not understand what they are doing, but think that they do. When it comes to the variation and uncertainty of processes, one can never ask too many questions.

    That was a process measurement fault, but I frequently come across failures that, as YOS indicates, are “baked into the design of the process”. Some designers have no understanding of the nature of processes but they sometimes specify things in the design that are simply unachievable. That’s when the fun starts in the arbitrations that are sure to follow.

  5. Ha, if you think EPA using Monte Carlo Simulation is something, check out the notion of Value of Statistical Life – please, please, please give it a look. This old statistician eagerly anticipates your thoughts on the matter.
