What is the purpose of modeling? There can be only two.
The first is to say what happened, to explain. But in order to do this one must first assume what one wants to show. If you want your model to show that the drug cured more people than the placebo, you must assume this is true.
Just think: any data collected will show either that the placebo or the drug cured more patients, conditional on whatever other measures or subsets of other measures that you like. We can stop right there! If the data shows the drug cures more or less (in whatever subsets) we are done. A model may, it’s true, invert this observation, but only if you build this reversal into it. Models can only show what they are told to show.
The explanatory model can thus only be confirmed when its makes skillful predictions, which is the second and true purpose of any model: prediction. Buy this award-eligible international seller book for more on this important subject.
Enter sick cows.
In the R package
lme4, the dataset cbpp, taken from the paper “Within-herd spread of contagious bovinepleuropneumonia in Ethiopian highlands” by Lesnoff and others, in Preventive Veterinary Medicine. From the data description:
Contagious bovine pleuropneumonia (CBPP) is a major disease of cattle in Africa, caused by a mycoplasma. This dataset describes the serological incidence of CBPP in zebu cattle during a follow-up survey implemented in 15 commercial herds located in the Boji district of Ethiopia. The goal of the survey was to study the within-herd spread of CBPP in newly infected herds. Blood samples were quarterly collected from all animals of these herds to determine their CBPP status. These data were used to compute the serological incidence of CBPP (new cases occurring during a given time period). Some data are missing (lost to follow-up).
These are incidence counts in 15 different herds at four consecutive time period (n = 56). Mixing code and output:
head(lme4::cbpp) herd incidence size period 1 1 2 14 1 2 1 3 12 2 3 1 4 9 3 4 1 0 5 4 5 2 3 22 1 6 2 1 18 2
For example, herd 15 started in period 1 with 19 head and had one infection, and at period 2 there were only 15 head and 1 new infection, and so on.
Now we could look at the infection rate at each time period and for each herd and report on that. Conditional on the belief that this disease is the only cause of the observations we are done. Some herds at different periods have different infection rates than others. We have no measurements and no idea why the differences exist. We can only deduce that differences must exist. Which is to say we have no assumptions about causes of these differences, but we know different causes exist.
What good would a model do now except to tell its what we have already seen? No, we never need a model to show us what happened. All we have to do is look. We only want to model to make predictions about other herds and periods that are like ours.
What do I mean by like? Only this: that the as-yet-unmeasured herds and periods are subject to the same, and only the same, causes as the herds and periods we have already measured. This is yet another assumption, a prior if you will, that we bring to the problem.
Very well, let’s assume new herds and periods will be like ours in this casual sense. Then we are still not done with our casual assumptions. Is each herd the same? I mean is herd 1 the same as herd 2 and so on? The veterinarians didn’t think so. They picked a model to let the herds be different. So we will too. And if herds are different, so are periods.
Here is the model in code form, take from this source and suitably modified:
library(rstanarm) fit <- stan_glmer(cbind(incidence, size - incidence) ~ size + period + (1|herd), data = lme4::cbpp, family = binomial, QR = TRUE)
This is a standard linear binomial model, where the intercept is varied by herd, an acknowledgement of the assumption that different herds have different causes operating on them.
We can look at the model parameters (code and output mixed):
summary(fit) Estimates: mean sd 2.5% 25% 50% 75% 97.5% (Intercept) -1.6 0.6 -2.8 -2.0 -1.5 -1.1 -0.4 size 0.0 0.0 0.0 0.0 0.0 0.0 0.1 period2 -1.0 0.3 -1.6 -1.2 -1.0 -0.8 -0.3 period3 -1.1 0.4 -1.8 -1.4 -1.1 -0.9 -0.4 period4 -1.6 0.5 -2.6 -1.9 -1.6 -1.3 -0.7 b[(Intercept) herd:1] 0.6 0.4 -0.2 0.3 0.6 0.9 1.5 b[(Intercept) herd:2] -0.4 0.4 -1.3 -0.6 -0.3 -0.1 0.4 b[(Intercept) herd:3] 0.4 0.4 -0.3 0.1 0.4 0.6 1.1 b[(Intercept) herd:4] 0.0 0.5 -1.0 -0.3 0.0 0.4 1.0 b[(Intercept) herd:5] -0.3 0.4 -1.1 -0.5 -0.2 0.0 0.6 b[(Intercept) herd:6] -0.5 0.5 -1.4 -0.8 -0.4 -0.2 0.4 b[(Intercept) herd:7] 1.0 0.4 0.1 0.7 0.9 1.2 1.9 b[(Intercept) herd:8] 0.5 0.5 -0.5 0.2 0.5 0.9 1.5 b[(Intercept) herd:9] -0.3 0.6 -1.4 -0.6 -0.2 0.1 0.8 b[(Intercept) herd:10] -0.6 0.4 -1.6 -0.9 -0.6 -0.3 0.2 b[(Intercept) herd:11] -0.2 0.4 -1.0 -0.4 -0.1 0.1 0.6 b[(Intercept) herd:12] -0.1 0.5 -1.1 -0.4 -0.1 0.3 1.0 b[(Intercept) herd:13] -0.8 0.5 -1.8 -1.1 -0.8 -0.5 0.1 b[(Intercept) herd:14] 1.0 0.5 0.2 0.7 1.0 1.3 2.0 b[(Intercept) herd:15] -0.6 0.5 -1.7 -0.9 -0.6 -0.3 0.3 Sigma[herd:(Intercept),(Intercept)] 0.6 0.4 0.2 0.4 0.5 0.8 1.6
summary(fit, digits=3) to get better resolution; too many numbers here look ugly on this screen.
These are the parameter posteriors found using the priors assumed in the model. There is nothing special about them, and neither is there anything curious.
What about the conjugate normal priors on the parameters? Well, change the ad hoc chosen-for-convenience prior, change the posterior. Feel free to change this one to see how it affects the results. But also, change the ad hoc chosen-for-convenience model, change the posterior. There is no mystery here! All probability at all times is conditional on the assumptions we make. Nothing has a probability. Probability is a state of mind deduced from assumptions. (A secret is that weird premise of infinite parameters and observations, which are impossible in practice, and which we'll see later.)
Now these posteriors are in the form of log odds, a screwy thing to think of. Of course, they can easily be transformed into odds by exponentiation. Still weird to think of, though, because these would not be odds of the observable, but of some strange multiplier. In any case, it would be a capital mistake to assume the certainty we have in these parameters is equal to the certainty of the observable, which is the only uncertainty we care about. Repeat that last sentence before continuing.
We have to pick a herd. How about 15 at period 1? We know with certainty that the incidence number was 1 with herd size 19. We are done! Unless we want to say something about as yet unseen herds that are causally like herd 15.
So let's look at the predictive posterior:
newdata = data.frame( herd = "15", size = 19, incidence = 1, period = "1") p = posterior_predict(fit,newdata) plot(table(p)/length(p),ylab='Probability', xlab='Incidence', main='Like Herd 15, Period 1')
Herd size was 19, so the maximum possible number of incidents is also 19. The probabilities, conditional on all the assumptions and data we have made so far, of every possible incident number are in this pic. Incidence numbers greater than 9 are pretty rare. These are not the probabilities: they are the probabilities conditional on all the assumptions we made to this point.
And we're done again! That picture is the answer.
"But what about the prior?" Well, what about the model? Okay. Change the prior above to see what a different prior does. If you're not sure which to use, then use both! Combine the models.
fit2 <- stan_glmer(cbind(incidence, size - incidence) ~ size + period + (1|herd), data = lme4::cbpp, family = binomial, QR = TRUE, prior_intercept = student_t(4, 0, 10, autoscale = FALSE)) # 4 is degrees of freedom; i.e. a wider prior p2 = posterior_predict(fit2,newdata) # examine the new model plot(table(p2)/length(p),ylab='Probability', xlab='Incidence', main='Like Herd 15, Period 1, Model 2') # combined plot(table(c(p,p2))/length(c(p,p2)),ylab='Probability', xlab='Incidence', main='Like Herd 15, Period 1\nCombined Model')
This combination is based on the statistical syllogism, with is the deduction based on the premise that we have two models and only two, therefore the prior probability of either is deduced as 1/2. Why this prior? No reason, except that it's one that's in common use. Why this model? No reason, except that it's one that's in common use.
What if we wanted to look at the same herd at period 2? I mean herds casually similar to ours (15) at period 2. We can, of course, but it requires more assumptions. From period 1 to 2 herd 15 lost 4 head. There was only 1 infection in period 1, which we can assume led to the cow's culling or death. But what about the other 3? These losses had some cause or causes--- which we must assume will be the same for our as yet unmeasured herd that is causally like 15!
# increase iterations to get better convergence fit <- stan_glmer(cbind(incidence, size - incidence) ~ size + period + (1|herd), data = lme4::cbpp, family = binomial, QR = TRUE, iter=10000) newdata = data.frame( herd = "15", size = 15, incidence = 1, period = "2") p = posterior_predict(fit,newdata) plot(table(p)/length(p),ylab='Probability', xlab='Incidence', main='Like Herd 15, Period 2', lwd=7, xlim=c(0,8)) axis(1,at=0:8,labels=0:8) newdata = data.frame( herd = "15", size = 18, incidence = 1, period = "2") p = posterior_predict(fit,newdata) lines(table(p)/length(p), col=2)
I reuse the
p, and rebuild the model for better convergence (recalling there is no magic or "randomness" in MCMC methods; they are just numerical approximations to integrations). It's still a bit off for the smallest probabilities. Increase it yourself for tighter results.
Obviously, the probabilities for number of incidents change because the assumption on herd size changes. Which is "the" assumption to use? There is none. It depends on the situation you think will obtain in the future.
"I'm still concerned about the prior. I'm a frequentist and priors make me nervous."
You're not concerned about the model, which is vastly more influential than the prior assumption, because as a frequentist you believe things "have" probabilities. That is, after all, what frequentist theory insists. Things don't have probabilities. All probability is deduced based on the assumptions we make. The ideal situation, which is not this one, and that observation is key, is when we start with true premises and deduce a model.
Here, we start with ad hoc premises based on past experience. These suggest a model, which may or may not be any good, and all models which use infinite parameters must have priors.
"But how do I tell if the model and priors are any good?"
You can't. Not now. There is no way to tell for certain, just as there is no way to tell for certain in any frequentist analysis. The only way to know is to use the model to make predictions of as-yet-unseen observations. If these predictions are useful---where useful depends on more assumptions which can vary from person to person---then the model is good. If not, then not.
Simple as it is, that is the answer.
To support this site and its wholly independent host using credit card or PayPal (in any amount) click here