Last time we created four models of CGPA. Which is correct? They all are. Why? I should ask as a homework question, but I’ll remind us here. Since all probability is conditional, and these are all ad hoc models, all are correct, given the assumptions. Which one is best? Depends on the decisions you want to make and on what you mean by “best.” Let’s see.
We discovered last time that by itself HGPA was relevant to CGPA, but not by much. SAT was also by itself relevant. The contour plots (which I have decided not to redo, since we did them last week) showed that SAT and HGPA when considered together are also relevant. We also created the “null” model (remember our terminology does not match the classical usage) which only used past data (and grading rules, etc.). We now have to see how useful each of these models are. (If you can’t remember what all these terms mean — review!)
In one sense, we cannot do too much more because we have the models and have made predictions using them. That was our goal, remember? Now we have to sit back and wait for new values of HGPA/SAT and CGPA come in. Then we can see how each of these models’ predictions match reality. This is the True Test.
Here, for instance, is an excerpt of predictions of the full model (HGPA/SAT) (I’m assuming we left off just where we were last time, so that
a are still in R’s memory; if not, go over last week’s code):
> cbind(y[,1:2],round(a,2)) sat hgpa 0 1 2 3 4 1 400 0 0.55 0.26 0.19 0.00 0.00 2 500 0 0.52 0.27 0.21 0.01 0.00 3 600 0 0.47 0.28 0.24 0.01 0.00 4 700 0 0.42 0.30 0.28 0.01 0.00 5 800 0 0.35 0.32 0.32 0.01 0.00 6 900 0 0.28 0.33 0.37 0.01 0.00 7 1000 0 0.22 0.35 0.41 0.02 0.00 8 1100 0 0.16 0.36 0.45 0.03 0.01 9 1200 0 0.11 0.36 0.48 0.04 0.01 10 1300 0 0.08 0.34 0.51 0.06 0.02 11 1400 0 0.05 0.31 0.52 0.08 0.03 12 1500 0 0.04 0.28 0.53 0.11 0.04 13 400 1 0.52 0.24 0.23 0.01 0.00 14 500 1 0.46 0.25 0.27 0.01 0.00 15 600 1 0.40 0.27 0.32 0.01 0.00 16 700 1 0.32 0.29 0.37 0.02 0.00 ... 57 1200 4 0.00 0.04 0.22 0.58 0.15 58 1300 4 0.00 0.04 0.18 0.63 0.15 59 1400 4 0.00 0.04 0.14 0.66 0.15 60 1500 4 0.00 0.05 0.12 0.69 0.14
We could publish this, or the whole table, and we’d be done! Anybody could take these predictions and implement them. They wouldn’t have to know the details of your model, or of your original data. There is your bold theory, exposed for the world to see! That, after all, is how science should work.
Of course, all the shortcomings of your model will be obvious to anybody who tries to use it, too. Which, again, is just how it should be.
Make sure you understand what we’ve done so far. If you are the Dean and want to classify students into one of five CGPA buckets, we have a predictive model accounting for HGPA and SAT. But suppose you didn’t want to account for HGPA. Well, we have a model of just SAT: use that. And so on. The breakdowns of SAT (every 100) and HGPA (every 1) we used were also geared to the decision. Change the decision, change the breakdown, change the model.
In any case, this is it! This is what we wanted. We wanted to know, given the grading rules and old obs, and values of SAT and HPGA, what is the probability of having CGPA in one of the buckets. And this is what we did! We are done. All those people who wanted practical examples of the predictive way, well, here you go. In the end, it’s pretty simple, isn’t it?
But we can do two more things. (1) We can compare our predictive model (perhaps varying it here and there) with old-fashioned NHST/parameter-estimating models, and (2) we can create a new model that predicts the performance our current model. Number (2) is the really important step, but we won’t get to it today. Let’s do (1).
What model would most use in this situation? A linear regression. Here it is (mixing code and output again). The
cgpa.o was the original CPGA, not classified into buckets. It is the raw score (the “o” is for original).
fit.n= glm(cgpa.o ~ sat + hgpa , data=x) # note the cgpa.o! which is the original data summary(fit.n) Call: glm(formula = cgpa.o ~ sat + hgpa, data = x) Deviance Residuals: Min 1Q Median 3Q Max -1.12153 -0.44120 0.00954 0.38198 1.80356 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.0881312 0.2866638 -0.307 0.759169 sat 0.0012167 0.0003011 4.041 0.000107 *** hgpa 0.4071133 0.0905946 4.494 1.94e-05 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Wee p-values galore! The asterisks give the glory. Feel it. So, the classical statistician would say, SAT and HPGA are “linked to” CGPA. Researchers will speak of “statistical significance” and say “SAT drives high college grade point” and so on. How much does SAT influence (they’ll say “impact”) CGPA? Well, they might say for every increase in SAT by 1, CGPA goes up on average 0.0012 points. And so on.
Not much more would be done with this model, especially since everything is “significant”. Maybe the modeler throws in an interaction. Whatever. No matter what, this model exaggerates the evidence we have, and is in substantial error, even though it doesn’t look like it. Here’s why.
This model implies a prediction: all models imply predictions, even though they are not routinely made. It’s written in classical form, but the prediction is there, hidden away. Let’s look at it. We do so by integrating out the parameters, picking a “flat” prior, which, for this model anyway, gives us the exact same results for the parameters as the frequentist estimates.
fit is our mutlinomial (like) model, including both SAT and HGPA. Let’s pick two hypothetical students, a good one and a bad one, and compare our predictive model with the prediction based on the ordinary regression. Before running this, first re-download the class code, which has been updated to include the code which calculates probability predictions from normal regression models. This is the
obs.glm, which outputs a central, spread, and degrees-of-freedom parameter for the predictive distribution of the regression (this turns out to be a non-central T).
y = data.frame(cgpa = c("4","4"), sat=c(400,1500), hgpa = c(1,4)) #bad and good student a=predict(fit, newdata = y, type='prob')$p #our mulitnomial moels plot(levels(x$cgpa), a[1,],type='h',xlab='College GPA',ylab='Probability',ylim=c(0,1.1*max(a)),lwd=5) lines(levels(x$cgpa), a[2,],type='h',lty=2,col=2,lwd=3) grid() legend('topright', c("SAT = 400, HGPA = 1","SAT = 1500, HGPA = 4"),lty=1:2,col=1:2,bty='n',lwd=2) s1 = obs.glm(fit.n,y[1,]) # the prediction from the regression s2 = obs.glm(fit.n,y[2,]) m = s1$central-s2$central # plotting up the probability DENSITIES, NOT PROBABILITIES s = sqrt(s1$scale^2+s2$scale^2) w = seq(min(s1$central,s2$central)-3*s, max(s1$central,s2$central)+3*s, length.out = 100) lines(w,.7*dt((w-s1$central)/s1$scale,s1$df)/s1$scale,lty=1,lwd=2) lines(w,.7*dt((w-s2$central)/s2$scale,s2$df)/s2$scale,lty=2,col=2,lwd=2)
Our multinomial-like model gives the spikes; the regression densities, which are not probabilities, are the curves. We’ll fix the densities into probabilities later. But it’s densities, because the normal lives on the continuum, and so does the predictive distribution from the normal (the t). Notice anything odd? The regression gives probabilities to impossible values, i.e. CGPA less than 0 and greater than 4. I call this probability leakage.
It’s not terrible here, but it does exist. It means the model is predicting impossibilities. The standard regression model is at the least inefficient. Interestingly, this leakage model cannot be falsified. It gives positive probability to events we’ll never see, but it never gives 0 probability anywhere! Falsifiability isn’t that interesting.
That’s enough for this time. Next time, we turn the densities into probabilities, and make modifications to our multinomial model.
Homework Try the normal predictions for the singular models with SAT and HGPA alone, and for the “null” model, which is had by
glm(cgpa.o ~ 1 , data=x). Which has the most leakage? Are there huge differences?
Because you asked You can now download
briggs.homework.R, which contains all the code used in the lectures. Note that this is different than
briggs.class.R, which can be treated like black-box helper code.