We began our first predictive analysis, and spent a lot of time with it. But we still haven’t got to the main question!
And that is how it should be.
Since the predictive method separates probability from decision and emphasizes decision, we should spend most of our time on defining the decision. The probability part will be easy, and is just math. So, as they say on the planes, but here I mean it: sit back, relax, and enjoy the flight.
When we left off, we were exploring CGPA. If a person was only taking one class, there were 14 possible CGPAs (0, 0.33, …, 4.33). Now, if all we knew was the scoring (grading) system and that person was taking just one class, then we deduce the probability (from the symmetry of logical constants leading to the statistical syllogism) of a CGPA of, e.g., 4 as 1/14—and the same for the other possibilities.
Because all probability is conditional on a specified list of premises, and only on that list, it’s well to be explicit. The probability CGPA equals, say, 0, needs givens. Those assumptions, premises, givens, truths, are the list itself, (0, 0.33, …, 4.33), plus the implicit premise that CGPA must be one of these; or the explicit premise that there is only one class together with the explicit rules of the scoring system, which jointly imply the list. Notice we do not allow an “incomplete”. Why not? Why not indeed? It is an assumption on our part and nothing else. If we assumed an incomplete, the probability changes (homework: how?). Remember: the all in “if all we knew…” is as rigorous as can be. We calculate the probability on these premises and none other. Probability is not subjective, except in the sense that we choose the premises: after the premises are chosen, probability is deduced.
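That deduction can be written out, a minimal sketch assuming nothing beyond the list of possible grades:

```r
# The 14 possible grades: the only premise, set by us
s = c(0, .33, .67, 1, 1.33, 1.67, 2, 2.33, 2.67, 3, 3.33, 3.67, 4, 4.33)

# With one class, CGPA is just the grade; by the statistical syllogism
# each of the 14 possibilities gets the same probability
p = 1 / length(s)
p  # 1/14, about 0.071
```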
If the person were taking 2 classes, there are 196 different possible grade combinations (from 14^(number of classes); see this document on permutations), of which only 42 CGPAs are unique (0, 0.165, 0.330, 0.335, …, 4.33). If all we knew were the scoring system and that there were 2 classes, the chance of a CGPA = 3 is 9/196 ≈ 0.046. Use this self-explanatory R code to play (but don’t push r much beyond 5!; only base R is needed; this code is not meant to be efficient, but explicative; if you can’t follow the code, don’t worry, just use it).
# possible grades; a premise set by us
s = c(0,.33, .67, 1, 1.33, 1.67, 2, 2.33, 2.67, 3, 3.33, 3.67, 4, 4.33)
r=2 # number of classes; another premise
result = as.matrix(expand.grid(lapply(numeric(r), function(x) s))) # all 14^r grade combinations
cgpa = apply(result, 1, function(x) sum(x)/r) # CGPA = average grade across the r classes
table(cgpa) # counts of possibilities for each CGPA; i.e. with r = 2 there is only 1 way to get 0, 2 ways to get 0.165, and so on
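As a check on the 9/196 figure, the enumeration can be queried directly (a self-contained sketch repeating the premises from the code above):

```r
s = c(0, .33, .67, 1, 1.33, 1.67, 2, 2.33, 2.67, 3, 3.33, 3.67, 4, 4.33)
r = 2
result = expand.grid(lapply(numeric(r), function(x) s))
cgpa = rowMeans(result)
# count the combinations averaging exactly 3 (with a tolerance for
# floating-point fuzz), out of 14^2 = 196 equally likely combinations
sum(abs(cgpa - 3) < 1e-9) / length(cgpa)  # 9/196, about 0.046
```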
Again, the “all” in the “all you know” cannot be stressed too highly. All probability is conditional on the information assumed, and only on that information, so the probabilities above are only valid assuming just the premises given and none other. In particular, it does not matter what you might know about a person and their study habits, or the school, or anything else. The probabilities are true given the premises. Whether these are the right premises for the question we want to answer is another question which we’ll explore — in depth — later.
Now, what if all we knew were the scoring system and that the person were going to take 1 or 2 classes? Suppose we’re interested in a CGPA of 3 again. If 1 class, the probability is 1/14; if 2 classes, it’s 9/196. And since we don’t know whether it’s 1 or 2 classes, we apply the statistical syllogism again, and deduce 1/2 * 1/14 + 1/2 * 9/196 = 23/392 ≈ 0.059.
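The same deduction in code; the only assumptions are the two conditional probabilities just derived and equal weight on the two class-count premises:

```r
p1 = 1/14    # P(CGPA = 3 | scoring system, 1 class)
p2 = 9/196   # P(CGPA = 3 | scoring system, 2 classes)
# statistical syllogism over "1 or 2 classes": each premise weighted 1/2
0.5 * p1 + 0.5 * p2  # 23/392, about 0.059
```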
You can see that, in principle, we can derive exact answers, though the counting will grow difficult. For a “full load” of 12 classes, there are 14^12 possible grade combinations (about 5.7e13), of which only about 1,000 CGPAs are unique.
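The growth can be watched for small class counts; full enumeration at 12 classes (14^12 rows) is out of reach this way, so this sketch stops at r = 4:

```r
s = c(0, .33, .67, 1, 1.33, 1.67, 2, 2.33, 2.67, 3, 3.33, 3.67, 4, 4.33)
for (r in 1:4) {
  result = expand.grid(lapply(numeric(r), function(x) s))
  # round to kill floating-point fuzz before counting unique values
  cgpa = round(rowMeans(result), 6)
  cat("r =", r, ":", nrow(result), "combinations,",
      length(unique(cgpa)), "unique CGPAs\n")
}
```

For r = 2 this recovers the 42 unique CGPAs mentioned earlier.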
Two of these possibilities are 1.860833 and 1.861667. We could, of course, compute the probability of these CGPAs given the by-now usual premises. But is that what we want? Is this the decision? Compute the probability of barely distinguishable grade points?
It could be that we care about such small differences. If we do, then we have the apparatus to solve the problem. Not for incorporating SAT or HGPA or past observations yet, but for our “naked” premises. We’ll come to that other information in time. But let’s be clear what we’re trying to do first or we risk making all the usual mistakes.
Now, I do not care about such small differences, and neither do most people. I just do not want to differentiate (though I could if I wanted) between, e.g., 1.860833 and 1.861667. To the nearest, say, tenth place is good enough for the decision I want to make about CGPA. Yet small differences are important if our goal is ordering; if, say, we want to predict who has the highest or lowest CGPA and that kind of thing. We’re not doing that here. Our decision is quantifying uncertainty in CGPA for individual people, and accuracy to the 6th decimal place isn’t that interesting to me; to you it might be.
We have a decision about our decision to make: keep the small differences, which carry computational burdens and produce not very interesting answers, or make an approximation. Pay attention here. Tradition (classical methods) approximates the finite discrete CGPA as a continuous number, usually on the real line, a.k.a. the continuum. This approximation is so common that few pause to think it is an approximation! But, of course, it is, and a crude one.
If this most important point has not sunk in, then stop and think on it.1
One difficulty with the traditional approximation is that it says the probability of any single caCGPA value (the “ca” prefix is for the continuous approximation) is 0, which is dissatisfying (the continuum is a strange place!). The benefit is that all sorts of canned software is ready for use, and the math is much easier. Whether these benefits are worth it is the point in question and cannot be assumed true in all problems.
Besides the continuum, another approximation is to compress CGPA. It is already finite and discrete: we keep that nature, but further reduce the level of detail. I don’t care about the differences between 1.860833 and 1.861667, but suppose I do care about the difference between 1 and 2, and between 2 and 3, and 3 and 4.
That is, one compression is to put CGPA on the set (0, 1, 2, 3, 4). There are no computational difficulties with such a small set; all probability statements based on it are readily calculated. Number of classes has much less effect on this set, too.
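A sketch of this compression for r = 2 classes. The rounding rule is itself a premise we choose; note that base R’s round() sends exact halves to the even digit, so a different rule would give slightly different probabilities:

```r
s = c(0, .33, .67, 1, 1.33, 1.67, 2, 2.33, 2.67, 3, 3.33, 3.67, 4, 4.33)
r = 2
result = expand.grid(lapply(numeric(r), function(x) s))
cgpa = rowMeans(result)
# compress CGPA onto the set (0, 1, 2, 3, 4)
compressed = round(cgpa)
table(compressed) / length(compressed)  # probability of each compressed CGPA
```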
It’s a crude compression, true. Still, that doesn’t mean a useless one. It depends—as all things do—on the decisions I want to make with CGPA. If I’m a Dean of some sort (Heaven forfend), this compression may be perfect, and I can even consider going cruder, say, (0-2, 3-4).
Or again, it may be too crude at that. Maybe every tenth is more what I’m looking for, especially if I’m considering eligibility for some scholarship.
We’ll see what these approximations do next time.
I’ll answer all pertinent questions, but please look elsewhere on the site (or in Uncertainty) for criticisms of classical methods. Non-pertinent objections will be ignored.
1You may argue that CGPA is embedded (in some mathematical sense) in an infinite sequence, and thus CGPA would live on the continuum, and thus the continuous is no longer an approximation. Since probability is conditional, accepting this condition works in the math. But, of course, CGPA is not embedded in any infinite sequence. Nothing is, because nothing contingent is infinite. So we’re back to the continuous as an approximation.