I should re-emphasize one of the goals of this book. It is meant to be for that large host of unfortunates who are forced—I mean required—to take a statistics course and, importantly, do not want to. This is why a lot of formulas and methods do not make their traditional appearance. Understanding—and not rote—is paramount.
The material is enough to cover in one typical semester. The student will not learn how to handle many different kinds of data, but he damn well will comprehend what somebody is saying when they make a probability statement about data.
Face it. The vast majority of students who sit through statistics classes never again compute their own regression models, factor analyses, etc., etc. But they often read these kinds of results prepared by others. I want them, as their eyes meet a p-value, to say to themselves, “Aha! Here is one of those p-value things Stats 101 warned me about! Sure enough, it is being misused yet again. I don’t know the right answer in this study, but I do know what is being claimed is too certain.”
If I can do that, then I will be a happy man.
(The contents of Chapter 3 now follow. If you use Firefox > version 2.0, then you will be able to see all the characters on your screen. Else some of the content may be a little screwy. I apologize for this. If you can’t read everything below, consider this a tease for the real thing. You can always download the chapter and print it out.)
How to Count
1. One, two, three…
Youtube.com has a video at this URL
The important part is that “v=wcCw9RHI5mc” business at the end, which essentially means “this is video number wcCw9RHI5mc”. This video is, of course, different from number wcCw9RHI5md, and number wcCw9RHI5me, and so on. We can notice that the video number contains 11 slots (count them), each of which is filled with a digit or an upper- or lower-case Latin letter; the number is case sensitive, so A differs from a. The question is, how many different videos can Youtube host given this numbering scheme? Are they going to run out of numbers anytime soon?
That problem is hard, so we’ll start on a simpler one. Suppose the video numbering scheme only allowed one slot, and that this slot could only contain a single-digit number, chosen from 0-9. Then how many videos could they host? They’d have v=0, v=1 and so on. Ten, right? Now how about if they allowed two slots chosen from 0-9. Just 10 for the first, and 10 for each of the 10 of the first, a confusing way of saying 10 × 10. For three slots it’s 10 × 10 × 10. But you already knew how to do this kind of counting, didn’t you?
Suppose the single slot is allowed only to be the lower case letters a,…,z? This is v=a, v=b, etc. How many in two such slots? Just 26 × 26 = 676. Which is the same way we got 100 in two slots of the numbers 0-9.
So if we allow any digit, plus any lower- or upper-case letter, in any slot, we have 10 + 26 + 26 = 62 different possibilities per slot. That means that with 11 slots we have 62 × 62 × ··· × 62 = 62^11 ≈ 5 × 10^19, or 50 billion billion different videos that Youtube can host.
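If you would rather let a computer do the multiplying, here is a minimal sketch of the slot-counting argument. The helper name `count_ids` is made up for this illustration; it simply raises the number of symbols per slot to the number of slots.

```python
import string

# The symbols allowed in each slot, as described in the text.
digits = string.digits            # '0'..'9': 10 symbols
lower = string.ascii_lowercase    # 'a'..'z': 26 symbols
upper = string.ascii_uppercase    # 'A'..'Z': 26 symbols
alphabet = digits + lower + upper  # 62 symbols in all


def count_ids(symbols_per_slot: int, slots: int) -> int:
    """Hypothetical helper: each slot is filled independently, so multiply."""
    return symbols_per_slot ** slots


two_digit_ids = count_ids(len(digits), 2)   # 10 x 10
two_letter_ids = count_ids(len(lower), 2)   # 26 x 26
video_ids = count_ids(len(alphabet), 11)    # 62 multiplied by itself 11 times

print(two_digit_ids, two_letter_ids)
print(f"{video_ids:.2e}")  # scientific notation, to compare with 5 x 10^19
```

Python integers do not overflow, so the 11-slot count comes out exact rather than rounded.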
How many ways are there of arranging things? In 1977, George Thorogood remade that classic John Lee Hooker song, “One Bourbon, One Scotch, and One Beer.” This is because George is, of course, the spiritous counterpart of an oenophile; that is, he is a connoisseur of fine spirits and regularly participates in tastings. Further, George, who is way past 21, is not an idiot and never binge drinks, which is about the most moronic of activities that a person could engage in. He very much wants to arrange his coming week, where he will taste, each night, one bourbon (B), one scotch (S), and one beer (R). But he wants to be sure that the order in which he tastes these drinks doesn’t influence his personal ratings. So each night he will sip them in a different order. How many different nights will this take him? Write out what will happen: Night 1, BSR; night 2, BRS; night 3, SBR; night 4, SRB; night 5, RBS; night 6, RSB. Six nights! Luckily, this still leaves Sunday free for contemplation.
Later, George decides to broaden his tasting horizons by adding Vernors (the tasty ginger ale aged in oak barrels that can’t be bought in New York City) to his lineup. How many nights does it take him to taste things in different order now? We could count by listing each ordering, but there’s an easier way. If you have n items and you want to know how many different ways they could be ordered, the general formula is:
n! = n × (n − 1) × (n − 2) × ··· × 2 × 1
The term on the left, n!, reads “n factorial.” With 4 beverages, this is 4 × 3 × 2 × 1 = 24 nights, which is over three weeks! Good thing that George is dedicated.
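George’s tasting schedule can also be listed by machine. The sketch below uses Python’s standard `itertools.permutations` to write out every ordering and `math.factorial` to check the count; the single-letter label V for Vernors is an assumption made here, not a label from the text.

```python
from itertools import permutations
from math import factorial

# One bourbon (B), one scotch (S), one beer (R).
drinks = ["B", "S", "R"]

# Every possible tasting order, one per night.
orders = ["".join(p) for p in permutations(drinks)]
for night, order in enumerate(orders, start=1):
    print(f"night {night}: {order}")

# Add Vernors (labelled V here for illustration) and count again.
four_drinks = drinks + ["V"]
nights = len(list(permutations(four_drinks)))
print(nights, factorial(len(four_drinks)))
```

Listing works fine for 3 or 4 drinks, but the list grows as fast as n! does, which is the reason to have the formula.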
3. Being choosy
It’s the day before Thanksgiving and you are at school, packing your car for the drive home. You would have left a day earlier, but you didn’t want to miss your favorite class: statistics. It turns out that you have three friends who you know need a ride: Larry, Curly, and Moe. Lately, they have been acting like a bunch of stooges, so you decide to tell them that your car is just too full to bring them along. The question is, how many different ways can you arrange your friends to drive home with you when you plan to bring none of them? This is not a trick question; the answer is as easy as you think. Only one way: that is, with you driving alone.
But, they are your friends, and you love them, so you decide to take just one. Now how many ways can you arrange your friends so that you take just one? Since you can take Larry, Curly, or Moe, and only one, then it’s obviously three different ways, just by taking only Larry, or only Curly, or only Moe. What if you decide to take two, then how many ways? That’s trickier. You might be tempted to think that, given there are 3 of them, the answer is 3! = 6, but that’s not quite right. Write out a list of the groupings: you can take Larry & Curly, Larry & Moe, or Moe & Curly. That’s three possibilities. The grouping “Curly & Larry,” for example, is just the same as the grouping “Larry & Curly.” That is, the order of your friends doesn’t matter: this is why the answer is 3 instead of 6. Finally, all these calculations have made you so happy that you soften your heart and decide to take all three. How many different groupings taking all of them are possible? Right. Only one.
You won’t be surprised to learn that there is a formula to cover situations like this. If you have n friends and you want to count the number of possible groupings of k of them when the order does not matter, then the formula is
(see the book)
The term on the left is read “n choose k”. By definition (via some fascinating mathematics) 0! = 1. Here are all the answers for the Thanksgiving problem:
(see the book)
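For readers without the book at hand, the standard “n choose k” formula, and the four Thanksgiving answers it gives, can be written as follows. This is the usual textbook form, offered as a reconstruction rather than a copy of the book’s own typesetting.

```latex
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}

\binom{3}{0} = \frac{3!}{0!\,3!} = 1, \qquad
\binom{3}{1} = \frac{3!}{1!\,2!} = 3, \qquad
\binom{3}{2} = \frac{3!}{2!\,1!} = 3, \qquad
\binom{3}{3} = \frac{3!}{3!\,0!} = 1
```

Note that the first and last answers only work out because 0! = 1.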
There are some helpful facts about this combinatorial function that are useful to know. The first is that n choose 0 always equals 1. This means, out of n things, you take none; or it means there is only one way to arrange no things, namely no arrangement at all. n choose n is also always 1, regardless of what n equals. It means, out of n things, you take all. n choose 1 always equals n, and so does n choose n − 1: these are the number of ways of choosing just 1 or just n − 1 things. As long as n > 3, n choose 2 is bigger than n choose 1, which makes sense, because you can make more groups of 2 than of 1. (With n = 3 the two are equal, as the Thanksgiving problem showed.)
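These facts are easy to check for yourself. Here is a small sketch using `math.comb` from Python’s standard library (available in Python 3.8 and later); the range of n values tested is an arbitrary choice for illustration.

```python
from math import comb

# The Thanksgiving problem: groupings of 0, 1, 2, or 3 friends out of 3.
thanksgiving = [comb(3, k) for k in range(4)]
print(thanksgiving)

# Check the general facts for a handful of n values (range chosen arbitrarily).
for n in range(1, 11):
    assert comb(n, 0) == 1          # take none: one way
    assert comb(n, n) == 1          # take all: one way
    assert comb(n, 1) == n          # take just one
    assert comb(n, n - 1) == n      # leave just one behind

# Groups of 2 versus groups of 1: a tie at n = 3, more groups of 2 after that.
pairs_vs_singles = {n: (comb(n, 2), comb(n, 1)) for n in range(2, 8)}
print(pairs_vs_singles)
```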
4. Counting meets probability: The Binomial distribution
We started the Thanksgiving problem by considering it from your point of view. Now we take Larry, Moe, and Curly’s perspective, who are waiting in their dorm room for your call. They don’t yet know which, if any, of them will get a ride with you. Because they do not know, they want to quantify their uncertainty and they do so using probability. We are now entering a different realm, where counting meets probability. Take your time here, because the steps we follow will be the same in every probability problem we ever do.
Moe, reminiscent, recalls an incident wherein he was obliged to poke you in the eyes, and guesses that, since you were somewhat irked at the time, the probability that you take any one of the gang along is only 10%. That is, it is his judgment that the probability that you take him, Moe, is 10%, and that the probability that you take Curly, or Larry, is (independently) also 10%. So the boys want to figure out the probability that you take none of them, take one of them, take two of them, or take all three of them.
Start with taking all three. We want the probability that you take Larry and Moe and Curly, where the probability of taking each is 10%. Remember probability rule #2? Those “ands” become “times”: so the probability of taking all three is 0.1 × 0.1 × 0.1 = 0.001, or 1 in 1,000. Keep in mind: this is from their perspective, not yours. This is their guess of the chances, because you may already have made up your mind, but they don’t know that.
What about taking none of them? This is the chance that you do not take Larry and you do not take Moe, and you do not take Curly. The key word is still “and,” which makes the probability (1 − 0.1) × (1 − 0.1) × (1 − 0.1) = 0.9^3 ≈ 0.73, since the probability of not taking Larry etc. is one minus the probability of taking him etc. It is, too, because you can either take Larry or not; these are the only two things that can happen, so the probability of taking Larry or not must be 1. We can write this using our notation: let A = “Take Larry”, then A^F = “Don’t take him”. Then Pr(A ∨ A^F | E) = Pr(A|E) + Pr(A^F|E) = 1, using probability rule #1. So if Pr(A|E) = 0.1, then Pr(A^F|E) = 1 − Pr(A|E) = 0.9. In this case, E is the information dictated by Moe (who is the leader), which led him to say Pr(A|E) = 0.1.
How about taking just one? Well, you can take Larry, not take Moe, and not take Curly, and the chance of that is (using rules #1 and #2 together) 0.1 × (1 − 0.1) × (1 − 0.1) ≈ 0.08; but you could just as easily have taken only Moe, or only Curly, and the chance you do either of these is just the same as you taking Larry and not the other two. For shorthand, write M for “Take Moe” and so on, and M^F for “Don’t take Moe” and so on. Thus you could “L M^F C^F or L^F M C^F or L^F M^F C.” Using probability rule #1, we break up this statement into three pieces (“L M^F C^F” and so on), and then use probability rule #2 on each piece (“ands” turn to times), then add the whole thing up.
You could do all that, but there is an easier way. You could notice there are three different ways to take just one, which we remember from our choosing formula, eq. (10). This makes the probability (3 choose 1) × 0.08 = 3 × 0.08 = 0.24. Since we already know the probability of taking one of those combinations, we just multiply it by the number of times we see it. We could have also written the answer like this:
(3 choose 1) × 0.1^1 × (1 − 0.1)^2 ≈ 0.24.
And we could also have written the first situation (taking all of them) in the same way:
(3 choose 3) × 0.1^3 × (1 − 0.1)^0 = 0.001.
where you must remember that a^0 = 1 (for any a you will come across).
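If the shortcut feels like cheating, you can do it the long way by brute force. The sketch below lists all eight take/don’t-take outcomes for the three friends, multiplies the “ands” within each outcome (rule #2), and adds up the outcomes that take the same number of people (rule #1). The variable names are invented for this illustration, and it assumes, as Moe does, a 10% chance for each friend independently.

```python
from itertools import product

p_take = 0.1                         # Moe's judgment: 10% chance for each friend
stooges = ["Larry", "Moe", "Curly"]

# Probability of taking exactly k friends, for k = 0, 1, 2, 3.
prob_by_count = {k: 0.0 for k in range(len(stooges) + 1)}

# Each outcome is a tuple such as (True, False, False): take Larry only.
for outcome in product([True, False], repeat=len(stooges)):
    prob = 1.0
    for taken in outcome:
        # Rule #2: "ands" become "times".
        prob *= p_take if taken else (1 - p_take)
    # Rule #1: add up the separate ways of taking the same number of friends.
    prob_by_count[sum(outcome)] += prob

for k, prob in prob_by_count.items():
    print(f"take {k}: {prob:.3f}")
```

The unrounded value for taking exactly one is 3 × 0.081 = 0.243, which the text rounds to 0.24.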
You see the pattern by now. This means we have another formula to add to our collection. This one is called the binomial and it looks like this:
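In the notation explained next, the standard binomial formula reads as below. This is the usual textbook form, assumed here to match the book’s; the conditioning symbols n, p, and E_B follow the description in the text.

```latex
\Pr(k \mid n, p, E_B) = \binom{n}{k}\, p^{k} (1 - p)^{n-k}, \qquad k = 0, 1, \ldots, n
```

With n = 3 and p = 0.1, setting k = 1 gives 3 × 0.1 × 0.9^2 ≈ 0.24 and setting k = 3 gives 0.001, the same numbers worked out by hand above.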
There is a subtle shift in notation with this formula, made to conform with tradition. “k” is shorthand for the statement, in this instance, K = “You take k people.” For general situations, k is the number of “successes”: or, K = “The number of successes is k”. Everything to the right of the “|” is still information that we know. So n is shorthand for N = “There are n possibilities for success”, or in your case, N = “There are three brothers which could be taken.” The p means, P = “The probability of success is p”. We already know E_B, written here with a subscript to remind us we are in a binomial situation. This new notation can be damn convenient because, naturally, most of the time statisticians are working with numbers, and the small letters mean “substitute a number here,” and if statisticians are infamous for their lack of personality, at least we have plenty of numbers. This notation can cause grief, too. Just how that is so must wait until later.
Don’t forget this: in order for us to be able to use a binomial distribution to describe our uncertainty, we need three things. (1) The definition of a success: in the Thanksgiving example, a success was a person getting a ride. (2) The probability of a success is always the same. (3) The number of chances for successes is fixed.
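To see the three ingredients at work, here is a minimal sketch of the binomial as a function. The name `binomial_pmf` is made up for this illustration (it is not from the book); the function takes the number of successes k, the fixed number of chances n, and the constant success probability p.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n chances, each with probability p."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    # (n choose k) ways to pick the successes, times the chance of any one such way.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# The Thanksgiving example: n = 3 friends, p = 0.1 chance of a ride for each.
ride_probs = [binomial_pmf(k, 3, 0.1) for k in range(4)]
for k, prob in enumerate(ride_probs):
    print(f"Pr(take {k}) = {prob:.3f}")

# Something must happen, so the four probabilities should add to 1.
print(sum(ride_probs))
```

If the probability of success changed from friend to friend, or the number of seats were not fixed in advance, this function would no longer describe the situation.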
Italian soccer training camp indeed. Excellent video.
“Face it. The vast majority of students who sit through statistics classes never again compute their own regression models, factor analyses, etc., etc. But they often read these kinds of results prepared by others. I want them, as their eyes meet a p-value, say to themselves, ‘Aha! Here is one of those p-value things Stats 101 warned me about! Sure enough, it is being misused yet again. I don’t know the right answer in this study, but I do know what is being claimed is too certain.’”
Okay. But we call this aiming low, or reduced expectations.
If a scientist-in-training, someone who will be using complex statistical methods, gets a “D” in regular statistics, should we still give them a science diploma?
Answer: it happens all the time!!!!
At its very heart, statistics is the Scientific Method. Logical inference. Using measurements, which are generally numbers. I sympathize with those who cannot speak Algebra, but does that mean we should hire them to fill slots in Science Dept., and accept their scientific pronouncements?
This is a real world problem. As a consulting statistician I deal every day with high level Ph.D. science project leaders who cannot tie their own shoes, statistically speaking. And their “science” output is a pile of stinking ******. Yet they get the big bucks, and what’s worse, their putrid abortions of science become Law!
There may be virtues to making it easy, but there are also drawbacks. Do you want to undergo surgery when your doc is a quack? Do you wish to have your economy ruined, freedoms curtailed, future destroyed, because the “scientists” in charge have no idea what the Scientific Method is, and are just power-tripping PC dream-it-ups while staring at a brick wall?
Answer: it happens all the time!!!!
So I have mixed emotions. Don’t underestimate your students. And if 9 out of 10 can’t hack it, maybe be thankful for No. 10. Don’t blame yourself for the General Decline of Intelligence. Instead preserve and nurture what you can when the opportunity arises.
All of which is not a criticism of the Book, which is coming along nicely. I do sense some desperation, though, involving the desire to teach and convey Knowledge. Which is not a bad thing. I feel the same way.
What I want to be easy are the computational aspects, which are a pain in the ass particularly in classical statistics. Too many ad hoc formulae to be memorized.
The problem is, you can make the students work hard and many can kinda sorta recall how to hand calculate something, but they cannot remember what it means.
The immense effort of memorizing that this statistic is divided by n, the other by the square root of n-1, etc., etc. can not make you remember that your goal in the first place was not to say something about some mysterious statistic, but about some real observable thing.
It is a good point to not assume students are as dumb as most professors think they are. I have been told routinely that students (a) will not be able to understand modern statistics (even though it is more intuitive to them than frequentism), and (b) cannot type commands into a computer. I ignore both of these warnings and the students do fine.
Anyway, the first 7 chapters of the book are just groundwork, necessary material to understand what probability and statistics are really doing.
You seem to me to be on the right track. It’s certainly what this old fart needs 🙂
MikeD, I sympathise. I come across scientists all the time who don’t understand the difference between scientific law and scientific theory. That’s pretty basic stuff. There’s an awful lot of assumption goes on in science classes as Feynman pointed out often enough.
PG — Chemistry, physics, and other “lab” sciences are one thing, outdoor environmental sciences are quite another. Climate modeling is one example, but all the enviro sciences suffer from junk methods. We call it GAP science: guess all parameters.
The plethora of multivariate nonsense stats in the enviro sciences with eigen vectors, canonical correlations, community similarities, etc. are complete and utter bogosity. Yet that crap fills journals that fill libraries.
Which is why most enviro sciences have not advanced much beyond Medieval superstition. We are drowning in ignorance.
Why do you think it’s mostly prospective scientists who are taking statistics classes? Most business and Poly-sci students have to take at least a survey course. This sounds like an ideal approach for most of these folks.
I hope your approach does attract non-science majors, and teaches them something, too. Fear of math is fairly widespread in most disciplines, so a kinder, gentler way could be enticing.
Lord knows something has to be done to drain the ignorance swamp. And caning students has been outlawed, leaving few other choices.
On your Larry, Moe, Curly problem, why does the “and” when we’re taking all three of them mean that we are supposed to multiply their individual probabilities together?
This is from Probability Rule #2, where “ands” become “times.” This is from Chapter 2.