Variant on a theme

We, dear readers, have earlier dealt with the nonsensical argument, of which an Enlightened few are excessively fond, "There is no truth." This is an argument often employed by those…

Stats 101: Chapter 8

Here is the link.

This is where it starts to get complicated, this is where old school statistics and new school start diverging. And I don’t even start the new new school.

Parameters are defined and then heavily deemphasized. Nearly all of old and new school statistics entire purpose is devoted to unobservable parameters. This is very unfortunate, because people go away from a parameter analysis far, far too certain about what is of real interest. Which is to say, observable data. New new school statistics acknowledges this, but not until Chap 9.

Confidence intervals are introduced and fully disparaged. Few people can remember that a confidence interval has no meaning; which is a polite way of saying they are meaningless. In finite samples of data, that is, which are the only samples I know about. The key bit of fun is summarized. You can only make one statement about your confidence interval, i.e. the interval you created using your observed data, and it is this: this interval either contains the true value of the parameter or it does not. Isn’t that exciting?

Some, or all, of the Greek letter below might not show up on your screen. Sorry about that. I haven’t the time to make the blog posting look as pretty as the PDF file. Consider this, as always, a teaser.

For more fun, read the chapter: Here is the link.



1. Background

Let?s go back to the petanque example, where we wanted to quantify our uncertainty in the distance x the boule landed from the cochonette. We approximated this using a normal distribution with parameters m = 0 cm and s = 10 cm. With these parameters in hand, we could easily quantify uncertainty in questions like X = “The boule will land at least 17 cm away” with the formula Pr(X|m = 0 cm, s = 10 cm, EN ) = Pr(x > 17 cm|m = 0 cm, s = 10 cm, EN ). R even gave us the number with 1-pnorm(17,0,10) (about 4.5%). But where did the values of m = 0 cm and s = 10 cm come from?

I made them up.

It was easy to compute the probability of statements like X when we knew the probability distribution quantifying its uncertainty and the value of that distribution?s parameters. In the petanque example, this meant knowing that EN was true and also knowing the values of m and s. Here, knowing means just what it says: knowing for certain. But most of the time we do not know EN is true, nor do we know the values of m and s. In this Chapter, we will assume we do in fact know EN is true. We won?t question that assumption until a few Chapters down the road. But, even given EN is true, we still have to discern the values of its parameters somehow.

So how do we learn what these values are? There are some situations where are able to deduce either some or all of the parameter’s values, but these situations are shockingly few in number. Nearly all the time, we are forced to guess. Now, if we do guess?and there is nothing wrong with guessing when you do not know?it should be clear that we will not be certain that the values we guessed are the correct ones. That is to say, we will be uncertain, and when we are uncertain what do we do? We quantify our uncertainty using probability.

At least, that is what we do nowadays. But then-a-days, people did not quantify their uncertainty in the guesses they made. They just made the guesses, said some odd things, and then stopped. We will not stop. We will quantify our uncertainty in the parameters and then go back to what is of main interest, questions like what is the probability that X is true? X is called an observable, in the sense that it is a statement about an observable number x, in this case an actual, measurable distance. We do not care about the parameter values per se. We need to make a guess at them, yes, otherwise we could not get the probability of X. But the fact that a parameter has a particular value is usually not of great interest.

It isn’t of tremendous interest nowadays, but again, then-a-days, it was the only interest. Like I said, people developed a method to guess the parameter values, made the guess, then stopped. This has led people to be far too certain of themselves, because it?s easy to get confused about the values of the parameters and the values of the observables. And when I tell you that then-a-days was only as far away as yesterday, you might start to be concerned.

Nearly all of classical statistics, and most of Bayesian statistics is concerned with parameters. The advantage the latter method has over the former, is that Bayesian statistics acknowledges the uncertainty in the parameters guesses and quantifies that uncertainty using probability. Classical statistics?still the dominate method in use by non-statisticians1?makes some bizarre statements in order to avoid directly mentioning uncertainty. Since classical statistics is ubiquitous, you will have to learn these methods so you can understand the claims people (attempt to) make.

So we start with making guesses about parameters in both the old and new ways. After we finish with that, we will return to reality and talk about observables.

2. Parameters and Observables

Here is the situation: you have never heard of petanque before and do not know a boule from a bowl from a hole in the ground. You know that you have to quantify x, which is some kind of distance. You are assuming that EN is true, and so you know you have to specify m and s before you can make a guess about any value of x.

Before we get too far, let?s set up the problem. When we know the values of the parameters, like we have so far, we write them in Latin letters, like m and s for the Normal, or p for the binomial. We always write unknown and unobservable parameters as Greek letters, usually ? and ? for the normal and ? for the binomial. Here is the normal distribution (density function) written with unknown parameters:

(see the book)

where ? is the central parameter, and ? 2 is the variance parameter, and where the equation is written as a function of the two unknowns, N(?, ?). This emphasizes that we have a different uncertainty in x for every possible value of ? and ? (it makes no difference if we talk of ? or ? 2 , one is just the square root of the other).

You may have wondered what was meant by that phrase “unobservable parameters” last paragraph (if not, you should have wondered). Here is a key fact that you must always remember: not you, not me, not anybody, can ever measure the value of a parameter (of a probability distribution). They simply cannot be seen. We cannot even see the parameters when we know their values. Parameters do not exist in nature as physical, measurable entities. If you like, you can think of them as guides for helping us understand the uncertainty of observables. We can, for example, observe the distance the boule lands from the cochonette. We cannot, however, observe the m even if we know its value, and we cannot observe ? either. Observables, the reason for creating the probability distributions in the first place, must always be of primary interest for this reason.

So how do we learn about the parameters if we cannot observe them? Usually, we have some past data, past values of x, that we can use to tell us something about that distribution?s parameters. The information we gather about the parameters then tell us something about data we have not yet seen, which is usually future data. For example, suppose we have gathered the results of hundreds, say 200, of past throws of boules. What can we say about this past data? We can calculate the arithmetic mean of it, the median, the various quantiles and so on. We can say this many throws were greater than 20 cm, this many less. We can calculate any function of the observed data we want (means and medians etc. are just functions of the data), and we can make all these calculations never knowing, or even needing to know, what the parameter values are. Let me be clear: we can make just about any statement we want about the past observed data and we never need to know the parameter values! What possible good are they if all we wanted to know was about the past data?

There is only one reason to learn anything about the parameters. This is to make statements about future data (or to make statements about data that we have not yet seen, though that data may be old; we just haven?t seen it yet; say archaeological data; all that matters is that the data is unknown to you; and what does “unknown” mean?). That is it. Take your time to understand this. We have, in hand, a collection of data xold , and we know we can compute any function (mean etc.) we want of it, but we know we will, at some time, see new data xnew (data we have not yet seen), and we want to now say something about this xnew . We want to quantify our uncertainty in xnew , and to do that we need a probability distribution, and a probability distribution needs parameters.

The main point again: we use old data to make statements about data we have not yet seen.

Stats 101: Chapter 4

Chapter 4 is ready to go.

This is where it starts to get weird. The first part of the chapter introduces the standard notation of “random” variables, and then works through a binomial example, which is simple enough.

Then come the so-called normals. However, they are anything but. For probably most people, it will be the first time that they hear about the strange creatures called continuous numbers. It will be more surprising to learn that not all mathematicians like these things or agree with their necessity, particularly in problems like quantifying probability for real observable things.

I use the word “real” in its everyday, English sense of something that is tangible or that exists. This is because mathematicians have co-opted the word “real” to mean “continuous”, which in an infinite amount of cases means “not real” or “not tangible” or even “not observable or computable.” Why use these kinds of numbers? Strange as it might seem, using continuous numbers makes the math work out easier!

Again, what is below is a teaser for the book. The equations and pictures don’t come across well, and neither do the footnotes. For the complete treatment, download the actual Chapter.


1. Variables

Recall that random means unknown. Suppose x represents the number of times the Central Michigan University football team wins next year. Nobody knows what this number will be, though we can, of course, guess. Further suppose that the chance that CMU wins any individual game is 2 out of 3, and that (somewhat unrealistically), a win or loss in any one game is irrelevant to the chance that they win or lose any other game. We also know that there will be 12 games. Lastly, suppose that this is all we know. Label this evidence E. That is, we will ignore all information about who the future teams are, what the coach has leaked to the press, how often the band has practiced their pep songs, what students will fail their statistics course and will thus be booted from the team, and so on. What, then, can we say about x?

We know that x can equal 0, or 1, or any number up to 12. It’s unlikely that CMU will loss or win every game, but they?ll prob ably win, say, somewhere around 2/3s, or 6-10, of them. Again, the exact value of x is random, that is, unknown.

Now, if last chapter you weren?t distracted by texting messages about how great this book is, this situation might feel a little familiar. If we instead let x (instead of k?remember these letters are place holders, so whichever one we use does not mat
ter) represent the number of classmates you drive home, where the chance that you take any of them is 10%, we know we can figure out the answer using the binomial formula. Our evidence then was EB . And so it is here, too, when x represents the number of games won. We?ve already seen the binomial formula written in two ways, but yet another (and final) way to write it is this:

x|n, p, EB ? Binomial(n, p).

This (mathematical) sentence reads “Our uncertainty in x, the number of games the football team will win next year, is best represented by the Binomial formula, where we know n, p, and our information is EB .” The “?” symbol has a technical definition: “is distributed as.” So another way to read this sentence is “Our uncertainty in x is distributed as Binomial where we know n, etc.” The “is distributed as” is longhand for “quantified.” Some people leave out the “Our uncertainty in”, which is OK if you remember it is there, but is bad news otherwise. This is because people have a habit of imbuing x itself with some mystical properties, as if “x” itself had a “random” life. Never forget, however, that it is just a placeholder for the statement X = “The team will win x games”, and that this statement may be true or false, and it?s up to us to quantify the probability of it being true.

In classic terms, x is called a “random variable”. To us, who do not need the vague mysticism associated with the word random, x is just an unknown number, though there is little harm in calling it a “variable,” because it can vary over a range of numbers. However, all classical, and even much Bayesian, statistical theory uses the term “random variable”, so we must learn to work with it.

Above, we guessed that the team would win about 6-10 games. Where do these number come from? Obviously, based on the knowledge that the chance of winning any game was 2/3 and there?d be twelve games. But let?s ask more specific questions. What is the probability of winning no games, or X = “The team will win x = 0 games”; that is, what is Pr(x = 0|n, p, EB )? That’s easy: from our binomial formula, this is (see the book) ? 2 in a million. We don’t need to calculate n choose 0 because we know it?s 1; likewise, we don?t need to worry about 0.670^0 because we know that?s 1, too. What is the chance the team wins all its games? Just Pr(x = 12|n, p, EB ). From the binomial, this is (see the book) ? 0.008 (check this). Not very good!

Recall we know that x can take any value from zero to twelve. The most natural question is: what number of games is CMU most likely to win? Well, that’s the value of x that makes (see the book) the largest, i.e. the most probable. This is easy for a computer to do (you’ll learn how next Chapter). It turns out to be 8 games, which has about a one in four chance of happening. We could go on and calculate the rest of the probabilities, for each possible x, just as easily.

What is the most likely number of games the team will win is the most natural question for us, but in pre-computer classical statistics, there turns out to be a different natural question, and this has something to do with creatures called expected values. That term turns out to be a terrible misnomer, because we often do not, and cannot, expect any of the values that the “expected value” calculations give us. The reason expected values are of interest has to do with some mathematics that are not of especial interest here; however, we will have to take a look at them because it is expected of one to do so.

Stats 101: Chapter 3

Three is ready to go. I should re-emphasize one of the goals of this book. It is meant to be for that large host of unfortunates who are forced---I mean…

Stats 101: Chapter 2

Chapter 2 is now ready for downloading---it can be found at this link. This chapter is all about basic probability, with an emphasis on understanding and not on mechanics. Because…