Since the subject has come up so often, today a note on the words correlated and correlation. They have technical definitions and plain English meanings. The two definitions overlap but they are not equivalent.
Suppose you have these two propositions: X = “Jack has an IQ of 107” and Y = “Jack makes $72,000 a year.” And you wonder, does Jack’s IQ have a bearing on his salary? Or does Jack’s salary have a bearing on his IQ? Higher incomes might imply softer lives, more leisure time, and perhaps more bodily ease for the little gray cells to flourish. So the latter question might be answered “yes.”
Problem is, we can’t answer either of these two questions without recourse to other evidence. And if we want to quantify the answers, we also have to fix our meaning of “has a bearing.” This part is simple. X “has a bearing” on Y if the probability of Y being true given X (known or assumed true for the sake of argument) differs from the probability of Y given that X is false. This “has a bearing” captures what we mean whether X causes Y or X is merely related to Y but is perhaps not in the “causal path” of Y.
For instance, there might be some W that causes both X and Y simultaneously; in this case knowledge of X “has a bearing” on knowledge of Y. Or it might be that X causes A, which causes B, which causes C, and so on right up to Y. Or this path might be reversed. But once again, knowledge of X has a bearing on our knowledge of Y, even if we know nothing directly of A, B, C, etc.
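This probability definition of “has a bearing” can be put in a few lines of code. Here is a minimal Python sketch (not from the original; the joint probabilities are invented purely for illustration):

```python
# "Has a bearing" means: Pr(Y | X) differs from Pr(Y | not-X).
# The joint probabilities below are invented for the example.
joint = {
    (True, True): 0.30,   # Pr(X & Y)
    (True, False): 0.20,  # Pr(X & not-Y)
    (False, True): 0.15,  # Pr(not-X & Y)
    (False, False): 0.35, # Pr(not-X & not-Y)
}

def pr_y_given(x_value):
    """Pr(Y | X = x_value), computed from the joint table."""
    num = joint[(x_value, True)]
    den = joint[(x_value, True)] + joint[(x_value, False)]
    return num / den

p_if_x = pr_y_given(True)       # 0.30 / 0.50 = 0.6
p_if_not_x = pr_y_given(False)  # 0.15 / 0.50 = 0.3
# The two probabilities differ, so on this evidence X has a
# bearing on Y -- whether or not X is in the causal path of Y.
```

Note the sketch says nothing about causation: the same inequality holds whether X causes Y, Y causes X, or some W drives both.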
A classical statistician wondering whether Jack’s IQ has a bearing on his salary would probably venture forth and collect data on Jill’s IQ and salary, and likewise data from Bill and from Alice, and from Will and Wilma, and so on. This maneuver adds the additional information or evidence we required. Why do we require this? Well, what is the answer to this:
Pr (Y | X) = ?
This is “What is the probability Y is true given (or assuming) X is true?” It has no answer in this form. If you find yourself supplying an answer, it is because you are implicitly adding extra evidence not stated in the formula. That is, you are doing something like this:
Pr (Y | X & A) = some number between 0 and 1,
where A was mentally supplied by you. Just as it was supplied by the statistician who collected the other pairs of IQ and salaries, which also implies (this is part of the statistician’s “A”) that these pairs are relevant to Jack; it also assumes that the causal path (and our certainty in it) from X to Y is the same for all these pairs. (This sameness can be changed, as in regression say, but sameness is the first belief.)
Now imagine we make a plot of our pairs: for each observation, such as X = “Jill has an IQ of 108” and Y = “Jill has a salary of $74,500,” we make a dot at (108, 74500), and so forth. The better a straight line drawn through the midst of these scattered points approximates the points themselves, the higher we say the correlation is. If all the points lined up exactly on this straight line, the correlation is “1” or exact. If the points are spread from near to far and do not look at all friendly to the line, the correlation is “0” or nearly.
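The usual technical measure behind this picture is the Pearson correlation coefficient. A minimal Python sketch, with invented (IQ, salary) pairs that happen to fall near a straight line:

```python
# Pearson correlation: how closely a straight line approximates
# the scatter of (x, y) pairs. The data below are invented.
import math

iqs = [95, 99, 101, 108, 112, 120]
salaries = [51000, 58000, 62000, 74500, 80000, 91000]

def pearson(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(iqs, salaries)
# These invented points hug a line, so r is close to 1.
```

Exactly collinear points give r = 1 (or -1 for a downward line); a shapeless cloud gives r near 0.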
This is the technical definition: if our gathering of Xs and Ys can be approximated by a straight line, they are said to be “correlated” or that the two variables have “non-zero correlation.”
Now imagine a sine wave. Here we have statements like X1 = “We are at time point 1” or X2 = “We are at time point 1.01” or whatever, with Y1 = “The sine at time point 1 is 0.84” and Y2 = “The sine at time point 1.01 is 0.85” and so forth. In this case, given the additional information of the formula of the sine, we can say that X directly causes Y to take the values it does. That is (ignoring rounding error),
Pr (“The sine at time point 1 is 0.84” | “We are at time point 1” & S) = 1,
where S is the knowledge we have of the sine (see any trig or intro calculus book for this). But if we plotted[1] a bunch of these Xs and Ys we would find the (technical) correlation between these Xs and Ys was somewhere in the vicinity of 0. This strange happenstance is because the extra evidence here purposely ignores S, the knowledge of the sine wave. It replaces S with some M, which assumes that, given X, our knowledge of Y is quantified by a normal distribution. Why ignore S? Well, just so we can replace it with M. If this seems odd, then know that in many statistical models relevant information like S is often ignored.
Anyway, we finally arrive at the most succinct definitions. Technical correlation is when a straight line approximates pairs of Xs and Ys. Plain English correlation is when knowledge of X changes the certainty we have in Y. Plain English correlation thus encapsulates technical correlation. Plain English correlation can also be called relevance, which is similar (but not identical) to technical “dependence.” About that, another day.
[1] For once, Wikipedia has some good plots of functions like the sine where we know there is causality but where the correlation is 0 or near 0; they also have the formula for technical correlation.