This article, which originally appeared 20 May 2010, was inspired by reading Fredwin On Evolution, as wisely suggested by reader Bob Ludwick. At the bottom of Fred’s piece, there appears a rough calculation, which I expand here.
How long would it take a monkey typing randomly to reproduce the complete works of William (great name, incidentally) Shakespeare?
Once we know that, we can answer how long it would take a barrelful. If, that is, we knew how many monkeys would fit in a standard barrel. In experiments conducted by your author, I can tell you the answer is eleven, but you have to press hard.
A typewritten work is composed, of course, of words, and in between those words are spaces and the occasional punctuation. Separating the works are headings, themselves made of words and numbers.
According to Bennett, Briggs (no relation), and Triola, Shakespeare penned 884,647 words, which isn’t as many as you would think. A standard newspaper-style column, of the kind you read at websites such as this, is 800 words. If Will wrote columns, 884,647 words would fill about 1,100 columns.
And if he wrote one column per day, then it would only take three years to have an oeuvre. After that, of course, comes retirement. Sounds like a government job, no?
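The arithmetic of the last two paragraphs can be checked in a few lines. The word count and the 800-word column come from the text; the 365-day year with one column every day, no days off, is my assumption.

```python
WORDS = 884_647          # Shakespeare's word count (Bennett, Briggs, and Triola)
WORDS_PER_COLUMN = 800   # a standard newspaper-style column
DAYS_PER_YEAR = 365      # assumption: one column a day, every day

columns = WORDS / WORDS_PER_COLUMN
years = columns / DAYS_PER_YEAR

print(f"{columns:,.0f} columns")  # 1,106 columns
print(f"{years:.1f} years")       # 3.0 years
```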
Now, each, or nearly each, word Shakespeare wrote was accompanied by a space, a feature of English that not every language shares: there are no spaces in Chinese and Japanese, for example. Each word consists of letters, there being 26 of them in English. To make less work for our monkeys, we’ll assume case insensitivity: capitals, lower case, all the same to us.
We could count all the letters in those 884,647 words, but that would take too long (I don’t have the files at hand). Instead we’ll use the average word length in English (in Shakespeare’s day): it was 5 letters. And let’s not forget punctuation, which is roughly 10% of his published work (scientifically estimated by glancing at Act IV of The Tempest).
This gives us
$$\#\text{characters} = (\#\text{words} \times \text{avg. word length} + \#\text{spaces}) \times \text{punctuation factor}$$
$$\#\text{characters} = (884{,}647 \times 5 + 884{,}647) \times 1.1 \approx 5{,}838{,}670$$
which is close enough to 6 million for anybody. That’s 6 million characters, consisting of letters, numbers, the space, and punctuation.
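A minimal sketch of the same count, using the inputs from the formula: 884,647 words, 5 letters per word, one space per word, and a punctuation factor of 1.1 for the roughly 10% of punctuation. The function name `character_count` is just a label for illustration.

```python
WORDS = 884_647
AVG_WORD_LENGTH = 5        # letters per word, English of Shakespeare's day
PUNCTUATION_FACTOR = 1.1   # adds roughly 10% for punctuation

def character_count(words: int, avg_len: float, punct: float) -> float:
    """Letters plus one space per word, scaled up for punctuation."""
    return (words * avg_len + words) * punct

chars = character_count(WORDS, AVG_WORD_LENGTH, PUNCTUATION_FACTOR)
print(f"{chars:,.0f}")  # 5,838,670
```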
There are 26 letters, 10 digits, 1 space, and 8 (that I could see) punctuation marks: [’ , : ; ! ? . –]. That’s 26 + 10 + 1 + 8 = 45 different characters.
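Here is the monkeys’ keyboard written out as a Python string, as a quick check that the count comes to 45. The ASCII apostrophe stands in for the typeset one, and the last mark is an en dash. Both are my choices of encoding.

```python
import string

# The 45 keys our monkeys may press (case-insensitive, so lower case only).
PUNCTUATION = "',:;!?.–"   # the 8 marks listed above; the last is an en dash
ALPHABET = string.ascii_lowercase + string.digits + " " + PUNCTUATION

assert len(set(ALPHABET)) == len(ALPHABET)  # no key counted twice
print(len(ALPHABET))  # 45
```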
Ready? Gather your monkeys and sit them in front of a keyboard. In order to reproduce Shakespeare, what has to happen first?
Well, the Bard’s earliest published work was Venus and Adonis, which begins “Even as the sun with purple-colour’d face…” In order to reproduce all his works, your monkey has to at least reproduce Venus and Adonis, and in order to reproduce that, your monkey has to at least reproduce the first word, which is “Even”.
And in order to reproduce that, your monkey has to first type an ‘e’. Suppose he has typed an ‘r’, which is not an ‘e’. And then suppose he typed “ven as the sun with purple-colour’d face…” and all the other letters, numbers, spaces, and punctuation in proper order.
Has your monkey reproduced the complete works of Shakespeare? No, sir, he has not. If you think of Shakespeare’s works as a key, then any deviation from that key won’t work in our lock (whatever that is). So our monkey has to have every i dotted, every t crossed, in the proper order.
Given our information (our evidence, our premises), what is the probability that your monkey types an ‘e’? There are 45 possibilities, so the chance is 1 in 45, or 0.022222.
Given our evidence and given the additional fact that your monkey did type an ‘e’ first, what is the chance he types the second character correctly? Right: it’s the same; 1 in 45. And so on for each character.
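Each keystroke is independent of the ones before it, so the chance of a correct run of keys is the product of the per-key chances. A small sketch with exact fractions, applied to the first word, “even”, follows. The function name `prob_exact` is mine, for illustration.

```python
from fractions import Fraction

ALPHABET_SIZE = 45

def prob_exact(target: str, k: int = ALPHABET_SIZE) -> Fraction:
    """Chance that len(target) independent, uniform keystrokes spell target exactly."""
    p = Fraction(1)
    for _ in target:
        p *= Fraction(1, k)   # each key is right with chance 1/k, whatever came before
    return p

print(prob_exact("e"))     # 1/45
print(prob_exact("even"))  # 1/4100625
```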
What are the chances of your monkey typing, in order, all the characters? It’s the probability of typing the first correctly, times the probability of typing the second correctly, and so on. This is
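$$\text{Prob} = \left(\frac{1}{45}\right)^{6{,}000{,}000} \approx \left(\frac{2}{100}\right)^{6{,}000{,}000} = 2^{6{,}000{,}000} \times 100^{-6{,}000{,}000}$$
where I have approximated 1/45 by 2/100, which is close enough. We can write that better using logs (base 10), because
$$2^{6{,}000{,}000} \times 100^{-6{,}000{,}000} = 10^{6{,}000{,}000 \log 2 \,-\, 2 \times 6{,}000{,}000} \approx 10^{-10{,}193{,}820}$$
which is an awfully small number. Note that log(2) ≈ 0.3 cannot be dropped here: it gets multiplied by 6 million, which knocks about 1.8 million off the exponent. The result is a 1 divided by a 1 followed by roughly 10 million zeros. (Using 1/45 exactly instead of 2/100 gives $10^{-9{,}919{,}275}$, the same for all practical purposes.)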
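Numbers this small are far below anything ordinary floating point can hold, so a sketch has to work with base-10 logarithms. The function below computes the exponent three ways: with the 2/100 (that is, 1/50) approximation, with the exact 1/45, and with the unrounded 5,838,670 characters. The name `log10_prob` is my own.

```python
import math

def log10_prob(n_chars: int, k: int) -> float:
    """log10 of (1/k)**n_chars, the chance of n_chars correct keystrokes."""
    return -n_chars * math.log10(k)

print((1 / 45) ** 6_000_000)                  # 0.0: underflows to zero
print(f"{log10_prob(6_000_000, 50):,.0f}")    # -10,193,820 (2/100 approximation)
print(f"{log10_prob(6_000_000, 45):,.0f}")    # -9,919,275 (exact 1/45)
print(f"{log10_prob(5_838_670, 45):,.0f}")    # -9,652,562 (unrounded count)
```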
Keep in mind that a googol (no, the other one) is defined as $10^{100}$, which is just plain large. Our number (rather, its inverse) is much, much (much!) bigger. (Its inverse is, however, still smaller than a googolplex, $10^{10^{100}}$.)
The universe is roughly 14 billion years old, which is about $4.4 \times 10^{17}$ seconds. It now becomes tricky. Do we only accept a monkey’s efforts that are of the correct length, which we then compare with the Bard? Or, more fairly, do we throw out a stream of characters as soon as it stops matching Shakespeare? Do we, that is, let the monkey continuously start over until he gets it right?
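The second option, letting the monkey start over at the first wrong key, can be sketched for a tiny target. In this sketch a wrong key is simply thrown away. That is an assumption: a stricter scheme could let that key begin a fresh attempt. Under this rule, a target of m characters typed on k keys takes on average $k + k^2 + \dots + k^m$ keystrokes. That total is dominated by $k^m$, and a quick simulation on the two-letter target “ev” agrees with it. The function names are illustrative.

```python
import random
import string

ALPHABET = string.ascii_lowercase + string.digits + " " + "',:;!?.–"  # 45 keys

def expected_keystrokes(m: int, k: int) -> int:
    """Exact mean keystrokes when any wrong key resets progress to zero."""
    return sum(k ** j for j in range(1, m + 1))

def simulate(target: str, alphabet: str, rng: random.Random) -> int:
    """Type random keys until target appears; a wrong key restarts from scratch."""
    typed = 0
    pos = 0                  # how many characters of target are correct so far
    while pos < len(target):
        typed += 1
        if rng.choice(alphabet) == target[pos]:
            pos += 1
        else:
            pos = 0          # the wrong key is discarded and we start over
    return typed

rng = random.Random(2010)
trials = 1_000
mean = sum(simulate("ev", ALPHABET, rng) for _ in range(trials)) / trials
print(expected_keystrokes(2, len(ALPHABET)))  # 2070
print(round(mean))  # close to 2070; the exact value depends on the seed
```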
Well, it just doesn’t matter. The probability above is so mind-bogglingly small that it is never going to happen. Even if we let a barrelful of monkeys type 100 characters a second for the entire age of the universe, they are never going to finish.
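To put a number on “never”, here is a sketch that gives a barrelful of eleven monkeys 100 characters a second each for 14 billion years. It takes a year as 365.25 days, which is my assumption. It then compares the logarithm of their total keystrokes with that of $45^{6{,}000{,}000}$, roughly the number of tries a single success calls for.

```python
import math

MONKEYS = 11                        # one barrelful, pressed hard
CHARS_PER_SECOND = 100              # per monkey
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_S = 14e9 * SECONDS_PER_YEAR

keystrokes = MONKEYS * CHARS_PER_SECOND * AGE_OF_UNIVERSE_S
log10_available = math.log10(keystrokes)
log10_needed = 6_000_000 * math.log10(45)   # log10 of 45**6,000,000

print(f"seconds: {AGE_OF_UNIVERSE_S:.2e}")       # 4.42e+17
print(f"keystrokes: 10^{log10_available:.1f}")   # 10^20.7
print(f"needed: 10^{log10_needed:,.0f}")         # 10^9,919,275
```

Twenty-one digits of effort against nearly ten million digits of odds.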
And so we conclude what we already knew: randomness isn’t enough to make a Shakespeare; something more is needed.
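I assumed that the order of Shakespeare’s works matters: that is, the monkey first has to type Venus and Adonis before moving on to a sonnet. If order doesn’t matter, then the chance the monkey reproduces everything increases by a factor equal to the number of ways of ordering the works, n! for n separate pieces. With the 154 sonnets alone that is 154!, roughly $10^{271}$: a stupendous boost, and still nothing against odds like these.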
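A rough sketch of the order-doesn’t-matter factor. It treats the 154 sonnets as interchangeable blocks, an illustrative assumption that ignores the plays and other poems. `math.lgamma` gives log10(n!) without building the enormous integer.

```python
import math

def log10_factorial(n: int) -> float:
    """log10(n!) via the log-gamma function, since n! = Gamma(n + 1)."""
    return math.lgamma(n + 1) / math.log(10)

SONNETS = 154
boost = log10_factorial(SONNETS)
print(f"154! is about 10^{boost:.0f}")   # roughly 10^271
print(f"new exponent: {-6_000_000 * math.log10(45) + boost:,.0f}")
```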
I also ignored all the other keys on a standard keyboard: including them makes the chance of reproduction smaller still.
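To see how much smaller, here is a sketch of one fuller keyboard. It assumes, purely for illustration, the 95 printable ASCII characters: letters in both cases, digits, 32 symbols, and the space. The chance per key drops from 1/45 to 1/95, and the exponent grows accordingly.

```python
import math
import string

N_CHARS = 6_000_000  # the rounded character count from the text

# Illustrative assumption: every printable ASCII key is available, case-sensitive.
FULL_KEYBOARD = string.ascii_letters + string.digits + string.punctuation + " "

def log10_prob(n_chars: int, k: int) -> float:
    """log10 of (1/k)**n_chars."""
    return -n_chars * math.log10(k)

small = log10_prob(N_CHARS, 45)
large = log10_prob(N_CHARS, len(FULL_KEYBOARD))
print(len(FULL_KEYBOARD))                      # 95
print(f"45 keys: 10^{small:,.0f}")
print(f"{len(FULL_KEYBOARD)} keys: 10^{large:,.0f}")
```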
If you find any errors in calculations or logic or whatever, please email them to email@example.com.
Update: Fixed thanks to Charles’s suggestion. Another way to think of it is that a space adds one to the average word length.