In the last post, reader George suggested a challenge to see whether or not short poem codes would be easy to “break”; that is, decrypt without access to the original poem I used to encrypt the message. I want to give a hint about how the challenge might be solved.
Do not try to guess the poem: you will never get it. At best, and only if you have access to some furiously fast flops, you’ll be able to identify the words from the poem, but they are not especially enlightening.
The way to attack the problem, as in most code breaking, is through statistics using a technique called frequency analysis. But let me first tell you why statistics won’t always work (though they often do).
Suppose I am an agent and want to communicate the direction from which the attack will come: north, south, east, or west. I select from my poem five words which contain at least 18 total characters. Why 18? That’s how many characters “northsoutheastwest” has. And why is that important?
A transposition code is just a permutation of the given characters (think about this). Let’s say we want to encrypt “east” and that my poem snippet is exactly 18 letters long. The letters of “east” will be placed in the spots from 1 to 18 depending on the poem’s words (prove this to yourself). Further, since I know this fact, I will use as “nonsense” padding the 14 letters of “northsouthwest”, which will also be salted throughout our final encrypted string (again, depending on how the poem’s words place them). That is, the final string will contain all the direction words (plus 2 extra letters, if we want to be neat and fill out the final group to exactly 5 letters; however, this is obviously not necessary).
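To make the permutation idea concrete, here is a minimal sketch in Python. It assumes the simplest possible scheme: the letters of the poem words are ranked alphabetically (ties broken left to right), and message letter number i goes to the slot given by the rank of key letter number i. The poem words (“mary had a little lamb”, 18 letters) and the function names `key_order` and `encrypt` are made up for illustration; this is not necessarily the exact procedure from the last post.

```python
def key_order(key):
    """Rank each key letter alphabetically; ties are broken left to right."""
    # sorted() is stable, so equal letters keep their left-to-right order.
    by_letter = sorted(range(len(key)), key=lambda i: key[i])
    order = [0] * len(key)
    for rank, position in enumerate(by_letter):
        order[position] = rank
    return order


def encrypt(message, key):
    """Move message letter i to output slot key_order(key)[i]."""
    if len(message) != len(key):
        raise ValueError("this sketch assumes message and key have equal length")
    order = key_order(key)
    out = [""] * len(message)
    for i, ch in enumerate(message):
        out[order[i]] = ch
    return "".join(out)


# Hypothetical poem snippet: five words, 18 letters in total.
key = "".join(["mary", "had", "a", "little", "lamb"])
# Intended word first, then the 14 padding letters (an assumed layout).
message = "east" + "northsouthwest"
ciphertext = encrypt(message, key)

print(ciphertext)
# The output is a rearrangement of the input and nothing more.
print(sorted(ciphertext) == sorted(message))
```

Whatever poem words are used, the second line printed is `True`: the key decides where each letter lands, never which letters appear.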
You can see that no manner of frequency testing is going to help because all the possible words are there, and attempts to break the code can learn no more than that. That is, by using frequency analysis, plus knowledge of English spelling, you will eventually hit upon the fact that the message contains the four direction words. But that is all you will discover. You will not know which is the intended direction.
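A quick check of this claim, as a sketch: whichever direction is intended, padding it with the other three gives exactly the same collection of letters, so letter counts cannot tell the four cases apart. The helper name `padded_message` and the layout (intended word first, padding after) are assumptions for illustration.

```python
from collections import Counter

DIRECTIONS = ["north", "south", "east", "west"]


def padded_message(intended):
    """Intended direction first, then the other three as nonsense padding."""
    padding = "".join(d for d in DIRECTIONS if d != intended)
    return intended + padding


# One letter-frequency table per possible intended direction.
letter_counts = {d: Counter(padded_message(d)) for d in DIRECTIONS}

# Collapse identical tables; if only one remains, all four are the same.
distinct_tables = {frozenset(c.items()) for c in letter_counts.values()}
all_same = len(distinct_tables) == 1
print(all_same)
```

Since a transposition only rearranges letters, the same holds for the four encrypted strings.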
The only possible way to figure out which word was intended is to make guesses of the words from the poem, and then try encrypting the message yourself and see if it matches the original encrypted string you intercepted. There is a frequency of words in use in English and you can choose from them. The strategy would be to pick from all these words (and there are a lot!) and try them out in various combinations, one by one, until you hit on the overheard encoding (whatever string of letters you have intercepted). If you think you have some insight into the poem I might have used, then you have scored because the number of possible word combinations will have shrunk amazingly.
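The guess-and-check strategy can be sketched as follows. Everything here is a toy under stated assumptions: the same simple one-letter-per-key-letter transposition as a stand-in for the real scheme, a tiny made-up word list in place of a real English vocabulary, and candidate messages that put the intended direction first. The names `dictionary_attack`, `WORD_LIST`, and `true_words` are hypothetical.

```python
from itertools import permutations


def key_order(key):
    """Rank each key letter alphabetically; ties are broken left to right."""
    by_letter = sorted(range(len(key)), key=lambda i: key[i])
    order = [0] * len(key)
    for rank, position in enumerate(by_letter):
        order[position] = rank
    return order


def encrypt(message, key):
    """Move message letter i to output slot key_order(key)[i]."""
    order = key_order(key)
    out = [""] * len(message)
    for i, ch in enumerate(message):
        out[order[i]] = ch
    return "".join(out)


def dictionary_attack(intercepted, candidate_messages, word_list, n_words=5):
    """Try every ordered choice of n_words words as the key."""
    hits = []
    for words in permutations(word_list, n_words):
        key = "".join(words)
        if len(key) != len(intercepted):
            continue  # this sketch needs one key letter per message letter
        for message in candidate_messages:
            if encrypt(message, key) == intercepted:
                hits.append((words, message))
    return hits


DIRECTIONS = ["north", "south", "east", "west"]
candidates = [d + "".join(o for o in DIRECTIONS if o != d) for d in DIRECTIONS]

# Toy vocabulary (assumed); a real attack would need a far larger list.
WORD_LIST = ["mary", "had", "a", "little", "lamb", "its", "fleece", "was"]

# Simulate an intercept made with a key the attacker does not know.
true_words = ("mary", "had", "a", "little", "lamb")
intercepted = encrypt("eastnorthsouthwest", "".join(true_words))

hits = dictionary_attack(intercepted, candidates, WORD_LIST)
for words, message in hits:
    print(" ".join(words), "->", message)
```

Even with eight words there are 6,720 ordered ways to pick five, which is why any insight into the poem shrinks the search so dramatically. Note that the search can in principle return more than one hit, since different keys may produce the same rearrangement.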
I was not especially cruel in the poem code challenge. The message is there and the nonsense padding letters are not parts of other messages. Neither was I overly generous, though. The padding letters I chose were (roughly) those that are the most common in English, with a slight favoring of duplicate letters that were in the message I encrypted. That is the hint. (Obviously, you also ignore the first five characters.)
Still not too easy!
The challenge is to decrypt the following:
agmpw tdenl wyecs eotas saobn ynodo orlet
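As a starting point for the frequency analysis, here is a small sketch that drops the first five-letter group, as the hint says to, and tallies the remaining letters. It only counts; what to do with the counts is the challenge.

```python
from collections import Counter

ciphertext = "agmpw tdenl wyecs eotas saobn ynodo orlet"

groups = ciphertext.split()
# Ignore the first group of five characters, per the hint above.
body = "".join(groups[1:])

counts = Counter(body)
# Most frequent letters first.
for letter, n in counts.most_common():
    print(letter, n)
```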