Poem Code Challenge Hint

In the last post, reader George suggested a challenge to see whether or not short poem codes would be easy to “break”; that is, decrypt without access to the original poem I used to encrypt the message. I want to give a hint about how the challenge might be solved.

Do not try to guess the poem: you will never get it. At best, and only if you have access to some furiously fast flops, you’ll be able to identify the words from the poem, but they are not especially enlightening.

The way to attack the problem, as in most code breaking, is through statistics using a technique called frequency analysis. But let me first tell you why statistics won’t always work (though they often do).

Suppose I am an agent and want to communicate the direction from which the attack will come: north, south, east, or west. I select from my poem five words which contain at least 18 total characters. Why 18? That’s how many characters “northsoutheastwest” has. And why is that important?

A transposition code is just a permutation of the given characters (think about this). Let’s say we want to encrypt “east” and that my poem snippet is exactly 18 letters long. The letters of “east” will be placed in the spots from 1 to 18 depending on the poem’s words (prove this to yourself). Further, since I know this fact, I will use the “nonsense” letters “northsouthwest” as padding, which will also be salted throughout the final encrypted string (again, depending on how the poem’s words place them). That is, the final string will contain all four direction words (plus 2 extra letters, if we want to be neat and bring the total to 20, an exact number of 5-letter groups; however, this is obviously not necessary).
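
The permutation claim can be checked with a minimal Python sketch. It assumes a single columnar transposition keyed by the poem words (letters of the key ranked alphabetically, ties broken left to right); the exact procedure from the earlier post may differ in its details. The key `theseawascalmtoday` is a made-up 18-letter, five-word snippet used only for illustration.

```python
from collections import Counter

def column_order(key: str) -> list[int]:
    """Column indices in reading order: alphabetical by key letter, ties left to right."""
    return sorted(range(len(key)), key=lambda i: (key[i], i))

def transpose(plaintext: str, key: str) -> str:
    """Write the plaintext in rows under the key, then read the columns in key order."""
    width = len(key)
    rows = [plaintext[i:i + width] for i in range(0, len(plaintext), width)]
    out = []
    for col in column_order(key):
        for row in rows:
            if col < len(row):  # the last row may be short
                out.append(row[col])
    return "".join(out)

key = "theseawascalmtoday"            # hypothetical poem snippet, 18 letters
message = "east" + "northsouthwest"   # intended word followed by the padding
cipher = transpose(message, key)

print(cipher)
# Same letters, same counts: only the positions have changed.
print(Counter(cipher) == Counter(message))
```

Whatever key is used, the output has exactly the letters of the input, which is all that "permutation" means here.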

You can see that no manner of frequency testing is going to help because all the possible words are there, and attempts to break the code can learn no more than that. That is, by using frequency analysis, plus knowledge of English spelling, you will eventually hit upon the fact that the message contains the four direction words. But that is all you will discover. You will not know which is the intended direction.
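
A small Python sketch makes the point concrete. It assumes, as in the example above, that the intended direction is written first and the other three directions follow as padding; the helper name `padded_message` is my own.

```python
from collections import Counter

DIRECTIONS = ["north", "south", "east", "west"]

def padded_message(intended: str) -> str:
    """Intended direction first, then the other three directions as nonsense padding."""
    padding = "".join(d for d in DIRECTIONS if d != intended)
    return intended + padding

# Letter counts of the padded message for each possible intended direction.
counts = {d: Counter(padded_message(d)) for d in DIRECTIONS}

# A transposition keeps letter counts, so these are also the ciphertext counts.
all_identical = all(c == counts["north"] for c in counts.values())
print(all_identical)  # True
```

Since the four padded messages have identical letter counts, and a transposition does not change counts, a frequency tally of the intercepted string looks the same no matter which direction was meant.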

The only possible way to figure out which word was intended is to make guesses of the words from the poem, and then try encrypting the message yourself and see if it matches the original encrypted string you intercepted. English words vary in how frequently they are used, and you can draw your guesses from the common ones. The strategy would be to pick from all these words (which is a lot!) and try them out in various combinations, one by one, until you hit on the overheard encoding (whatever string of letters you have intercepted). If you think you have some insight into the poem I might have used, then you have scored because the number of possible word combinations will have shrunk amazingly.
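
Here is a rough Python sketch of that guess-and-check strategy. It makes several assumptions: a single columnar transposition (the real procedure may differ), a key of exactly five words whose letters total the message length, and a toy seven-word vocabulary standing in for a real word-frequency list. All names in it are hypothetical.

```python
from itertools import product

DIRECTIONS = ["north", "south", "east", "west"]

def transpose(plaintext, key):
    """Columnar transposition: columns read alphabetically by key letter, ties left to right."""
    order = sorted(range(len(key)), key=lambda i: (key[i], i))
    rows = [plaintext[i:i + len(key)] for i in range(0, len(plaintext), len(key))]
    return "".join(row[c] for c in order for row in rows if c < len(row))

def padded_message(intended):
    """Intended direction first, the other three as padding."""
    return intended + "".join(d for d in DIRECTIONS if d != intended)

def search(intercepted, vocabulary, n_words=5):
    """Try every ordered choice of n_words; keep the guesses that reproduce the intercept."""
    matches = []
    for words in product(vocabulary, repeat=n_words):
        key = "".join(words)
        if len(key) != len(intercepted):  # assumption: key exactly as long as the message
            continue
        for direction in DIRECTIONS:
            if transpose(padded_message(direction), key) == intercepted:
                matches.append((words, direction))
    return matches

vocabulary = ["the", "sea", "was", "calm", "today", "wind", "blew"]  # toy word list
secret_words = ("the", "sea", "was", "calm", "today")                # made-up poem words
intercepted = transpose(padded_message("east"), "".join(secret_words))

matches = search(intercepted, vocabulary)
print(len(vocabulary) ** 5, "ordered guesses tried")
print((secret_words, "east") in matches)
```

Seven words already give 7^5 = 16,807 ordered guesses; a vocabulary of 10,000 words gives 10^20, which is why insight into the poem matters so much. Note also that more than one guess may reproduce the intercept, so a match is evidence rather than proof.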

I was not especially cruel in the poem code challenge. The message is there and the nonsense padding letters are not parts of other messages. Neither was I overly generous, though. The padding letters I chose were (roughly) the ones most common in English, with a slight favoring of duplicates of letters that were in the message I encrypted. That is the hint. (Obviously, you also ignore the first five characters.)

Still not too easy!

The challenge is to decrypt the following:

  agmpw   tdenl   wyecs   eotas   saobn   ynodo   orlet
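
As a first step on the string above, one might tally its letters and rule out candidate words that cannot be spelled from them. The following Python sketch does that, dropping the first five-letter group as the hint says; the helper `can_spell` and the two candidate words are only illustrations.

```python
from collections import Counter

ciphertext = "agmpw tdenl wyecs eotas saobn ynodo orlet"
groups = ciphertext.split()
body = "".join(groups[1:])   # ignore the first five characters, per the hint
counts = Counter(body)

def can_spell(candidate: str) -> bool:
    """True if the candidate's letters are all available in the ciphertext body."""
    need = Counter(ch for ch in candidate.lower() if ch.isalpha())
    return all(counts[ch] >= n for ch, n in need.items())

# Letter tally, most frequent first.
for letter, n in counts.most_common():
    print(letter, n)

print(can_spell("statistics"))  # False: there is no "i" in the body
print(can_spell("decode"))      # True
```

Passing this check only means a word is possible, not that it is in the message, because some of the 30 letters are padding.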

Categories: Statistics

2 replies »

  1. Thanks for a fun puzzle – even though it’s not looking good for finding a solution.

    Assuming there aren’t variations to the poem code procedure that you didn’t discuss, I would think that the first step in breaking the message would be to guess the number of characters in the poem words – or more importantly the number of lines in the encryption table. Again assuming the 5 letter key implies 5 words, it doesn’t seem too likely that your key words averaged 2 letters each so we probably are not looking at 3 rows. Two rows of a 15 letter code is a possibility.

    Looking at the letters in what would be the top row for that code, which shouldn’t have any padding, I see two y’s. Words with y’s seem to be a good place to start, but after playing with a number of combinations (most of which involved “yearly cost” or “costly year”) I didn’t find one where the beginning of the second line sensibly continued the message. At which point it occurs to me that the worst type of message would be one shorter than the number of code letters. That would put everything on one line and give no check on whether I’ve got the correct solution. Since we also don’t know how many letters are in the real message and how many are padding, there are a lot of potential messages from 30 letters. The only hope I could see in that case would be to have some expectation of what might be in the message.

    Missing letters preclude potentially interesting words like “statistics”, “Briggs”, “poem” and “George”. A plausible sounding message can be made assuming you encrypted a quip including decode: “really not so easy to decode now”. On the other hand, maybe one of the y’s is in the name of a day. The only option there is Wednesday and that choice gives a possible message as well: “try to call boss Wednesday noon”. I would think that a one line message is pretty hopeless without having a pretty good idea of the subject or at least the number of dummy letters.

  2. Bob!

    This is exactly the way to solve a puzzle like this. I am very proud. I’ll wait until the weekend(ish) to post the solution.
