Yesterday I was listening to a mathematics lecture on cryptography and number theory. The lecturer began discussing older cryptographic methods and spent some time on a process called mono-alphabetic substitution. He mentioned that this particular method lasted for approximately 800 years as a secure way to transmit information, but that it was ultimately cracked by an Arab scholar named Abu Yusuf Yaqub ibn Ishaq as-Sabbah al-Kindi. (I think I have this fairly close when I say it translates to something like: Father of Yusuf [Joseph], Yaqub [Jacob] son of Ishaq [Isaac] the Morning [or the Light] of the Kindi family … Arabic culture has such beautiful names.) He’s simply referred to as al-Kindi most of the time, and he is known as the father of Arabic philosophy.
Anyway, the lecturer was commenting that al-Kindi was performing Gematria (the practice of assigning numeric [or geometric] values to letters in order to find hidden meanings) upon the Quran, and he had the brilliant insight that he could apply “frequency analysis” to find out which letters occurred most often in bodies of text. This gave him the key to cracking mono-alphabetic substitution. And poof, 800 years of security disappeared!
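To make the idea concrete, here is a minimal sketch of why counting letters works against this kind of cipher. The key (a rotated alphabet) and the sample message are made up for illustration; any permutation of the alphabet would behave the same way. A substitution only relabels the letters, so the counts themselves survive encryption.

```python
import string
from collections import Counter

# Hypothetical key: rotate the alphabet by 7 places.
# Any permutation of the 26 letters would serve as a mono-alphabetic key.
plain_alphabet = string.ascii_lowercase
cipher_alphabet = plain_alphabet[7:] + plain_alphabet[:7]
table = str.maketrans(plain_alphabet, cipher_alphabet)

# Made-up sample message, not from the lecture.
plaintext = "meet me at the eastern gate at seven"
ciphertext = plaintext.translate(table)

# Count letters only, ignoring spaces.
plain_counts = Counter(ch for ch in plaintext if ch.isalpha())
cipher_counts = Counter(ch for ch in ciphertext if ch.isalpha())

# The most common ciphertext letter is just the most common
# plaintext letter wearing a disguise.
most_common_plain = plain_counts.most_common(1)[0]
most_common_cipher = cipher_counts.most_common(1)[0]
print(most_common_plain, most_common_cipher)
```

With a long enough message, the most common ciphertext letter is a good guess for the most common letter of the underlying language, and the rest of the key can be worked out from there.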
So here’s where I’m going to go off course (even though I find history and math both to be extremely interesting subjects). As I was listening to this lecture I couldn’t help but think, “That’s not frequency analysis. That is probability density analysis.” And with that I was struck by the question: what if he actually did do frequency analysis? If I assigned a value to every letter of an alphabet in a body of text, I’d be left with a string of numbers. And if I viewed those numbers as the amplitude of a signal, I’d be able to perform a Fourier transform on that signal to get a signal strength for each discrete frequency bin, where every bin would represent a given letter.
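Here is a rough sketch of what that experiment might look like. The a=1 through z=26 encoding is an arbitrary assumption on my part, and the helper names are hypothetical. The transform is a naive discrete Fourier transform written out by hand so that nothing beyond the standard library is needed.

```python
import cmath

def letters_to_signal(text):
    """Map a..z to 1..26 (an assumed, arbitrary encoding); skip everything else."""
    return [ord(ch) - ord("a") + 1 for ch in text.lower() if "a" <= ch <= "z"]

def dft_magnitudes(signal):
    """Naive O(n^2) discrete Fourier transform; returns |X[k]| for each bin k."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]

# Toy input: a strictly alternating pattern of two letters.
signal = letters_to_signal("abababab")
spectrum = dft_magnitudes(signal)

for k, magnitude in enumerate(spectrum):
    print(k, round(magnitude, 6))
```

For this toy input the energy lands in bin 0 (the sum of all the values) and in bin 4 (a pattern repeating every two characters). One thing the sketch makes visible: the bins are indexed by how often a pattern repeats along the text rather than by letter, so tying a bin back to a particular letter would take some further work.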
I’m not entirely sure what utility this would have because I haven’t actually played around with this yet, but a few possibilities pop into my head right away:
1 – The bins themselves will help me identify which alphabet is being used
2 – Every language will probably have its own unique footprint in any given alphabet
3 – This could be extended to analyze punctuation, capitalization and possibly grammatical structure
4 – It might help neural networks in artificially intelligent machines learn how to speak a language “properly” (or handle any other task where rules are hard or impossible to explicitly define)