I've been reading about how the Korean language is represented in Unicode, and wow.
The Korean alphabet is simple enough that anyone can memorize it in an afternoon. The difficult part is that when the Korean alphabet was invented, the only writing game in town was Chinese, with its square-shaped characters. In theory, it should be easy to read Korean letters linearly, but someone decided that the letters should be arranged in aesthetically pleasing syllabic boxes. Fast forward nearly 600 years, and now each multi-letter syllable has its own Unicode code point.
I won’t attempt to re-explain what that article covers so thoroughly, but I just had to share the best part: when you take a Unicode code point (for example, 46239) that describes a single “boxed” syllable, you can deconstruct it into its letter components by:
tail = (codepoint - 44032) % 28
vowel/mid = 1 + ((codepoint - 44032 - tail) % 588)/28
lead = 1 + (int) ((codepoint - 44032)/588)
With the random number I picked above, it becomes… “dwetch,” which isn't part of any Korean word so far as I know. An Internet search yields… 50,000 results of complete gibberish, looking remarkably like (imagine!) someone just rendered random Unicode combinations.
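For anyone who wants to check the arithmetic, here is a minimal Python sketch of the three formulas. The function name decompose and the lookup tables LEADS, VOWELS and TAILS are my own labels for illustration, not anything from the article; the tables assume the standard Unicode ordering of the letters, and a tail of 0 means the syllable has no final consonant.

```python
HANGUL_BASE = 44032  # U+AC00, the first precomposed syllable block

# Illustrative lookup tables, assumed to follow the standard Unicode ordering.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 leading consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"   # 21 vowels
TAILS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
         "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
         "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]  # 28 slots, 0 = no tail


def decompose(codepoint):
    """Return (lead, vowel, tail): lead and vowel are 1-based, tail is 0-based."""
    offset = codepoint - HANGUL_BASE
    if not 0 <= offset < 19 * 21 * 28:
        raise ValueError("not a precomposed Hangul syllable")
    tail = offset % 28
    vowel = 1 + ((offset - tail) % 588) // 28  # 588 = 21 vowels * 28 tails
    lead = 1 + offset // 588
    return lead, vowel, tail


lead, vowel, tail = decompose(46239)
print(lead, vowel, tail)  # 4 16 23
print(chr(46239), LEADS[lead - 1], VOWELS[vowel - 1], TAILS[tail])  # 뒟 ㄷ ㅞ ㅊ
```

So 46239 is 뒟: lead ㄷ (d), vowel ㅞ (we), tail ㅊ (ch).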
But still. HOT. I love this world.