Character Encoding – UTF-8 vs. UTF-16 vs. char* Explained

character-encoding multibyte string utf-16 utf-8

I've managed to mostly ignore all this multi-byte character stuff, but now I need to do some UI work and I know my ignorance in this area is going to catch up with me! Can anyone explain in a few paragraphs or less just what I need to know so that I can localize my applications? What types should I be using? I use both .NET and C/C++, and I need this answer for both Unix and Windows.

Best Answer

Check out Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".

EDIT 20140523: Also, watch "Characters, Symbols and the Unicode Miracle" by Tom Scott on YouTube - it's just under ten minutes, and a wonderful explanation of the brilliant 'hack' that is UTF-8.
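
To make the UTF-8 vs. UTF-16 difference a bit more concrete, here is a minimal C++ sketch of how a single Unicode code point turns into bytes (UTF-8) or 16-bit units (UTF-16). It is not taken from either of the linked resources; the function names encode_utf8 and encode_utf16 are made up for illustration, and real applications would normally rely on a tested library rather than hand-rolled code like this.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 (1 to 4 bytes).
// ASCII stays a single byte, which is why plain char* strings keep working.
std::string encode_utf8(char32_t cp) {
    // Valid code points are at most U+10FFFF and exclude the surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: encode one code point as UTF-16.
// Code points below U+10000 take one 16-bit unit; the rest take a surrogate pair.
std::u16string encode_utf16(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        cp -= 0x10000;                                       // 20 bits remain
        out += static_cast<char16_t>(0xD800 | (cp >> 10));   // high surrogate
        out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF)); // low surrogate
    }
    return out;
}
```

With these helpers, 'A' (U+0041) is one byte in UTF-8 and one unit in UTF-16, the euro sign (U+20AC) is three bytes in UTF-8 and one unit in UTF-16, and an emoji such as U+1F600 is four bytes in UTF-8 and two units in UTF-16. The takeaway is that in both encodings one "character" is not necessarily one array element.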


Related Solutions

Unicode – Differences Between UTF-8, UTF-16, and UTF-32

UTF-8 Terminology – What Is a Multibyte Character Set?
