I'm trying to figure out what collation I should be using for various types of data. 100% of the content I will be storing is user-submitted.
My understanding is that I should be using UTF-8 General CI (Case-Insensitive) instead of UTF-8 Binary. However, I can't find a clear distinction between UTF-8 General CI and UTF-8 Unicode CI.
- Should I be storing user-submitted content in UTF-8 General or UTF-8 Unicode CI columns?
- What type of data would UTF-8 Binary be applicable to?
Best Answer
In general, utf8_general_ci is faster than utf8_unicode_ci, but less correct.
Here is the difference:
Quoted from: http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html
For more detailed explanation, please read the following post from MySQL forums: http://forums.mysql.com/read.php?103,187048,188748
As for utf8_bin: both utf8_general_ci and utf8_unicode_ci perform case-insensitive comparisons. In contrast, utf8_bin is case-sensitive (among other differences), because it compares the binary values of the characters.
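To see the case-sensitivity difference for yourself, something like the following should work. This is only a minimal sketch: the table `collation_demo` and its column names are made up for illustration, and it assumes a MySQL server where the `utf8` character set and these three collations are available.

```sql
-- Hypothetical demo table; the table and column names are illustrative only.
CREATE TABLE collation_demo (
    id            INT AUTO_INCREMENT PRIMARY KEY,
    title_general VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_general_ci,
    title_unicode VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
    token_bin     VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_bin
);

INSERT INTO collation_demo (title_general, title_unicode, token_bin)
VALUES ('Hello', 'Hello', 'Hello');

-- Case-insensitive collations: 'hello' should match the stored 'Hello',
-- so each of these counts one row.
SELECT COUNT(*) FROM collation_demo WHERE title_general = 'hello';
SELECT COUNT(*) FROM collation_demo WHERE title_unicode = 'hello';

-- Binary collation: 'hello' and 'Hello' differ byte for byte,
-- so this should count zero rows.
SELECT COUNT(*) FROM collation_demo WHERE token_bin = 'hello';

-- Clean up the demo table.
DROP TABLE collation_demo;
```

Because the collation is set per column, you can mix them in one table: a case-insensitive collation for text that users will search, and utf8_bin only for columns where an exact, case-sensitive match is what you actually want.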