I can read the MySQL documentation and it is pretty clear. But how do you decide which character set to use? And what data does collation have an effect on?

I'm asking for an explanation of the two and how to choose them.


From the article here (for further reading):

A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set. Let's make the distinction clear with an example of an imaginary character set.

Suppose that we have an alphabet with four letters: 'A', 'B', 'a', 'b'. We give each letter a number: 'A' = 0, 'B' = 1, 'a' = 2, 'b' = 3. The letter 'A' is a symbol, the number 0 is the encoding for 'A', and the combination of all four letters and their encodings is a character set.

Now, suppose that we want to compare two string values, 'A' and 'B'. The simplest way to do this is to look at the encodings: 0 for 'A' and 1 for 'B'. Because 0 is less than 1, we say 'A' is less than 'B'. What we have just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): "compare the encodings." We call this simplest of all possible collations a binary collation.
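To make that concrete, here is a minimal Python sketch of the imaginary character set with a binary collation. The names CHARSET, encode and binary_compare are made up for illustration; this is not how MySQL implements it.

```python
# Imaginary four-letter character set from the example: symbol -> encoding.
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}


def encode(text):
    """Turn a string into the list of its encodings."""
    return [CHARSET[ch] for ch in text]


def binary_compare(left, right):
    """Binary collation: the only rule is 'compare the encodings'.

    Returns -1, 0 or 1 when left sorts before, equal to, or after right.
    """
    l, r = encode(left), encode(right)
    return (l > r) - (l < r)


print(binary_compare("A", "B"))  # -1: 0 is less than 1
print(binary_compare("a", "B"))  # 1: 2 is greater than 1
```

Note that under this collation the lowercase 'a' sorts after the uppercase 'B', simply because its encoding is larger.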

But what if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules: (1) treat the lowercase letters 'a' and 'b' as equivalent to 'A' and 'B'; (2) then compare the encodings. We call this a case-insensitive collation. It is a little more complex than a binary collation.
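The two rules can be sketched in Python roughly like this, using the same imaginary four-letter character set. CHARSET, FOLD, ci_key and ci_compare are hypothetical names chosen for this example only.

```python
# Imaginary four-letter character set: symbol -> encoding.
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}

# Rule 1: treat lowercase letters as equivalent to their uppercase forms.
FOLD = {"a": "A", "b": "B"}


def ci_key(text):
    """Fold case first, then map each symbol to its encoding."""
    return [CHARSET[FOLD.get(ch, ch)] for ch in text]


def ci_compare(left, right):
    """Rule 2: compare the encodings of the folded strings."""
    l, r = ci_key(left), ci_key(right)
    return (l > r) - (l < r)


print(ci_compare("a", "A"))  # 0: equal once case is folded
print(ci_compare("a", "B"))  # -1: 'a' now sorts before 'B'
```

With a binary collation 'a' would sort after 'B'; with this one it sorts before it, even though the character set itself is unchanged.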

In real life, most character sets have many characters: not just 'A' and 'B' but whole alphabets, sometimes multiple alphabets or eastern writing systems with thousands of characters, along with many special symbols and punctuation marks. Also in real life, most collations have many rules: not just case insensitivity but also accent insensitivity (an "accent" is a mark attached to a character, as in German 'ö') and multiple-character mappings (such as the rule that 'ö' = 'OE' in one of the two German collations).
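As a rough illustration of those two kinds of rules, the Python sketch below builds one comparison key that ignores accents and another that applies a multiple-character mapping like 'ö' = 'OE'. It relies on Unicode decomposition from the standard unicodedata module; the EXPANSIONS table and the function names are assumptions for this example and do not reproduce any actual MySQL collation.

```python
import unicodedata

# Hypothetical multiple-character mapping, modelled on the 'ö' = 'OE' rule.
EXPANSIONS = {"\u00f6": "OE", "\u00d6": "OE"}  # ö, Ö


def strip_accents(text):
    """Decompose characters and drop the combining marks (the accents)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_key(text):
    """Case- and accent-insensitive key: 'ö' compares equal to 'o'."""
    return strip_accents(text).upper()


def expansion_key(text):
    """Case-insensitive key where 'ö' expands to the two letters 'OE'."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in text).upper()


print(accent_insensitive_key("K\u00f6ln") == accent_insensitive_key("KOLN"))  # True
print(expansion_key("G\u00f6the") == expansion_key("Goethe"))                 # True
```

Two strings are considered equal under a collation when their keys are equal, so the same pair of words can match under one collation and differ under another.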

A character encoding is a way to encode characters so that they fit in memory. That is, if the charset is ISO-8859-15, the euro symbol will be encoded as 0xA4, and in UTF-8 it will be 0xE282AC.
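You can check those two byte sequences yourself. This is a small Python sketch using the standard codecs, nothing MySQL-specific:

```python
# The euro sign, written as an escape so the source file encoding does not matter.
euro = "\u20ac"

latin9_bytes = euro.encode("iso-8859-15")  # one byte in ISO-8859-15 (latin9)
utf8_bytes = euro.encode("utf-8")          # three bytes in UTF-8

print(latin9_bytes.hex())  # a4
print(utf8_bytes.hex())    # e282ac
```

Same symbol, two different encodings: that difference is what choosing a character set decides.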

The collation is how characters are compared. In latin9 there are letters such as e é è ê f; sorted by their binary representation they come out as "e f è é ê", but with an appropriate collation you get them in the order you would expect.
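Here is a rough Python sketch of the difference between sorting by binary representation and sorting with a collation-like key. The key used here (Unicode NFD decomposition) is only a crude stand-in that I am assuming for the example; a real collation has language-specific rules.

```python
import unicodedata

# e é è ê f, written with escapes so the source file encoding does not matter.
letters = ["e", "\u00e9", "\u00e8", "\u00ea", "f"]

# Binary order: sort by the bytes each letter has in ISO-8859-15 (latin9).
binary_order = sorted(letters, key=lambda ch: ch.encode("iso-8859-15"))


def rough_key(ch):
    """Crude collation key: base letter first, then the accent mark."""
    return unicodedata.normalize("NFD", ch)


collated_order = sorted(letters, key=rough_key)

print(" ".join(binary_order))    # e f è é ê
print(" ".join(collated_order))  # e è é ê f
```

In the binary order every accented letter lands after 'f' because its byte value is higher; with the key, the accented forms stay next to plain 'e'.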

Serta Esparza's answer is not from the linked article; it ultimately comes from MySQL's manual.