I'm setting up a new server and want to support UTF-8 fully in my web application. I have tried this in the past on existing servers and always seem to end up having to fall back to ISO-8859-1.

Where exactly do I need to set the encoding/charsets? I'm aware that I need to configure Apache, MySQL and PHP to do this - is there some standard checklist I can follow, or a way to troubleshoot where the mismatches occur?

This is for a new Linux server, running MySQL 5, PHP 5 and Apache 2.

Storage:

  • Specify the utf8_unicode_ci (or equivalent) collation on all tables and text columns in your database. This makes MySQL physically store and retrieve values natively in UTF-8.
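
For an existing table, the storage step could be sketched roughly like this. The helper name utf8ConversionSql() and the table name articles are made up for illustration; the statement it builds is MySQL's ALTER TABLE ... CONVERT TO CHARACTER SET. Conversion rewrites the stored column data, so try it on a backup first. Also be aware that MySQL's utf8 charset stores at most three bytes per character; MySQL 5.5.3 and later offer utf8mb4 if you need the full range.

```php
<?php
// Hypothetical helper: builds the statement that converts one table
// (and its text columns) to utf8 with the utf8_unicode_ci collation.
function utf8ConversionSql($table)
{
    // Table names cannot be bound as parameters, so only allow plain names.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
        throw new InvalidArgumentException('Unexpected table name: ' . $table);
    }

    return 'ALTER TABLE `' . $table . '`'
         . ' CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci';
}

// Possible usage, assuming $pdo is an open PDO connection to MySQL:
// $pdo->exec(utf8ConversionSql('articles'));
```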

Retrieval:

  • In PHP, in whatever DB wrapper you use, you'll need to set the connection charset to utf8. This way, MySQL does no conversion from its native UTF-8 when it hands data off to PHP.
  • Note that if you don't use a DB wrapper, you'll probably have to issue a query to tell MySQL to give you results in UTF-8: SET NAMES 'utf8' (as soon as you connect).
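
With PDO, the connection charset could be set along these lines. This is only a sketch: utf8Dsn() and connectUtf8() are hypothetical helper names, and the host, database and credentials are whatever your setup uses. The charset part of the DSN is ignored before PHP 5.3.6, so the sketch also sends SET NAMES as a fallback.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that asks for a utf8 connection.
function utf8Dsn($host, $dbName)
{
    return 'mysql:host=' . $host . ';dbname=' . $dbName . ';charset=utf8';
}

// Hypothetical helper: opens the connection and makes sure it talks UTF-8.
function connectUtf8($host, $dbName, $user, $password)
{
    $pdo = new PDO(utf8Dsn($host, $dbName), $user, $password);

    // Fallback for PHP versions that ignore the DSN charset option.
    $pdo->exec("SET NAMES 'utf8'");

    return $pdo;
}

// With mysqli, the equivalent call would be: $mysqli->set_charset('utf8');
```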

Delivery:

  • You need to tell PHP to deliver the correct headers to the client, so text will be interpreted as UTF-8. In PHP, you can use the default_charset php.ini option, or manually issue the Content-Type header yourself, which is just more work but has the same effect.
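
Both delivery options could look something like this at the top of a script. utf8ContentType() is a made-up helper name; ini_set() and header() are the standard PHP functions. You would normally pick one of the two approaches, not both.

```php
<?php
// Hypothetical helper: builds a Content-Type header value with a UTF-8 charset.
function utf8ContentType($mimeType = 'text/html')
{
    return 'Content-Type: ' . $mimeType . '; charset=UTF-8';
}

// Option 1: set the default charset at runtime
// (same effect as default_charset = "UTF-8" in php.ini).
ini_set('default_charset', 'UTF-8');

// Option 2: send the header by hand, before any output is written.
if (!headers_sent()) {
    header(utf8ContentType());
}
```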

Submission:

  • You want all data sent to you by browsers to be in UTF-8. Unfortunately, the only way to reliably do this is to add the accept-charset attribute to all of your <form> tags: <form ... accept-charset="UTF-8">.
  • Note that the W3C HTML spec says that clients "should" default to sending forms back to the server in whatever charset the server served, but this is apparently only a recommendation, hence the need for being explicit on every single <form> tag.
  • Even so, you'll still want to verify every submitted string as being valid UTF-8 before you try to store it or use it anywhere. PHP's mb_check_encoding() does the trick, but you have to use it religiously.
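
A minimal sketch of that validation step, assuming the mbstring extension is loaded. isValidUtf8() is a hypothetical helper that walks nested arrays such as $_POST and checks both keys and values with mb_check_encoding(); how you reject bad input is up to you.

```php
<?php
// Hypothetical helper: true only if every key and value is valid UTF-8.
function isValidUtf8($value)
{
    if (is_array($value)) {
        foreach ($value as $key => $item) {
            if (!isValidUtf8((string) $key) || !isValidUtf8($item)) {
                return false;
            }
        }
        return true;
    }

    return mb_check_encoding((string) $value, 'UTF-8');
}

// Reject the whole request if any submitted field is not valid UTF-8.
if (!isValidUtf8($_POST)) {
    header('HTTP/1.1 400 Bad Request');
    exit('Submitted data is not valid UTF-8.');
}
```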

Processing:

  • This is, unfortunately, the hard part. You need to make sure that every time you process a UTF-8 string, you do so safely. The easiest way to do this is by making extensive use of PHP's mbstring extension.
  • PHP's string operations are not UTF-8 safe by default. There are some things you can safely do with normal PHP string operations (like concatenation), but for most things you should use the equivalent mbstring function.
  • To know what you're doing (read: not mess it up), you really need to understand UTF-8 and how it works at the lowest possible level. Check out the links from utf8.com for some good resources to learn everything you need to know.
  • Also, I feel like this should be said somewhere, even though it may seem obvious: every PHP or HTML file you'll be serving should be encoded in valid UTF-8.
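
To illustrate the difference between the byte-oriented functions and their mbstring equivalents, here is a small sketch. It assumes the mbstring extension is loaded; the sample word is written with byte escapes so it does not depend on how you save the file.

```php
<?php
// "naïve" in UTF-8: the ï is the two-byte sequence C3 AF.
$word = "na\xC3\xAFve";

// strlen() counts bytes, mb_strlen() counts characters.
$byteLength = strlen($word);             // 6
$charLength = mb_strlen($word, 'UTF-8'); // 5

// substr() cuts in the middle of the ï and leaves invalid UTF-8 behind.
$brokenCut = substr($word, 0, 3);            // "na" plus a stray C3 byte
$safeCut   = mb_substr($word, 0, 3, 'UTF-8'); // "naï"

// Concatenating two valid UTF-8 strings is safe with the normal operator.
$joined = $safeCut . 've';
```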