I'm setting up a new server and want to support UTF-8 fully in my web application. I have tried this in the past on existing servers and always seem to end up having to fall back to ISO-8859-1.
Where exactly do I need to set the encoding/charsets? I'm aware that I need to configure Apache, MySQL and PHP to do this. Is there some standard checklist I can follow, or a way to troubleshoot where the mismatches occur?
This is for a new Linux server, running MySQL 5, PHP 5 and Apache 2.
Specify the utf8_unicode_ci (or equivalent) collation on all tables and text columns in your database. This makes MySQL physically store and retrieve values natively in UTF-8.
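As a rough illustration of the storage step, here is a minimal sketch of a table definition that sets the character set and collation explicitly. The helper name, table name and columns are all made up for the example; only the `DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci` clause reflects the advice above.

```php
<?php
// Hypothetical helper: returns the DDL for an example table.
// The table and column names are invented for illustration only.
function example_utf8_table_ddl()
{
    return "CREATE TABLE articles (\n"
         . "    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,\n"
         . "    title VARCHAR(255) NOT NULL,\n"
         . "    body TEXT NOT NULL\n"
         // The table-level default applies to every text column declared above.
         . ") DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci";
}

// Print the statement so it can be pasted into a MySQL client.
echo example_utf8_table_ddl(), ";\n";
```

Setting the default at table level means individual text columns inherit it unless they declare their own character set.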
- In PHP, in whatever DB wrapper you use, you'll need to set the connection charset to utf8. This way, MySQL does no conversion from its native UTF-8 when it hands data off to PHP.
- Note that if you don't use a DB wrapper, you'll probably have to issue a query to tell MySQL to give you results in UTF-8: SET NAMES 'utf8' (as soon as you connect).
- You've got to tell PHP to deliver the proper headers to the client, so text will be interpreted as UTF-8. In PHP, you can use the default_charset php.ini option, or manually issue the Content-Type header yourself, which is just more work but has the same effect.
- You want all data sent to you by browsers to be in UTF-8. Unfortunately, the only way to reliably do this is to add the accept-charset attribute to all your <form> tags: <form ... accept-charset="UTF-8">.
- Note that the W3C HTML spec says that clients "should" default to sending forms back to the server in whatever charset the server served, but this is apparently only a recommendation, hence the need for being explicit on every single <form> tag.
- Although, on that front, you'll still want to verify every received string as being valid UTF-8 before you try to store it or use it anywhere. PHP's mb_check_encoding() does the trick, but you have to use it religiously.
- This is, unfortunately, the hard part. You need to make sure that every time you process a UTF-8 string, you do so safely. The easiest way to do this is by making extensive use of PHP's mbstring extension.
- PHP's string operations are not UTF-8 safe by default. There are some things you can safely do with normal PHP string operations (like concatenation), but for most things you should use the equivalent mbstring function.
- To know what you're doing (read: not screw it up), you really need to understand UTF-8 and how it works at the lowest possible level. Check out any of the links from utf8.com for some good resources to learn everything you need to know.
- Also, I feel like this should be said somewhere, even though it may seem obvious: every PHP or HTML file you'll be serving should be encoded in valid UTF-8.
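To make the input-validation and string-handling points above a bit more concrete, here is a minimal sketch. It assumes the mbstring extension is loaded; the helper names utf8_or_null() and utf8_truncate() are hypothetical and exist only for this example. It shows why byte-oriented functions can silently corrupt a multibyte string while their mb_* counterparts do not.

```php
<?php
// Make the mb_* functions default to UTF-8 (the calls below also pass it explicitly).
mb_internal_encoding('UTF-8');

/**
 * Hypothetical helper: return $input unchanged if it is valid UTF-8, otherwise null.
 */
function utf8_or_null($input)
{
    return mb_check_encoding($input, 'UTF-8') ? $input : null;
}

/**
 * Hypothetical helper: keep at most $maxChars characters (not bytes).
 */
function utf8_truncate($text, $maxChars)
{
    return mb_substr($text, 0, $maxChars, 'UTF-8');
}

$name = "Zo\xC3\xAB";                      // "Zoë" in UTF-8: 3 characters, 4 bytes

$byteLength = strlen($name);               // 4, counts bytes
$charLength = mb_strlen($name, 'UTF-8');   // 3, counts characters

// substr() cuts in the middle of the two-byte "ë" and leaves invalid UTF-8 behind.
$cutByBytes = substr($name, 0, 3);
$cutByChars = utf8_truncate($name, 3);     // the whole string, still valid

$rejected = utf8_or_null($cutByBytes);     // null: the byte-cut string is not valid UTF-8
$accepted = utf8_or_null($cutByChars);     // "Zoë"
```

In a real application the validation helper would be applied to every value read from $_GET, $_POST and similar sources before the value is stored or processed further.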