PHP - UTF-8 all the way through


I'm setting up a new server and want to support UTF-8 fully in my web application. I have tried this in the past on existing servers and always seem to end up having to fall back to ISO-8859-1.

Where exactly do I need to set the encoding/charsets? I'm aware that I need to configure Apache, MySQL, and PHP to do this. Is there some standard checklist I can follow, or a way to troubleshoot where the mismatches occur?

This is for a new Linux server, running MySQL 5, PHP 5, and Apache 2.

Data storage:

  • Specify the utf8mb4 character set on all tables and text columns in your database. This makes MySQL physically store and retrieve values encoded natively in UTF-8. Note that MySQL will implicitly use the utf8mb4 encoding if a utf8mb4_* collation is specified (without any explicit character set).

  • In older versions of MySQL (< 5.5.3), you'll unfortunately be forced to use plain utf8, which supports only a subset of Unicode characters. I wish I were kidding.
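
As a rough illustration of the storage step, the helper below builds a CREATE TABLE statement that declares utf8mb4 explicitly. The function name, the table layout, and the choice of the utf8mb4_unicode_ci collation are assumptions made for this example, not something the checklist prescribes.

```php
<?php
// Hypothetical helper: returns DDL for a table whose text columns use utf8mb4.
function utf8mb4TableDdl(string $table): string
{
    // Only allow simple identifiers, since the name is interpolated into SQL.
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $table)) {
        throw new InvalidArgumentException('Unexpected table name');
    }

    return "CREATE TABLE `$table` (\n"
         . "  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,\n"
         . "  title VARCHAR(255) NOT NULL,\n"
         . "  body TEXT NOT NULL\n"
         . ") DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";
}

// Usage (assumes an existing PDO connection in $pdo):
// $pdo->exec(utf8mb4TableDdl('articles'));
```

On MySQL older than 5.5.3 the same statement would have to name utf8 and a utf8_* collation instead.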

Data access:

  • In your application code (e.g. PHP), in whatever DB access method you use, you'll need to set the connection charset to utf8mb4. This way, MySQL does no conversion from its native UTF-8 when it hands data off to your application and vice versa.

  • Some drivers provide their own mechanism for configuring the connection character set, which both updates the driver's own internal state and informs MySQL of the encoding to be used on the connection. This is usually the preferred approach. In PHP:

    • If you're using the PDO abstraction layer with PHP ≥ 5.3.6, you can specify the charset in the DSN:

      $dbh = new PDO('mysql:charset=utf8mb4');
    • If you're using mysqli, you can call set_charset():

      $mysqli->set_charset('utf8mb4');       // object oriented style
      mysqli_set_charset($link, 'utf8mb4');  // procedural style
    • If you're stuck with the plain mysql extension but happen to be running PHP ≥ 5.2.3, you can call mysql_set_charset.

  • If the driver does not provide its own mechanism for setting the connection character set, you may have to issue a query to tell MySQL how your application expects data on the connection to be encoded: SET NAMES 'utf8mb4'.

  • The same consideration regarding utf8mb4/utf8 applies as above.
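
To make the PDO variant a little more concrete, here is a minimal sketch of a DSN that carries the connection charset. The helper name mysqlDsn() and the host, database, and credential values are placeholders invented for this example.

```php
<?php
// Hypothetical helper: builds a MySQL DSN that includes the connection charset.
function mysqlDsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Usage (host, database name, and credentials are placeholders):
// $dbh = new PDO(mysqlDsn('localhost', 'app'), 'app_user', 'secret', [
//     PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
// ]);
// On MySQL older than 5.5.3, pass 'utf8' as the third argument instead.
```

Keeping the charset in the DSN means the driver and the server agree on the encoding from the first query, without a separate SET NAMES round trip.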

Output:

  • If your application transmits text to other systems, they will also need to be informed of the character encoding. With web applications, the browser must be informed of the encoding in which data is sent (through HTTP response headers or HTML metadata).

  • In PHP, you can use the default_charset php.ini option, or manually issue the Content-Type MIME header yourself, which is more work but has the same effect.
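
Both output options can be sketched in a few lines. The contentTypeHeader() helper is a hypothetical name used only for this example, and setting default_charset with ini_set() is shown here as a runtime stand-in for editing php.ini.

```php
<?php
// Option 1: set the default charset at runtime instead of in php.ini.
ini_set('default_charset', 'UTF-8');

// Option 2: build the Content-Type header explicitly.
// contentTypeHeader() is a hypothetical helper for this example.
function contentTypeHeader(string $mime = 'text/html'): string
{
    return "Content-Type: $mime; charset=utf-8";
}

// header() must be called before any output is sent.
if (!headers_sent()) {
    header(contentTypeHeader());
}
```

Either option alone should be enough for a page; the explicit header is mainly useful for responses that are not HTML.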

Input:

  • Unfortunately, you should verify every received string as being valid UTF-8 before you try to store it or use it anywhere. PHP's mb_check_encoding() does the trick, but you have to use it religiously. There's really no way around this, as malicious clients can submit data in whatever encoding they want, and I haven't found a trick to get PHP to do this for you reliably.

  • From my reading of the current HTML spec, the following sub-bullets are not necessary or even valid anymore for modern HTML. My understanding is that browsers will work with and submit data in the character set specified for the document. However, if you're targeting older versions of HTML (XHTML, HTML4, etc.), these points may still be useful:

    • For HTML before HTML5 only: you want all data sent to you by browsers to be in UTF-8. Unfortunately, the only way to do this reliably is to add the accept-charset attribute to all your <form> tags: <form ... accept-charset="UTF-8">.
    • For HTML before HTML5 only: note that the W3C HTML spec says that clients "should" default to sending forms back to the server in whatever charset the server served, but this is apparently only a recommendation, hence the need for being explicit on every single <form> tag.
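
The validation step might look something like the sketch below. requireUtf8() and the "comment" field name are assumptions made for illustration; the only call taken from the text above is mb_check_encoding(), which needs the mbstring extension.

```php
<?php
// Hypothetical helper: returns the string unchanged if it is valid UTF-8,
// and throws otherwise. Requires the mbstring extension.
function requireUtf8(string $value): string
{
    if (!mb_check_encoding($value, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $value;
}

// Usage with a form field named "comment" (an assumed name):
// $comment = requireUtf8($_POST['comment'] ?? '');
```

Rejecting bad input outright is one possible policy; an application could instead log and discard the field, but silently storing the bytes defeats the purpose of the check.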

Other code considerations:

  • Obviously enough, all files you'll be serving (PHP, HTML, JavaScript, etc.) should be encoded in valid UTF-8.

  • You need to make sure that every time you process a UTF-8 string, you do so safely. This is, unfortunately, the hard part. You'll probably want to make extensive use of PHP's mbstring extension.

  • PHP's built-in string operations are not UTF-8 safe by default. There are some things you can safely do with normal PHP string operations (like concatenation), but for most things you should use the equivalent mbstring function.

  • To know what you're doing (read: not mess it up), you really need to know UTF-8 and how it works on the lowest possible level. Check out any of the links from utf8.com for some good resources to learn everything you need to know.
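
A small comparison shows why the byte-oriented functions are risky. The variable names are illustrative, and the sample word is written as explicit bytes so the result does not depend on how this file itself is saved.

```php
<?php
// "héllo" written as explicit UTF-8 bytes so the example does not depend
// on the encoding of this source file.
$word = "h\xC3\xA9llo";

$byteLength = strlen($word);              // counts bytes
$charLength = mb_strlen($word, 'UTF-8');  // counts code points

$byteCut = substr($word, 0, 2);              // cuts through the two-byte "é"
$charCut = mb_substr($word, 0, 2, 'UTF-8');  // keeps "é" intact

// Concatenation is byte-safe as long as both operands are valid UTF-8.
$joined = $word . ' ' . $charCut;
```

Here strlen() sees six bytes where mb_strlen() sees five characters, and the substr() result is no longer valid UTF-8 because it ends in the middle of a multi-byte sequence.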

