PHP programming for Chinese under UTF8

Author: Eve Cole  Update Time: 2009-06-01 18:19:59

Preface:

To be honest,
Sandals also thinks UTF8 is a good thing...
After all, being able to show Chinese, Japanese and Korean on the same screen is very attractive to East Asian users...
(Of course, the benefits don't stop there...)
And it's not just web programs...
The cores of many applications have started using Unicode internally...
The purpose is obvious: multi-language display...
Microsoft's NT-based software, such as Windows XP, is Unicode-based internally...
That is why Japanese software displays correctly on your Chinese XP...
Whereas installing software in other languages on Chinese Windows 98, with its GB-based core, produces garbled characters...

As for UTF8,
it is one of the encoding forms of Unicode...
It stores a common Chinese character in three bytes...
(UTF-16, the form Windows simply calls "Unicode", uses two bytes for the same character, and UTF-32 uses four)
Application software has collectively moved over to Unicode...
So why shouldn't we use UTF8 in web applications too?
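To see those byte counts concretely, here is a small PHP sketch (the sample names are made up for illustration). strlen() counts bytes. preg_match_all() with the /u modifier counts UTF8 characters. A common Chinese character comes out as 3 bytes, and a character outside the Basic Multilingual Plane, such as an emoji, takes 4:

```php
<?php
// Byte length vs. character length of UTF-8 strings, using only core PHP.
// strlen() counts bytes; preg_match_all('/./su', ...) counts characters,
// because the /u modifier makes PCRE read the subject as UTF-8.
$samples = [
    'ascii' => 'A',
    'han'   => "\u{4E2D}",  // a common Chinese character
    'emoji' => "\u{1F600}", // outside the Basic Multilingual Plane
];

foreach ($samples as $name => $s) {
    printf("%-5s %d byte(s), %d character(s)\n", $name, strlen($s), preg_match_all('/./su', $s));
}
```

ASCII text stays one byte per character, which is why English-only pages rarely notice the switch to UTF8.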

In this article, Sandals will try to cover PHP programming under UTF8 encoding from as many angles as possible...
As for why "Chinese" is singled out...
First, with English there is really no need to think about UTF8, since ASCII characters are stored the same way in UTF8...
Unless you are planning to build a multi-language system...
(Let me complain a little: foreign programmers nowadays pay no attention to this issue at all when writing programs...)
Second, Chinese, Japanese, Korean and other multi-byte languages are handled in very similar ways under UTF8...
So whatever works for Chinese can simply be applied to the others...
Okay... let's start with the database part...

==========================================
Connecting to the database

Many people find that their data comes out garbled right after upgrading to MySQL 4.1...
That is because MySQL only started supporting character sets in 4.1...
and the character set a connection assumes by default may not be the one your data was actually stored in...
(Proof, if any were needed, of the importance of following international standards... Hehe...)
Before that, most of us stored utf8 or GBK bytes without MySQL knowing which encoding they were in...
so once MySQL starts converting between character sets, the output comes out garbled...
To fix the garbling...
you have to tell MySQL which encoding the program sends and expects back...

We assume that your previous database was utf8 encoded...

Then you can add

mysql_query('SET CHARACTER SET utf8') or die("Query failed : " . mysql_error());

before the query.
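The mysql_* functions used here were removed in PHP 7.0. Below is a minimal sketch of the same idea with PDO. It assumes the PDO MySQL driver is installed. The helper names utf8_dsn() and connect_utf8() and the database name 'forum' are hypothetical. Putting charset in the DSN (honored since PHP 5.3.6) makes the driver set the connection character set itself, so no separate SET query is needed:

```php
<?php
// Sketch for current PHP. Assumptions: the PDO MySQL driver is available,
// and 'utf8' matches the encoding your data was stored in
// (on MySQL 5.5.3+ 'utf8mb4' covers all of UTF-8).
function utf8_dsn(string $host, string $dbname, string $charset = 'utf8'): string
{
    // The charset parameter tells the driver which connection character set to use.
    return "mysql:host={$host};dbname={$dbname};charset={$charset}";
}

// Hypothetical helper; it is only defined here, not called.
function connect_utf8(string $host, string $dbname, string $user, string $pass): PDO
{
    return new PDO(utf8_dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // throw instead of die()
    ]);
}

echo utf8_dsn('localhost', 'forum'), "\n";
```

With mysqli, the equivalent call is $mysqli->set_charset('utf8').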

Of course, since this is only needed on 4.1 and above,
we can add a version check:

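// $db->query_first() is assumed to be the application's own database wrapper
$mysqlversion = $db->query_first("SELECT VERSION() AS version");
// version_compare() compares dotted version numbers properly;
// a plain string comparison would treat e.g. '10.x' as older than '4.1'
if (version_compare($mysqlversion['version'], '4.1.0', '>='))
{
    mysql_query('SET CHARACTER SET utf8') or die("Query failed : " . mysql_error());
}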
This way, whatever MySQL's default encoding is, your data is read back correctly...
(Like the old bank ad says: demand deposits, time deposits, even small installment savings, all handled...)
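Comparing version strings with >= compares them as text, character by character. A server reporting '10.5.8-MariaDB' would therefore be treated as older than '4.1'. Here is a small sketch using PHP's built-in version_compare() instead. The helper name mysql_supports_charsets() is hypothetical:

```php
<?php
// Hypothetical helper: true when the server is MySQL 4.1 or newer.
// version_compare() splits version strings into parts and compares them
// numerically; '10.5.8-MariaDB' >= '4.1' as plain strings is false.
function mysql_supports_charsets(string $version): bool
{
    return version_compare($version, '4.1.0', '>=');
}

foreach (['4.0.27', '4.1.22-log', '5.0.45', '10.5.8-MariaDB'] as $v) {
    echo $v, ' => ', mysql_supports_charsets($v) ? 'yes' : 'no', "\n";
}
```

With it, the check above becomes if (mysql_supports_charsets($mysqlversion['version'])) { ... }.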

However, everyone has gone international...
Are you still not using utf8?
How do you transcode your data?
And besides...
What if garbled characters show up while upgrading your data?
Stay tuned!
Find out in the next installment...

==============================================
Upgrading data to 4.1

First...
you have to export your data...
Honestly, the foreigners really were irresponsible here...
The old export method kept losing Chinese characters...
For example, "I love your mother" would turn into "I love you"...
(Usually it is the last character of a value that gets lost)
That's a whole generation off...
(In the words of Sister Pomegranate: "Such a rebellious act, how thrilling"...)
To protect your fragile heart...
and to uphold traditional Chinese ethics and morals...
you can change the fields that contain Chinese characters to binary types...
Here's how...
You can run this statement:

ALTER TABLE `table_name` CONVERT TO CHARACTER SET binary;
This converts character columns such as
CHAR, VARCHAR and TEXT
into
BINARY, VARBINARY and BLOB
Then export the data and import it into the 4.1 environment...
Of course, the last tedious task is:
changing their types back, this time with the right character set...
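Changing the types back is mostly repetitive typing. Here is a sketch of a hypothetical PHP helper that generates the statements from each column's original type. The table and column names are made up.

Two things to keep in mind:
- MODIFY replaces the whole column definition, so attributes such as NOT NULL or DEFAULT must be included in the type string.
- When a column goes from a binary type back to a character type, MySQL does not transcode the bytes. It simply starts reading them in the given character set, which is what makes this trick work.

```php
<?php
// Hypothetical helper: build statements that turn BINARY/VARBINARY/BLOB
// columns back into character columns after the import.
// $columns maps column name => original type, e.g. 'title' => 'VARCHAR(255)'.
function restore_text_columns(string $table, array $columns, string $charset = 'utf8'): array
{
    $sql = [];
    foreach ($columns as $name => $type) {
        // One ALTER per column; include NOT NULL/DEFAULT in $type if needed.
        $sql[] = sprintf('ALTER TABLE `%s` MODIFY `%s` %s CHARACTER SET %s;', $table, $name, $type, $charset);
    }
    return $sql;
}

foreach (restore_text_columns('posts', ['title' => 'VARCHAR(255)', 'body' => 'TEXT']) as $stmt) {
    echo $stmt, "\n";
}
```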

That covers upgrading to 4.1...
But of course there is downgrading too...
How do you downgrade???
Sandals needs a bathroom break...
Please turn to the next page...

==============================================
Downgrading data from 4.1

Some people find that SQL files exported from 4.1 can't be imported into older versions...
The problem is actually very simple...
and MySQL has already thought of everything for us...
Just add the --compatible parameter when exporting...
Let's assume your database is utf8 encoded...
and the target database version is 4.0...
Then run this on the command line:

shell> mysqldump --user=username --password=password --compatible=mysql40 --default-character-set=utf8 database > db.sql
An SQL file exported this way imports into the older database without trouble...
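If the export is scripted from PHP, the same command can be built safely. Here is a sketch with a hypothetical helper that quotes every argument with escapeshellarg(), so unusual characters in the password or database name cannot break the shell command. The --compatible=mysql40 value belongs to the 4.1/5.x-era mysqldump; later releases dropped it.

```php
<?php
// Hypothetical helper: build the downgrade-dump command shown above,
// quoting each user-supplied argument for the shell.
function mysql40_dump_command(string $user, string $pass, string $db, string $outFile): string
{
    return sprintf(
        'mysqldump --user=%s --password=%s --compatible=mysql40 --default-character-set=utf8 %s > %s',
        escapeshellarg($user),
        escapeshellarg($pass),
        escapeshellarg($db),
        escapeshellarg($outFile)
    );
}

echo mysql40_dump_command('username', 'password', 'database', 'db.sql'), "\n";
```

The string can then be passed to exec() on a server where a matching mysqldump is installed.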

The database part is finally done...
But what should we watch out for in the PHP code itself?
Please turn to the next page again...
http://www.knowsky.com
=============================================
PHP file encoding

Do all PHP files have to be converted to UTF8 encoding?
Sandals says: NO...

Put it this way...
if a file contains Chinese characters that will be displayed...
it should be converted to UTF8 encoding...
Here's an example:

// I am Sandals
echo time();
Although the code above contains Chinese text...
it is only inside a comment...
so nothing of it is output...
and this file does not need to be converted to UTF8...

Another example:

echo "I am Sandals";
This obviously outputs Chinese characters...
so the file should be converted to UTF8...
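One practical detail when converting: many editors can save UTF8 with or without a byte-order mark (BOM, the bytes EF BB BF). A BOM in front of <?php is sent to the browser as output, which can cause "headers already sent" errors from header() or session_start(). Here is a small sketch that checks a file's bytes for both problems using only core PCRE. The helper name inspect_utf8_source() is hypothetical:

```php
<?php
// Hypothetical helper: is this source valid UTF-8, and does it start with a BOM?
function inspect_utf8_source(string $bytes): array
{
    return [
        // preg_match() with the /u modifier fails (returns false) on malformed UTF-8.
        'valid_utf8' => preg_match('//u', $bytes) === 1,
        // The UTF-8 byte-order mark is the three bytes EF BB BF.
        'has_bom'    => strncmp($bytes, "\xEF\xBB\xBF", 3) === 0,
    ];
}

$utf8 = "echo \"\u{6211}\";";   // U+6211 saved as UTF-8
$gbk  = "echo \"\xCE\xD2\";";   // the same character saved as GBK
$bom  = "\xEF\xBB\xBF" . $utf8; // UTF-8 with a BOM in front

foreach (['utf8' => $utf8, 'gbk' => $gbk, 'bom' => $bom] as $name => $src) {
    $r = inspect_utf8_source($src);
    printf("%s: valid=%s bom=%s\n", $name, $r['valid_utf8'] ? 'yes' : 'no', $r['has_bom'] ? 'yes' : 'no');
}
```

For a real file, pass in file_get_contents() of that file.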

Of course, many programs now use template (language pack) technology...
so the program files themselves (everything except the language packs) contain no text to output...
In that case, only the language pack files need to be converted to UTF8...
(That's the beauty of language packs... Ahahahahaha...)
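Here is a minimal sketch of the language-pack idea. The file name lang/zh-cn.php and the helper lang_text() are hypothetical. The point is that only the array of translated strings has to be saved as UTF8, while the program code refers to plain ASCII keys:

```php
<?php
// In a real program this array would live in its own file, e.g. a
// hypothetical lang/zh-cn.php containing `return [...];`, saved as UTF-8.
$lang = [
    'welcome' => "\u{6B22}\u{8FCE}", // "welcome" in Chinese
];

// Hypothetical lookup helper: falls back to the key when a string is missing.
function lang_text(array $lang, string $key): string
{
    return $lang[$key] ?? $key;
}

// The program code only contains ASCII keys, never Chinese text.
echo lang_text($lang, 'welcome'), "\n";
```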
==================================================

Truncating Chinese text in UTF8

Because UTF8 stores a Chinese character in three bytes...
the traditional substr function, which counts bytes, can cut a character in half and produce garbage...
Many experts have written UTF8 Chinese truncation functions...
Here are a few:

1. Count characters first, then extract

/**
* Author: Dummy | Zandy
* Create: 200512
* Usage: echo join('', Utf8String::subString_UTF8('Chinese characters', 0, 1));
*/
ini_set('display_errors', 1);
error_reporting(E_ALL ^ E_NOTICE);
// Renamed from "String", which is a reserved class name since PHP 7
class Utf8String {
    // Returns an array of characters: skips $start characters, then collects
    // at most $length characters (counted in characters, not bytes)
    public static function subString_UTF8($str, $start, $length)
    {
        $len = strlen($str);
        $r = array();
        $n = 0; // characters skipped so far
        $m = 0; // characters collected so far
        for ($i = 0; $i < $len; $i++) {
            // The high bits of the lead byte give the sequence length:
            // 0xxxxxxx = 1 byte, 110xxxxx = 2, 1110xxxx = 3, 11110xxx = 4
            $ord = ord($str[$i]);
            if ($ord < 0x80) {
                $bytes = 1;
            } elseif (($ord & 0xE0) == 0xC0) {
                $bytes = 2;
            } elseif (($ord & 0xF0) == 0xE0) {
                $bytes = 3;
            } elseif (($ord & 0xF8) == 0xF0) {
                $bytes = 4;
            } else {
                $bytes = 0; // stray continuation byte or invalid lead byte
            }
            if ($n < $start) {
                $n++;
            } else {
                // Invalid bytes yield an empty string, as in the original
                $r[] = $bytes ? substr($str, $i, $bytes) : '';
                if (++$m >= $length) {
                    break;
                }
            }
            // Skip the continuation bytes of the current character
            if ($bytes > 1) {
                $i += $bytes - 1;
            }
        }
        return $r;
    } // End subString_UTF8
} // End Utf8String
echo join('', Utf8String::subString_UTF8('Chinese characters', 0, 1));
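If the mbstring extension is installed, mb_substr($str, $start, $length, 'UTF-8') does this job directly. Without it, PCRE can split the string into characters for you. A short sketch; the name utf8_substr() is hypothetical:

```php
<?php
// Sketch: character-based substring for UTF-8 without mbstring.
// preg_split('//u', ...) splits between characters; with /u, PCRE reads the
// subject as UTF-8 and returns false if it is malformed.
function utf8_substr(string $str, int $start, int $length): string
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // malformed UTF-8
    }
    return implode('', array_slice($chars, $start, $length));
}

$s = "\u{4E2D}\u{6587}\u{5B57}\u{7B26}"; // four Chinese characters, 12 bytes
echo utf8_substr($s, 0, 2), "\n";          // the first two characters (6 bytes)
```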
2. Truncate first, then trim. Sandals thinks this one is very clever...
Use the traditional substr function to truncate first...
then throw away the last character, because it may have been cut in half...
(utf8_trim below drops the last character whether or not it was actually split)
Note that the third argument of substr must be greater than 3...
Why? Sandals will leave that one to you: can you explain it?

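// A trim function to remove the last character of a utf-8 string
// (whether or not it was cut in half), following the byte layout
// described on http://en.wikipedia.org/wiki/UTF-8
// dotann
// usage: $str = utf8_trim(substr($str, 0, 50));
function utf8_trim($str) {
    // Walk backwards over continuation bytes (10xxxxxx) until reaching an
    // ASCII byte (0xxxxxxx) or a lead byte (11xxxxxx), then cut there
    for ($i = strlen($str) - 1; $i >= 0; $i--) {
        $ch = ord($str[$i]);
        if (($ch & 128) == 0) return substr($str, 0, $i);
        if (($ch & 192) == 192) return substr($str, 0, $i);
    }
    // No character start found: return the input unchanged
    // (the original appended leftover debug output here)
    return $str;
}
$str = 'Chinese characters';
echo utf8_trim(substr($str, 0, 3));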
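A variant sketch, which is not dotann's code: it checks whether the last character after the cut is actually complete, and only drops it when bytes are missing. A complete final character is kept, and short limits no longer lose the only character. The name utf8_truncate_bytes() is hypothetical:

```php
<?php
// Truncate to at most $maxBytes bytes without leaving half a character.
function utf8_truncate_bytes(string $str, int $maxBytes): string
{
    $cut = substr($str, 0, $maxBytes);
    $len = strlen($cut);
    // Find where the last character starts by skipping continuation bytes (10xxxxxx).
    $i = $len - 1;
    while ($i >= 0 && (ord($cut[$i]) & 0xC0) === 0x80) {
        $i--;
    }
    if ($i < 0) {
        return ''; // empty, or only continuation bytes
    }
    // How many bytes that character needs, from its lead byte.
    $lead = ord($cut[$i]);
    if ($lead < 0x80) {
        $need = 1;
    } elseif (($lead & 0xE0) === 0xC0) {
        $need = 2;
    } elseif (($lead & 0xF0) === 0xE0) {
        $need = 3;
    } else {
        $need = 4;
    }
    // Keep the last character only if all of its bytes survived the cut.
    return ($len - $i >= $need) ? $cut : substr($cut, 0, $i);
}

$s = "\u{4E2D}\u{6587}";                // two Chinese characters, 6 bytes
echo utf8_truncate_bytes($s, 4), "\n";  // only the first character fits
```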
3. There are other methods too,
for example, the function 007pig wrote for our Chinese edition of vBulletin...
Short and elegant...
but I'm not in a position to publish its source code...
Sorry about that...

That's all I'll write for today...
Topics such as transcoding haven't been covered yet...
I've been busy lately...
I'll keep organizing this when I have time...
http://www.quchao.com/?p=6&pp=1