People often say they are plagued by garbled Chinese characters when using XMLHTTP. I looked through the available material and came away disappointed: every solution I found relies on ASP server-side code.
Let's first analyze why the garbling occurs. The reason is simple: when XMLHTTP converts the response to text, it assumes the response is UTF-8 encoded, so HTML that is actually GB2312-encoded gets decoded as UTF-8. The bytes are misread, and garbled Chinese characters appear.
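To make the failure concrete, here is a minimal sketch in Python (used purely for illustration; it is not part of the VBScript approach described in this article). The sample string "中文" and the variable names are assumptions chosen for the demo. It encodes the text the way a GB2312 page would send it, then decodes those bytes once with the wrong charset and once with the right one.

```python
# Illustration only: the sample text is an assumption, not taken from the article.
text = "中文"

# Bytes as a server sending a GB2312 page would put them on the wire.
raw = text.encode("gb2312")
print(raw.hex())  # d6d0cec4

# Wrong guess: treat the GB2312 bytes as UTF-8.
# Invalid sequences are replaced with U+FFFD, so the text is lost.
wrong = raw.decode("utf-8", errors="replace")
print(wrong)

# Right guess: decode with the charset the page actually uses.
right = raw.decode("gb2312")
print(right)
```

The same bytes give readable text only when the decoder uses the same charset as the encoder, which is why the fix has to work on the raw bytes rather than on the already-decoded text.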
So is there really no client-side solution, one that does not depend on ASP server-side scripting? There is!
Using client-side VBScript alone, with no ASP, I solved the problem of garbled Chinese characters when XMLHTTP fetches HTML pages.
Why VBScript rather than the more commonly used JScript? XMLHTTP's responseBody returns an array of unsigned bytes. VBScript provides many functions for manipulating strings and formatting data, as well as ways to access safe arrays, that do not exist in JScript. Here we need VBScript's built-in byte functions, such as MidB, AscB, and LenB, to read responseBody.
As an aside, I am not claiming that VBScript is better than JScript; each has its own strengths. This is my first article on CSDN, so thank you for your support. I have two purposes in writing it: first, to train myself; second, to encourage you, when you run into a problem, to analyze it, target the actual cause, and understand not only what is happening but why.
The code below, Test.htm, covers two uses: fetching the page's own source and fetching the source of another web page. The full script is as follows:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- Author: Xiao Lin, [email protected] -->
<HTML>
<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=gb2312">
</HEAD>
<script language=VBScript>
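' Convert the raw response bytes (GB2312) into a VBScript string.
' Chr() resolves double-byte codes through the system code page,
' so this assumes a Chinese (GB2312/GBK) locale on the client.
Function bytes2BSTR(vIn)
    Dim strReturn, i, ThisCharCode, NextCharCode
    strReturn = ""
    For i = 1 To LenB(vIn)
        ThisCharCode = AscB(MidB(vIn, i, 1))
        If ThisCharCode < &H80 Then
            ' Single-byte (ASCII) character
            strReturn = strReturn & Chr(ThisCharCode)
        Else
            ' Lead byte of a double-byte character: combine it with the next byte
            NextCharCode = AscB(MidB(vIn, i + 1, 1))
            strReturn = strReturn & Chr(CLng(ThisCharCode) * &H100 + CInt(NextCharCode))
            i = i + 1 ' skip the trail byte just consumed
        End If
    Next
    bytes2BSTR = strReturn
End Function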
' Fetch this page's own source and display it
Function viewSource1()
    Dim XmlHttp
    Set XmlHttp = CreateObject("Microsoft.XMLHTTP")
    XmlHttp.Open "GET", document.location.href, False
    XmlHttp.setRequestHeader "Content-Type", "text/XML"
    XmlHttp.Send
    Dim html
    html = bytes2BSTR(XmlHttp.responseBody)
    MsgBox html
End Function
' Fetch the source of another page and display it
Function viewSource2()
    Dim XmlHttp
    Set XmlHttp = CreateObject("Microsoft.XMLHTTP")
    XmlHttp.Open "GET", "http://www.google.com", False
    XmlHttp.setRequestHeader "Content-Type", "text/XML"
    XmlHttp.Send
    Dim html
    html = bytes2BSTR(XmlHttp.responseBody)
    MsgBox html
End Function
</script>
<BODY bgcolor=gainsboro style='border:1pt solid white'>
<TABLE class=text>
<tr>
<td class=text>A pure client-side script solution to garbled Chinese characters when XMLHTTP fetches HTML pages</td>
</tr>
<tr>
<td class=button><button onclick=viewSource1()>View this page's source</button></td>
</tr>
<tr>
<td class=button><button onclick=viewSource2()>View the Google homepage source</button></td>
</tr>
</TABLE>
</BODY>
</HTML>