'/*==========================================================================
' * Intro I spent a long time studying web page encodings, because I recently had to write a VBS
' *   script that checks friendly links (reciprocal link exchanges), and the pages of the sites you
' *   link to may use any of several encodings. My previous approach was to search the page decoded
' *   as GB2312, search again as UTF-8 if the link was not found, and conclude that the other site
' *   did not link back if neither search found it. That was not 100% reliable, but it was close
' *   enough, because most sites use one of these two encodings. Then I came across an idea on a page
' *   in my bookmarks that finally makes it possible to detect a page's encoding automatically, for
' *   example when collecting articles. This problem bothered me for a long time, and although the
' *   solution looks simple now, many people are surely still looking for it, so I am posting these
' *   three functions.
' * FileName GetWebCodePage.vbs
' * Author yongfa365
' * Version v2.0
' * WEB http://www.yongfa365.com
' * Email yongfa365[at]qq.com
' * FirstWrite http://www.yongfa365.com/Item/GetWebCodePage.vbs.html
' * MadeTime 2008-01-29 20:55:46
' * LastModify 2008-01-30 20:55:46
' *==========================================================================*/
Call getHTTPPage("http://www.baidu.com/")
Call getHTTPPage("http://www.google.com/")
Call getHTTPPage("http://www.yongfa365.com/")
Call getHTTPPage("http://www.cbdcn.com/")
Call getHTTPPage("http://www.csdn.net/")
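' Return the regex matches as an array.
' getContents(pattern, text, yinyong): if yinyong is True, return the first captured group of
' each match; otherwise return the whole match. ("yinyong" is pinyin for "reference".)
'MsgBox getContents("a(.+?)b", "a23234b ab a67896896b sadfasdfb", True)(0)   ' displays 23234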
Function getContents(patrn, strng, yinyong)
'by www.yongfa365.com. Please keep this link when reposting so that readers can find the latest version.
On Error Resume Next
Set re = New RegExp
re.Pattern = patrn
re.IgnoreCase = True
re.Global = True
Set Matches = re.Execute(strng)
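If yinyong Then
For i = 0 To Matches.Count -1
If Matches(i).Value <> "" Then RetStr = RetStr & Matches(i).SubMatches(0) & "Liu Yongfa"
Next
Else
For Each oMatch in Matches
If oMatch.Value <> "" Then RetStr = RetStr & oMatch.Value & "Liu Yongfa"
Next
End If
' "Liu Yongfa" is only a separator assumed not to occur in the matched text.
' Every match is followed by it, so the last element of the array is empty.
getContents = Split(RetStr, "Liu Yongfa")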
End Function
Function getHTTPPage(url)
On Error Resume Next
Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
xmlhttp.Open "GET", url, False
xmlhttp.Send
If xmlhttp.Status<>200 Then Exit Function
GetBody = xmlhttp.ResponseBody
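' Liu Yongfa's (www.yongfa365.com) idea: first look for a charset declaration in the returned text;
' if there is none, look in the Content-Type response header; if that also fails, assume GB2312.
' In most cases the encoding is matched directly from the page.
' The Chinese characters in ResponseText may be garbled at this point, but that does not affect
' detection, because the charset declaration itself is plain ASCII.
' If nothing matches, getContents(...)(0) raises "Subscript out of range"; On Error Resume Next
' skips the assignment, so GetCodePage stays empty and the next check takes over.
GetCodePage = getContents("charset=[""']*([^,""']+)", xmlhttp.ResponseText, True)(0)
' Not found in the page: look for the charset in the Content-Type response header
If Len(GetCodePage)<3 Then GetCodePage = getContents("charset=[""']*([^,""']+)", xmlhttp.getResponseHeader("Content-Type"), True)(0)
If Len(GetCodePage)<3 Then GetCodePage = "gb2312"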
Set xmlhttp = Nothing
' Comment out the following line in production; it only shows which encoding was detected.
WScript.Echo url & " --> " & GetCodePage
getHTTPPage = BytesToBstr(GetBody, GetCodePage)
End Function
Function BytesToBstr(Body, Cset)
On Error Resume Next
Dim objstream
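Set objstream = CreateObject("adodb.stream")
objstream.Type = 1 ' adTypeBinary: the stream first holds the raw response bytes
objstream.Mode = 3 ' adModeReadWrite
objstream.Open
objstream.Write Body
objstream.Position = 0 ' Type can only be changed at the start of the stream
objstream.Type = 2 ' adTypeText: reread the same bytes as text
objstream.Charset = Cset ' e.g. "gb2312" or "utf-8", as detected by getHTTPPage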
BytesToBstr = objstream.ReadText
objstream.Close
Set objstream = Nothing
End Function
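The same three-step idea (charset declaration in the page, then the Content-Type header, then GB2312) can be sketched in Python for readers who are not using Windows Script Host. This is a minimal sketch under a few assumptions, not a port of the script above: the function names find_charset, detect_charset, decode_page and get_http_page are hypothetical, decoding the raw bytes as Latin-1 stands in for ResponseText, and the regular expression is slightly tighter than the VBS one so that unquoted values stop at whitespace, ';', '/' or '>'.

```python
import re
from typing import Optional
from urllib.request import urlopen

# Same idea as the VBS pattern: "charset=", optional quotes, then the name.
# Assumption: the capture class is tighter than the original ([^,"']+) so an
# unquoted value such as <meta charset=utf-8> stops at whitespace, ';', '/' or '>'.
CHARSET_RE = re.compile(r"""charset=["']*([^,"'\s;/>]+)""", re.IGNORECASE)

DEFAULT_CHARSET = "gb2312"  # last-resort fallback, as in the VBS script


def find_charset(text: Optional[str]) -> Optional[str]:
    """Return the first charset name declared in text, or None."""
    if not text:
        return None
    m = CHARSET_RE.search(text)
    return m.group(1) if m else None


def detect_charset(body: bytes, content_type: Optional[str] = None) -> str:
    """Page declaration first, then the Content-Type header, then gb2312."""
    # latin-1 maps every byte to one character, so Chinese text comes out
    # garbled but an ASCII charset declaration is still readable.
    cs = find_charset(body.decode("latin-1"))
    if not cs or len(cs) < 3:
        cs = find_charset(content_type)
    if not cs or len(cs) < 3:
        cs = DEFAULT_CHARSET
    return cs.lower()


def decode_page(body: bytes, content_type: Optional[str] = None) -> str:
    """Decode the raw bytes with the detected charset (like BytesToBstr)."""
    cs = detect_charset(body, content_type)
    try:
        return body.decode(cs, errors="replace")
    except LookupError:  # a charset label Python's codecs do not recognise
        return body.decode(DEFAULT_CHARSET, errors="replace")


def get_http_page(url: str) -> Optional[str]:
    """Rough analogue of getHTTPPage: fetch, detect, decode."""
    try:
        with urlopen(url, timeout=10) as resp:
            body = resp.read()
            content_type = resp.headers.get("Content-Type")
    except OSError:  # URLError and HTTPError are OSError subclasses
        return None
    return decode_page(body, content_type)
```

For example, get_http_page("http://www.baidu.com/") should return the decoded HTML, or None on a network or HTTP error. One possible tweak: many pages labelled gb2312 actually contain GBK characters, and browsers following the WHATWG Encoding Standard decode the gb2312 label as GBK, so mapping gb2312 to gbk or gb18030 before decoding may give fewer replacement characters.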