A collection program (a scraper) actually calls web pages on other websites through the XMLHTTP component that ships with Microsoft's XML parser (MSXML). For example, many news collection programs call Sina's news pages, replace some of the HTML in them, and filter out the advertisements. The advantages of a collection program are: there is no need to maintain the content, because the data comes from other websites and is updated whenever those websites are updated; and it saves server resources, since a collection program generally consists of only a few files and all the page content comes from other websites. The disadvantages are:
It is unstable: if the target website goes wrong, the program goes wrong too, and if the target website is upgraded or redesigned, the collection program has to be modified to match. It is slow: because every page is a remote call, it is definitely slower than reading data from the local server.
1. Example
The following briefly shows how XMLHTTP is used in ASP.
<%
'Commonly used functions
'1. Takes the address of the target web page (url) and returns the HTML code of that page.
Function getHTTPPage(url)
    Dim Http
    On Error Resume Next 'A failed request leaves the return value empty instead of stopping the page
    Set Http = Server.CreateObject("MSXML2.XMLHTTP")
    Http.open "GET", url, False
    Http.send
    If Http.readyState <> 4 Then
        Exit Function
    End If
    getHTTPPage = BytesToBstr(Http.responseBody, "GB2312")
    Set Http = Nothing
    If Err.Number <> 0 Then Err.Clear
End Function
'2. Fix garbled text. A page with Chinese characters comes back garbled when it is read directly through XMLHTTP. The ADODB.Stream component can decode it properly.
Function BytesToBstr(body, charset)
    Dim objstream
    Set objstream = Server.CreateObject("ADODB.Stream")
    objstream.Type = 1 'Binary data
    objstream.Mode = 3 'Read and write
    objstream.Open
    objstream.Write body
    objstream.Position = 0
    objstream.Type = 2 'Text data
    objstream.Charset = charset 'Decode the bytes as GB2312 (the charset passed in) instead of the UTF-8 that XMLHTTP assumes by default. Otherwise a page with Chinese characters comes out garbled.
    BytesToBstr = objstream.ReadText
    objstream.Close
    Set objstream = Nothing
End Function
'Try fetching the HTML content of http://www.google below
Dim Url, Html
Url = "http://www.google"
Html = getHTTPPage(Url)
Response.Write Html
%>
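The instability mentioned earlier can be softened a little by checking the result of the request before using it. The following is only a sketch under stated assumptions: TryFetchBytes is a hypothetical helper name, the timeout values are arbitrary, and it assumes the MSXML2.ServerXMLHTTP component is available on the server.

```vbscript
<%
'Hypothetical helper, not part of the original code.
'Fetches url into body and returns True only if the request
'completed without errors and the server answered with status 200.
Function TryFetchBytes(url, ByRef body)
    Dim Http, StatusCode
    TryFetchBytes = False
    StatusCode = 0
    On Error Resume Next
    Set Http = Server.CreateObject("MSXML2.ServerXMLHTTP")
    'Timeouts in milliseconds: resolve, connect, send, receive (arbitrary values)
    Http.setTimeouts 5000, 5000, 10000, 10000
    Http.open "GET", url, False
    Http.send
    StatusCode = Http.status 'Stays 0 if the status cannot be read
    If Err.Number = 0 And StatusCode = 200 Then
        body = Http.responseBody
        If Err.Number = 0 Then TryFetchBytes = True
    End If
    Err.Clear
    On Error GoTo 0
    Set Http = Nothing
End Function

Dim PageBytes
If TryFetchBytes("http://www.google", PageBytes) Then
    Response.Write "The page was fetched."
Else
    Response.Write "The target page is unavailable right now."
End If
%>
```

When the function returns True, the bytes in PageBytes can be decoded with the ADODB.Stream conversion shown above. When it returns False, the page can show a fallback message instead of an error.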
2. Several commonly used functions
InStr function
Description: Returns the position at which one string (string2) first appears in another string (string1).
Syntax: InStr(string1, string2)
For example:
Dim SearchString, SearchChar, MyBK
SearchString = "http://www.google" 'The string to search in.
SearchChar = "www" 'Search for www.
MyBK = InStr(SearchString, SearchChar) 'Returns 8
'Returns 0 if not found, for example:
SearchChar = "BK"
MyBK = InStr(SearchString, SearchChar) 'Returns 0
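InStr also accepts an optional start position in front of the two strings and an optional comparison mode after them. A small sketch of both, with variable names chosen only for illustration:

```vbscript
Dim Address, FirstG, NextG, ExactCase, AnyCase
Address = "http://www.google"

FirstG = InStr(Address, "g")              'Position of the first g
NextG = InStr(FirstG + 1, Address, "g")   'Search again, starting right after the first match

ExactCase = InStr(Address, "WWW")                  'Default comparison is case-sensitive, so nothing is found
AnyCase = InStr(1, Address, "WWW", vbTextCompare)  'Case-insensitive; start must be given when compare is used
```

Here FirstG is 12, NextG is 15, ExactCase is 0 and AnyCase is 8. The start argument is what makes it possible to look for a closing tag only after a given opening tag.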
Mid function
Description: Returns the specified number of characters from a string.
Syntax: Mid(string, start, length)
For example:
Dim MyBK
MyBK = Mid("our BK (www.google) design", 9, 10) 'Takes 10 characters of "our BK (www.google) design", starting at the 9th character. MyBK is now www.google
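Two details of Mid matter when cutting up fetched HTML: the length may be left out, and positions past the end of the string do not raise an error. A short sketch, with variable names that are only illustrative:

```vbscript
Dim Sample, Rest, Tail, Beyond
Sample = "our BK (www.google) design"

Rest = Mid(Sample, 9)          'Length omitted: everything from the 9th character to the end
Tail = Mid(Sample, 21, 100)    'Length longer than what is left: returns only what is there
Beyond = Mid(Sample, 100)      'Start past the end of the string: returns an empty string
```

Rest is "www.google) design", Tail is "design" and Beyond is "".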
Replace function
Description: Returns a string in which every occurrence of one substring has been replaced with another.
Syntax: Replace(string, find, replacewith)
For example:
Dim SearchString
SearchString = "Our BK design is a website building resource website" 'The string to search in.
SearchString = Replace(SearchString, "BK design", "Www.google") 'SearchString is now: Our Www.google is a website building resource website
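Replace is case-sensitive by default and replaces every match. Its optional start, count and compare arguments change that. A minimal sketch, using a made-up sample sentence:

```vbscript
Dim Sentence, AllExact, FirstOnly, AnyCaseAll
Sentence = "BK design, BK design, bk DESIGN"

AllExact = Replace(Sentence, "BK design", "X")                         'Every exact match
FirstOnly = Replace(Sentence, "BK design", "X", 1, 1)                  'Start at 1, replace at most 1 match
AnyCaseAll = Replace(Sentence, "BK design", "X", 1, -1, vbTextCompare) 'Count -1 means all, ignoring case
```

AllExact is "X, X, bk DESIGN", FirstOnly is "X, BK design, bk DESIGN" and AnyCaseAll is "X, X, X".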
3. Extract the HTML code of a specified area
For example, I only want to get the text between the second <td> and its </td> in the following HTML code:
<html>
<title>BK (www.google) Google search engine</title>
<body>
<table>
<tr><td></td></tr>
<tr><td id="Content">BK (www.google) Google search engine is a site with many resources...</td></tr>
</table>
</body>
</html>
<%
…
Dim StrBK, start, over, RsBK
StrBK = getHTTPPage("the address of the web page")
start = InStr(StrBK, "<td id=""Content"">") 'Gets the position where the string begins.
'Someone is going to ask here: the original code is <td id="Content">, so why write <td id=""Content""> here?
'Answer: in ASP (to be precise, in VBScript) a double quote inside a string is written as two double quotes, because the double quote is a special character for the program.
over = InStr(StrBK, "...</td></tr>") 'Gets the position where the string ends.
'Someone is going to ask again: why are there three extra dots in front of the HTML code that the program searches for?
'Answer: the line above also contains a </td></tr>. If you used </td></tr> alone to locate the end, the program would mistake the </td></tr> in the line above for the end of the string.
RsBK = Mid(StrBK, start, over - start) 'Extracts the string between position start and position over in StrBK. Mid was covered in the previous section; over - start is the distance between the start and end positions, that is, the number of characters.
Response.Write(RsBK) 'Finally, output the content the program obtained
%>
Don't get too happy yet. When you run it, you will find an error in the HTML code of the page. Why? Because the HTML code you obtained is: <td id="Content">BK (www.google) Google search engine is a site with many resources
Did you see that? The HTML code is incomplete: the opening <td> tag is there without its closing tag! What to do? The statement start=InStr(StrBK,"<td id=""Content"">") returns the position of <td id="Content"> in StrBK. If we add 17 (the length of <td id="Content">) to it, the position points to the character right after <td id="Content">.
Okay, the program changes to this:
<%
…
Dim StrBK, start, over, RsBK
StrBK = getHTTPPage("the address of the web page")
start = InStr(StrBK, "<td id=""Content"">") + 17
over = InStr(StrBK, "...</td></tr>") 'over points at the first of the three dots, so they are left out; add 3 (+3) here to keep them
RsBK = Mid(StrBK, start, over - start)
Response.Write(RsBK)
%>
That works. Now we can steal what we want and display it on our own page, haha~
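The hard-coded 17 and the three-dots trick both depend on the exact page content. A more general variant can be sketched with Len and the start argument of InStr. ExtractBetween is a hypothetical helper name, not part of the original program, and the sample HTML is a shortened version of the page above:

```vbscript
'Hypothetical helper, not part of the original code.
'Returns the text between startMarker and the first endMarker that
'follows it, or an empty string if either marker is missing.
Function ExtractBetween(source, startMarker, endMarker)
    Dim startPos, endPos
    ExtractBetween = ""
    startPos = InStr(source, startMarker)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(startMarker)        'Skip past the marker itself
    endPos = InStr(startPos, source, endMarker)   'Only look after the start marker
    If endPos = 0 Then Exit Function
    ExtractBetween = Mid(source, startPos, endPos - startPos)
End Function

Dim SampleHtml, Extracted
SampleHtml = "<tr><td></td></tr>" & _
             "<tr><td id=""Content"">BK (www.google) Google search engine</td></tr>"
Extracted = ExtractBetween(SampleHtml, "<td id=""Content"">", "</td>")
```

Extracted is "BK (www.google) Google search engine". Because the search for the end marker starts after the start marker, the earlier </td></tr> in the first row no longer gets in the way.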
4. Delete or modify the obtained characters
Replace BK (www.google) in RsBK with BK:
RsBK = Replace(RsBK, "BK (www.google)", "BK")
Or delete (www.google) directly, together with the space in front of it:
RsBK = Replace(RsBK, " (www.google)", "")
Okay, now RsBK becomes: BK Google search engine is a site with many resources. But in fact, the Replace function is not suitable for every situation. For example, suppose we want to remove all the links in a string. Links come in many forms, and Replace can only replace one specific string at a time. We can't write a separate Replace call for every possible link, right?
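One possible way around this limit, which the text above does not itself use, is VBScript's built-in RegExp object: a single pattern can match every link tag, whatever its address. The following is only a sketch. StripLinks is a hypothetical helper name, and the pattern assumes that links are ordinary <a ...> and </a> tags:

```vbscript
'Hypothetical helper, not part of the original code.
'Removes opening and closing <a> tags but keeps the text between them.
Function StripLinks(html)
    Dim re
    Set re = New RegExp
    re.Pattern = "</?a(\s[^>]*)?>"   '<a>, <a href=...> or </a>, but not tags like <abbr>
    re.IgnoreCase = True             'Also match <A HREF=...>
    re.Global = True                 'Replace every match, not just the first
    StripLinks = re.Replace(html, "")
    Set re = Nothing
End Function

Dim Linked, Unlinked
Linked = "See <a href=""http://www.google"">BK</a> and <A HREF='x'>more</A>"
Unlinked = StripLinks(Linked)
```

Unlinked is "See BK and more". Regular expressions are not a full HTML parser, so unusual markup may still need special handling.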