The "thief" discussed here is a type of ASP program that uses the XMLHTTP component (part of the XML support available to ASP) to fetch data (pictures, web pages and other files) from a remote website to the local server, and then either displays it on a page after some processing or stores it in a database. With this kind of thief program you can do things that once seemed completely impossible, such as reworking the pages of another website and presenting them as your own, or saving some of a website's data (articles, pictures) into a local database.
The advantages of a thief program are: the site needs no maintenance, because the data comes from other websites and is updated whenever those websites are updated; and it saves a lot of server resources, since a thief program generally consists of only a few files and all of the page content comes from other websites. The disadvantages are: instability, because if the target website fails, the program fails too, and if the target website is upgraded or redesigned, the thief program has to be modified to match; and speed, because every request is a remote call, which is bound to be slower than reading data from the local server.
Sounds amazing, right? Let's start with some introductory knowledge of thief programs!
Let's start with something simple: a weather forecast program based on the QQ website.
The code is as follows:
<%
On Error Resume Next
Server.ScriptTimeOut=9999999
Function getHTTPage(Path)
    t = GetBody(Path)
    getHTTPage = BytesToBstr(t, "GB2312")
End Function
' First, some initialization for the thief program. The code above ignores all non-fatal errors, sets the script timeout to a very long time (so that no timeout error occurs), and decodes the fetched data as GB2312. XMLHTTP assumes UTF-8 by default, so reading a page that contains Chinese characters directly would produce garbled text.
Function GetBody(url)
    On Error Resume Next
    Set Retrieval = CreateObject("Microsoft.XMLHTTP")
    With Retrieval
        ' Synchronous GET request (False = do not run asynchronously)
        .Open "GET", url, False
        .Send
        ' Return the raw response bytes; they are decoded later
        GetBody = .ResponseBody
    End With
    Set Retrieval = Nothing
End Function
' GetBody creates the XMLHTTP object, configures the request and returns the raw bytes of the remote page.
Function BytesToBstr(body, Cset)
    Dim objstream
    Set objstream = Server.CreateObject("adodb.stream")
    objstream.Type = 1 ' 1 = binary
    objstream.Mode = 3 ' 3 = read/write
    objstream.Open
    objstream.Write body
    objstream.Position = 0
    objstream.Type = 2 ' 2 = text
    objstream.Charset = Cset
    BytesToBstr = objstream.ReadText
    objstream.Close
    Set objstream = Nothing
End Function
Function Newstring(wstr,strng)
Newstring=Instr(lcase(wstr),lcase(strng))
if Newstring<=0 then Newstring=Len(wstr)
End Function
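' BytesToBstr uses the adodb.stream component to turn the captured bytes into text in the given character set. Newstring returns the position of a marker string in the page (ignoring case), or the length of the page if the marker is not found.
%>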
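For readers who want to try the two helper ideas outside ASP, here is a minimal sketch in Python rather than VBScript. It assumes the page bytes are GB2312-encoded, as in this article. The names bytes_to_text and find_or_end are made up for illustration, and the sample page is a stand-in for real fetched data. Note that Python positions are 0-based, while VBScript's InStr is 1-based.

```python
def bytes_to_text(body: bytes, charset: str = "gb2312") -> str:
    """Decode raw response bytes with an explicit charset (the role of BytesToBstr)."""
    # errors="replace" substitutes a placeholder for bytes that are invalid in the charset
    return body.decode(charset, errors="replace")


def find_or_end(text: str, marker: str) -> int:
    """Case-insensitive search that falls back to len(text) when the marker
    is missing (the role of Newstring)."""
    pos = text.lower().find(marker.lower())
    return pos if pos >= 0 else len(text)


# Stand-in for the bytes a GB2312 page would return (hypothetical sample data)
raw = "<HTML>天气预报</HTML>".encode("gb2312")
page = bytes_to_text(raw)

print(find_or_end(page, "<html>"))   # marker present, matched regardless of case
print(find_or_end(page, "<table>"))  # marker absent, falls back to len(page)
```

Decoding with the wrong charset is what produces the garbled text described above, which is why the charset is passed explicitly instead of relying on a default.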
The following is the page display part:
<%
Dim wstr, str, url, start, over, city, body
' Declare the variables that will be used
city = Request.QueryString("id")
' The id parameter from the query string (the city selected by the user) is assigned to city
url = "http://appnews.qq.com/cgi-bin/news_qq_search?city=" & city
' The line above sets the address of the page to fetch. You can also hard-code an address instead of using a variable.
wstr = getHTTPage(url)
' Fetch all the data of the specified page
start = Newstring(wstr, "<html>")
' This sets the head marker of the data to process. Choose it case by case, by viewing the source of the page you want to fetch. Because this program takes the whole page, the marker is the opening html tag. The marker must occur only once in the page.
over = Newstring(wstr, "</HTML>")
' The counterpart of start: the tail marker of the data to process. It must also occur only once in the page.
body = Mid(wstr, start, over - start)
' Keep only the part of the page between the two markers
' Now for the real trick: with Replace you can swap specified strings in the data for other strings.
body = Replace(body, "skin1", "Weather Forecast-Skin Network")
body = Replace(body, "http://appnews.qq.com/cgi-bin/news_qq_search?city", "tianqi.asp?id")
' That completes the replacements for this program. If you need more, add further Replace calls in the same way.
Response.Write body
%>
Once the content has been replaced as needed, the modified page is written out to the browser. That completes the process.
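The extract-and-replace step can be sketched in Python as well (not ASP). This is only an illustration under stated assumptions: extract_between and rewrite_links are hypothetical names, and the sample page and the city value beijing are invented, not taken from the real QQ page.

```python
def extract_between(text: str, start_marker: str, end_marker: str) -> str:
    """Return the text from start_marker up to, but not including, end_marker.
    A missing marker falls back to the end of the text, as Newstring does."""
    lowered = text.lower()
    start = lowered.find(start_marker.lower())
    if start < 0:
        start = len(text)
    end = lowered.find(end_marker.lower())
    if end < 0:
        end = len(text)
    return text[start:end]


def rewrite_links(body: str, replacements: dict) -> str:
    """Apply each old -> new string replacement in turn."""
    for old, new in replacements.items():
        body = body.replace(old, new)
    return body


# Invented sample page; the real page content will differ
sample = (
    '<HTML><a href="http://appnews.qq.com/cgi-bin/news_qq_search?city=beijing">'
    "Beijing</a></HTML>"
)
body = extract_between(sample, "<html>", "</html>")
body = rewrite_links(
    body,
    {"http://appnews.qq.com/cgi-bin/news_qq_search?city": "tianqi.asp?id"},
)
print(body)
```

Because the replacement rewrites every link back to tianqi.asp, a click on the displayed page goes through the thief program again instead of leaving for the source site.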
Program usage and results: Remove the explanatory comments from the code above, save it as tianqi.asp, upload it to a host that supports ASP and the XMLHTTP component, and open it in a browser. From there you can polish the interface or optimize the program further.
The above covers only some basic uses of the XMLHTTP component. It can do much more, such as saving remote images to the local server or, together with the adodb.stream component, saving the fetched data into a database. Thief programs have a wide range of functions and uses. But you must not use them to do anything illegal!
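Saving a remote file to the local server, mentioned above, might look like the following sketch, written in Python rather than ASP. The function name save_remote_file, its parameters and the URL in the comment are assumptions for illustration, not part of the original program.

```python
import shutil
import urllib.request


def save_remote_file(url: str, dest_path: str, timeout: float = 30.0) -> int:
    """Fetch url and write its raw bytes to dest_path; return the number of bytes written."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        # Copy in chunks, so a large image is never held in memory all at once
        shutil.copyfileobj(response, out)
        return out.tell()


# Hypothetical usage (the URL is a placeholder):
# save_remote_file("http://example.com/logo.gif", "logo.gif")
```

The file is opened in binary mode because an image must be stored byte for byte. The charset conversion used for pages applies only to text.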
Some readers may still ask whether this kind of thief program is exclusive to ASP. It is not: PHP can achieve the same effect with the fopen function. Thanks to PHP's own characteristics, a thief program written in it has clear advantages over the ASP version in size and execution efficiency. For reasons of space, though, I won't go into the details here.