The spider is a very useful kind of program on the Internet. Search engines use spiders to collect Web pages into their databases; companies use them to monitor competitors' websites and track changes; individual users use them to download Web pages for offline reading; and developers use them to scan their own websites for invalid links... Spider programs serve different purposes for different users. So how does a spider program work?
A spider is a semi-automatic program. Just as a real spider travels across its web, a spider program travels across the web woven by hyperlinks. The spider is only semi-automatic because it always needs an initial link (a starting point); everything after that it decides on its own. It scans the links contained in the starting page, visits the pages those links point to, and then analyzes and follows the links contained in those pages in turn. In theory, a spider program will eventually visit every page on the Internet, because almost every page is linked to, more or less, by other pages.
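To make that crawl loop concrete, here is a minimal sketch of the process just described: a queue of pending URLs plus a set of visited ones, with links extracted by a simple regular expression. The HttpClient download, the regex, the starting URL, and the page cap are illustrative assumptions, not the core classes this article builds later.

// Sketch of the crawl loop: dequeue a URL, skip it if already visited,
// download the page, and queue every link found on it for a later visit.
using System.Collections.Generic;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

class CrawlSketch
{
    static async Task Main()
    {
        var client = new HttpClient();
        var visited = new HashSet<string>();
        var pending = new Queue<string>();
        pending.Enqueue("http://example.com/");      // the initial link (starting point)

        while (pending.Count > 0 && visited.Count < 100)  // cap the sketch at 100 pages
        {
            string url = pending.Dequeue();
            if (!visited.Add(url)) continue;              // already seen this page

            string html;
            try { html = await client.GetStringAsync(url); }
            catch (HttpRequestException) { continue; }    // unreachable page, move on

            // Scan the page for href="..." links and queue them to follow later.
            foreach (Match m in Regex.Matches(html, "href=\"(http[^\"]+)\""))
                pending.Enqueue(m.Groups[1].Value);
        }
    }
}

A real spider would normalize URLs, restrict itself to one host, and use a proper HTML parser rather than a regex, but the queue-plus-visited-set structure is the essence of the traversal described above.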
This article shows how to construct a spider program in C# that can download the contents of an entire website into a specified directory. The program's running interface is shown in Figure 1. Using the handful of core classes provided in this article, you can easily build a spider program of your own.
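As a hedged sketch of the save step such a downloader needs, the code below maps a page's URL onto a file path under the target directory and writes the content there. The LocalPathFor helper, the example URL, and the C:\mirror directory are hypothetical names for illustration, not the article's actual core classes.

// Sketch of saving one downloaded page under a specified root directory,
// mirroring the URL's host and path in the local folder structure.
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

class SaveSketch
{
    // Maps http://example.com/docs/page.html to <root>/example.com/docs/page.html.
    static string LocalPathFor(Uri uri, string root)
    {
        string relative = uri.AbsolutePath.TrimStart('/');
        if (relative.Length == 0) relative = "index.html";  // site root has no file name
        return Path.Combine(root, uri.Host,
            relative.Replace('/', Path.DirectorySeparatorChar));
    }

    static async Task Main()
    {
        var uri = new Uri("http://example.com/docs/page.html");
        string path = LocalPathFor(uri, @"C:\mirror");      // the specified directory
        Directory.CreateDirectory(Path.GetDirectoryName(path));
        using var client = new HttpClient();
        File.WriteAllText(path, await client.GetStringAsync(uri));
    }
}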
For more information, please read: http://info.codepub.com/2008/03/info-18319.html