Qiannao CMS is the leading automatic crawling program in China.
It can crawl the content of almost any website.
The code is lean, extensions are highly customizable, and it is free and open source!
The program keeps code, rules, and templates as separate components!
Program Highlights:
Original PHP caching, crawling, and filtering algorithms, with stable operation!
Original heuristic, fully automatic update engine that generates static pages and refreshes them automatically!
Original intelligent image-path recognition, with support for image localization (and automatic watermarking)!
Original sub-directory storage with a multi-directory hash cache (similar to hashing rows across multiple SQL tables); reads and writes can reach microsecond-level speed!
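The multi-directory hash cache idea can be illustrated with a short sketch. This is not Qiannao CMS's own code (which is PHP and is not shown here); it is a minimal Python illustration under assumed names (cache_path, cache_put, cache_get) and an assumed layout of two directory levels taken from a SHA-256 digest of the URL.

```python
import hashlib
from pathlib import Path
from typing import Optional


def cache_path(root: Path, url: str, levels: int = 2, width: int = 2) -> Path:
    """Map a URL to a file spread across hashed sub-directories (assumed layout)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    # The first `levels` chunks of the digest become nested directory names,
    # so no single directory ends up holding every cached page.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, digest + ".html")


def cache_put(root: Path, url: str, body: str) -> Path:
    """Store a crawled page body under its hashed path."""
    path = cache_path(root, url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body, encoding="utf-8")
    return path


def cache_get(root: Path, url: str) -> Optional[str]:
    """Return the cached body, or None when the page has not been cached."""
    path = cache_path(root, url)
    return path.read_text(encoding="utf-8") if path.is_file() else None
```

Because the path is computed directly from the URL, a lookup needs no directory scan or index, which is the usual reason this kind of layout stays fast as the cache grows.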
Original automatic hyperlink analysis intelligently identifies every hyperlink format used by the target site.
Whether a link points to a subdirectory or the root directory, and whether it uses a relative or an absolute path, recognition is 100% automatic and needs no manual replacement rules! (Except second-level domain names)
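As a rough illustration of resolving relative and absolute hyperlinks against the page they appear on, here is a minimal Python sketch using only the standard library. The names LinkCollector and extract_links are hypothetical, and this shows the general technique rather than the program's actual PHP implementation.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect every <a href> and resolve it against the page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin handles relative, root-relative, and absolute forms.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url: str, html: str) -> List[str]:
    """Return absolute URLs for all anchors found in `html`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

For a page at http://example.com/news/2020/index.html, the href values item.html, ../archive.html, and /about.html resolve to http://example.com/news/2020/item.html, http://example.com/news/archive.html, and http://example.com/about.html, while an already absolute link is left unchanged.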
The program also identifies outbound links reliably. If the target site links to other websites, a back-end setting controls whether those outbound links are allowed!
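A simple way to picture outbound-link handling is a host comparison plus an on/off switch. The Python sketch below is an assumption-laden illustration: is_outbound, filter_links, and the allow_outbound flag are hypothetical names standing in for the back-end setting, not the program's real options.

```python
from typing import List
from urllib.parse import urlsplit


def is_outbound(link: str, target_host: str) -> bool:
    """True when the link names a host other than the target site."""
    host = urlsplit(link).hostname  # None for relative links
    return host is not None and host != target_host.lower()


def filter_links(links: List[str], target_host: str,
                 allow_outbound: bool = False) -> List[str]:
    """Keep internal links; keep outbound ones only when allowed."""
    return [
        link for link in links
        if allow_outbound or not is_outbound(link, target_host)
    ]
```

Note that this exact-match comparison treats a subdomain such as img.example.com as a different host from example.com; whether that is the desired behavior depends on the site being crawled.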
The code has been repeatedly optimized and tested. It is extremely fast, runs robustly, and has the lowest load among all collection and crawling programs!
Supports two fetching methods: curl and file_get_contents. Virtual hosts support the former, which is good news for grassroots webmasters!
The program ships with a ready-made rule for Huajun Information, which is very powerful!