Beanbun is a simple, extensible crawler framework that supports distributed crawling and runs in either daemon mode or normal mode. Daemon mode is built on Workerman, and the downloader is built on Guzzle.
https://github.com/kiddyuchina/Beanbun/blob/master/docs/chs/README.md
Supports both daemon and normal modes (daemon mode only supports Linux servers)
Uses Guzzle for downloading by default
Supports distributed crawling
Supports multiple queue backends, such as memory and Redis
Supports custom URI filtering
Supports breadth-first and depth-first crawling methods
Complies with the PSR-4 standard
Crawling a web page is split into multiple steps, and each step supports custom actions (such as adding a proxy or modifying the User-Agent)
A flexible extension mechanism makes it easy to write plug-ins for the framework: custom queues, custom crawling methods, and so on
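The difference between breadth-first and depth-first crawling, together with URI filtering over an in-memory queue, can be sketched in a few lines of plain PHP. This is only an illustration under stated assumptions: crawlOrder, the $links graph, and the $htmlOnly filter are hypothetical names invented here, and nothing in the block uses or reproduces Beanbun's own API. The "pages" are entries in an array, so no network access takes place.

```php
<?php
// Hypothetical helper, NOT part of Beanbun: walk an in-memory link graph
// and return the order in which URLs would be visited.
function crawlOrder(array $links, string $seed, string $mode, callable $filter): array
{
    $frontier = [$seed];       // pending URLs, kept in memory
    $seen = [$seed => true];   // URLs already queued, to avoid repeats
    $order = [];

    while ($frontier) {
        // Breadth-first takes the oldest pending URL (FIFO);
        // depth-first takes the most recently discovered one (LIFO).
        $url = $mode === 'breadth' ? array_shift($frontier) : array_pop($frontier);
        $order[] = $url;

        foreach ($links[$url] ?? [] as $next) {
            // Skip URLs that were already queued or that the filter rejects.
            if (isset($seen[$next]) || !$filter($next)) {
                continue;
            }
            $seen[$next] = true;
            $frontier[] = $next;
        }
    }

    return $order;
}

// A made-up site structure: each key lists the links found on that page.
$links = [
    '/'            => ['/list-1.html', '/list-2.html', '/login'],
    '/list-1.html' => ['/item-1.html'],
    '/list-2.html' => ['/item-2.html'],
];

// Example filter: only follow discovered links that end in ".html".
$htmlOnly = function (string $url): bool {
    return preg_match('#\.html$#', $url) === 1;
};

$bfs = crawlOrder($links, '/', 'breadth', $htmlOnly);
// /, /list-1.html, /list-2.html, /item-1.html, /item-2.html

$dfs = crawlOrder($links, '/', 'depth', $htmlOnly);
// /, /list-2.html, /item-2.html, /list-1.html, /item-1.html
```

In both modes /login is never visited because the filter rejects it. The only thing that changes between the two strategies is which end of the pending list the next URL is taken from.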
Beanbun can be installed with Composer.
$ composer require kiddyu/beanbun
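Create a file named start.php with the following content:
<?php
use Beanbun\Beanbun;

// Load Composer's autoloader (assumes start.php sits next to the vendor/ directory)
require_once(__DIR__ . '/vendor/autoload.php');

$beanbun = new Beanbun;

// Seed URLs where the crawl starts
$beanbun->seed = [
    'http://www.950d.com/',
    'http://www.950d.com/list-1.html',
    'http://www.950d.com/list-2.html',
];

// After each page is downloaded, save its body to a file named by the MD5 of its URL
$beanbun->afterDownloadPage = function($beanbun) {
    file_put_contents(__DIR__ . '/' . md5($beanbun->url), $beanbun->page);
};

$beanbun->start();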
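To make the role of the afterDownloadPage callback more concrete, here is a rough stand-in that can be run without installing anything. MiniCrawler is a hypothetical class written for this illustration; it is not Beanbun's implementation, its download step is a stub that fabricates a page body instead of making an HTTP request, and the output directory under the system temp folder is likewise an assumption. It only shows the general pattern: the crawler sets url and page for each seed, then hands itself to the user-supplied callback.

```php
<?php
// Hypothetical stand-in for illustration only; NOT Beanbun's implementation.
class MiniCrawler
{
    public $seed = [];            // URLs to process
    public $url;                  // URL currently being processed
    public $page;                 // body of the current page
    public $afterDownloadPage;    // user-supplied callback

    public function start(): void
    {
        foreach ($this->seed as $url) {
            $this->url = $url;
            // Stubbed download: a real crawler would issue an HTTP request here.
            $this->page = "<html>fake body of {$url}</html>";

            // Hand control to the user's callback, passing the crawler itself.
            if (is_callable($this->afterDownloadPage)) {
                call_user_func($this->afterDownloadPage, $this);
            }
        }
    }
}

// Write into a fresh temp directory so the sketch leaves the project untouched.
$outDir = sys_get_temp_dir() . '/mini-crawler-' . uniqid();
mkdir($outDir);

$crawler = new MiniCrawler;
$crawler->seed = [
    'http://www.950d.com/',
    'http://www.950d.com/list-1.html',
];
$crawler->afterDownloadPage = function ($crawler) use ($outDir) {
    // Same idea as start.php: one file per page, named by the MD5 of its URL.
    file_put_contents($outDir . '/' . md5($crawler->url), $crawler->page);
};
$crawler->start();

// To find a saved page later, hash its URL again.
$firstFile = $outDir . '/' . md5('http://www.950d.com/');
```

Because md5() always returns a 32-character hexadecimal string, every URL maps to a file name that is safe on any file system, at the cost of the name no longer being human-readable.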
Run it from the command line:
$ php start.php
You will then see the crawl log.
beanbun-parser, a data-extraction plug-in: https://github.com/kiddyuchina/beanbun-parser
For more details, check out the documentation