WebMagic has a fully modular design. Its functionality covers the entire crawler life cycle (link extraction, page download, content extraction, persistence), and it supports multi-threaded crawling, distributed crawling, automatic retry, and custom User-Agent and cookie settings.
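To make the four life-cycle stages concrete, here is a minimal sketch that uses only the JDK. It is not WebMagic's API: the interfaces `FetchStage`, `ExtractStage`, and `PersistStage`, the class `LifecycleSketch`, and the in-memory "web" are hypothetical names invented for illustration. It only shows how download, link extraction, content extraction, and persistence can be kept as separate, swappable modules.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stage interfaces for this sketch; these are NOT WebMagic types.
interface FetchStage { String download(String url); }
interface ExtractStage { void process(String html, List<String> newLinks, Map<String, String> fields); }
interface PersistStage { void persist(String url, Map<String, String> fields); }

class LifecycleSketch {
    private static final Pattern LINK = Pattern.compile("href=\"([^\"]+)\"");
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");

    // Drives the life cycle: schedule -> download -> extract -> persist.
    static void crawl(String startUrl, FetchStage fetch, ExtractStage extract, PersistStage store) {
        Queue<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(startUrl);
        seen.add(startUrl);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            String html = fetch.download(url);
            if (html == null) {
                continue; // skip pages that could not be fetched
            }
            List<String> links = new ArrayList<>();
            Map<String, String> fields = new LinkedHashMap<>();
            extract.process(html, links, fields);
            for (String link : links) {
                if (seen.add(link)) { // enqueue each URL at most once
                    queue.add(link);
                }
            }
            store.persist(url, fields);
        }
    }

    // A regex-based extractor: collects href targets and the page title.
    static ExtractStage regexExtractor() {
        return (html, newLinks, fields) -> {
            Matcher link = LINK.matcher(html);
            while (link.find()) {
                newLinks.add(link.group(1));
            }
            Matcher title = TITLE.matcher(html);
            if (title.find()) {
                fields.put("title", title.group(1));
            }
        };
    }

    public static void main(String[] args) {
        // An in-memory stand-in for the network, so the sketch runs offline.
        Map<String, String> fakeWeb = Map.of(
                "/a", "<title>A</title><a href=\"/b\">b</a>",
                "/b", "<title>B</title><a href=\"/a\">a</a>");
        crawl("/a", fakeWeb::get, regexExtractor(),
                (url, fields) -> System.out.println(url + " -> " + fields));
    }
}
```

Because each stage is an interface, the in-memory fetcher could be replaced by an HTTP client, or the console output by a database writer, without touching the crawl loop.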
Main features of WebMagic
1. Fully modular design with strong extensibility.
2. The core is simple yet covers the entire crawling process. It is flexible and powerful, and it is also good material for learning how crawlers work.
3. Provides a rich API for page extraction.
4. Requires no configuration; a crawler can be implemented with a POJO plus annotations.
5. Supports multi-threading.
6. Supports distributed crawling.
7. Supports crawling pages rendered dynamically by JavaScript.
8. Has no framework dependencies and can be embedded flexibly into projects.
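The "POJO plus annotations" idea from the list above can be illustrated with a small JDK-only sketch. It does not use WebMagic's own annotations: `@ExtractRegex`, `Article`, and `AnnotationSketch` are hypothetical names made up here, and the regex-based extraction is an assumption chosen to keep the example self-contained. The point is only that the extraction rules live on the fields of a plain class and are read through reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical annotation for this sketch; it is NOT part of WebMagic.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ExtractRegex {
    String value(); // regex whose first capture group becomes the field value
}

// A plain POJO: the annotations are the only extraction "configuration".
class Article {
    @ExtractRegex("<title>(.*?)</title>")
    String title;

    @ExtractRegex("<span class=\"author\">(.*?)</span>")
    String author;
}

class AnnotationSketch {
    // Creates an instance of 'type' and fills each annotated String field from 'html'.
    static <T> T extract(Class<T> type, String html) {
        try {
            T model = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                ExtractRegex rule = field.getAnnotation(ExtractRegex.class);
                if (rule == null) {
                    continue; // fields without a rule are left untouched
                }
                Matcher m = Pattern.compile(rule.value()).matcher(html);
                if (m.find()) {
                    field.setAccessible(true);
                    field.set(model, m.group(1));
                }
            }
            return model;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot populate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        String html = "<title>Hello</title><span class=\"author\">Ann</span>";
        Article article = extract(Article.class, html);
        System.out.println(article.title + " by " + article.author);
    }
}
```

Adding a new field to scrape then means adding one annotated field to the POJO, with no change to the extraction code.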