Requests can now carry custom headers through the header (map) field of the Request object, and custom cookies through seimiCookies. Custom cookies go directly into the cookiesStore, so they remain valid for subsequent requests to the same domain.
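The cookie behavior described above can be illustrated with a minimal, library-independent sketch. It is not SeimiCrawler's implementation: the class CookieStoreSketch and its methods are hypothetical names used only for illustration, and the store is assumed to be keyed by host name.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a per-domain cookie store; not part of SeimiCrawler.
public class CookieStoreSketch {
    // host -> (cookie name -> cookie value)
    private final Map<String, Map<String, String>> cookiesByHost = new HashMap<>();

    // Merge the custom cookies attached to a request into the store for that request's host.
    public void storeCustomCookies(String url, Map<String, String> customCookies) {
        String host = URI.create(url).getHost();
        cookiesByHost.computeIfAbsent(host, h -> new LinkedHashMap<>()).putAll(customCookies);
    }

    // Build the outgoing headers: the custom headers plus a Cookie header taken from the store.
    public Map<String, String> buildHeaders(String url, Map<String, String> customHeaders) {
        Map<String, String> headers = new LinkedHashMap<>(customHeaders);
        Map<String, String> cookies = cookiesByHost.get(URI.create(url).getHost());
        if (cookies != null && !cookies.isEmpty()) {
            StringBuilder cookieHeader = new StringBuilder();
            for (Map.Entry<String, String> entry : cookies.entrySet()) {
                if (cookieHeader.length() > 0) {
                    cookieHeader.append("; ");
                }
                cookieHeader.append(entry.getKey()).append('=').append(entry.getValue());
            }
            headers.put("Cookie", cookieHeader.toString());
        }
        return headers;
    }

    public static void main(String[] args) {
        CookieStoreSketch store = new CookieStoreSketch();
        Map<String, String> cookies = new HashMap<>();
        cookies.put("session", "abc");
        // First request: custom cookies enter the store.
        store.storeCustomCookies("http://example.com/login", cookies);
        // Second request to the same domain: the stored cookie is sent again.
        Map<String, String> custom = new HashMap<>();
        custom.put("User-Agent", "demo-agent");
        System.out.println(store.buildHeaders("http://example.com/profile", custom));
    }
}
```

Keying the store by host is the simplest way to show why a cookie set on the first request is still sent on the second; a real store would also need to handle paths, expiry, and subdomains.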
Optimized the default startup mode: cn.wanghaomiao.seimi.boot.Run now supports CommandLineParser, so parameters can be passed with -c and -p. -c specifies the crawler names (separate multiple names with ','), and -p specifies a port, which optionally starts the embedded HTTP service and enables the embedded HTTP interface.
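As a rough illustration of the -c and -p semantics described above, the following sketch parses the two options by hand. It is not the code in cn.wanghaomiao.seimi.boot.Run: the class StartupArgsSketch and its fields are hypothetical, and it assumes that omitting -p means the embedded HTTP service is not started.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical argument holder; the real Run class may parse its options differently.
public class StartupArgsSketch {
    public final List<String> crawlerNames = new ArrayList<>();
    public Integer port; // null: no -p given, so no embedded HTTP service (assumption)

    public static StartupArgsSketch parse(String[] args) {
        StartupArgsSketch parsed = new StartupArgsSketch();
        for (int i = 0; i < args.length; i++) {
            if ("-c".equals(args[i]) && i + 1 < args.length) {
                // Multiple crawler names are separated by ','.
                for (String name : args[++i].split(",")) {
                    String trimmed = name.trim();
                    if (!trimmed.isEmpty()) {
                        parsed.crawlerNames.add(trimmed);
                    }
                }
            } else if ("-p".equals(args[i]) && i + 1 < args.length) {
                parsed.port = Integer.parseInt(args[++i]);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        StartupArgsSketch parsed = parse(new String[] {"-c", "basic,news", "-p", "8080"});
        System.out.println("crawlers=" + parsed.crawlerNames + ", port=" + parsed.port);
    }
}
```

The crawler names basic and news and the port 8080 are placeholder values chosen for the example.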
The maven-compiler-plugin packaging plug-in has been upgraded to 1.3.0, the startup script for Linux has been improved, and a startup configuration file has been added. See the maven-compiler-plugin homepage for details.
The default downloader is now Apache HttpClient, with the OkHttp3-based downloader available as the alternative.
Optimized some code.
By default, all demo logs are output to the console.
Introduction to SeimiCrawler (Java crawler framework)
SeimiCrawler is an agile, independently deployable, distributed Java crawler framework. It aims to lower the barrier for newcomers to build a crawler system with high availability and good performance, and to make crawler development more efficient. With SeimiCrawler, most developers only need to write the business logic of the crawl; Seimi handles the rest. Its design is inspired by Scrapy, the Python crawler framework, and it also builds on the characteristics of the Java language and of Spring. It further aims to make parsing HTML with XPath, which is more efficient, more convenient and more widely used in China. For that reason, SeimiCrawler's default HTML parser is JsoupXpath (an independent extension project that is not part of jsoup), so HTML data is parsed and extracted with XPath by default, although you can choose other parsers for data processing. Combined with SeimiAgent, it also handles the rendering and crawling of complex dynamic pages.
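To give a feel for XPath-based extraction as mentioned above, here is a small sketch that uses only the JDK's javax.xml.xpath package on a well-formed snippet. It deliberately does not use JsoupXpath, whose API is not shown in this document, and the JDK parser only accepts well-formed markup, so this is an approximation of the idea rather than how SeimiCrawler parses real-world HTML. The class name XPathExtractSketch is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: JDK XPath on well-formed markup, standing in for an HTML XPath parser.
public class XPathExtractSketch {
    // Evaluate an XPath expression and return the text content of every matched node.
    public static List<String> extract(String markup, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<h1 class=\"title\">First post</h1>"
                + "<h1 class=\"title\">Second post</h1>"
                + "<h1>Sidebar</h1>"
                + "</body></html>";
        // Select only the headings marked with class="title".
        System.out.println(extract(page, "//h1[@class='title']/text()"));
    }
}
```

A single expression such as //h1[@class='title']/text() replaces the manual tree walking that would otherwise be needed, which is the convenience the paragraph above refers to.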
SeimiCrawler (Java crawler framework) display