This tool uses Python 2.7 and Scrapy to search WeChat public account articles.
Install Scrapy and query directly from the command line:
pip install scrapy
python wescraper/scraper.py account liriansu miawu > we.json # search for public accounts related to liriansu and miawu
python wescraper/scraper.py key-day liriansu miawu > we.json # search for articles related to liriansu and miawu (within the last day)
Install Scrapy and Tornado and query through the local server:
pip install scrapy tornado
python wescraper/server.py
Once the server is running, you can fetch the article list of a WeChat public account from http://localhost/account/foo/bar/baz...
or query public account articles by keyword at http://localhost/key-year/foo/bar/baz...
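As a rough sketch of how a client might build these URLs, the helper below joins a query type and a list of keywords into a path. The name build_query_url and the default base URL are assumptions made for illustration; only the /account/ and /key-year/ path shapes come from the examples above.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2.7


def build_query_url(query_type, keywords, base="http://localhost"):
    """Join a query type and keywords into a server URL (hypothetical helper)."""
    # Percent-encode each keyword so spaces and slashes survive inside the path.
    parts = [quote(k, safe="") for k in keywords]
    return "/".join([base.rstrip("/"), query_type] + parts)


if __name__ == "__main__":
    print(build_query_url("account", ["liriansu", "miawu"]))
    print(build_query_url("key-year", ["liriansu", "miawu"]))
```

The resulting string can be opened in a browser or passed to any HTTP client while server.py is running.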
For details, see the scraper.py source code.
Configurable parameters are in config.py.
When querying by public account, the first account in the result list is used by default.
This tool may get banned. For solutions, see "Avoiding getting banned" in the Scrapy documentation; generally speaking, changing the IP address solves the problem.
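Besides changing the IP address, slowing the crawl down is another common mitigation. The dictionary below is a minimal sketch of standard Scrapy throttling settings. The name POLITE_SETTINGS and the values are assumptions, not this project's defaults, and how the settings would be wired into scraper.py or config.py is not shown here.

```python
# Hypothetical throttling settings; the values are illustrative only.
POLITE_SETTINGS = {
    "DOWNLOAD_DELAY": 2,                  # wait about 2 seconds between requests
    "RANDOMIZE_DOWNLOAD_DELAY": True,     # vary the wait between 0.5x and 1.5x of the delay
    "AUTOTHROTTLE_ENABLED": True,         # let Scrapy adapt the delay to server latency
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,  # send one request at a time to each domain
}

if __name__ == "__main__":
    for name in sorted(POLITE_SETTINGS):
        print("%s = %r" % (name, POLITE_SETTINGS[name]))
```

Cookies are deliberately left enabled in this sketch, since the tool relies on its own cookie pool.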
cookie.py maintains a cookie pool and randomly selects n cookies from it for requests. If a cookie is banned, it is replaced with a new one.
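The behaviour described above can be sketched roughly as follows. This is not the code in cookie.py: the class name CookiePool, its methods, and the make_cookie callback are all hypothetical, and how a real cookie is obtained or detected as banned is left out.

```python
import random


class CookiePool(object):
    """Hypothetical cookie pool: pick n at random, replace banned ones."""

    def __init__(self, cookies, make_cookie):
        self.cookies = list(cookies)
        self.make_cookie = make_cookie  # callable that returns a fresh cookie

    def pick(self, n):
        # Randomly choose up to n distinct cookies for the next requests.
        return random.sample(self.cookies, min(n, len(self.cookies)))

    def replace_banned(self, banned):
        # Swap the banned cookie for a fresh one; ignore cookies not in the pool.
        if banned in self.cookies:
            self.cookies[self.cookies.index(banned)] = self.make_cookie()


if __name__ == "__main__":
    counter = iter(range(100))
    pool = CookiePool(["c1", "c2", "c3"], lambda: "fresh-%d" % next(counter))
    pool.replace_banned("c2")
    print(pool.cookies)
```

Replacing a banned cookie in place keeps the pool size constant, so later calls to pick can still return n cookies.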
Modifications based on this code are welcome; remember to run the unit tests: python wescraper/test/test.py
This tool relies entirely on Sogou WeChat search to find and crawl articles. If the Sogou WeChat search interface changes, crawling may fail.
Python is great!
Copyright of the code belongs to the original GitHub author @LKI. Commercial use is strictly prohibited; any other reproduction or fork is free.