This project uses the proxy method to capture WeChat public account articles. First, you need to understand the two mainstream methods of capturing WeChat public accounts; see my article:
How to elegantly capture historical articles of WeChat public accounts
There are generally two methods: one goes through Sogou WeChat, and the other goes through a proxy. This project uses the proxy method.
I originally wrote a more complex tool, using Node.js's anyproxy and PHP's Laravel framework to do all of this. Then one day in the shower it finally dawned on me that I had overcomplicated a tool that should have been very simple. I gave a few pointers to a friend who works in media, and he was up and running with it very quickly.
The tool produces two outputs: wechat.sqlite and wechat.csv. wechat.csv is not created automatically; generate it by running the command wechat_spider csv.
The following is the data corresponding to my public account:
Table header explanation:
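accountName: name of the public account
author: author
title: article title
contentUrl: article link
cover: article cover image
digest: article summary
idx: position of the article within the day's push (1 is the first article of the day, 2 is the second, and so on)
sourceUrl: link behind "Read the original"
createTime: article creation time
readNum: number of reads
likeNum: number of likes
rewardNum: number of rewards (tips)
electedCommentNum: number of comments selected for display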
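As a quick way to work with these columns, here is a minimal Node.js sketch that lists the most-read articles from wechat.csv. It assumes the file is comma-separated UTF-8 with a header row that uses the column names above; the source does not spell out the exact file layout, so treat that as an assumption. The helper names (parseCsv, toRecords, topByReads) are made up for this example and are not part of wechat_spider.

```javascript
const fs = require('fs');

// Minimal CSV parser: handles quoted fields, doubled quotes, and newlines inside quotes.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') { field += '"'; i++; }
      else if (ch === '"') { inQuotes = false; }
      else { field += ch; }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      row.push(field); field = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') i++;
      row.push(field); field = '';
      rows.push(row); row = [];
    } else {
      field += ch;
    }
  }
  // Flush the last row when the text does not end with a newline.
  if (field !== '' || row.length > 0) { row.push(field); rows.push(row); }
  return rows;
}

// Turn [header, ...rows] into objects keyed by column name (assumes a header row).
function toRecords(rows) {
  const [header, ...body] = rows;
  return body
    .filter((cells) => cells.length === header.length)
    .map((cells) => {
      const record = {};
      header.forEach((name, i) => { record[name] = cells[i]; });
      return record;
    });
}

// Return the n records with the highest readNum, without mutating the input.
function topByReads(records, n) {
  return [...records]
    .sort((a, b) => Number(b.readNum) - Number(a.readNum))
    .slice(0, n);
}

// Only runs when wechat.csv is present in the current directory.
if (fs.existsSync('wechat.csv')) {
  const text = fs.readFileSync('wechat.csv', 'utf8').replace(/^\uFEFF/, '');
  for (const r of topByReads(toRecords(parseCsv(text)), 5)) {
    console.log(`${r.readNum}\t${r.title}`);
  }
}
```

Run it with node from the directory that contains wechat.csv. Sorting by likeNum or rewardNum instead only requires changing the field used in the comparison.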
Download the latest version of Node.js from https://nodejs.org/zh-cn/.
Because the tool depends on sqlite, it is compiled through node-gyp, which requires python 2.x (3.x will not work) and VCBuild.exe. Windows users must install both, otherwise the installation will fail.
Windows users can download and install the compilation dependencies by running npm install --global --production windows-build-tools in PowerShell with administrator rights.
Run the following in Terminal on Mac or in cmd on Windows:
$ npm -v
4.3.0
$ python
Python 2.7.6 (default, Nov 18 2013, 15:12:51)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
If output similar to the above appears, the required tools are installed.
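The python 2.x requirement can also be checked from a script. The following is a rough sketch, not part of wechat_spider: it assumes the interpreter is reachable as python on the PATH, and the function names are made up for this example.

```javascript
const { spawnSync } = require('child_process');

// Extract the major version from text such as "Python 2.7.6"; null if absent.
function pythonMajorVersion(versionText) {
  const match = /Python\s+(\d+)\.\d+/.exec(versionText);
  return match ? Number(match[1]) : null;
}

// Run `python --version`. Python 2 writes the version to stderr and
// Python 3 writes it to stdout, so both streams are inspected.
function detectPythonMajor() {
  const result = spawnSync('python', ['--version'], { encoding: 'utf8' });
  if (result.error) return null; // python could not be started (e.g. not on PATH)
  return pythonMajorVersion(`${result.stdout || ''}${result.stderr || ''}`);
}

const major = detectPythonMajor();
if (major === 2) {
  console.log('python 2.x found');
} else {
  console.log(`python 2.x not found (detected major version: ${major})`);
}
```

If the script reports a major version other than 2, the node-gyp build step described above is likely to fail until python 2.x is made available.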
$ npm install wechat_spider -g
$ wechat_spider --help

Usage: wechat_spider [options]

Options:
  -h, --help     output usage information
  -V, --version  output the version number

If output similar to the above appears, wechat_spider has been installed successfully.
There are four steps: start the proxy, set the proxy on your phone, open a public account's history (which starts the automatic crawl), and finally generate the csv.
Step 1: Start the tool in Terminal on Mac or cmd on Windows:
$ wechat_spider
The first time you run the tool, you need to trust its certificate.
The certificate folder opens automatically by default. If it does not, open http://localhost:8002/fetchCrtFile in a browser to obtain the rootCA.crt file. Once you have the root certificate, double-click it and follow the operating system prompts to trust the rootCA:
Windows
Mac
Step 2: Set up the proxy on your phone:
The first time, you need to install the certificate on your phone. Open http://localhost:8002/qr_root in the computer's browser and scan the QR code with WeChat. [Important] Open the scanned link in the phone's browser:
Next, find your computer's IP address; assume it is 192.168.1.5.
Set the phone's proxy to point at the computer:
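If you are unsure which address to use, Node.js can list the computer's local IPv4 addresses. This is a minimal sketch using the built-in os module; lanIPv4Addresses is a name made up for this example. It prints every non-loopback IPv4 address, so on a machine with several network adapters you still need to pick the one on the same network as your phone.

```javascript
const os = require('os');

// Return non-internal IPv4 addresses from an os.networkInterfaces()-shaped object.
function lanIPv4Addresses(interfaces) {
  const addresses = [];
  for (const name of Object.keys(interfaces)) {
    for (const iface of interfaces[name] || []) {
      // `family` is the string 'IPv4' in most Node.js releases;
      // a few releases reported the number 4 instead, so accept both.
      const isIPv4 = iface.family === 'IPv4' || iface.family === 4;
      if (isIPv4 && !iface.internal) {
        addresses.push({ name, address: iface.address });
      }
    }
  }
  return addresses;
}

// Print one line per candidate address.
for (const { name, address } of lanIPv4Addresses(os.networkInterfaces())) {
  console.log(`${name}: ${address}`);
}
```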
Step 3: In WeChat, select a public account and tap to view its article history.
Step 4: Wait for the page "Collection of a public account completed" to appear, then generate the csv:
$ wechat_spider csv
I'm Jinma, a programmer who wants to get things done. If this gadget is helpful to you, you can buy me a cup of coffee. Thank you :)
MIT.