wechat_spider Download - wechat_spider Source code download

1.0.0

wechat_spider

[Reminder] This crawler no longer runs because WeChat has changed its API. Please treat the code as a reference for the approach only.

Continuous creation and endless delivery

This project uses the proxy method to capture WeChat public account articles. First, you need to understand the two mainstream methods of capturing WeChat public accounts. Please refer to my article:

How to elegantly capture historical articles of WeChat public accounts

There are currently two general methods: one goes through Sogou WeChat, and the other goes through a proxy. This project uses the proxy method.

I originally wrote a more complex tool, using Node.js's anyproxy and PHP's Laravel framework to provide these functions. But one day, while taking a shower, I finally realized that I had overcomplicated a tool that was originally very simple. I gave some guidance to a friend who works in media, and he was able to start using it very quickly.

Output

The tool produces two output files: wechat.sqlite and wechat.csv. The wechat.csv file is not created automatically; generate it with the command wechat_spider csv.

The following is the data corresponding to my public account:

Table header explanation:

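accountName: name of the public account
author: author
title: article title
contentUrl: article link
cover: article cover image
digest: article summary
idx: 1 means the first article published that day, 2 means the second article that day, and so on.
sourceUrl: the link behind "Read the original"
createTime: article creation time
readNum: number of reads
likeNum: number of likes
rewardNum: number of rewards (tips)
electedCommentNum: number of comments selected for display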
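
As an illustration of how these columns might be consumed, here is a minimal Node.js sketch that turns CSV text with the header above into objects and ranks articles by readNum. The helper names (parseCsvLine, toRecords, topByReads) are hypothetical and not part of wechat_spider. The sketch also assumes a comma-delimited file with double-quoted fields and no line breaks inside a field, which the real wechat.csv may not satisfy.

```javascript
// Hypothetical helpers, not part of wechat_spider.
// Assumption: comma-delimited CSV, double-quoted fields, no newlines inside fields.

// Split one CSV line into fields, honouring double quotes and "" escapes.
function parseCsvLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Columns described above as counts or positions; converted to numbers.
// Note: an empty cell becomes 0 because Number('') is 0.
const NUMERIC_COLUMNS = ['idx', 'readNum', 'likeNum', 'rewardNum', 'electedCommentNum'];

// Convert CSV text (first line is the header) into an array of objects.
function toRecords(csvText) {
  const lines = csvText.split(/\r?\n/).filter((l) => l.trim() !== '');
  const header = parseCsvLine(lines[0]);
  return lines.slice(1).map((line) => {
    const values = parseCsvLine(line);
    const record = {};
    header.forEach((name, i) => {
      const raw = values[i] === undefined ? '' : values[i];
      record[name] = NUMERIC_COLUMNS.includes(name) ? Number(raw) : raw;
    });
    return record;
  });
}

// Return the n most-read articles without mutating the input array.
function topByReads(records, n) {
  return [...records].sort((a, b) => b.readNum - a.readNum).slice(0, n);
}
```

With a real file, one might call toRecords(require('fs').readFileSync('wechat.csv', 'utf8')) and pass the result to topByReads. For files with multi-line fields, a dedicated CSV library would be a safer choice than this hand-rolled parser.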

Install

Install Node.js

Download the latest version from the website https://nodejs.org/zh-cn/.

Install Python 2.x and other compilation environment dependencies

Because the tool relies on sqlite, which is compiled through node-gyp, it requires Python 2.x (3.x will not work) and VCBuild.exe. Windows users must install these, otherwise errors will occur during installation.

Windows users can download and install the compilation dependencies by running npm install --global --production windows-build-tools in PowerShell with administrator rights.

Test that Node and Python are installed correctly

Run the following in Terminal on Mac, or in cmd on Windows:

$ npm -v
4.3.0

$ python
Python 2.7.6 (default, Nov 18 2013, 15:12:51)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

If output similar to the above appears, Node and Python are installed correctly.
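
Since the build needs Python 2.x specifically, the same check can be scripted. The following is a minimal Node.js sketch under a few assumptions: the interpreter is reachable as python on the PATH, and the function names (isPython2, checkPython) are hypothetical helpers, not part of wechat_spider or node-gyp.

```javascript
const { spawnSync } = require('child_process');

// Returns true when a `python --version` style string reports a 2.x interpreter.
function isPython2(versionOutput) {
  const match = /Python\s+(\d+)\.(\d+)/.exec(versionOutput);
  return match !== null && Number(match[1]) === 2;
}

// Runs `python --version` and checks the major version.
// Assumption: the interpreter is on the PATH under the name `python`.
// Python 2 prints its version to stderr, so both streams are inspected.
function checkPython() {
  const result = spawnSync('python', ['--version'], { encoding: 'utf8' });
  if (result.error) {
    return false; // command not found or could not be started
  }
  return isPython2((result.stdout || '') + (result.stderr || ''));
}
```

Calling checkPython() before npm install gives an early warning when only Python 3.x is available.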

Install wechat_spider

$ npm install wechat_spider -g

Test that wechat_spider is installed correctly

$ wechat_spider --help

Usage: wechat_spider [options]

Options:

-h, --help output usage information
-V, --version output the version number

If information similar to the above is output, it proves that wechat_spider has been successfully installed.

Use

Using the tool takes four steps: start the proxy, set the proxy on your phone, open a public account's history page (crawling then starts automatically), and finally generate the CSV.

You need to install a certificate the first time you run the tool.

Step 1: Open the tool in the terminal on Mac or cmd on Windows:

$ wechat_spider

On the first run, you need to trust a certificate.

The certificate folder opens by default. If it does not, open http://localhost:8002/fetchCrtFile in a browser to obtain the rootCA.crt file. After obtaining the root certificate, double-click it and follow the operating system prompts to trust rootCA:

Windows

Mac

Step 2: Set up the proxy on your phone:

The first time, you also need to install the certificate on your phone. On the computer, open http://localhost:8002/qr_root in a browser and scan the QR code with WeChat. [Important] Open the browser:
Then find the IP address of your computer; assume it is 192.168.1.5.
Set the phone's proxy to point to the computer:
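
If you are unsure of the computer's IP address, Node.js can list it. The following is a minimal sketch using the built-in os module. The function name lanIPv4Addresses is a hypothetical helper, not part of wechat_spider, and the sketch assumes the phone and the computer are on the same local network.

```javascript
const os = require('os');

// Collect non-loopback IPv4 addresses from an os.networkInterfaces() result.
// Some Node.js releases report `family` as the number 4 instead of 'IPv4',
// so both forms are accepted.
function lanIPv4Addresses(interfaces) {
  const addresses = [];
  for (const name of Object.keys(interfaces)) {
    for (const iface of interfaces[name] || []) {
      const isIPv4 = iface.family === 'IPv4' || iface.family === 4;
      if (isIPv4 && !iface.internal) {
        addresses.push({ name, address: iface.address });
      }
    }
  }
  return addresses;
}

// Print the candidate addresses for this machine.
console.log(lanIPv4Addresses(os.networkInterfaces()));
```

When several addresses are printed, the one on the same Wi-Fi network as the phone is the one to enter in the phone's proxy settings.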

Step 3: Select a WeChat official account and click to view the history

Step 4: Wait for the page "Collection of a public account completed" to appear. You can then generate the CSV with wechat_spider csv.