Recently I saw a WeChat canvassing post, and I did some research on it when I had nothing to do. I found that this activity can be used to brush votes. I will briefly record the process of writing a script to brush votes. In fact, the implementation of this kind of crawler code is always a small problem. The important thing is that you need to know the logic of other people's pages. How to analyze and crawl it is the difficult part.
Open the WeChat voting page, pull down the screen and you will find that "This webpage is provided by XXX" is displayed at the top of the screen. It should be noted that "XXX" here is not "mp.weixin.qq.com", but the event organizer Party’s domain name. In other words, the program for this voting activity is running on the server of S Mall. This involves the concept of OpenID on the WeChat public platform. The official explanation of OpenID is: after encrypting WeChat ID, each user has a unique OpenID for each public account. That is, a user has a unique OpenId for a public account.
The logic of voting is that the user will provide the user's OpenID in the POST parameter when making a voting request; after receiving the voting POST request, the S mall server can block a single user by querying whether the current OpenID has voted within 4 hours. The act of voting is repeated.
However, there is a big loophole here!
S Mall can only determine whether the OpenID is duplicated, but it cannot verify the validity of the OpenID because it cannot call the WeChat server to verify the OpenID.
Then we only need to generate an OpenId that conforms to the format and send a post request.
But one vote is also weird in that it's actually done in two steps. The first time is a get request (the coded part in the picture is designed for privacy. You only need to know that one vote is completed through two requests.
When I saw the name of the first request, I thought that the request was completed, but when using a crawler to access this interface, it did not increase the number of votes. Upon closer inspection, I discovered that there was another request.
The path of this request is very strange, it is a string of garbled characters, and it is different every time. After only triggering the first request, no other operations were performed. At this time, I can be sure that a vote is to get some randomly generated parameters through the first request, and then bring these parameters with the second request. To ensure the legitimacy of the second request, this completes a voting process. So there must be somewhere in the page of the first request that calls the second request. When I checked the source code at this time, I found that there was a string of code like this in the page
When garbled characters are found on the page, it is often because the code has been encrypted.
Two of the parameters are the paths of the second request. It can be seen that this string of garbled characters is related to the second request, and the string of emoticons is the encryption of the js code. Although we don’t understand what this string of garbled characters means, we can press the semicolon (;) to format the string of garbled characters and run it directly in the chrome console. We find that the effect of this string of emoji codes is to execute the second request. (This place should be able to restore the code. If you know it, you can explain it.)
This is basically done, and the rest is code implementation. In general, it is to access the first request, use regular rules to crawl the parameters in the page, and use the parameters as the path of the second request to process the second request. Requested access. Of course, there is also an IP proxy, with random time intervals for access. It is best to dynamically simulate different devices. That is, we will not explain the common problems of modifying the User-Agent. If you have any questions about these, you can send an email.
In general, it is not difficult to implement code using python. The difficulty lies in analyzing step by step, mastering the logic of the website, and constantly trying. This will only be effective if you do it more