Some interesting Python crawler examples, friendly to novices
These crawler examples for common websites aim for code that is broadly reusable and stays valid for a long time. The project code is relatively friendly to novices: it tries to use simple Python with plenty of comments.
Don't have a proxy, or don't know how to set one up? Users in China can switch to the mirror repository on Gitee for faster downloads.
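# Change this to the full path of your chromedriver
chromedriver_path = "/Users/bird/Desktop/chromedriver.exe"
# Change this to your Weibo account
weibo_username = "your Weibo account"
# Change this to your Weibo password
weibo_password = "your Weibo password"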
Sometimes you really do want to show her you care, but you are so busy that she keeps complaining you don't care enough. You quietly resolve to message her on time next time, even if only with a few words, and then you forget again. You feel wronged, while she feels you are irresponsible.
Now you don't have to worry anymore. You can use Python to send your girlfriend scheduled reminder messages so that you never miss a key moment: when you get up in the morning, at lunch, at dinner, and at bedtime. The messages can also help her learn English words!
Most importantly, you can follow your girlfriend's emotional index in real time, so you no longer have to worry about her getting angry for no apparent reason.
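The scheduling idea above can be sketched as follows. This is only a minimal illustration, not the project's actual code: the times, message texts, and the function names `next_reminder` and `seconds_until` are all assumptions made for the example, and the step that actually delivers the message is left out.

```python
from datetime import datetime, time, timedelta

# Hypothetical schedule: these times and texts are placeholders,
# not values taken from the project.
SCHEDULE = [
    (time(7, 30), "Good morning!"),
    (time(12, 0), "Time for lunch."),
    (time(18, 30), "Time for dinner."),
    (time(23, 0), "Good night."),
]


def next_reminder(now, schedule=SCHEDULE):
    """Return (send_at, message) for the first slot strictly after `now`."""
    for slot, message in sorted(schedule):
        send_at = datetime.combine(now.date(), slot)
        if send_at > now:
            return send_at, message
    # Every slot today has passed, so use the earliest slot tomorrow.
    slot, message = min(schedule)
    return datetime.combine(now.date() + timedelta(days=1), slot), message


def seconds_until(send_at, now):
    """How long a sender loop would sleep before delivering the message."""
    return (send_at - now).total_seconds()
```

A sender loop could call `next_reminder(datetime.now())`, sleep for `seconds_until(...)`, send the message, and repeat.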
In fact, a choice of wallpaper says a lot about the inner world of the computer's owner. Some people like landscapes, some like starry skies, some like beautiful women, and some like animals. Sooner or later aesthetic fatigue sets in, but when you finally decide to change your wallpaper, you find that the wallpapers available online are either low resolution or watermarked.
Here is a small, fresh wallpaper tool for Mac: Pap.er. It may be the best wallpaper software for Mac, with a rich variety of wallpapers at 5K ultra-high resolution. If you want those wallpapers on Windows or Linux, you can crawl them instead.
# Change to the project directory
cd directory_name
# Uninstall the dependencies first
pip uninstall -y -r requirement.txt
# Then reinstall the dependencies
pip install -r requirement.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
# Run the program
python main.py
This project originated as a course design project in my junior year. I often want to look up movies, but I don't know which ones have high ratings and many reviews. To make it easier to use, I rewrote the original project; think of it as practice in crawling and visualization techniques. It mainly works by crawling movie data from ranking lists and by movie keyword.
In getMovieInRankingList.py in the current directory, go to line 107 and change executable_path=./chromedriver.exe to the path of your chromedriver.
Run pip install -r requirement.txt to install the dependency packages required by the program.
Run python main.py to run the program.
When it comes to crawlers, most people think of Scrapy, but many never get beyond simply using it. To deepen our understanding of how a crawler works, we can implement a multi-threaded crawler by hand and introduce an IP proxy pool as a basic countermeasure against anti-crawling measures.
This time I crawl Tiantian Fund Network. The site has an anti-crawling mechanism, and it holds enough data for the benefit of multi-threading to be obvious.
000056,CCB Consumer Upgrade Hybrid,2019-03-26,1.7740,1.7914,0.98,2019-03-27 15:00
000031,China Renaissance Mixed,2019-03-26,1.5650,1.5709,0.38,2019-03-27 15:00
000048,Huaxia Double Debt Enhanced Bond C,2019-03-26,1.2230,1.2236,0.05,2019-03-27 15:00
000008,Harvest CSI 500ETF Link A,2019-03-26,1.4417,1.4552,0.93,2019-03-27 15:00
000024,Morgan Stanley Double-profit Enhanced Bond A,2019-03-26,1.1670,1.1674,0.04,2019-03-27 15:00
000054,Penghua dual-debt interest-increasing bonds,2019-03-26,1.1697,1.1693,-0.03,2019-03-27 15:00
000016,China Pure Bond C,2019-03-26,1.1790,1.1793,0.03,2019-03-27 15:00
# Make sure the following libraries are installed; if any is missing, run "pip install module_name" in a Python 3 environment
import requests
import random
import re
import queue
import threading
import csv
import json
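The combination of a task queue, worker threads, and a proxy pool described above might look roughly like this. It is a sketch under stated assumptions, not the project's implementation: the function name `crawl`, its parameters, and the caller-supplied `fetch(code, proxy)` callback are all hypothetical, and the actual HTTP request is deliberately left to the caller.

```python
import queue
import random
import threading


def crawl(codes, fetch, proxies, num_threads=4, max_retries=3):
    """Fetch every code with worker threads, rotating through a proxy pool.

    `fetch(code, proxy)` is a caller-supplied function (hypothetical here)
    that returns the parsed record or raises an exception on failure.
    """
    tasks = queue.Queue()
    for code in codes:
        tasks.put(code)

    results = []
    failed = []
    lock = threading.Lock()  # guards the two shared lists

    def worker():
        while True:
            try:
                code = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            for _ in range(max_retries):
                proxy = random.choice(proxies)  # pick a proxy per attempt
                try:
                    record = fetch(code, proxy)
                except Exception:
                    continue  # retry, possibly through another proxy
                with lock:
                    results.append(record)
                break
            else:
                # All retries failed; keep the code for later inspection.
                with lock:
                    failed.append(code)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failed
```

With requests, `fetch` could wrap a call such as `requests.get(url, proxies={"http": proxy}, timeout=5)` and parse the response; the URL and the parsing are site-specific and are not shown here.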
Have you ever thought about generating a personal WeChat data report to look back on your WeChat social history? Here we use Python to run a comprehensive analysis of WeChat friends, covering nickname, gender, age, region, remark name, personal signature, avatar, group chats, official accounts, and more.
For friend types, the program mainly counts strangers, starred friends, friends who are not allowed to see my Moments, and friends whose Moments I have chosen not to see. For regions, it counts the distribution of all friends across the country and then looks more closely at the provinces with the most friends. Beyond that, it can compute the gender ratio of your friends, guess your closest friends, analyze your special friends, find the friends who share the most group chats with you, analyze your friends' personal signatures, and analyze their avatars to detect which friends use real-person photos.
There are already many articles on this kind of data analysis online, but their code is troublesome to run. This program is very simple to use: just scan the QR code to log in, and everything else happens in one step.
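Two of the statistics mentioned above, the gender ratio and the provinces with the most friends, can be illustrated with a small sketch. The friend records here are plain dictionaries, and the `"Sex"` and `"Province"` keys, the 1/2 gender encoding, and the function name `summarize_friends` are assumptions made for the example rather than the program's confirmed data model.

```python
from collections import Counter

# Assumed convention for this sketch: 1 = male, 2 = female,
# anything else = unknown.
SEX_LABELS = {1: "male", 2: "female"}


def summarize_friends(friends, top_n=3):
    """Compute a gender ratio and the provinces with the most friends.

    `friends` is a list of dicts with hypothetical "Sex" and "Province" keys.
    """
    sexes = Counter(SEX_LABELS.get(f.get("Sex"), "unknown") for f in friends)
    total = sum(sexes.values())
    ratio = {label: count / total for label, count in sexes.items()} if total else {}
    # Ignore friends whose province is missing or empty.
    provinces = Counter(f["Province"] for f in friends if f.get("Province"))
    return ratio, provinces.most_common(top_n)


sample = [
    {"Sex": 1, "Province": "Guangdong"},
    {"Sex": 2, "Province": "Guangdong"},
    {"Sex": 1, "Province": "Beijing"},
    {"Sex": 0, "Province": ""},
]
ratio, top_provinces = summarize_friends(sample)
print(ratio)
print(top_provinces)
```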
# Change to the project directory
cd directory_name
# Uninstall the dependencies first
pip uninstall -y -r requirement.txt
# Then reinstall the dependencies
pip install -r requirement.txt
# Run the program
python generate_wx_data.py
# Install pyinstaller
pip install pyinstaller
# Change to the project directory
cd directory_name
# Uninstall the dependencies first
pip uninstall -y -r requirement.txt
# Then reinstall the dependencies
pip install -r requirement.txt
# Upgrade setuptools
pip install --upgrade setuptools
# Build the package
pyinstaller generate_wx_data.py
In recent years, with the popularity of WeChat, most people no longer use QQ often, so we know little about our own QQ data. I believe it would be a real pleasure to generate your own QQ history report.
There are currently few QQ data analysis tools online because the QQ interfaces are relatively complex. This program is very simple to use and has a friendly user interface: just scan the QR code to log in, and everything else happens in one step.
The data this program currently obtains includes: detailed QQ profile data, mobile online time, online time in non-invisible status, QQ active time, number of one-way friends, QQ property analysis, group chat analysis, group chats I left in the past year, friends I deleted in the past month, all payment information, the people I care about most, and the people who care about me most. Because the relevant data interfaces restrict access, this program does not analyze QQ friends.
# Change to the project directory
cd directory_name
# Uninstall the dependencies first
pip uninstall -y -r requirement.txt
# Then reinstall the dependencies
pip install -r requirement.txt
# Run the program
python main.py
WeChat Moments keeps your data: it preserves good memories and records every bit of our growth. In a sense, posting to Moments is a way of recording life, experiencing life, and watching everyone grow step by step.
Why not preserve such precious memories? In the time it takes to drink a cup of coffee, you can print your Moments with one click. The result can be a paper book or an e-book; it keeps for a long time, beats developing photos, and carries a timeline to remember things by.
You can choose between an e-book and a paper book. For a paper book, you can order from a third-party service; for an e-book, we can generate it ourselves and save a good deal of money.
Before walking through the code, let's take a look at the final result.
# Change to the project directory
cd directory_name
# Uninstall the dependencies first
pip uninstall -y -r requirement.txt
# Then reinstall the dependencies
pip install -r requirement.txt
# Run the program
python main.py
Want to see what you have been doing over the past year? Find out whether you were slacking off or working seriously while online. Want to write an annual summary but have no data? Now you do.
This is a Chrome browsing history analysis program that helps you understand your own browsing history. It works with Chrome and with Chromium-based browsers. Most browsers made in China are currently built on Chromium, so they are generally supported. The following browsers are not supported: IE, Firefox, and Safari.
On this page you can view top-ten rankings of the domain names and URLs you have visited and of your busiest days, along with related charts.
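A top-domains ranking like the one described above can be sketched as follows. Chrome keeps its history in a SQLite file whose `urls` table includes `url` and `visit_count` columns; the sketch below builds a tiny in-memory table with just those two columns as a stand-in for a copy of that file. The function name `top_domains` and the sample rows are made up for illustration, and this is not necessarily how the program itself computes its rankings.

```python
import sqlite3
from collections import Counter
from urllib.parse import urlparse


def top_domains(conn, limit=10):
    """Rank domains by total visits, reading a `urls(url, visit_count)` table."""
    counts = Counter()
    for url, visits in conn.execute("SELECT url, visit_count FROM urls"):
        domain = urlparse(url).netloc
        if domain:  # skip entries without a host, e.g. local file paths
            counts[domain] += visits
    return counts.most_common(limit)


# Small in-memory stand-in for a copied history database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT, visit_count INTEGER)")
conn.executemany(
    "INSERT INTO urls VALUES (?, ?)",
    [
        ("https://github.com/a", 5),
        ("https://github.com/b", 2),
        ("https://www.python.org/", 4),
    ],
)
print(top_domains(conn))
```

When working with a real history file, it is safer to copy it first and open the copy, since the browser typically keeps the original locked while it is running.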
Before walking through the code, let's take a look at the final result.
Online demo: http://39.106.118.77:8090 (an ordinary server, so please do not load-test it)
Running this program is very simple; just run the following commands:
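# Change to the project directory
cd directory_name
# Uninstall the dependencies first
pip uninstall -y -r requirement.txt
# Then reinstall the dependencies
pip install -r requirement.txt
# Run the program
python app.py
# Once it is running, open http://localhost:8090 in a browser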
This project is adapted from @arry-lee's project wereader. Thanks to the original author for providing the source code.
The era of reading for everyone has arrived. Reading software currently has 210 million users, with more than 5 million daily active users. Of these, young users aged 19-35 account for more than 60%, users with a bachelor's degree or above account for as much as 80%, and users from Beijing, Shanghai, Guangzhou, Shenzhen, and other provincial capitals or municipalities account for more than 80%. I usually read in WeChat Reading, and I built this small tool to make it easier to organize books and export notes.
Before walking through the code, let's take a look at the final result.
# Change to the project directory
cd directory_name
# Uninstall the dependencies first
pip uninstall -y -r requirement.txt
# Then reinstall the dependencies
pip install -r requirement.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
# Run the program
python pyqt_gui.py
This project is updated continuously; you are welcome to star it.
The MIT License (MIT)