It is said that the Internet only has a seven-second memory, and I want to record those seven seconds.
The project is deployed on a server. It crawls Weibo's hot search list at 11 am and 11 pm every day, saves it as a Markdown file, and then uploads it to GitHub as a backup. You are free to download and view it.
Don't ask me why I chose 11 o'clock twice a day; I just always feel that big events happen around those two times.
Whether the hot searches on Weibo are about family affairs, national affairs, world affairs, or entertainment gossip, I just want to record them faithfully...
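The twice-daily schedule described above could be computed roughly as in the sketch below. It is not taken from the project's code: `RUN_TIMES` and `next_run` are hypothetical names, and the times are assumed to be the server's local time.

```python
from datetime import datetime, time, timedelta

# Assumed crawl times: 11 am and 11 pm, in the server's local time.
RUN_TIMES = (time(11, 0), time(23, 0))


def next_run(now: datetime) -> datetime:
    """Return the first scheduled crawl time strictly after `now`."""
    # Today's two slots, plus tomorrow's first slot as a fallback.
    candidates = [datetime.combine(now.date(), t) for t in RUN_TIMES]
    tomorrow = now.date() + timedelta(days=1)
    candidates.append(datetime.combine(tomorrow, RUN_TIMES[0]))
    return min(c for c in candidates if c > now)
```

A long-running script could sleep until `next_run(datetime.now())` and then crawl; a system task scheduler would work just as well and needs no code at all.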
Python 3.0+
pip install requests
pip install lxml
pip install bs4
or execute
pip install -r requirements.txt
to install the environment required to run the project.
Run weibo_Hot_Search_bs4.py (new) or weibo_Hot_Search.py in the repository directory. From cmd, run
python weibo_Hot_Search_bs4.py
(new) or
python weibo_Hot_Search.py
After running, a folder named after the date is generated in the current folder, for example:
2019年11月08日
(Updated) A Markdown file named after the date and hour is also generated, for example:
2019年11月08日15点.md
(Continuously updated) A CSV file named after the date and hour is also generated, for example:
2020年08月27日00点.csv
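As a rough illustration of the naming scheme above, the sketch below builds the folder and file names from a timestamp and writes one CSV. It is not the project's own code: `output_paths` and `write_csv` are hypothetical helpers, placing the files inside the dated folder is an assumption, and each row is assumed to hold a rank, a title, and a heat value (or a pinned marker).

```python
import csv
from datetime import datetime
from pathlib import Path


def output_paths(now: datetime, base: Path = Path(".")) -> dict:
    """Build the folder, Markdown and CSV paths for one crawl (hypothetical helper)."""
    day = f"{now.year}年{now.month:02d}月{now.day:02d}日"
    stem = f"{day}{now.hour:02d}点"
    folder = base / day  # assumption: files live inside the dated folder
    return {
        "folder": folder,
        "markdown": folder / f"{stem}.md",
        "csv": folder / f"{stem}.csv",
    }


def write_csv(rows, path: Path) -> None:
    """Write (rank, title, heat) rows to a CSV file, creating the folder first."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # utf-8-sig adds a BOM so spreadsheet tools detect the encoding of Chinese titles.
    with path.open("w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(rows)
```

The names are formatted by hand rather than with `strftime`, since passing non-ASCII characters to `strftime` can depend on the platform's locale.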
The data is taken from Sina Weibo's public hot search list: https://s.weibo.com/top/summary/
All data in this project comes from Sina Weibo. The data and the right to interpret it belong to Sina Weibo.
weibo_Hot_Search_bs4.py saves its data in the ./bs4版数据/ directory. The stored data format is rank-title-heat (or pinned), written in the original as 序号-标题-热度(或置顶). This format is easy to process and convenient for later data visualization and other analysis.
.csv files are stored in the bs4[.csv]版数据 folder. Updates to bs4[txt]版数据 and lxml版数据 have stopped. All new data is saved in the bs4[.csv]版数据 folder.
GNU General Public License v3.0