weibo_wordcloud

1.0.0

Weibo crawler and word cloud display

environment

Python 3
requests
jieba
matplotlib
wordcloud
scipy

crawler

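Because the mobile web version of Weibo (m.weibo.cn) places few restrictions on crawlers, some Weibo search data can be fetched from it directly. The search API is as follows, where the first two {} placeholders take the search keyword and the third takes the page number: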

 https://m.weibo.cn/api/container/getIndex?type=wb&queryVal={}&containerid=100103type=2%26q%3D{}&page={}
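As a rough sketch, the template above can be filled in with str.format and fetched with requests. The helper names build_search_url and fetch_page, the timeout value, and the choice to percent-encode the keyword with urllib.parse.quote are assumptions made for illustration; weibo_search.py may handle these details differently.

```python
from urllib.parse import quote

# The search API template quoted above; the {} slots are keyword, keyword, page.
SEARCH_URL = (
    "https://m.weibo.cn/api/container/getIndex?type=wb&queryVal={}"
    "&containerid=100103type=2%26q%3D{}&page={}"
)


def build_search_url(keyword, page=1):
    """Fill the template for one keyword and one result page (hypothetical helper)."""
    # Assumption: the keyword is percent-encoded before substitution.
    encoded = quote(keyword, safe="")
    return SEARCH_URL.format(encoded, encoded, page)


def fetch_page(keyword, page=1, timeout=10):
    """Request one page of search results and return the decoded JSON."""
    # Imported here so the URL helper can be used without requests installed.
    import requests

    response = requests.get(build_search_url(keyword, page), timeout=timeout)
    response.raise_for_status()
    return response.json()
```

Calling fetch_page("深度学习", page=1) would return the raw JSON for the first page of results, provided the endpoint still responds as described.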

This API returns a limited amount of JSON data (see sample.json for the raw response). After processing, each post has the following format:

{
    "mid": "4199434918992223",
    "text": "【深度学习的终极形态】近期，院友袁进辉博士回到微软亚洲研究院做了题为《打造最强深度学习引擎》的报告，分享了深度学习框架方面的技术进展。他在报告中启发大家思考如何才能“鱼和熊掌兼得”，让软件发挥灵活性，硬件发挥高效率。我们整理了本次报告的重点，希望能对大家有所帮助！  ...全文",
    "userid": "1286528122",
    "username": "微软亚洲研究院",
    "reposts_count": 21,
    "comments_count": 1,
    "attitudes_count": 9
}

For the full crawler implementation, see weibo_search.py.
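The processing step might look roughly like the sketch below. The output field names come from the example record above, but the layout of the raw post object is an assumption, since sample.json is not reproduced here: the sketch assumes a nested user object with id and screen_name keys, and that the raw text may contain HTML tags. The function name to_record is hypothetical.

```python
import re

# Assumption: the raw text may contain HTML markup that should be dropped.
TAG_PATTERN = re.compile(r"<[^>]+>")


def to_record(raw_post):
    """Flatten one raw post dict into the processed format (hypothetical helper)."""
    # Assumption: author details live in a nested "user" object.
    user = raw_post.get("user") or {}
    return {
        "mid": str(raw_post.get("mid", "")),
        "text": TAG_PATTERN.sub("", raw_post.get("text", "")).strip(),
        "userid": str(user.get("id", "")),
        "username": user.get("screen_name", ""),
        "reposts_count": raw_post.get("reposts_count", 0),
        "comments_count": raw_post.get("comments_count", 0),
        "attitudes_count": raw_post.get("attitudes_count", 0),
    }
```

Missing fields fall back to empty strings or zero, so a partially filled post does not abort a crawl.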

word cloud

The word cloud can be generated with the wordcloud library. The basic steps are:

1. Word segmentation and keyword extraction: Chinese text must be segmented into words, and the many stop words (such as "you", "me", "him", "this") must be removed so that the generated word cloud is meaningful. Both can be handled in one step with the TF-IDF keyword extraction built into the jieba segmenter.
2. Generating the image: wordcloud takes a string and a base image. Join the keywords from the first step with spaces to form the string. For the base image, prefer one with a white background so that the generated word cloud stays close to the original picture.
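The two steps can be sketched as follows, using jieba.analyse.extract_tags for the TF-IDF keywords and WordCloud with a mask image. This is a minimal illustration, not the code in weibo_cloud.py: the function names, the top_k default, the recoloring from the base image, and the need for a font_path pointing to a font with Chinese glyphs are all assumptions.

```python
def extract_keywords(texts, top_k=200):
    """Step 1: segment the posts and keep the top TF-IDF keywords."""
    # Imported lazily so the pure helper below works without jieba installed.
    import jieba.analyse

    return jieba.analyse.extract_tags("\n".join(texts), topK=top_k)


def keywords_to_text(keywords):
    """Join keywords with spaces, the string form passed to wordcloud."""
    return " ".join(word.strip() for word in keywords if word.strip())


def render_cloud(keyword_text, image_path, font_path, out_path):
    """Step 2: draw the word cloud inside the non-white area of a base image."""
    import numpy as np
    from PIL import Image
    from wordcloud import ImageColorGenerator, WordCloud

    # Pure white pixels of the base image are treated as "do not draw here".
    base = np.array(Image.open(image_path).convert("RGB"))
    cloud = WordCloud(
        font_path=font_path,       # assumption: a font that covers Chinese glyphs
        background_color="white",
        mask=base,
        collocations=False,        # keywords are already separate terms
    ).generate(keyword_text)
    # Assumption: reuse the base image's colors so the result resembles it.
    cloud.recolor(color_func=ImageColorGenerator(base))
    cloud.to_file(out_path)
```

A typical call sequence would be keywords = extract_keywords(texts), then render_cloud(keywords_to_text(keywords), "base.png", "font.ttf", "cloud.png"), where the three file names are placeholders.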

See weibo_cloud.py for code details.