bigdata_analyse
1.0.0
This repo is a collection of data analysis projects that I have worked through for practice. Each project includes a readable write-up that explains and shows the entire development process, along with the relevant datasets to download and practice on.
By using different technology stacks and analyzing data sets from different industries, we hope to achieve the following goals:
Jupyter Notebook is a web-based interactive Python editor. It can be installed directly with pip and also supports Markdown. It is well suited to data analysis and visualization, writing articles, writing sample code, and similar tasks.
Topic | Processing method | Technology stack | Dataset download |
---|---|---|---|
Analysis of 100 million Taobao user behavior records | Offline processing | Cleaning (hive) + analysis (hive) + visualization (echarts) | Alibaba Cloud or Baidu Netdisk (extraction code: 5ipq) |
Real-time analysis of 10 million Taobao user behavior records | Real-time processing | Data source (kafka) + real-time analysis (flink) + visualization (es + kibana) | Baidu Netdisk (extraction code: m4mc) |
Analysis of 3 million player records from "Barbarian Age" | Offline processing | Cleaning (pandas) + analysis (mysql) + visualization (pyecharts) | Baidu Netdisk (extraction code: paq4) |
Analysis of 1.3 million Shenzhen Pass card-swipe records | Offline processing | Cleaning (pandas) + analysis (impala) + visualization (dbeaver) | Baidu Netdisk (extraction code: t561) |
Analysis of 100,000 Xiamen recruitment records | Offline processing | Cleaning (pandas) + analysis (hive) + visualization (hue + pyecharts) + prediction (sklearn) | Baidu Netdisk (extraction code: 9wx0) |
Analysis of 7,000 rental records | Offline processing | Cleaning (pandas) + analysis (sqlite) + visualization (matplotlib) | Baidu Netdisk (extraction code: 9en3) |
Analysis of 6,000 bankrupt-enterprise records | Offline processing | Cleaning (pandas) + analysis (pandas) + visualization (jupyter notebook + pyecharts) | Baidu Netdisk (extraction code: xvgm) |
COVID-19 epidemic data analysis | Offline processing | Cleaning (pandas) + analysis (pandas) + visualization (jupyter notebook + pyecharts) | COVID-19 or Baidu Netdisk (extraction code: wgmg) |
Analysis of 70,000 Tmall order records | Offline processing | Cleaning (pandas) + analysis (pandas) + visualization (jupyter notebook + pyecharts) | Baidu Netdisk (extraction code: 27nr) |
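
Several projects in the table pair pandas cleaning with SQL analysis. As a rough illustration of that pattern, here is a minimal sketch using pandas and an in-memory SQLite database. Everything specific in it is an assumption made for the example and is not taken from the projects themselves: the sample rows, the column names `district`, `rent`, and `area_m2`, and the helper names `clean` and `average_rent_by_district` are all hypothetical. The visualization step is left out to keep the sketch short.

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows standing in for a rental dataset; the column
# names and values are invented for illustration only.
raw = pd.DataFrame(
    {
        "district": ["A", "A", "B", "B", "B", None],
        "rent": ["3000", "3500", "5000", "5000", "bad", "4000"],
        "area_m2": [50, 60, 80, 80, 70, 65],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows and rows with a missing district or non-numeric rent."""
    out = df.drop_duplicates().copy()
    # Unparseable rent values become NaN so they can be dropped below.
    out["rent"] = pd.to_numeric(out["rent"], errors="coerce")
    out = out.dropna(subset=["district", "rent"])
    out["rent"] = out["rent"].astype(int)
    return out.reset_index(drop=True)


def average_rent_by_district(df: pd.DataFrame) -> pd.DataFrame:
    """Load the cleaned frame into in-memory SQLite and aggregate with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        df.to_sql("rentals", conn, index=False)
        return pd.read_sql_query(
            "SELECT district, AVG(rent) AS avg_rent, COUNT(*) AS n "
            "FROM rentals GROUP BY district ORDER BY district",
            conn,
        )
    finally:
        conn.close()


cleaned = clean(raw)
summary = average_rent_by_district(cleaned)
print(summary)
```

For a real dataset, the `raw` frame would presumably come from `pd.read_csv` on the downloaded file, and the SQLite connection would point at a file on disk rather than `:memory:`.
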
- https://tianchi.aliyun.com/dataset/
- https://opendata.sz.gov.cn/data/api/toApiDetails/29200_00403601
- https://www.kesci.com/home/dataset
- https://github.com/CSSEGISandData/COVID-19