BaiduSpider

A powerful tool for crawling Baidu

Version: 1.0.0

Table of contents

About this project
- Dependent libraries
Start
- Preconditions
- Install
Simple to use
Project roadmap
Project co-construction
Open source agreement
Contact information
Disclaimer
Contributor
Acknowledgments

About this project

Search engines are very powerful tools, and integrating their capabilities into other tools makes those tools even more powerful. However, I could not find an open source crawler that accurately extracts search engine results, so I wrote BaiduSpider to crawl the Baidu search engine.

BaiduSpider's unique features:

- Saves time on data extraction and helps with building and training data models in deep learning and similar projects.
- Extracts data accurately and removes ads.
- Returns large, comprehensive results and supports multiple search types and return types.

Of course, no project is perfect, and every project needs help from its community to grow. You can help BaiduSpider improve by opening an issue or submitting a pull request (PR)!

Some helpful documents or tools are listed in the Acknowledgments section at the end.

Dependent libraries

The main open source libraries BaiduSpider depends on:

- BeautifulSoup 4
- requests
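If you are unsure whether these libraries are available in your environment, a quick check like the one below can help. It is a minimal sketch and not part of BaiduSpider: the helper name `missing_dependencies` is hypothetical, and it assumes the import names `bs4` (for BeautifulSoup 4) and `requests`.

```python
import importlib.util

# Import names of the dependencies listed above.
# BeautifulSoup 4 is published on PyPI as "beautifulsoup4" but imported as "bs4".
DEPENDENCIES = ("bs4", "requests")


def missing_dependencies(modules=DEPENDENCIES):
    """Return the top-level module names from `modules` that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

`find_spec` only locates a module without importing it, so the check is cheap and has no side effects.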

Start

To install BaiduSpider, follow the steps below.

Preconditions

Before installing BaiduSpider, make sure you have Python 3.6 or later installed:

$ python --version

If the version is lower than 3.6.0, download and install a newer version from the official Python website.
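The same check can be done from inside Python, for example at the top of a script that uses BaiduSpider. This is a minimal sketch; `python_is_supported` and `MIN_VERSION` are hypothetical names, and the 3.6 threshold comes from the requirement stated above.

```python
import sys

# Minimum Python version required by BaiduSpider, per this README.
MIN_VERSION = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of `version_info` is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = sys.version.split()[0]
    if not python_is_supported():
        sys.exit("BaiduSpider needs Python 3.6+, found " + current)
    print("Python version OK:", current)
```

Comparing tuples compares major and minor versions in order, so 3.10 is correctly treated as newer than 3.6 (a plain string comparison would get this wrong).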

Install

Install using `pip`

Run the following at the command line:

$ pip install baiduspider

Install manually from GitHub

$ git clone git@github.com:BaiduSpider/BaiduSpider.git

# ...

$ python setup.py install
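After either installation method, you can confirm which version of BaiduSpider is installed. This is a minimal sketch that assumes the distribution name is `baiduspider` (as in the `pip install` command above); `installed_version` is a hypothetical helper. It uses `importlib.metadata` on Python 3.8+ and falls back to `pkg_resources` from setuptools on older versions.

```python
try:
    # Python 3.8+: standard-library package metadata API
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    # Python 3.6/3.7 fallback; assumes setuptools (pkg_resources) is installed
    from pkg_resources import DistributionNotFound as PackageNotFoundError
    from pkg_resources import get_distribution

    def version(dist_name):
        return get_distribution(dist_name).version


def installed_version(dist_name="baiduspider"):
    """Return the installed version string of `dist_name`, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print("baiduspider", found if found else "is not installed")
```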

Simple to use

You can use the following code to obtain Baidu's web search results through BaiduSpider:

# Import BaiduSpider
from baiduspider import BaiduSpider
from pprint import pprint

# Create a BaiduSpider instance
spider = BaiduSpider()

# Search web pages and pretty-print the results
pprint(spider.search_web(query='Python'))

For more examples and configuration options, see the documentation.
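Because this project is not meant for crawling large amounts of Baidu data (see the Disclaimer), scripts that run several searches may want to space their requests out. Below is a minimal sketch of a hypothetical `throttled` wrapper, not part of BaiduSpider, that enforces a minimum delay between calls to any function, such as `spider.search_web` from the example above. The 2-second default is an arbitrary illustrative value, not a limit documented by BaiduSpider or Baidu.

```python
import functools
import time


def throttled(func, min_interval=2.0):
    """Wrap `func` so that successive calls start at least `min_interval` seconds apart.

    Hypothetical helper; the default interval is an arbitrary example value.
    """
    last_call = None  # monotonic timestamp of the previous call, if any

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal last_call
        if last_call is not None:
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)  # pause until the interval has elapsed
        last_call = time.monotonic()
        return func(*args, **kwargs)

    return wrapper


# Usage with BaiduSpider (needs network access, so it is left commented out):
# from baiduspider import BaiduSpider
# search = throttled(BaiduSpider().search_web, min_interval=2.0)
# for query in ["Python", "Rust"]:
#     print(search(query=query))
```

`time.monotonic()` is used instead of `time.time()` so that changes to the system clock cannot shorten or lengthen the delay.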

Project roadmap

See the open issues for the latest project plans and known issues.

Project co-construction

Community contributions are the soul of open source projects and a way for the whole open source community to learn, communicate, and find inspiration. Everyone is warmly welcome to take part in developing and maintaining this project.

To contribute, follow these steps:

1. Fork this project
2. Create a feature branch (git checkout -b NewFeatures)
3. After each code change, commit it (git commit -m 'Add some AmazingFeature')
4. Push the branch to your own remote repository (git push origin NewFeatures)
5. Open your repository on GitHub and submit a pull request following the guidelines

Open source agreement

This project is open source under the GPL v3 license; see LICENSE for details.

Contact information

samzhangjy - @samzhangjy - [email protected]

Project link: https://github.com/BaiduSpider/BaiduSpider

Disclaimer

This project is for learning purposes only and must not be used for commercial purposes or to crawl large amounts of Baidu data. In addition, this project is licensed under GPL v3, which means that any other project that uses this project must also be open source and credit its source. The author of this project bears no legal risk arising from misuse. It is hereby stated that violators bear the consequences at their own risk.