SwatchBharatUrbanCrawler
1.0.0
This is a crawler that crawls the entire website https://sbmurban.org/rrr-centers and extracts all the information it contains.
The site is an ASP.NET application, so every POST request must carry the hidden __VIEWSTATE field back to the server. (Used https://blog.scrapinghub.com/2016/04/20/scrapy-tips-from-the-pros-april-2016-edition as a tutorial on how to crawl ASP.NET websites.)
ASSUMPTION: To make the POST request every 5 minutes, the project can be deployed to Scrapinghub and scheduled to crawl every 5 minutes. The crawler is built so that it makes a POST request on completing the crawl, and the data is automatically posted to the URL specified.
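As background, ASP.NET pages expect each POST to echo back the hidden __VIEWSTATE (and related) fields from the previous response; in the project this is handled via Scrapy's form handling, following the tutorial above. The sketch below shows the underlying idea with only the standard library; the HTML snippet and the control names in it are made up for illustration, not taken from sbmurban.org:

```python
import re
from urllib.parse import urlencode

# Minimal HTML standing in for an ASP.NET response (hypothetical values).
html = '''
<form method="post" action="./rrr-centers">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTA4NjU2example" />
  <input type="hidden" name="__EVENTVALIDATION" value="abc123" />
</form>
'''

def hidden_fields(page: str) -> dict:
    """Collect the hidden inputs ASP.NET expects to be echoed back."""
    pattern = r'<input type="hidden" name="(__\w+)" value="([^"]*)"'
    return dict(re.findall(pattern, page))

# Merge the hidden fields with whatever form parameters the crawler sets
# (e.g. the postback target for the next page of results -- assumed name).
form = hidden_fields(html)
form["__EVENTTARGET"] = "ctl00$GridView1"  # assumed control name
body = urlencode(form)
print(body.split("&")[0])  # the __VIEWSTATE pair comes first
```

In the actual Scrapy spider, `FormRequest.from_response(response, ...)` performs this extraction and merging automatically.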
ASSUMPTION: I have made only one CSV file, containing all the information from the table shown in the task, since all other information can easily be derived from that file.
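For example, any narrower view can be derived from the single CSV with a few lines of plain Python. A sketch under assumptions: the rows and column names below are illustrative stand-ins, not the crawler's actual headers:

```python
import csv
import io
from collections import Counter

# Stand-in rows; the real file is data/swatchbharat_data.csv and its
# actual column names may differ (these are assumed for illustration).
raw = """state,city,center_name,capacity
Delhi,New Delhi,Center A,100
Punjab,Ludhiana,Center B,50
Delhi,New Delhi,Center C,75
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Derive a per-state summary -- one example of "other information"
# that can be produced from the single CSV.
centers_per_state = Counter(r["state"] for r in rows)
print(centers_per_state["Delhi"])  # 2
```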
git clone https://github.com/sagar-sehgal/SwatchBharaturban_Crawler
virtualenv venv --python=python3
source venv/bin/activate
cd SwatchBharaturban_Crawler
pip install -r requirements.txt
scrapy crawl swatchbharaturban_crawler
The crawled data will be stored in the swatchbharaturban_crawler/data/swatchbharat_data.csv file.