SwatchBharatUrbanCrawler
1.0.0
This is a crawler that crawls the entire website https://sbmurban.org/rrr-centers and extracts all the information it contains.
The site is an ASP.NET application, so every POST request must carry the hidden __VIEWSTATE field back to the server. (Used https://blog.scrapinghub.com/2016/04/20/scrapy-tips-from-the-pros-april-2016-edition as a tutorial on how to crawl ASP.NET websites.)
ASSUMPTION: To make the POST request every 5 minutes, the project can be deployed to Scrapinghub and scheduled to crawl every 5 minutes. The crawler is built so that it makes a POST request on completing the crawl, and the data is automatically posted to the URL specified.
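As background, ASP.NET pages expect each POST to echo back the hidden __VIEWSTATE (and related) fields from the previous response; in the project this is handled via Scrapy's form handling, following the tutorial above. The sketch below shows the underlying idea with only the standard library; the HTML snippet and the control names in it are made up for illustration, not taken from sbmurban.org:

```python
import re
from urllib.parse import urlencode

# Minimal HTML standing in for an ASP.NET response (hypothetical values).
html = '''
<form method="post" action="./rrr-centers">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTA4NjU2example" />
  <input type="hidden" name="__EVENTVALIDATION" value="abc123" />
</form>
'''

def hidden_fields(page: str) -> dict:
    """Collect the hidden inputs ASP.NET expects to be echoed back."""
    pattern = r'<input type="hidden" name="(__\w+)" value="([^"]*)"'
    return dict(re.findall(pattern, page))

# Merge the hidden fields with whatever form parameters the crawler sets
# (e.g. the postback target for the next page of results -- assumed name).
form = hidden_fields(html)
form["__EVENTTARGET"] = "ctl00$GridView1"  # assumed control name
body = urlencode(form)
print(body.split("&")[0])  # the __VIEWSTATE pair comes first
```

In the actual Scrapy spider, `FormRequest.from_response(response, ...)` performs this extraction and merging automatically.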
ASSUMPTION: I have made only one CSV file, containing all the information from the table shown in the task, since all other information can easily be derived from that file.
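For example, any narrower view can be derived from the single CSV with a few lines of plain Python. A sketch under assumptions: the rows and column names below are illustrative stand-ins, not the crawler's actual headers:

```python
import csv
import io
from collections import Counter

# Stand-in rows; the real file is data/swatchbharat_data.csv and its
# actual column names may differ (these are assumed for illustration).
raw = """state,city,center_name,capacity
Delhi,New Delhi,Center A,100
Punjab,Ludhiana,Center B,50
Delhi,New Delhi,Center C,75
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Derive a per-state summary -- one example of "other information"
# that can be produced from the single CSV.
centers_per_state = Counter(r["state"] for r in rows)
print(centers_per_state["Delhi"])  # 2
```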
git clone https://github.com/sagar-sehgal/SwatchBharaturban_Crawler
virtualenv venv --python=python3
source venv/bin/activate
cd SwatchBharaturban_Crawler
pip install -r requirements.txt
scrapy crawl swatchbharaturban_crawler
The crawled data will be stored in the swatchbharaturban_crawler/data/swatchbharat_data.csv file.