This project is a modified version of https://github.com/Spritualkb/yuque-spider-plus/.
A Yuque document crawler that saves any user's entire Yuque knowledge base as Markdown, including the complete directory structure and index. It also fixes a bug where special characters in file names produced paths that did not exist.
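The special-character fix comes down to turning document titles into safe file names. The sketch below shows one common way to do that. The `sanitize_filename` helper and its character set (the characters Windows forbids, plus control characters) are assumptions for illustration, not necessarily what this project does.

```python
import re

# Characters that Windows forbids in file names, plus ASCII control characters.
# "/" is also the POSIX path separator. This set is an assumption for illustration.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def sanitize_filename(title: str, replacement: str = "_") -> str:
    """Turn a document title into a name that is safe to use as a file or directory."""
    name = _INVALID_CHARS.sub(replacement, title)
    # Windows also rejects names that end in a dot or a space.
    name = name.strip().rstrip(". ")
    # Fall back to a placeholder if nothing usable is left.
    return name or "untitled"
```

For example, `sanitize_filename("PHP/代码审计: 入门?")` yields `PHP_代码审计_ 入门_`, so a "/" in a title no longer creates an unintended subdirectory.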
Usage: install Python 3 from
https://www.python.org/downloads/
Install the required modules:
pip install requests tqdm urllib3
Run the crawler:
python3 main.py <Yuque knowledge base URL>
Example: python3 main.py https://www.yuque.com/burpheart/phpaudit
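The crawler only needs the knowledge-base URL. As a rough sketch, assuming the `https://www.yuque.com/<user>/<book>` layout shown in the example, such a URL could be split into its user and book slugs as follows. `parse_book_url` is a hypothetical helper and may not match how main.py actually handles the URL.

```python
from urllib.parse import urlparse


def parse_book_url(url: str) -> tuple[str, str]:
    """Split a Yuque knowledge-base URL into its (user, book) path segments."""
    # strip() tolerates stray spaces around the URL when it is pasted into a shell.
    parsed = urlparse(url.strip())
    host = parsed.hostname or ""
    parts = [p for p in parsed.path.split("/") if p]
    # Assumption: knowledge bases live on yuque.com or one of its subdomains.
    is_yuque = host == "yuque.com" or host.endswith(".yuque.com")
    if not is_yuque or len(parts) < 2:
        raise ValueError(f"not a Yuque knowledge-base URL: {url!r}")
    # Any further segments (e.g. a single document slug) are ignored here.
    return parts[0], parts[1]
```

With the example URL, `parse_book_url("https://www.yuque.com/burpheart/phpaudit")` returns `("burpheart", "phpaudit")`.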
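If you are not logged in to Yuque:
when copying someone else's knowledge base, check its cookies in the browser.
If you are logged in to Yuque:
copy all of the cookies directly.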
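A cookie copied from the browser is usually a single `name=value; name=value` string. As a sketch, assuming the `--cookie` option accepts that format, it could be turned into a dictionary like this. `parse_cookie_string` is a hypothetical helper, not part of the project.

```python
def parse_cookie_string(raw: str) -> dict[str, str]:
    """Parse a browser-style cookie string such as "a=1; b=2" into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = pair.strip().partition("=")
        # Skip empty fragments and entries without "=".
        if sep and name:
            cookies[name] = value
    return cookies
```

The resulting dict can be passed to `requests.get(url, cookies=...)` or merged into a session with `session.cookies.update(...)`. The project itself may instead send the raw string as a `Cookie` header.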
Command-line usage
Example 1: Provide URL and Cookie
python main.py "https://www.yuque.com/burpheart/phpaudit" --cookie "verified_books=****"
Example 2: Provide URL, cookie and output path
python main.py "https://www.yuque.com/burpheart/phpaudit" --cookie "verified_books=****" --output "download"
Example 3: Provide URL only
python main.py "https://www.yuque.com/burpheart/phpaudit"
Example 4: Provide URL and output path
python main.py "https://www.yuque.com/burpheart/phpaudit" --output "download"
Example 5: Run without arguments (shows help information)
python main.py
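The five examples imply a positional URL plus optional `--cookie` and `--output` flags, with help printed when no URL is given. A minimal `argparse` sketch of that interface follows. It is an assumption about how main.py could be structured, not its actual code, and the real default output directory is not documented here, so `--output` simply defaults to `None`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Save a Yuque knowledge base as Markdown.")
    # nargs="?" makes the URL optional so that a bare run can print help (Example 5).
    parser.add_argument("url", nargs="?", help="Yuque knowledge-base URL")
    parser.add_argument("--cookie", help='cookie string, e.g. "verified_books=..."')
    # The project's real default output directory is unknown; None is a placeholder.
    parser.add_argument("--output", help="output directory")
    return parser


def main(argv=None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.url is None:
        # No URL given: show help instead of crawling.
        parser.print_help()
        return 1
    # Placeholder for the actual crawl.
    print(f"url={args.url} cookie={'set' if args.cookie else 'none'} output={args.output}")
    return 0
```

Making the URL optional rather than required is what lets a bare `python main.py` print help; with a required positional, `argparse` would exit with a usage error instead. To use it as a script, call `raise SystemExit(main())` under an `if __name__ == "__main__":` guard.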
Some images could not be displayed when the Markdown files were opened locally. To fix this, remote images are downloaded and their paths in the Markdown are replaced with relative paths under ./assets.
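A sketch of the image rewrite step, assuming images appear as standard Markdown `![alt](https://...)` links. The downloader is passed in as a hypothetical `save_image(url, local_name)` callback so the rewriting can be tested offline. Naming files by an MD5 hash of the URL and falling back to `.png` when the URL has no extension are illustrative choices, not necessarily what the project does.

```python
import hashlib
import os
import re
from typing import Callable
from urllib.parse import urlparse

# Matches Markdown images whose target is an http(s) URL: ![alt](https://...)
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def localize_images(markdown: str, save_image: Callable[[str, str], None]) -> str:
    """Replace remote image URLs with ./assets/<file> paths, downloading each URL once."""
    saved = {}  # url -> local file name, so repeated images are fetched only once

    def replace(match):
        alt, url = match.group(1), match.group(2)
        if url not in saved:
            # Keep the extension from the URL path (query/fragment ignored); default to .png.
            ext = os.path.splitext(urlparse(url).path)[1] or ".png"
            local_name = hashlib.md5(url.encode("utf-8")).hexdigest() + ext
            save_image(url, local_name)
            saved[url] = local_name
        return f"![{alt}](./assets/{saved[url]})"

    return IMAGE_PATTERN.sub(replace, markdown)
```

A real `save_image` could fetch the URL with `requests.get(url, timeout=30)` and write `response.content` to `os.path.join(output_dir, "assets", local_name)`. Links that are already local, such as `![c](./local.png)`, are left untouched.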