Downloads novels from the https://www.po18.tw website as .txt documents.
The website cannot be accessed from mainland China, so a proxy is required there.
Only free or already-purchased chapters can be downloaded. Buy any paid chapters manually on the website first.
Development environment: Python 3.7
Based on the po18 novel downloader demo (Python 2.7). The original only provides example functions; they have been briefly tidied up in this project's reference.py.
Dependencies (install with pip install beautifulsoup4 requests lxml):
BeautifulSoup
Requests
lxml
First, find the book ID (the string of digits after /books/ in the URL) and assign it to book_number.
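If you would rather not copy the digits by hand, a small helper can pull them out of the book URL. This is a minimal sketch, assuming the ID is the run of digits right after /books/; the function name and the example ID 123456 are hypothetical. It returns a string because the script concatenates book_number with other strings when building the output file name.

```python
import re


def book_number_from_url(url):
    """Return the digits that follow /books/ in a po18 book URL, as a string."""
    match = re.search(r"/books/(\d+)", url)
    if match is None:
        raise ValueError("no /books/<digits> segment in URL: " + url)
    return match.group(1)


# Hypothetical example URL; replace with the address of your book.
book_number = book_number_from_url("https://www.po18.tw/books/123456")
print(book_number)  # 123456
```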
Find the total number of chapters and assign it to chapter_sum. You can read it from the bracketed four-digit number on the chapter before the latest one in the table of contents, or from the status line 狀態未完結(目前xxx章回) ("status: not finished (currently xxx chapters)").
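Once you have the status text as a string (for example, extracted from the book page with BeautifulSoup; the exact element is not covered here), the chapter count can be parsed out with a regular expression. A minimal sketch, assuming the text contains 目前<digits>章回 as in the status line quoted above; the function name and the sample count 120 are hypothetical. The pattern deliberately ignores the surrounding parentheses, so it works whether they are half-width ( ) or full-width （ ）.

```python
import re


def chapter_sum_from_status(status_text):
    """Extract the chapter count xxx from a status string like 狀態未完結(目前xxx章回)."""
    match = re.search(r"目前\s*(\d+)\s*章回", status_text)
    if match is None:
        raise ValueError("chapter count not found in: " + status_text)
    return int(match.group(1))


chapter_sum = chapter_sum_from_status("狀態未完結(目前120章回)")
print(chapter_sum)  # 120
```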
Novel pages can only be accessed after logging in. In login(), set account and pwd to your real account credentials (they stay on your machine and are only sent to the po18 server to log in).
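An alternative to typing the credentials into login() is to read them from environment variables, so the password never sits in the script file. This is only a sketch of that variation; the variable names PO18_ACCOUNT and PO18_PWD and the helper load_credentials are hypothetical and not part of this project.

```python
import os


def load_credentials():
    """Read account and pwd from the hypothetical PO18_ACCOUNT / PO18_PWD variables."""
    account = os.environ.get("PO18_ACCOUNT")
    pwd = os.environ.get("PO18_PWD")
    if not account or not pwd:
        raise RuntimeError("set PO18_ACCOUNT and PO18_PWD before running")
    return account, pwd
```

Inside login() you would then write account, pwd = load_credentials() instead of hard-coding the two values.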
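In txt = open('路径' + book_number + '.txt', 'a'), replace the Chinese characters 路径 ("path") with the path of a folder of your choice. Because the folder path is concatenated directly with book_number, it must end with a path separator (for example a trailing /).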
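A sketch of a slightly more robust version of that open() call: os.path.join inserts the separator for you, the folder is created if it does not exist, and the encoding is set explicitly. Without encoding='utf-8', Python 3.7 uses the platform's default encoding, which on some Windows setups may not be able to encode every character in the novel text. The helper name open_output and the folder name "novels" are hypothetical.

```python
import os


def open_output(folder, book_number):
    """Open <folder>/<book_number>.txt in append mode, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, book_number + ".txt")
    # Explicit UTF-8 so Chinese text is written the same way on every OS.
    return open(path, "a", encoding="utf-8")
```

Usage: txt = open_output("novels", book_number). Append mode ('a') is kept from the original so that rerunning after an error adds to the existing file instead of overwriting it.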
In login(), replace the value of client_ip in the data{} dictionary with your own IP address (how to look up your IP is not covered here). Use the script in moderation: the website server stops responding to IPs that send requests too frequently.
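One way to keep the request rate down is to pause between chapters. The sketch below assumes 1-based chapter indices and takes a hypothetical fetch_chapter callable standing in for the script's own per-chapter download code; delay_seconds=3.0 is an arbitrary example, since the server's actual threshold is not known.

```python
import time


def download_chapters(fetch_chapter, start, chapter_sum, delay_seconds=3.0):
    """Call fetch_chapter(i) for i = start..chapter_sum, pausing between requests.

    fetch_chapter is a hypothetical stand-in for the script's download code.
    """
    results = []
    for i in range(start, chapter_sum + 1):
        results.append(fetch_chapter(i))
        if i < chapter_sum:
            # Space out requests so the server is less likely to stop responding.
            time.sleep(delay_seconds)
    return results
```

Raising delay_seconds makes a full download slower but reduces the chance of the IP being cut off partway through.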
If the website returns an error, find the last line the script printed to the console, of the form xx https://www.po18.tw/books/---/articles/----- processing..., and assign the number xx to start.
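If you redirect the script's output to a file, the number can be picked out automatically instead of by eye. A minimal sketch, assuming each progress line has exactly the shape quoted above (a number, the article URL, then processing...); resume_start is a hypothetical helper, and it returns the number as printed, without adjusting it.

```python
import re

# Matches "xx https://www.po18.tw/books/<id>/articles/<id> processing..."
LOG_LINE = re.compile(
    r"^(\d+)\s+https://www\.po18\.tw/books/\S+/articles/\S+\s+processing"
)


def resume_start(log_lines):
    """Return xx from the last progress line in log_lines."""
    for line in reversed(log_lines):
        match = LOG_LINE.match(line.strip())
        if match:
            return int(match.group(1))
    raise ValueError("no progress line found")
```

For example, with the saved output read via open("log.txt", encoding="utf-8").read().splitlines() (log.txt being a hypothetical file name), start = resume_start(lines).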
You will usually also need to adjust the page parameter of getContent(page); work out the correct value yourself.
Rerun the script and the download will resume from there. (This step may need to be repeated several times.)
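If the page parameter refers to a page of the table of contents and every page lists the same number of chapters, the right page can be computed from start with integer division. This is a sketch under those assumptions only: chapters_per_page must be read off the site, both the page and the chapter index are assumed to be 1-based, and page_for_chapter is a hypothetical helper.

```python
def page_for_chapter(chapter_index, chapters_per_page):
    """Return the 1-based table-of-contents page that lists chapter_index (1-based)."""
    if chapter_index < 1 or chapters_per_page < 1:
        raise ValueError("chapter_index and chapters_per_page must be >= 1")
    return (chapter_index - 1) // chapters_per_page + 1
```

With a hypothetical 100 chapters per page, chapter 100 is still on page 1 and chapter 101 is the first entry on page 2, which is why the index is shifted by one before dividing.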