從Scihub下載論文的非官方API。
# Download with a DOI and filenmae is the paper's title.
$ scidownl download --doi https://doi.org/10.1145/3375633
# Download with a PMID and a user-defined filepath
$ scidownl download --pmid 31395057 --out ./paper/paper-1.pdf
# Download with a title
$ scidownl download --title " ImageNet Classification with Deep Convolutional Neural Networks " --out ./paper/paper-1.pdf
# Download with a proxy: SCHEME=PROXY_ADDRESS
$ scidownl download --pmid 31395057 --out ./paper/paper-1.pdf --proxy http=socks5://127.0.0.1:7890
Scidownl可以輕鬆地使用PIP安裝。
$ pip3 install -U scidownl
$ git clone https://github.com/Tishacy/SciDownl.git
$ cd Scidownl && python3 setup.py install
$ scidownl -h
Usage: scidownl [OPTIONS] COMMAND [ARGS]...
Command line tool to download pdfs from Scihub.
Options:
-h, --help Show this message and exit.
Commands:
config Get global configs.
domain.list List available SciHub domains in local db.
domain.update Update available SciHub domains and save them to local db.
download Download paper(s) by DOI or PMID.
$ scidownl domain.update --help
Usage: scidownl domain.update [OPTIONS]
Update available SciHub domains and save them to local db.
Options:
-m, --mode TEXT update mode, could be ' crawl ' or ' search ' , default mode is
' crawl ' .
-h, --help Show this message and exit.
您可以使用一個選項指定2種更新模式: -m
或--mode
crawl
:[默認]爬網上更新的Scihub域網站(又稱Scihub域源)以獲取可用的Scihub域。 Scihub域源網站URL在[scihub.domain.updater.crawl]
部分中的全局配置文件中配置了scihub_domain_source
的鍵。您可以使用scidownl config --location
顯示全局配置文件的位置並進行編輯。
; Global config file: global.ini
; ...
[scihub.domain.updater.crawl]
scihub_domain_source = http://tool.yovisun.com/scihub
; ...
使用crawl
模式的一個示例:
$ scidownl domain.update --mode crawl
[INFO] | 2022/03/07 21:07:50 | Found 6 valid SciHub domains in total: [ ' http://sci-hub.ru ' , ' http://sci-hub.se ' , ' https://sci-hub.ru ' , ' https://sci-hub.st ' , ' http://sci-hub.st ' , ' https://sci-hub.se ' ]
[INFO] | 2022/03/07 21:07:50 | Saved 6 SciHub domains to local db.
search
:根據Scihub域的規則生成組合,並蒐索可用的Scihub域。這將比crawl
模式更長。
使用search
模式的一個示例:
$ scidownl domain.update --mode search
[INFO] | 2022/03/07 21:08:44 | # Search valid SciHub domains from 1352 urls
[INFO] | 2022/03/07 21:08:48 | # Found a SciHub domain url: https://sci-hub.ru
[INFO] | 2022/03/07 21:08:48 | # Found a SciHub domain url: https://sci-hub.st
...
[INFO] | 2022/03/07 21:09:04 | Found 6 valid SciHub domains in total: [ ' https://sci-hub.ru ' , ' https://sci-hub.st ' , ...]
[INFO] | 2022/03/07 21:09:04 | Saved 6 SciHub domains to local db.
Scidownl使用SQLITE作為本地數據庫,以本地存儲所有更新的Scihub域。您可以使用命令domain.list
列出所有保存的Scihub域。list。
$ scidownl domain.list
+--------------------+----------------+---------------+
| Url | SuccessTimes | FailedTimes |
| --------------------+----------------+--------------- |
| http://sci-hub.ru | 0 | 0 |
| https://sci-hub.ru | 0 | 0 |
| https://sci-hub.st | 0 | 0 |
| http://sci-hub.st | 0 | 0 |
| https://sci-hub.se | 0 | 0 |
| http://sci-hub.se | 0 | 0 |
+--------------------+----------------+---------------+
除了易於理解的URL列外, SuccessTimes
列還使用此URL記錄成功下載紙的數量,而FailedTimes
列則用於使用此URL記錄失敗的紙張下載次數。這兩列用於計算下載論文時選擇Scihub域的優先級。
$ scidownl download --help
Usage: scidownl download [OPTIONS]
Download paper(s) by DOI or PMID.
Options:
-d, --doi TEXT DOI string. Specifying multiple DOIs is supported,
e.g., --doi FIRST_DOI --doi SECOND_DOI ...
-p, --pmid INTEGER PMID numbers. Specifying multiple PMIDs is supported,
e.g., --pmid FIRST_PMID --pmid SECOND_PMID ...
-t, --title TEXT Title string. Specifying multiple titles is
supported, e.g., --title FIRST_TITLE --title
SECOND_TITLE ...
-o, --out TEXT Output directory or file path, which could be an
absolute path or a relative path. Output directory
examples: /absolute/path/to/download/,
./relative/path/to/download/, Output file examples:
/absolute/dir/paper.pdf, ../relative/dir/paper.pdf.
If --out is not specified, paper will be downloaded
to the current directory with the file name of the
paper's title. If multiple DOIs or multiple PMIDs are
provided, the --out option is always considered as
the output directory, rather than the output file
path.
-u, --scihub-url TEXT Scihub domain url. If not specified, automatically
choose one from local saved domains. It's recommended
to leave this option empty.
-h, --help Show this message and exit.
使用選項-d
或--doi
下載使用DOI,選項-p
或--pmid
下載使用PMID的論文以及選項-t
或--title
論文,以下載帶有標題的論文。您可以多次指定這些選項,甚至可以混合使用。
# with a single DOI
$ scidownl download --doi https://doi.org/10.1145/3375633
# with multiple DOIs
$ scidownl download --doi https://doi.org/10.1145/3375633 --doi https://doi.org/10.1145/2785956.2787496
# with a single PMID
$ scidownl download --pmid 31395057
# with multiple PMIDs
$ scidownl download --pmid 31395057 --pmid 24686414
# with a single title
$ scidownl download --title " ImageNet Classification with Deep Convolutional Neural Networks "
# with multiple titles
$ scidownl download --title " ImageNet Classification with Deep Convolutional Neural Networks " --title " Aggregated residual transformations for deep neural networks "
# with a mix of DOIs and PMIDs
$ scidownl download --doi https://doi.org/10.1145/3375633 --pmid 31395057 --pmid 24686414
默認情況下,下載的紙是由紙的標題命名的。使用選項-o
或--out
,您可以自定義下載論文的輸出位置,WHCIH可能是絕對路徑或相對路徑,而直接路徑或文件路徑。
將PAEPR輸出到目錄:
$ scidownl download --pmid 31395057 --out /absolute/path/of/a/directory/
# NOTE that the '/' at the end of the directory path is required, otherwise the last segment will be treated as the filename rather than a directory.
$ scidownl download --pmid 31395057 --out ../relative/path/of/a/directory/
# The '/' at the end of the directory path is required too.
用文件路徑輸出紙張。
$ scidownl download --pmid 31395057 --out /absolute/dir/paper.pdf
$ scidownl download --pmid 31395057 --out ../relative/dir/paper.pdf
$ scidownl download --pmid 31395057 --out relative/dir/paper.pdf
$ scidownl download --pmid 31395057 --out paper # will be downlaoded as ./paper.pdf
請注意,如果要下載多個論文,則--out
選項的值將始終被視為目錄,而不是文件路徑。
$ scidownl download --pmid 31395057 --pmid 24686414 --out paper
# will be downloaded to ./paper/ directory:
# ./paper/<paper-title-1>.pdf
# ./paper/<paper-title-2>.pdf
如果該選項中的某些目錄不存在,Scidownl將為您創建它們?
使用Option -u
或--scihub-url
,您可以使用所需的特定Scihub URL,而不是讓Scidownl自動從本地保存的Scihub域為您選擇一個。建議讓Scidownl選擇一個Scihub URL,因此您無需在正常使用中使用此選項。
$ scidownl download --pmid 31395057 --scihub-url http://sci-hub.se
您可以使用scihub_download
功能下載論文。
from scidownl import scihub_download
paper = "https://doi.org/10.1145/3375633"
paper_type = "doi"
out = "./paper/one_paper.pdf"
proxies = {
'http' : 'socks5://127.0.0.1:7890'
}
scihub_download ( paper , paper_type = paper_type , out = out , proxies = proxies )
在示例中可以看到更多示例。
版權(C)2022 Tishacy。
根據MIT許可獲得許可。