NCBI gene sequence Downloader

This downloader quickly retrieves the same-named gene from different species in the NCBI nucleotide database, given their GenBank accession numbers. Each retrieved file is named in the format "species name_GenBank number_gene name_sequence position.fasta".
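As a minimal sketch of the naming scheme above: the helper name `build_fasta_filename`, the `start-end` form of the sequence position, and the replacement of unsafe characters with hyphens are all assumptions made for illustration, and the downloader itself may format these parts differently. The example values are made up.

```python
import re


def build_fasta_filename(species, accession, gene, start, end):
    """Compose '<species>_<accession>_<gene>_<start>-<end>.fasta'.

    Hypothetical helper: the exact position format used by the
    downloader is not documented, so 'start-end' is an assumption.
    """
    def clean(part):
        # Collapse whitespace and characters that are unsafe in file
        # names into a single hyphen.
        return re.sub(r'[\\/:*?"<>|\s]+', "-", str(part).strip())

    position = f"{start}-{end}"
    parts = [clean(species), clean(accession), clean(gene), position]
    return "_".join(parts) + ".fasta"


# Made-up values, for illustration only.
example_name = build_fasta_filename("Example virus", "AB000001", "hexon", 100, 2900)
print(example_name)
```

Joining this name onto `savepath_prefix` (for instance with `os.path.join`) would give the full output path.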

The downloaded files can be used to compare the nucleotide sequences of a given gene across species and to build a phylogenetic tree (other programs are required for this).

This work aims to provide a large-scale, automated method for downloading specified gene (nucleotide) sequences from the NCBI database, reducing repetitive manual work and improving the efficiency of evolutionary analysis.

How to use

This downloader is written in Python.

Web pages are parsed automatically with selenium and lxml, and files are downloaded with urllib.

Selenium must be configured before the downloader is run.
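For readers who only need the download step, the following sketch shows how a FASTA sub-sequence could be requested with urllib through NCBI's E-utilities `efetch` endpoint. This is a different technique from the page parsing the downloader performs with selenium and lxml, and it is not taken from downloader.py. The function names and the accession in the example are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlretrieve

# NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, start=None, stop=None):
    """Build an efetch URL returning FASTA text for a nucleotide record."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    # Restrict the request to a sub-sequence only when both ends are given.
    if start is not None and stop is not None:
        params["seq_start"] = start
        params["seq_stop"] = stop
    return f"{EFETCH}?{urlencode(params)}"


def download_fasta(accession, dest_path, start=None, stop=None):
    """Download the FASTA text to dest_path (performs a network request)."""
    urlretrieve(build_efetch_url(accession, start, stop), dest_path)


# Made-up accession and coordinates; only the URL is built here.
example_url = build_efetch_url("AB000001", 10, 200)
print(example_url)
```

Calling `download_fasta` contacts NCBI, so requests should be spaced out when many accessions are processed.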

Set the save path for downloaded files
Set savepath_prefix to the folder where the downloaded files should be saved.
```
 savepath_prefix = 'file save path prefix'
```
Set the path of the GenBank table to import
Only CSV format is currently supported.
Set csv_path to the path of your CSV file.
```
 csv_path = '*.csv'
```
The CSV file must use exactly these three column headers: serum_type, representative_strain, and GenBank. serum_type is the serotype, representative_strain is the representative strain, and GenBank is the GenBank accession number. The serotype and GenBank number are required; the representative strain is optional.
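The column rules above can be checked before crawling starts. The sketch below is one way to do that with the standard `csv` module; the function name `read_accession_table` and the error messages are assumptions, and downloader.py may read the table differently. The sample rows are made up.

```python
import csv
import io

EXPECTED_COLUMNS = ("serum_type", "representative_strain", "GenBank")
REQUIRED_COLUMNS = ("serum_type", "GenBank")


def read_accession_table(handle):
    """Read the CSV table and enforce the required columns.

    Hypothetical helper: takes any open text file object.
    """
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")

    rows = []
    # Data starts on line 2 because line 1 holds the headers.
    for line_no, row in enumerate(reader, start=2):
        for column in REQUIRED_COLUMNS:
            if not (row[column] or "").strip():
                raise ValueError(f"line {line_no}: '{column}' is required")
        rows.append({c: (row[c] or "").strip() for c in EXPECTED_COLUMNS})
    return rows


# Made-up sample table; representative_strain is left empty in row 1.
sample = io.StringIO(
    "serum_type,representative_strain,GenBank\n"
    "1,,AB000001\n"
    "2,StrainX,AB000002\n"
)
table = read_accession_table(sample)
print(table)
```

With a real file, the same function can be called as `read_accession_table(open(csv_path, newline=""))`.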

Run downloader.py to start crawling and downloading.

Notice

This code currently only supports gene fragment sequences whose product, gene, or note field contains one of the following keywords: hexon, hexon protein, fiber, fiber protein, fiber1, fiber1 protein, fiber2, fiber2 protein. The original README illustrates these fields with a figure.
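The keyword restriction can be pictured as a simple filter over a feature's fields. This is a minimal sketch assuming a case-insensitive exact match against the listed keywords; the function name `is_supported_feature` and the dictionary layout are hypothetical, and the downloader's actual matching rule may differ.

```python
# Keywords listed in the notice above.
SUPPORTED_KEYWORDS = {
    "hexon", "hexon protein",
    "fiber", "fiber protein",
    "fiber1", "fiber1 protein",
    "fiber2", "fiber2 protein",
}


def is_supported_feature(fields):
    """Return True if product, gene, or note matches a supported keyword.

    `fields` is assumed to be a dict such as {"product": "hexon protein"}.
    Matching is exact after trimming and lowercasing (an assumption).
    """
    for name in ("product", "gene", "note"):
        value = fields.get(name, "")
        if value.strip().lower() in SUPPORTED_KEYWORDS:
            return True
    return False


print(is_supported_feature({"product": "Hexon protein"}))
print(is_supported_feature({"gene": "penton"}))
```

Features that fail this check would simply be skipped rather than downloaded.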

If you have any questions, please send an email to [email protected]

Additional Information

Type: Other source code
Update time: 2024-11-14
Size: 50 MB
Source: GitHub
