AntiFold

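AntiFold predicts sequences that fit a given antibody variable domain structure. The tool outputs per-residue log-likelihoods in CSV format and can sample sequences directly to FASTA format. Sampled sequences show high structural agreement with the experimental structures.

AntiFold is based on the ESM-IF1 model and is fine-tuned on solved and predicted antibody structures from SAbDab and OAS.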

Paper: arXiv pre-print
Webserver: OPIG webserver
Colab:
Model: model.pt
License: BSD 3-Clause

Webserver

To try AntiFold without installing it, see the OPIG webserver (https://opig.stats.ox.ac.uk/webapps/antifold/).

Install and run AntiFold

Download and install from GitHub source (recommended - latest release)

conda create --name antifold python=3.10 -y && conda activate antifold
conda install -c conda-forge pytorch
git clone https://github.com/oxpig/AntiFold && cd AntiFold
pip install .

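GPU only: install using environment.yml

conda env create -f environment.yml
python -m pip install .

Depending on your CUDA version, you may need to change the pytorch-cuda=12.1 dependency in the environment.yml file. Detailed instructions on installing pytorch correctly for your system can be found in the PyTorch installation guide.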
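After installing, it can help to confirm which PyTorch build ended up in the environment before running AntiFold on a GPU. The following is a minimal sketch, not part of AntiFold: the helper name describe_torch_cuda is hypothetical, and it only relies on standard PyTorch attributes.

```python
def describe_torch_cuda():
    """Return a one-line summary of the installed PyTorch build (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version PyTorch was built against
        return (
            f"PyTorch {torch.__version__}, built for CUDA {torch.version.cuda}, "
            f"GPU: {torch.cuda.get_device_name(0)}"
        )
    return f"PyTorch {torch.__version__}, no CUDA device visible (CPU only)"


print(describe_torch_cuda())
```

If the summary reports no visible CUDA device on a GPU machine, the pytorch-cuda version in environment.yml is a likely place to look.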

Run AntiFold (inverse folding probabilities, sample sequences)

# Run AntiFold on single PDB/CIF file
# Nb: Assumes first chain heavy, second chain light
python antifold/main.py \
    --pdb_file data/pdbs/6y1l_imgt.pdb

# Antibody-antigen complex
python antifold/main.py \
    --pdb_file data/antibody_antigen/3hfm.pdb \
    --heavy_chain H \
    --light_chain L \
    --antigen_chain Y

# Nanobody or single-chain
python antifold/main.py \
    --pdb_file data/nanobody/8oi2_imgt.pdb \
    --nanobody_chain B

# Folder of PDB/CIFs
# Nb: Assumes first chain heavy, second light
python antifold/main.py \
    --pdb_dir data/pdbs

# Specify chains to run in a CSV file (e.g. antibody-antigen complex)
python antifold/main.py \
    --pdb_dir data/antibody_antigen \
    --pdbs_csv data/antibody_antigen.csv

# Sample sequences 10x (paired VH/VL only)
python antifold/main.py \
    --pdb_file data/pdbs/6y1l_imgt.pdb \
    --heavy_chain H \
    --light_chain L \
    --num_seq_per_target 10 \
    --sampling_temp "0.2" \
    --regions "CDR1 CDR2 CDR3"

# Run all chains with ESM-IF1 model weights
python antifold/main.py \
    --pdb_dir data/pdbs \
    --esm_if1_mode

Jupyter notebook

Notebook: notebook.ipynb

Colab:

import antifold
import antifold.main

# Load model
model = antifold.main.load_model()

# PDB directory
pdb_dir = "data/pdbs"

# Assumes first chain heavy, second chain light
pdbs_csv = antifold.main.generate_pdbs_csv(pdb_dir, max_chains=2)

# Sample from PDBs
df_logits_list = antifold.main.get_pdbs_logits(
    model=model,
    pdbs_csv_or_dataframe=pdbs_csv,
    pdb_dir=pdb_dir,
)

# Output log probabilities
df_logits_list[0]

Input parameters

Required parameters:

Input PDBs should be antibody variable domain structures (IMGT positions 1-128).

If no chains are specified, the first two chains will be assumed to be heavy and light.
If custom_chain_mode is set, all (10) chains will be run.

- Option 1: PDB file (--pdb_file). We recommend specifying heavy and light chain (--heavy_chain and --light_chain)
- Option 2: PDB folder (--pdb_dir) + CSV file specifying chains (--pdbs_csv)
- Option 3: PDB folder, infer 1st chain heavy, 2nd chain light

Parameters for generating new sequences:

 PDBs should be IMGT annotated for the sequence sampling regions to be valid.

- Number of sequences to generate (--num_seq_per_target)
- Regions to mutate (--regions) based on inverse folding probabilities. Select from the list in IMGT_dict (e.g. 'CDRH1 CDRH2 CDRH3')
- Sampling temperature (--sampling_temp) controls generated sequence diversity, by scaling the inverse folding probabilities before sampling. Temperature = 1 means no change, while temperature ~ 0 only samples the most likely amino-acid at each position (acts as argmax).
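
The effect of the sampling temperature can be illustrated with a short sketch. It assumes the common convention of dividing log-probabilities by the temperature and renormalising before sampling; whether AntiFold does exactly this internally is an assumption, and the helper names temperature_probs and sample_residue are hypothetical. The log-probabilities are taken from chain H, position 2 of the 6y1l_imgt example output.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Log-probabilities for chain H, position 2 (columns A..Y of the example CSV)
POS2_LOG_PROBS = [
    -4.9963, -6.6117, -6.3181, -6.3243, -6.7570, -4.2518, -6.7514,
    -5.2540, -6.8067, -5.8619, -0.0904, -6.5493, -4.8639, -6.6316,
    -6.3084, -5.1900, -5.0988, -3.7295, -8.0480, -7.3236,
]


def temperature_probs(log_probs, temperature):
    """Divide log-probabilities by the temperature, then renormalise (softmax)."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [lp / temperature for lp in log_probs]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_residue(log_probs, temperature, rng):
    """Draw one amino acid from the temperature-scaled distribution."""
    probs = temperature_probs(log_probs, temperature)
    return rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]


rng = random.Random(42)
for temp in (1.0, 0.5, 0.2):
    p_top = max(temperature_probs(POS2_LOG_PROBS, temp))
    draws = "".join(sample_residue(POS2_LOG_PROBS, temp, rng) for _ in range(20))
    print(f"T={temp:.2f}  p(top residue)={p_top:.4f}  samples={draws}")
```

At temperature 1 the distribution is unchanged. As the temperature drops, probability mass concentrates on the most likely residue, so sampling approaches argmax.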

Optional parameters:

 - Multi-chain mode for including antigen or other chains (--custom_chain_mode)
- Extract latent representations of PDB within model (--extract_embeddings)
- Use ESM-IF1 instead of AntiFold model weights (--esm_if1_mode), enables custom_chain_mode

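Example output

For an example of webserver output, see https://opig.stats.ox.ac.uk/webapps/antifold/results/example_job/

Output CSV with residue log-probabilities: 6y1l_imgt.csv

pdb_pos - PDB residue number (e.g. 112)
pdb_chain - PDB chain
aa_orig - PDB residue (e.g. V)
aa_pred - Top residue predicted by AntiFold for this position (i.e. argmax)
pdb_posins - PDB residue number with insertion code (e.g. 112A)
perplexity - Inverse folding tolerance to mutations (higher is more tolerant). See the paper for details.
Amino acids (A-Y) - Inverse folding scores (log-likelihood) for the given position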

 pdb_pos,pdb_chain,aa_orig,aa_pred,pdb_posins,perplexity,A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y
2,H,V,M,2,1.6488,-4.9963,-6.6117,-6.3181,-6.3243,-6.7570,-4.2518,-6.7514,-5.2540,-6.8067,-5.8619,-0.0904,-6.5493,-4.8639,-6.6316,-6.3084,-5.1900,-5.0988,-3.7295,-8.0480,-7.3236
3,H,Q,Q,3,1.3889,-10.5258,-12.8463,-8.4800,-4.7630,-12.9094,-11.0924,-5.6136,-10.9870,-3.1119,-8.1113,-9.4382,-6.2246,-13.3660,-0.0701,-4.9957,-10.0301,-6.8618,-7.5810,-13.6721,-11.4157
4,H,L,L,4,1.0021,-13.3581,-12.6206,-17.5484,-12.4801,-9.8792,-13.6382,-14.8609,-13.9344,-16.4080,-0.0002,-9.2727,-16.6532,-14.0476,-12.5943,-15.4559,-16.9103,-17.0809,-10.5670,-13.5334,-13.4324
...
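
The columns can be read back with standard tooling. The sketch below parses the three example rows above using only the Python standard library, recovers aa_pred as the argmax of the amino-acid columns, and recomputes perplexity as the exponential of the Shannon entropy of the per-position distribution. That definition is an assumption: it reproduces the three example values, but the paper is the authoritative reference. The helper name summarise_row is hypothetical.

```python
import csv
import io
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# The three example rows shown above, embedded so the sketch is self-contained
EXAMPLE_CSV = """\
pdb_pos,pdb_chain,aa_orig,aa_pred,pdb_posins,perplexity,A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y
2,H,V,M,2,1.6488,-4.9963,-6.6117,-6.3181,-6.3243,-6.7570,-4.2518,-6.7514,-5.2540,-6.8067,-5.8619,-0.0904,-6.5493,-4.8639,-6.6316,-6.3084,-5.1900,-5.0988,-3.7295,-8.0480,-7.3236
3,H,Q,Q,3,1.3889,-10.5258,-12.8463,-8.4800,-4.7630,-12.9094,-11.0924,-5.6136,-10.9870,-3.1119,-8.1113,-9.4382,-6.2246,-13.3660,-0.0701,-4.9957,-10.0301,-6.8618,-7.5810,-13.6721,-11.4157
4,H,L,L,4,1.0021,-13.3581,-12.6206,-17.5484,-12.4801,-9.8792,-13.6382,-14.8609,-13.9344,-16.4080,-0.0002,-9.2727,-16.6532,-14.0476,-12.5943,-15.4559,-16.9103,-17.0809,-10.5670,-13.5334,-13.4324
"""


def summarise_row(row):
    """Return (top residue, perplexity) computed from the amino-acid columns."""
    log_probs = {aa: float(row[aa]) for aa in AMINO_ACIDS}
    # Renormalise to absorb rounding in the 4-decimal log-probabilities
    log_norm = math.log(sum(math.exp(lp) for lp in log_probs.values()))
    probs = {aa: math.exp(lp - log_norm) for aa, lp in log_probs.items()}
    top = max(probs, key=probs.get)
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return top, math.exp(entropy)


rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
summaries = [summarise_row(row) for row in rows]
for row, (top, perplexity) in zip(rows, summaries):
    print(row["pdb_posins"], row["aa_orig"], "->", top, f"perplexity={perplexity:.4f}")
```

A perplexity close to 1 (as at position 4) means the model places nearly all probability on a single residue, while larger values indicate positions where several residues are plausible.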

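Output FASTA file with sampled sequences: 6y1l_imgt.fasta

T: Temperature used for design
score: Mean log probability of residues in the sampled regions
global_score: Mean log probability of all residues (IMGT positions 1-128)
regions: Regions selected for design
seq_recovery: Fraction of residues recovered from the original sequence (1 - mutations / sequence length)
mutations: Number of mutations relative to the original PDB sequence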

 >6y1l_imgt , score=0.2934, global_score=0.2934, regions=['CDR1', 'CDR2', 'CDRH3'], model_name=AntiFold, seed=42
VQLQESGPGLVKPSETLSLTCAVSGYSISSGYYWGWIRQPPGKGLEWIGSIYHSGSTYYN
PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLTQSSHNDANWGQGTLVTVSS/V
LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF
SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR
> T=0.20, sample=1, score=0.3930, global_score=0.1869, seq_recovery=0.8983, mutations=12
VQLQESGPGLVKPSETLSLTCAVSGASITSSYYWGWIRQPPGKGLEWIGSIYYSGSTYYN
PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLYGSPWSNPYWGQGTLVTVSS/V
LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF
SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR
...
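
As a sanity check on the header fields, the sketch below parses the two example records above, splits heavy and light chains at the "/" separator, and counts mutations against the original sequence. The helper names are hypothetical and the header parsing is a simple regular expression, not AntiFold's own parser. In this example the reported seq_recovery of 0.8983 equals 1 - 12/118, where 118 is the heavy chain length; whether the denominator is always the mutated chain's length is an assumption.

```python
import re

# The native record and first sampled record shown above
EXAMPLE_FASTA = """\
>6y1l_imgt , score=0.2934, global_score=0.2934, regions=['CDR1', 'CDR2', 'CDRH3'], model_name=AntiFold, seed=42
VQLQESGPGLVKPSETLSLTCAVSGYSISSGYYWGWIRQPPGKGLEWIGSIYHSGSTYYN
PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLTQSSHNDANWGQGTLVTVSS/V
LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF
SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR
> T=0.20, sample=1, score=0.3930, global_score=0.1869, seq_recovery=0.8983, mutations=12
VQLQESGPGLVKPSETLSLTCAVSGASITSSYYWGWIRQPPGKGLEWIGSIYYSGSTYYN
PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLYGSPWSNPYWGQGTLVTVSS/V
LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF
SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR
"""


def read_fasta(text):
    """Return a list of (header, sequence) tuples, joining wrapped sequence lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:].strip(), ""])
        elif records:
            records[-1][1] += line
    return [(header, seq) for header, seq in records]


def header_fields(header):
    """Extract key=value pairs; bracketed lists are kept as raw strings."""
    return dict(re.findall(r"(\w+)=(\[[^\]]*\]|[^,\s]+)", header))


def count_mutations(original, sampled):
    """Count positions that differ between two equal-length sequences."""
    if len(original) != len(sampled):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(original, sampled))


(native_header, native_seq), (sample_header, sample_seq) = read_fasta(EXAMPLE_FASTA)
native_heavy, native_light = native_seq.split("/")
sample_heavy, sample_light = sample_seq.split("/")

fields = header_fields(sample_header)
n_mut = count_mutations(native_heavy, sample_heavy) + count_mutations(native_light, sample_light)
recovery = 1 - n_mut / len(native_heavy)

print("reported:", fields["mutations"], fields["seq_recovery"])
print("computed:", n_mut, round(recovery, 4))
```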

Usage

usage:
    # Predict on an example antibody-antigen complex
python antifold/main.py \
    --pdb_file data/antibody_antigen/3hfm.pdb \
    --heavy_chain H \
    --light_chain L \
    --antigen_chain Y # Optional

Predict inverse folding probabilities for antibody variable domain, and sample sequences with maintained fold.
PDB structures should be IMGT-numbered, paired heavy and light chain variable domains (positions 1-128).

For IMGT numbering PDBs use SAbDab or https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabpred/anarci/

options:
  -h, --help            show this help message and exit
  --pdb_file PDB_FILE   Input PDB file (for single PDB predictions)
  --heavy_chain HEAVY_CHAIN
                        Ab heavy chain (for single PDB predictions)
  --light_chain LIGHT_CHAIN
                        Ab light chain (for single PDB predictions)
  --antigen_chain ANTIGEN_CHAIN
                        Antigen chain (optional)
  --pdbs_csv PDBS_CSV   Input CSV file with PDB names and H/L chains (multi-PDB predictions)
  --pdb_dir PDB_DIR     Directory with input PDB files (multi-PDB predictions)
  --out_dir OUT_DIR     Output directory
  --regions REGIONS     Space-separated regions to mutate. Default 'CDR1 CDR2 CDR3H'
  --num_seq_per_target NUM_SEQ_PER_TARGET
                        Number of sequences to sample from each antibody PDB (default 0)
  --sampling_temp SAMPLING_TEMP
                        A string of temperatures e.g. '0.20 0.25 0.50' (default 0.20). Sampling temperature for amino acids. Suggested values 0.10, 0.15, 0.20, 0.25, 0.30. Higher values will lead to more diversity.
  --limit_variation     Limit variation to as many mutations as expected from temperature sampling
  --extract_embeddings  Extract per-residue embeddings from AntiFold / ESM-IF1
  --custom_chain_mode   Run all specified chains (for antibody-antigen complexes or any combination of chains)
  --exclude_heavy       Exclude heavy chain from sampling
  --exclude_light       Exclude light chain from sampling
  --batch_size BATCH_SIZE
                        Batch-size to use
  --num_threads NUM_THREADS
                        Number of CPU threads to use for parallel processing (0 = all available)
  --seed SEED           Seed for reproducibility
  --model_path MODEL_PATH
                        Alternative model weights (default models/model.pt). See --use_esm_if1_weights flag to use ESM-IF1 weights instead of AntiFold
  --esm_if1_mode        Use ESM-IF1 weights instead of AntiFold
  --verbose VERBOSE     Verbose printing

IMGT regions dict

Used to specify which regions to mutate in an IMGT-numbered PDB.

IMGT-numbered PDBs: https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab
Renumber existing PDBs with ANARCI: https://github.com/oxpig/ANARCI
Read more: https://www.imgt.org/IMGTScientificChart/Numbering/IMGTIGVLsuperfamily.html

IMGT_dict = {
    "all": range(1, 128 + 1),
    "allH": range(1, 128 + 1),
    "allL": range(1, 128 + 1),
    "FWH": list(range(1, 26 + 1)) + list(range(40, 55 + 1)) + list(range(66, 104 + 1)),
    "FWL": list(range(1, 26 + 1)) + list(range(40, 55 + 1)) + list(range(66, 104 + 1)),
    "CDRH": list(range(27, 39)) + list(range(56, 65 + 1)) + list(range(105, 117 + 1)),
    "CDRL": list(range(27, 39)) + list(range(56, 65 + 1)) + list(range(105, 117 + 1)),
    "FW1": range(1, 26 + 1),
    "FWH1": range(1, 26 + 1),
    "FWL1": range(1, 26 + 1),
    "CDR1": range(27, 39),
    "CDRH1": range(27, 39),
    "CDRL1": range(27, 39),
    "FW2": range(40, 55 + 1),
    "FWH2": range(40, 55 + 1),
    "FWL2": range(40, 55 + 1),
    "CDR2": range(56, 65 + 1),
    "CDRH2": range(56, 65 + 1),
    "CDRL2": range(56, 65 + 1),
    "FW3": range(66, 104 + 1),
    "FWH3": range(66, 104 + 1),
    "FWL3": range(66, 104 + 1),
    "CDR3": range(105, 117 + 1),
    "CDRH3": range(105, 117 + 1),
    "CDRL3": range(105, 117 + 1),
    "FW4": range(118, 128 + 1),
    "FWH4": range(118, 128 + 1),
    "FWL4": range(118, 128 + 1),
}
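
To see which IMGT positions a --regions string selects, the region names can be expanded into a flat list of positions. The sketch below copies three entries from IMGT_dict above; the helper name regions_to_positions is hypothetical and is not claimed to match AntiFold's internal function.

```python
# Three entries copied from IMGT_dict above; the full dictionary works the same way
IMGT_SUBSET = {
    "CDR1": range(27, 39),
    "CDR2": range(56, 65 + 1),
    "CDR3": range(105, 117 + 1),
}


def regions_to_positions(regions, region_dict=IMGT_SUBSET):
    """Expand a space-separated region string into sorted, unique IMGT positions."""
    positions = set()
    for name in regions.split():
        if name not in region_dict:
            raise KeyError(f"Unknown region {name!r}; choose from {sorted(region_dict)}")
        positions.update(region_dict[name])
    return sorted(positions)


cdr_positions = regions_to_positions("CDR1 CDR2 CDR3")
print(len(cdr_positions), cdr_positions[0], cdr_positions[-1])
```

Note that Python ranges exclude their end value, so range(27, 39) covers positions 27 to 38, whereas range(56, 65 + 1) includes position 65.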

Citing this work

The code and data in this package are based on the following AntiFold paper. If you use it, please cite:

@misc{antifold,
      title={AntiFold: Improved antibody structure-based design using inverse folding},
      author={Magnus Haraldson Høie and Alissa Hummer and Tobias H. Olsen and Broncio Aguilar-Sanjuan and Morten Nielsen and Charlotte M. Deane},
      year={2024},
      eprint={2405.03370},
      archivePrefix={arXiv},
      primaryClass={q-bio.BM}
}
