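AntiFold predicts sequences that fit into antibody variable domain structures. The tool outputs residue log-likelihoods in CSV format and can directly sample sequences to FASTA format. Sampled sequences show high structural agreement with experimental structures.
AntiFold is based on the ESM-IF1 model and is fine-tuned on solved and predicted antibody structures from SAbDab and OAS.
To try AntiFold without installing it, see the OPIG webserver (https://opig.stats.ox.ac.uk/webapps/antifold/).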
conda create --name antifold python=3.10 -y && conda activate antifold
conda install -c conda-forge pytorch
git clone https://github.com/oxpig/AntiFold && cd AntiFold
pip install .
GPU only: install using environment.yml
conda env create -f environment.yml
python -m pip install .
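Depending on your CUDA version, you may need to change the dependency pytorch-cuda=12.1 in the environment.yml file. Detailed instructions on how to correctly install pytorch for your system can be found in the PyTorch installation instructions.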
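Before choosing a pytorch-cuda version, it can help to check what the active environment reports. The snippet below is a minimal sketch and not part of AntiFold: the helper name `describe_torch` is hypothetical, and it relies only on `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch():
    """Summarize the installed PyTorch build (hypothetical helper, not an AntiFold API)."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not importable in this environment
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch())
```

If `cuda_build` does not match the CUDA version on the machine, that is a hint that the pytorch-cuda pin in environment.yml may need adjusting.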
# Run AntiFold on single PDB/CIF file
# Nb: Assumes first chain heavy, second chain light
python antifold/main.py \
    --pdb_file data/pdbs/6y1l_imgt.pdb
# Antibody-antigen complex
python antifold/main.py \
    --pdb_file data/antibody_antigen/3hfm.pdb \
    --heavy_chain H \
    --light_chain L \
    --antigen_chain Y
# Nanobody or single-chain
python antifold/main.py \
    --pdb_file data/nanobody/8oi2_imgt.pdb \
    --nanobody_chain B
# Folder of PDB/CIFs
# Nb: Assumes first chain heavy, second light
python antifold/main.py \
    --pdb_dir data/pdbs
# Specify chains to run in a CSV file (e.g. antibody-antigen complex)
python antifold/main.py \
    --pdb_dir data/antibody_antigen \
    --pdbs_csv data/antibody_antigen.csv
# Sample sequences 10x (paired VH/VL only)
python antifold/main.py \
    --pdb_file data/pdbs/6y1l_imgt.pdb \
    --heavy_chain H \
    --light_chain L \
    --num_seq_per_target 10 \
    --sampling_temp "0.2" \
    --regions "CDR1 CDR2 CDR3"
# Run all chains with ESM-IF1 model weights
python antifold/main.py \
    --pdb_dir data/pdbs \
    --esm_if1_mode
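When the commands above need to be launched from a script, the same flags can be assembled as an argument list. This is a rough sketch rather than an AntiFold API: `build_antifold_command` and `run_antifold` are hypothetical helpers, and only flags shown in the examples above are used. It assumes the working directory is the AntiFold repository root.

```python
import subprocess
import sys


def build_antifold_command(pdb_file, heavy_chain=None, light_chain=None,
                           antigen_chain=None, num_seq_per_target=None,
                           sampling_temp=None, regions=None):
    """Assemble an argument list for antifold/main.py (hypothetical helper)."""
    cmd = [sys.executable, "antifold/main.py", "--pdb_file", pdb_file]
    optional = [
        ("--heavy_chain", heavy_chain),
        ("--light_chain", light_chain),
        ("--antigen_chain", antigen_chain),
        ("--num_seq_per_target", num_seq_per_target),
        ("--sampling_temp", sampling_temp),
        ("--regions", regions),
    ]
    for flag, value in optional:
        if value is not None:
            # Each value is passed as one argument, so "CDR1 CDR2 CDR3" stays intact
            cmd += [flag, str(value)]
    return cmd


def run_antifold(cmd):
    """Run the assembled command and raise if it exits with an error."""
    return subprocess.run(cmd, check=True)


cmd = build_antifold_command(
    "data/pdbs/6y1l_imgt.pdb",
    heavy_chain="H",
    light_chain="L",
    num_seq_per_target=10,
    sampling_temp="0.2",
    regions="CDR1 CDR2 CDR3",
)
print(" ".join(cmd))
```

Passing a list to `subprocess.run` avoids shell quoting issues for the space-separated region and temperature strings.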
Notebook: notebook.ipynb
Colab:
import antifold
import antifold.main
# Load model
model = antifold.main.load_model()
# PDB directory
pdb_dir = "data/pdbs"
# Assumes first chain heavy, second chain light
pdbs_csv = antifold.main.generate_pdbs_csv(pdb_dir, max_chains=2)
# Sample from PDBs
df_logits_list = antifold.main.get_pdbs_logits(
    model=model,
    pdbs_csv_or_dataframe=pdbs_csv,
    pdb_dir=pdb_dir,
)
# Output log probabilities
df_logits_list[0]
Required parameters:
Input PDBs should be antibody variable domain structures (IMGT positions 1-128).
If no chains are specified, the first two chains will be assumed to be heavy and light.
If custom_chain_mode is set, all (10) chains will be run.
- Option 1: PDB file (--pdb_file). We recommend specifying heavy and light chain (--heavy_chain and --light_chain)
- Option 2: PDB folder (--pdb_dir) + CSV file specifying chains (--pdbs_csv)
- Option 3: PDB folder, infer 1st chain heavy, 2nd chain light
Parameters for generating new sequences:
PDBs should be IMGT-annotated for the sequence sampling regions to be valid.
- Number of sequences to generate (--num_seq_per_target)
- Regions to mutate (--regions) based on inverse folding probabilities. Select from the list in IMGT_dict (e.g. 'CDRH1 CDRH2 CDRH3')
- Sampling temperature (--sampling_temp) controls generated sequence diversity by scaling the inverse folding probabilities before sampling. Temperature = 1 means no change, while temperature ~ 0 only samples the most likely amino acid at each position (acts as argmax).
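The effect of the sampling temperature can be illustrated with a small standalone sketch. It assumes the common formulation in which log-probabilities are divided by the temperature and then renormalized; AntiFold's own implementation is not shown here and may differ in detail. The function names and the toy three-letter distribution are illustrative only.

```python
import math
import random


def apply_temperature(log_probs, temperature):
    """Divide log-probabilities by the temperature and renormalize (illustrative)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [lp / temperature for lp in log_probs]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_residue(alphabet, log_probs, temperature, rng=random):
    """Draw one residue from the temperature-adjusted distribution."""
    probs = apply_temperature(log_probs, temperature)
    return rng.choices(alphabet, weights=probs, k=1)[0]


# Toy distribution (made-up values, not AntiFold output)
alphabet = ["A", "G", "V"]
log_probs = [math.log(0.6), math.log(0.3), math.log(0.1)]

print(apply_temperature(log_probs, 1.0))   # temperature 1 leaves the distribution unchanged
print(apply_temperature(log_probs, 0.05))  # low temperature concentrates mass on "A"
print(sample_residue(alphabet, log_probs, 0.2, rng=random.Random(42)))
```

Lower temperatures sharpen the distribution toward the most likely residue, which is why values near 0 behave like argmax.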
Optional parameters:
- Multi-chain mode for including antigen or other chains (--custom_chain_mode)
- Extract latent representations of PDB within model (--extract_embeddings)
- Use ESM-IF1 instead of AntiFold model weights (--esm_if1_mode), enables custom_chain_mode
For an example of webserver output, see https://opig.stats.ox.ac.uk/webapps/antifold/results/example_job/.
Output CSV with residue log-probabilities: 6y1l_imgt.csv
pdb_pos,pdb_chain,aa_orig,aa_pred,pdb_posins,perplexity,A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y
2,H,V,M,2,1.6488,-4.9963,-6.6117,-6.3181,-6.3243,-6.7570,-4.2518,-6.7514,-5.2540,-6.8067,-5.8619,-0.0904,-6.5493,-4.8639,-6.6316,-6.3084,-5.1900,-5.0988,-3.7295,-8.0480,-7.3236
3,H,Q,Q,3,1.3889,-10.5258,-12.8463,-8.4800,-4.7630,-12.9094,-11.0924,-5.6136,-10.9870,-3.1119,-8.1113,-9.4382,-6.2246,-13.3660,-0.0701,-4.9957,-10.0301,-6.8618,-7.5810,-13.6721,-11.4157
4,H,L,L,4,1.0021,-13.3581,-12.6206,-17.5484,-12.4801,-9.8792,-13.6382,-14.8609,-13.9344,-16.4080,-0.0002,-9.2727,-16.6532,-14.0476,-12.5943,-15.4559,-16.9103,-17.0809,-10.5670,-13.5334,-13.4324
...
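To make the CSV layout concrete, the sketch below parses the first example row with the standard library and converts the 20 amino acid columns from log-probabilities to probabilities. It is an illustration, not AntiFold code, and `summarize_row` is a hypothetical helper. For the rows shown above, the perplexity column is numerically consistent with the exponential of the Shannon entropy of the distribution; that is an observation from these example values, not a documented definition.

```python
import csv
import io
import math

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")

# Header and first data row copied from the example CSV output
HEADER = "pdb_pos,pdb_chain,aa_orig,aa_pred,pdb_posins,perplexity,A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y"
ROW = (
    "2,H,V,M,2,1.6488,-4.9963,-6.6117,-6.3181,-6.3243,-6.7570,-4.2518,-6.7514,"
    "-5.2540,-6.8067,-5.8619,-0.0904,-6.5493,-4.8639,-6.6316,-6.3084,-5.1900,"
    "-5.0988,-3.7295,-8.0480,-7.3236"
)
EXAMPLE_CSV = "\n".join([HEADER, ROW])


def summarize_row(row):
    """Convert one CSV row into a few per-position summary values."""
    log_probs = {aa: float(row[aa]) for aa in AMINO_ACIDS}
    probs = {aa: math.exp(lp) for aa, lp in log_probs.items()}
    entropy = -sum(p * log_probs[aa] for aa, p in probs.items())
    return {
        "position": row["pdb_posins"],
        "top_residue": max(probs, key=probs.get),
        "prob_sum": sum(probs.values()),
        "exp_entropy": math.exp(entropy),
    }


rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
summary = summarize_row(rows[0])
print(summary)
```

The same function can be applied to every row of a full output file by replacing `io.StringIO(EXAMPLE_CSV)` with an open file handle.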
Output FASTA file with sampled sequences: 6y1l_imgt.fasta
>6y1l_imgt , score=0.2934, global_score=0.2934, regions=['CDR1', 'CDR2', 'CDRH3'], model_name=AntiFold, seed=42
VQLQESGPGLVKPSETLSLTCAVSGYSISSGYYWGWIRQPPGKGLEWIGSIYHSGSTYYN
PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLTQSSHNDANWGQGTLVTVSS/V
LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF
SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR
> T=0.20, sample=1, score=0.3930, global_score=0.1869, seq_recovery=0.8983, mutations=12
VQLQESGPGLVKPSETLSLTCAVSGASITSSYYWGWIRQPPGKGLEWIGSIYYSGSTYYN
PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLYGSPWSNPYWGQGTLVTVSS/V
LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF
SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR
...
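The FASTA output can be checked with a few lines of standard-library Python. The sketch below reads the two example records, joins the wrapped sequence lines, and counts positions that differ between the original and the sampled sequence. The helper names are hypothetical. For this example, the count agrees with `mutations=12` in the header, and the heavy-chain identity (the part before `/`) agrees with `seq_recovery=0.8983`; whether that is how AntiFold defines the field in general is an assumption.

```python
EXAMPLE_FASTA = """\
>6y1l_imgt , score=0.2934, global_score=0.2934, regions=['CDR1', 'CDR2', 'CDRH3'], model_name=AntiFold, seed=42
VQLQESGPGLVKPSETLSLTCAVSGYSISSGYYWGWIRQPPGKGLEWIGSIYHSGSTYYN
PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLTQSSHNDANWGQGTLVTVSS/V
LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF
SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR
> T=0.20, sample=1, score=0.3930, global_score=0.1869, seq_recovery=0.8983, mutations=12
VQLQESGPGLVKPSETLSLTCAVSGASITSSYYWGWIRQPPGKGLEWIGSIYYSGSTYYN
PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLYGSPWSNPYWGQGTLVTVSS/V
LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF
SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR
"""


def read_fasta(text):
    """Return a list of (header, sequence) tuples, joining wrapped lines."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def count_mutations(reference, sample):
    """Count positions where two equal-length sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(reference, sample))


(_, original), (_, sampled) = read_fasta(EXAMPLE_FASTA)
mutations = count_mutations(original, sampled)

# Heavy and light chains are separated by "/" in the output
heavy_orig = original.split("/")[0]
heavy_samp = sampled.split("/")[0]
heavy_identity = 1 - count_mutations(heavy_orig, heavy_samp) / len(heavy_orig)
print(mutations, round(heavy_identity, 4))
```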
usage:
# Predict on example PDBs in folder
python antifold/main.py \
    --pdb_file data/antibody_antigen/3hfm.pdb \
    --heavy_chain H \
    --light_chain L \
    --antigen_chain Y # Optional
Predict inverse folding probabilities for antibody variable domain, and sample sequences with maintained fold.
PDB structures should be IMGT-numbered, paired heavy and light chain variable domains (positions 1-128).
For IMGT numbering PDBs use SAbDab or https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabpred/anarci/
options:
-h, --help show this help message and exit
--pdb_file PDB_FILE Input PDB file (for single PDB predictions)
--heavy_chain HEAVY_CHAIN
Ab heavy chain (for single PDB predictions)
--light_chain LIGHT_CHAIN
Ab light chain (for single PDB predictions)
--antigen_chain ANTIGEN_CHAIN
Antigen chain (optional)
--pdbs_csv PDBS_CSV Input CSV file with PDB names and H/L chains (multi-PDB predictions)
--pdb_dir PDB_DIR Directory with input PDB files (multi-PDB predictions)
--out_dir OUT_DIR Output directory
--regions REGIONS Space-separated regions to mutate. Default 'CDR1 CDR2 CDRH3'
--num_seq_per_target NUM_SEQ_PER_TARGET
Number of sequences to sample from each antibody PDB (default 0)
--sampling_temp SAMPLING_TEMP
A string of temperatures e.g. '0.20 0.25 0.50' (default 0.20). Sampling temperature for amino acids. Suggested values 0.10, 0.15, 0.20, 0.25, 0.30. Higher values will lead to more diversity.
--limit_variation Limit variation to as many mutations as expected from temperature sampling
--extract_embeddings Extract per-residue embeddings from AntiFold / ESM-IF1
--custom_chain_mode Run all specified chains (for antibody-antigen complexes or any combination of chains)
--exclude_heavy Exclude heavy chain from sampling
--exclude_light Exclude light chain from sampling
--batch_size BATCH_SIZE
Batch-size to use
--num_threads NUM_THREADS
Number of CPU threads to use for parallel processing (0 = all available)
--seed SEED Seed for reproducibility
--model_path MODEL_PATH
Alternative model weights (default models/model.pt). See --esm_if1_mode flag to use ESM-IF1 weights instead of AntiFold
--esm_if1_mode Use ESM-IF1 weights instead of AntiFold
--verbose VERBOSE Verbose printing
Used to specify which regions to mutate in an IMGT-numbered PDB.
IMGT_dict = {
    "all": range(1, 128 + 1),
    "allH": range(1, 128 + 1),
    "allL": range(1, 128 + 1),
    "FWH": list(range(1, 26 + 1)) + list(range(40, 55 + 1)) + list(range(66, 104 + 1)),
    "FWL": list(range(1, 26 + 1)) + list(range(40, 55 + 1)) + list(range(66, 104 + 1)),
    "CDRH": list(range(27, 39)) + list(range(56, 65 + 1)) + list(range(105, 117 + 1)),
    "CDRL": list(range(27, 39)) + list(range(56, 65 + 1)) + list(range(105, 117 + 1)),
    "FW1": range(1, 26 + 1),
    "FWH1": range(1, 26 + 1),
    "FWL1": range(1, 26 + 1),
    "CDR1": range(27, 39),
    "CDRH1": range(27, 39),
    "CDRL1": range(27, 39),
    "FW2": range(40, 55 + 1),
    "FWH2": range(40, 55 + 1),
    "FWL2": range(40, 55 + 1),
    "CDR2": range(56, 65 + 1),
    "CDRH2": range(56, 65 + 1),
    "CDRL2": range(56, 65 + 1),
    "FW3": range(66, 104 + 1),
    "FWH3": range(66, 104 + 1),
    "FWL3": range(66, 104 + 1),
    "CDR3": range(105, 117 + 1),
    "CDRH3": range(105, 117 + 1),
    "CDRL3": range(105, 117 + 1),
    "FW4": range(118, 128 + 1),
    "FWH4": range(118, 128 + 1),
    "FWL4": range(118, 128 + 1),
}
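As an illustration of how a space-separated region string maps onto IMGT positions, the sketch below expands region names using a subset of the dictionary above. `positions_for_regions` is a hypothetical helper written for this example and is not claimed to match AntiFold's internal lookup.

```python
# Subset of IMGT_dict copied from the listing above
IMGT_SUBSET = {
    "FW1": range(1, 26 + 1),
    "CDR1": range(27, 39),
    "CDR2": range(56, 65 + 1),
    "CDR3": range(105, 117 + 1),
}


def positions_for_regions(regions, region_map=IMGT_SUBSET):
    """Expand a space-separated region string into sorted IMGT positions."""
    positions = set()
    for name in regions.split():
        if name not in region_map:
            raise KeyError(f"Unknown region: {name}")
        positions.update(region_map[name])
    return sorted(positions)


cdr_positions = positions_for_regions("CDR1 CDR2 CDR3")
print(len(cdr_positions), cdr_positions[0], cdr_positions[-1])  # 35 27 117
```

Because the names are looked up in a dictionary, a misspelled region fails loudly with a `KeyError` instead of silently selecting nothing.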
The code and data in this package are based on the following paper, AntiFold. If you use it, please cite:
@misc{antifold,
title={AntiFold: Improved antibody structure-based design using inverse folding},
author={Magnus Haraldson Høie and Alissa Hummer and Tobias H. Olsen and Broncio Aguilar-Sanjuan and Morten Nielsen and Charlotte M. Deane},
year={2024},
eprint={2405.03370},
archivePrefix={arXiv},
primaryClass={q-bio.BM}
}