AntiFold

AntiFold predicts sequences that fit into antibody variable domain structures. The tool outputs residue log-likelihoods in CSV format and can directly sample sequences to a FASTA file. Sampled sequences show high structural agreement with experimental structures.

AntiFold is based on the ESM-IF1 model and is fine-tuned on solved and predicted antibody structures from SAbDab and OAS.

Paper: arXiv pre-print
Webserver: OPIG webserver
Colab:
Model: model.pt
License: BSD 3-Clause

Webserver

To try AntiFold without installing it, please see our OPIG webserver: https://opig.stats.ox.ac.uk/webapps/antifold/

Install and run AntiFold

Download and install from the GitHub source (recommended, latest version)

conda create --name antifold python=3.10 -y && conda activate antifold
conda install -c conda-forge pytorch
git clone https://github.com/oxpig/AntiFold && cd AntiFold
pip install .

GPU only: install using environment.yml

conda env create -f environment.yml
python -m pip install .

Depending on your CUDA version, you may have to change the pytorch-cuda=12.1 dependency in the environment.yml file. Detailed instructions on how to install PyTorch correctly for your system can be found in the official PyTorch installation guide.
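Before editing environment.yml, it can help to check which CUDA version the installed PyTorch build targets and whether a GPU is visible. The sketch below is only an illustration: describe_torch is a hypothetical helper, not part of AntiFold, and it relies only on the standard torch.__version__, torch.version.cuda and torch.cuda.is_available() attributes.

```python
import importlib.util


def describe_torch() -> str:
    """Summarize the local PyTorch install (hypothetical helper, not part of AntiFold)."""
    # Avoid an ImportError when PyTorch has not been installed yet
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    # torch.version.cuda is None for CPU-only builds
    cuda_build = torch.version.cuda or "none (CPU-only build)"
    return (
        f"torch {torch.__version__}, "
        f"CUDA build: {cuda_build}, "
        f"GPU visible: {torch.cuda.is_available()}"
    )


print(describe_torch())
```

If the reported CUDA build does not match the driver on your machine, that is usually the cue to adjust the pytorch-cuda pin before recreating the environment.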

Run AntiFold (inverse folding probabilities, sample sequences)

# Run AntiFold on single PDB/CIF file
# Nb: Assumes first chain heavy, second chain light
python antifold/main.py \
    --pdb_file data/pdbs/6y1l_imgt.pdb

# Antibody-antigen complex
python antifold/main.py \
    --pdb_file data/antibody_antigen/3hfm.pdb \
    --heavy_chain H \
    --light_chain L \
    --antigen_chain Y

# Nanobody or single-chain
python antifold/main.py \
    --pdb_file data/nanobody/8oi2_imgt.pdb \
    --nanobody_chain B

# Folder of PDB/CIFs
# Nb: Assumes first chain heavy, second light
python antifold/main.py \
    --pdb_dir data/pdbs

# Specify chains to run in a CSV file (e.g. antibody-antigen complex)
python antifold/main.py \
    --pdb_dir data/antibody_antigen \
    --pdbs_csv data/antibody_antigen.csv

# Sample sequences 10x (paired VH/VL only)
python antifold/main.py \
    --pdb_file data/pdbs/6y1l_imgt.pdb \
    --heavy_chain H \
    --light_chain L \
    --num_seq_per_target 10 \
    --sampling_temp "0.2" \
    --regions "CDR1 CDR2 CDR3"

# Run all chains with ESM-IF1 model weights
python antifold/main.py \
    --pdb_dir data/pdbs \
    --esm_if1_mode

Jupyter notebook

Notebook: notebook.ipynb

Colab:

import antifold
import antifold.main

# Load model
model = antifold.main.load_model()

# PDB directory
pdb_dir = "data/pdbs"

# Assumes first chain heavy, second chain light
pdbs_csv = antifold.main.generate_pdbs_csv(pdb_dir, max_chains=2)

# Sample from PDBs
df_logits_list = antifold.main.get_pdbs_logits(
    model=model,
    pdbs_csv_or_dataframe=pdbs_csv,
    pdb_dir=pdb_dir,
)

# Output log probabilities
df_logits_list[0]

Input parameters

Required parameters:

Input PDBs should be antibody variable domain structures (IMGT positions 1-128).

If no chains are specified, the first two chains are assumed to be heavy and light, in that order.
If custom_chain_mode is set, all (10) chains will be run.

- Option 1: PDB file (--pdb_file). We recommend specifying the heavy and light chains (--heavy_chain and --light_chain)
- Option 2: PDB folder (--pdb_dir) + CSV file specifying chains (--pdbs_csv)
- Option 3: PDB folder (--pdb_dir) only; the 1st chain is inferred as heavy and the 2nd as light

Parameters for generating new sequences:

PDBs should be IMGT annotated for the sequence sampling regions to be valid.

- Number of sequences to generate (--num_seq_per_target)
- Regions to mutate (--regions) based on inverse folding probabilities. Select from the list in IMGT_dict (e.g. 'CDRH1 CDRH2 CDRH3')
- Sampling temperature (--sampling_temp) controls the diversity of generated sequences by scaling the inverse folding probabilities before sampling. A temperature of 1 leaves the probabilities unchanged, while a temperature close to 0 samples only the most likely amino acid at each position (acts as argmax).
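The effect of the sampling temperature can be illustrated with a short sketch. It assumes the common formulation in which log-probabilities are divided by the temperature and renormalized with a softmax; whether AntiFold implements the scaling in exactly this way is an assumption, and temperature_probs and sample_residue are hypothetical helper names. The log-likelihoods are those of heavy-chain position 2 in the example CSV output.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def temperature_probs(log_probs, temperature):
    """Divide log-probabilities by the temperature and renormalize (softmax)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [lp / temperature for lp in log_probs]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_residue(log_probs, temperature, rng):
    """Draw one amino acid from the temperature-scaled distribution."""
    probs = temperature_probs(log_probs, temperature)
    return rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]


# Log-likelihoods for position 2 of chain H, in the column order A..Y
log_probs = [
    -4.9963, -6.6117, -6.3181, -6.3243, -6.7570, -4.2518, -6.7514,
    -5.2540, -6.8067, -5.8619, -0.0904, -6.5493, -4.8639, -6.6316,
    -6.3084, -5.1900, -5.0988, -3.7295, -8.0480, -7.3236,
]

rng = random.Random(42)
for temperature in (1.0, 0.2):
    probs = temperature_probs(log_probs, temperature)
    best = max(range(len(probs)), key=probs.__getitem__)
    draws = "".join(sample_residue(log_probs, temperature, rng) for _ in range(10))
    print(f"T={temperature}: top={AMINO_ACIDS[best]} p={probs[best]:.4f} draws={draws}")
```

Lowering the temperature concentrates almost all probability mass on the top residue, which is why low temperatures yield sequences close to the argmax prediction.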

Optional parameters:

- Multi-chain mode for including antigen or other chains (--custom_chain_mode)
- Extract latent representations of PDB within model (--extract_embeddings)
- Use ESM-IF1 instead of AntiFold model weights (--esm_if1_mode), enables custom_chain_mode

Example output

For example webserver output, see: https://opig.stats.ox.ac.uk/webapps/antifold/results/example_job/

Output CSV with residue probabilities: 6y1l_imgt.csv

pdb_pos - PDB residue number
pdb_chain - PDB chain
aa_orig - PDB residue (e.g. V)
aa_pred - Top residue predicted by AntiFold (i.e. argmax) for this position
pdb_posins - PDB residue number with insertion code (e.g. 112A)
perplexity - Inverse folding tolerance to mutations (higher is more tolerant). See the paper for more details.
Amino acids - Inverse folding scores (log-likelihood) for the given position

pdb_pos,pdb_chain,aa_orig,aa_pred,pdb_posins,perplexity,A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y
2,H,V,M,2,1.6488,-4.9963,-6.6117,-6.3181,-6.3243,-6.7570,-4.2518,-6.7514,-5.2540,-6.8067,-5.8619,-0.0904,-6.5493,-4.8639,-6.6316,-6.3084,-5.1900,-5.0988,-3.7295,-8.0480,-7.3236
3,H,Q,Q,3,1.3889,-10.5258,-12.8463,-8.4800,-4.7630,-12.9094,-11.0924,-5.6136,-10.9870,-3.1119,-8.1113,-9.4382,-6.2246,-13.3660,-0.0701,-4.9957,-10.0301,-6.8618,-7.5810,-13.6721,-11.4157
4,H,L,L,4,1.0021,-13.3581,-12.6206,-17.5484,-12.4801,-9.8792,-13.6382,-14.8609,-13.9344,-16.4080,-0.0002,-9.2727,-16.6532,-14.0476,-12.5943,-15.4559,-16.9103,-17.0809,-10.5670,-13.5334,-13.4324
...
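The CSV columns can be processed with the Python standard library alone. The sketch below parses the three example rows, recovers the top residue, and recomputes a perplexity as the exponential of the Shannon entropy of the per-position distribution. That definition is an assumption: it reproduces the perplexity values in these example rows, but the paper remains the authoritative reference. summarize is a hypothetical helper name.

```python
import csv
import io
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# The three example rows shown above, embedded so the sketch is self-contained
SAMPLE_CSV = """pdb_pos,pdb_chain,aa_orig,aa_pred,pdb_posins,perplexity,A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y
2,H,V,M,2,1.6488,-4.9963,-6.6117,-6.3181,-6.3243,-6.7570,-4.2518,-6.7514,-5.2540,-6.8067,-5.8619,-0.0904,-6.5493,-4.8639,-6.6316,-6.3084,-5.1900,-5.0988,-3.7295,-8.0480,-7.3236
3,H,Q,Q,3,1.3889,-10.5258,-12.8463,-8.4800,-4.7630,-12.9094,-11.0924,-5.6136,-10.9870,-3.1119,-8.1113,-9.4382,-6.2246,-13.3660,-0.0701,-4.9957,-10.0301,-6.8618,-7.5810,-13.6721,-11.4157
4,H,L,L,4,1.0021,-13.3581,-12.6206,-17.5484,-12.4801,-9.8792,-13.6382,-14.8609,-13.9344,-16.4080,-0.0002,-9.2727,-16.6532,-14.0476,-12.5943,-15.4559,-16.9103,-17.0809,-10.5670,-13.5334,-13.4324
"""


def summarize(row):
    """Return (top residue, exp(entropy)) for one CSV row."""
    log_probs = {aa: float(row[aa]) for aa in AMINO_ACIDS}
    top = max(log_probs, key=log_probs.get)
    # Shannon entropy in nats: -sum(p * log p), with p = exp(log p)
    entropy = -sum(math.exp(lp) * lp for lp in log_probs.values())
    return top, math.exp(entropy)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    top, perplexity = summarize(row)
    changed = "differs from" if top != row["aa_orig"] else "matches"
    print(
        f"{row['pdb_chain']}{row['pdb_posins']}: top={top} ({changed} PDB residue "
        f"{row['aa_orig']}), perplexity={perplexity:.4f}"
    )
```

To run this on a real output file, replace io.StringIO(SAMPLE_CSV) with an open file handle for the CSV written by AntiFold.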

Output FASTA file with sampled sequences: 6y1l_imgt.fasta

T: Temperature used for design
score: average log-odds of residues in the sampled region
global_score: average log-odds of all residues (IMGT positions 1-128)
regions: regions selected for design
seq_recovery: fraction of residues identical to the original PDB sequence, i.e. 1 - (# mutations / sequence length)
mutations: # mutations from the original PDB sequence

>6y1l_imgt , score=0.2934, global_score=0.2934, regions=['CDR1', 'CDR2', 'CDRH3'], model_name=AntiFold, seed=42
VQLQESGPGLVKPSETLSLTCAVSGYSISSGYYWGWIRQPPGKGLEWIGSIYHSGSTYYN
PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLTQSSHNDANWGQGTLVTVSS/V
LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF
SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR
> T=0.20, sample=1, score=0.3930, global_score=0.1869, seq_recovery=0.8983, mutations=12
VQLQESGPGLVKPSETLSLTCAVSGASITSSYYWGWIRQPPGKGLEWIGSIYYSGSTYYN
PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLYGSPWSNPYWGQGTLVTVSS/V
LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF
SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR
...
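The FASTA headers and sequences above can be parsed with a few lines of Python. This is a minimal sketch under two assumptions read off the example: header fields are comma-separated key=value pairs, and the heavy and light chains are separated by a "/" character. parse_header and count_mutations are hypothetical helper names.

```python
import re

# key=value pairs; a value is either a bracketed list or a run without commas/spaces
HEADER_FIELD = re.compile(r"(\w+)=(\[[^\]]*\]|[^,\s]+)")


def parse_header(header):
    """Extract key=value fields from a FASTA header line as strings."""
    return dict(HEADER_FIELD.findall(header))


def count_mutations(original, sample):
    """Count positions where two equal-length sequences differ."""
    if len(original) != len(sample):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(original, sample))


ORIGINAL = (
    "VQLQESGPGLVKPSETLSLTCAVSGYSISSGYYWGWIRQPPGKGLEWIGSIYHSGSTYYN"
    "PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLTQSSHNDANWGQGTLVTVSS/V"
    "LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF"
    "SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR"
)
SAMPLE = (
    "VQLQESGPGLVKPSETLSLTCAVSGASITSSYYWGWIRQPPGKGLEWIGSIYYSGSTYYN"
    "PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLYGSPWSNPYWGQGTLVTVSS/V"
    "LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF"
    "SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR"
)
SAMPLE_HEADER = (
    "> T=0.20, sample=1, score=0.3930, global_score=0.1869, "
    "seq_recovery=0.8983, mutations=12"
)

fields = parse_header(SAMPLE_HEADER)
heavy_original, light_original = ORIGINAL.split("/")
heavy_sample, light_sample = SAMPLE.split("/")

print("header fields:", fields)
print("heavy-chain mutations:", count_mutations(heavy_original, heavy_sample))
print("light-chain mutations:", count_mutations(light_original, light_sample))
```

Counting differences per chain makes it easy to confirm that the sampled changes fall only in the chains and regions that were selected for design.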

Usage

usage:
    # Predict on example PDBs in folder
    python antifold/main.py \
    --pdb_file data/antibody_antigen/3hfm.pdb \
    --heavy_chain H \
    --light_chain L \
    --antigen_chain Y # Optional

Predict inverse folding probabilities for antibody variable domain, and sample sequences with maintained fold.
PDB structures should be IMGT-numbered, paired heavy and light chain variable domains (positions 1-128).

For IMGT numbering PDBs use SAbDab or https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabpred/anarci/

options:
  -h, --help            show this help message and exit
  --pdb_file PDB_FILE   Input PDB file (for single PDB predictions)
  --heavy_chain HEAVY_CHAIN
                        Ab heavy chain (for single PDB predictions)
  --light_chain LIGHT_CHAIN
                        Ab light chain (for single PDB predictions)
  --antigen_chain ANTIGEN_CHAIN
                        Antigen chain (optional)
  --pdbs_csv PDBS_CSV   Input CSV file with PDB names and H/L chains (multi-PDB predictions)
  --pdb_dir PDB_DIR     Directory with input PDB files (multi-PDB predictions)
  --out_dir OUT_DIR     Output directory
  --regions REGIONS     Space-separated regions to mutate. Default 'CDR1 CDR2 CDR3H'
  --num_seq_per_target NUM_SEQ_PER_TARGET
                        Number of sequences to sample from each antibody PDB (default 0)
  --sampling_temp SAMPLING_TEMP
                        A string of temperatures e.g. '0.20 0.25 0.50' (default 0.20). Sampling temperature for amino acids. Suggested values 0.10, 0.15, 0.20, 0.25, 0.30. Higher values will lead to more diversity.
  --limit_variation     Limit variation to as many mutations as expected from temperature sampling
  --extract_embeddings  Extract per-residue embeddings from AntiFold / ESM-IF1
  --custom_chain_mode   Run all specified chains (for antibody-antigen complexes or any combination of chains)
  --exclude_heavy       Exclude heavy chain from sampling
  --exclude_light       Exclude light chain from sampling
  --batch_size BATCH_SIZE
                        Batch-size to use
  --num_threads NUM_THREADS
                        Number of CPU threads to use for parallel processing (0 = all available)
  --seed SEED           Seed for reproducibility
  --model_path MODEL_PATH
                        Alternative model weights (default models/model.pt). See --use_esm_if1_weights flag to use ESM-IF1 weights instead of AntiFold
  --esm_if1_mode        Use ESM-IF1 weights instead of AntiFold
  --verbose VERBOSE     Verbose printing

IMGT regions dict

Used to specify regions to mutate in an IMGT-numbered PDB

IMGT-numbered PDBs: https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab
Renumber existing PDBs with ANARCI: https://github.com/oxpig/ANARCI
Read more: https://www.imgt.org/IMGTScientificChart/Numbering/IMGTIGVLsuperfamily.html

IMGT_dict = {
    "all": range(1, 128 + 1),
    "allH": range(1, 128 + 1),
    "allL": range(1, 128 + 1),
    "FWH": list(range(1, 26 + 1)) + list(range(40, 55 + 1)) + list(range(66, 104 + 1)),
    "FWL": list(range(1, 26 + 1)) + list(range(40, 55 + 1)) + list(range(66, 104 + 1)),
    "CDRH": list(range(27, 39)) + list(range(56, 65 + 1)) + list(range(105, 117 + 1)),
    "CDRL": list(range(27, 39)) + list(range(56, 65 + 1)) + list(range(105, 117 + 1)),
    "FW1": range(1, 26 + 1),
    "FWH1": range(1, 26 + 1),
    "FWL1": range(1, 26 + 1),
    "CDR1": range(27, 39),
    "CDRH1": range(27, 39),
    "CDRL1": range(27, 39),
    "FW2": range(40, 55 + 1),
    "FWH2": range(40, 55 + 1),
    "FWL2": range(40, 55 + 1),
    "CDR2": range(56, 65 + 1),
    "CDRH2": range(56, 65 + 1),
    "CDRL2": range(56, 65 + 1),
    "FW3": range(66, 104 + 1),
    "FWH3": range(66, 104 + 1),
    "FWL3": range(66, 104 + 1),
    "CDR3": range(105, 117 + 1),
    "CDRH3": range(105, 117 + 1),
    "CDRL3": range(105, 117 + 1),
    "FW4": range(118, 128 + 1),
    "FWH4": range(118, 128 + 1),
    "FWL4": range(118, 128 + 1),
}
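A space-separated region string such as the one passed to --regions can be expanded into concrete IMGT positions with a lookup in this dictionary. The sketch below copies only a few entries and uses a hypothetical helper, positions_for; it illustrates the lookup and is not AntiFold's own implementation.

```python
# A subset of the IMGT_dict entries listed above, copied for a self-contained example
IMGT_REGIONS = {
    "FW1": range(1, 26 + 1),
    "CDR1": range(27, 39),
    "CDR2": range(56, 65 + 1),
    "CDR3": range(105, 117 + 1),
}


def positions_for(regions, region_dict=IMGT_REGIONS):
    """Expand a space-separated region string into sorted IMGT positions."""
    positions = set()
    for name in regions.split():
        if name not in region_dict:
            raise KeyError(f"Unknown region: {name}")
        positions.update(region_dict[name])
    return sorted(positions)


cdr_positions = positions_for("CDR1 CDR2 CDR3")
print(len(cdr_positions), "positions, from", cdr_positions[0], "to", cdr_positions[-1])
```

Because Python ranges exclude their upper bound, range(27, 39) covers IMGT positions 27 to 38, while the entries written as N + 1 include position N.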

Citing this work

The code and data in this package are based on the following AntiFold paper. If you use them, please cite:

@misc{antifold,
      title={AntiFold: Improved antibody structure-based design using inverse folding},
      author={Magnus Haraldson Høie and Alissa Hummer and Tobias H. Olsen and Broncio Aguilar-Sanjuan and Morten Nielsen and Charlotte M. Deane},
      year={2024},
      eprint={2405.03370},
      archivePrefix={arXiv},
      primaryClass={q-bio.BM}
}
