BindCraft

v1.2.0

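A simple binder design pipeline using AlphaFold2 backpropagation, MPNN, and PyRosetta. Select your target and let the script do the rest of the work, stopping once you have enough designs to order!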

Preprint link for BindCraft

Installation

First, you need to clone this repository. Replace [install_folder] with the path where you want to install it.

git clone https://github.com/martinpacesa/BindCraft [install_folder]

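Then navigate into your install folder using cd and run the installation code. BindCraft requires a CUDA-compatible Nvidia graphics card to run. In the cuda setting, specify the CUDA version compatible with your graphics card, for example '11.8'. If unsure, leave it blank, but the installation might then select the wrong version, which will lead to errors. In pkg_manager, specify whether you are using 'mamba' or 'conda'; if left blank, it will use 'conda' by default.

Note: This install script will install PyRosetta, which requires a license for commercial purposes.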

bash install_bindcraft.sh --cuda '12.4' --pkg_manager 'conda'

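Google Colab

We prepared a convenient Google Colab notebook to test the BindCraft code functionality. However, as the pipeline requires a significant amount of GPU memory to run larger target+binder complexes, we highly recommend running it using a local installation and at least 32 GB of GPU memory.

Always try to trim the input target PDB to the smallest size possible! It will significantly speed up binder generation and minimise the GPU memory requirements.

Be ready to run at least a few hundred trajectories to see some accepted binders; for difficult targets it might even be a few thousand.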
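As a rough illustration of the trimming advice, the sketch below keeps only the atom records of one chain within a residue-number range. It is not part of BindCraft: the function names `make_ca_record` and `trim_pdb_lines` are hypothetical, and the only thing it relies on is the fixed-width PDB layout (chain ID in column 22, residue number in columns 23-26).

```python
def make_ca_record(serial, res_name, chain, res_num):
    """Build a minimal fixed-width PDB ATOM record for a CA atom (toy coordinates)."""
    return (f"ATOM  {serial:>5}  CA  {res_name:>3} {chain}{res_num:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def trim_pdb_lines(lines, chain_id, first_res, last_res):
    """Keep ATOM/HETATM records of one chain whose residue number is in range."""
    kept = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 26:
            continue
        chain = line[21]            # column 22: chain identifier
        res_num = int(line[22:26])  # columns 23-26: residue sequence number
        if chain == chain_id and first_res <= res_num <= last_res:
            kept.append(line)
    kept.append("END")
    return kept


# Toy input: two residues on chain A, one on chain B.
example = [
    make_ca_record(1, "ALA", "A", 10),
    make_ca_record(2, "GLY", "A", 55),
    make_ca_record(3, "SER", "B", 10),
]
trimmed = trim_pdb_lines(example, chain_id="A", first_res=1, last_res=50)
print("\n".join(trimmed))
```

On a real structure, the lines would be read from the target PDB file and the result written to a new file, which is then used as starting_pdb. Which region can safely be removed depends on the target and should be checked in a structure viewer.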

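Running the script locally and explanation of settings

To run the script locally, first you need to configure your target .json file in the settings_target folder. The json file contains the following settings: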

design_path               -> path where to save designs and statistics
binder_name               -> what to prefix your designed binder files with
starting_pdb              -> the path to the PDB of your target protein
chains                    -> which chains to target in your protein, rest will be ignored
target_hotspot_residues   -> which positions to target for binder design, for example `1,2-10` or chain specific `A1-10,B1-20` or entire chains `A`; set to null if you want AF2 to select the binding site; better to select multiple target residues or a small patch to reduce the search space for the binder
lengths                   -> range of binder lengths to design
number_of_final_designs   -> how many designs that pass all filters to aim for, script will stop if this many are reached
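A target file with the settings listed above might look like the following sketch. The keys come from the list; every value (paths, chain, hotspot, length range) is a hypothetical placeholder, and the value types (a string for chains, a two-element list for lengths) are assumptions that should be checked against the example json shipped in settings_target.

```python
import json

# All values below are illustrative placeholders, not recommended settings.
target_settings = {
    "design_path": "./designs/my_target/",      # hypothetical output folder
    "binder_name": "my_target",                 # prefix for designed binder files
    "starting_pdb": "./inputs/my_target.pdb",   # hypothetical trimmed target PDB
    "chains": "A",                              # assumed format: chain IDs as a string
    "target_hotspot_residues": "A1-10",         # or None (null in JSON) to let AF2 choose
    "lengths": [65, 150],                       # assumed format: [min, max] binder length
    "number_of_final_designs": 100,
}

# Serialise to JSON text; Python None is written as JSON null.
settings_json = json.dumps(target_settings, indent=4)
print(settings_json)
```

The printed text can be saved as a .json file in settings_target and passed to the script with the settings flag.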

Then run the binder design script:

sbatch ./bindcraft.slurm --settings './settings_target/PDL1.json' --filters './settings_filters/default_filters.json' --advanced './settings_advanced/default_4stage_multimer.json'

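The settings flag should point to the target .json you set up above. The filters flag points to the json where the design filters are specified (default is ./settings_filters/default_filters.json). The advanced flag points to your advanced settings (default is ./settings_advanced/default_4stage_multimer.json). If you leave out the filters and advanced settings flags, they will automatically point to the defaults.

Alternatively, if your machine does not support SLURM, you can run the code directly by activating the conda environment and running the python script: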

conda activate BindCraft
cd /path/to/bindcraft/folder/
python -u ./bindcraft.py --settings './settings_target/PDL1.json' --filters './settings_filters/default_filters.json' --advanced './settings_advanced/default_4stage_multimer.json'

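We recommend generating at least 100 final designs that pass all filters, then ordering the top 5-20 for experimental characterisation. If high-affinity binders are required, it is better to screen more, as the ipTM metric used for ranking is not a good predictor of affinity, but has been shown to be a good binary predictor of binding.

Below are explanations of the individual filters and advanced settings.

Advanced settings

Below are the advanced settings controlling the design process: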
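The ranking step described above can be sketched as a simple sort. This is only an illustration: the record layout, the key name `i_pTM`, and the function `pick_designs_to_order` are assumptions, not the format of the statistics files written by the script.

```python
def pick_designs_to_order(designs, n_to_order=20, score_key="i_pTM"):
    """Return the n_to_order designs with the highest interface pTM."""
    ranked = sorted(designs, key=lambda d: d[score_key], reverse=True)
    return ranked[:n_to_order]


# Toy records standing in for designs that already passed all filters.
designs = [
    {"name": "binder_a", "i_pTM": 0.71},
    {"name": "binder_b", "i_pTM": 0.88},
    {"name": "binder_c", "i_pTM": 0.64},
]
top = pick_designs_to_order(designs, n_to_order=2)
print([d["name"] for d in top])
```

Because ipTM is described as a binary predictor of binding rather than of affinity, the order within the selected set should not be read as an affinity ranking.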

omit_AAs                        -> which amino acids to exclude from design (note: they can still occur if no other options are possible in the position)
force_reject_AA                 -> whether to force reject design if it contains any amino acids specified in omit_AAs
design_algorithm                -> which design algorithm to use for the trajectory, the currently implemented algorithms are below
use_multimer_design             -> whether to use AF2-ptm or AF2-multimer for binder design; the other model will be used for validation then
num_recycles_design             -> how many recycles of AF2 for design
num_recycles_validation         -> how many recycles of AF2 to use for structure prediction and validation
sample_models = True            -> whether to randomly sample parameters from AF2 models, recommended to avoid overfitting
rm_template_seq_design          -> remove target template sequence for design (increases target flexibility)
rm_template_seq_predict         -> remove target template sequence for reprediction (increases target flexibility)
rm_template_sc_design           -> remove sidechains from target template for design
rm_template_sc_predict          -> remove sidechains from target template for reprediction

# Design iterations
soft_iterations                 -> number of soft iterations (all amino acids considered at all positions)
temporary_iterations            -> number of temporary iterations (softmax, most probable amino acids considered at all positions)
hard_iterations                 -> number of hard iterations (one hot encoding, single amino acids considered at all positions)
greedy_iterations               -> number of iterations to sample random mutations from PSSM that reduce loss
greedy_percentage               -> What percentage of protein length to mutate during each greedy iteration

# Design weights, higher value puts more weight on optimising the parameter.
weights_plddt                   -> Design weight - pLDDT of designed chain
weights_pae_intra               -> Design weight - PAE within designed chain
weights_pae_inter               -> Design weight - PAE between chains
weights_con_intra               -> Design weight - maximise number of contacts within designed chain
weights_con_inter               -> Design weight - maximise number of contacts between chains
intra_contact_distance          -> Cbeta-Cbeta cutoff distance for contacts within the binder
inter_contact_distance          -> Cbeta-Cbeta cutoff distance for contacts between binder and target
intra_contact_number            -> how many contacts each contact residue should make within a chain, excluding immediate neighbours
inter_contact_number            -> how many contacts each contact residue should make between chains
weights_helicity                -> Design weight - helix propensity of the design, Default 0, negative values bias towards beta sheets
random_helicity                 -> whether to randomly sample helicity weights for trajectories, from -1 to 1

# Additional losses
use_i_ptm_loss                  -> Use i_ptm loss to optimise for interface pTM score?
weights_iptm                    -> Design weight - i_ptm between chains
use_rg_loss                     -> use radius of gyration loss?
weights_rg                      -> Design weight - radius of gyration weight for binder
use_termini_distance_loss       -> Try to minimise distance between N- and C-terminus of binder? Helpful for grafting
weights_termini_loss            -> Design weight - N- and C-terminus distance minimisation weight of binder

# MPNN settings
mpnn_fix_interface              -> whether to fix the interface designed in the starting trajectory
num_seqs                        -> number of MPNN generated sequences to sample and predict per binder
max_mpnn_sequences              -> how many maximum MPNN sequences per trajectory to save if several pass filters
max_tm-score_filter             -> filter out final lower ranking designs by this TM score cut off relative to all passing designs
max_seq-similarity_filter       -> filter out final lower ranking designs by this sequence similarity cut off relative to all passing designs
sampling_temp = 0.1             -> sampling temperature for amino acids, T=0.0 means taking argmax, T>>1.0 means sampling randomly

# MPNN settings - advanced
sample_seq_parallel             -> how many sequences to sample in parallel, reduce if running out of memory
backbone_noise                  -> backbone noise during sampling, 0.00-0.02 are good values
model_path                      -> path to the MPNN model weights
mpnn_weights                    -> whether to use "original" mpnn weights or "soluble" weights
save_mpnn_fasta                 -> whether to save MPNN sequences as fasta files, normally not needed as the sequence is also in the CSV file

# AF2 design settings - advanced
optimise_beta                   -> optimise predictions if beta sheeted trajectory detected?
optimise_beta_extra_soft        -> how many extra soft iterations to add if beta sheets detected
optimise_beta_extra_temp        -> how many extra temporary iterations to add if beta sheets detected
optimise_beta_recycles_design   -> how many recycles to do during design if beta sheets detected
optimise_beta_recycles_valid    -> how many recycles to do during reprediction if beta sheets detected

# Optimise script
remove_unrelaxed_trajectory     -> remove the PDB files of unrelaxed designed trajectories, relaxed PDBs are retained
remove_unrelaxed_complex        -> remove the PDB files of unrelaxed predicted MPNN-optimised complexes, relaxed PDBs are retained
remove_binder_monomer           -> remove the PDB files of predicted binder monomers after scoring to save space
zip_animations                  -> at the end, zip Animations trajectory folder to save space
zip_plots                       -> at the end, zip Plots trajectory folder to save space
save_trajectory_pickle          -> save pickle file of the generated trajectory, careful, takes up a lot of storage space!
max_trajectories                -> how many maximum trajectories to generate, for benchmarking
acceptance_rate                 -> what fraction of trajectories should yield designs passing the filters, if the proportion of successful designs is less than this fraction then the script will stop and you should adjust your design weights
start_monitoring                -> after what number of trajectories should we start monitoring acceptance_rate, do not set too low, could terminate prematurely

# debug settings
enable_mpnn = True              -> whether to enable MPNN design
enable_rejection_check          -> enable rejection rate check
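The interaction between acceptance_rate and start_monitoring can be sketched as below. This is a minimal reading of the descriptions above, not the script's actual logic: the function name `should_stop` is hypothetical, and whether monitoring begins at or after start_monitoring trajectories is an assumption.

```python
def should_stop(n_trajectories, n_accepted, acceptance_rate, start_monitoring):
    """Return True when the fraction of accepted designs falls below acceptance_rate.

    The check is skipped until start_monitoring trajectories have been generated,
    so that a few early failures do not terminate the run prematurely.
    """
    if n_trajectories < start_monitoring:
        return False
    return (n_accepted / n_trajectories) < acceptance_rate


# Toy numbers, chosen only to exercise the three cases.
print(should_stop(50, 0, acceptance_rate=0.01, start_monitoring=600))    # too early to judge
print(should_stop(600, 3, acceptance_rate=0.01, start_monitoring=600))   # 0.5% accepted
print(should_stop(600, 12, acceptance_rate=0.01, start_monitoring=600))  # 2% accepted
```

If a run stops this way, the descriptions above suggest adjusting the design weights before restarting.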

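Filters

Below are the features by which your designs will be filtered. If you do not want to use a feature, set its threshold to null. The higher option indicates whether values above the threshold should be kept (true) or values below it (false). Features starting with N_ correspond to statistics for each individual AlphaFold model; averages are taken across all predicted models.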

MPNN_score            -> MPNN sequence score, generally not recommended as it depends on protein
MPNN_seq_recovery       -> MPNN sequence recovery of original trajectory
pLDDT             -> pLDDT confidence score of AF2 complex prediction, normalised to 0-1
pTM               -> pTM confidence score of AF2 complex prediction, normalised to 0-1
i_pTM             -> interface pTM confidence score of AF2 complex prediction, normalised to 0-1
pAE               -> predicted alignment error of AF2 complex prediction, normalised to 0-1 by dividing the AF2 value by 31
i_pAE             -> predicted interface alignment error of AF2 complex prediction, normalised to 0-1 by dividing the AF2 value by 31
i_pLDDT             -> interface pLDDT confidence score of AF2 complex prediction, normalised to 0-1
ss_pLDDT            -> secondary structure pLDDT confidence score of AF2 complex prediction, normalised to 0-1
Unrelaxed_Clashes       -> number of interface clashes before relaxation
Relaxed_Clashes         -> number of interface clashes after relaxation
Binder_Energy_Score       -> Rosetta energy score for binder alone
Surface_Hydrophobicity      -> surface hydrophobicity fraction for binder
ShapeComplementarity      -> interface shape complementarity
PackStat            -> interface packstat rosetta score
dG                -> interface rosetta dG energy
dSASA             -> interface delta SASA (size)
dG/dSASA            -> interface energy divided by interface size
Interface_SASA_%        -> Fraction of binder surface covered by the interface
Interface_Hydrophobicity        -> Interface hydrophobicity fraction of binder interface
n_InterfaceResidues       -> number of interface residues
n_InterfaceHbonds       -> number of hydrogen bonds at the interface
InterfaceHbondsPercentage   -> number of hydrogen bonds compared to interface size
n_InterfaceUnsatHbonds      -> number of unsatisfied buried hydrogen bonds at the interface
InterfaceUnsatHbondsPercentage  -> number of unsatisfied buried hydrogen bonds compared to interface size
Interface_Helix%        -> proportion of alpha helices at the interface
Interface_BetaSheet%      -> proportion of beta sheets at the interface
Interface_Loop%         -> proportion of loops at the interface
Binder_Helix%         -> proportion of alpha helices in the binder structure
Binder_BetaSheet%       -> proportion of beta sheets in the binder structure
Binder_Loop%          -> proportion of loops in the binder structure
InterfaceAAs          -> number of amino acids of each type at the interface
HotspotRMSD           -> unaligned RMSD of binder compared to original trajectory, in other words how far is binder in the repredicted complex from the original binding site
Target_RMSD           -> RMSD of target predicted in context of the designed binder compared to input PDB
Binder_pLDDT          -> pLDDT confidence score of binder predicted alone
Binder_pTM            -> pTM confidence score of binder predicted alone
Binder_pAE            -> predicted alignment error of binder predicted alone
Binder_RMSD           -> RMSD of binder predicted alone compared to original trajectory
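The threshold and higher semantics described for the filters can be sketched as follows. The nested dictionary layout, the threshold values, the inclusive comparison, and the helper names `passes_filter` and `passes_all` are all assumptions made for illustration; the authoritative format is the default filters json shipped with the repository.

```python
def passes_filter(value, rule):
    """Check one metric against a rule of the form {"threshold": x, "higher": bool}."""
    threshold = rule.get("threshold")
    if threshold is None:            # a null threshold disables the filter
        return True
    if rule.get("higher", True):     # keep values at or above the threshold
        return value >= threshold
    return value <= threshold        # keep values at or below the threshold


def passes_all(metrics, filters):
    """A design passes only if every filter with a matching metric passes."""
    return all(passes_filter(metrics[name], rule)
               for name, rule in filters.items() if name in metrics)


# Hypothetical rules and metrics; the numbers are not recommended cutoffs.
filters = {
    "pLDDT": {"threshold": 0.8, "higher": True},
    "Relaxed_Clashes": {"threshold": 0, "higher": False},
    "MPNN_score": {"threshold": None, "higher": False},
}
good = {"pLDDT": 0.91, "Relaxed_Clashes": 0, "MPNN_score": 1.3}
bad = {"pLDDT": 0.62, "Relaxed_Clashes": 0, "MPNN_score": 1.3}
print(passes_all(good, filters), passes_all(bad, filters))
```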

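Implemented design algorithms

2stage - design with logits->pssm_semigreedy (faster)
3stage - design with logits->softmax(logits)->one-hot (standard)
4stage - design with logits->softmax(logits)->one-hot->pssm_semigreedy (default, extensive)
greedy - design with random mutations that decrease loss (less memory intensive, slower, less efficient)
mcmc - design with random mutations that decrease loss, similar to Wicky et al. (less memory intensive, slower, less efficient)

Known limitations

Settings might not work for all targets! The number of iterations, design weights, and/or filters might have to be adjusted. Target site selection is also important, but AF2 is very good at detecting good binding sites if no hotspot is specified.
AF2 is worse at predicting/designing hydrophilic interfaces than hydrophobic ones.
Sometimes trajectories can end up deformed or 'squashed'. This is normal for AF2 multimer design, as it is very sensitive to the sequence input, and it cannot be avoided without retraining the model. However, these trajectories are quickly detected and discarded.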
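To make the logits, softmax, and one-hot stages named above more concrete, the toy sketch below shows the three representations for a single sequence position. It is not the AlphaFold2 or BindCraft implementation: the three-letter alphabet, the numbers, and the helper names are invented for illustration, and no gradient-based optimisation is performed.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution over amino acids."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                              # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(logits):
    """Commit to the single most favoured amino acid at this position."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return [1.0 if i == best else 0.0 for i in range(len(logits))]


position_logits = [2.0, 0.5, -1.0]   # toy scores for a 3-letter alphabet
soft = softmax(position_logits)      # every amino acid keeps some probability mass
hard = one_hot(position_logits)      # exactly one amino acid is selected
print(soft, hard)
```

Moving from the soft to the hard representation narrows each position from a mixture of amino acids to a single choice, which matches the progression of soft, temporary, and hard iterations in the advanced settings.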