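DiffAbXL

Author: Talip Ucar ([email protected])

This is the implementation of DiffAbXL benchmarked in the paper: Exploring Log-Likelihood Scores for Ranking Antibody Sequence Designs.

Please note that the paper was originally titled "Benchmarking Generative Models for Antibody Design", but we decided to change the title to better highlight its core contribution.
This is a re-implementation of the original work, DiffAb: [Paper and Code]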

Current leaderboard

Rank	Model	Absci HER2 (Zero-Shot)	Absci HER2 (SPR Control)	Nature HEL	Nature HER2	AZ Target-2	Ave. ρ
1	DiffAbXL-A-DN	0.43	0.22	0.62	0.37	0.41	0.41
2	DiffAbXL-A-SG	0.46	0.22	0.64	-0.38	0.43	0.274
3	DiffAbXL-H3-DN	0.49	0	0.52	-0.08	0.37	0.26
4	IgBlend (struct. only)	0.40	0.21	0.54	-0.30	0.31	0.232
5	AntiFold	0.43	0.22	0.4	-0.47	0.38	0.192
6	DiffAbXL-H3-SG	0.48	0	0.4	-0.41	0.29	0.152
7	ESM	0.29	0	0	0.18	0.27	0.148
8	DiffAb	0.34	0.21	0	-0.14	0.22	0.126
9	AbLang2	0.3	0	0	-0.07	0.36	0.118
10	IgBlend (seq. only)	0.27	0	0	-0.1	0.36	0.106
11	AbLang	0.3	0	0	-0.13	0.35	0.104
12	dyMEAN	0.37	0.15	0	0	0	0.104
13	AbX	0.28	0.19	0	0	0	0.094
14	AntiBERTy	0.26	0	0	-0.17	0.35	0.088
15	MEAN	0.36	0	0	0.02	0	0.076
16	ESM-IF	0	-0.27	0	-0.53	0.42	-0.076

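Note 1: Ave. ρ is the average Spearman correlation across the five target datasets. In the leaderboard above, a score of zero is assigned to any model that did not show a statistically significant correlation or was not suitable for scoring (e.g., because it requires an antigen).
Note 2: The log-likelihood scores in this work are computed with a simple approach, described in Equation 11 of the paper, to keep the computation consistent across models. However, more principled methods for computing these scores exist, and they may differ by model type (e.g., autoregressive vs. masked language models). We plan to investigate these alternatives in future work.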
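As a minimal sketch of the averaging rule in Note 1, the snippet below counts a missing score as zero before taking the mean over the five datasets. The function name `average_rho` and the use of `None` for a missing score are illustrative assumptions, not part of the repository. The values come from the DiffAbXL-A-DN and DiffAbXL-H3-DN leaderboard rows; treating the H3-DN SPR control cell (listed as 0) as an assigned zero is also an assumption.

```python
def average_rho(scores):
    """Mean Spearman correlation, counting missing entries (None) as zero."""
    filled = [0.0 if s is None else s for s in scores]
    return sum(filled) / len(filled)


# Column order: Absci zero-shot, Absci SPR control, Nature HEL,
# Nature HER2, AZ Target-2.
diffabxl_a_dn = [0.43, 0.22, 0.62, 0.37, 0.41]
# None marks a cell assumed to be an assigned zero (see Note 1).
diffabxl_h3_dn = [0.49, None, 0.52, -0.08, 0.37]

print(round(average_rho(diffabxl_a_dn), 3))   # 0.41
print(round(average_rho(diffabxl_h3_dn), 3))  # 0.26
```

Because assigned zeros stay in the denominator, a model that cannot be scored on a dataset is penalized rather than averaged over fewer datasets.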

Benchmarking results

1- Correlation between the log-likelihood of DiffAbXL and binding affinity for different targets

Figure 1: DiffAbXL results: a) DiffAbXL-H3-DN for the Absci zero-shot HER2 data, b) DiffAbXL-A-SG for AZ Target-2, c) DiffAbXL-A-SG for Nature HEL, d) DiffAbXL-A-DN for Nature HER2.
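The correlations reported here are Spearman rank correlations between model log-likelihoods and measured binding affinities. The following is a plain-Python sketch of that statistic (Pearson correlation applied to tie-averaged ranks), not the evaluation code from the repository. The helper names and the four input pairs are made-up placeholders for illustration, not data from the paper. In practice `scipy.stats.spearmanr` computes the same coefficient and also returns a p-value.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative placeholder values only.
log_likelihoods = [-12.1, -10.4, -15.3, -9.8]
affinities = [7.9, 8.4, 7.1, 8.2]
rho = spearman(log_likelihoods, affinities)
```

A positive `rho` means sequences the model scores as more likely tend to have higher measured affinity, which is the property the benchmark ranks models on.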

2- Comparing diffusion-based, LLM-based, and graph-based models

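Table 1: Summary of Spearman correlation results. Abbreviations: DN: De Novo mode, SG: Structure Guidance mode, NA: an epitope or complex structure is required but not available. *, **, and *** indicate p-values below 0.05, 0.01, and 1e-4, respectively.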

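How to build an interface for benchmarking your model

To make it easier for us to benchmark your model, we suggest implementing the interface as a Python method in your model class so that we can integrate it with our evaluation pipeline. The method should accept the following inputs:

- Antibody sequences: a list of antibody sequences.
- Optional structure information: structural data associated with the sequences (i.e., a PDB file), if applicable.
- Other model-specific parameters: any additional inputs your model requires.

The method should return a dictionary containing:

- Log-likelihood scores: used to rank the antibody sequences by predicted binding affinity.
- Other relevant metrics: for example RMSD, pAE, or any model-specific output you consider relevant.

Below is a basic Python template for implementing this interface: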

    def benchmark(self, sequences, structure=None, mask=None, **kwargs):
        """
        Benchmark the model on provided antibody sequences and structures.

        Parameters:
        sequences (list of str): List of antibody sequences.
        structure (optional): Path to a PDB file. Currently, only one PDB file is provided per target dataset.
                              The PDB file may contain either just the antibody or an antibody-antigen complex,
                              depending on the dataset.
        mask (optional): Binary list or array indicating the regions of interest in the sequences for metric calculations.
        kwargs (optional): Additional parameters required by the model.

        Returns:
        dict: A dictionary containing log-likelihood scores and other relevant metrics.
        """
        pass

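Please make sure your model outputs log-likelihood scores in a format that we can use directly to benchmark antibody sequence designs. This will help us compare your model's performance across our datasets efficiently.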
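To make the expected shape of the interface concrete, here is a hedged sketch of a class that implements `benchmark` with a deliberately trivial scoring model: position-independent residue frequencies with a pseudocount. The class `ToyFrequencyModel`, the `"log_likelihood"` dictionary key, and the per-residue interpretation of `mask` are assumptions made for illustration. The template above does not fix the key names or mask semantics, and this is not how DiffAbXL itself scores sequences.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class ToyFrequencyModel:
    """Hypothetical stand-in model using residue frequencies only."""

    def __init__(self, reference_sequences, pseudocount=1.0):
        counts = Counter(
            aa for seq in reference_sequences for aa in seq if aa in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        self.log_probs = {
            aa: math.log((counts[aa] + pseudocount) / total) for aa in AMINO_ACIDS
        }
        # Fallback for characters outside the 20 standard residues.
        self.unknown_log_prob = math.log(pseudocount / total)

    def benchmark(self, sequences, structure=None, mask=None, **kwargs):
        """Score each sequence; `structure` is accepted but unused here."""
        scores = []
        for seq in sequences:
            if mask is None:
                positions = range(len(seq))
            else:
                # Assumed semantics: 1 marks a residue to include in the score.
                positions = [i for i, m in enumerate(mask) if m and i < len(seq)]
            scores.append(
                sum(self.log_probs.get(seq[i], self.unknown_log_prob) for i in positions)
            )
        # The key name is an assumption; the template leaves it open.
        return {"log_likelihood": scores}


model = ToyFrequencyModel(["AAAA"])
result = model.benchmark(["AA", "AC"])
masked = model.benchmark(["AA", "AC"], mask=[1, 0])
```

Returning one score per input sequence, in input order, lets the scores be paired directly with measured affinities when computing rank correlations.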

Training

There is a configuration file, sabdab.yaml, that can be used to change any of the parameters. You can train the model with:

 python train.py # For training.

Structure of the repo

- train.py

- src
    |-model.py
    
- config
    |-sabdab.yaml
    
- utils
    |-load_data.py
    |-arguments.py
    |-model_utils.py
    |-loss_functions.py
    ...
    
- data
    |-her2
    ...

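Experiment tracking

Weights & Biases can be used to track experiments. It is turned off by default, but it can be turned on by changing the corresponding option in the configuration file at ./config/sabdab.yaml.

Citing the paper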

 @article {Ucar2024.10.07.617023,
	author = {Ucar, Talip and Malherbe, Cedric and Gonzalez Hernandez, Ferran},
	title = {Exploring Log-Likelihood Scores for Ranking Antibody Sequence Designs},
	elocation-id = {2024.10.07.617023},
	year = {2024},
	doi = {10.1101/2024.10.07.617023},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2024/10/24/2024.10.07.617023},
	eprint = {https://www.biorxiv.org/content/early/2024/10/24/2024.10.07.617023.full.pdf},
	journal = {bioRxiv}
}

Citing this repo

If you use DiffAbXL in your own research or work, please cite it using the following:

 @Misc{talip_ucar_2024_DiffAbXL,
	author =   {Talip Ucar},
	title = {Exploring Log-Likelihood Scores for Ranking Antibody Sequence Designs},
	URL = {https://github.com/AstraZeneca/DiffAbXL},
	month = {October},
	year = {since 2024}
}
