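Alphafold 3 - Pytorch

Implementation of Alphafold 3 in Pytorch

You can chat with other researchers about this work here

A review of the paper by Sergey

An illustrated guide by Elana P. Simon

A talk by Max Jaderberg

A fork with full Lightning + Hydra support is being maintained by Alex at this repository

A visualization of the molecules of life used in the repository can be viewed and interacted with here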

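Appreciation

Joseph for contributing the Relative Positional Encoding and the Smooth LDDT Loss!
Felipe for contributing the Weighted Rigid Align, Express Coordinates In Frame, Compute Alignment Error, and Centre Random Augmentation modules!
Alex for fixing various issues in the transcribed algorithms
Heng for pointing out inconsistencies with the paper and pull requesting the solutions
Heng for finding an issue with the molecular atom indices for the distogram loss
Wei Lu for catching a few erroneous hyperparameters
Alex for the PDB dataset preparation script!
Milot for optimizing the PDB dataset clustering script!
Alex for basically writing the entire gargantuan flow from parsing the PDB all the way to the molecule and atomic inputs for training
Andrei for working on the weighted PDB dataset sampling!
Jimin for submitting a small fix to an issue with the coordinates being passed into WeightedRigidAlign
@xluo233 for contributing the confidence measures, clash penalty ranking, and sample ranking logic!
sj900 for integrating and testing the WeightedPDBSampler within the PDBDataset and for adding initial support for MSA and template parsing!
@xluo233 again for contributing the logic for computing the model selection score as well as the unresolved rasa!
Fandi for discovering a few inconsistencies in the elucidated atom diffusion module with the supplementary
Paolo for proposing the PDB neutral stable molecule hypothesis!
Dhuvi for fixing a bug related to metal ion molecule ID assignment for Alphafold3Inputs!
Dhuvi for taking up the logic for translating Alphafold3Input to BioMolecule for saving to mmCIF!
Tom (from the Discord channel) for identifying a discrepancy between this codebase's distogram and template unit vector computations and those of OpenFold (and Andrei for helping address the distogram issue)!
Kaihui for identifying a bug in how non-standard atoms were handled in polymer residues!
Andrei for taking up the gradio frontend interface!
Patrick for jaxtyping, Florian for einx, and of course, Alex for einops
Soumith and the Pytorch organization for giving me the opportunity to open source this work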

Install

$ pip install alphafold3-pytorch

Usage

import torch
from alphafold3_pytorch import Alphafold3
from alphafold3_pytorch.utils.model_utils import exclusive_cumsum

alphafold3 = Alphafold3(
    dim_atom_inputs = 77,
    dim_template_feats = 108
)

# mock inputs

seq_len = 16

molecule_atom_indices = torch.randint(0, 2, (2, seq_len)).long()
molecule_atom_lens = torch.full((2, seq_len), 2).long()

atom_seq_len = molecule_atom_lens.sum(dim = -1).amax()
atom_offsets = exclusive_cumsum(molecule_atom_lens)

atom_inputs = torch.randn(2, atom_seq_len, 77)
atompair_inputs = torch.randn(2, atom_seq_len, atom_seq_len, 5)

additional_molecule_feats = torch.randint(0, 2, (2, seq_len, 5))
additional_token_feats = torch.randn(2, seq_len, 33)
is_molecule_types = torch.randint(0, 2, (2, seq_len, 5)).bool()
is_molecule_mod = torch.randint(0, 2, (2, seq_len, 4)).bool()
molecule_ids = torch.randint(0, 32, (2, seq_len))

template_feats = torch.randn(2, 2, seq_len, seq_len, 108)
template_mask = torch.ones((2, 2)).bool()

msa = torch.randn(2, 7, seq_len, 32)
msa_mask = torch.ones((2, 7)).bool()

additional_msa_feats = torch.randn(2, 7, seq_len, 2)

# required for training, but omitted on inference

atom_pos = torch.randn(2, atom_seq_len, 3)

distogram_atom_indices = molecule_atom_lens - 1

distance_labels = torch.randint(0, 37, (2, seq_len, seq_len))
resolved_labels = torch.randint(0, 2, (2, atom_seq_len))

# offset indices correctly

distogram_atom_indices += atom_offsets
molecule_atom_indices += atom_offsets

# train

loss = alphafold3(
    num_recycling_steps = 2,
    atom_inputs = atom_inputs,
    atompair_inputs = atompair_inputs,
    molecule_ids = molecule_ids,
    molecule_atom_lens = molecule_atom_lens,
    additional_molecule_feats = additional_molecule_feats,
    additional_msa_feats = additional_msa_feats,
    additional_token_feats = additional_token_feats,
    is_molecule_types = is_molecule_types,
    is_molecule_mod = is_molecule_mod,
    msa = msa,
    msa_mask = msa_mask,
    templates = template_feats,
    template_mask = template_mask,
    atom_pos = atom_pos,
    distogram_atom_indices = distogram_atom_indices,
    molecule_atom_indices = molecule_atom_indices,
    distance_labels = distance_labels,
    resolved_labels = resolved_labels
)

loss.backward()

# after much training ...

sampled_atom_pos = alphafold3(
    num_recycling_steps = 4,
    num_sample_steps = 16,
    atom_inputs = atom_inputs,
    atompair_inputs = atompair_inputs,
    molecule_ids = molecule_ids,
    molecule_atom_lens = molecule_atom_lens,
    additional_molecule_feats = additional_molecule_feats,
    additional_msa_feats = additional_msa_feats,
    additional_token_feats = additional_token_feats,
    is_molecule_types = is_molecule_types,
    is_molecule_mod = is_molecule_mod,
    msa = msa,
    msa_mask = msa_mask,
    templates = template_feats,
    template_mask = template_mask
)

sampled_atom_pos.shape # (2, <atom_seqlen>, 3)
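
The "offset indices correctly" step in the usage code may be easier to follow on plain lists. The sketch below assumes that `exclusive_cumsum` returns, for each position, the sum of all earlier entries; `exclusive_cumsum_list` is a hypothetical stand-in written for illustration and is not the library function.

```python
from itertools import accumulate


def exclusive_cumsum_list(lengths):
    """For each position, return the sum of all earlier entries (first is 0).

    Hypothetical plain-Python stand-in, assumed to mirror what the
    imported `exclusive_cumsum` does along the last dimension.
    """
    return [total - n for total, n in zip(accumulate(lengths), lengths)]


# Number of atoms in each molecule (token) of one unbatched example
molecule_atom_lens = [2, 3, 1, 4]

# Index of the first atom of each molecule in the flattened atom sequence
atom_offsets = exclusive_cumsum_list(molecule_atom_lens)

# Local index inside each molecule (here: its last atom) ...
local_indices = [n - 1 for n in molecule_atom_lens]

# ... shifted by the offset to give an index over all atoms
distogram_atom_indices = [i + off for i, off in zip(local_indices, atom_offsets)]

print(atom_offsets)            # [0, 2, 5, 6]
print(distogram_atom_indices)  # [1, 4, 5, 9]
```

Without the offset, every molecule would point into the first few atoms of the sequence rather than into its own atoms.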

An example with molecule level input handling

import torch
from alphafold3_pytorch import Alphafold3, Alphafold3Input

contrived_protein = 'AG'

mock_atompos = [
    torch.randn(5, 3),   # alanine has 5 non-hydrogen atoms
    torch.randn(4, 3)    # glycine has 4 non-hydrogen atoms
]

train_alphafold3_input = Alphafold3Input(
    proteins = [contrived_protein],
    atom_pos = mock_atompos
)

eval_alphafold3_input = Alphafold3Input(
    proteins = [contrived_protein]
)

# training

alphafold3 = Alphafold3(
    dim_atom_inputs = 3,
    dim_atompair_inputs = 5,
    atoms_per_window = 27,
    dim_template_feats = 108,
    num_molecule_mods = 0,
    confidence_head_kwargs = dict(
        pairformer_depth = 1
    ),
    template_embedder_kwargs = dict(
        pairformer_stack_depth = 1
    ),
    msa_module_kwargs = dict(
        depth = 1
    ),
    pairformer_stack = dict(
        depth = 2
    ),
    diffusion_module_kwargs = dict(
        atom_encoder_depth = 1,
        token_transformer_depth = 1,
        atom_decoder_depth = 1,
    )
)

loss = alphafold3.forward_with_alphafold3_inputs([train_alphafold3_input])
loss.backward()

# sampling

alphafold3.eval()
sampled_atom_pos = alphafold3.forward_with_alphafold3_inputs(eval_alphafold3_input)

assert sampled_atom_pos.shape == (1, (5 + 4), 3)
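
The final assertion follows from counting non-hydrogen atoms per residue. As a minimal sketch, the helper below reproduces that arithmetic for a one-letter protein sequence. The counts for `A` and `G` come from the example above; the remaining entries, and the function names `expected_atom_count` and `expected_sampled_shape`, are illustrative assumptions and not part of the package.

```python
# Non-hydrogen atom counts per residue. A and G are taken from the example;
# the other entries are standard values added here for illustration only.
HEAVY_ATOMS_PER_RESIDUE = {
    "G": 4,  # glycine
    "A": 5,  # alanine
    "S": 6,  # serine
    "V": 7,  # valine
    "L": 8,  # leucine
}


def expected_atom_count(sequence: str) -> int:
    """Sum the per-residue non-hydrogen atom counts over a sequence."""
    return sum(HEAVY_ATOMS_PER_RESIDUE[residue] for residue in sequence)


def expected_sampled_shape(sequence: str, batch_size: int = 1) -> tuple:
    """Shape of sampled coordinates: (batch, atoms, xyz)."""
    return (batch_size, expected_atom_count(sequence), 3)


print(expected_sampled_shape("AG"))  # (1, 9, 3)
```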

Data preparation

PDB dataset curation

To acquire the AlphaFold 3 PDB dataset, first download all first-assembly (and asymmetric unit) complexes in the Protein Data Bank (PDB), and then preprocess them with the scripts referenced below. The PDB can be downloaded from the RCSB: https://www.wwpdb.org/ftp/pdb-ftp-sites#rcsbpdb. The two Python scripts below (i.e., filter_pdb_{train,val,test}_mmcifs.py and cluster_pdb_{train,val,test}_mmcifs.py) assume you have downloaded the PDB in the mmCIF file format, placing its first-assembly and asymmetric unit mmCIF files at data/pdb_data/unfiltered_assembly_mmcifs/ and data/pdb_data/unfiltered_asym_mmcifs/, respectively.

For reproducibility, we recommend downloading the PDB using AWS snapshots (e.g., 20240101). To do so, refer to AWS's documentation to set up the AWS CLI locally. Alternatively, on the RCSB website, navigate down to "Download Protocols" and follow the download instructions for your location.

For example, one can use the following commands to download the PDB as two collections of mmCIF files:

# For `assembly1` complexes, use the PDB's `20240101` AWS snapshot:
aws s3 sync s3://pdbsnapshots/20240101/pub/pdb/data/assemblies/mmCIF/divided/ ./data/pdb_data/unfiltered_assembly_mmcifs
# Or as a fallback, use rsync:
rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::ftp_data/assemblies/mmCIF/divided/ ./data/pdb_data/unfiltered_assembly_mmcifs/

# For asymmetric unit complexes, also use the PDB's `20240101` AWS snapshot:
aws s3 sync s3://pdbsnapshots/20240101/pub/pdb/data/structures/divided/mmCIF/ ./data/pdb_data/unfiltered_asym_mmcifs
# Or as a fallback, use rsync:
rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::ftp_data/structures/divided/mmCIF/ ./data/pdb_data/unfiltered_asym_mmcifs/
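
To pin a different snapshot without editing two commands by hand, the `aws s3 sync` invocations can be generated from the snapshot date. This is a sketch only: the bucket, remote paths, and local directories are copied from the commands above, while `s3_sync_command` and `COLLECTIONS` are hypothetical names introduced for illustration.

```python
import shlex

SNAPSHOT_BUCKET = "s3://pdbsnapshots"

# Remote sub-path and local target for each collection, as in the commands above
COLLECTIONS = {
    "assembly": (
        "pub/pdb/data/assemblies/mmCIF/divided/",
        "./data/pdb_data/unfiltered_assembly_mmcifs",
    ),
    "asym": (
        "pub/pdb/data/structures/divided/mmCIF/",
        "./data/pdb_data/unfiltered_asym_mmcifs",
    ),
}


def s3_sync_command(snapshot: str, collection: str) -> list:
    """Build the argument list for one `aws s3 sync` call."""
    remote_path, local_dir = COLLECTIONS[collection]
    source = f"{SNAPSHOT_BUCKET}/{snapshot}/{remote_path}"
    return ["aws", "s3", "sync", source, local_dir]


# Print both commands for the snapshot used in this guide
for name in COLLECTIONS:
    print(shlex.join(s3_sync_command("20240101", name)))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the AWS CLI is configured.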

WARNING: Downloading the PDB can take up to 700GB of space.

NOTE: The PDB hosts all available AWS snapshots here: https://pdbsnapshots.s3.us-west-2.amazonaws.com/index.html.

After downloading, you should have two directories formatted like this: https://files.rcsb.org/pub/pdb/data/assemblies/mmCIF/divided/ and https://files.rcsb.org/pub/pdb/data/structures/divided/mmCIF/

00/
01/
02/
..
zz/

In these directories, unzip all the files:

find ./data/pdb_data/unfiltered_assembly_mmcifs/ -type f -name "*.gz" -exec gzip -d {} \;
find ./data/pdb_data/unfiltered_asym_mmcifs/ -type f -name "*.gz" -exec gzip -d {} \;
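
Where `find` and `gzip` are not available, the same decompression step can be approximated in Python. The sketch below uses only the standard library; `gunzip_tree` is a hypothetical helper name and, like `gzip -d`, it removes each archive after writing the decompressed file.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_tree(root: str) -> int:
    """Decompress every *.gz file under root and return how many were handled."""
    count = 0
    # Collect the paths first so the tree is not modified while it is walked
    for gz_path in sorted(Path(root).rglob("*.gz")):
        if not gz_path.is_file():
            continue
        target = gz_path.with_suffix("")  # drops only the trailing ".gz"
        with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # streams, so large files are fine
        gz_path.unlink()  # mirror `gzip -d`, which deletes the archive
        count += 1
    return count
```

For example, `gunzip_tree("./data/pdb_data/unfiltered_asym_mmcifs/")` would cover the second `find` command.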

Next, run the commands

wget -P ./data/ccd_data/ https://files.wwpdb.org/pub/pdb/data/monomers/components.cif.gz
wget -P ./data/ccd_data/ https://files.wwpdb.org/pub/pdb/data/component-models/complete/chem_comp_model.cif.gz

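from the project's root directory to download the latest version of the PDB's Chemical Component Dictionary (CCD) and its structural models. Extract each of these files using the following command: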

find data/ccd_data/ -type f -name "*.gz" -exec gzip -d {} \;

PDB dataset filtering

Then run the following commands with pdb_assembly_dir, pdb_asym_dir, ccd_dir, and mmcif_output_dir replaced with the locations of your local copies of the first-assembly PDB, asymmetric unit PDB, CCD, and your desired dataset output directory (i.e., ./data/pdb_data/unfiltered_assembly_mmcifs/, ./data/pdb_data/unfiltered_asym_mmcifs/, ./data/ccd_data/, and ./data/pdb_data/{train,val,test}_mmcifs/).

python scripts/filter_pdb_train_mmcifs.py --mmcif_assembly_dir <pdb_assembly_dir> --mmcif_asym_dir <pdb_asym_dir> --ccd_dir <ccd_dir> --output_dir <mmcif_output_dir>
python scripts/filter_pdb_val_mmcifs.py --mmcif_assembly_dir <pdb_assembly_dir> --mmcif_asym_dir <pdb_asym_dir> --output_dir <mmcif_output_dir>
python scripts/filter_pdb_test_mmcifs.py --mmcif_assembly_dir <pdb_assembly_dir> --mmcif_asym_dir <pdb_asym_dir> --output_dir <mmcif_output_dir>

See the scripts for more options. Each first-assembly mmCIF that successfully passes all processing steps will be written to mmcif_output_dir within a subdirectory named according to the mmCIF's second and third PDB ID characters (e.g. 5c).
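
The subdirectory rule can be illustrated with a short sketch. It assumes four-character PDB IDs; the function name `output_path_for` and the `-assembly1.cif` file name are hypothetical, chosen only to show where the two-character subdirectory comes from.

```python
from pathlib import Path


def output_path_for(pdb_id: str, output_dir: str) -> Path:
    """Place a structure under <output_dir>/<2nd and 3rd ID characters>/."""
    pdb_id = pdb_id.lower()
    if len(pdb_id) != 4:
        raise ValueError(f"expected a four-character PDB ID, got {pdb_id!r}")
    subdir = pdb_id[1:3]  # second and third characters of the ID
    # The file name below is an assumption made for this illustration
    return Path(output_dir) / subdir / f"{pdb_id}-assembly1.cif"


path = output_path_for("15C8", "./data/pdb_data/train_mmcifs")
print(path.parent.name)  # 5c
```

This is the same middle-two-character layout as the `00/` to `zz/` directories of the downloaded PDB.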

PDB dataset clustering

Next, run the following commands with mmcif_dir and {train,val,test}_clustering_output_dir replaced, respectively, with the local output directory created by the dataset filtering scripts above and with your desired clustering output directory (i.e., ./data/pdb_data/{train,val,test}_mmcifs/ and ./data/pdb_data/data_caches/{train,val,test}_clusterings/):

python scripts/cluster_pdb_train_mmcifs.py --mmcif_dir <mmcif_dir> --output_dir <train_clustering_output_dir> --clustering_filtered_pdb_dataset
python scripts/cluster_pdb_val_mmcifs.py --mmcif_dir <mmcif_dir> --reference_clustering_dir <train_clustering_output_dir> --output_dir <val_clustering_output_dir> --clustering_filtered_pdb_dataset
python scripts/cluster_pdb_test_mmcifs.py --mmcif_dir <mmcif_dir> --reference_1_clustering_dir <train_clustering_output_dir> --reference_2_clustering_dir <val_clustering_output_dir> --output_dir <test_clustering_output_dir> --clustering_filtered_pdb_dataset

Note: The --clustering_filtered_pdb_dataset flag is recommended when clustering the filtered PDB dataset as curated using the scripts above, as this flag enables faster runtimes in this context (since filtering leaves each chain's residue IDs 1-based). However, this flag must not be provided when clustering other (i.e., non-PDB) datasets of mmCIF files. Otherwise, interface clustering may be performed incorrectly, as these datasets' mmCIF files may not use strict 1-based residue indexing for each chain.
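
The three clustering commands depend on each other: validation refers to the training clustering, and test refers to both. The sketch below builds the argument lists from a single data root, using only the script names, flags, and default directories shown above. The function `clustering_commands` and its `filtered_pdb` switch are hypothetical conveniences, not part of the repository.

```python
def clustering_commands(data_root: str = "./data/pdb_data", filtered_pdb: bool = True):
    """Return the train, val, and test clustering commands as argument lists."""
    splits = ("train", "val", "test")
    out = {s: f"{data_root}/data_caches/{s}_clusterings" for s in splits}

    # Later splits reference the clusterings produced for earlier ones
    references = {
        "train": [],
        "val": ["--reference_clustering_dir", out["train"]],
        "test": [
            "--reference_1_clustering_dir", out["train"],
            "--reference_2_clustering_dir", out["val"],
        ],
    }

    commands = []
    for split in splits:
        cmd = [
            "python", f"scripts/cluster_pdb_{split}_mmcifs.py",
            "--mmcif_dir", f"{data_root}/{split}_mmcifs",
            *references[split],
            "--output_dir", out[split],
        ]
        if filtered_pdb:
            # Only valid for the filtered PDB dataset (1-based residue IDs)
            cmd.append("--clustering_filtered_pdb_dataset")
        commands.append(cmd)
    return commands


for cmd in clustering_commands():
    print(" ".join(cmd))
```

Passing `filtered_pdb=False` drops the flag, which is the setting the note above requires for non-PDB mmCIF datasets. The commands must still run in train, val, test order.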

Note: Alternatively, one can download the preprocessed (i.e., filtered) mmCIF (train / val / test) files (~25GB, comprising 148k complexes) and the chain/interface clustering (train / val / test) files (~3GB) for the PDB's 20240101 AWS snapshot via a shared OneDrive folder. Each of these tar.gz archives should be decompressed within the data/pdb_data/ directory (e.g., via tar -xzf data_caches.tar.gz -C data/pdb_data/). One can also download and prepare the PDB distillation data using the script scripts/distillation_data_download.sh as a reference. Once downloaded, one can run scripts/reduce_uniprot_predictions_to_pdb.py to filter this dataset down to only the examples associated with at least one PDB entry. Moreover, for convenience, a mapping of UniProt accession IDs to PDB IDs for training on PDB distillation data has already been downloaded and extracted as data/afdb_data/data_caches/uniprot_to_pdb_id_mapping.dat.

Contributing

At the project root, run

$ sh ./contribute.sh

Then, add your module to alphafold3_pytorch/alphafold3.py, add your tests to tests/test_af3.py, and submit a pull request. You can run the tests locally with

$ pytest tests/

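Docker Image

The included Dockerfile contains the dependencies required to run the package and to train or run inference using PyTorch with GPUs.

The default base image is pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime, and the build installs the latest version of this package from the main GitHub branch.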

## Build Docker Container
docker build -t af3 .

Alternatively, use build arguments to rebuild the image with different software versions:

PYTORCH_TAG: Changes the base image, and thus builds with different PyTorch, CUDA, and/or cuDNN versions.
GIT_TAG: Changes the tag of this repo that is cloned to install the package.

For example:

## Use build argument to change versions
docker build --build-arg "PYTORCH_TAG=2.2.1-cuda12.1-cudnn8-devel" --build-arg "GIT_TAG=0.1.15" -t af3 .

Then, run the container with GPUs and mount a local volume (for training) using the following command:

## Run Container
docker run -v .:/data --gpus all -it af3

Citations

 @article { Abramson2024-fj ,
  title    = " Accurate structure prediction of biomolecular interactions with
              {AlphaFold} 3 " ,
  author   = " Abramson, Josh and Adler, Jonas and Dunger, Jack and Evans,
              Richard and Green, Tim and Pritzel, Alexander and Ronneberger,
              Olaf and Willmore, Lindsay and Ballard, Andrew J and Bambrick,
              Joshua and Bodenstein, Sebastian W and Evans, David A and Hung,
              Chia-Chun and O'Neill, Michael and Reiman, David and
              Tunyasuvunakool, Kathryn and Wu, Zachary and {\v Z}emgulyt{\.e},
              Akvil{\.e} and Arvaniti, Eirini and Beattie, Charles and
              Bertolli, Ottavia and Bridgland, Alex and Cherepanov, Alexey and
              Congreve, Miles and Cowen-Rivers, Alexander I and Cowie, Andrew
              and Figurnov, Michael and Fuchs, Fabian B and Gladman, Hannah and
              Jain, Rishub and Khan, Yousuf A and Low, Caroline M R and Perlin,
              Kuba and Potapenko, Anna and Savy, Pascal and Singh, Sukhdeep and
              Stecula, Adrian and Thillaisundaram, Ashok and Tong, Catherine
              and Yakneen, Sergei and Zhong, Ellen D and Zielinski, Michal and
              {\v Z}{\'i}dek, Augustin and Bapst, Victor and Kohli, Pushmeet
              and Jaderberg, Max and Hassabis, Demis and Jumper, John M " ,
  journal  = " Nature " ,
  month    = " May " ,
  year     =  2024
}

 @inproceedings { Darcet2023VisionTN ,
    title   = { Vision Transformers Need Registers } ,
    author  = { Timoth{\'e}e Darcet and Maxime Oquab and Julien Mairal and Piotr Bojanowski } ,
    year    = { 2023 } ,
    url     = { https://api.semanticscholar.org/CorpusID:263134283 }
}

 @article { Arora2024SimpleLA ,
    title   = { Simple linear attention language models balance the recall-throughput tradeoff } ,
    author  = { Simran Arora and Sabri Eyuboglu and Michael Zhang and Aman Timalsina and Silas Alberti and Dylan Zinsley and James Zou and Atri Rudra and Christopher R{\'e} } ,
    journal = { ArXiv } ,
    year    = { 2024 } ,
    volume  = { abs/2402.18668 } ,
    url     = { https://api.semanticscholar.org/CorpusID:268063190 }
}

 @article { Puny2021FrameAF ,
    title   = { Frame Averaging for Invariant and Equivariant Network Design } ,
    author  = { Omri Puny and Matan Atzmon and Heli Ben-Hamu and Edward James Smith and Ishan Misra and Aditya Grover and Yaron Lipman } ,
    journal = { ArXiv } ,
    year    = { 2021 } ,
    volume  = { abs/2110.03336 } ,
    url     = { https://api.semanticscholar.org/CorpusID:238419638 }
}

 @article { Duval2023FAENetFA ,
    title   = { FAENet: Frame Averaging Equivariant GNN for Materials Modeling } ,
    author  = { Alexandre Duval and Victor Schmidt and Alex Hernandez Garcia and Santiago Miret and Fragkiskos D. Malliaros and Yoshua Bengio and David Rolnick } ,
    journal = { ArXiv } ,
    year    = { 2023 } ,
    volume  = { abs/2305.05577 } ,
    url     = { https://api.semanticscholar.org/CorpusID:258564608 }
}

 @article { Wang2022DeepNetST ,
    title   = { DeepNet: Scaling Transformers to 1, 000 Layers } ,
    author  = { Hongyu Wang and Shuming Ma and Li Dong and Shaohan Huang and Dongdong Zhang and Furu Wei } ,
    journal = { ArXiv } ,
    year    = { 2022 } ,
    volume  = { abs/2203.00555 } ,
    url     = { https://api.semanticscholar.org/CorpusID:247187905 }
}

 @inproceedings { Ainslie2023CoLT5FL ,
    title   = { CoLT5: Faster Long-Range Transformers with Conditional Computation } ,
    author  = { Joshua Ainslie and Tao Lei and Michiel de Jong and Santiago Onta{\~n}{\'o}n and Siddhartha Brahma and Yury Zemlyanskiy and David Uthus and Mandy Guo and James Lee-Thorp and Yi Tay and Yun-Hsuan Sung and Sumit Sanghai } ,
    year    = { 2023 }
}

 @article { Ash2019OnTD ,
    title   = { On the Difficulty of Warm-Starting Neural Network Training } ,
    author  = { Jordan T. Ash and Ryan P. Adams } ,
    journal = { ArXiv } ,
    year    = { 2019 } ,
    volume  = { abs/1910.08475 } ,
    url     = { https://api.semanticscholar.org/CorpusID:204788802 }
}

 @ARTICLE { Heinzinger2023.07.23.550085 ,
    author  = { Michael Heinzinger and Konstantin Weissenow and Joaquin Gomez Sanchez and Adrian Henkel and Martin Steinegger and Burkhard Rost } ,
    title   = { ProstT5: Bilingual Language Model for Protein Sequence and Structure } ,
    year    = { 2023 } ,
    doi     = { 10.1101/2023.07.23.550085 } ,
    journal = { bioRxiv }
}

 @article { Lin2022.07.20.500902 ,
    author  = { Lin, Zeming and Akin, Halil and Rao, Roshan and Hie, Brian and Zhu, Zhongkai and Lu, Wenting and Santos Costa, Allan dos and Fazel-Zarandi, Maryam and Sercu, Tom and Candido, Sal and Rives, Alexander } ,
    title   = { Language models of protein sequences at the scale of evolution enable accurate structure prediction } ,
    elocation-id = { 2022.07.20.500902 } ,
    year    = { 2022 } ,
    doi     = { 10.1101/2022.07.20.500902 } ,
    publisher = { Cold Spring Harbor Laboratory } ,
    URL     = { https://www.biorxiv.org/content/early/2022/07/21/2022.07.20.500902 } ,
    eprint  = { https://www.biorxiv.org/content/early/2022/07/21/2022.07.20.500902.full.pdf } ,
    journal = { bioRxiv }
}

 @article { Li2024SwitchEA ,
    title   = { Switch EMA: A Free Lunch for Better Flatness and Sharpness } ,
    author  = { Siyuan Li and Zicheng Liu and Juanxi Tian and Ge Wang and Zedong Wang and Weiyang Jin and Di Wu and Cheng Tan and Tao Lin and Yang Liu and Baigui Sun and Stan Z. Li } ,
    journal = { ArXiv } ,
    year    = { 2024 } ,
    volume  = { abs/2402.09240 } ,
    url     = { https://api.semanticscholar.org/CorpusID:267657558 }
}

 @article { Nguyen2023MitigatingOI ,
    title   = { Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals } ,
    author  = { Tam Nguyen and Tan M. Nguyen and Richard G. Baraniuk } ,
    journal = { ArXiv } ,
    year    = { 2023 } ,
    volume  = { abs/2312.00751 } ,
    url     = { https://api.semanticscholar.org/CorpusID:264300597 }
}

 @inproceedings { Zhou2024ValueRL ,
    title   = { Value Residual Learning For Alleviating Attention Concentration In Transformers } ,
    author  = { Zhanchao Zhou and Tianyi Wu and Zhiyun Jiang and Zhenzhong Lan } ,
    year    = { 2024 } ,
    url     = { https://api.semanticscholar.org/CorpusID:273532030 }
}