
Alphafold 3 - Pytorch

Implementation of Alphafold 3 in Pytorch

You can chat with other researchers about this work here

Review of the paper by Sergey

Illustrated guide by Elana P. Simon

Talk by Max Jaderberg

A fork with full Lightning + Hydra support is being maintained by Alex at this repository

A visualization of the molecules of life used in the repository can be seen and interacted with here

Appreciation

Joseph for contributing the Relative Positional Encoding and the Smooth LDDT Loss!
Felipe for contributing the Weighted Rigid Align, Express Coordinates In Frame, Compute Alignment Error, and Center Random Augmentation modules!
Alex for fixing various issues in the transcribed algorithms
Heng for pointing out inconsistencies with the paper and pull requesting the solutions
Heng for finding an issue with the molecular atom indices for the distogram loss
Wei Lu for catching a few erroneous hyperparameters
Alex for the PDB dataset preparation script!
Milot for optimizing the PDB dataset clustering script!
Alex for writing essentially the entire gargantuan flow from parsing the PDB all the way to the molecule and atom inputs for training
Andrei for working on the weighted PDB dataset sampling!
Jimin for submitting a small fix to an issue with the coordinates being passed into WeightedRigidAlign
@xluo233 for contributing the confidence measures, clash penalty ranking, and sample ranking logic!
sj900 for integrating and testing the WeightedPDBSampler within the PDBDataset and for adding initial support for MSA and template parsing!
@xluo233 again for contributing the logic for computing the model selection score as well as the unresolved rasa!
Fandi for discovering a few inconsistencies between the elucidated atom diffusion module and the supplementary material
Paolo for proposing the PDB neutral stable molecule hypothesis!
Dhuvi for fixing a bug related to metal ion molecule ID assignment for Alphafold3Inputs!
Dhuvi for taking on the logic for translating Alphafold3Input to BioMolecule for saving to mmCIF!
Tom (from the Discord channel) for identifying a discrepancy between this codebase's distogram and template unit vector computations and those of OpenFold (and Andrei for helping address the distogram issue)!
Kaihui for identifying a bug in how non-standard atoms were handled in polymer residues!
Andrei for taking on the gradio frontend interface!
Patrick for jaxtyping, Florian for einx, and of course, Alex for einops
Soumith and the Pytorch organization for giving me the opportunity to open source this work

Install

$ pip install alphafold3-pytorch

Usage

import torch
from alphafold3_pytorch import Alphafold3
from alphafold3_pytorch.utils.model_utils import exclusive_cumsum

alphafold3 = Alphafold3(
    dim_atom_inputs = 77,
    dim_template_feats = 108
)

# mock inputs

seq_len = 16

molecule_atom_indices = torch.randint(0, 2, (2, seq_len)).long()
molecule_atom_lens = torch.full((2, seq_len), 2).long()

atom_seq_len = molecule_atom_lens.sum(dim = -1).amax()
atom_offsets = exclusive_cumsum(molecule_atom_lens)

atom_inputs = torch.randn(2, atom_seq_len, 77)
atompair_inputs = torch.randn(2, atom_seq_len, atom_seq_len, 5)

additional_molecule_feats = torch.randint(0, 2, (2, seq_len, 5))
additional_token_feats = torch.randn(2, seq_len, 33)
is_molecule_types = torch.randint(0, 2, (2, seq_len, 5)).bool()
is_molecule_mod = torch.randint(0, 2, (2, seq_len, 4)).bool()
molecule_ids = torch.randint(0, 32, (2, seq_len))

template_feats = torch.randn(2, 2, seq_len, seq_len, 108)
template_mask = torch.ones((2, 2)).bool()

msa = torch.randn(2, 7, seq_len, 32)
msa_mask = torch.ones((2, 7)).bool()

additional_msa_feats = torch.randn(2, 7, seq_len, 2)

# required for training, but omitted on inference

atom_pos = torch.randn(2, atom_seq_len, 3)

distogram_atom_indices = molecule_atom_lens - 1

distance_labels = torch.randint(0, 37, (2, seq_len, seq_len))
resolved_labels = torch.randint(0, 2, (2, atom_seq_len))

# offset indices correctly

distogram_atom_indices += atom_offsets
molecule_atom_indices += atom_offsets

# train

loss = alphafold3(
    num_recycling_steps = 2,
    atom_inputs = atom_inputs,
    atompair_inputs = atompair_inputs,
    molecule_ids = molecule_ids,
    molecule_atom_lens = molecule_atom_lens,
    additional_molecule_feats = additional_molecule_feats,
    additional_msa_feats = additional_msa_feats,
    additional_token_feats = additional_token_feats,
    is_molecule_types = is_molecule_types,
    is_molecule_mod = is_molecule_mod,
    msa = msa,
    msa_mask = msa_mask,
    templates = template_feats,
    template_mask = template_mask,
    atom_pos = atom_pos,
    distogram_atom_indices = distogram_atom_indices,
    molecule_atom_indices = molecule_atom_indices,
    distance_labels = distance_labels,
    resolved_labels = resolved_labels
)

loss.backward()

# after much training ...

sampled_atom_pos = alphafold3(
    num_recycling_steps = 4,
    num_sample_steps = 16,
    atom_inputs = atom_inputs,
    atompair_inputs = atompair_inputs,
    molecule_ids = molecule_ids,
    molecule_atom_lens = molecule_atom_lens,
    additional_molecule_feats = additional_molecule_feats,
    additional_msa_feats = additional_msa_feats,
    additional_token_feats = additional_token_feats,
    is_molecule_types = is_molecule_types,
    is_molecule_mod = is_molecule_mod,
    msa = msa,
    msa_mask = msa_mask,
    templates = template_feats,
    template_mask = template_mask
)

sampled_atom_pos.shape # (2, <atom_seqlen>, 3)
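In the example above, `molecule_atom_indices` and `distogram_atom_indices` start out as indices local to each token's own atoms, and adding `exclusive_cumsum(molecule_atom_lens)` turns them into positions in the flattened atom sequence. The pure-Python sketch below illustrates that arithmetic for a single sequence. It assumes `exclusive_cumsum` returns, at position i, the sum of all lengths before i; it is not the package's implementation, which operates on batched tensors, and `offset_indices` is a hypothetical helper.

```python
from itertools import accumulate


def exclusive_cumsum(lengths):
    """Exclusive prefix sum: element i is sum(lengths[:i]).

    Assumption: mirrors what the package's exclusive_cumsum computes,
    but for a plain list instead of a batched tensor.
    """
    # accumulate(..., initial=0) yields len(lengths) + 1 running totals;
    # dropping the final total gives the exclusive form.
    return list(accumulate(lengths, initial=0))[:-1]


def offset_indices(local_indices, lengths):
    """Hypothetical helper: shift per-token atom indices (relative to each
    token's own atoms) into positions in the flattened atom sequence."""
    offsets = exclusive_cumsum(lengths)
    return [idx + off for idx, off in zip(local_indices, offsets)]


lens = [2, 2, 2, 2]                             # four tokens, two atoms each
print(exclusive_cumsum(lens))                   # [0, 2, 4, 6]
print(offset_indices([1, 0, 1, 1], lens))       # [1, 2, 5, 7]
# lengths - 1 selects each token's last atom, as distogram_atom_indices does
print(offset_indices([n - 1 for n in lens], lens))  # [1, 3, 5, 7]
```

Without the offset, every token would point into the first few atoms of the flattened sequence, which is why the usage example adds `atom_offsets` before calling the model.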

Example of handling molecule-level input

import torch
from alphafold3_pytorch import Alphafold3, Alphafold3Input

contrived_protein = 'AG'

mock_atompos = [
    torch.randn(5, 3),   # alanine has 5 non-hydrogen atoms
    torch.randn(4, 3)    # glycine has 4 non-hydrogen atoms
]

train_alphafold3_input = Alphafold3Input(
    proteins = [contrived_protein],
    atom_pos = mock_atompos
)

eval_alphafold3_input = Alphafold3Input(
    proteins = [contrived_protein]
)

# training

alphafold3 = Alphafold3(
    dim_atom_inputs = 3,
    dim_atompair_inputs = 5,
    atoms_per_window = 27,
    dim_template_feats = 108,
    num_molecule_mods = 0,
    confidence_head_kwargs = dict(
        pairformer_depth = 1
    ),
    template_embedder_kwargs = dict(
        pairformer_stack_depth = 1
    ),
    msa_module_kwargs = dict(
        depth = 1
    ),
    pairformer_stack = dict(
        depth = 2
    ),
    diffusion_module_kwargs = dict(
        atom_encoder_depth = 1,
        token_transformer_depth = 1,
        atom_decoder_depth = 1,
    )
)

loss = alphafold3.forward_with_alphafold3_inputs([train_alphafold3_input])
loss.backward()

# sampling

alphafold3.eval()
sampled_atom_pos = alphafold3.forward_with_alphafold3_inputs(eval_alphafold3_input)

assert sampled_atom_pos.shape == (1, (5 + 4), 3)
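The shape assertion above depends on counting the non-hydrogen atoms of each residue (5 for alanine, 4 for glycine). When preparing `mock_atompos` for other sequences, a small lookup table makes that count explicit. This is a sketch under an assumption: it counts the standard heavy atoms of each amino acid excluding the C-terminal OXT atom, which matches the two counts in the example, but the package may treat terminal or modified residues differently. `HEAVY_ATOMS`, `expected_atom_count`, and `expected_shape` are hypothetical names.

```python
# Heavy-atom (non-hydrogen) counts per standard amino acid, excluding the
# C-terminal OXT atom. Assumption: consistent with the example's counting
# (alanine = 5, glycine = 4); not taken from the package itself.
HEAVY_ATOMS = {
    "G": 4, "A": 5, "S": 6, "C": 6, "V": 7, "T": 7, "P": 7,
    "I": 8, "L": 8, "D": 8, "N": 8, "M": 8, "E": 9, "Q": 9,
    "K": 9, "H": 10, "F": 11, "R": 11, "Y": 12, "W": 14,
}


def expected_atom_count(proteins):
    """Total heavy atoms across a list of one-letter protein sequences."""
    return sum(HEAVY_ATOMS[res] for seq in proteins for res in seq)


def expected_shape(proteins, batch_size=1):
    """Expected shape of sampled coordinates: (batch, atoms, xyz)."""
    return (batch_size, expected_atom_count(proteins), 3)


print(expected_shape(["AG"]))  # (1, 9, 3)
```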

Data preparation

PDB dataset curation

To acquire the AlphaFold 3 PDB dataset, first download all first-assembly (and asymmetric unit) complexes in the Protein Data Bank (PDB), and then preprocess them with the scripts referenced below. The PDB can be downloaded from the RCSB: https://www.wwpdb.org/ftp/pdb-ftp-sites#rcsbpdb. The two Python scripts below (i.e., filter_pdb_{train,val,test}_mmcifs.py and cluster_pdb_{train,val,test}_mmcifs.py) assume you have downloaded the PDB in the mmCIF file format, with its first-assembly and asymmetric unit mmCIF files placed at data/pdb_data/unfiltered_assembly_mmcifs/ and data/pdb_data/unfiltered_asym_mmcifs/, respectively.

For reproducibility, we recommend downloading the PDB using an AWS snapshot (e.g., 20240101). To do so, refer to AWS's documentation to set up the AWS CLI locally. Alternatively, on the RCSB website, navigate down to "Download Protocols" and follow the download instructions for your location.

For example, one can use the following commands to download the PDB as two collections of mmCIF files:

# For `assembly1` complexes, use the PDB's `20240101` AWS snapshot:
aws s3 sync s3://pdbsnapshots/20240101/pub/pdb/data/assemblies/mmCIF/divided/ ./data/pdb_data/unfiltered_assembly_mmcifs
# Or as a fallback, use rsync:
rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::ftp_data/assemblies/mmCIF/divided/ ./data/pdb_data/unfiltered_assembly_mmcifs/

# For asymmetric unit complexes, also use the PDB's `20240101` AWS snapshot:
aws s3 sync s3://pdbsnapshots/20240101/pub/pdb/data/structures/divided/mmCIF/ ./data/pdb_data/unfiltered_asym_mmcifs
# Or as a fallback, use rsync:
rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::ftp_data/structures/divided/mmCIF/ ./data/pdb_data/unfiltered_asym_mmcifs/

WARNING: Downloading the PDB can take up to 700GB of disk space.

NOTE: The PDB hosts all available AWS snapshots here: https://pdbsnapshots.s3.us-west-2.amazonaws.com/index.html

After downloading, you should have two directories formatted like https://files.rcsb.org/pub/pdb/data/assemblies/mmCIF/divided/ and https://files.rcsb.org/pub/pdb/data/structures/divided/mmCIF/, respectively:

00/
01/
02/
..
zz/

For these directories, decompress all the files:

find ./data/pdb_data/unfiltered_assembly_mmcifs/ -type f -name "*.gz" -exec gzip -d {} \;
find ./data/pdb_data/unfiltered_asym_mmcifs/ -type f -name "*.gz" -exec gzip -d {} \;
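On systems without `find` or `gzip` (e.g., plain Windows shells), the same decompression can be done from Python's standard library. The sketch below is an alternative we assume is equivalent for this purpose: like `gzip -d`, it writes each file without its `.gz` suffix and then deletes the archive. `gunzip_tree` is a hypothetical helper, not part of this repository.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_tree(root):
    """Decompress every *.gz file under `root` in place and delete the
    compressed original, as `gzip -d` does. Returns the number of files."""
    # Materialize the list first so we do not iterate a directory tree
    # while adding and removing files inside it.
    archives = [p for p in Path(root).rglob("*.gz") if p.is_file()]
    for gz_path in archives:
        out_path = gz_path.with_suffix("")  # "1abc.cif.gz" -> "1abc.cif"
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # streams, so large files are fine
        gz_path.unlink()
    return len(archives)
```

For example, `gunzip_tree("./data/pdb_data/unfiltered_asym_mmcifs/")`. For hundreds of thousands of files, the parallelism of the shell commands (or `xargs -P`) will usually be faster than this single-threaded loop.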

Next, run the commands

wget -P ./data/ccd_data/ https://files.wwpdb.org/pub/pdb/data/monomers/components.cif.gz
wget -P ./data/ccd_data/ https://files.wwpdb.org/pub/pdb/data/component-models/complete/chem_comp_model.cif.gz

from the project's root directory to download the latest version of the PDB's Chemical Component Dictionary (CCD) and its structural models. Decompress each of these files using the following command:

find data/ccd_data/ -type f -name "*.gz" -exec gzip -d {} \;

PDB dataset filtering

Then run the following, with pdb_assembly_dir, pdb_asym_dir, ccd_dir, and mmcif_output_dir replaced with the locations of your local copies of the first-assembly PDB, the asymmetric unit PDB, and the CCD, and with your desired dataset output directory, respectively (e.g., ./data/pdb_data/unfiltered_assembly_mmcifs/, ./data/pdb_data/unfiltered_asym_mmcifs/, ./data/ccd_data/, and ./data/pdb_data/{train,val,test}_mmcifs/).

python scripts/filter_pdb_train_mmcifs.py --mmcif_assembly_dir <pdb_assembly_dir> --mmcif_asym_dir <pdb_asym_dir> --ccd_dir <ccd_dir> --output_dir <mmcif_output_dir>
python scripts/filter_pdb_val_mmcifs.py --mmcif_assembly_dir <pdb_assembly_dir> --mmcif_asym_dir <pdb_asym_dir> --output_dir <mmcif_output_dir>
python scripts/filter_pdb_test_mmcifs.py --mmcif_assembly_dir <pdb_assembly_dir> --mmcif_asym_dir <pdb_asym_dir> --output_dir <mmcif_output_dir>
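If you prefer to drive the three filtering runs from Python, the following sketch builds the same command lines, using exactly the flags shown above (only the training script receives `--ccd_dir`). `filter_commands`, `run_commands`, and the `{split}` output template are hypothetical conveniences, and `sys.executable` stands in for `python` so the commands use the current interpreter.

```python
import subprocess
import sys

SPLITS = ("train", "val", "test")


def filter_commands(assembly_dir, asym_dir, ccd_dir, output_template):
    """One argument list per split for scripts/filter_pdb_{split}_mmcifs.py.

    `output_template` must contain "{split}", e.g.
    "./data/pdb_data/{split}_mmcifs/" (a hypothetical convention).
    """
    commands = []
    for split in SPLITS:
        cmd = [
            sys.executable, f"scripts/filter_pdb_{split}_mmcifs.py",
            "--mmcif_assembly_dir", assembly_dir,
            "--mmcif_asym_dir", asym_dir,
        ]
        if split == "train":
            # Only the training filter takes the CCD directory in this README.
            cmd += ["--ccd_dir", ccd_dir]
        cmd += ["--output_dir", output_template.format(split=split)]
        commands.append(cmd)
    return commands


def run_commands(commands):
    """Run each command in order, stopping at the first non-zero exit."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage: `run_commands(filter_commands("./data/pdb_data/unfiltered_assembly_mmcifs/", "./data/pdb_data/unfiltered_asym_mmcifs/", "./data/ccd_data/", "./data/pdb_data/{split}_mmcifs/"))` from the project root. Passing argument lists rather than a single shell string avoids quoting problems with paths that contain spaces.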

See the scripts for more options. Each first-assembly mmCIF that successfully passes all processing steps will be written to mmcif_output_dir, within a subdirectory named after the second and third characters of the mmCIF's PDB ID (e.g., 5c).
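The subdirectory rule is the same "divided" layout used by the downloaded mirrors (00/, 01/, ..., zz/), so finding where a given entry lands is a matter of slicing its ID. A minimal sketch, assuming classic four-character PDB IDs and lowercase directory names as in the listing above; `divided_subdir` is a hypothetical helper.

```python
def divided_subdir(pdb_id):
    """Subdirectory for a PDB ID in the 'divided' layout: its second and
    third characters, lowercased (e.g. '15C8' -> '5c').

    Assumption: classic 4-character IDs; extended IDs are not handled.
    """
    pdb_id = pdb_id.strip().lower()
    if len(pdb_id) != 4:
        raise ValueError(f"expected a 4-character PDB ID, got {pdb_id!r}")
    return pdb_id[1:3]


print(divided_subdir("15C8"))  # 5c
print(divided_subdir("1abc"))  # ab
```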

PDB dataset clustering

Next, run the following, replacing mmcif_dir with your local output directory created by the dataset filtering scripts above and {train,val,test}_clustering_output_dir with your desired clustering output directories (e.g., ./data/pdb_data/{train,val,test}_mmcifs/ and ./data/pdb_data/data_caches/{train,val,test}_clusterings/, respectively):

python scripts/cluster_pdb_train_mmcifs.py --mmcif_dir <mmcif_dir> --output_dir <train_clustering_output_dir> --clustering_filtered_pdb_dataset
python scripts/cluster_pdb_val_mmcifs.py --mmcif_dir <mmcif_dir> --reference_clustering_dir <train_clustering_output_dir> --output_dir <val_clustering_output_dir> --clustering_filtered_pdb_dataset
python scripts/cluster_pdb_test_mmcifs.py --mmcif_dir <mmcif_dir> --reference_1_clustering_dir <train_clustering_output_dir> --reference_2_clustering_dir <val_clustering_output_dir> --output_dir <test_clustering_output_dir> --clustering_filtered_pdb_dataset
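The three clustering commands have a fixed order: the validation run reads the training clustering, and the test run reads both the training and validation clusterings. The sketch below builds the commands with exactly the flags shown above and makes that dependency visible. `clustering_commands` and the two `{split}` path templates are hypothetical conveniences; the `filtered_pdb` switch controls whether `--clustering_filtered_pdb_dataset` is appended.

```python
import sys

SPLITS = ("train", "val", "test")


def clustering_commands(mmcif_template, output_template, filtered_pdb=True):
    """Argument lists for scripts/cluster_pdb_{split}_mmcifs.py, in run order.

    Both templates must contain "{split}" (a hypothetical convention).
    val references the train output; test references train and val, so the
    commands must be run sequentially in the returned order.
    """
    mmcif = {s: mmcif_template.format(split=s) for s in SPLITS}
    out = {s: output_template.format(split=s) for s in SPLITS}

    split_args = {
        "train": ["--mmcif_dir", mmcif["train"],
                  "--output_dir", out["train"]],
        "val": ["--mmcif_dir", mmcif["val"],
                "--reference_clustering_dir", out["train"],
                "--output_dir", out["val"]],
        "test": ["--mmcif_dir", mmcif["test"],
                 "--reference_1_clustering_dir", out["train"],
                 "--reference_2_clustering_dir", out["val"],
                 "--output_dir", out["test"]],
    }

    commands = []
    for split in SPLITS:
        cmd = [sys.executable, f"scripts/cluster_pdb_{split}_mmcifs.py",
               *split_args[split]]
        if filtered_pdb:
            # Only appropriate for the filtered PDB (1-based residue IDs).
            cmd.append("--clustering_filtered_pdb_dataset")
        commands.append(cmd)
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`; with `filtered_pdb=False` the same helper covers non-PDB mmCIF collections.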

Note: The --clustering_filtered_pdb_dataset flag is recommended when clustering the filtered PDB dataset as curated using the scripts above, as this flag enables faster runtimes in this context (since filtering leaves each chain's residue IDs 1-based). However, this flag must not be provided when clustering other (i.e., non-PDB) datasets of mmCIF files. Otherwise, interface clustering may be performed incorrectly, as these datasets' mmCIF files may not use strict 1-based residue indexing for each chain.
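Before passing the flag for a non-PDB collection, you can check whether every chain actually uses strict 1-based residue indexing. The sketch assumes "strict 1-based" means residue IDs run 1, 2, ..., n in order, and it takes an already-parsed mapping of chain ID to residue IDs; reading those IDs out of the mmCIF files is not shown. Both function names are hypothetical.

```python
def is_strictly_one_based(residue_ids):
    """True if residue IDs are exactly 1, 2, ..., n in order.

    Assumption: this is what 'strict 1-based residue indexing' means here.
    """
    ids = list(residue_ids)  # accept any iterable, including generators
    return ids == list(range(1, len(ids) + 1))


def can_use_filtered_flag(chains):
    """chains: mapping of chain ID -> iterable of integer residue IDs.
    Returns True only if every chain is strictly 1-based."""
    return all(is_strictly_one_based(ids) for ids in chains.values())


print(can_use_filtered_flag({"A": [1, 2, 3], "B": [1, 2]}))  # True
print(can_use_filtered_flag({"A": [5, 6, 7]}))               # False
```

If any file fails this check, omit --clustering_filtered_pdb_dataset and accept the slower runtime.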

Note: You can instead download preprocessed (i.e., filtered) mmCIF files (train / val / test; ~25GB, comprising 148k complexes) and chain/interface clustering files (train / val / test; ~3GB) for the PDB's 20240101 AWS snapshot via a shared OneDrive folder. Each of these tar.gz archives should be decompressed within the data/pdb_data/ directory, e.g., via tar -xzf data_caches.tar.gz -C data/pdb_data/. You can also download and prepare PDB distillation data, using the scripts/distillation_data_download.sh script as a reference. Once downloaded, you can run scripts/reduce_uniprot_predictions_to_pdb.py to filter this dataset down to only the examples associated with at least one PDB entry. Moreover, for convenience, a mapping of UniProt accession IDs to PDB IDs for training on PDB distillation data has already been downloaded and extracted as data/afdb_data/data_caches/uniprot_to_pdb_id_mapping.dat.
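The `tar -xzf data_caches.tar.gz -C data/pdb_data/` step can also be done with Python's `tarfile` module, which helps on platforms without GNU tar. This is a sketch of an equivalent, not a script from this repository; `extract_archive` is a hypothetical name. Unlike `tar -C`, it creates the destination directory if it is missing, and where the interpreter supports extraction filters it uses the `"data"` filter to reject unsafe archive members.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, dest="data/pdb_data/"):
    """Extract a .tar.gz archive into `dest`, like `tar -xzf ARCHIVE -C DEST`."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python 3.12+ and recent security releases of older versions:
            # refuse absolute paths, "..", and other unsafe members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
```

For example, `extract_archive("data_caches.tar.gz")` places the archive's top-level folder under data/pdb_data/.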

Contributing

At the project root, run

$ sh ./contribute.sh

Then, add your module to alphafold3_pytorch/alphafold3.py, add your tests to tests/test_af3.py, and submit a pull request. You can run the tests locally with

$ pytest tests/

Docker Image

The included Dockerfile contains the dependencies required to run the package and to train or run inference with PyTorch on GPUs.

The default base image is pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime, and the image installs the latest version of this package from the main GitHub branch.

## Build Docker Container
docker build -t af3 .

Alternatively, use build arguments to rebuild the image with different software versions:

PYTORCH_TAG: Changes the base image, building with different PyTorch, CUDA, and/or cuDNN versions.
GIT_TAG: Changes the tag of this repo that is cloned to install the package.

For example:

## Use build arguments to change versions
docker build --build-arg "PYTORCH_TAG=2.2.1-cuda12.1-cudnn8-devel" --build-arg "GIT_TAG=0.1.15" -t af3 .
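Each build argument is passed as its own `--build-arg KEY=VALUE` pair, so scripting several image variants reduces to assembling that list. The sketch below is a hypothetical helper (`docker_build_command`) that reproduces the command above; only `PYTORCH_TAG` and `GIT_TAG` are documented build arguments for this Dockerfile, and the helper does not check names.

```python
def docker_build_command(tag="af3", context=".", **build_args):
    """Assemble a `docker build` argument list with one --build-arg per
    KEY=VALUE pair (keyword order is preserved)."""
    cmd = ["docker", "build"]
    for key, value in build_args.items():
        cmd += ["--build-arg", f"{key}={value}"]
    cmd += ["-t", tag, context]
    return cmd


print(" ".join(docker_build_command(
    PYTORCH_TAG="2.2.1-cuda12.1-cudnn8-devel", GIT_TAG="0.1.15")))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. Building arguments as list elements also sidesteps the quoting mistakes (such as stray spaces inside the quotes) that silently change a build argument's name.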

Then, run the container with GPUs and mount a local volume (for training) using the following command:

## Run Container
docker run -v .:/data --gpus all -it af3

Citations

@article{Abramson2024-fj,
  title    = "Accurate structure prediction of biomolecular interactions with
              {AlphaFold} 3",
  author   = "Abramson, Josh and Adler, Jonas and Dunger, Jack and Evans,
              Richard and Green, Tim and Pritzel, Alexander and Ronneberger,
              Olaf and Willmore, Lindsay and Ballard, Andrew J and Bambrick,
              Joshua and Bodenstein, Sebastian W and Evans, David A and Hung,
              Chia-Chun and O'Neill, Michael and Reiman, David and
              Tunyasuvunakool, Kathryn and Wu, Zachary and {\v Z}emgulyt{\.e},
              Akvil{\.e} and Arvaniti, Eirini and Beattie, Charles and
              Bertolli, Ottavia and Bridgland, Alex and Cherepanov, Alexey and
              Congreve, Miles and Cowen-Rivers, Alexander I and Cowie, Andrew
              and Figurnov, Michael and Fuchs, Fabian B and Gladman, Hannah and
              Jain, Rishub and Khan, Yousuf A and Low, Caroline M R and Perlin,
              Kuba and Potapenko, Anna and Savy, Pascal and Singh, Sukhdeep and
              Stecula, Adrian and Thillaisundaram, Ashok and Tong, Catherine
              and Yakneen, Sergei and Zhong, Ellen D and Zielinski, Michal and
              {\v Z}{\'i}dek, Augustin and Bapst, Victor and Kohli, Pushmeet
              and Jaderberg, Max and Hassabis, Demis and Jumper, John M",
  journal  = "Nature",
  month    = "May",
  year     =  2024
}

@inproceedings{Darcet2023VisionTN,
    title   = {Vision Transformers Need Registers},
    author  = {Timoth{\'e}e Darcet and Maxime Oquab and Julien Mairal and Piotr Bojanowski},
    year    = {2023},
    url     = {https://api.semanticscholar.org/CorpusID:263134283}
}

@article{Arora2024SimpleLA,
    title   = {Simple linear attention language models balance the recall-throughput tradeoff},
    author  = {Simran Arora and Sabri Eyuboglu and Michael Zhang and Aman Timalsina and Silas Alberti and Dylan Zinsley and James Zou and Atri Rudra and Christopher R{\'e}},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2402.18668},
    url     = {https://api.semanticscholar.org/CorpusID:268063190}
}

@article{Puny2021FrameAF,
    title   = {Frame Averaging for Invariant and Equivariant Network Design},
    author  = {Omri Puny and Matan Atzmon and Heli Ben-Hamu and Edward James Smith and Ishan Misra and Aditya Grover and Yaron Lipman},
    journal = {ArXiv},
    year    = {2021},
    volume  = {abs/2110.03336},
    url     = {https://api.semanticscholar.org/CorpusID:238419638}
}

@article{Duval2023FAENetFA,
    title   = {FAENet: Frame Averaging Equivariant GNN for Materials Modeling},
    author  = {Alexandre Duval and Victor Schmidt and Alex Hernandez Garcia and Santiago Miret and Fragkiskos D. Malliaros and Yoshua Bengio and David Rolnick},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2305.05577},
    url     = {https://api.semanticscholar.org/CorpusID:258564608}
}

@article{Wang2022DeepNetST,
    title   = {DeepNet: Scaling Transformers to 1,000 Layers},
    author  = {Hongyu Wang and Shuming Ma and Li Dong and Shaohan Huang and Dongdong Zhang and Furu Wei},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2203.00555},
    url     = {https://api.semanticscholar.org/CorpusID:247187905}
}

@inproceedings{Ainslie2023CoLT5FL,
    title   = {CoLT5: Faster Long-Range Transformers with Conditional Computation},
    author  = {Joshua Ainslie and Tao Lei and Michiel de Jong and Santiago Onta{\~n}{\'o}n and Siddhartha Brahma and Yury Zemlyanskiy and David Uthus and Mandy Guo and James Lee-Thorp and Yi Tay and Yun-Hsuan Sung and Sumit Sanghai},
    year    = {2023}
}

@article{Ash2019OnTD,
    title   = {On the Difficulty of Warm-Starting Neural Network Training},
    author  = {Jordan T. Ash and Ryan P. Adams},
    journal = {ArXiv},
    year    = {2019},
    volume  = {abs/1910.08475},
    url     = {https://api.semanticscholar.org/CorpusID:204788802}
}

@ARTICLE{Heinzinger2023.07.23.550085,
    author  = {Michael Heinzinger and Konstantin Weissenow and Joaquin Gomez Sanchez and Adrian Henkel and Martin Steinegger and Burkhard Rost},
    title   = {ProstT5: Bilingual Language Model for Protein Sequence and Structure},
    year    = {2023},
    doi     = {10.1101/2023.07.23.550085},
    journal = {bioRxiv}
}

@article{Lin2022.07.20.500902,
    author  = {Lin, Zeming and Akin, Halil and Rao, Roshan and Hie, Brian and Zhu, Zhongkai and Lu, Wenting and Santos Costa, Allan dos and Fazel-Zarandi, Maryam and Sercu, Tom and Candido, Sal and Rives, Alexander},
    title   = {Language models of protein sequences at the scale of evolution enable accurate structure prediction},
    elocation-id = {2022.07.20.500902},
    year    = {2022},
    doi     = {10.1101/2022.07.20.500902},
    publisher = {Cold Spring Harbor Laboratory},
    URL     = {https://www.biorxiv.org/content/early/2022/07/21/2022.07.20.500902},
    eprint  = {https://www.biorxiv.org/content/early/2022/07/21/2022.07.20.500902.full.pdf},
    journal = {bioRxiv}
}

@article{Li2024SwitchEA,
    title   = {Switch EMA: A Free Lunch for Better Flatness and Sharpness},
    author  = {Siyuan Li and Zicheng Liu and Juanxi Tian and Ge Wang and Zedong Wang and Weiyang Jin and Di Wu and Cheng Tan and Tao Lin and Yang Liu and Baigui Sun and Stan Z. Li},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2402.09240},
    url     = {https://api.semanticscholar.org/CorpusID:267657558}
}

@article{Nguyen2023MitigatingOI,
    title   = {Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals},
    author  = {Tam Nguyen and Tan M. Nguyen and Richard G. Baraniuk},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2312.00751},
    url     = {https://api.semanticscholar.org/CorpusID:264300597}
}

@inproceedings{Zhou2024ValueRL,
    title   = {Value Residual Learning For Alleviating Attention Concentration In Transformers},
    author  = {Zhanchao Zhou and Tianyi Wu and Zhiyun Jiang and Zhenzhong Lan},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:273532030}
}