flash attention jax

AI source code

0.3.1

Flash Attention - Jax

Implementation of Flash Attention in Jax. It will likely not be as performant as the official CUDA version, given the lack of fine-grained memory management. It is intended for educational purposes, and to see how clever the XLA compiler is (or isn't).
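To make the idea concrete, here is a minimal sketch of the core trick behind Flash Attention: keys and values are processed one chunk at a time while a running row maximum and row sum keep the softmax exact. This is not the code of this package. The names `plain_attention_sketch` and `chunked_attention_sketch`, the `chunk_size` parameter, and the single-head `(seq, dim)` layout are assumptions made for illustration.

```python
import jax
import jax.numpy as jnp


def plain_attention_sketch(q, k, v):
    # Reference version: materializes the full (seq_q, seq_k) score matrix.
    scale = q.shape[-1] ** -0.5
    return jax.nn.softmax((q * scale) @ k.T, axis=-1) @ v


def chunked_attention_sketch(q, k, v, chunk_size=4):
    # Hypothetical helper: visits keys/values chunk by chunk and keeps
    # running softmax statistics instead of the full score matrix.
    seq_q, dim = q.shape
    assert k.shape[0] % chunk_size == 0, "seq_k must be divisible by chunk_size"
    scale = dim ** -0.5
    k_chunks = k.reshape(-1, chunk_size, dim)
    v_chunks = v.reshape(-1, chunk_size, v.shape[-1])

    def step(carry, chunk):
        acc, row_sum, row_max = carry
        k_c, v_c = chunk
        scores = (q * scale) @ k_c.T                      # (seq_q, chunk_size)
        new_max = jnp.maximum(row_max, scores.max(axis=-1))
        correction = jnp.exp(row_max - new_max)           # rescales old statistics
        probs = jnp.exp(scores - new_max[:, None])
        acc = acc * correction[:, None] + probs @ v_c
        row_sum = row_sum * correction + probs.sum(axis=-1)
        return (acc, row_sum, new_max), None

    init = (
        jnp.zeros((seq_q, v.shape[-1]), dtype=q.dtype),   # unnormalized output
        jnp.zeros((seq_q,), dtype=q.dtype),               # running softmax denominator
        jnp.full((seq_q,), -jnp.inf, dtype=q.dtype),      # running row maximum
    )
    (acc, row_sum, _), _ = jax.lax.scan(step, init, (k_chunks, v_chunks))
    return acc / row_sum[:, None]
```

Both functions should agree up to floating point error; the chunked one only ever holds a `(seq_q, chunk_size)` block of scores at a time.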

Install

$ pip install flash-attention-jax

Usage

from jax import random
from flash_attention_jax import flash_attention

rng_key = random.PRNGKey(42)
q_key, k_key, v_key, mask_key = random.split(rng_key, 4)  # one independent key per tensor

q = random.normal(q_key, (1, 2, 131072, 512))  # (batch, heads, seq, dim)
k = random.normal(k_key, (1, 2, 131072, 512))
v = random.normal(v_key, (1, 2, 131072, 512))
mask = random.randint(mask_key, (1, 131072), 0, 2)  # (batch, seq)

out, _ = flash_attention(q, k, v, mask)

out.shape  # (1, 2, 131072, 512) - (batch, heads, seq, dim)

Quick sanity check

from flash_attention_jax import plain_attention, flash_attention, value_and_grad_difference

diff, (dq_diff, dk_diff, dv_diff) = value_and_grad_difference(
    plain_attention,
    flash_attention,
    seed = 42
)

print('shows differences between normal and flash attention for output, dq, dk, dv')
print(f'o: {diff}')       # < 1e-4
print(f'dq: {dq_diff}')   # < 1e-6
print(f'dk: {dk_diff}')   # < 1e-6
print(f'dv: {dv_diff}')   # < 1e-6
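A check of this kind can be sketched in plain JAX: run two attention functions on the same random inputs, then compare both the outputs and the gradients with respect to q, k and v. The helper `output_and_grad_gap` below is hypothetical and is not the library's `value_and_grad_difference`; the two attention functions, the summed loss, and the default shapes are likewise assumptions chosen to keep the example self-contained.

```python
import jax
import jax.numpy as jnp
from jax import random
from jax.scipy.special import logsumexp


def softmax_attention(q, k, v):
    # Baseline built on jax.nn.softmax.
    scale = q.shape[-1] ** -0.5
    return jax.nn.softmax((q * scale) @ k.T, axis=-1) @ v


def logsumexp_attention(q, k, v):
    # Same math, but normalized through an explicit log-sum-exp.
    scale = q.shape[-1] ** -0.5
    scores = (q * scale) @ k.T
    weights = jnp.exp(scores - logsumexp(scores, axis=-1, keepdims=True))
    return weights @ v


def output_and_grad_gap(fn_a, fn_b, seed=42, seq_len=64, dim=32):
    # Hypothetical helper: max absolute gap in outputs and in dq, dk, dv.
    q_key, k_key, v_key = random.split(random.PRNGKey(seed), 3)
    q = random.normal(q_key, (seq_len, dim))
    k = random.normal(k_key, (seq_len, dim))
    v = random.normal(v_key, (seq_len, dim))

    def grads_of(fn):
        # Reduce the output to a scalar so that jax.grad applies.
        loss = lambda q, k, v: fn(q, k, v).sum()
        return jax.grad(loss, argnums=(0, 1, 2))(q, k, v)

    out_gap = jnp.abs(fn_a(q, k, v) - fn_b(q, k, v)).max()
    grad_gaps = tuple(
        jnp.abs(ga - gb).max() for ga, gb in zip(grads_of(fn_a), grads_of(fn_b))
    )
    return out_gap, grad_gaps


out_gap, (dq_gap, dk_gap, dv_gap) = output_and_grad_gap(
    softmax_attention, logsumexp_attention
)
```

Summing the output is only one possible scalar loss; a random projection of the output would exercise the gradients more thoroughly.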

Autoregressive Flash Attention - GPT-like decoder attention

from jax import random
from flash_attention_jax import causal_flash_attention

rng_key = random.PRNGKey(42)
q_key, k_key, v_key = random.split(rng_key, 3)  # one independent key per tensor

q = random.normal(q_key, (131072, 512))  # (seq, dim)
k = random.normal(k_key, (131072, 512))
v = random.normal(v_key, (131072, 512))

out, _ = causal_flash_attention(q, k, v)

out.shape  # (131072, 512)
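For reference, causal attention means that position i may only attend to positions j <= i. The sketch below is a plain, non-chunked version of that masking on `(seq, dim)` inputs, useful as a baseline on short sequences. The function name `causal_attention_reference` is an assumption for illustration and is not part of this package.

```python
import jax
import jax.numpy as jnp


def causal_attention_reference(q, k, v):
    # Hypothetical baseline: full score matrix with a lower-triangular mask.
    seq_len, dim = q.shape
    scores = (q * dim ** -0.5) @ k.T                       # (seq, seq)
    causal_mask = jnp.tril(jnp.ones((seq_len, seq_len), dtype=bool))
    # Future positions get -inf, so they receive zero softmax weight.
    scores = jnp.where(causal_mask, scores, -jnp.inf)
    return jax.nn.softmax(scores, axis=-1) @ v
```

Because the first position can only see itself, the first output row equals the first value row, which makes for a simple spot check.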

Todo

leading dimensions for causal flash attention variant
figure out issue with jit and static argnums
comment with references to paper algorithms and explanations
make sure it can work with one-headed key / values, as in PaLM
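The last item refers to the multi-query layout, in which every query head shares a single key/value head. A minimal sketch of that layout, written with `einsum` and without any chunking, might look as follows. The name `multi_query_attention_sketch` and the shapes `(heads, seq, dim)` for q and `(seq, dim)` for k and v are assumptions for illustration, not this package's API.

```python
import jax
import jax.numpy as jnp


def multi_query_attention_sketch(q, k, v):
    # q: (heads, seq_q, dim); k, v: (seq_k, dim), shared by all heads.
    scale = q.shape[-1] ** -0.5
    scores = jnp.einsum('hid,jd->hij', q * scale, k)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum('hij,jd->hid', weights, v)
```

The result should match running ordinary single-head attention once per query head against the same k and v.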

Citations

@article{Dao2022FlashAttentionFA,
    title   = {FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness},
    author  = {Tri Dao and Daniel Y. Fu and Stefano Ermon and Atri Rudra and Christopher R{\'e}},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2205.14135}
}

@article{Rabe2021SelfattentionDN,
    title   = {Self-attention Does Not Need $O(n^2)$ Memory},
    author  = {Markus N. Rabe and Charles Staats},
    journal = {ArXiv},
    year    = {2021},
    volume  = {abs/2112.05682}
}

Additional information

Version 0.3.1
Type AI source code
Last updated 2025-01-14
Size 143.67KB
Source GitHub
