flash attention jax

AI source code

0.3.1

Flash Attention - Jax

An implementation of Flash Attention in Jax. It will likely not be as performant as the official CUDA version, since Jax does not offer fine-grained control over memory management. It is intended for educational purposes, and to see how clever the XLA compiler is (or is not).
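To give a feel for the idea, here is a minimal sketch of chunked attention with an online softmax, the core trick behind Flash Attention. It is not this library's implementation: the names `reference_attention` and `chunked_attention`, the `chunk_size` parameter, and the single-head `(seq, dim)` layout are all assumptions made for illustration. Keys and values are processed one chunk at a time, so the full `(seq, seq)` score matrix is never materialized.

```python
import jax
import jax.numpy as jnp
from jax import lax, random

def reference_attention(q, k, v):
    # Plain softmax attention; builds the full (q_len, k_len) score matrix.
    scores = (q @ k.T) * q.shape[-1] ** -0.5
    return jax.nn.softmax(scores, axis=-1) @ v

def chunked_attention(q, k, v, chunk_size=64):
    # Hypothetical helper: scans over key/value chunks, keeping a running
    # row maximum and row sum so the softmax can be normalized at the end.
    k_len, dim = k.shape
    assert k_len % chunk_size == 0, "sketch assumes k_len divisible by chunk_size"
    scale = dim ** -0.5
    k_chunks = k.reshape(-1, chunk_size, dim)
    v_chunks = v.reshape(-1, chunk_size, v.shape[-1])

    def step(carry, kv):
        acc, row_sum, row_max = carry
        k_c, v_c = kv
        scores = (q @ k_c.T) * scale                      # (q_len, chunk_size)
        new_max = jnp.maximum(row_max, scores.max(axis=-1))
        correction = jnp.exp(row_max - new_max)           # rescales earlier chunks
        p = jnp.exp(scores - new_max[:, None])
        acc = acc * correction[:, None] + p @ v_c
        row_sum = row_sum * correction + p.sum(axis=-1)
        return (acc, row_sum, new_max), None

    init = (
        jnp.zeros((q.shape[0], v.shape[-1]), dtype=q.dtype),
        jnp.zeros((q.shape[0],), dtype=q.dtype),
        jnp.full((q.shape[0],), -jnp.inf, dtype=q.dtype),
    )
    (acc, row_sum, _), _ = lax.scan(step, init, (k_chunks, v_chunks))
    return acc / row_sum[:, None]

# Small shapes so the reference version fits comfortably in memory.
q_key, k_key, v_key = random.split(random.PRNGKey(0), 3)
q = random.normal(q_key, (128, 32))
k = random.normal(k_key, (128, 32))
v = random.normal(v_key, (128, 32))

out = chunked_attention(q, k, v, chunk_size=32)
expected = reference_attention(q, k, v)
```

Both functions compute the same result up to floating-point error; the chunked version only ever holds a `(q_len, chunk_size)` block of scores at once.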

Install

$ pip install flash-attention-jax

Usage

from jax import random
from flash_attention_jax import flash_attention

rng_key = random.PRNGKey(42)
q_key, k_key, v_key, mask_key = random.split(rng_key, 4)  # independent keys, so q, k and v differ

q = random.normal(q_key, (1, 2, 131072, 512))  # (batch, heads, seq, dim)
k = random.normal(k_key, (1, 2, 131072, 512))
v = random.normal(v_key, (1, 2, 131072, 512))
mask = random.randint(mask_key, (1, 131072), 0, 2)  # (batch, seq)

out, _ = flash_attention(q, k, v, mask)

out.shape  # (1, 2, 131072, 512) - (batch, heads, seq, dim)
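For comparison, a plain masked attention over the same `(batch, heads, seq, dim)` layout might look like the sketch below. The helper name `masked_plain_attention` is hypothetical, and the mask semantics (a `(batch, seq)` mask where a true entry means that key position may be attended to) are an assumption for illustration, not a statement about the library. At the sequence length used above, a float32 score matrix for a single head would already hold 131072 × 131072 entries, which is why the sketch uses tiny shapes.

```python
import jax
import jax.numpy as jnp
from jax import random

def masked_plain_attention(q, k, v, key_mask):
    # q, k, v: (batch, heads, seq, dim); key_mask: (batch, seq) booleans.
    # Assumption: True marks key positions that may be attended to.
    scale = q.shape[-1] ** -0.5
    scores = jnp.einsum('bhid,bhjd->bhij', q, k) * scale
    neg = jnp.finfo(scores.dtype).min
    scores = jnp.where(key_mask[:, None, None, :], scores, neg)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum('bhij,bhjd->bhid', weights, v)

# Bytes needed for one head's float32 score matrix at seq = 131072.
score_bytes = 131072 ** 2 * 4
score_gib = score_bytes / 2 ** 30

q_key, k_key, v_key = random.split(random.PRNGKey(0), 3)
q = random.normal(q_key, (1, 2, 16, 8))
k = random.normal(k_key, (1, 2, 16, 8))
v = random.normal(v_key, (1, 2, 16, 8))
key_mask = (jnp.arange(16) < 12)[None, :]  # last 4 key positions are masked out

out = masked_plain_attention(q, k, v, key_mask)

# Changing values at masked positions should leave the output unchanged.
v_perturbed = v.at[:, :, 12:, :].add(100.0)
out_perturbed = masked_plain_attention(q, k, v_perturbed, key_mask)
```

The memory figure is what motivates the chunked approach: the quadratic score matrix, not the inputs, dominates memory use at long sequence lengths.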

Quick sanity check

from flash_attention_jax import plain_attention, flash_attention, value_and_grad_difference

diff, (dq_diff, dk_diff, dv_diff) = value_and_grad_difference(
    plain_attention,
    flash_attention,
    seed = 42
)

print('shows differences between normal and flash attention for output, dq, dk, dv')
print(f'o: {diff}')       # < 1e-4
print(f'dq: {dq_diff}')   # < 1e-6
print(f'dk: {dk_diff}')   # < 1e-6
print(f'dv: {dv_diff}')   # < 1e-6
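A check of this kind can be sketched with `jax.value_and_grad`. The code below is not the library's `value_and_grad_difference`; the helper `compare_value_and_grad`, its parameters, and the two toy attention functions are hypothetical stand-ins. It compares two mathematically equivalent attention functions on the same random inputs and reports the largest absolute difference in the outputs and in the gradients with respect to q, k and v.

```python
import jax
import jax.numpy as jnp
from jax import random

def softmax_attention(q, k, v):
    scores = (q @ k.T) * q.shape[-1] ** -0.5
    return jax.nn.softmax(scores, axis=-1) @ v

def logsumexp_attention(q, k, v):
    # Same math, but normalizes with an explicit log-sum-exp.
    scores = (q @ k.T) * q.shape[-1] ** -0.5
    weights = jnp.exp(scores - jax.nn.logsumexp(scores, axis=-1, keepdims=True))
    return weights @ v

def compare_value_and_grad(fn_a, fn_b, seed=42, seq_len=256, dim=64):
    # Hypothetical helper: returns (output diff, (dq diff, dk diff, dv diff)).
    q_key, k_key, v_key = random.split(random.PRNGKey(seed), 3)
    q = random.normal(q_key, (seq_len, dim))
    k = random.normal(k_key, (seq_len, dim))
    v = random.normal(v_key, (seq_len, dim))

    def grads_of(fn):
        # Reduce the output to a scalar so it can be differentiated.
        loss = lambda q, k, v: fn(q, k, v).sum()
        _, grads = jax.value_and_grad(loss, argnums=(0, 1, 2))(q, k, v)
        return grads

    out_diff = jnp.abs(fn_a(q, k, v) - fn_b(q, k, v)).max()
    grad_diffs = tuple(
        jnp.abs(ga - gb).max() for ga, gb in zip(grads_of(fn_a), grads_of(fn_b))
    )
    return out_diff, grad_diffs

out_diff, (dq_diff, dk_diff, dv_diff) = compare_value_and_grad(
    softmax_attention, logsumexp_attention
)
```

Summing the output is just one convenient scalar loss; any other reduction would exercise the backward pass as well.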

Autoregressive Flash Attention - GPT-like decoder attention

from jax import random
from flash_attention_jax import causal_flash_attention

rng_key = random.PRNGKey(42)
q_key, k_key, v_key = random.split(rng_key, 3)  # independent keys, so q, k and v differ

q = random.normal(q_key, (131072, 512))  # (seq, dim)
k = random.normal(k_key, (131072, 512))
v = random.normal(v_key, (131072, 512))

out, _ = causal_flash_attention(q, k, v)

out.shape  # (131072, 512)
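What "causal" means here can be illustrated with a plain reference version on small inputs: position i may only attend to positions up to and including i. The function `causal_reference_attention` below is a hypothetical sketch for illustration, not part of the library, and it materializes the full score matrix that the flash variant is designed to avoid.

```python
import jax
import jax.numpy as jnp
from jax import random

def causal_reference_attention(q, k, v):
    # q, k, v: (seq, dim). Builds the full (seq, seq) score matrix.
    seq_len, dim = q.shape
    scores = (q @ k.T) * dim ** -0.5
    # Lower-triangular mask: row i keeps columns 0..i only.
    causal_mask = jnp.tril(jnp.ones((seq_len, seq_len), dtype=bool))
    scores = jnp.where(causal_mask, scores, jnp.finfo(scores.dtype).min)
    return jax.nn.softmax(scores, axis=-1) @ v

q_key, k_key, v_key = random.split(random.PRNGKey(0), 3)
q = random.normal(q_key, (32, 16))
k = random.normal(k_key, (32, 16))
v = random.normal(v_key, (32, 16))

out = causal_reference_attention(q, k, v)

# Perturbing the last key/value must not change any earlier output row.
k_future = k.at[-1].add(5.0)
v_future = v.at[-1].add(5.0)
out_future = causal_reference_attention(q, k_future, v_future)
```

The first output row equals `v[0]`, since the first position can only attend to itself.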

Todo

support leading dimensions (batch, heads) in the causal flash attention variant
figure out the issue with jit and static argnums
add comments with references to the paper's algorithms and explanations
make sure it can work with one-headed keys / values, as in PaLM
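As a rough illustration of the last item, attention with one-headed keys / values (often called multi-query attention) shares a single key and value head across all query heads. The sketch below uses plain attention, not flash attention; `multi_query_attention`, `single_head_attention` and the `(heads, seq, dim)` query layout are assumptions for illustration only.

```python
import jax
import jax.numpy as jnp
from jax import random

def single_head_attention(q, k, v):
    # q, k, v: (seq, dim)
    scores = (q @ k.T) * q.shape[-1] ** -0.5
    return jax.nn.softmax(scores, axis=-1) @ v

def multi_query_attention(q, k, v):
    # q: (heads, seq, dim); k, v: (seq, dim), shared by every query head.
    scale = q.shape[-1] ** -0.5
    scores = jnp.einsum('hid,jd->hij', q, k) * scale
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum('hij,jd->hid', weights, v)

q_key, k_key, v_key = random.split(random.PRNGKey(0), 3)
q = random.normal(q_key, (4, 32, 16))
k = random.normal(k_key, (32, 16))
v = random.normal(v_key, (32, 16))

out = multi_query_attention(q, k, v)

# Equivalent formulation: map the single-head function over query heads
# while passing the same k and v to each head.
expected = jax.vmap(single_head_attention, in_axes=(0, None, None))(q, k, v)
```

Because k and v carry no heads dimension, they never need to be copied per head; the einsum broadcasts them across query heads.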

Citations

@article{Dao2022FlashAttentionFA,
    title   = {FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness},
    author  = {Tri Dao and Daniel Y. Fu and Stefano Ermon and Atri Rudra and Christopher R{\'e}},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2205.14135}
}

@article{Rabe2021SelfattentionDN,
    title   = {Self-attention Does Not Need $O(n^2)$ Memory},
    author  = {Markus N. Rabe and Charles Staats},
    journal = {ArXiv},
    year    = {2021},
    volume  = {abs/2112.05682}
}

Additional information

Version 0.3.1
Type AI source code
Updated 2025-01-14
Size 143.67 KB
Source GitHub
