memory efficient attention pytorch

Memory Efficient Attention Pytorch (obsolete)

Implementation of memory-efficient multi-head attention as proposed in the paper Self-attention Does Not Need O(n²) Memory. The module also handles key masking, causal masking, and cross attention.
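
The core idea can be sketched in a few lines: process the keys and values one chunk at a time, keep only a small summary per chunk (row maximum, sum of exponentials, unnormalised output), and combine the summaries at the end so the full score matrix is never held in memory. This is a minimal single-head sketch written for illustration; `chunked_attention` and its `k_chunk_size` argument are hypothetical names, not this package's API.

```python
import torch


def chunked_attention(q, k, v, k_chunk_size=4):
    """Exact softmax attention, computed one key/value chunk at a time.

    q: (n_q, d), k: (n_k, d), v: (n_k, d_v).
    Hypothetical teaching helper, not part of the package.
    """
    scale = q.shape[-1] ** -0.5
    chunk_max, chunk_sum, chunk_out = [], [], []

    for start in range(0, k.shape[0], k_chunk_size):
        k_c = k[start:start + k_chunk_size]
        v_c = v[start:start + k_chunk_size]

        scores = (q @ k_c.T) * scale                  # (n_q, chunk) only
        m = scores.amax(dim=-1, keepdim=True)         # per-chunk row maximum
        w = torch.exp(scores - m)                     # numerically stable weights

        chunk_max.append(m)
        chunk_sum.append(w.sum(dim=-1, keepdim=True))
        chunk_out.append(w @ v_c)                     # unnormalised chunk output

    # Re-express every chunk summary relative to one shared row maximum.
    m_all = torch.stack(chunk_max)                    # (chunks, n_q, 1)
    m_global = m_all.amax(dim=0, keepdim=True)
    rescale = torch.exp(m_all - m_global)

    numerator = (torch.stack(chunk_out) * rescale).sum(dim=0)
    denominator = (torch.stack(chunk_sum) * rescale).sum(dim=0)
    return numerator / denominator


torch.manual_seed(0)
q, k, v = torch.randn(5, 8), torch.randn(10, 8), torch.randn(10, 3)
reference = torch.softmax((q @ k.T) * 8 ** -0.5, dim=-1) @ v
print(torch.allclose(chunked_attention(q, k, v), reference, atol=1e-5))
```

The result matches ordinary attention up to floating-point error; only the peak size of the intermediate score tensor changes.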

This repository also contains a naive, non-CUDA implementation of the improvements Tri Dao introduced in his Flash Attention 2 paper, for educational purposes. That work is a game changer for attention and for building long-context transformers.
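
As a rough illustration of the kind of loop structure the Flash Attention 2 paper describes, the sketch below tiles the queries in an outer loop, streams key tiles in an inner loop with a running softmax, skips key tiles that are entirely in the future under causal masking, and divides by the softmax denominator only once per query tile. It is a plain PyTorch sketch for self-attention on a single head; `tiled_causal_attention`, `q_tile`, and `k_tile` are hypothetical names, and this is neither the repository's code nor the official kernel.

```python
import torch


def tiled_causal_attention(q, k, v, q_tile=4, k_tile=4):
    """Causal softmax attention over (n, d) tensors, computed tile by tile.

    Hypothetical teaching helper; assumes q and k have the same length n.
    """
    n, d = q.shape
    scale = d ** -0.5
    out = q.new_empty((n, v.shape[-1]))

    for qs in range(0, n, q_tile):                     # outer loop: query tiles
        qe = min(qs + q_tile, n)
        q_t = q[qs:qe] * scale
        q_idx = torch.arange(qs, qe, device=q.device).unsqueeze(1)

        acc = q.new_zeros((qe - qs, v.shape[-1]))      # unnormalised output
        row_sum = q.new_zeros((qe - qs, 1))            # running softmax denominator
        row_max = q.new_full((qe - qs, 1), float("-inf"))

        # Key tiles starting at or beyond qe lie fully in the future: skipped.
        for ks in range(0, qe, k_tile):
            ke = min(ks + k_tile, n)
            scores = q_t @ k[ks:ke].T

            # Mask future keys inside tiles that straddle the diagonal.
            k_idx = torch.arange(ks, ke, device=q.device).unsqueeze(0)
            scores = scores.masked_fill(k_idx > q_idx, float("-inf"))

            new_max = torch.maximum(row_max, scores.amax(dim=-1, keepdim=True))
            correction = torch.exp(row_max - new_max)  # rescale earlier tiles
            p = torch.exp(scores - new_max)

            acc = acc * correction + p @ v[ks:ke]
            row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
            row_max = new_max

        out[qs:qe] = acc / row_sum                     # normalise once per query tile
    return out


torch.manual_seed(0)
n, d = 10, 8
q, k, v = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
future = torch.ones(n, n, dtype=torch.bool).triu(diagonal=1)
reference = ((q @ k.T) * d ** -0.5).masked_fill(future, float("-inf")).softmax(dim=-1) @ v
print(torch.allclose(tiled_causal_attention(q, k, v, q_tile=4, k_tile=3), reference, atol=1e-5))
```

Because the first key tile always contains position 0, every query row sees at least one unmasked key before any fully masked tile, so the running maximum stays finite.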

Update: from now on, you should just use the F.scaled_dot_product_attention function in Pytorch 2.0, which has built-in Flash Attention v1 support - or use Flash Attention v2 from the official repository
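
For reference, a minimal call to `F.scaled_dot_product_attention` might look like the sketch below, assuming PyTorch 2.0 or later. The tensor sizes are arbitrary illustration values, and which fused kernel (if any) PyTorch selects depends on the device, dtype, and inputs, so the comparison against an explicit implementation only checks numerical agreement.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

batch, heads, seq_len, dim_head = 1, 8, 128, 64

# The function expects (batch, heads, sequence, dim_head) tensors.
q = torch.randn(batch, heads, seq_len, dim_head)
k = torch.randn(batch, heads, seq_len, dim_head)
v = torch.randn(batch, heads, seq_len, dim_head)

# is_causal=True applies the autoregressive mask internally.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Reference: explicit causal attention that materialises the full score matrix.
scores = (q @ k.transpose(-2, -1)) * dim_head ** -0.5
future = torch.ones(seq_len, seq_len, dtype=torch.bool).triu(diagonal=1)
reference = scores.masked_fill(future, float("-inf")).softmax(dim=-1) @ v

print(out.shape)
print(torch.allclose(out, reference, atol=1e-4))
```

Note that this function covers only the attention operation itself; the query, key, value, and output projections that an attention module provides still have to be applied around it.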

Install

$ pip install memory-efficient-attention-pytorch

Usage

For an autoregressive language model

import torch
from memory_efficient_attention_pytorch import Attention

attn = Attention(
    dim = 512,
    dim_head = 64,                # dimension per head
    heads = 8,                    # number of attention heads
    causal = True,                # autoregressive or not
    memory_efficient = True,      # whether to use memory efficient attention (can be turned off to test against normal attention)
    q_bucket_size = 1024,         # bucket size along queries dimension
    k_bucket_size = 2048          # bucket size along key / values dimension
).cuda()

x = torch.randn(1, 65536, 512).cuda()
out = attn(x) # (1, 65536, 512)
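
To get a feel for why the bucket sizes matter at this sequence length, here is a back-of-the-envelope estimate of the attention score storage. It assumes float32 scores, batch size 1, and that the scores for one query bucket against one key bucket are held for all 8 heads at once; `score_bytes` is a hypothetical helper, and the estimate ignores every other activation in the model.

```python
def score_bytes(rows, cols, heads=8, batch=1, bytes_per_element=4):
    """Bytes needed for a (batch, heads, rows, cols) float32 score tensor."""
    return batch * heads * rows * cols * bytes_per_element


seq_len = 65536
full = score_bytes(seq_len, seq_len)   # whole attention matrix at once
tile = score_bytes(1024, 2048)         # one q_bucket_size x k_bucket_size tile

print(f"full matrix: {full / 2**30:.0f} GiB")   # full matrix: 128 GiB
print(f"one tile:    {tile / 2**20:.0f} MiB")   # one tile:    64 MiB
```

Under these assumptions the full score matrix would need about 128 GiB, while a single bucket pair needs about 64 MiB.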

Cross attention

import torch
from memory_efficient_attention_pytorch import Attention

cross_attn = Attention(
    dim = 512,
    dim_head = 64,
    heads = 8,
    memory_efficient = True,
    q_bucket_size = 1024,
    k_bucket_size = 2048
).cuda()

x = torch.randn(1, 65536, 512).cuda()
context = torch.randn(1, 65536, 512).cuda()
mask = torch.ones(1, 65536).bool().cuda()

out = cross_attn(x, context = context, mask = mask) # (1, 65536, 512)
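
The effect of a key mask in cross attention can be illustrated with a small single-head sketch that omits the learned projections. It assumes the mask has one boolean per context position with True meaning "attend to this position", which is an assumption about the convention rather than something stated above; `masked_cross_attention` is a hypothetical helper, not the package's API.

```python
import torch

torch.manual_seed(0)


def masked_cross_attention(q, k, v, mask):
    """q: (b, n_q, d); k, v: (b, n_ctx, d); mask: (b, n_ctx) bool, True = attend."""
    scores = (q @ k.transpose(-2, -1)) * q.shape[-1] ** -0.5   # (b, n_q, n_ctx)
    # Masked context positions get -inf, so softmax gives them zero weight.
    scores = scores.masked_fill(~mask[:, None, :], float("-inf"))
    return scores.softmax(dim=-1) @ v


q = torch.randn(1, 4, 16)
context = torch.randn(1, 6, 16)
mask = torch.tensor([[True, True, True, True, False, False]])

out = masked_cross_attention(q, context, context, mask)

# Changing the masked context positions should leave the output unchanged.
altered = context.clone()
altered[:, 4:] = 100.0
same = torch.allclose(out, masked_cross_attention(q, altered, altered, mask))
print(out.shape, same)
```

The output keeps the query length, and the last two context positions have no influence on it because their attention weights are exactly zero.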

Citations

@misc{rabe2021selfattention,
    title   = {Self-attention Does Not Need $O(n^2)$ Memory},
    author  = {Markus N. Rabe and Charles Staats},
    year    = {2021},
    eprint  = {2112.05682},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}

@misc{liu2021swin,
    title   = {Swin Transformer V2: Scaling Up Capacity and Resolution},
    author  = {Ze Liu and Han Hu and Yutong Lin and Zhuliang Yao and Zhenda Xie and Yixuan Wei and Jia Ning and Yue Cao and Zheng Zhang and Li Dong and Furu Wei and Baining Guo},
    year    = {2021},
    eprint  = {2111.09883},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

@article{Dao2022FlashAttentionFA,
    title   = {FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness},
    author  = {Tri Dao and Daniel Y. Fu and Stefano Ermon and Atri Rudra and Christopher R{\'e}},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2205.14135}
}

@article{dao2023flashattention2,
    title   = {Flash{A}ttention-2: Faster Attention with Better Parallelism and Work Partitioning},
    author  = {Dao, Tri},
    year    = {2023}
}

Additional information

Version: 0.1.6
Type: AI code
Updated: 2025-01-14
Size: 34.87MB
Source: Github
