Mega pytorch

0.1.0

Mega - Moving Average Equipped Gated Attention - Pytorch

Implementation of the Mega layer, the single-head attention with multi-headed EMA layer from the architecture that currently holds SOTA on Long Range Arena, beating S4 on Pathfinder-X and on every other task except audio.
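To make the "EMA" part concrete, here is a minimal, dependency-free sketch of a damped exponential moving average, using the recurrence y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1} from the Mega paper. The function name `damped_ema` and the scalar `alpha` / `delta` arguments are illustrative assumptions, not part of the `mega-pytorch` API; in the layer itself these values are learned, and the computation is not necessarily done with an explicit loop.

```python
def damped_ema(xs, alpha, delta=1.0):
    """Damped EMA over a 1-D sequence (illustrative sketch only).

    y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}, with y_0 = 0.
    alpha is the smoothing weight, delta the damping factor.
    """
    ys = []
    prev = 0.0
    for x in xs:
        prev = alpha * x + (1.0 - alpha * delta) * prev
        ys.append(prev)
    return ys


# Response to a single impulse: each step decays by (1 - alpha * delta)
impulse_response = damped_ema([1.0, 0.0, 0.0, 0.0], alpha=0.5)
print(impulse_response)  # [0.5, 0.25, 0.125, 0.0625]
```

Each output position is a weighted sum of all earlier inputs with exponentially decaying weights, which gives the layer a built-in bias toward nearby positions before attention is applied.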

Install

$ pip install mega-pytorch

Usage

The Mega layer, which combines attention with a learned EMA


import torch
from mega_pytorch import MegaLayer

layer = MegaLayer(
    dim = 128,                   # model dimensions
    ema_heads = 16,              # number of EMA heads
    attn_dim_qk = 64,            # dimension of queries / keys in attention
    attn_dim_value = 256,        # dimension of values in attention
    laplacian_attn_fn = False,   # whether to use softmax (false) or laplacian attention activation fn (true)
)

x = torch.randn(1, 1024, 128)    # (batch, seq, dim)

out = layer(x) # (1, 1024, 128)
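
The `laplacian_attn_fn` flag switches the attention activation from softmax to a Laplace function. As a rough illustration, the sketch below evaluates the Laplace function as defined in the Mega paper, 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2)))), assuming the paper's constants mu = sqrt(1/2) and sigma = sqrt(1/(4*pi)). The name `laplace_attn` is hypothetical, and the library's own implementation may differ in detail.

```python
import math

MU = math.sqrt(0.5)                       # assumed from the Mega paper
SIGMA = math.sqrt(1.0 / (4.0 * math.pi))  # assumed from the Mega paper


def laplace_attn(score):
    """Laplace activation for a single attention score (illustrative sketch)."""
    return 0.5 * (1.0 + math.erf((score - MU) / (SIGMA * math.sqrt(2.0))))


# Unlike softmax, each score is squashed into [0, 1] independently,
# so the weights over a row of scores are not normalized to sum to 1.
weights = [laplace_attn(s) for s in (-1.0, 0.0, MU, 1.0, 2.0)]
```

At `score == MU` the function returns exactly 0.5, and it rises monotonically toward 1 for larger scores.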

Full Mega (with layernorm for now)

import torch
from mega_pytorch import Mega

mega = Mega(
    num_tokens = 256,            # number of tokens
    dim = 128,                   # model dimensions
    depth = 6,                   # depth
    ema_heads = 16,              # number of EMA heads
    attn_dim_qk = 64,            # dimension of queries / keys in attention
    attn_dim_value = 256,        # dimension of values in attention
    laplacian_attn_fn = True,    # whether to use softmax (false) or laplacian attention activation fn (true)
)

x = torch.randint(0, 256, (1, 1024))

logits = mega(x) # (1, 1024, 256)
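
The logits come back as (batch, seq, num_tokens), while `torch.nn.functional.cross_entropy` expects the class dimension second. The sketch below shows one way to compute a per-position loss under that layout. It uses random stand-in tensors with the shapes from the example above so that it runs without the package installed; the helper name `token_cross_entropy` and the `targets` tensor are hypothetical, and which targets make sense depends on your task.

```python
import torch
import torch.nn.functional as F


def token_cross_entropy(logits, targets):
    """Mean cross-entropy over every position (illustrative sketch).

    logits:  (batch, seq, num_tokens) float tensor
    targets: (batch, seq) long tensor of token ids
    """
    # Move the class dimension to position 1: (batch, num_tokens, seq)
    return F.cross_entropy(logits.transpose(1, 2), targets)


# Stand-ins for the model output and for task-specific labels
logits = torch.randn(1, 1024, 256)
targets = torch.randint(0, 256, (1, 1024))

loss = token_cross_entropy(logits, targets)  # scalar tensor
```

With a real model, `logits` would be the output of `mega(x)` and calling `loss.backward()` would propagate gradients into its parameters.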

Todo

Add dynamic positional bias for the best length-extrapolation architecture

Citations

@inproceedings{Ma2022MegaMA,
    title   = {Mega: Moving Average Equipped Gated Attention},
    author  = {Xuezhe Ma and Chunting Zhou and Xiang Kong and Junxian He and Liangke Gui and Graham Neubig and Jonathan May and Luke Zettlemoyer},
    year    = {2022}
}