MultiModalMamba Download - MultiModalMamba Source Code Download

MultiModalMamba

AI source code

1.0.0

Multimodal

Multi Modal Mamba - [MMM]

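Multi Modal Mamba (MultiModalMamba) is an all-new AI model that integrates Vision Transformer (ViT) and Mamba to create a high-performance multimodal model. MultiModalMamba is built on Zeta, a minimalist yet powerful AI framework designed to streamline and enhance machine learning model management.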

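The ability to process and interpret multiple data types at once is essential; the world is not one-dimensional. MultiModalMamba addresses this need by leveraging the capabilities of Vision Transformer and Mamba to handle both text and image data efficiently. This makes MultiModalMamba a versatile solution for a broad range of AI tasks.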

Install

pip3 install mmm-zeta
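
Note that the package is installed under the name `mmm-zeta` but, as the import statements in the usage examples show, it is imported as `mm_mamba`. A minimal sketch for checking that the install succeeded, using only the Python standard library (the helper name `is_importable` is hypothetical and not part of the package):

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "mm_mamba" is the import name used in the usage examples;
# "mmm-zeta" is only the name passed to pip.
if is_importable("mm_mamba"):
    print("mm_mamba is installed")
else:
    print("mm_mamba not found; try: pip3 install mmm-zeta")
```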

Usage

`MultiModalMambaBlock`

# Import the necessary libraries
import torch
from torch import nn
from mm_mamba import MultiModalMambaBlock

# Create some random input tensors
x = torch.randn(1, 16, 64)  # Text tensor with shape (batch_size, sequence_length, feature_dim)
y = torch.randn(1, 3, 64, 64)  # Image tensor with shape (batch_size, num_channels, image_height, image_width)

# Create an instance of the MultiModalMambaBlock model
model = MultiModalMambaBlock(
    dim=64,  # Dimension of the token embeddings
    depth=5,  # Number of Mamba layers
    dropout=0.1,  # Dropout probability
    heads=4,  # Number of attention heads
    d_state=16,  # Dimension of the state embeddings
    image_size=64,  # Size of the input image
    patch_size=16,  # Size of each image patch
    encoder_dim=64,  # Dimension of the encoder token embeddings
    encoder_depth=5,  # Number of encoder transformer layers
    encoder_heads=4,  # Number of encoder attention heads
    fusion_method="mlp",  # Method used to fuse the text and image features
)

# Pass the input tensors through the model
out = model(x, y)

# Print the shape of the output tensor
print(out.shape)
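
In the example above, `image_size=64` and `patch_size=16` mean that a Vision Transformer style encoder would split the image into a 4 x 4 grid, that is 16 patches, which happens to equal the text sequence length of 16. The sketch below shows that standard ViT arithmetic. The helper name `num_patches` is hypothetical, and whether the library requires the patch count to match the text length is an assumption, not something the source states.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image (standard ViT arithmetic)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


print(num_patches(64, 16))   # 16
print(num_patches(224, 16))  # 196
```

The same arithmetic gives 196 patches for a 224 x 224 image with 16 x 16 patches, which matches the token sequence length of 196 used in the full-model example.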

`MultiModalMamba`, Ready to Train Model

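Flexibility in data types: The MultiModalMamba model can process text and image data simultaneously. This lets it be trained on a wider variety of datasets and tasks, including those that require understanding both text and images.
Customizable architecture: The MultiModalMamba model exposes parameters such as depth, dropout, heads, d_state, image_size, patch_size, encoder_dim, encoder_depth, encoder_heads, and fusion_method. These can be tuned to the requirements of the task at hand, allowing a high degree of customization of the model architecture.
Option to return embeddings: The MultiModalMamba model has a return_embeddings option. When set to True, the model returns the embeddings instead of the final output. This can be useful for tasks that need access to the intermediate representations learned by the model, such as transfer learning or feature extraction.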
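
One way to keep these hyperparameters together is a small configuration object that is unpacked into the constructor. This is a sketch, not part of the library: the class name `MMMConfig` and its sanity checks are assumptions for illustration, while the field names and default values mirror the keyword arguments in the `MultiModalMamba` usage example in this section.

```python
from dataclasses import asdict, dataclass


@dataclass
class MMMConfig:
    """Hypothetical container for the keyword arguments of MultiModalMamba."""

    vocab_size: int = 10000
    dim: int = 512
    depth: int = 6
    dropout: float = 0.1
    heads: int = 8
    d_state: int = 512
    image_size: int = 224
    patch_size: int = 16
    encoder_dim: int = 512
    encoder_depth: int = 6
    encoder_heads: int = 8
    fusion_method: str = "mlp"
    return_embeddings: bool = False
    post_fuse_norm: bool = True

    def __post_init__(self) -> None:
        # Sanity checks chosen for this sketch (assumptions, not library rules).
        if self.image_size % self.patch_size != 0:
            raise ValueError("image_size should be divisible by patch_size")
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError("dropout should be in [0, 1)")


# Switch on embedding output for feature extraction, keep everything else.
config = MMMConfig(return_embeddings=True)
kwargs = asdict(config)
print(kwargs["return_embeddings"])  # True
# With the package installed: model = MultiModalMamba(**kwargs)
```

Keeping the values in one object makes it easy to change a single option, such as `return_embeddings`, without retyping the full argument list.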

import torch  # Import the torch library

# Import the MultiModalMamba model from the mm_mamba module
from mm_mamba import MultiModalMamba

# Generate a random token tensor 'x' of shape (1, 196) with integer IDs in [0, 10000)
x = torch.randint(0, 10000, (1, 196))

# Generate a random image tensor 'img' of shape (1, 3, 224, 224)
img = torch.randn(1, 3, 224, 224)

# Generate a random 2D audio tensor 'aud' of shape (1, 224)
aud = torch.randn(1, 224)

# Generate a random 5D video tensor 'vid' of shape (batch_size, channels, frames, height, width)
vid = torch.randn(1, 3, 16, 224, 224)

# Create a MultiModalMamba model object with the following parameters:
model = MultiModalMamba(
    vocab_size=10000,  # Size of the token vocabulary
    dim=512,
    depth=6,
    dropout=0.1,
    heads=8,
    d_state=512,
    image_size=224,
    patch_size=16,
    encoder_dim=512,
    encoder_depth=6,
    encoder_heads=8,
    fusion_method="mlp",
    return_embeddings=False,  # Set to True to return embeddings instead of the final output
    post_fuse_norm=True,
)

# Pass the text, image, audio, and video tensors through the model and store the output in 'out'
out = model(x, img, aud, vid)

# Print the shape of the output tensor 'out'
print(out.shape)
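
Since the model call takes four positional inputs, a quick rank check before the forward pass can catch a tensor passed in the wrong slot. The sketch below works on plain shape tuples (for a tensor `t`, `tuple(t.shape)`). The helper `check_input_ranks` is hypothetical, and the expected ranks are simply those used in the example above, not a documented contract of the library.

```python
# Number of dimensions of each input in the example above.
EXPECTED_RANKS = {"text": 2, "image": 4, "audio": 2, "video": 5}


def check_input_ranks(shapes: dict) -> list:
    """Return a list of readable problems; an empty list means all ranks match."""
    problems = []
    for name, rank in EXPECTED_RANKS.items():
        if name not in shapes:
            problems.append(f"missing input: {name}")
        elif len(shapes[name]) != rank:
            problems.append(f"{name}: expected {rank} dims, got {len(shapes[name])}")
    return problems


shapes = {
    "text": (1, 196),
    "image": (1, 3, 224, 224),
    "audio": (1, 224),
    "video": (1, 3, 16, 224, 224),
}
print(check_input_ranks(shapes))  # []
```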