Q-transformer

Version 0.3.0

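Implementation of Q-Transformer, Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, out of Google Deepmind.

I will be keeping the logic for Q-learning on a single action around, for a final comparison with the proposed autoregressive Q-learning on multiple actions. It also serves as education for myself and the public.

The autoregressive Q-learning formulation has been replicated by Kotb et al.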
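
To make the autoregressive formulation concrete, here is a minimal pure-Python sketch of the idea. It is not this repository's implementation. Each action dimension is treated as its own decision step: the Q-values for one dimension are conditioned on the bins already chosen for the earlier dimensions. `toy_q_values`, `greedy_action`, `td_targets`, and the constants are hypothetical names used only for illustration, and the sketch leaves out the conservative regularization and Monte Carlo return terms described in the paper.

```python
import random

NUM_ACTION_DIMS = 3   # hypothetical; the usage example in this README uses 8
NUM_BINS = 4          # hypothetical; the usage example in this README uses 256
GAMMA = 0.98          # hypothetical discount factor


def toy_q_values(state, chosen_bins):
    """Hypothetical stand-in for a learned Q-function.

    Returns one Q-value per bin of the next action dimension, given an
    integer state id and the bins already chosen for earlier dimensions.
    """
    seed = state * 10_007 + sum(
        (b + 1) * (NUM_BINS + 1) ** i for i, b in enumerate(chosen_bins)
    )
    rng = random.Random(seed)
    return [rng.random() for _ in range(NUM_BINS)]


def greedy_action(q_fn, state):
    """Pick bins one dimension at a time, each conditioned on the previous picks."""
    chosen = []
    for _ in range(NUM_ACTION_DIMS):
        q = q_fn(state, chosen)
        chosen.append(max(range(len(q)), key=q.__getitem__))
    return chosen


def td_targets(q_fn, state, action_bins, reward, next_state, done):
    """Per-dimension Bellman targets for one transition.

    Dimensions before the last bootstrap from the max over the next
    dimension within the same state (no reward, no discount). The last
    dimension uses reward + gamma * max over the first dimension of the
    next state.
    """
    targets = []
    for i in range(NUM_ACTION_DIMS):
        if i < NUM_ACTION_DIMS - 1:
            targets.append(max(q_fn(state, action_bins[: i + 1])))
        else:
            bootstrap = 0.0 if done else max(q_fn(next_state, []))
            targets.append(reward + GAMMA * bootstrap)
    return targets


state, next_state = 0, 1
action = greedy_action(toy_q_values, state)
targets = td_targets(
    toy_q_values, state, action, reward=1.0, next_state=next_state, done=False
)
print(action, targets)
```

The point of the per-dimension maximization is cost: picking the best of `NUM_BINS` values `NUM_ACTION_DIMS` times avoids searching over all `NUM_BINS ** NUM_ACTION_DIMS` joint actions.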

Install

$ pip install q-transformer

Usage

import torch

from q_transformer import (
    QRoboticTransformer,
    QLearner,
    Agent,
    ReplayMemoryDataset
)

# the attention model

model = QRoboticTransformer(
    vit = dict(
        num_classes = 1000,
        dim_conv_stem = 64,
        dim = 64,
        dim_head = 64,
        depth = (2, 2, 5, 2),
        window_size = 7,
        mbconv_expansion_rate = 4,
        mbconv_shrinkage_rate = 0.25,
        dropout = 0.1
    ),
    num_actions = 8,
    action_bins = 256,
    depth = 1,
    heads = 8,
    dim_head = 64,
    cond_drop_prob = 0.2,
    dueling = True
)

# you need to supply your own environment, by overriding BaseEnvironment

from q_transformer.mocks import MockEnvironment

env = MockEnvironment(
    state_shape = (3, 6, 224, 224),
    text_embed_shape = (768,)
)

# env.init()     should return instructions and initial state: Tuple[str, Tensor[*state_shape]]
# env(actions)   should return rewards, next state, and done flag: Tuple[Tensor[()], Tensor[*state_shape], Tensor[()]]

# agent is a class that allows the q-model to interact with the environment to generate a replay memory dataset for learning

agent = Agent(
    model,
    environment = env,
    num_episodes = 1000,
    max_num_steps_per_episode = 100,
)

agent()

# Q learning on the replay memory dataset on the model

q_learner = QLearner(
    model,
    dataset = ReplayMemoryDataset(),
    num_train_steps = 10000,
    learning_rate = 3e-4,
    batch_size = 4,
    grad_accum_every = 16,
)

q_learner()

# after much learning
# your robot should be better at selecting optimal actions

video = torch.randn(2, 3, 6, 224, 224)

instructions = [
    'bring me that apple sitting on the table',
    'please pass the butter'
]

actions = model.get_optimal_actions(video, instructions)
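
Since each action dimension is discretized into `action_bins = 256` bins, the selected actions typically need to be mapped back to continuous control values before they are sent to a robot. The following is a minimal sketch of one way to do that, assuming `get_optimal_actions` returns one integer bin index in `[0, action_bins)` per action dimension and that every dimension is normalized to `[-1, 1]`. Neither assumption is stated in this README, and `bin_to_value` and `value_to_bin` are hypothetical helpers, not part of `q_transformer`.

```python
def bin_to_value(bin_index, num_bins=256, low=-1.0, high=1.0):
    """Map a bin index to the center of its interval in [low, high].

    The [-1, 1] range and center-of-bin decoding are assumptions.
    """
    if not 0 <= bin_index < num_bins:
        raise ValueError(f"bin index {bin_index} outside [0, {num_bins})")
    width = (high - low) / num_bins
    return low + (bin_index + 0.5) * width


def value_to_bin(value, num_bins=256, low=-1.0, high=1.0):
    """Map a continuous value to its bin index, clipping to [low, high]."""
    clipped = min(max(value, low), high)
    width = (high - low) / num_bins
    # The upper edge `high` would land in bin `num_bins`, so clamp it.
    return min(int((clipped - low) / width), num_bins - 1)


# Hypothetical bin indices for one sample with 8 action dimensions
example_bins = [0, 31, 64, 127, 128, 192, 200, 255]
continuous = [bin_to_value(b) for b in example_bins]
print(continuous)
```

With 256 bins over a range of width 2, each bin covers 2 / 256 = 0.0078125, so decoding to the bin center is off from the original continuous value by at most half of that.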

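Appreciation

StabilityAI, the A16Z Open Source AI Grant Program, and Hugging Face for the generous sponsorships, as well as my other sponsors, for affording me the independence to open source current artificial intelligence research.

Todo

Citations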

@inproceedings{qtransformer,
    title     = {Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions},
    author    = {Yevgen Chebotar and Quan Vuong and Alex Irpan and Karol Hausman and Fei Xia and Yao Lu and Aviral Kumar and Tianhe Yu and Alexander Herzog and Karl Pertsch and Keerthana Gopalakrishnan and Julian Ibarz and Ofir Nachum and Sumedh Sontakke and Grecia Salazar and Huong T Tran and Jodilyn Peralta and Clayton Tan and Deeksha Manjunath and Jaspiar Singh and Brianna Zitkovich and Tomas Jackson and Kanishka Rao and Chelsea Finn and Sergey Levine},
    booktitle = {7th Annual Conference on Robot Learning},
    year      = {2023}
}

@inproceedings{dao2022flashattention,
    title     = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author    = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year      = {2022}
}

@inproceedings{Kumar2023MaintainingPI,
    title   = {Maintaining Plasticity in Continual Learning via Regenerative Regularization},
    author  = {Saurabh Kumar and Henrik Marklund and Benjamin Van Roy},
    year    = {2023},
    url     = {https://api.semanticscholar.org/CorpusID:261076021}
}