Implementation of MeshGPT, SOTA mesh generation using attention, in Pytorch
Will also add text conditioning, for eventual text-to-3d asset
Please join if you are interested in collaborating with others to replicate this work
Update: Marcus has trained a working model and uploaded it to 🤗 Huggingface!
StabilityAI, the A16Z Open Source AI Grant Program, and 🤗 Huggingface for the generous sponsorships, as well as my other sponsors, for affording me the independence to open source current artificial intelligence research
Einops for making my life easy
Marcus for the initial code review (pointing out some missing derived features) as well as running the first successful end-to-end experiments
Marcus for the first successful training of a collection of shapes conditioned on labels
Quexi Ma for finding numerous bugs with automatic eos handling
Yingtian for finding a bug with the gaussian blurring of the positions for spatial label smoothing
Marcus yet again for running the experiments to validate that it is possible to extend the system from triangles to quads
Marcus for identifying an issue with text conditioning and for running all the experiments that led to it being resolved
$ pip install meshgpt-pytorch
import torch

from meshgpt_pytorch import (
    MeshAutoencoder,
    MeshTransformer
)

# autoencoder

autoencoder = MeshAutoencoder(
    num_discrete_coors = 128
)

# mock inputs

vertices = torch.randn((2, 121, 3))       # (batch, num vertices, coor (3))
faces = torch.randint(0, 121, (2, 64, 3)) # (batch, num faces, vertices (3))

# make sure faces are padded with `-1` for variable lengthed meshes
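# illustrative sketch (not from the original readme): for a batch of meshes with
# different face counts, the shorter `faces` tensors can be right-padded with -1
# before stacking, e.g. padding a 50-face mesh out to 64 faces

import torch.nn.functional as F

short_faces = torch.randint(0, 121, (1, 50, 3))                    # (batch, 50 faces, 3 vertex indices)
padded_faces = F.pad(short_faces, (0, 0, 0, 64 - 50), value = -1)  # -> (1, 64, 3), last 14 faces are -1 padding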
# forward in the faces

loss = autoencoder(
    vertices = vertices,
    faces = faces
)

loss.backward()

# after much training...
# you can pass in the raw face data above to train a transformer to model this sequence of face vertices

transformer = MeshTransformer(
    autoencoder,
    dim = 512,
    max_seq_len = 768
)

loss = transformer(
    vertices = vertices,
    faces = faces
)

loss.backward()

# after much training of transformer, you can now sample novel 3d assets

faces_coordinates, face_mask = transformer.generate()

# (batch, num faces, vertices (3), coordinates (3)), (batch, num faces)
# now post process for the generated 3d asset
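As a rough sketch of that post-processing step (the function name and file format choice are my own, not part of the library), the generated face coordinates and mask could be written out as a simple Wavefront OBJ file, assuming face_mask is a boolean mask over valid faces:

def save_as_obj(face_coords, face_mask, path):
    # face_coords: (num faces, 3 vertices, 3 coordinates), face_mask: (num faces,)
    face_coords = face_coords[face_mask.bool()]   # drop padded / invalid faces
    verts = face_coords.reshape(-1, 3).tolist()   # each face contributes 3 vertices (shared vertices are simply duplicated)

    with open(path, 'w') as f:
        for x, y, z in verts:
            f.write(f'v {x} {y} {z}\n')
        for i in range(face_coords.shape[0]):
            # obj vertex indices are 1-based, three consecutive vertices per face
            f.write(f'f {3 * i + 1} {3 * i + 2} {3 * i + 3}\n')

save_as_obj(faces_coordinates[0], face_mask[0], 'sample.obj')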
For text-conditioned 3d shape synthesis, simply set condition_on_text = True on your MeshTransformer, and then pass in your list of descriptions as the texts keyword argument

ex.
transformer = MeshTransformer(
    autoencoder,
    dim = 512,
    max_seq_len = 768,
    condition_on_text = True
)

loss = transformer(
    vertices = vertices,
    faces = faces,
    texts = ['a high chair', 'a small teapot'],
)

loss.backward()

# after much training of transformer, you can now sample novel 3d assets conditioned on text

faces_coordinates, face_mask = transformer.generate(
    texts = ['a long table'],
    cond_scale = 8., # a cond_scale > 1. will enable classifier free guidance - can be placed anywhere from 3. - 10.
    remove_parallel_component = True # from https://arxiv.org/abs/2410.02416
)
If you wish to tokenize meshes, for use in your multimodal transformer, simply invoke .tokenize on your autoencoder (or the same method on the autoencoder trainer instance for the exponentially smoothed model)
mesh_token_ids = autoencoder.tokenize(
    vertices = vertices,
    faces = faces
)

# (batch, num face vertices, residual quantized layer)
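If you need a single flat token sequence (for example, to interleave mesh tokens with text tokens in a multimodal transformer), one simple option, sketched here as an assumption rather than documented behaviour, is to fold the residual-quantizer dimension into the sequence dimension:

# mesh_token_ids: (batch, num face vertices, residual quantized layer)
mesh_token_seq = mesh_token_ids.reshape(mesh_token_ids.shape[0], -1) # (batch, num face vertices * num quantizers)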
At the project root, run
$ cp .env.sample .env
autoencoder
face_edges
transformer
trainer wrapper with hf accelerate
text conditioning using own CFG library
hierarchical transformers (use the RQ transformer)
fix caching in simple gateloop layer in other repo
local attention
fix kv caching for two-staged hierarchical transformer - 7x faster now, and faster than original non-hierarchical transformer
fix caching for gateloop layers
allow for customization of model dimensions of fine vs coarse attention network
figure out if autoencoder is really necessary - it is, ablations are in the paper
make the transformer more efficient
speculative decoding option
spend a day on documentation
@inproceedings{Siddiqui2023MeshGPTGT,
    title  = {MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers},
    author = {Yawar Siddiqui and Antonio Alliegro and Alexey Artemov and Tatiana Tommasi and Daniele Sirigatti and Vladislav Rosov and Angela Dai and Matthias Nie{\ss}ner},
    year   = {2023},
    url    = {https://api.semanticscholar.org/CorpusID:265457242}
}

@inproceedings{dao2022flashattention,
    title     = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author    = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year      = {2022}
}

@inproceedings{Leviathan2022FastIF,
    title     = {Fast Inference from Transformers via Speculative Decoding},
    author    = {Yaniv Leviathan and Matan Kalman and Y. Matias},
    booktitle = {International Conference on Machine Learning},
    year      = {2022},
    url       = {https://api.semanticscholar.org/CorpusID:254096365}
}

@misc{yu2023language,
    title         = {Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation},
    author        = {Lijun Yu and José Lezama and Nitesh B. Gundavarapu and Luca Versari and Kihyuk Sohn and David Minnen and Yong Cheng and Agrim Gupta and Xiuye Gu and Alexander G. Hauptmann and Boqing Gong and Ming-Hsuan Yang and Irfan Essa and David A. Ross and Lu Jiang},
    year          = {2023},
    eprint        = {2310.05737},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV}
}

@article{Lee2022AutoregressiveIG,
    title   = {Autoregressive Image Generation using Residual Quantization},
    author  = {Doyup Lee and Chiheon Kim and Saehoon Kim and Minsu Cho and Wook-Shin Han},
    journal = {2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year    = {2022},
    pages   = {11513-11522},
    url     = {https://api.semanticscholar.org/CorpusID:247244535}
}

@inproceedings{Katsch2023GateLoopFD,
    title  = {GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling},
    author = {Tobias Katsch},
    year   = {2023},
    url    = {https://api.semanticscholar.org/CorpusID:265018962}
}