rvq vae gpt
0.0.4
我嘗試將 Soundstream 設計應用於學習的文本標記化,然後將分層轉換器應用於文字生成。
Soundstream 將被修改以利用所有本地註意力。實驗將比較 VQ、RVQ 以及多頭 VQ
一位研究員朋友告訴我這可能會失敗?但無論如何我都會嘗試一下,yolo。如果它不起作用,也許它對基因組學仍然有用。想一想,為什麼它不能至少學習二元組(對於英語)和密碼子(對於基因組學)?為什麼我們沒有分層預測編碼?我們應該
更新:一些現場實驗
@misc { https://doi.org/10.48550/arxiv.2107.03312 ,
title = { SoundStream: An End-to-End Neural Audio Codec } ,
author = { Zeghidour, Neil and Luebs, Alejandro and Omran, Ahmed and Skoglund, Jan and Tagliasacchi, Marco } ,
publisher = { arXiv } ,
url = { https://arxiv.org/abs/2107.03312 } ,
year = { 2021 }
}
@unknown { unknown ,
author = { Lee, Doyup and Kim, Chiheon and Kim, Saehoon and Cho, Minsu and Han, Wook-Shin } ,
year = { 2022 } ,
month = { 03 } ,
title = { Autoregressive Image Generation using Residual Quantization }
}
@article { Sunkara2022NoMS ,
title = { No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects } ,
author = { Raja Sunkara and Tie Luo } ,
journal = { ArXiv } ,
year = { 2022 } ,
volume = { abs/2208.03641 }
}
@inproceedings { Fifty2024RestructuringVQ ,
title = { Restructuring Vector Quantization with the Rotation Trick } ,
author = { Christopher Fifty and Ronald G. Junkins and Dennis Duan and Aniketh Iger and Jerry W. Liu and Ehsan Amid and Sebastian Thrun and Christopher R'e } ,
year = { 2024 } ,
url = { https://api.semanticscholar.org/CorpusID:273229218 }
}