IndoNLG

This README is also available in Indonesian (Bahasa Indonesia).

Update (16 November 2024): We have updated the links to the datasets and fastText models in IndoNLG!

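IndoNLG is a collection of natural language generation (NLG) resources for Indonesian, covering 6 kinds of downstream tasks. We provide the code to reproduce the results, together with large pre-trained models (IndoBART and IndoGPT) trained on a corpus of around 4 billion words (Indo4B-Plus), roughly 25 GB of text data. The project was initially started as a joint collaboration between universities and industry, including Institut Teknologi Bandung, Universitas Multimedia Nusantara, The Hong Kong University of Science and Technology, Universitas Indonesia, DeepMind, Gojek, and Prosa.AI.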

Research Paper

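IndoNLG has been accepted at EMNLP 2021; you can find the details in our paper at https://aclanthology.org/2021.emnlp-main.699. If you use any component of IndoNLG in your work, including Indo4B-Plus, IndoBART, or IndoGPT, please cite the following paper: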

 @inproceedings{cahyawijaya-etal-2021-indonlg,
    title = "{I}ndo{NLG}: Benchmark and Resources for Evaluating {I}ndonesian Natural Language Generation",
    author = "Cahyawijaya, Samuel and Winata, Genta Indra and Wilie, Bryan and Vincentio, Karissa and Li, Xiaohong and Kuncoro, Adhiguna and Ruder, Sebastian and Lim, Zhi Yuan and Bahar, Syafri and Khodra, Masayu and Purwarianti, Ayu and Fung, Pascale",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.699",
    pages = "8875--8898",
}