IndoNLG

Read this README in Bahasa Indonesia.

Update, 16 November 2024: We have updated the links to the datasets and the fasttext model in IndoNLG!

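IndoNLG is a collection of natural language generation (NLG) resources for Indonesian, covering 6 kinds of downstream tasks. We provide code to reproduce the results, along with large pretrained models (IndoBART and IndoGPT) trained on a corpus of roughly 4 billion words (Indo4B-Plus), about 25 GB of text data. The project was initially started as a joint collaboration between universities and industry, including Institut Teknologi Bandung, Universitas Multimedia Nusantara, The Hong Kong University of Science and Technology, Universitas Indonesia, DeepMind, Gojek, and Prosa.AI.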

Research Paper

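IndoNLG has been accepted at EMNLP 2021, and you can find the details in our paper at https://aclanthology.org/2021.emnlp-main.699. If you use any component of IndoNLG in your work, including Indo4B-Plus, IndoBART, or IndoGPT, please cite the following paper: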

 @inproceedings{cahyawijaya-etal-2021-indonlg,
    title = "{I}ndo{NLG}: Benchmark and Resources for Evaluating {I}ndonesian Natural Language Generation",
    author = "Cahyawijaya, Samuel and Winata, Genta Indra and Wilie, Bryan and Vincentio, Karissa and Li, Xiaohong and Kuncoro, Adhiguna and Ruder, Sebastian and Lim, Zhi Yuan and Bahar, Syafri and Khodra, Masayu and Purwarianti, Ayu and Fung, Pascale",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.699",
    pages = "8875--8898",
}