IndoNLG

Baca README ini dalam Bahasa Indonesia.

Update, 16 November 2024: The links to the IndoNLG datasets and fastText models have been updated!

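IndoNLG is a collection of Natural Language Generation (NLG) resources for Bahasa Indonesia covering six kinds of downstream tasks. We provide code to reproduce the results, along with large pre-trained models (IndoBART and IndoGPT) trained on a corpus of around 4 billion words (Indo4B-Plus), roughly 25 GB of text data. The project began as a joint collaboration between universities and industry, including Institut Teknologi Bandung, Universitas Multimedia Nusantara, The Hong Kong University of Science and Technology, Universitas Indonesia, DeepMind, Gojek, and Prosa.AI.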

Research Paper

IndoNLG was accepted at EMNLP 2021; details are in our paper: https://aclanthology.org/2021.emnlp-main.699. If you use any component of IndoNLG in your work, including Indo4B-Plus, IndoBART, or IndoGPT, please cite the following paper:

 @inproceedings{cahyawijaya-etal-2021-indonlg,
    title = "{I}ndo{NLG}: Benchmark and Resources for Evaluating {I}ndonesian Natural Language Generation",
    author = "Cahyawijaya, Samuel and Winata, Genta Indra and Wilie, Bryan and Vincentio, Karissa and Li, Xiaohong and Kuncoro, Adhiguna and Ruder, Sebastian and Lim, Zhi Yuan and Bahar, Syafri and Khodra, Masayu and Purwarianti, Ayu and Fung, Pascale",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.699",
    pages = "8875--8898",
}