Turkish spaCy Models
Transformer-based model ready
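Welcome to the Turkish spaCy models page. You can find all the models under our HuggingFace repo. This repository contains the configuration files.
All pipelines contain a tokenizer, a trainable lemmatizer, a POS tagger, a dependency parser, a morphologizer, and an NER component.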
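tr_core_news_lg is a large CNN-based model that offers good accuracy and runs at a decent speed. It includes all of the components above and is packaged with large Floret word vectors.
Similarly, tr_core_news_md is a medium-sized CNN-based model with decent accuracy, and it may be a good choice for speed-critical applications. It is packaged with medium-sized Floret word vectors.
tr_core_news_trf is a Transformer-based pipeline. It offers very high accuracy, and it is the model to choose if you have good computational resources (a GPU is even better).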
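The guidance above can be turned into a small selection helper. The sketch below is only one reading of that guidance, not an official recommendation: `pick_model`, `load_recommended`, and the `speed_critical` flag are hypothetical names introduced for illustration, and the package names are taken from the install URLs in this README. `spacy.prefer_gpu()` is spaCy's own call and returns `True` only when a GPU could be activated.

```python
def pick_model(gpu_available: bool, speed_critical: bool = False) -> str:
    """Map available resources to one of the three pipeline names.

    Assumption: a GPU means the Transformer pipeline is affordable;
    otherwise fall back to a CNN pipeline, medium if speed matters most.
    """
    if gpu_available:
        return "tr_core_news_trf"
    if speed_critical:
        return "tr_core_news_md"
    return "tr_core_news_lg"


def load_recommended(speed_critical: bool = False):
    """Load the pipeline chosen by pick_model (the package must be installed)."""
    import spacy  # imported lazily so pick_model works without spaCy installed

    # prefer_gpu() must be called before the pipeline is loaded.
    gpu_available = spacy.prefer_gpu()
    return spacy.load(pick_model(gpu_available, speed_critical))
```

Calling `load_recommended()` raises an `OSError` if the selected package has not been installed with pip yet.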
You can download all the models from HuggingFace:
pip install https://huggingface.co/turkish-nlp-suite/tr_core_news_trf/resolve/main/tr_core_news_trf-any-py3-none-any.whl
pip install https://huggingface.co/turkish-nlp-suite/tr_core_news_lg/resolve/main/tr_core_news_lg-any-py3-none-any.whl
pip install https://huggingface.co/turkish-nlp-suite/tr_core_news_md/resolve/main/tr_core_news_md-any-py3-none-any.whl
After installing a model via pip, you can use it directly by loading it into spaCy:
import spacy
nlp = spacy.load("tr_core_news_trf")
doc = nlp("Dün ben de gittim.")
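Once a `doc` has been created, the annotations produced by the components listed earlier (lemmatizer, POS tagger, morphologizer, dependency parser, NER) can be read from the tokens and entity spans. The following is a minimal sketch, assuming the `tr_core_news_trf` package is installed; `token_rows`, `entity_rows`, and `main` are hypothetical helper names, while the attributes they read (`lemma_`, `pos_`, `morph`, `dep_`, `head`, `ents`, `label_`) are standard spaCy `Token`, `Doc`, and `Span` attributes.

```python
def token_rows(doc):
    """Return (text, lemma, POS, morphology, dependency, head) for each token."""
    return [
        (tok.text, tok.lemma_, tok.pos_, str(tok.morph), tok.dep_, tok.head.text)
        for tok in doc
    ]


def entity_rows(doc):
    """Return (text, label) for each named entity found in the document."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def main():
    import spacy  # imported here so the helpers above do not require spaCy

    nlp = spacy.load("tr_core_news_trf")
    print(nlp.pipe_names)  # names of the components in the loaded pipeline

    doc = nlp("Dün ben de gittim.")
    for row in token_rows(doc):
        print(*row, sep="\t")
    for row in entity_rows(doc):
        print(*row, sep="\t")
```

Call `main()` to print the analysis for the sample sentence; the same helpers work with the `lg` and `md` pipelines if you change the name passed to `spacy.load`.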
The documentation is available on our website: [todo]
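Please visit my channel for two playlists: "Hızlı spaCy Türkçe Tarifleri" and "spaCy Modeli Nasıl Yapılır?". The first playlist offers quick recipes with the Turkish spaCy models; the second explains in detail how to train and package a spaCy model for a new language.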
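This work was supported by the Google Developer Experts Program. Part of the Duygu 2022 Fall-Winter collection, "Turkish NLP with Duygu" / "Duygu'yla Türkçe NLP". All rights reserved. If you would like to use these models in your own work, please cite the paper "A Diverse Set of Freely Available Linguistic Resources for Turkish":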
@inproceedings{altinok-2023-diverse,
title = "A Diverse Set of Freely Available Linguistic Resources for {T}urkish",
author = "Altinok, Duygu",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-long.768",
pages = "13739--13750",
abstract = "This study presents a diverse set of freely available linguistic resources for Turkish natural language processing, including corpora, pretrained models and education material. Although Turkish is spoken by a sizeable population of over 80 million people, Turkish linguistic resources for natural language processing remain scarce. In this study, we provide corpora to allow practitioners to build their own applications and pretrained models that would assist industry researchers in creating quick prototypes. The provided corpora include named entity recognition datasets of diverse genres, including Wikipedia articles and supplement products customer reviews. In addition, crawling e-commerce and movie reviews websites, we compiled several sentiment analysis datasets of different genres. Our linguistic resources for Turkish also include pretrained spaCy language models. To the best of our knowledge, our models are the first spaCy models trained for the Turkish language. Finally, we provide various types of education material, such as video tutorials and code examples, that can support the interested audience on practicing Turkish NLP. The advantages of our linguistic resources are three-fold: they are freely available, they are first of their kind, and they are easy to use in a broad range of implementations. Along with a thorough description of the resource creation process, we also explain the position of our resources in the Turkish NLP world.",
}