turkish spacy models

Transformer-based model ready

Turkish spaCy Models

Welcome to the Turkish spaCy models page. You can find all the models under our HuggingFace repo. This repository contains the configuration files for:

tr_core_news_md
tr_core_news_lg
tr_core_news_trf

All pipelines contain a tokenizer, a trainable lemmatizer, a POS tagger, a dependency parser, a morphologizer and an NER component.

Available models

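tr_core_news_lg is a large CNN-based model that offers good accuracy and runs at a decent speed. It includes all the components listed above and is packaged with large Floret word vectors.

Similarly, tr_core_news_md is a medium-sized CNN-based model that offers decent accuracy and is probably a good choice for speed-critical applications. It is packaged with medium-sized Floret word vectors.

tr_core_news_trf is a Transformer-based pipeline. It offers very high accuracy, and it is the model to choose if you have good computational resources (a GPU is even better).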
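The guidance above can be condensed into a small selection helper. This is only a sketch of one way to read it: the function name `pick_model` and its two flags are hypothetical and not part of the repository, and the package names follow the install commands in the Installation section.

```python
def pick_model(good_compute: bool, speed_critical: bool = False) -> str:
    """Map the model descriptions to a package name (hypothetical helper)."""
    if speed_critical:
        # Medium CNN pipeline: suggested for speed-critical applications
        return "tr_core_news_md"
    if good_compute:
        # Transformer pipeline: highest accuracy, needs good hardware
        return "tr_core_news_trf"
    # Large CNN pipeline: good accuracy at a decent speed
    return "tr_core_news_lg"


print(pick_model(good_compute=True))
print(pick_model(good_compute=False, speed_critical=True))
```

When spaCy is installed, `spacy.prefer_gpu()` returns `True` if a GPU was activated, so its result could be passed as `good_compute`.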

Installation

You can download all the models from HuggingFace:

Transformer-based model: pip install https://huggingface.co/turkish-nlp-suite/tr_core_news_trf/resolve/main/tr_core_news_trf-any-py3-none-any.whl
Large model: pip install https://huggingface.co/turkish-nlp-suite/tr_core_news_lg/resolve/main/tr_core_news_lg-any-py3-none-any.whl
Medium model: pip install https://huggingface.co/turkish-nlp-suite/tr_core_news_md/resolve/main/tr_core_news_md-any-py3-none-any.whl
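To see which of the three packages are already present in the current environment, something like the following may help. It is a minimal sketch: the helper names `wheel_url` and `is_installed` are made up for illustration, and the URL pattern is simply copied from the commands above. It relies on the fact that a pip-installed spaCy model is an ordinary importable Python package.

```python
import importlib.util

BASE = "https://huggingface.co/turkish-nlp-suite"
MODELS = ("tr_core_news_trf", "tr_core_news_lg", "tr_core_news_md")


def wheel_url(model: str) -> str:
    """Build the wheel URL following the pattern of the install commands."""
    return f"{BASE}/{model}/resolve/main/{model}-any-py3-none-any.whl"


def is_installed(model: str) -> bool:
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(model) is not None


# Report the status of each model and print the install command if missing
for model in MODELS:
    if is_installed(model):
        print(f"{model}: installed")
    else:
        print(f"{model}: missing, install with: pip install {wheel_url(model)}")
```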

A quick look

After installing a model via pip, you can use it directly by loading it into spaCy:

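import spacy

# Load the Transformer-based Turkish pipeline (install it via pip first)
nlp = spacy.load("tr_core_news_trf")

# Run the full pipeline on a Turkish sentence
doc = nlp("Dün ben de gittim.")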
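Once a `Doc` has been created, the annotations produced by the components can be read from standard spaCy token attributes. The sketch below assumes `tr_core_news_trf` is installed. The helper `token_rows` is a hypothetical name introduced here, and no output is shown because the actual tags depend on the model.

```python
def token_rows(doc):
    """Collect text, lemma, POS tag, dependency label and morphology per token."""
    return [(t.text, t.lemma_, t.pos_, t.dep_, str(t.morph)) for t in doc]


def main():
    import spacy  # imported here so token_rows stays usable without spaCy

    nlp = spacy.load("tr_core_news_trf")
    print(nlp.pipe_names)  # names of the components in the loaded pipeline

    doc = nlp("Dün ben de gittim.")
    for row in token_rows(doc):
        print(*row, sep="\t")

    # Named entities found by the NER component, if any
    for ent in doc.ents:
        print(ent.text, ent.label_)


if __name__ == "__main__":
    try:
        main()
    except (ImportError, OSError) as err:
        # ImportError: spaCy is missing; OSError: the model package is missing
        print(f"Model not available: {err}")
```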

The documentation is available on our website: [todo]

Tutorials

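Please visit my channel for two playlists: Hızlı spaCy Türkçe Tarifleri and spaCy Modeli Nasıl Yapılır? The first playlist contains quick recipes for using spaCy with Turkish. The second explains in detail how to train and package a model for a new language.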

@inproceedings{altinok-2023-diverse,
    title = "A Diverse Set of Freely Available Linguistic Resources for {T}urkish",
    author = "Altinok, Duygu",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.768",
    pages = "13739--13750",
    abstract = "This study presents a diverse set of freely available linguistic resources for Turkish natural language processing, including corpora, pretrained models and education material. Although Turkish is spoken by a sizeable population of over 80 million people, Turkish linguistic resources for natural language processing remain scarce. In this study, we provide corpora to allow practitioners to build their own applications and pretrained models that would assist industry researchers in creating quick prototypes. The provided corpora include named entity recognition datasets of diverse genres, including Wikipedia articles and supplement products customer reviews. In addition, crawling e-commerce and movie reviews websites, we compiled several sentiment analysis datasets of different genres. Our linguistic resources for Turkish also include pretrained spaCy language models. To the best of our knowledge, our models are the first spaCy models trained for the Turkish language. Finally, we provide various types of education material, such as video tutorials and code examples, that can support the interested audience on practicing Turkish NLP. The advantages of our linguistic resources are three-fold: they are freely available, they are first of their kind, and they are easy to use in a broad range of implementations. Along with a thorough description of the resource creation process, we also explain the position of our resources in the Turkish NLP world.",
}
