Your personal text classifier - Co:here application

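Natural language processing (NLP) is a field of computer science and linguistics concerned with the interaction between computers and human (natural) languages. In its simplest form, NLP is about developing algorithms that can automatically understand and generate human language. The long-term goal of NLP is to create computational models of human language that can be used to perform a wide range of tasks, including automatic translation, summarization, question answering, and information extraction. NLP research is highly interdisciplinary, involving researchers from linguistics, cognitive science, artificial intelligence, and computer science.

There are many different approaches to natural language processing, including rule-based, statistical, and neural methods. Rule-based methods typically rely on hand-crafted rules written by NLP experts. They can be very effective for specific tasks, but they are usually limited in scope and take a lot of effort to maintain. Statistical methods use large amounts of data to train computational models, which can then perform various NLP tasks automatically. Neural networks are a type of machine learning algorithm that is particularly well suited to NLP tasks. They have been used to build state-of-the-art models for tasks such as machine translation and classification.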

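Co:here

Co:here is a powerful neural network that can generate, embed, and classify text. In this tutorial, we will use Co:here to classify descriptions. To use Co:here, you need to create an account at Co:here and get an API key.

We will program in Python, so we need to install the cohere library with pip: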

 pip install cohere

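First, we have to create a cohere.Client. Its arguments should be the API key you generated earlier and the version 2021-11-08. I will wrap it in a class CoHere, which will be useful in the next steps.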

class CoHere:
    def __init__(self, api_key):
        self.co = cohere.Client(f'{api_key}', '2021-11-08')
        self.examples = []

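Dataset

The main part of every neural network is a dataset. In this tutorial, I will use a dataset of 1000 descriptions in 10 classes. If you want to use the same one, you can download it here.

The downloaded dataset has 10 folders, each containing 100 .txt files with descriptions. The file name carries the label of the description, e.g. sport_3.txt.

The task at this stage is to read the descriptions and labels from the files and build data in which each sample holds one description and its label. The Cohere classifier needs samples, and each sample should be a list [description, label].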
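As a minimal sketch of the sample format described above, the helper below derives a label from a file name such as sport_3.txt and pairs it with a description. The function names `label_from_filename` and `make_sample` and the example sentence are made up for illustration; the tutorial itself takes the label from the folder name instead.

```python
import os

def label_from_filename(path):
    """Return the label encoded in a file name such as 'sport_3.txt'."""
    stem = os.path.splitext(os.path.basename(path))[0]  # 'sport_3'
    label, sep, _index = stem.rpartition("_")           # split at the last underscore
    # Fall back to the whole stem when the name has no underscore
    return label if sep else stem

def make_sample(description, path):
    """Build one [description, label] sample."""
    return [description, label_from_filename(path)]

# Hypothetical description text, used only to show the shape of a sample
sample = make_sample("The team won the final match.", "sport_3.txt")
print(sample)  # ['The team won the final match.', 'sport']
```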

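Loading the paths of examples

First, we need to load all the data. To do this, we create the function load_examples. In this function we will use three libraries:

os.path - to get into the folder with the data. The relative path assumes the code is run from the directory of the Python file (file.py). It is part of the Python standard library, so we don't need to install it.

numpy - this library is useful for working with arrays. In this tutorial, we will use it to generate random numbers. You have to install it with pip: pip install numpy.

glob - helps us read all file and folder names. It is also part of the Python standard library, so it does not need to be installed.

The downloaded dataset should be extracted into the folder data. With os.path.join we can build a platform-independent path to the folders.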

folders_path = os.path.join('data', '*')

On Windows, the returned value is data\*.
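To see why os.path.join gives a portable pattern, the sketch below calls the Windows and POSIX implementations of os.path directly. Both modules ship with Python; os.path is simply whichever of the two matches the current operating system.

```python
import ntpath      # Windows flavour of os.path
import posixpath   # Linux/macOS flavour of os.path

windows_pattern = ntpath.join("data", "*")
posix_pattern = posixpath.join("data", "*")

print(windows_pattern)  # data\*
print(posix_pattern)    # data/*
```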

Then we can use the glob function to get the names of all folders.

folders_name = glob(folders_path)

folders_name is a list containing the Windows paths of the folders. In this tutorial, the folder names are the names of the labels.

['data\\business', 'data\\entertainment', 'data\\food', 'data\\graphics', 'data\\historical', 'data\\medical', 'data\\politics', 'data\\space', 'data\\sport', 'data\\technologie']
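The folder listing can be tried without the real dataset. The sketch below builds a hypothetical two-label miniature of the layout in a temporary directory, globs it the same way, and reduces each folder path to its label with os.path.basename. The temporary directory and the two labels are assumptions made for the demonstration.

```python
import os
import tempfile
from glob import glob

# Hypothetical miniature of the dataset layout: data/<label>/
with tempfile.TemporaryDirectory() as root:
    for label in ("business", "sport"):
        os.makedirs(os.path.join(root, "data", label))

    # glob does not guarantee an order, so sort for a stable result
    folders_name = sorted(glob(os.path.join(root, "data", "*")))
    # basename strips everything up to the last path separator
    labels = [os.path.basename(folder) for folder in folders_name]

print(labels)  # ['business', 'sport']
```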

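The Co:here training set can't contain more than 50 examples, and each class must have at least 5 examples. With a for loop we can get the name of each file. The whole function looks like this: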

import os.path
from glob import glob
import numpy as np

def load_examples():
    examples_path = []

    folders_path = os.path.join('data', '*')
    folders_name = glob(folders_path)

    for folder in folders_name:
        files_path = os.path.join(folder, '*')
        files_name = glob(files_path)
        for i in range(50 // len(folders_name)):
            random_example = np.random.randint(0, len(files_name))
            examples_path.append(files_name[random_example])
    return examples_path

The last loop randomly picks 5 paths for each label and appends them to the new list examples_path.
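Because np.random.randint draws with replacement, the loop above can occasionally pick the same file twice for one label. A possible variant, sketched here with the standard library's random.sample, always returns distinct paths. The function name `sample_paths`, its `seed` parameter, and the generated file names are assumptions for illustration, not part of the original tutorial.

```python
import random

def sample_paths(files_by_label, per_label=5, seed=None):
    """Pick `per_label` distinct file paths for every label."""
    rng = random.Random(seed)
    chosen = []
    for label, files in files_by_label.items():
        # random.sample never returns the same element twice;
        # it raises ValueError if a label has fewer files than per_label
        chosen.extend(rng.sample(files, per_label))
    return chosen

# Hypothetical file names standing in for the real dataset
files_by_label = {
    label: [f"data/{label}/{label}_{i}.txt" for i in range(100)]
    for label in ("sport", "food")
}
picked = sample_paths(files_by_label, per_label=5, seed=0)
print(len(picked), len(set(picked)))  # 10 10
```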

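Loading descriptions

Now we have to create a training set. To do this, we load the example paths with load_examples(). Each path contains the name of a class, which we will use to build the examples. The description has to be read from the file, and it can't be too long, so in this tutorial it is cut to 100 characters. To the list texts we append a [description, class_name] list, and the function returns texts.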

def examples():
    texts = []
    examples_path = load_examples()
    for path in examples_path:
        class_name = path.split(os.sep)[1]
        with open(path, 'r', encoding="utf8") as file:
            text = file.read()[:100]
            texts.append([text, class_name])
    return texts
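Before sending the training set to the API, it can help to check it against the two limits quoted in this tutorial (at most 50 examples, at least 5 per class). The checker below is a sketch under that assumption; `check_training_set` and the constant names are hypothetical, and the sample data is made up.

```python
from collections import Counter

MAX_EXAMPLES = 50    # limit quoted in this tutorial
MIN_PER_LABEL = 5    # limit quoted in this tutorial

def check_training_set(texts):
    """Return a list of problems found in a list of [description, label] samples."""
    problems = []
    if len(texts) > MAX_EXAMPLES:
        problems.append(f"{len(texts)} examples, limit is {MAX_EXAMPLES}")
    counts = Counter(label for _, label in texts)
    for label, count in sorted(counts.items()):
        if count < MIN_PER_LABEL:
            problems.append(f"label '{label}' has {count} examples, needs {MIN_PER_LABEL}")
    return problems

# Made-up samples: `good` satisfies both limits, `bad` has too few 'food' examples
good = [["text", "sport"]] * 5 + [["text", "food"]] * 5
bad = [["text", "sport"]] * 5 + [["text", "food"]] * 2
print(check_training_set(good))  # []
print(check_training_set(bad))   # ["label 'food' has 2 examples, needs 5"]
```

An empty list means the set passes both checks.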

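Co:here classifier

We go back to the CoHere class. We have to add two methods: one to load the examples and one to classify the input.

The first one is simple. The list of examples for co:here must be built with an additional cohere class, cohere.classify.Example.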

    def list_of_examples(self):
        for e in examples():
            self.examples.append(Example(text=e[0], label=e[1]))

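The second method wraps the classify method from cohere. That method has several parameters, such as:

model - the size of the model.

inputs - a list of data to classify.

examples - a list with the training set of examples.

All of them you can find here.

In this tutorial, the cohere method will be implemented as a method of our CoHere class. Its one argument is a list of descriptions to predict.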

    def classify(self, inputs):
        return self.co.classify(
            model='medium',
            inputs=inputs,
            examples=self.examples
        ).classifications

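The return value is a list of classifications, each holding the input, the prediction for that input, and a confidence list. confidence is a list of likelihoods, one for each class.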

cohere.Classification {
    input:
    prediction:
    confidence: []
}
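To work with the confidence list, for example to show only the most likely labels, the entries can be sorted by their confidence value. The sketch below uses a small stand-in dataclass, `LabelConfidence`, which is not a cohere class. It only mirrors the `label` and `confidence` attributes that the application code in this tutorial reads, and the numbers are made up.

```python
from dataclasses import dataclass

@dataclass
class LabelConfidence:
    """Stand-in for one entry of the `confidence` list (not a cohere class)."""
    label: str
    confidence: float

def top_labels(confidences, k=3):
    """Return the k most likely entries, highest confidence first."""
    return sorted(confidences, key=lambda c: c.confidence, reverse=True)[:k]

# Made-up numbers for illustration only
confidences = [
    LabelConfidence("sport", 0.75),
    LabelConfidence("food", 0.05),
    LabelConfidence("space", 0.20),
]
best = top_labels(confidences, k=1)[0]
print(best.label)  # sport
```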

`CoHere` class

import cohere
from loadExamples import examples
from cohere.classify import Example

class CoHere:
    def __init__(self, api_key):
        self.co = cohere.Client(f'{api_key}', '2021-11-08')
        self.examples = []

    def list_of_examples(self):
        for e in examples():
            self.examples.append(Example(text=e[0], label=e[1]))

    def classify(self, inputs):
        return self.co.classify(
            model='medium',
            taskDescription='',
            outputIndicator='',
            inputs=inputs,
            examples=self.examples
        ).classifications

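Web application - Streamlit

To create an application with a text input box and a display of the likelihoods, we will use Streamlit. It is a simple and very useful library.

Installation

pip install streamlit

We need two text inputs: one for the co:here API key and one for the text to predict.

In the Streamlit docs we can find the methods:

st.header() - to create a header in our application

st.text_input() - to send a text request

st.button() - to create a button

st.write() - to display the results of the cohere model

st.progress() - to display a progress bar

st.columns() - to split the application into columns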

import numpy as np
import streamlit as st

# CoHere is the class defined in the previous section

st.header("Your personal text classifier - Co:here application")

api_key = st.text_input("API Key:", type="password")  # text box for API key

description = [st.text_input("Description:")]  # text box for text to predict

cohere = CoHere(api_key)   # initialization of CoHere
cohere.list_of_examples()  # loading the training set

if st.button("Classify"):
    here = cohere.classify(description)[0]  # prediction
    col1, col2 = st.columns(2)
    for no, con in enumerate(here.confidence):  # display likelihood for each label
        if no % 2 == 0:                         # in two columns
            col1.write(f"{con.label}: {np.round(con.confidence * 100, 2)}%")
            col1.progress(con.confidence)
        else:
            col2.write(f"{con.label}: {np.round(con.confidence * 100, 2)}%")
            col2.progress(con.confidence)
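The two-column layout boils down to dealing the labels out alternately and formatting each confidence as a percentage. That logic can be tried without Streamlit or an API key, as in the sketch below. The helper names `split_into_columns` and `format_likelihood` are hypothetical, and the labels and value are sample data.

```python
def split_into_columns(items, n_columns=2):
    """Deal items round-robin into n_columns lists (item i goes to column i % n_columns)."""
    columns = [[] for _ in range(n_columns)]
    for index, item in enumerate(items):
        columns[index % n_columns].append(item)
    return columns

def format_likelihood(label, confidence):
    """Render a confidence in [0, 1] as 'label: 12.34%'."""
    return f"{label}: {confidence * 100:.2f}%"

labels = ["business", "food", "space", "sport", "medical"]
left, right = split_into_columns(labels)
print(left)   # ['business', 'space', 'medical']
print(right)  # ['food', 'sport']
print(format_likelihood("sport", 0.25))  # sport: 25.00%
```

Items with an even index land in the first column and the rest in the second, which matches the `no % 2 == 0` test in the application code.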