Your Personal Text Classifier - Co:here Application

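Natural language processing (NLP) is a field of computer science and linguistics concerned with the interaction between computers and human (natural) languages. In its simplest form, NLP is about developing algorithms that can automatically understand and produce human language. The long-term goal of NLP is to build computational models of human language that can be used to perform a wide range of tasks, including machine translation, summarization, question answering, and information extraction. NLP research is highly interdisciplinary, involving researchers from linguistics, cognitive science, artificial intelligence, and computer science.

There are many different approaches to natural language processing, including rule-based, statistical, and neural methods. Rule-based methods typically rely on hand-written rules created by NLP experts. They can be very effective for specific tasks, but their scope is usually limited and they take a lot of effort to maintain. Statistical methods train computational models on large amounts of data; these models can then be used to perform various NLP tasks automatically. Neural networks are a class of machine learning models that are particularly well suited to NLP tasks, and they have been used to build state-of-the-art models for tasks such as machine translation and classification.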

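Co:here

Co:here is a powerful neural network that can generate, embed, and classify text. In this tutorial, we will use Co:here to classify descriptions. To use Co:here, you need to create an account at Co:here and get an API key.

We will program in Python, so we need to install the cohere library with pip:

pip install cohere

First, we have to create a cohere.Client. Its arguments should be the API key you generated earlier and the version 2021-11-08. I will create a class CoHere, which will be useful in the next steps.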

import cohere

class CoHere:
    def __init__(self, api_key):
        # Create the client from the API key and the API version.
        self.co = cohere.Client(f'{api_key}', '2021-11-08')
        self.examples = []

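Dataset

The main part of every neural network is a dataset. In this tutorial, I will use a dataset of 1000 descriptions in 10 classes. If you want to use the same one, you can download it here.

The downloaded dataset has 10 folders, each containing 100 .txt files with descriptions. The file name carries the label of the description, e.g. sport_3.txt.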
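
Since the label is carried by the file name, it can be recovered with a little string handling. The following is a minimal sketch, assuming every file follows the `<label>_<number>.txt` pattern of the sport_3.txt example; the helper name `label_from_filename` is made up for illustration and is not part of the tutorial code.

```python
import os.path

def label_from_filename(path):
    """Return the label part of a file name such as 'sport_3.txt'."""
    # Drop any directory part and the '.txt' extension.
    stem = os.path.splitext(os.path.basename(path))[0]
    # Split once from the right so labels containing '_' stay intact.
    label, _, _ = stem.rpartition('_')
    return label

print(label_from_filename('sport_3.txt'))  # sport
```

The tutorial itself takes the label from the folder name rather than the file name; both carry the same information in this dataset.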

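The task in this part is to read the descriptions and labels from the files and build the data, where each sample consists of a description and its label. The Cohere classifier requires samples in which each one is a list [description, label].

Loading paths of examples

First, we need to load all the data. We create the function load_examples. In this function we will use three libraries:

os.path to reach the folder with the data. The code is executed in the path of the Python file file.py. It is a built-in library, so we do not need to install it.

numpy is useful for working with arrays. In this tutorial, we will use it to generate random numbers. You have to install it with pip: pip install numpy.

glob helps us read all file and folder names. It is also part of the Python standard library, so it does not need to be installed.

The downloaded dataset should be extracted into the folder data. With os.path.join we can build a platform-independent path to the folders.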

folders_path = os.path.join('data', '*')

On Windows, the returned value is data\*.
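
To see what os.path.join produces on each platform without switching machines, the standard library exposes both flavours of os.path directly. This is only an illustrative sketch; the tutorial code itself keeps using os.path, which picks the right flavour automatically.

```python
import ntpath      # the Windows flavour of os.path
import posixpath   # the Linux/macOS flavour of os.path

windows_pattern = ntpath.join('data', '*')
posix_pattern = posixpath.join('data', '*')

print(windows_pattern)  # data\*
print(posix_pattern)    # data/*
```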

Then we can use the glob function to get the names of all folders.

folders_name = glob(folders_path)

folders_name is a list that contains the Windows paths of the folders. In this tutorial, the folder names are the label names.

['data\\business', 'data\\entertainment', 'data\\food', 'data\\graphics', 'data\\historical', 'data\\medical', 'data\\politics', 'data\\space', 'data\\sport', 'data\\technologie']
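
If only the bare label names are needed, the folder part can be stripped with basename. The sketch below uses ntpath because the sample paths are Windows paths; in a real script os.path.basename does the same job for the host platform. The shortened list is sample data, not the full output.

```python
import ntpath

# A shortened sample of the Windows paths returned by glob.
folders_name = ['data\\business', 'data\\sport', 'data\\technologie']

# Keep only the last path component, which is the label.
labels = [ntpath.basename(folder) for folder in folders_name]
print(labels)  # ['business', 'sport', 'technologie']
```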

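The Co:here training set cannot be larger than 50 examples, and each class must have at least 5 examples. With a for loop we can get the names of the files in each folder. The whole function looks like this: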

import os.path
from glob import glob
import numpy as np

def load_examples():
    examples_path = []

    # Every sub-folder of 'data' holds the files of one label.
    folders_path = os.path.join('data', '*')
    folders_name = glob(folders_path)

    for folder in folders_name:
        files_path = os.path.join(folder, '*')
        files_name = glob(files_path)
        # Share the 50-example budget equally between the labels.
        for i in range(50 // len(folders_name)):
            # Pick a random file index; the same file may be picked twice.
            random_example = np.random.randint(0, len(files_name))
            examples_path.append(files_name[random_example])
    return examples_path

The inner loop randomly picks 5 paths for each label (50 // 10 folders) and appends them to the new list examples_path.
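
Note that np.random.randint can return the same index more than once, so a file may end up in the list twice. One possible variation, sketched here with the standard library's random.sample instead of the NumPy call used in the function, draws distinct paths. The function name `pick_examples` and the sample file names are hypothetical.

```python
import random

def pick_examples(files_name, per_label=5):
    """Pick up to `per_label` distinct paths from one label's files."""
    # random.sample draws without replacement, so no path repeats.
    count = min(per_label, len(files_name))
    return random.sample(files_name, count)

files = [f'sport_{i}.txt' for i in range(100)]
chosen = pick_examples(files)
print(len(chosen), len(set(chosen)))  # 5 5
```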

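Loading descriptions

Now we have to build the training set. To do this, we load the example paths with load_examples(). Each path contains the name of the class, which we use to create the sample. The description has to be read from the file. Texts cannot be too long, so in this tutorial they are cut to 100 characters. Each sample is a list [description, class_name], and the function returns the list of these samples.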

def examples():
    texts = []
    examples_path = load_examples()
    for path in examples_path:
        # The path looks like data/<label>/<file>, so part 1 is the label.
        class_name = path.split(os.sep)[1]
        with open(path, 'r', encoding="utf8") as file:
            text = file.read()[:100]    # keep only the first 100 characters
            texts.append([text, class_name])
    return texts
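
Before sending the samples to the API, it can help to check them against the limits quoted earlier (at most 50 examples, at least 5 per class). The following is a rough sketch under that assumption; the constants simply restate the numbers from this tutorial, and the helper `check_training_set` is hypothetical, not part of the cohere library.

```python
from collections import Counter

MAX_EXAMPLES = 50   # overall limit quoted in this tutorial
MIN_PER_LABEL = 5   # per-class minimum quoted in this tutorial

def check_training_set(samples):
    """Return a list of problems found in [[text, label], ...] samples."""
    problems = []
    if len(samples) > MAX_EXAMPLES:
        problems.append(f'too many examples: {len(samples)} > {MAX_EXAMPLES}')
    # Count how many samples each label has.
    counts = Counter(label for _, label in samples)
    for label, count in sorted(counts.items()):
        if count < MIN_PER_LABEL:
            problems.append(f'label {label!r} has only {count} examples')
    return problems

samples = [['text', 'sport']] * 5 + [['text', 'food']] * 3
print(check_training_set(samples))  # ["label 'food' has only 3 examples"]
```

An empty list means the samples satisfy both limits.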

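Co:here classifier

We go back to the CoHere class. We have to add two methods: one to load the examples and one to classify the input.

The first one is simple. The Co:here list of examples has to be built with an additional cohere helper, cohere.classify.Example.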

    def list_of_examples(self):
        for e in examples():
            self.examples.append(Example(text=e[0], label=e[1]))

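The second method wraps the classify method from cohere. That method has several parameters, such as:

model - the size of the model.

inputs - a list of data to classify.

examples - a list of examples that forms the training set.

You can find all of them here.

In this tutorial, the cohere method is wrapped in a method of the CoHere class. Its single argument is the list of descriptions to predict.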

    def classify(self, inputs):
        return self.co.classify(
            model='medium',
            inputs=inputs,
            examples=self.examples
        ).classifications

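The return value is a list of classifications, each holding the input, the prediction for that input, and the confidence. The confidence is a list of likelihoods, one for each class.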

cohere.Classification {
    input:
    prediction:
    confidence: []
}
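
Because the confidence field is a list with one entry per class, the most likely label can be found by taking the entry with the highest value. The sketch below does not call the API; `LabelConfidence` is a hypothetical stand-in that only mimics the label and confidence attributes read by the application code later in this tutorial, and the scores are invented sample values.

```python
from dataclasses import dataclass

@dataclass
class LabelConfidence:
    """Hypothetical stand-in for one entry of the confidence list."""
    label: str
    confidence: float

def top_label(confidences):
    """Return the (label, confidence) pair with the highest confidence."""
    best = max(confidences, key=lambda c: c.confidence)
    return best.label, best.confidence

scores = [LabelConfidence('sport', 0.91), LabelConfidence('food', 0.06),
          LabelConfidence('space', 0.03)]
print(top_label(scores))  # ('sport', 0.91)
```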

The `CoHere` class

import cohere
from loadExamples import examples
from cohere.classify import Example

class CoHere:
    def __init__(self, api_key):
        self.co = cohere.Client(f'{api_key}', '2021-11-08')
        self.examples = []

    def list_of_examples(self):
        for e in examples():
            self.examples.append(Example(text=e[0], label=e[1]))

    def classify(self, inputs):
        return self.co.classify(
            model='medium',
            taskDescription='',
            outputIndicator='',
            inputs=inputs,
            examples=self.examples
        ).classifications
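
The wrapper can be tried out without an API key by swapping the client for a stand-in. The following is only a sketch of that idea: `FakeClient` and `OfflineCoHere` are hypothetical names, the fake makes no network calls, it always predicts the label of the first example, and the examples are plain [text, label] lists rather than cohere Example objects.

```python
from types import SimpleNamespace

class FakeClient:
    """Hypothetical stand-in for cohere.Client; makes no network calls."""
    def classify(self, model, inputs, examples):
        # Always predict the label of the first example.
        label = examples[0][1] if examples else None
        results = [SimpleNamespace(input=text, prediction=label)
                   for text in inputs]
        return SimpleNamespace(classifications=results)

class OfflineCoHere:
    """Same shape as the CoHere class, but the client is passed in."""
    def __init__(self, client):
        self.co = client
        self.examples = []

    def classify(self, inputs):
        return self.co.classify(model='medium', inputs=inputs,
                                examples=self.examples).classifications

wrapper = OfflineCoHere(FakeClient())
wrapper.examples.append(['Goal in the last minute', 'sport'])
result = wrapper.classify(['A late winner'])
print(result[0].prediction)  # sport
```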

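Web application - Streamlit

To create an application with a text input box and a display of the likelihoods, we will use Streamlit. It is a simple and very useful library.

Installation

pip install streamlit

We need two text inputs: one for the Co:here API key and one for the text to predict.

In the Streamlit documentation we can find these methods: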

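st.header() to create a header in our application

st.text_input() to request text input

st.button() to create a button

st.write() to display the results of the cohere model

st.progress() to display a progress bar

st.columns() to split the application into columns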

import numpy as np
import streamlit as st

# CoHere is the class defined above; import it from the module where you saved it.

st.header("Your personal text classifier - Co:here application")

api_key = st.text_input("API Key:", type="password")    # text box for the API key

description = [st.text_input("Description:")]           # text box for the text to predict

cohere = CoHere(api_key)                                 # initialize CoHere
cohere.list_of_examples()                                # load the training set

if st.button("Classify"):
    here = cohere.classify(description)[0]               # prediction
    col1, col2 = st.columns(2)
    for no, con in enumerate(here.confidence):           # display the likelihood of each label
        if no % 2 == 0:                                  # in two columns
            col1.write(f"{con.label}: {np.round(con.confidence * 100, 2)} %")
            col1.progress(con.confidence)
        else:
            col2.write(f"{con.label}: {np.round(con.confidence * 100, 2)} %")
            col2.progress(con.confidence)
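
The alternating two-column layout and the percentage formatting used in the loop can be isolated as plain functions, which makes them easy to check without running Streamlit. This is a minimal sketch; the helper names `split_alternating` and `as_percent` are hypothetical and belong to neither Streamlit nor cohere, and the formatting uses a fixed two decimals instead of np.round.

```python
def split_alternating(items):
    """Split items into (even-index items, odd-index items)."""
    return items[0::2], items[1::2]

def as_percent(confidence):
    """Format a 0..1 confidence as a percentage with two decimals."""
    return f'{confidence * 100:.2f} %'

labels = ['business', 'food', 'sport']
left, right = split_alternating(labels)
print(left)              # ['business', 'sport']
print(right)             # ['food']
print(as_percent(0.25))  # 25.00 %
```

In the application, the items in `left` would be written to col1 and the items in `right` to col2.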