
autolabel


v0.0.16


Discord | Twitter | Website | Benchmark

⚡ Quick Install

pip install refuel-autolabel

Documentation

https://docs.refuel.ai/

What is Autolabel

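Access to large, clean, and diverse labeled datasets is a critical component of any successful machine learning effort. State-of-the-art LLMs like GPT-4 can label data automatically with high accuracy, at a fraction of the cost and time of manual labeling.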

Autolabel is a Python library to label, clean, and enrich text datasets with any Large Language Model (LLM) of your choice.

(New!) Benchmarking models with Refuel's benchmark

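Check out our technical report to learn more about the performance of RefuelLLM-v2 on our benchmark. You can replicate the benchmark yourself by following these steps: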

cd autolabel/benchmark
curl https://autolabel-benchmarking.s3.us-west-2.amazonaws.com/data.zip -o data.zip
unzip data.zip
python benchmark.py --model $model --base_dir benchmark-results
python results.py --eval_dir benchmark-results
cat results.csv

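To benchmark a model, replace $model with the name of that model. For an API-hosted model such as gpt-3.5-turbo, gpt-4-1106-preview, claude-3-opus-20240229, gemini-1.5-pro-preview-0409, or any other model Autolabel supports, just write the model's name. If the model is one supported by vLLM, pass either its local path or its Hugging Face path. In every case, the benchmark runs with the same prompts for all models.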

results.csv will contain one row per benchmarked model. See benchmark/results.csv for an example.
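To benchmark several models in one go, the commands above can be wrapped in a small driver script. This is a hedged sketch and is not part of the repository. It assumes the script is run from `autolabel/benchmark` after `data.zip` has been extracted. It also assumes that `results.py` aggregates every model written under the same `--base_dir`, which the one-row-per-model description of results.csv suggests. The helper names are hypothetical. With `dry_run=True`, the script only prints the commands.

```python
import shlex
import subprocess
from typing import List, Sequence


def benchmark_commands(models: Sequence[str], base_dir: str = "benchmark-results") -> List[List[str]]:
    """Build one benchmark.py invocation per model, followed by a single results.py call."""
    commands = [
        ["python", "benchmark.py", "--model", model, "--base_dir", base_dir]
        for model in models
    ]
    # Assumption: results.py summarises every model's output found under base_dir.
    commands.append(["python", "results.py", "--eval_dir", base_dir])
    return commands


def run_benchmarks(models: Sequence[str], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it (stopping on failure)."""
    for cmd in benchmark_commands(models):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Example names from the list above; a local or Hugging Face path works for vLLM models.
    run_benchmarks(["gpt-3.5-turbo", "claude-3-opus-20240229"], dry_run=True)
```

Running the commands through a list (rather than a shell string) avoids quoting problems when a model is given as a local path containing spaces.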

Getting Started

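Autolabel provides a simple 3-step process for labeling data:

1. Specify the labeling guidelines and the LLM to use in a JSON config.
2. Do a dry run to make sure the final prompt looks good.
3. Kick off a labeling run on your dataset!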

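Let's imagine we are building an ML model to analyze the sentiment of movie reviews. We have a dataset of movie reviews that we'd like to label first. Here's what the config (saved as config.json) looks like for this case: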

{
    "task_name" : "MovieSentimentReview" ,
    "task_type" : "classification" ,
    "model" : {
        "provider" : "openai" ,
        "name" : "gpt-3.5-turbo"
    },
    "dataset" : {
        "label_column" : "label" ,
        "delimiter" : ","
    },
    "prompt" : {
        "task_guidelines" : "You are an expert at analyzing the sentiment of movie reviews. Your job is to classify the provided movie review into one of the following labels: {labels}" ,
        "labels" : [
            "positive" ,
            "negative" ,
            "neutral"
        ],
        "few_shot_examples" : [
            {
                "example" : "I got a fairly uninspired stupid film about how human industry is bad for nature." ,
                "label" : "negative"
            },
            {
                "example" : "I loved this movie. I found it very heart warming to see Adam West, Burt Ward, Frank Gorshin, and Julie Newmar together again." ,
                "label" : "positive"
            },
            {
                "example" : "This movie will be played next week at the Chinese theater." ,
                "label" : "neutral"
            }
        ],
        "example_template" : "Input: {example}\nOutput: {label}"
    }
}
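Before spending money on LLM calls, it can help to sanity-check the config locally. The sketch below uses only the standard library and is not part of Autolabel. It performs three checks:

- It verifies that every few-shot `label` is one of the declared `labels`.
- It substitutes the label list into `task_guidelines`.
- It fills `example_template` for each few-shot example.

Autolabel's own prompt assembly may lay things out differently. The function names are hypothetical, and the `[a, b, c]` rendering of `{labels}` is an assumption modeled on the printed prompt.

```python
import json
from typing import Any, Dict, List


def load_config(path: str = "config.json") -> Dict[str, Any]:
    """Read the JSON labeling config from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_few_shot_labels(config: Dict[str, Any]) -> List[str]:
    """Return any few-shot labels that are not in the declared label set."""
    prompt = config["prompt"]
    allowed = set(prompt["labels"])
    return [ex["label"] for ex in prompt.get("few_shot_examples", []) if ex["label"] not in allowed]


def render_guidelines(config: Dict[str, Any]) -> str:
    """Substitute the label list into the {labels} placeholder of task_guidelines."""
    prompt = config["prompt"]
    label_list = "[" + ", ".join(prompt["labels"]) + "]"
    # str.replace instead of str.format, so other braces in the guidelines are left alone
    return prompt["task_guidelines"].replace("{labels}", label_list)


def render_few_shot(config: Dict[str, Any]) -> str:
    """Fill example_template once per few-shot example (an approximation, not Autolabel's code)."""
    prompt = config["prompt"]
    template = prompt["example_template"]
    return "\n\n".join(
        template.format(example=ex["example"], label=ex["label"])
        for ex in prompt.get("few_shot_examples", [])
    )
```

For the config above, `check_few_shot_labels(load_config())` should return an empty list, because every few-shot label is positive, negative, or neutral. A misspelled label would show up in the returned list.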

Initialize the labeling agent and pass it the config:

from autolabel import LabelingAgent, AutolabelDataset

agent = LabelingAgent(config='config.json')

Preview an example prompt that will be sent to the LLM:

# Load the reviews with the same config file the agent uses
ds = AutolabelDataset('dataset.csv', config='config.json')
agent.plan(ds)

This prints:

 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100/100 0:00:00 0:00:00
┌──────────────────────────┬─────────┐
│ Total Estimated Cost     │ $0.538  │
│ Number of Examples       │ 200     │
│ Average cost per example │ 0.00269 │
└──────────────────────────┴─────────┘
─────────────────────────────────────────

Prompt Example:
You are an expert at analyzing the sentiment of movie reviews. Your job is to classify the provided movie review into one of the following labels: [positive, negative, neutral]

Some examples with their output answers are provided below:

Example: I got a fairly uninspired stupid film about how human industry is bad for nature.
Output:
negative

Example: I loved this movie. I found it very heart warming to see Adam West, Burt Ward, Frank Gorshin, and Julie Newmar together again.
Output:
positive

Example: This movie will be played next week at the Chinese theater.
Output:
neutral

Now I want you to label the following example:
Input: A rare exception to the rule that great literature makes disappointing films.
Output:

─────────────────────────────────────────────────────────────────────────────────────────
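As a quick check on the plan output, the average cost per example is the total estimated cost divided by the number of examples. The projection on the right is only an illustration. It assumes cost scales linearly with the number of reviews (that is, similar prompt lengths) and is not a figure reported by `agent.plan`.

```latex
\frac{\$0.538}{200\ \text{examples}} = \$0.00269\ \text{per example},
\qquad
10{,}000 \times \$0.00269 \approx \$26.90
```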

Finally, we can run the labeling on a subset of the dataset or on the entire dataset:

ds = agent.run(ds)

The output dataframe contains the label column:

ds.df.head()
                                                text  ... MovieSentimentReview_llm_label
0  I was very excited about seeing this film, ant...  ...                       negative
1  Serum is about a crazy doctor that finds a ser...  ...                       negative
4  I loved this movie. I knew it would be chocked...  ...                       positive
...
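The config sets `label_column` to `label`, so a dataset that already carries gold labels can be used to spot-check the LLM's output. The following is a minimal standard-library sketch. It assumes `ds.df` has both a `label` column and the `MovieSentimentReview_llm_label` column shown above. The helper `label_agreement` is hypothetical and not an Autolabel API; Autolabel may also report its own metrics when gold labels are present.

```python
import math
from typing import Any, Iterable, Optional


def _is_missing(value: Any) -> bool:
    """Treat None and float NaN (how pandas stores empty cells) as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def label_agreement(gold: Iterable[Any], predicted: Iterable[Any]) -> Optional[float]:
    """Fraction of rows with a gold label where the LLM label matches it exactly.

    Returns None when no row has a gold label.
    """
    matched = total = 0
    for g, p in zip(gold, predicted):
        if _is_missing(g):
            continue  # no gold label for this row, so it cannot be scored
        total += 1
        matched += int(g == p)
    return matched / total if total else None
```

Since pandas columns are iterable, a call like `label_agreement(ds.df["label"], ds.df["MovieSentimentReview_llm_label"])` should work without converting the columns first. Matching is exact, so labels that differ only in case or whitespace count as disagreements.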