autolabel

v0.0.16

Discord | Twitter | Website | Benchmark

Quick Install

pip install refuel-autolabel

Documentation

https://docs.refuel.ai/

What is Autolabel?

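Access to large, clean and diverse labeled datasets is a critical component of any successful machine learning effort. State-of-the-art LLMs such as GPT-4 can label data automatically with high accuracy, at a fraction of the cost and time of manual labeling.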

Autolabel is a Python library to label, clean and enrich text datasets with any Large Language Model (LLM) of your choice.

(New!) Benchmark models on Refuel's benchmark

Check out our technical report to learn more about the performance of RefuelLLM-v2 on our benchmark. You can replicate the benchmark yourself by following the steps below:

cd autolabel/benchmark
curl https://autolabel-benchmarking.s3.us-west-2.amazonaws.com/data.zip -o data.zip
unzip data.zip
python benchmark.py --model $model --base_dir benchmark-results
python results.py --eval_dir benchmark-results
cat results.csv

You can benchmark any model by replacing $model with its name. For an API-hosted model such as gpt-3.5-turbo, gpt-4-1106-preview, claude-3-opus-20240229, gemini-1.5-pro-preview-0409 or any other model that Autolabel supports, just write the model name. If the model to be benchmarked is supported by vLLM, pass the local path or the Hugging Face path of the model instead. Every model is benchmarked with the same prompts.

results.csv contains a row of results with each benchmarked model as a column. See benchmark/results.csv for an example.
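To benchmark several models in one go, the benchmark.py and results.py commands above can be wrapped in a loop. The sketch below is a hypothetical helper, not part of Autolabel: it only builds the argument lists and, if asked, runs them with subprocess. It assumes it is executed from autolabel/benchmark after data.zip has been unzipped, and that results.py can aggregate all models in benchmark-results in a single call at the end.

```python
import subprocess
from typing import List, Sequence


def build_benchmark_commands(models: Sequence[str], base_dir: str = "benchmark-results") -> List[List[str]]:
    """Build one benchmark.py invocation per model, followed by a single results.py call."""
    commands = [
        ["python", "benchmark.py", "--model", model, "--base_dir", base_dir]
        for model in models
    ]
    # Assumption: results.py summarizes every model found in base_dir into results.csv.
    commands.append(["python", "results.py", "--eval_dir", base_dir])
    return commands


def run_benchmarks(models: Sequence[str], base_dir: str = "benchmark-results") -> None:
    """Run the commands in order; check=True stops at the first command that fails."""
    for cmd in build_benchmark_commands(models, base_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, as a dry run.
    for cmd in build_benchmark_commands(["gpt-3.5-turbo", "claude-3-opus-20240229"]):
        print(" ".join(cmd))
```

Passing argument lists (rather than one shell string) to subprocess.run avoids quoting problems with model paths that contain spaces.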

Getting started

Autolabel provides a simple 3-step process for labeling data:

1. Specify the labeling guidelines and the LLM to use in a JSON config.
2. Do a dry run to make sure the final prompt looks good.
3. Kick off a labeling run for your dataset!

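Let's imagine we are building an ML model to analyze the sentiment of movie reviews. We have a dataset of movie reviews that we'd like to get labeled first. For this case, the config looks like this: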

{
    "task_name" : "MovieSentimentReview" ,
    "task_type" : "classification" ,
    "model" : {
        "provider" : "openai" ,
        "name" : "gpt-3.5-turbo"
    },
    "dataset" : {
        "label_column" : "label" ,
        "delimiter" : ","
    },
    "prompt" : {
        "task_guidelines" : "You are an expert at analyzing the sentiment of movie reviews. Your job is to classify the provided movie review into one of the following labels: {labels}" ,
        "labels" : [
            "positive" ,
            "negative" ,
            "neutral"
        ],
        "few_shot_examples" : [
            {
                "example" : "I got a fairly uninspired stupid film about how human industry is bad for nature." ,
                "label" : "negative"
            },
            {
                "example" : "I loved this movie. I found it very heart warming to see Adam West, Burt Ward, Frank Gorshin, and Julie Newmar together again." ,
                "label" : "positive"
            },
            {
                "example" : "This movie will be played next week at the Chinese theater." ,
                "label" : "neutral"
            }
        ],
        "example_template" : "Input: {example}\nOutput: {label}"
    }
}
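Before handing the config to Autolabel, it can help to save it as config.json from Python and run a couple of sanity checks. The sketch below uses only the standard library; check_config and save_config are hypothetical helpers, not part of the Autolabel API, and they check just two properties of this example: that every few-shot label appears in labels, and that task_guidelines contains the {labels} placeholder.

```python
import json
from typing import Any, Dict, List


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a classification config (empty list if none)."""
    problems = []
    prompt = config.get("prompt", {})
    labels = set(prompt.get("labels", []))
    # Every few-shot example should use one of the declared labels.
    for i, shot in enumerate(prompt.get("few_shot_examples", [])):
        if shot.get("label") not in labels:
            problems.append(f"few_shot_examples[{i}] uses unknown label {shot.get('label')!r}")
    # The guidelines should interpolate the label list.
    if "{labels}" not in prompt.get("task_guidelines", ""):
        problems.append("task_guidelines does not contain the {labels} placeholder")
    return problems


def save_config(config: Dict[str, Any], path: str = "config.json") -> None:
    """Write the config as JSON, refusing to write one that fails check_config."""
    problems = check_config(config)
    if problems:
        raise ValueError("; ".join(problems))
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
```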

Initialize the labeling agent and pass it the config:

from autolabel import LabelingAgent, AutolabelDataset

agent = LabelingAgent(config='config.json')

Preview an example prompt that will be sent to the LLM:

ds = AutolabelDataset('dataset.csv', config='config.json')
agent.plan(ds)

This prints:

 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100/100 0:00:00 0:00:00
┌──────────────────────────┬─────────┐
│ Total Estimated Cost     │ $0.538  │
│ Number of Examples       │ 200     │
│ Average cost per example │ 0.00269 │
└──────────────────────────┴─────────┘
─────────────────────────────────────────

Prompt Example:
You are an expert at analyzing the sentiment of movie reviews. Your job is to classify the provided movie review into one of the following labels: [positive, negative, neutral]

Some examples with their output answers are provided below:

Example: I got a fairly uninspired stupid film about how human industry is bad for nature.
Output:
negative

Example: I loved this movie. I found it very heart warming to see Adam West, Burt Ward, Frank Gorshin, and Julie Newmar together again.
Output:
positive

Example: This movie will be played next week at the Chinese theater.
Output:
neutral

Now I want you to label the following example:
Input: A rare exception to the rule that great literature makes disappointing films.
Output:

─────────────────────────────────────────────────────────────────────────────────────────
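The plan output reports the estimated cost for the examples it looked at (here $0.538 for 200 examples, about $0.00269 each). A rough way to budget a larger run is to scale that per-example figure linearly. The hypothetical helper below does exactly that, assuming the rest of the dataset has prompts of similar length to the planned sample; it is a back-of-the-envelope estimate, not how Autolabel computes its own figure.

```python
def extrapolate_cost(planned_cost: float, planned_examples: int, target_examples: int) -> float:
    """Scale the cost reported by agent.plan() linearly to a larger number of rows."""
    if planned_examples <= 0:
        raise ValueError("planned_examples must be positive")
    per_example = planned_cost / planned_examples
    return per_example * target_examples


# Using the figures printed above: $0.538 for 200 examples, scaled to 10,000 rows.
print(f"{extrapolate_cost(0.538, 200, 10_000):.2f}")  # prints 26.90
```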

Finally, run the labeling on a subset of the dataset or on the entire dataset:

ds = agent.run(ds)
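To label only a subset first (for example, to check quality before paying for the full run), one option is to copy the first N rows into a separate CSV and build the AutolabelDataset from that file. The helper below is a hypothetical standard-library sketch; the file names are placeholders, and it does not rely on any row-limiting option Autolabel itself may offer.

```python
import csv


def write_head(src: str, dst: str, n: int) -> int:
    """Copy the header plus the first n data rows of src into dst; return the rows written."""
    written = 0
    with open(src, newline="", encoding="utf-8") as fin, \
            open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to copy
            return 0
        writer.writerow(header)
        for row in reader:
            if written >= n:
                break
            writer.writerow(row)  # csv handles quoting of commas inside reviews
            written += 1
    return written
```

For example, write_head("dataset.csv", "dataset_head.csv", 100) followed by AutolabelDataset("dataset_head.csv", ...) with the same config as above would label only the first 100 reviews.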

The output dataframe contains the label column:

ds.df.head()
                                                text  ... MovieSentimentReview_llm_label
0  I was very excited about seeing this film , ant ...  ...                       negative
1  Serum is about a crazy doctor that finds a ser ...  ...                       negative
4  I loved this movie . I knew it would be chocked ...  ...                       positive
...
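A quick way to sanity-check a run is to look at how the predicted labels are distributed. The sketch below counts values in the <task_name>_llm_label column (MovieSentimentReview_llm_label in the output above). label_distribution is a hypothetical helper that works on plain row dictionaries, so with pandas the rows can come from ds.df.to_dict("records").

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def label_distribution(rows: Iterable[Mapping[str, object]], task_name: str) -> Dict[str, int]:
    """Count predicted labels per value, most common first."""
    column = f"{task_name}_llm_label"
    counts: Counter = Counter()
    for row in rows:
        label = row.get(column)
        # None, "" and NaN (pandas' marker for a missing value; NaN != NaN) count as unlabeled.
        if label is None or label == "" or label != label:
            key = "unlabeled"
        else:
            key = str(label)
        counts[key] += 1
    return dict(counts.most_common())
```

Usage: label_distribution(ds.df.to_dict("records"), "MovieSentimentReview"). A heavily skewed result (for example, almost no neutral labels) is a hint to revisit the guidelines or few-shot examples.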

Features

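Label data for NLP tasks such as classification, question answering, named entity recognition and entity matching.
Use commercial or open-source LLMs from providers such as OpenAI, Anthropic, Hugging Face, Google and more.
Support for research-proven LLM techniques that boost label quality, such as few-shot learning and chain-of-thought prompting.
Confidence estimation and explanations out of the box for every single output label.
Caching and state management to minimize costs and experimentation time.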
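Some of these features are switched on through the JSON config rather than in code. The sketch below copies a config dict and turns on confidence estimation and chain-of-thought prompting. The key names (compute_confidence under model, chain_of_thought under prompt) reflect our reading of the Autolabel documentation and are assumptions to verify against the docs for your installed version; with_quality_options itself is a hypothetical helper.

```python
import copy
import json
from typing import Any, Dict


def with_quality_options(config: Dict[str, Any], confidence: bool = True,
                         chain_of_thought: bool = True) -> Dict[str, Any]:
    """Return a copy of config with confidence estimation and chain-of-thought toggled.

    ASSUMPTION: the key names below follow the Autolabel docs as we understand them.
    """
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    updated.setdefault("model", {})["compute_confidence"] = confidence
    updated.setdefault("prompt", {})["chain_of_thought"] = chain_of_thought
    return updated


if __name__ == "__main__":
    base = {"task_name": "MovieSentimentReview",
            "model": {"provider": "openai", "name": "gpt-3.5-turbo"},
            "prompt": {}}
    print(json.dumps(with_quality_options(base), indent=2))
```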

Access to Refuel-hosted LLMs

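Refuel provides access to hosted open-source LLMs for labeling and for estimating confidence. Confidence estimates are useful because you can calibrate a confidence threshold for your labeling task, route less confident labels to humans, and still get the benefits of auto-labeling for the confident examples.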

To use Refuel-hosted LLMs, you can request access here.
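The calibrate-then-route workflow described above can be sketched in plain Python. This is a hypothetical illustration, not Refuel's or Autolabel's implementation. It assumes a small validation set where a human has checked the LLM labels, and that every labeled row carries a numeric confidence score (for the example task we assume a column named MovieSentimentReview_llm_confidence; check the actual column name in your output). calibrate_threshold picks the lowest threshold whose accepted rows still meet a target accuracy, which maximizes how many rows are auto-labeled.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple


def calibrate_threshold(confidences: Sequence[float], is_correct: Sequence[bool],
                        target_accuracy: float) -> Optional[float]:
    """Lowest threshold t such that rows with confidence >= t reach target_accuracy.

    Returns None when no threshold qualifies.
    """
    if len(confidences) != len(is_correct):
        raise ValueError("confidences and is_correct must have the same length")
    pairs = sorted(zip(confidences, is_correct), key=lambda p: p[0], reverse=True)
    best: Optional[float] = None
    correct = 0
    for i, (conf, ok) in enumerate(pairs, start=1):
        correct += int(ok)
        # A threshold cannot split tied confidences, so evaluate only at the last of a tie.
        if i < len(pairs) and pairs[i][0] == conf:
            continue
        # Accuracy is not monotonic in the threshold, so keep scanning all cut points.
        if correct / i >= target_accuracy:
            best = conf
    return best


def route_by_confidence(rows: Iterable[Dict[str, Any]], threshold: float,
                        confidence_key: str) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split rows into (auto-accepted, send-to-human).

    confidence_key is the column holding the score, e.g. the assumed
    'MovieSentimentReview_llm_confidence'.
    """
    auto: List[Dict[str, Any]] = []
    human: List[Dict[str, Any]] = []
    for row in rows:
        (auto if row[confidence_key] >= threshold else human).append(row)
    return auto, human
```

Usage, under the same assumptions: auto, human = route_by_confidence(ds.df.to_dict("records"), threshold, "MovieSentimentReview_llm_confidence"). Raising target_accuracy sends more rows to humans; lowering it auto-labels more.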

Roadmap

Check out our public roadmap to learn more about ongoing and planned improvements to the Autolabel library.

We are always looking for suggestions and contributions from the community. Join the discussion on Discord, or open a GitHub issue to report bugs and request features.

Contributing

Autolabel is a rapidly developing project. We welcome contributions in all forms: bug reports, pull requests and ideas for improving the library.

Join the conversation on Discord.
Open an issue on GitHub for bugs and feature requests.
Grab an open issue and submit a pull request.

Additional information

Version: v0.0.16
Type: Other source code
Updated: 2025-03-03
Size: 2MB
Source: GitHub
