autolabel

v0.0.16

Discord | Twitter | Website | Benchmark

⚡ Quick Install

pip install refuel-autolabel

Documentation

https://docs.refuel.ai/

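What is Autolabel

Access to large, clean and diverse labeled datasets is a critical component of any successful machine learning effort. State-of-the-art LLMs like GPT-4 can label data automatically with high accuracy, at a fraction of the cost and time of manual labeling.

Autolabel is a Python library to label, clean and enrich text datasets with any Large Language Model (LLM) of your choice.

(New!) Benchmark models on Refuel's benchmark

Check out our technical report to learn more about the performance of RefuelLLM-v2 on our benchmark. You can replicate the benchmark yourself by following the steps below: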

cd autolabel/benchmark
curl https://autolabel-benchmarking.s3.us-west-2.amazonaws.com/data.zip -o data.zip
unzip data.zip
python benchmark.py --model $model --base_dir benchmark-results
python results.py --eval_dir benchmark-results
cat results.csv

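To benchmark a model, replace $model with the name of the model to be benchmarked. If it is an API-hosted model such as gpt-3.5-turbo, gpt-4-1106-preview, claude-3-opus-20240229, gemini-1.5-pro-preview-0409 or another Autolabel-supported model, just write the name of the model. If the model is supported by vLLM, pass either the local path or the Hugging Face path of the model. The benchmark runs with the same prompts for all models.

results.csv will contain one row for each model that was benchmarked. See benchmark/results.csv for an example.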
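
To benchmark several models in one go, the commands above can be generated in a loop. The following is a minimal sketch and not part of Autolabel: `build_benchmark_commands` is a hypothetical helper that only assembles the argument lists shown above (the `--model`, `--base_dir` and `--eval_dir` flags are taken from those steps). It assumes that `results.py` aggregates everything found under the results directory, so it is scheduled once at the end.

```python
import shlex


def build_benchmark_commands(models, base_dir="benchmark-results"):
    """Return argv lists that benchmark each model, then aggregate the results.

    Hypothetical helper: it only mirrors the CLI invocations shown above
    and does not execute anything.
    """
    commands = [
        ["python", "benchmark.py", "--model", model, "--base_dir", base_dir]
        for model in models
    ]
    # Assumption: results.py summarizes every model found under base_dir,
    # so a single call after all benchmark runs is enough.
    commands.append(["python", "results.py", "--eval_dir", base_dir])
    return commands


if __name__ == "__main__":
    for argv in build_benchmark_commands(["gpt-3.5-turbo", "gpt-4-1106-preview"]):
        # Print each command as a copy-pasteable shell line.
        print(shlex.join(argv))
```

Each argument list could then be passed to `subprocess.run(argv, check=True)` from inside the autolabel/benchmark directory.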

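Getting Started

Autolabel provides a simple 3-step process for labeling data:

Specify the labeling guidelines and the LLM to use in a JSON config.
Dry-run to make sure the final prompt looks good.
Kick off a labeling run for your dataset!

Let's imagine we are building an ML model to analyze the sentiment of movie reviews. We have a dataset of movie reviews that we'd like to get labeled first. For this case, here's what the example dataset and config will look like: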

{
    "task_name" : "MovieSentimentReview" ,
    "task_type" : "classification" ,
    "model" : {
        "provider" : "openai" ,
        "name" : "gpt-3.5-turbo"
    },
    "dataset" : {
        "label_column" : "label" ,
        "delimiter" : ","
    },
    "prompt" : {
        "task_guidelines" : "You are an expert at analyzing the sentiment of movie reviews. Your job is to classify the provided movie review into one of the following labels: {labels}" ,
        "labels" : [
            "positive" ,
            "negative" ,
            "neutral"
        ],
        "few_shot_examples" : [
            {
                "example" : "I got a fairly uninspired stupid film about how human industry is bad for nature." ,
                "label" : "negative"
            },
            {
                "example" : "I loved this movie. I found it very heart warming to see Adam West, Burt Ward, Frank Gorshin, and Julie Newmar together again." ,
                "label" : "positive"
            },
            {
                "example" : "This movie will be played next week at the Chinese theater." ,
                "label" : "neutral"
            }
        ],
        "example_template" : "Input: {example}\nOutput: {label}"
    }
}
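
To make the role of the `{labels}`, `{example}` and `{label}` placeholders concrete, here is a minimal sketch of how a config shaped like the one above could be turned into a few-shot prompt. It is an illustration using only the standard library, not Autolabel's actual prompt construction: `render_prompt` is a hypothetical function, and it assumes the example template separates input and output with a newline.

```python
import json


def render_prompt(config, text):
    """Assemble a few-shot prompt from a config dict shaped like the one above.

    Illustration only: Autolabel's real prompt construction may differ.
    """
    prompt_cfg = config["prompt"]
    template = prompt_cfg["example_template"]

    # Fill the {labels} placeholder in the task guidelines.
    guidelines = prompt_cfg["task_guidelines"].format(
        labels=", ".join(prompt_cfg["labels"])
    )

    # Render each few-shot example together with its known label.
    shots = [
        template.format(example=shot["example"], label=shot["label"])
        for shot in prompt_cfg["few_shot_examples"]
    ]

    # The item to be labeled uses the same template with the label left empty.
    query = template.format(example=text, label="").rstrip()

    return "\n\n".join([guidelines, *shots, query])


if __name__ == "__main__":
    # A trimmed-down config for demonstration purposes.
    config = json.loads("""
    {
        "prompt": {
            "task_guidelines": "Classify the movie review into one of the following labels: {labels}",
            "labels": ["positive", "negative", "neutral"],
            "few_shot_examples": [
                {"example": "I loved this movie.", "label": "positive"}
            ],
            "example_template": "Input: {example}\\nOutput: {label}"
        }
    }
    """)
    print(render_prompt(config, "A rare exception to the rule."))
```

Keeping the guidelines, examples and template in the config rather than in code means the prompt can be changed without touching the labeling script.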

Initialize the labeling agent and pass it the config:

from autolabel import LabelingAgent, AutolabelDataset

agent = LabelingAgent(config='config.json')

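Preview an example prompt that will be sent to the LLM:

# Use the same config file that was passed to the agent above
ds = AutolabelDataset('dataset.csv', config='config.json')
agent.plan(ds)

This prints: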

 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100/100 0:00:00 0:00:00
┌──────────────────────────┬─────────┐
│ Total Estimated Cost     │ $0.538  │
│ Number of Examples       │ 200     │
│ Average cost per example │ 0.00269 │
└──────────────────────────┴─────────┘
─────────────────────────────────────────

Prompt Example:
You are an expert at analyzing the sentiment of movie reviews. Your job is to classify the provided movie review into one of the following labels: [positive, negative, neutral]

Some examples with their output answers are provided below:

Example: I got a fairly uninspired stupid film about how human industry is bad for nature.
Output:
negative

Example: I loved this movie. I found it very heart warming to see Adam West, Burt Ward, Frank Gorshin, and Julie Newmar together again.
Output:
positive

Example: This movie will be played next week at the Chinese theater.
Output:
neutral

Now I want you to label the following example:
Input: A rare exception to the rule that great literature makes disappointing films.
Output:

─────────────────────────────────────────────────────────────────────────────────────────

Finally, we can run the labeling on a subset or the entirety of the dataset:

ds = agent.run(ds)

The output dataframe contains the label column:

ds.df.head()
                                                text  ... MovieSentimentReview_llm_label
0  I was very excited about seeing this film, ant...  ...                       negative
1  Serum is about a crazy doctor that finds a ser...  ...                       negative
4  I loved this movie. I knew it would be chocked...  ...                       positive
...
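
If part of the dataset already carries ground-truth labels, a quick sanity check is to compare them against the LLM labels. The sketch below is a hedged illustration in plain Python: `label_accuracy` is a hypothetical helper, the column names `label` and `MovieSentimentReview_llm_label` are taken from the config and output above, and the sample rows are made up.

```python
def label_accuracy(rows, gold_col="label", pred_col="MovieSentimentReview_llm_label"):
    """Return the fraction of rows whose LLM label matches the ground truth.

    Rows without a ground-truth value are skipped. Returns 0.0 if no row
    has a ground-truth value.
    """
    scored = [row for row in rows if row.get(gold_col) not in (None, "")]
    if not scored:
        return 0.0
    hits = sum(1 for row in scored if row[gold_col] == row.get(pred_col))
    return hits / len(scored)


if __name__ == "__main__":
    # Made-up rows for illustration; not real Autolabel output.
    rows = [
        {"label": "negative", "MovieSentimentReview_llm_label": "negative"},
        {"label": "positive", "MovieSentimentReview_llm_label": "positive"},
        {"label": "neutral", "MovieSentimentReview_llm_label": "negative"},
        {"label": "", "MovieSentimentReview_llm_label": "positive"},
    ]
    print(f"accuracy on labeled rows: {label_accuracy(rows):.2f}")
```

Assuming `ds.df` is a pandas DataFrame as shown above, the rows could be obtained with `ds.df.to_dict("records")`.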

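Features

Label data for NLP tasks such as classification, question answering, named entity recognition, entity matching and more.
Use commercial or open source LLMs from providers such as OpenAI, Anthropic, HuggingFace, Google and more.
Support for research-proven LLM techniques to boost label quality, such as few-shot learning and chain-of-thought prompting.
Confidence estimation and explanations out of the box for every output label.
Caching and state management to minimize costs and experimentation time.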

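Access to Refuel hosted LLMs

Refuel provides access to hosted open source LLMs for labeling and for estimating confidence. This is helpful because you can calibrate a confidence threshold for your labeling task and then route less confident labels to humans, while still getting the benefits of auto-labeling for the confident examples.

In order to use Refuel hosted LLMs, you can request access here.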
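
The routing idea described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Autolabel's API: `route_by_confidence` is a hypothetical helper, the `confidence` key is an assumed name for the per-row confidence score, and the threshold value is arbitrary.

```python
def route_by_confidence(rows, threshold, confidence_col="confidence"):
    """Split labeled rows into (auto_accepted, needs_review) by confidence score.

    `confidence_col` is an assumed field name; adapt it to your output.
    """
    accepted, review = [], []
    for row in rows:
        score = row.get(confidence_col)
        # Rows without a score are treated as low confidence and sent to review.
        if score is not None and score >= threshold:
            accepted.append(row)
        else:
            review.append(row)
    return accepted, review


if __name__ == "__main__":
    # Made-up rows for illustration.
    rows = [
        {"text": "Loved it.", "llm_label": "positive", "confidence": 0.97},
        {"text": "It was fine, I guess.", "llm_label": "neutral", "confidence": 0.55},
        {"text": "Not sure what I watched.", "llm_label": "negative"},
    ]
    accepted, review = route_by_confidence(rows, threshold=0.8)
    print(f"{len(accepted)} auto-accepted, {len(review)} routed to human review")
```

A sensible threshold is usually chosen by checking, on a labeled sample, how accuracy changes as the threshold is raised.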

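Roadmap

Check out our public roadmap to learn more about ongoing and planned improvements to the Autolabel library.

We are always looking for suggestions and contributions from the community. Join the discussion on Discord or open a GitHub issue to report bugs and request features.

Contributing

Autolabel is a rapidly developing project. We welcome contributions in all forms: bug reports, pull requests and ideas for improving the library.

Join the conversation on Discord.
Open an issue on GitHub to report bugs and request features.
Grab an open issue and submit a pull request.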

Additional information

Version: v0.0.16
Type: Other source code
Updated: 2025-03-03
Size: 2MB
Source: GitHub