Download autolabel - Download autolabel source code

autolabel

v0.0.16

Discord | Twitter | Website | Benchmark

⚡ Quick Install

pip install refuel-autolabel

Documentation

https://docs.refuel.ai/

- What is Autolabel

Access to large, clean and diverse labeled datasets is a critical component for any machine learning effort to succeed. State-of-the-art LLMs like GPT-4 can label data automatically with high accuracy, at a fraction of the cost and time of manual labeling.

Autolabel is a Python library to label, clean and enrich text datasets with any Large Language Model (LLM) of your choice.

- (New!) Benchmark models on Refuel's Benchmark

Check out our technical report to learn more about the performance of RefuelLLM-v2 on our benchmark. You can replicate the benchmark yourself by following the steps below:

cd autolabel/benchmark
curl https://autolabel-benchmarking.s3.us-west-2.amazonaws.com/data.zip -o data.zip
unzip data.zip
python benchmark.py --model $model --base_dir benchmark-results
python results.py --eval_dir benchmark-results
cat results.csv

To benchmark a given model, replace $model with its name. If it is an API-hosted model such as gpt-3.5-turbo, gpt-4-1106-preview, claude-3-opus-20240229, gemini-1.5-pro-preview-0409, or any other model supported by Autolabel, just write the model name. If the model to be benchmarked is a vLLM-supported model, pass the local path or the HuggingFace path corresponding to the model. The benchmark runs with the same prompts for every model.

results.csv will contain one row for each model that was benchmarked. See benchmark/results.csv for an example.
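To benchmark several models in one pass, the command above can be assembled in a loop. The following is a minimal sketch and not part of Autolabel: the helper name `benchmark_command` and the chosen model list are illustrative assumptions, and only the `--model` and `--base_dir` flags come from the steps above.

```python
import shlex
from typing import List


def benchmark_command(model: str, base_dir: str = "benchmark-results") -> List[str]:
    """Build the argument list for one benchmark run (hypothetical helper)."""
    return ["python", "benchmark.py", "--model", model, "--base_dir", base_dir]


# Example model names taken from the text above; replace with the ones you need.
models = ["gpt-3.5-turbo", "gpt-4-1106-preview"]

for m in models:
    # Only print the command here. To execute it, pass the list to
    # subprocess.run(..., check=True) from inside autolabel/benchmark.
    print(shlex.join(benchmark_command(m)))
```

Building the command as a list and quoting it with `shlex.join` keeps local model paths that contain spaces intact.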

Getting started

Autolabel provides a simple 3-step process for labeling data:

Specify the labeling guidelines and the LLM to use in a JSON config.
Dry-run to make sure the final prompt looks good.
Kick off a labeling run for your dataset!

Let's imagine we are building an ML model to analyze the sentiment of movie reviews. We have a dataset of movie reviews that we'd like to get labeled first. For this case, here is what the example dataset and config look like:

{
    "task_name": "MovieSentimentReview",
    "task_type": "classification",
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo"
    },
    "dataset": {
        "label_column": "label",
        "delimiter": ","
    },
    "prompt": {
        "task_guidelines": "You are an expert at analyzing the sentiment of movie reviews. Your job is to classify the provided movie review into one of the following labels: {labels}",
        "labels": [
            "positive",
            "negative",
            "neutral"
        ],
        "few_shot_examples": [
            {
                "example": "I got a fairly uninspired stupid film about how human industry is bad for nature.",
                "label": "negative"
            },
            {
                "example": "I loved this movie. I found it very heart warming to see Adam West, Burt Ward, Frank Gorshin, and Julie Newmar together again.",
                "label": "positive"
            },
            {
                "example": "This movie will be played next week at the Chinese theater.",
                "label": "neutral"
            }
        ],
        "example_template": "Input: {example}\nOutput: {label}"
    }
}
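To see how the placeholders in the `prompt` section fit together, the sketch below fills `task_guidelines` and `example_template` with plain Python `str.format`. It only illustrates the `{labels}`, `{example}` and `{label}` placeholders; how they expand here is an assumption, not Autolabel's actual prompt-building code, and the shortened strings are made up for the example.

```python
# A trimmed-down copy of the "prompt" section, written as a Python dict.
prompt = {
    "task_guidelines": "Classify the provided movie review into one of the following labels: {labels}",
    "labels": ["positive", "negative", "neutral"],
    "few_shot_examples": [
        {"example": "I loved this movie.", "label": "positive"},
    ],
    "example_template": "Input: {example}\nOutput: {label}",
}

# Substitute the label list into the guidelines (the join style is an assumption).
guidelines = prompt["task_guidelines"].format(labels=", ".join(prompt["labels"]))

# Render each few-shot example through the template.
shots = [prompt["example_template"].format(**shot) for shot in prompt["few_shot_examples"]]

print(guidelines)
print("\n\n".join(shots))
```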

Initialize the labeling agent and pass it the config:

from autolabel import LabelingAgent, AutolabelDataset

agent = LabelingAgent(config='config.json')

Preview an example prompt that will be sent to the LLM:

ds = AutolabelDataset('dataset.csv', config='config.json')
agent.plan(ds)

This prints:

 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100/100 0:00:00 0:00:00
┌──────────────────────────┬─────────┐
│ Total Estimated Cost     │ $0.538  │
│ Number of Examples       │ 200     │
│ Average cost per example │ 0.00269 │
└──────────────────────────┴─────────┘
─────────────────────────────────────────

Prompt Example:
You are an expert at analyzing the sentiment of movie reviews. Your job is to classify the provided movie review into one of the following labels: [positive, negative, neutral]

Some examples with their output answers are provided below:

Example: I got a fairly uninspired stupid film about how human industry is bad for nature.
Output:
negative

Example: I loved this movie. I found it very heart warming to see Adam West, Burt Ward, Frank Gorshin, and Julie Newmar together again.
Output:
positive

Example: This movie will be played next week at the Chinese theater.
Output:
neutral

Now I want you to label the following example:
Input: A rare exception to the rule that great literature makes disappointing films.
Output:

─────────────────────────────────────────────────────────────────────────────────────────

Finally, we can run the labeling on a subset or the entirety of the dataset:

ds = agent.run(ds)

The output dataframe contains the label column:

ds.df.head()
                                                text  ... MovieSentimentReview_llm_label
0  I was very excited about seeing this film, ant...  ...                       negative
1  Serum is about a crazy doctor that finds a ser...  ...                       negative
4  I loved this movie. I knew it would be chocked...  ...                       positive
...

Features

Label data for NLP tasks such as classification, question answering, named entity recognition, entity matching and more.
Use commercial or open-source LLMs from providers such as OpenAI, Anthropic, HuggingFace, Google and more.
Support for proven LLM techniques that boost label quality, such as few-shot learning.
Confidence estimation and explanations out of the box for every output label.
Caching and state management to minimize cost and experimentation time.

Access to Refuel hosted LLMs

Refuel provides access to hosted open-source LLMs for labeling and for estimating confidence. This is helpful because you can calibrate a confidence threshold for your labeling task and then route the less confident labels to humans, while still getting the benefits of auto-labeling for the confident examples.
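The routing idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Autolabel's API: the function `route_by_confidence`, the `confidence` field name, the sample rows and the 0.8 threshold are all hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def route_by_confidence(rows: List[Row], threshold: float) -> Tuple[List[Row], List[Row]]:
    """Split labeled rows into (auto-accepted, needs human review)."""
    auto: List[Row] = []
    review: List[Row] = []
    for row in rows:
        # Rows at or above the threshold keep their LLM label;
        # everything else is queued for a human annotator.
        (auto if row["confidence"] >= threshold else review).append(row)
    return auto, review


# Made-up rows with a hypothetical "confidence" field.
rows = [
    {"text": "I loved this movie.", "label": "positive", "confidence": 0.97},
    {"text": "It was fine, I guess.", "label": "neutral", "confidence": 0.55},
    {"text": "A dull, lifeless film.", "label": "negative", "confidence": 0.91},
]

auto, review = route_by_confidence(rows, threshold=0.8)
print(len(auto), "auto-labeled;", len(review), "sent for human review")
```

In practice the threshold would be calibrated on a small set of examples with known labels, so that the auto-accepted portion meets the accuracy your task requires.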

To use Refuel hosted LLMs, you can request access here.