ICE-Score: Instructing Large Language Models to Evaluate Code

January 2024 - ICE-Score has been accepted to EACL 2024.

Example
Environment Setup
Folder Description
Usage
Citation
Acknowledgement

Example

Environment Setup

Our experiments are mainly built on the codegen-metrics and code-bert-score repositories. To replicate all experiments, follow the instructions in those repositories to set up the environment.

To run compute_results.ipynb and the modules in the llm_code_eval folder, install all dependencies with the following command:

pip install -r requirements.txt

Folder Description

- data/ contains all processed data used in the paper.
  - data/conala/ contains the CoNaLa dataset with all automatic evaluation results.
  - data/humaneval/ contains the HumanEval dataset with all automatic evaluation results.
    - data/humaneval/humaneval_java_grade.json: Java split
    - data/humaneval/humaneval_cpp_grade.json: C++ split
    - data/humaneval/humaneval_python_grade.json: Python split
    - data/humaneval/humaneval_js_grade.json: JavaScript split
- experiment_source/ contains the scripts that collect all automatic evaluation results. They need machine-specific modifications before they will run on your machine. Note that any of these scripts that use metrics_evaluation.metrics must be run from within the metrics_evaluation folder of codegen-metrics.
- llm_code_eval contains the implementation of a minimum viable product (MVP) of this project. You can use it to evaluate any generated code snippet. See Use Large Language Models To Downstream Tasks Of Source Code for more details.
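Before relying on any of the grade files listed above, it can help to look at their top-level structure. The following is a minimal sketch that assumes only that the files are valid JSON; it makes no assumption about their field names, and `summarize_json` is a hypothetical helper that is not part of this repository.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Describe the top-level structure of a JSON file without assuming its schema."""
    with Path(path).open(encoding="utf-8") as f:
        payload = json.load(f)
    if isinstance(payload, dict):
        # JSON object keys are always strings, so sorting them is safe.
        return {"type": "dict", "size": len(payload), "sample_keys": sorted(payload)[:5]}
    if isinstance(payload, list):
        return {"type": "list", "size": len(payload), "sample_keys": []}
    return {"type": type(payload).__name__, "size": 1, "sample_keys": []}


if __name__ == "__main__":
    # Path taken from the folder list above; run this from the repository root.
    target = Path("data/humaneval/humaneval_python_grade.json")
    if target.exists():
        print(summarize_json(target))
    else:
        print(f"{target} not found; run this from the repository root")
```

The same call works for the Java, C++, and JavaScript splits by changing the path.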

Usage

We implement a minimum viable product (MVP) for this project. To install it, use the following command:

pip install -e .

You can use it to evaluate any generated code snippet by passing problem, output, task, aspect, and model, as in the following example:

from llm_code_eval import evaluate

# The generated code is passed as one string; line breaks and tabs are written as \n and \t.
score = evaluate(problem="Given a list of integers, return the sum of all the integers.",
                 output="sum = 0\nfor i in range(len(list)):\n\tsum += list[i]\nreturn sum",
                 task="code-gen", aspect="usefulness", model="gpt-3.5-turbo")

print(score)
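Because `output` is a plain string, multi-line code has to carry real newline and indentation characters. The following is a minimal sketch of one way to build that string from a triple-quoted literal with `textwrap.dedent`. `build_request` is a hypothetical helper, not part of llm_code_eval; it only assembles the keyword arguments used in the example above and does not call any model.

```python
import textwrap


def build_request(problem, code, task="code-gen", aspect="usefulness", model="gpt-3.5-turbo"):
    """Assemble keyword arguments for evaluate(); hypothetical helper, not part of llm_code_eval."""
    return {
        "problem": problem,
        # Remove the common leading indentation and the blank first/last lines.
        "output": textwrap.dedent(code).strip("\n"),
        "task": task,
        "aspect": aspect,
        "model": model,
    }


request = build_request(
    problem="Given a list of integers, return the sum of all the integers.",
    code="""
        total = 0
        for x in numbers:
            total += x
        return total
    """,
)

# With the package installed and model access configured (an assumption about
# your setup), the call would be:
#     from llm_code_eval import evaluate
#     score = evaluate(**request)
```

Keeping the request as a dictionary makes it easy to reuse the same problem and code with a different aspect or model.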

To evaluate against reference code, pass the reference option as in the following example:

from llm_code_eval import evaluate

# Same call as before, with reference code supplied through the reference option.
score = evaluate(problem="Given a list of integers, return the sum of all the integers.",
                 output="sum = 0\nfor i in range(len(list)):\n\tsum += list[i]\nreturn sum",
                 reference="sum = 0\nfor i in range(len(list)):\n\tsum += list[i]\nreturn sum",
                 task="code-gen", aspect="usefulness", model="gpt-3.5-turbo")

print(score)

You can also pass cot=True to enable zero-shot chain-of-thought evaluation, as in the following example:

from llm_code_eval import evaluate

# With cot=True, evaluate returns the score together with the evaluation steps.
score, eval_step = evaluate(problem="Given a list of integers, return the sum of all the integers.",
                            output="sum = 0\nfor i in range(len(list)):\n\tsum += list[i]\nreturn sum",
                            task="code-gen", aspect="usefulness", model="gpt-3.5-turbo", cot=True)

print(score)
print(eval_step)
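In the examples above, `evaluate` returns a single score by default and a `(score, eval_step)` pair when `cot=True`. Code that switches between the two modes may prefer one return shape. The following is a minimal sketch of such a wrapper; `evaluate_with_steps` and `fake_evaluate` are hypothetical names introduced here for illustration, and the stand-in evaluator only mimics the return shapes shown above without calling any model.

```python
def evaluate_with_steps(evaluate_fn, cot=False, **kwargs):
    """Call evaluate_fn and always return (score, eval_step); eval_step is None without CoT."""
    if cot:
        # Only forward cot when it is enabled, matching the usage shown above.
        score, eval_step = evaluate_fn(cot=True, **kwargs)
        return score, eval_step
    return evaluate_fn(**kwargs), None


def fake_evaluate(cot=False, **kwargs):
    # Stand-in for llm_code_eval.evaluate: returns fixed values, calls no model.
    return (3, "step-by-step reasoning") if cot else 3


print(evaluate_with_steps(fake_evaluate, problem="p", output="o",
                          task="code-gen", aspect="usefulness", model="gpt-3.5-turbo"))
print(evaluate_with_steps(fake_evaluate, cot=True, problem="p", output="o",
                          task="code-gen", aspect="usefulness", model="gpt-3.5-turbo"))
```

To use it with the real package, pass `evaluate` imported from `llm_code_eval` in place of `fake_evaluate`.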

Citation

@inproceedings{zhuo2024ice,
  title={ICE-Score: Instructing Large Language Models to Evaluate Code},
  author={Zhuo, Terry Yue},
  booktitle={Findings of the Association for Computational Linguistics: EACL 2024},
  pages={2232--2242},
  year={2024}
}