ICE-Score: Instructing Large Language Models to Evaluate Code

January 2024: ICE-Score has been accepted to EACL 2024.

Example
Environment Setup
Folder Description
Usage
Citation
Acknowledgement

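Example

Environment Setup

Our experiments are mainly built on the codegen-metrics and code-bert-score repositories. To replicate all experiments, follow their instructions to set up the environment.

To run compute_results.ipynb and the modules in the llm-code-eval folder, install all dependencies with the following command: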

pip install -r requirements.txt

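Folder Description

data/ contains all processed data used in the paper.
- data/conala/ contains the CoNaLa dataset with all automatic evaluation results.
- data/humaneval/ contains the HumanEval dataset with all automatic evaluation results.
  - data/humaneval/humaneval_java_grade.json: Java split
  - data/humaneval/humaneval_cpp_grade.json: C++ split
  - data/humaneval/humaneval_python_grade.json: Python split
  - data/humaneval/humaneval_js_grade.json: JavaScript split
experiment_source/ contains the scripts used to collect all automatic evaluation results. They require specific modifications to run on your machine. Note that any script using metrics_evaluation.metrics needs the implementation in the metrics_evaluation folder from codegen-metrics.
llm_code_eval contains the implementation of a minimum viable product (MVP) of this project. You can use it to evaluate any generated code snippet. See Use Large Language Models To Downstream Tasks Of Source Code for more details.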
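The layout of the JSON files under data/ is not described here, so it can help to inspect one before writing analysis code. The following is a minimal sketch that only reports the top-level structure of a JSON file; the helper name summarize_json is hypothetical and not part of this repository, and nothing is assumed about the fields inside the file.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Describe the top-level structure of a JSON file (hypothetical helper)."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        # Report the number of keys and show the first few of them.
        return {"type": "dict", "size": len(data), "sample_keys": list(data)[:5]}
    if isinstance(data, list):
        return {"type": "list", "size": len(data)}
    # Scalars (string, number, bool, null) have no meaningful size.
    return {"type": type(data).__name__, "size": None}


# Example call, assuming the repository root is the working directory:
# print(summarize_json("data/humaneval/humaneval_python_grade.json"))
```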

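Usage

We implement a minimum viable product (MVP) for this project. To install the project, use the following command:

pip install -e .

You can use it to evaluate any generated code snippet. The inputs are problem, output, task, aspect, and model, as in the following example: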

from llm_code_eval import evaluate

score = evaluate(
    problem="Given a list of integers, return the sum of all the integers.",
    output="sum = 0\nfor i in range(len(list)):\n\tsum += list[i]\nreturn sum",
    task="code-gen",
    aspect="usefulness",
    model="gpt-3.5-turbo",
)

print(score)
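The output argument is a whole code snippet passed as one string, so line breaks and indentation have to be encoded with "\n" and "\t" escapes. A minimal sketch of one way to build such a string from readable lines, using only the standard library (the variable names are illustrative):

```python
# Each entry is one line of the generated snippet; "\t" marks one indent level.
lines = [
    "sum = 0",
    "for i in range(len(list)):",
    "\tsum += list[i]",
    "return sum",
]

# Join with real newline characters to get a single string for `output`.
output = "\n".join(lines)

# The joined string is identical to the hand-escaped form.
assert output == "sum = 0\nfor i in range(len(list)):\n\tsum += list[i]\nreturn sum"
print(output)
```

Building the string this way avoids dropping a backslash by hand, which would silently turn a newline into a literal "n".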

To evaluate against reference code, pass the reference option as in the following example:

from llm_code_eval import evaluate

score = evaluate(
    problem="Given a list of integers, return the sum of all the integers.",
    output="sum = 0\nfor i in range(len(list)):\n\tsum += list[i]\nreturn sum",
    reference="sum = 0\nfor i in range(len(list)):\n\tsum += list[i]\nreturn sum",
    task="code-gen",
    aspect="usefulness",
    model="gpt-3.5-turbo",
)

print(score)

You can also enable zero-shot chain-of-thought evaluation with the cot=True option, as in the following example:

from llm_code_eval import evaluate

# With cot=True, evaluate returns the score together with the evaluation steps.
score, eval_step = evaluate(
    problem="Given a list of integers, return the sum of all the integers.",
    output="sum = 0\nfor i in range(len(list)):\n\tsum += list[i]\nreturn sum",
    task="code-gen",
    aspect="usefulness",
    model="gpt-3.5-turbo",
    cot=True,
)

print(score)
print(eval_step)
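When several candidate snippets exist for the same problem, the calls above can be wrapped in a small loop. The sketch below is not part of llm_code_eval: score_candidates is a hypothetical helper that takes the evaluation function as a parameter, and it assumes the returned scores are comparable values such as numbers. It handles both return shapes shown above (a bare score, or a (score, eval_step) pair when cot=True).

```python
def score_candidates(problem, candidates, evaluate_fn, **eval_kwargs):
    """Score each candidate and return (index, score) pairs, best first.

    Hypothetical helper: evaluate_fn is any callable accepting problem=,
    output=, and extra keyword arguments, such as llm_code_eval.evaluate.
    """
    results = []
    for idx, code in enumerate(candidates):
        result = evaluate_fn(problem=problem, output=code, **eval_kwargs)
        # With cot=True the call returns (score, eval_step); keep the score only.
        score = result[0] if isinstance(result, tuple) else result
        results.append((idx, score))
    # Assumption: scores are comparable, with higher meaning better.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Example call (requires the package and model access to be configured):
# from llm_code_eval import evaluate
# ranking = score_candidates(
#     "Given a list of integers, return the sum of all the integers.",
#     ["return sum(list)", "return 0"],
#     evaluate,
#     task="code-gen", aspect="usefulness", model="gpt-3.5-turbo",
# )
```

Passing the evaluation function as an argument keeps the helper testable with a stub, so the loop logic can be checked without making any model calls.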

Citation

@inproceedings{zhuo2024ice,
  title={ICE-Score: Instructing Large Language Models to Evaluate Code},
  author={Zhuo, Terry Yue},
  booktitle={Findings of the Association for Computational Linguistics: EACL 2024},
  pages={2232--2242},
  year={2024}
}