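AutoTrain Advanced: faster and easier training and deployment of state-of-the-art machine learning models. AutoTrain Advanced is a no-code solution that lets you train machine learning models in just a few clicks. Note that you must upload data in the correct format for a project to be created. For help with the proper data format and with pricing, see the documentation.
NOTE: AutoTrain is free! If you decide to run AutoTrain on Hugging Face Spaces, you pay only for the resources you use. When running locally, you pay only for the resources you use on your own infrastructure.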
Task | Status | Python Notebook | Example Config |
---|---|---|---|
LLM SFT Finetuning | ✅ | | llm_sft_finetune.yaml |
LLM ORPO Finetuning | ✅ | | llm_orpo_finetune.yaml |
LLM DPO Finetuning | ✅ | | llm_dpo_finetune.yaml |
LLM Reward Finetuning | ✅ | | llm_reward_finetune.yaml |
LLM Generic/Default Finetuning | ✅ | | llm_generic_finetune.yaml |
Text Classification | ✅ | | text_classification.yaml |
Text Regression | ✅ | | text_regression.yaml |
Token Classification | ✅ | Coming Soon | token_classification.yaml |
Seq2Seq | ✅ | Coming Soon | seq2seq.yaml |
Extractive QA | ✅ | Coming Soon | extractive_qa.yaml |
Image Classification | ✅ | Coming Soon | image_classification.yaml |
Image Scoring/Regression | ✅ | Coming Soon | image_regression.yaml |
VLM | ? | Coming Soon | vlm.yaml |
Deploy AutoTrain on Hugging Face Spaces:
Run the AutoTrain UI on Colab via ngrok:
You can install the AutoTrain-Advanced Python package via pip. Note that AutoTrain Advanced requires Python >= 3.10 to work properly.
pip install autotrain-advanced
Make sure you have git lfs installed. See the instructions here: https://github.com/git-lfs/git-lfs/wiki/Installation
You also need to install torch, torchaudio, and torchvision.
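The prerequisites listed above (Python >= 3.10, git lfs, and the three torch packages) can be checked before installing. The following is a minimal sketch, not part of AutoTrain: the function name `check_prerequisites` and its injectable parameters are hypothetical, and it assumes git lfs is exposed as a `git-lfs` executable on `PATH`, which is how standard Git LFS installs behave.

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 10)
REQUIRED_MODULES = ("torch", "torchvision", "torchaudio")


def check_prerequisites(version_info=None, which=shutil.which,
                        find_spec=importlib.util.find_spec):
    """Return a list of problems; an empty list means every check passed.

    The parameters exist only so the checks can be exercised without
    touching the real environment (hypothetical helper, not an AutoTrain API).
    """
    if version_info is None:
        version_info = sys.version_info
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python >= 3.10 is required")
    # Assumption: git lfs is installed as a `git-lfs` executable on PATH.
    if which("git-lfs") is None:
        problems.append("git-lfs executable not found on PATH")
    for name in REQUIRED_MODULES:
        # find_spec returns None when a top-level package is not installed.
        if find_spec(name) is None:
            problems.append(f"Python package '{name}' is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("MISSING:", problem)
```

An empty result only means the packages can be found; it does not verify that the installed torch build matches your CUDA version.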
The best way to run autotrain is in a conda environment. You can create a new conda environment with the following commands:
conda create -n autotrain python=3.10
conda activate autotrain
pip install autotrain-advanced
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-12.1.0" cuda-nvcc
Once done, you can start the application using:
autotrain app --port 8080 --host 127.0.0.1
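If the app fails to start, one common cause is that the port is already taken. The sketch below is an illustration under stated assumptions, not AutoTrain code: `port_is_free` and `app_command` are hypothetical helper names, and the only AutoTrain-specific part is the `autotrain app --port ... --host ...` command taken from the line above.

```python
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
    return True


def app_command(host: str = "127.0.0.1", port: int = 8080) -> list[str]:
    """Build the argument list for the `autotrain app` command shown above."""
    return ["autotrain", "app", "--port", str(port), "--host", host]


if __name__ == "__main__":
    if port_is_free("127.0.0.1", 8080):
        print("Run:", " ".join(app_command()))
    else:
        print("Port 8080 is busy; pick another value for --port.")
```

The check is a snapshot: another process can still claim the port between the check and the launch.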
If you prefer not to use the UI, you can train from the command line with AutoTrain Configs, or simply use the AutoTrain CLI.
To train with a config file, use the following command:
autotrain --config <path_to_config_file>
You can find sample config files in the `configs` directory of this repository.
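With a local clone of the repository, the sample configs can be listed and turned into ready-to-run commands. This is a rough sketch: `list_config_files` and `train_command` are hypothetical helpers, and it assumes the sample files use a `.yaml` or `.yml` extension and may sit in subdirectories of `configs`.

```python
from pathlib import Path


def list_config_files(configs_dir):
    """Return every .yaml/.yml file under configs_dir, sorted by path."""
    root = Path(configs_dir)
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix in {".yaml", ".yml"}
    )


def train_command(config_path):
    """Build the argument list for `autotrain --config <path_to_config_file>`."""
    return ["autotrain", "--config", str(config_path)]


if __name__ == "__main__":
    # Assumption: the script runs from the root of a cloned repository.
    for config in list_config_files("configs"):
        print(" ".join(train_command(config)))
```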
Example config file for finetuning SmolLM2:
task: llm-sft
base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
project_name: autotrain-smollm2-finetune
log: tensorboard
backend: local

data:
  path: HuggingFaceH4/no_robots
  train_split: train
  valid_split: null
  chat_template: tokenizer
  column_mapping:
    text_column: messages

params:
  block_size: 2048
  model_max_length: 4096
  epochs: 2
  batch_size: 1
  lr: 1e-5
  peft: true
  quantization: int4
  target_modules: all-linear
  padding: right
  optimizer: paged_adamw_8bit
  scheduler: linear
  gradient_accumulation: 8
  mixed_precision: bf16
  merge_adapter: true

hub:
  username: ${HF_USERNAME}
  token: ${HF_TOKEN}
  push_to_hub: true
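The config above pairs `batch_size: 1` with `gradient_accumulation: 8`. Under the usual gradient-accumulation convention, the optimizer then steps once per 8 samples on each device. The sketch below works through that arithmetic; it assumes AutoTrain follows this convention, and `effective_batch_size` and `num_devices` are hypothetical names, not AutoTrain parameters.

```python
def effective_batch_size(batch_size: int, gradient_accumulation: int,
                         num_devices: int = 1) -> int:
    """Samples consumed per optimizer step under standard gradient accumulation."""
    if min(batch_size, gradient_accumulation, num_devices) < 1:
        raise ValueError("all arguments must be positive integers")
    return batch_size * gradient_accumulation * num_devices


if __name__ == "__main__":
    # Values copied from the `params` section of the example config.
    params = {"batch_size": 1, "gradient_accumulation": 8}
    print(effective_batch_size(**params))  # 1 * 8 * 1 = 8
```

Keeping `batch_size` small and raising `gradient_accumulation` trades training speed for lower peak memory, which suits the int4 plus PEFT setup in this example.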
To fine-tune a model using the config file above, run the following commands:
$ export HF_USERNAME=<your_hugging_face_username>
$ export HF_TOKEN=<your_hugging_face_write_token>
$ autotrain --config <path_to_config_file>
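Because the example config refers to `${HF_USERNAME}` and `${HF_TOKEN}`, a forgotten `export` is an easy mistake. The following sketch scans a config's text for `${VAR}` placeholders and reports those with no value in the environment. It is an illustration only: `unset_placeholders` is a hypothetical helper, and it makes no claim about how AutoTrain itself resolves these placeholders.

```python
import os
import re

# Matches shell-style ${NAME} placeholders such as ${HF_TOKEN}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def unset_placeholders(config_text, environ=None):
    """Return sorted names of ${VAR} placeholders that are unset or empty."""
    if environ is None:
        environ = os.environ
    names = set(PLACEHOLDER.findall(config_text))
    return sorted(name for name in names if not environ.get(name))


if __name__ == "__main__":
    hub_section = (
        "hub:\n"
        "  username: ${HF_USERNAME}\n"
        "  token: ${HF_TOKEN}\n"
        "  push_to_hub: true\n"
    )
    for name in unset_placeholders(hub_section):
        print(f"Environment variable {name} is not set")
```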
Documentation is available at https://hf.co/docs/autotrain/
@inproceedings{thakur-2024-autotrain,
title = "{A}uto{T}rain: No-code training for state-of-the-art models",
author = "Thakur, Abhishek",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-demo.44",
pages = "419--423",
abstract = "With the advancements in open-source models, training(or finetuning) models on custom datasets has become a crucial part of developing solutions which are tailored to specific industrial or open-source applications. Yet, there is no single tool which simplifies the process of training across different types of modalities or tasks.We introduce AutoTrain(aka AutoTrain Advanced){---}an open-source, no code tool/library which can be used to train (or finetune) models for different kinds of tasks such as: large language model (LLM) finetuning, text classification/regression, token classification, sequence-to-sequence task, finetuning of sentence transformers, visual language model (VLM) finetuning, image classification/regression and even classification and regression tasks on tabular data. AutoTrain Advanced is an open-source library providing best practices for training models on custom datasets. The library is available at https://github.com/huggingface/autotrain-advanced. AutoTrain can be used in fully local mode or on cloud machines and works with tens of thousands of models shared on Hugging Face Hub and their variations.",
}