autotrain advanced download - download the autotrain advanced source code

autotrain advanced

Python

1.0.0

AutoTrain Advanced

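AutoTrain Advanced: faster and easier training and deployment of state-of-the-art machine learning models. AutoTrain Advanced is a no-code solution that lets you train machine learning models in just a few clicks. Note that to create a project, you must upload data in the correct format. See the documentation for the required data format and for pricing.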

Note: AutoTrain is free. You are charged for the resources you use only when you run AutoTrain on Hugging Face Spaces. When running locally, you pay only for the resources you use on your own infrastructure.

Supported tasks

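Task	Status	Python notebook	Example config
LLM SFT fine-tuning	✅		llm_sft_finetune.yaml
LLM ORPO fine-tuning	✅		llm_orpo_finetune.yaml
LLM DPO fine-tuning	✅		llm_dpo_finetune.yaml
LLM reward fine-tuning	✅		llm_reward_finetune.yaml
LLM generic/default fine-tuning	✅		llm_generic_finetune.yaml
Text classification	✅		text_classification.yaml
Text regression	✅		text_regression.yaml
Token classification	✅	Coming soon	token_classification.yaml
Sequence-to-sequence	✅	Coming soon	seq2seq.yaml
Extractive question answering	✅	Coming soon	extractive_qa.yaml
Image classification	✅	Coming soon	image_classification.yaml
Image scoring/regression	✅	Coming soon	image_regression.yaml
VLM	?	Coming soon	vlm.yaml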

Running the UI on Colab or Hugging Face Spaces

Deploy AutoTrain on Hugging Face Spaces.
Run the AutoTrain UI on Colab via ngrok.

Local installation

You can install the AutoTrain-Advanced Python package via pip. Note that AutoTrain Advanced requires Python 3.10 or later to work properly.

 pip install autotrain-advanced
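
The Python 3.10 requirement above can be checked before running pip. The following is a minimal sketch using only the standard library; the helper name `meets_minimum_python` is hypothetical and is not part of AutoTrain.

```python
import sys

# Minimum version stated in the installation notes above.
MIN_PYTHON = (3, 10)


def meets_minimum_python(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum_python():
        print("Python version is sufficient for autotrain-advanced.")
    else:
        found = sys.version.split()[0]
        print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {found}.")
```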

Make sure git lfs is installed. See the installation instructions here: https://github.com/git-lfs/git-lfs/wiki/installation

You also need to install torch, torchaudio, and torchvision.
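
The prerequisites listed above (git lfs and the three PyTorch packages) can be verified with a short preflight script. This is a sketch, not an AutoTrain feature: the function names are hypothetical, and it assumes git lfs is installed as a `git-lfs` executable on PATH, which is the usual layout but may differ on some systems.

```python
import importlib.util
import shutil

# Packages named in the installation notes above.
REQUIRED_PACKAGES = ("torch", "torchvision", "torchaudio")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def git_lfs_available():
    """Return True if a `git-lfs` executable is found on PATH (assumed layout)."""
    return shutil.which("git-lfs") is not None


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    if not git_lfs_available():
        print("git-lfs was not found on PATH.")
```

The check only looks the packages up without importing them, so it stays fast even when torch is installed.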

The best way to run autotrain is in a conda environment. You can create a new conda environment with the following commands:

conda create -n autotrain python=3.10
conda activate autotrain
pip install autotrain-advanced
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-12.1.0" cuda-nvcc

Once that is done, you can start the application with:

 autotrain app --port 8080 --host 127.0.0.1
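
If the UI fails to start, one thing worth ruling out is that another process already occupies the chosen host and port. The sketch below tests this with a plain TCP connection attempt; `port_is_free` is a hypothetical helper, and the defaults simply mirror the `--host` and `--port` values in the command above.

```python
import socket


def port_is_free(host="127.0.0.1", port=8080):
    """Return True if nothing is currently accepting connections on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    if port_is_free():
        print("Port 8080 on 127.0.0.1 looks free.")
    else:
        print("Something is already listening on 127.0.0.1:8080; pick another --port.")
```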

If you prefer not to use the UI, you can train from the command line, either with an AutoTrain config file or directly with the AutoTrain CLI.

To train with a config file, use the following command:

 autotrain --config <path_to_config_file>

Sample config files are available in the configs directory of this repository.

Example config file for fine-tuning SmolLM2:

task: llm-sft
base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
project_name: autotrain-smollm2-finetune
log: tensorboard
backend: local

data:
  path: HuggingFaceH4/no_robots
  train_split: train
  valid_split: null
  chat_template: tokenizer
  column_mapping:
    text_column: messages

params:
  block_size: 2048
  model_max_length: 4096
  epochs: 2
  batch_size: 1
  lr: 1e-5
  peft: true
  quantization: int4
  target_modules: all-linear
  padding: right
  optimizer: paged_adamw_8bit
  scheduler: linear
  gradient_accumulation: 8
  mixed_precision: bf16
  merge_adapter: true

hub:
  username: ${HF_USERNAME}
  token: ${HF_TOKEN}
  push_to_hub: true
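
The `batch_size` and `gradient_accumulation` values in the config interact: with gradient accumulation, several small batches contribute to each optimizer step. The sketch below works through that arithmetic. It assumes the conventional meaning of the two parameters (`batch_size` per device, `gradient_accumulation` as the number of batches accumulated per step), which the text above does not spell out; the function name and the `num_devices` argument are illustrative.

```python
def effective_batch_size(batch_size, gradient_accumulation, num_devices=1):
    """Number of training examples contributing to each optimizer step."""
    return batch_size * gradient_accumulation * num_devices


# Values copied from the params section of the example config.
params = {"batch_size": 1, "gradient_accumulation": 8}

print(effective_batch_size(params["batch_size"], params["gradient_accumulation"]))  # prints 8
```

Under these assumptions, the example trains with eight examples per optimizer step on a single device while holding only one example in memory at a time.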

To fine-tune a model using the config file above, use the following commands:

$ export HF_USERNAME=<your_hugging_face_username>
$ export HF_TOKEN=<your_hugging_face_write_token>
$ autotrain --config <path_to_config_file>
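
When launching the same run from a script, it helps to confirm that both environment variables are set before invoking the CLI. The sketch below does that and builds the argument list for the command shown above. The helper names are hypothetical, and the script only wraps the documented `autotrain --config` invocation; it does not use any AutoTrain Python API.

```python
import os

# Variables referenced by the hub section of the example config.
REQUIRED_ENV_VARS = ("HF_USERNAME", "HF_TOKEN")


def missing_env_vars(environ=None, required=REQUIRED_ENV_VARS):
    """Return the required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name)]


def build_command(config_path):
    """Build the argv list for the `autotrain --config` command shown above."""
    return ["autotrain", "--config", config_path]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Set these variables first:", ", ".join(missing))
    else:
        # "my_config.yaml" is a placeholder path for illustration.
        print("Ready to run:", " ".join(build_command("my_config.yaml")))
```

Once `missing_env_vars()` returns an empty list, the result of `build_command(...)` can be passed to `subprocess.run(..., check=True)` to start training.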

Documentation

The documentation is available at https://hf.co/docs/autotrain/

Citation

@inproceedings{thakur-2024-autotrain,
    title = "{A}uto{T}rain: No-code training for state-of-the-art models",
    author = "Thakur, Abhishek",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.44",
    pages = "419--423",
    abstract = "With the advancements in open-source models, training(or finetuning) models on custom datasets has become a crucial part of developing solutions which are tailored to specific industrial or open-source applications. Yet, there is no single tool which simplifies the process of training across different types of modalities or tasks.We introduce AutoTrain(aka AutoTrain Advanced){---}an open-source, no code tool/library which can be used to train (or finetune) models for different kinds of tasks such as: large language model (LLM) finetuning, text classification/regression, token classification, sequence-to-sequence task, finetuning of sentence transformers, visual language model (VLM) finetuning, image classification/regression and even classification and regression tasks on tabular data. AutoTrain Advanced is an open-source library providing best practices for training models on custom datasets. The library is available at https://github.com/huggingface/autotrain-advanced. AutoTrain can be used in fully local mode or on cloud machines and works with tens of thousands of models shared on Hugging Face Hub and their variations.",
}
