AutoTrain Advanced

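AutoTrain Advanced: faster and easier training and deployment of state-of-the-art machine learning models. AutoTrain Advanced is a no-code solution that lets you train machine learning models in just a few clicks. To create a project, you must upload your data in the correct format. For help with the proper data format and pricing, check the documentation.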

NOTE: AutoTrain is free! If you decide to run AutoTrain on Hugging Face Spaces, you pay only for the resources you use. If you run it locally, you pay only for the resources you use on your own infrastructure.

Supported Tasks

Task	Status	Python Notebook	Example Config
LLM SFT Finetuning	✅		llm_sft_finetune.yaml
LLM ORPO Finetuning	✅		llm_orpo_finetune.yaml
LLM DPO Finetuning	✅		llm_dpo_finetune.yaml
LLM Reward Finetuning	✅		llm_reward_finetune.yaml
LLM Generic/Default Finetuning	✅		llm_generic_finetune.yaml
Text Classification	✅		text_classification.yaml
Text Regression	✅		text_regression.yaml
Token Classification	✅	Coming Soon	token_classification.yaml
Seq2Seq	✅	Coming Soon	seq2seq.yaml
Extractive Question Answering	✅	Coming Soon	extractive_qa.yaml
Image Classification	✅	Coming Soon	image_classification.yaml
Image Scoring/Regression	✅	Coming Soon	image_regression.yaml
VLM	?	Coming Soon	vlm.yaml

Running the UI on Colab or Hugging Face Spaces

Deploy AutoTrain on Hugging Face Spaces.
Run the AutoTrain UI on Colab via ngrok.

Local Installation

You can install the AutoTrain-Advanced Python package via pip. Note that AutoTrain Advanced requires Python >= 3.10 to work properly.

 pip install autotrain-advanced

Make sure git lfs is installed. See the installation instructions here: https://github.com/git-lfs/git-lfs/wiki/Installation

You also need to install torch, torchaudio, and torchvision.
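The prerequisites listed above (Python >= 3.10, git lfs, and the three torch packages) can be checked before installing or launching anything. The following is a minimal sketch using only the Python standard library; the function name check_environment is hypothetical, and it assumes git lfs exposes a git-lfs executable on PATH, which is the usual layout but may differ on some systems.

```python
import importlib.util
import shutil
import sys

REQUIRED_PYTHON = (3, 10)  # minimum version stated in the installation notes
REQUIRED_PACKAGES = ("torch", "torchaudio", "torchvision")


def check_environment():
    """Return a dict mapping each prerequisite to True (found) or False."""
    results = {
        "python>=3.10": sys.version_info >= REQUIRED_PYTHON,
        # Assumption: git lfs is installed as a `git-lfs` executable on PATH.
        "git-lfs": shutil.which("git-lfs") is not None,
    }
    for name in REQUIRED_PACKAGES:
        # find_spec locates a top-level package without importing it.
        results[name] = importlib.util.find_spec(name) is not None
    return results


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {item}")
```

A MISSING entry only means the check could not find the item in the current environment; it does not verify versions of the torch packages or CUDA support.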

The best way to run AutoTrain is in a conda environment. You can create a new conda environment with the following commands:

conda create -n autotrain python=3.10
conda activate autotrain
pip install autotrain-advanced
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-12.1.0" cuda-nvcc

Once that is done, you can start the application with:

 autotrain app --port 8080 --host 127.0.0.1

If you are not fond of the UI, you can train from the command line using AutoTrain configs, or simply use the AutoTrain CLI.

To use a config file for training, run the following command:

 autotrain --config <path_to_config_file>

You can find sample config files in the configs directory of this repository.
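When training is launched from a script rather than typed by hand, the same autotrain --config command can be wrapped in a small helper. This is a sketch under stated assumptions: build_train_command and run_training are hypothetical helper names, my_config.yaml is a placeholder path, and only the --config flag shown above is used.

```python
import subprocess
from pathlib import Path


def build_train_command(config_path):
    """Build the argument list for `autotrain --config <path>`."""
    return ["autotrain", "--config", str(Path(config_path))]


def run_training(config_path):
    """Run the training command; raise if the file is missing or the run fails."""
    path = Path(config_path)
    if not path.is_file():
        # Fail early with a clear message instead of starting the CLI.
        raise FileNotFoundError(f"config file not found: {path}")
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(build_train_command(path), check=True)


if __name__ == "__main__":
    # Hypothetical path, shown only to illustrate the resulting command.
    print(build_train_command("my_config.yaml"))
```

Passing the command as a list avoids shell quoting problems when the config path contains spaces.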

Example config file for finetuning SmolLM2:

task: llm-sft
base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
project_name: autotrain-smollm2-finetune
log: tensorboard
backend: local

data:
  path: HuggingFaceH4/no_robots
  train_split: train
  valid_split: null
  chat_template: tokenizer
  column_mapping:
    text_column: messages

params:
  block_size: 2048
  model_max_length: 4096
  epochs: 2
  batch_size: 1
  lr: 1e-5
  peft: true
  quantization: int4
  target_modules: all-linear
  padding: right
  optimizer: paged_adamw_8bit
  scheduler: linear
  gradient_accumulation: 8
  mixed_precision: bf16
  merge_adapter: true

hub:
  username: ${HF_USERNAME}
  token: ${HF_TOKEN}
  push_to_hub: true
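As a rough reading of the params section in this config: with batch_size 1 and gradient_accumulation 8, gradients from 8 examples are accumulated before each optimizer update, assuming gradient_accumulation has its usual meaning of accumulation steps. The sketch below works through that arithmetic. The helper names, the single-device default, and the dataset size of 1,000 examples are hypothetical values chosen for illustration; the exact step count in a real run depends on the dataset and on how the trainer handles partial batches.

```python
import math


def effective_batch_size(batch_size, gradient_accumulation, num_devices=1):
    """Number of examples contributing to one optimizer update."""
    return batch_size * gradient_accumulation * num_devices


def optimizer_steps(num_examples, epochs, batch_size, gradient_accumulation, num_devices=1):
    """Approximate number of optimizer updates over the whole run."""
    per_update = effective_batch_size(batch_size, gradient_accumulation, num_devices)
    # Round up so a final partial batch still counts as one update per epoch.
    return math.ceil(num_examples / per_update) * epochs


if __name__ == "__main__":
    # batch_size=1, gradient_accumulation=8, epochs=2 come from the config;
    # num_examples=1000 is a hypothetical dataset size.
    print(effective_batch_size(1, 8))      # 8
    print(optimizer_steps(1000, 2, 1, 8))  # 250
```

Accumulating gradients this way keeps per-step memory use at batch_size 1 while the optimizer still sees the equivalent of 8 examples per update.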

To finetune a model with the SmolLM2 config file above, run the following commands:

$ export HF_USERNAME=<your_hugging_face_username>
$ export HF_TOKEN=<your_hugging_face_write_token>
$ autotrain --config <path_to_config_file>
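The hub section of the config refers to ${HF_USERNAME} and ${HF_TOKEN}, so both variables need to be exported before the run. A small pre-launch check can catch a forgotten export. This is a sketch, not AutoTrain's own substitution logic: missing_env_vars is a hypothetical helper that scans config text for ${NAME} placeholders and reports the ones that are unset or empty.

```python
import os
import re

# Matches ${NAME} placeholders such as ${HF_TOKEN}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(config_text, env=None):
    """Return placeholder names in config_text that are unset or empty in env."""
    env = os.environ if env is None else env
    names = PLACEHOLDER.findall(config_text)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return [name for name in dict.fromkeys(names) if not env.get(name)]


if __name__ == "__main__":
    hub_section = "username: ${HF_USERNAME}\ntoken: ${HF_TOKEN}\n"
    missing = missing_env_vars(hub_section)
    if missing:
        print("Export these variables first:", ", ".join(missing))
    else:
        print("All placeholders have values.")
```

The check reports only variable names and never prints their values, which keeps the write token out of logs.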

Documentation

Documentation is available at https://hf.co/docs/autotrain/

Citation

 @inproceedings{thakur-2024-autotrain,
    title = "{A}uto{T}rain: No-code training for state-of-the-art models",
    author = "Thakur, Abhishek",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.44",
    pages = "419--423",
    abstract = "With the advancements in open-source models, training(or finetuning) models on custom datasets has become a crucial part of developing solutions which are tailored to specific industrial or open-source applications. Yet, there is no single tool which simplifies the process of training across different types of modalities or tasks.We introduce AutoTrain(aka AutoTrain Advanced){---}an open-source, no code tool/library which can be used to train (or finetune) models for different kinds of tasks such as: large language model (LLM) finetuning, text classification/regression, token classification, sequence-to-sequence task, finetuning of sentence transformers, visual language model (VLM) finetuning, image classification/regression and even classification and regression tasks on tabular data. AutoTrain Advanced is an open-source library providing best practices for training models on custom datasets. The library is available at https://github.com/huggingface/autotrain-advanced. AutoTrain can be used in fully local mode or on cloud machines and works with tens of thousands of models shared on Hugging Face Hub and their variations.",
}
