GOT OCR2.0 다운로드 - GOT OCR2.0 소스코드 다운로드

GOT OCR2.0

기타 소스코드

다운로드

일반 OCR 이론: 통합 엔드투엔드 모델을 통해 OCR-2.0을 향하여

풀어 주다

[2024/11/4] 6개의 위챗 그룹입니다.
[2024/10/24] 이전 4개의 위챗 그룹이 꽉 차서 다섯 번째 그룹을 만들었습니다.
[2024/10/11] 위챗 그룹에 가입하고 싶은 친구가 너무 많아서 네 번째 그룹을 만들었습니다.
[2024/10/2] GOT-OCR2.0의 onnx 및 mnn 버전입니다.
[2024/9/29]??? 커뮤니티는 llama_cpp_inference의 첫 번째 버전을 구현했습니다.
[2024/9/24]??? 자신의 데이터에 대한 ms-swift 빠른 미세 조정을 지원합니다.
[2024/9/23]??? 공식 Modelscope 데모를 출시합니다. GPU 리소스를 제공하는 Modelscope에 진심으로 감사드립니다.
[2024/9/14]??? 공식 데모를 공개합니다. GPU 리소스를 제공하는 Huggingface에 진심으로 감사드립니다.
[2024/9/13]??? Huggingface 배포를 출시합니다.
[2024/9/03]??? 우리는 코드, 가중치 및 벤치마크를 오픈 소스로 제공합니다. 해당 논문은 이 저장소에서 찾을 수 있습니다. 우리는 이를 Arxiv에도 제출했습니다.
[2024/9/03]??? OCR-2.0 모델 GOT를 출시합니다!

지역사회 기여

우리는 모든 사람이 이 저장소를 기반으로 GOT 애플리케이션을 개발하도록 권장합니다. 다음과 같은 기여에 감사드립니다.

vllm 참조 ~ 기여자: @Jay

onnx와 mnn은 ~ 기여자: @BaofengZan을 지원합니다.

llama_cpp 추론 ~ 기여자: @1694439208

GOT Colab ~ 기여자: @Zizhe Wang

GOT의 CPU 버전 ~ 기여자: @ElvisClaros

온라인 데모 ~ 기여자: @Joseph Pollack

Dokcer 및 클라이언트 데모 ~ 기여자: @QIN2DIM

GOT의 GUI ~ 기여자: @XJF2332

내용물

설치하다
GOT 가중치
데모
기차
미세 조정
평가

통합된 엔드투엔드 모델을 통해 OCR-2.0을 향해

설치하다

우리 환경은 cuda11.8+torch2.0.1입니다.
이 저장소를 복제하고 GOT 폴더로 이동하십시오.

 git clone https://github.com/Ucas-HaoranWei/GOT-OCR2.0.gitcd 'GOT 폴더'

패키지 설치

 conda create -n python=3.10 -y를 얻었습니다.
콘다 활성화있어
pip 설치 -e .

플래시 어텐션 설치

pip install ninja
pip install flash-attn --no-build-isolation

GOT 가중치

포옹하는 얼굴
구글 드라이브
바이두윤 코드: OCR2

데모

일반 텍스트 OCR:

 python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type ocr

텍스트 OCR 형식:

 python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type 형식

세분화된 OCR:

 python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type format/ocr --box [x1,y1,x2,y2]

 python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type format/ocr --color 빨간색/녹색/파란색

다중 자르기 OCR:

 python3 GOT/demo/run_ocr_2.0_crop.py --model-name /GOT_weights/ --image-file /an/image/file.png

다중 페이지 OCR(이미지 경로에 여러 .png 파일이 포함되어 있음):

 python3 GOT/demo/run_ocr_2.0_crop.py --model-name /GOT_weights/ --image-file /images/path/ --multi-page

형식화된 OCR 결과를 렌더링합니다.

 python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type format --render

참고 : 렌더링 결과는 /results/demo.html에서 확인할 수 있습니다. 결과를 보려면 데모.html을 열어보세요.

기차

기차 샘플은 여기에서 찾을 수 있습니다. '대화'-'인간'-'값'에 '<이미지>'가 필요하다는 점에 유의하세요!
이 코드베이스는 GOT 가중치에 대한 사후 훈련(2단계/3단계)만 지원합니다.
논문에 설명된 1단계부터 훈련하려면 이 저장소가 필요합니다.

 딥스피드 /GOT-OCR-2.0-master/GOT/train/train_GOT.py
  --deepspeed /GOT-OCR-2.0-master/zero_config/zero2.json --model_name_or_path /GOT_weights/
  --use_im_start_end 참
    --bf16 참
    --gradient_accumulation_steps 2
     --evaluation_strategy "아니요"
    --save_strategy "단계"
   --save_steps 200
    --save_total_limit 1
    --weight_decay 0.
     --warmup_ratio 0.001
      --lr_scheduler_type "코사인"
     --logging_steps 1
     --tf32 참
      --model_max_length 8192
     --gradient_checkpointing 참
    --dataloader_num_workers 8
     --report_to 없음
   --per_device_train_batch_size 2
     --num_train_epochs 1
   --learning_rate 2e-5
    --데이터 세트 pdf-ocr+scence
  --output_dir /귀하의/출력/경로

메모 :

Constant.py에서 해당 데이터 정보를 변경합니다.
Conversation_dataset_qwen.py의 37행을 data_name으로 변경합니다.

미세 조정

ms-swift를 사용한 빠른 미세 조정:

 자식 클론 https://github.com/modelscope/ms-swift.gitcd ms-swift
pip install -e .[llm]

 # 기본값: sft LLM 및 프로젝터, 고정 비전 인코더CUDA_VISIBLE_DEVICES=0 Swift sft
--model_type got-ocr2
 --model_id_or_path stepfun-ai/GOT-OCR2_0
 --sft_type 로라
 --dataset latex-ocr-print#5000# Deepspeed ZeRO2NPROC_PER_NODE=4
 CUDA_VISIBLE_DEVICES=0,1,2,3 신속한 sft
 --model_type got-ocr2
 --model_id_or_path stepfun-ai/GOT-OCR2_0
 --sft_type 로라
 --dataset latex-ocr-print#5000
 --deepspeed 기본값-zero2

귀하의 데이터 :

 --dataset train.jsonl
--val_dataset val.jsonl(선택 사항)

데이터 형식 :

 {"쿼리": "<이미지>55555", "응답": "66666", "이미지": ["image_path"]}
{"쿼리": "<이미지><이미지>eeeee", "응답": "fffff", "역사": [], "이미지": ["image_path1", "image_path2"]}
{"쿼리": "EEEEEE", "응답": "FFFFF", "역사": [["query1", "response1"], ["query2", "response2"]]}

자세한 내용은 ms-swift에서 확인할 수 있습니다.

평가

우리는 Fox 및 OneChart 벤치마크를 사용하며 다른 벤치마크는 가중치 다운로드 링크에서 찾을 수 있습니다.
평가 코드는 GOT/eval에서 찾을 수 있습니다.
Evaluation_GOT.py를 사용하여 평가를 실행할 수 있습니다. GPU가 8개인 경우 --num-chunks를 8로 설정할 수 있습니다.

 python3 GOT/eval/evaluate_GOT.py --model-name /GOT_weights/ --gtfile_path xxxx.json --image_path /image/path/ --out_path /data/eval_results/GOT_mathpix_test/ --num-chunks 8 --datatype OCR

연락하다

이 작업에 관심이 있거나 코드나 논문에 대해 질문이 있는 경우 커뮤니케이션 Wechat 그룹에 참여하세요.

참고 : 위챗 그룹 5개가 모두 꽉 찼습니다. 그룹 6에 참여하세요.

질문이 있으시면 주저하지 마시고 [email protected]으로 이메일을 보내주세요.

승인

Vary: 우리가 구축한 코드베이스!
Qwen : 영어와 중국어를 모두 잘하는 Vary의 LLM 베이스 모델!

소환

 @article{wei2024general, title={일반 OCR 이론: 통합 엔드투엔드 모델을 통한 OCR-2.0을 향하여}, 작성자={Wei, Haoran 및 Liu, Chenglong 및 Chen, Jinyue 및 Wang, Jia 및 Kong, Lingyu 및 Xu, Yanming 및 Ge, Zheng 및 Zhao, Liang 및 Sun, Jianjian 및 Peng, Yuang 및 기타}, 저널={arXiv preprint arXiv:2409.01704}, year={2024}}@article{wei2023vary, title={Vary: 대규모 비전 언어 모델을 위한 비전 어휘 확장}, 저자={Wei, Haoran 및 Kong, Lingyu 및 Chen, Jinyue 및 Zhao, Liang 및 Ge, Zheng 및 Yang, Jinrong 및 Sun, Jianjian 및 Han, Chunrui 및 Zhang, Xiangyu}, 저널={arXiv 사전 인쇄 arXiv:2312.06109}, 연도={2023}}

확장하다

추가 정보