GOT OCR2.0下載 - GOT OCR2.0原始碼下載

GOT OCR2.0

其他源碼

下載

一般 OCR 理論：透過統一的端對端模型邁向 OCR-2.0

發布

[2024/11/4] 六個微信群。
[2024/10/24] 之前的四個微信群已經滿了，所以我們創建了第五個群。
[2024/10/11] 太多朋友想加入微信群，所以我們創造了第四個群。
[2024/10/2] GOT-OCR2.0的onnx和mnn版本。
[2024/9/29]？社群已經實現了 llama_cpp_inference 的第一個版本。
[2024/9/24]？支援ms-swift快速對自己的資料進行Fine-tune。
[2024/9/23]？我們發布了官方 Modelscope 演示。非常感謝Modelscope提供GPU資源。
[2024/9/14]？我們發布了官方演示。非常感謝 Huggingface 提供的 GPU 資源。
[2024/9/13]？我們發布了 Huggingface 部署。
[2024/9/03]？我們開源程式碼、權重和基準。該論文可以在這個 repo 中找到。我們也已將其提交給 Arxiv。
[2024/9/03]？我們發布了 OCR-2.0 模型 GOT！

社區貢獻

我們鼓勵大家基於這個倉庫開發GOT應用程式。感謝以下貢獻：

vllm 參考~貢獻者：@Jay

onnx 和 mnn 支持 ~ 貢獻者：@BaofengZan

llama_cpp 推理 ~ 貢獻者：@1694439208

GOT的Colab~貢獻者：@Zizhe Wang

CPU版本的GOT~貢獻者：@ElvisClaros

線上示範 ~ 貢獻者：@Joseph Pollack

Dokcer 與客戶端示範 ~ 貢獻者：@QIN2DIM

GOT的GUI〜貢獻者：@XJF2332

內容

安裝
得到的重量
示範
火車
微調
評估

透過統一的端對端模型邁向 OCR-2.0

安裝

我們的環境是cuda11.8+torch2.0.1
克隆此儲存庫並導航至 GOT 資料夾

git clone https://github.com/Ucas-HaoranWei/GOT-OCR2.0.gitcd 'GOT資料夾'

安裝包

conda create -n 得到 python=3.10 -y
conda 激活 得到
pip install -e 。

安裝Flash-注意

pip install ninja
pip install flash-attn --no-build-isolation

得到的重量

抱臉
Google雲端硬碟
百度雲代碼：OCR2

示範

純文字 OCR：

 python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type ocr

格式化文字 OCR：

 python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type 格式

細粒度 OCR：

 python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type format/ocr --box [x1,y1,x2,y2]

 python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type format/ocr --color 紅/綠/藍

多作物 OCR：

 python3 GOT/demo/run_ocr_2.0_crop.py --model-name /GOT_weights/ --image-file /an/image/file.png

多頁OCR（影像路徑包含多個.png檔）：

 python3 GOT/demo/run_ocr_2.0_crop.py --model-name /GOT_weights/ --image-file /images/path/ --multi-page

渲染格式化的 OCR 結果：

 python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type format --render

注意：渲染結果可以在/results/demo.html中找到。請開啟 demo.html 查看結果。

火車

火車樣本可以在這裡找到。請注意，「對話」-「人類」-「價值」中的「<圖像>」是必要的！
此程式碼庫僅支援我們的 GOT 權重的後訓練（stage-2/stage-3）。
如果您想從我們論文中描述的第一階段開始進行訓練，您需要這個儲存庫。

 deepspeed /GOT-OCR-2.0-master/GOT/train/train_GOT.py
  --deepspeed /GOT-OCR-2.0-master/zero_config/zero2.json --model_name_or_path /GOT_weights/
  --use_im_start_end 真
    --bf16 正確
    --gradient_accumulation_steps 2
     --評估策略“否”
    --save_strategy“步驟”
   --save_steps 200
    --save_total_limit 1
    --權重衰減0。
     --warmup_ratio 0.001
      --lr_scheduler_type“餘弦”
     --logging_steps 1
     --tf32 真
      --model_max_length 8192
     --gradient_checkpointing 真
    --dataloader_num_workers 8
     --report_to none
   --per_device_train_batch_size 2
     --num_train_epochs 1
   --學習率2e-5
    --資料集 pdf-ocr+scence
  --output_dir /你的/輸出/路徑

筆記：

更改constant.py中對應的資料資訊。
將conversation_dataset_qwen.py 中的第37行更改為您的data_name。

微調

使用 ms-swift 快速微調：

 git 克隆 https://github.com/modelscope/ms-swift.gitcd ms-swift
pip install -e .[llm]

 # 預設：sft LLM & 投影儀，凍結視覺編碼器CUDA_VISIBLE_DEVICES=0 swift sft
--model_type got-ocr2
 --model_id_or_path stepfun-ai/GOT-OCR2_0
 --sft_type 勞拉
 --資料集 Latex-ocr-print#5000# Deepspeed ZeRO2NPROC_PER_NODE=4
 CUDA_VISIBLE_DEVICES=0,1,2,3 快速 sft
 --model_type got-ocr2
 --model_id_or_path stepfun-ai/GOT-OCR2_0
 --sft_type 勞拉
 --資料集 Latex-ocr-print#5000
 --deepspeed 預設值-02

根據您的數據：

 --資料集train.jsonl
--val_dataset val.jsonl（可選）

資料格式：

 {“查詢”：“<圖片> 55555”，“回應”：“66666”，“圖像”：[“image_path”]}
{“查詢”：“<圖像> <圖像> eeeee”，“回應”：“fffff”，“歷史記錄”：[]，“圖像”：[“image_path1”，“image_path2”]}
{“查詢”：“EEEEE”，“回應”：“FFFFF”，“歷史記錄”：[[“查詢1”，“回應1”]，[“查詢2”，“回應2”]]}

更多細節可以參考ms-swift。

評估

我們使用 Fox 和 OneChart 基準，其他基準可以在重量下載連結中找到。
評估代碼可以在 GOT/eval 中找到。
您可以使用evaluate_GOT.py來執行eval。如果你有8個GPU，--num-chunks可以設定為8。

 python3 GOT/eval/evaluate_GOT.py --model-name /GOT_weights/ --gtfile_path xxxx.json --image_path /image/path/ --out_path /data/eval_results/GOT_mahpix_test/ --num-chunks 8datatype OCR

接觸

如果您對這項工作感興趣或對程式碼或論文有疑問，請加入我們的交流微信群組。

註：五個微信群已滿，請加入6群。

如果您有任何疑問，請隨時透過電子郵件與我聯繫，[email protected]。

致謝

Vary：我們建構的程式碼庫！
Qwen：Vary的LLM基礎模型，英文和中文都很擅長！

引文

@article{wei2024general, title={通用 OCR 理論：透過統一的端對端模型邁向 OCR-2.0}，作者={Wei、Haoran 和 Liu、Chenglong 和 Chen、Jinyue 和 Wang、Jia 和 Kong、Lingyu 和徐彥明和葛、鄭和趙、樑和孫、健健和彭、袁等人}，journal={arXiv 預印本arXiv:2409.01704}，year={2024}}@article{wei2023vary，title={Vary：縮視覺語言模型的視覺詞彙表}，作者={魏、浩然和孔、凌宇和陳、金月和趙、樑和葛、鄭和楊、金榮和孫、健健和韓、春瑞和張、翔宇}, 期刊={arXiv 預印本 arXiv:2312.06109}, 年={2023}}

展開

附加信息

版本
類型其他源碼
更新時間 2024-11-09
大小 50MB
來自於 Github

相關應用

馬克斯程式CMS4.0（MaxCms4.0） v4.0 bulid2015.01.12

2024-11-14
婚嫁網v1.0

2022-06-04
金博客 v2.0

2022-06-01
Cart42 v1.0

2022-05-30
zhcms v1.0

2022-05-23
JF部落格 v1.0

2022-05-23

爲您推薦

chat.petals.dev

其他源碼

1.0.0
GPT Prompt Templates

其他源碼

1.0.0
GPTyped

其他源碼

GPTyped 1.0.5
waymo open dataset

其他源碼

December 2023 Update
SmartTube

其他源碼

24.71 Stable
Sunamu

其他源碼

Release 2.2.0
waymo open dataset

其他源碼

December 2023 Update
termwind

其他類別

v2.3.0
wp functions

其他類別

1.0.0

相關資訊全部