Translate the texts in manga/images.

中文說明 | Changelog

Join us on Discord: https://discord.gg/Ak8APNy4vb

Some manga/images will never be translated, and that is why this project was born.

Note that the samples may not always be up to date; they may not represent the current main branch version.
Original | Translated |
---|---|
(Source @09ra_19ra) | (Mask) |
(Source @VERTIGRIS_ART) | --detector ctd (Mask) |
(Source @hiduki_yayoi) | --translator none (Mask) |
(Source @rikak) | (Mask) |
Official demo (by zyddnys): https://touhou.ai/imgtrans/
Browser userscript (by QiroNT): https://greasyfork.org/scripts/437569

Successor to MMDOCR-HighPerformance.

This is a hobby project; you are welcome to contribute!
Currently it is only a simple demo with many imperfections; we need your support to make this project better!
Mainly designed for translating Japanese text, but it also supports Chinese, English and Korean.
Supports inpainting, text rendering and colorization.
# First, you need to have Python(>=3.8) installed on your system
# The latest version often does not work with some pytorch libraries yet
$ python --version
Python 3.10.6
# Clone this repo
$ git clone https://github.com/zyddnys/manga-image-translator.git
# Create venv
$ python -m venv venv
# Activate venv
$ source venv/bin/activate
# For --use-gpu option go to https://pytorch.org/ and follow
# pytorch installation instructions. Add `--upgrade --force-reinstall`
# to the pip command to overwrite the currently installed pytorch version.
# Install the dependencies
$ pip install -r requirements.txt
The models will be downloaded into ./models at runtime.
Before you start pip installing, install Microsoft C++ Build Tools (Download, Instructions), as some pip dependencies will not compile without it (see #114).
To use CUDA on Windows, install the correct pytorch version as instructed on https://pytorch.org/.
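As a hedged sketch of a GPU-enabled install (the exact wheel index depends on your CUDA version, so verify the command against https://pytorch.org/ first):

# example for CUDA 11.8; pick the index URL that matches your CUDA version
$ pip install --upgrade --force-reinstall torch torchvision --index-url https://download.pytorch.org/whl/cu118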
Requirements:

- Docker (v19.03+, required to run the container)
- Docker Compose (optional, only needed if you want to use the files in the demo/doc folder)
- Nvidia Container Runtime (optional, only needed if you want to use CUDA)

This project has Docker support under the zyddnys/manga-image-translator:main image. The Docker image contains all dependencies and models required by the project. Note that the image is fairly large (~15GB).
The web server can be hosted (for CPU) using
docker run -p 5003:5003 -v result:/app/result --ipc=host --rm zyddnys/manga-image-translator:main -l ENG --manga2eng -v --mode web --host=0.0.0.0 --port=5003
or
docker-compose -f demo/doc/docker-compose-web-with-cpu.yml up
depending on which one you prefer. The web server should start on port 5003 and the images should end up in the /result folder.
Using docker with the CLI (i.e. in batch mode):
docker run -v <targetFolder>:/app/<targetFolder> -v <targetFolder>-translated:/app/<targetFolder>-translated --ipc=host --rm zyddnys/manga-image-translator:main --mode=batch -i=/app/<targetFolder> <cli flags>
Note: If you need to reference files on your host machine, you will need to mount them as volumes into the /app folder inside the container. Paths for the CLI need to be the internal docker path /app/... instead of the paths on your host machine.
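For illustration, a hypothetical batch run where the host folder ~/manga is the target (the folder name and translation flags are placeholders):

docker run -v "$HOME/manga":/app/manga -v "$HOME/manga-translated":/app/manga-translated --ipc=host --rm zyddnys/manga-image-translator:main --mode=batch -i=/app/manga --translator=google -l ENG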
Some translation services require API keys to function; pass them into the docker container as environment variables. For example:
docker run --env="DEEPL_AUTH_KEY=xxx" --ipc=host --rm zyddnys/manga-image-translator:main <cli flags>
To use with a supported GPU, please first read the initial Docker section above, as there are some special dependencies you will need.
Run the container with the following flags set:
docker run ... --gpus=all ... zyddnys/manga-image-translator:main ... --use-gpu
or (for the web server + GPU):
docker-compose -f demo/doc/docker-compose-web-with-gpu.yml up
To build the docker image locally, you can run (you will need make on your machine):
make build-image
Then to test the built image, run:
make run-web-server
# use `--use-gpu` for speedup if you have a compatible NVIDIA GPU.
# use `--target-lang <language_code>` to specify a target language.
# use `--inpainter=none` to disable inpainting.
# use `--translator=none` if you only want to use inpainting (blank bubbles)
# replace <path> with the path to the image folder or file.
$ python -m manga_translator -v --translator=google -l ENG -i <path>
# results can be found under `<path_to_image_folder>-translated`.
# saves singular image into /result folder for demonstration purposes
# use `--mode demo` to enable demo translation.
# replace <path> with the path to the image file.
$ python -m manga_translator --mode demo -v --translator=google -l ENG -i <path>
# result can be found in `result/`.
# use `--mode web` to start a web server.
$ python -m manga_translator -v --mode web --use-gpu
# the demo will be serving on http://127.0.0.1:5003
# use `--mode api` to start an api server.
$ python -m manga_translator -v --mode api --use-gpu
# the api will be serving on http://127.0.0.1:5003
GUI implementation: BallonsTranslator

Recommended modules:

- Detector: --detector ctd can increase the amount of text lines detected
- OCR:
- Translator:
- Inpainter: ??
- Colorizer: mc2

Tips to improve translation quality:

- Use an upscaler by specifying --upscale-ratio 2 or any other value
- If the rendered text is too small to read, specify --font-size-minimum 30 or use the --manga2eng renderer, which will try to adapt to the detected text bubbles
- Specify a font with --font-path fonts/anime_ace_3.ttf
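For example, a sketch combining these quality flags into a single invocation (values and paths are illustrative; adjust them to your images):

$ python -m manga_translator -v --translator=google -l ENG --upscale-ratio 2 --font-size-minimum 30 --font-path fonts/anime_ace_3.ttf -i <path>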
Options:

-h, --help show this help message and exit
-m, --mode {demo,batch,web,web_client,ws,api}
Run demo in single image demo mode (demo), batch
translation mode (batch), web service mode (web)
-i, --input INPUT [INPUT ...] Path to an image file if using demo mode, or path to an
image folder if using batch mode
-o, --dest DEST Path to the destination folder for translated images in
batch mode
-l, --target-lang {CHS,CHT,CSY,NLD,ENG,FRA,DEU,HUN,ITA,JPN,KOR,PLK,PTB,ROM,RUS,ESP,TRK,UKR,VIN,ARA,CNR,SRP,HRV,THA,IND,FIL}
Destination language
-v, --verbose Print debug info and save intermediate images in result
folder
-f, --format {png,webp,jpg,xcf,psd,pdf} Output format of the translation.
--attempts ATTEMPTS Retry attempts on encountered error. -1 means infinite
times.
--ignore-errors Skip image on encountered error.
--overwrite Overwrite already translated images in batch mode.
--skip-no-text Skip image without text (Will not be saved).
--model-dir MODEL_DIR Model directory (by default ./models in project root)
--use-gpu Turn on/off gpu
--use-gpu-limited Turn on/off gpu (excluding offline translator)
--detector {default,ctd,craft,none} Text detector used for creating a text mask from an
image, DO NOT use craft for manga, it's not designed
for it
--ocr {32px,48px,48px_ctc,mocr} Optical character recognition (OCR) model to use
--use-mocr-merge Use bbox merge when using Manga OCR inference.
--inpainter {default,lama_large,lama_mpe,sd,none,original}
Inpainting model to use
--upscaler {waifu2x,esrgan,4xultrasharp} Upscaler to use. --upscale-ratio has to be set for it
to take effect
--upscale-ratio UPSCALE_RATIO Image upscale ratio applied before detection. Can
improve text detection.
--colorizer {mc2} Colorization model to use.
--translator {google,youdao,baidu,deepl,papago,caiyun,gpt3,gpt3.5,gpt4,none,original,offline,nllb,nllb_big,sugoi,jparacrawl,jparacrawl_big,m2m100,m2m100_big,sakura}
Language translator to use
--translator-chain TRANSLATOR_CHAIN Output of one translator goes in another. Example:
--translator-chain "google:JPN;sugoi:ENG".
--selective-translation SELECTIVE_TRANSLATION
Select a translator based on detected language in
image. Note the first translation service acts as
default if the language isn't defined. Example:
--selective-translation "google:JPN;sugoi:ENG".
--revert-upscaling Downscales the previously upscaled image after
translation back to original size (Use with --upscale-
ratio).
--detection-size DETECTION_SIZE Size of image used for detection
--det-rotate Rotate the image for detection. Might improve
detection.
--det-auto-rotate Rotate the image for detection to prefer vertical
textlines. Might improve detection.
--det-invert Invert the image colors for detection. Might improve
detection.
--det-gamma-correct Applies gamma correction for detection. Might improve
detection.
--unclip-ratio UNCLIP_RATIO How much to extend text skeleton to form bounding box
--box-threshold BOX_THRESHOLD Threshold for bbox generation
--text-threshold TEXT_THRESHOLD Threshold for text detection
--min-text-length MIN_TEXT_LENGTH Minimum text length of a text region
--no-text-lang-skip Don't skip text that is seemingly already in the target
language.
--inpainting-size INPAINTING_SIZE Size of image used for inpainting (too large will
result in OOM)
--inpainting-precision {fp32,fp16,bf16} Inpainting precision for lama, use bf16 while you can.
--colorization-size COLORIZATION_SIZE Size of image used for colorization. Set to -1 to use
full image size
--denoise-sigma DENOISE_SIGMA Used by colorizer and affects color strength, range
from 0 to 255 (default 30). -1 turns it off.
--mask-dilation-offset MASK_DILATION_OFFSET By how much to extend the text mask to remove left-over
text pixels of the original image.
--font-size FONT_SIZE Use fixed font size for rendering
--font-size-offset FONT_SIZE_OFFSET Offset font size by a given amount, positive number
increase font size and vice versa
--font-size-minimum FONT_SIZE_MINIMUM Minimum output font size. Default is
image_sides_sum/200
--font-color FONT_COLOR Overwrite the text fg/bg color detected by the OCR
model. Use hex string without the "#" such as FFFFFF
for a white foreground or FFFFFF:000000 to also have a
black background around the text.
--line-spacing LINE_SPACING Line spacing is font_size * this value. Default is 0.01
for horizontal text and 0.2 for vertical.
--force-horizontal Force text to be rendered horizontally
--force-vertical Force text to be rendered vertically
--align-left Align rendered text left
--align-center Align rendered text centered
--align-right Align rendered text right
--uppercase Change text to uppercase
--lowercase Change text to lowercase
--no-hyphenation Stop the renderer from splitting up words using a hyphen
character (-)
--manga2eng Render english text translated from manga with some
additional typesetting. Ignores some other argument
options
--gpt-config GPT_CONFIG Path to GPT config file, more info in README
--use-mtpe Turn on/off machine translation post editing (MTPE) on
the command line (works only on linux right now)
--save-text Save extracted text and translations into a text file.
--save-text-file SAVE_TEXT_FILE Like --save-text but with a specified file path.
--filter-text FILTER_TEXT Filter regions by their text with a regex. Example
usage: --filter-text ".*badtext.*"
--pre-dict FILE_PATH Path to the pre-translation dictionary file. One entry per line,
Comments can be added with `#` and `//`.
usage: //Example
dog cat #Example
abc def
abc
--post-dict FILE_PATH Path to the post-translation dictionary file. Same as above.
--skip-lang Skip translation if the source image is in one of the provided
languages, use comma to separate multiple languages. Example: JPN,ENG
--prep-manual Prepare for manual typesetting by outputting blank,
inpainted images, plus copies of the original for
reference
--font-path FONT_PATH Path to font file
--gimp-font GIMP_FONT Font family to use for gimp rendering.
--host HOST Used by web module to decide which host to attach to
--port PORT Used by web module to decide which port to attach to
--nonce NONCE Used by web module as secret for securing internal web
server communication
--ws-url WS_URL Server URL for WebSocket mode
--save-quality SAVE_QUALITY Quality of saved JPEG image, range from 0 to 100 with
100 being best
--ignore-bubble IGNORE_BUBBLE The threshold for ignoring text in non bubble areas,
with valid values ranging from 1 to 50, does not ignore
others. Recommendation 5 to 10. If it is too low,
normal bubble areas may be ignored, and if it is too
large, non bubble areas may be considered normal
bubbles
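For instance, a small pre-translation dictionary file in the format described under --pre-dict above (entries are illustrative; each line maps a source string to a replacement):

# normalize a name before translation
カエデ Kaede
// rewrite a recurring sound effect
ドキドキ badump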
Used by the --target-lang or -l argument.
CHS : Chinese (Simplified)
CHT : Chinese (Traditional)
CSY : Czech
NLD : Dutch
ENG : English
FRA : French
DEU : German
HUN : Hungarian
ITA : Italian
JPN : Japanese
KOR : Korean
PLK : Polish
PTB : Portuguese (Brazil)
ROM : Romanian
RUS : Russian
ESP : Spanish
TRK : Turkish
UKR : Ukrainian
VIN : Vietnamese
ARA : Arabic
CNR : Montenegrin
SRP : Serbian
HRV : Croatian
THA : Thai
IND : Indonesian
FIL : Filipino (Tagalog)
Name | API Key | Offline | Note |
---|---|---|---|
google | | | Temporarily disabled |
youdao | ✔️ | | Requires YOUDAO_APP_KEY and YOUDAO_SECRET_KEY |
baidu | ✔️ | | Requires BAIDU_APP_ID and BAIDU_SECRET_KEY |
deepl | ✔️ | | Requires DEEPL_AUTH_KEY |
caiyun | ✔️ | | Requires CAIYUN_TOKEN |
gpt3 | ✔️ | | Implements text-davinci-003. Requires OPENAI_API_KEY |
gpt3.5 | ✔️ | | Implements gpt-3.5-turbo. Requires OPENAI_API_KEY |
gpt4 | ✔️ | | Implements gpt-4. Requires OPENAI_API_KEY |
papago | | | |
sakura | | | Requires SAKURA_API_BASE |
offline | | ✔️ | Chooses the most suitable offline translator for the language |
sugoi | | ✔️ | Sugoi V4.0 models |
m2m100 | | ✔️ | Supports every language |
m2m100_big | | ✔️ | |
none | | ✔️ | Translates into empty texts |
original | | ✔️ | Keeps the original texts |
OPENAI_API_KEY=sk-xxxxxxx...
DEEPL_AUTH_KEY=xxxxxxxx...
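On Linux/macOS, a sketch of providing these keys through the environment before a run (key values are placeholders):

$ export OPENAI_API_KEY=sk-xxxxxxx...
$ export DEEPL_AUTH_KEY=xxxxxxxx...
$ python -m manga_translator -v --translator=deepl -l ENG -i <path>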
Offline: Whether the translator can be used offline.

Sugoi is created by mingshiba, please support him at https://www.patreon.com/mingshiba
Used by the --gpt-config argument.
# The prompt being fed into GPT before the text to translate.
# Use {to_lang} to indicate where the target language name should be inserted.
# Note: ChatGPT models don't use this prompt.
prompt_template: >
  Please help me to translate the following text from a manga to {to_lang}
  (if it's already in {to_lang} or looks like gibberish you have to output it as it is instead):\n

# What sampling temperature to use, between 0 and 2.
# Higher values like 0.8 will make the output more random,
# while lower values like 0.2 will make it more focused and deterministic.
temperature: 0.5

# An alternative to sampling with temperature, called nucleus sampling,
# where the model considers the results of the tokens with top_p probability mass.
# So 0.1 means only the tokens comprising the top 10% probability mass are considered.
top_p: 1

# The prompt being fed into ChatGPT before the text to translate.
# Use {to_lang} to indicate where the target language name should be inserted.
# Tokens used in this example: 57+
chat_system_template: >
  You are a professional translation engine,
  please translate the story into a colloquial,
  elegant and fluent content,
  without referencing machine translations.
  You must only translate the story, never interpret it.
  If there is any issue in the text, output it as is.
  Translate to {to_lang}.

# Samples being fed into ChatGPT to show an example conversation.
# In a [prompt, response] format, keyed by the target language name.
#
# Generally, samples should include some examples of translation preferences, and ideally
# some names of characters it's likely to encounter.
#
# If you'd like to disable this feature, just set this to an empty list.
chat_sample:
  Simplified Chinese: # Tokens used in this example: 88 + 84
    - <|1|>恥ずかしい… 目立ちたくない… 私が消えたい…
      <|2|>きみ… 大丈夫⁉
      <|3|>なんだこいつ 空気読めて ないのか…?
    - <|1|>好尴尬…我不想引人注目…我想消失…
      <|2|>你…没事吧⁉
      <|3|>这家伙怎么看不懂气氛的…?

# Overwrite configs for a specific model.
# For now the list is: gpt3, gpt35, gpt4
gpt35:
  temperature: 0.3
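For instance, a sketch of pointing a GPT translator at such a config file (paths are placeholders):

$ python -m manga_translator -v --translator=gpt3.5 -l ENG --gpt-config <path to config yaml> -i <path>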
When the output format is set to one of { xcf, psd, pdf }, Gimp will be used to generate the file.
On Windows, this assumes Gimp 2.x is installed to C:\Users\<Username>\AppData\Local\Programs\Gimp 2.
The generated .xcf file contains the original image as the lowest layer and has the inpainting as a separate layer. The translated text boxes get their own layers, with the original text as the layer name for easy access.
Limitations:

- When saving as a .psd file, Gimp will convert the text layers to regular images.
- The font used for Gimp rendering is controlled separately by the --gimp-font argument.

# use `--mode api` to start an api server.
$ python -m manga_translator -v --mode api --use-gpu
# the api will be serving on http://127.0.0.1:5003
The API accepts json (post) and multipart requests.
The API endpoints are /colorize_translate, /inpaint_translate, /translate and /get_text.
Valid arguments for the api are:
// These are taken from args.py. For more info see README.md
detector: String
ocr: String
inpainter: String
upscaler: String
translator: String
target_language: String
upscale_ratio: Integer
translator_chain: String
selective_translation: String
attempts: Integer
detection_size: Integer // 1024 => 'S', 1536 => 'M', 2048 => 'L', 2560 => 'X'
text_threshold: Float
box_threshold: Float
unclip_ratio: Float
inpainting_size: Integer
det_rotate: Bool
det_auto_rotate: Bool
det_invert: Bool
det_gamma_correct: Bool
min_text_length: Integer
colorization_size: Integer
denoise_sigma: Integer
mask_dilation_offset: Integer
ignore_bubble: Integer
gpt_config: String
filter_text: String
overlay_type: String
// These are api specific args
direction: String // {'auto', 'h', 'v'}
base64Images: String //Image in base64 format
image: Multipart // image upload from multipart
url: String // an url string
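As a hedged example of calling the /translate endpoint with curl: the multipart image field is taken from the list above, while the other form fields and their exact accepted values are assumptions to adapt to your setup:

$ curl -X POST http://127.0.0.1:5003/translate -F image=@manga.jpg -F translator=google -F target_language=ENG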
Manual translation replaces machine translation with human translators. A basic manual translation demo can be found at http://127.0.0.1:5003/manual when using web mode.
The demo provides translation service in two modes: synchronous mode and asynchronous mode.
In synchronous mode, your HTTP POST request finishes once the translation task is done.
In asynchronous mode, your HTTP POST request responds immediately with a task_id; you can use this task_id to poll for the translation task state.
Synchronous mode:

1. POST a form request with form data file:<content-of-image> to http://127.0.0.1:5003/run
2. Wait for the response
3. Use the returned task_id to find the translation result in the result/ directory, e.g. using Nginx to expose result/
Asynchronous mode:

1. POST a form request with form data file:<content-of-image> to http://127.0.0.1:5003/submit
2. Acquire the translation task_id
3. Poll for the translation task state by posting the JSON {"taskid": <task-id>} to http://127.0.0.1:5003/task-state
4. The translation is finished when the returned state is either finished, error or error-lang
5. Find the translation result in the result/ directory, e.g. using Nginx to expose result/
Manual translation mode:

1. POST a form request with form data file:<content-of-image> to http://127.0.0.1:5003/manual-translate and wait for the response.
2. You will obtain a JSON response like this:
{
  "task_id": "12c779c9431f954971cae720eb104499",
  "status": "pending",
  "trans_result": [
    {
      "s": "☆上司来ちゃった……",
      "t": ""
    }
  ]
}
3. Fill in the translated texts:
{
  "task_id": "12c779c9431f954971cae720eb104499",
  "status": "pending",
  "trans_result": [
    {
      "s": "☆上司来ちゃった……",
      "t": "☆Boss is here..."
    }
  ]
}
4. POST the translated JSON to http://127.0.0.1:5003/post-manual-result and wait for the response.
5. Then you can find the translation result in the result/ directory, e.g. using Nginx to expose result/.
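As an illustrative sketch of this workflow with curl (file names are placeholders; the file form field follows the steps above):

# 1. submit the image for manual translation
$ curl -X POST -F file=@manga.jpg http://127.0.0.1:5003/manual-translate
# 2. edit the returned JSON into result.json, then post it back
$ curl -X POST -H "Content-Type: application/json" -d @result.json http://127.0.0.1:5003/post-manual-result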
A list of what needs to be done next; you're welcome to contribute.

GPU servers are not cheap, please consider donating to us.

Ko-fi: https://ko-fi.com/voilelabs
Patreon: https://www.patreon.com/voilelabs
Afdian (爱发电): https://afdian.net/@voilelabs