Translate the texts in manga/images.

中文說明 | Changelog

Join us on Discord: https://discord.gg/Ak8APNy4vb

Some manga/images will never be translated, and that is why this project was born.

Note that the samples may not always be up to date; they may not represent the current main branch version.
Original | Translated |
---|---|
(Source @09ra_19ra) | (Mask) |
(Source @VERTIGRIS_ART) | --detector ctd (Mask) |
(Source @hiduki_yayoi) | --translator none (Mask) |
(Source @rikak) | (Mask) |
Official demo (by zyddnys): https://touhou.ai/imgtrans/
Browser userscript (by QiroNT): https://greasyfork.org/scripts/437569

Successor to MMDOCR-HighPerformance.

This is a hobby project; you are welcome to contribute!
Currently it is only a simple demo with many imperfections; we need your support to make this project better!
Mainly designed for translating Japanese text, but it also supports Chinese, English and Korean.
Supports inpainting, text rendering and colorization.
# First, you need to have Python(>=3.8) installed on your system
# The latest version often does not work with some pytorch libraries yet
$ python --version
Python 3.10.6
# Clone this repo
$ git clone https://github.com/zyddnys/manga-image-translator.git
# Create venv
$ python -m venv venv
# Activate venv
$ source venv/bin/activate
# For --use-gpu option go to https://pytorch.org/ and follow
# pytorch installation instructions. Add `--upgrade --force-reinstall`
# to the pip command to overwrite the currently installed pytorch version.
# Install the dependencies
$ pip install -r requirements.txt
The models will be downloaded into ./models at runtime.
Before you start pip installing, install Microsoft C++ Build Tools (Download, Instructions), as some pip dependencies will not compile without it (see #114).
To use CUDA on Windows, install the correct pytorch version as instructed on https://pytorch.org/.
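As a hedged sketch of a GPU-enabled install (the exact wheel index depends on your CUDA version, so verify the command against https://pytorch.org/ first):

# example for CUDA 11.8; pick the index URL that matches your CUDA version
$ pip install --upgrade --force-reinstall torch torchvision --index-url https://download.pytorch.org/whl/cu118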
Requirements:

- Docker (v19.03+, required to run the container)
- Docker Compose (optional, only needed if you want to use the files in the demo/doc folder)
- Nvidia Container Runtime (optional, only needed if you want to use CUDA)

This project has Docker support under the zyddnys/manga-image-translator:main image. The Docker image contains all dependencies and models required by the project. Note that the image is fairly large (~15GB).
The web server can be hosted (for CPU) using
docker run -p 5003:5003 -v result:/app/result --ipc=host --rm zyddnys/manga-image-translator:main -l ENG --manga2eng -v --mode web --host=0.0.0.0 --port=5003
or
docker-compose -f demo/doc/docker-compose-web-with-cpu.yml up
depending on which one you prefer. The web server should start on port 5003 and the images should end up in the /result folder.
Using docker with the CLI (i.e. in batch mode):
docker run -v <targetFolder>:/app/<targetFolder> -v <targetFolder>-translated:/app/<targetFolder>-translated --ipc=host --rm zyddnys/manga-image-translator:main --mode=batch -i=/app/<targetFolder> <cli flags>
Note: If you need to reference files on your host machine, you will need to mount them as volumes into the /app folder inside the container. Paths for the CLI need to be the internal docker path /app/... instead of the paths on your host machine.
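For illustration, a hypothetical batch run where the host folder ~/manga is the target (the folder name and translation flags are placeholders):

docker run -v "$HOME/manga":/app/manga -v "$HOME/manga-translated":/app/manga-translated --ipc=host --rm zyddnys/manga-image-translator:main --mode=batch -i=/app/manga --translator=google -l ENG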
Some translation services require API keys to function; pass them into the docker container as environment variables. For example:
docker run --env="DEEPL_AUTH_KEY=xxx" --ipc=host --rm zyddnys/manga-image-translator:main <cli flags>
To use with a supported GPU, please first read the initial Docker section above, as there are some special dependencies you will need.
Run the container with the following flags set:
docker run ... --gpus=all ... zyddnys/manga-image-translator:main ... --use-gpu
or (for the web server + GPU):
docker-compose -f demo/doc/docker-compose-web-with-gpu.yml up
To build the docker image locally, you can run (you will need make on your machine):
make build-image
Then to test the built image, run:
make run-web-server
# use `--use-gpu` for speedup if you have a compatible NVIDIA GPU.
# use `--target-lang <language_code>` to specify a target language.
# use `--inpainter=none` to disable inpainting.
# use `--translator=none` if you only want to use inpainting (blank bubbles)
# replace <path> with the path to the image folder or file.
$ python -m manga_translator -v --translator=google -l ENG -i <path>
# results can be found under `<path_to_image_folder>-translated`.
# saves singular image into /result folder for demonstration purposes
# use `--mode demo` to enable demo translation.
# replace <path> with the path to the image file.
$ python -m manga_translator --mode demo -v --translator=google -l ENG -i <path>
# result can be found in `result/`.
# use `--mode web` to start a web server.
$ python -m manga_translator -v --mode web --use-gpu
# the demo will be serving on http://127.0.0.1:5003
# use `--mode api` to start an api server.
$ python -m manga_translator -v --mode api --use-gpu
# the api will be serving on http://127.0.0.1:5003
GUI implementation: BallonsTranslator

Recommended modules:

- Detector: --detector ctd can increase the amount of text lines detected
- OCR:
- Translator:
- Inpainter: ??
- Colorizer: mc2

Tips to improve translation quality:

- Use an upscaler by specifying --upscale-ratio 2 or any other value
- If the rendered text is too small to read, specify --font-size-minimum 30 or use the --manga2eng renderer, which will try to adapt to the detected text bubbles
- Specify a font with --font-path fonts/anime_ace_3.ttf
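For example, a sketch combining these quality flags into a single invocation (values and paths are illustrative; adjust them to your images):

$ python -m manga_translator -v --translator=google -l ENG --upscale-ratio 2 --font-size-minimum 30 --font-path fonts/anime_ace_3.ttf -i <path>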
Options:

-h, --help show this help message and exit
-m, --mode {demo,batch,web,web_client,ws,api}
Run demo in single image demo mode (demo), batch
translation mode (batch), web service mode (web)
-i, --input INPUT [INPUT ...] Path to an image file if using demo mode, or path to an
image folder if using batch mode
-o, --dest DEST Path to the destination folder for translated images in
batch mode
-l, --target-lang {CHS,CHT,CSY,NLD,ENG,FRA,DEU,HUN,ITA,JPN,KOR,PLK,PTB,ROM,RUS,ESP,TRK,UKR,VIN,ARA,CNR,SRP,HRV,THA,IND,FIL}
Destination language
-v, --verbose Print debug info and save intermediate images in result
folder
-f, --format {png,webp,jpg,xcf,psd,pdf} Output format of the translation.
--attempts ATTEMPTS Retry attempts on encountered error. -1 means infinite
times.
--ignore-errors Skip image on encountered error.
--overwrite Overwrite already translated images in batch mode.
--skip-no-text Skip image without text (Will not be saved).
--model-dir MODEL_DIR Model directory (by default ./models in project root)
--use-gpu Turn on/off gpu
--use-gpu-limited Turn on/off gpu (excluding offline translator)
--detector {default,ctd,craft,none} Text detector used for creating a text mask from an
image, DO NOT use craft for manga, it's not designed
for it
--ocr {32px,48px,48px_ctc,mocr} Optical character recognition (OCR) model to use
--use-mocr-merge Use bbox merge when using Manga OCR inference.
--inpainter {default,lama_large,lama_mpe,sd,none,original}
Inpainting model to use
--upscaler {waifu2x,esrgan,4xultrasharp} Upscaler to use. --upscale-ratio has to be set for it
to take effect
--upscale-ratio UPSCALE_RATIO Image upscale ratio applied before detection. Can
improve text detection.
--colorizer {mc2} Colorization model to use.
--translator {google,youdao,baidu,deepl,papago,caiyun,gpt3,gpt3.5,gpt4,none,original,offline,nllb,nllb_big,sugoi,jparacrawl,jparacrawl_big,m2m100,m2m100_big,sakura}
Language translator to use
--translator-chain TRANSLATOR_CHAIN Output of one translator goes in another. Example:
--translator-chain "google:JPN;sugoi:ENG".
--selective-translation SELECTIVE_TRANSLATION
Select a translator based on detected language in
image. Note the first translation service acts as
default if the language isn't defined. Example:
--selective-translation "google:JPN;sugoi:ENG".
--revert-upscaling Downscales the previously upscaled image after
translation back to original size (Use with --upscale-
ratio).
--detection-size DETECTION_SIZE Size of image used for detection
--det-rotate Rotate the image for detection. Might improve
detection.
--det-auto-rotate Rotate the image for detection to prefer vertical
textlines. Might improve detection.
--det-invert Invert the image colors for detection. Might improve
detection.
--det-gamma-correct Applies gamma correction for detection. Might improve
detection.
--unclip-ratio UNCLIP_RATIO How much to extend text skeleton to form bounding box
--box-threshold BOX_THRESHOLD Threshold for bbox generation
--text-threshold TEXT_THRESHOLD Threshold for text detection
--min-text-length MIN_TEXT_LENGTH Minimum text length of a text region
--no-text-lang-skip Don't skip text that is seemingly already in the target
language.
--inpainting-size INPAINTING_SIZE Size of image used for inpainting (too large will
result in OOM)
--inpainting-precision {fp32,fp16,bf16} Inpainting precision for lama, use bf16 while you can.
--colorization-size COLORIZATION_SIZE Size of image used for colorization. Set to -1 to use
full image size
--denoise-sigma DENOISE_SIGMA Used by colorizer and affects color strength, range
from 0 to 255 (default 30). -1 turns it off.
--mask-dilation-offset MASK_DILATION_OFFSET By how much to extend the text mask to remove left-over
text pixels of the original image.
--font-size FONT_SIZE Use fixed font size for rendering
--font-size-offset FONT_SIZE_OFFSET Offset font size by a given amount, positive number
increase font size and vice versa
--font-size-minimum FONT_SIZE_MINIMUM Minimum output font size. Default is
image_sides_sum/200
--font-color FONT_COLOR Overwrite the text fg/bg color detected by the OCR
model. Use hex string without the "#" such as FFFFFF
for a white foreground or FFFFFF:000000 to also have a
black background around the text.
--line-spacing LINE_SPACING Line spacing is font_size * this value. Default is 0.01
for horizontal text and 0.2 for vertical.
--force-horizontal Force text to be rendered horizontally
--force-vertical Force text to be rendered vertically
--align-left Align rendered text left
--align-center Align rendered text centered
--align-right Align rendered text right
--uppercase Change text to uppercase
--lowercase Change text to lowercase
--no-hyphenation Stop the renderer from splitting up words using a hyphen
character (-)
--manga2eng Render english text translated from manga with some
additional typesetting. Ignores some other argument
options
--gpt-config GPT_CONFIG Path to GPT config file, more info in README
--use-mtpe Turn on/off machine translation post editing (MTPE) on
the command line (works only on linux right now)
--save-text Save extracted text and translations into a text file.
--save-text-file SAVE_TEXT_FILE Like --save-text but with a specified file path.
--filter-text FILTER_TEXT Filter regions by their text with a regex. Example
usage: --filter-text ".*badtext.*"
--pre-dict FILE_PATH Path to the pre-translation dictionary file. One entry per line,
Comments can be added with `#` and `//`.
usage: //Example
dog cat #Example
abc def
abc
--post-dict FILE_PATH Path to the post-translation dictionary file. Same as above.
--skip-lang Skip translation if the source image is in one of the provided
languages, use comma to separate multiple languages. Example: JPN,ENG
--prep-manual Prepare for manual typesetting by outputting blank,
inpainted images, plus copies of the original for
reference
--font-path FONT_PATH Path to font file
--gimp-font GIMP_FONT Font family to use for gimp rendering.
--host HOST Used by web module to decide which host to attach to
--port PORT Used by web module to decide which port to attach to
--nonce NONCE Used by web module as secret for securing internal web
server communication
--ws-url WS_URL Server URL for WebSocket mode
--save-quality SAVE_QUALITY Quality of saved JPEG image, range from 0 to 100 with
100 being best
--ignore-bubble IGNORE_BUBBLE The threshold for ignoring text in non bubble areas,
with valid values ranging from 1 to 50, does not ignore
others. Recommendation 5 to 10. If it is too low,
normal bubble areas may be ignored, and if it is too
large, non bubble areas may be considered normal
bubbles
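For instance, a small pre-translation dictionary file in the format described under --pre-dict above (entries are illustrative; each line maps a source string to a replacement):

# normalize a name before translation
カエデ Kaede
// rewrite a recurring sound effect
ドキドキ badump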
Used by the --target-lang or -l argument.
CHS : Chinese (Simplified)
CHT : Chinese (Traditional)
CSY : Czech
NLD : Dutch
ENG : English
FRA : French
DEU : German
HUN : Hungarian
ITA : Italian
JPN : Japanese
KOR : Korean
PLK : Polish
PTB : Portuguese (Brazil)
ROM : Romanian
RUS : Russian
ESP : Spanish
TRK : Turkish
UKR : Ukrainian
VIN : Vietnamese
ARA : Arabic
CNR : Montenegrin
SRP : Serbian
HRV : Croatian
THA : Thai
IND : Indonesian
FIL : Filipino (Tagalog)
Name | API Key | Offline | Note |
---|---|---|---|
google | | | Temporarily disabled |
youdao | ✔️ | | Requires YOUDAO_APP_KEY and YOUDAO_SECRET_KEY |
baidu | ✔️ | | Requires BAIDU_APP_ID and BAIDU_SECRET_KEY |
deepl | ✔️ | | Requires DEEPL_AUTH_KEY |
caiyun | ✔️ | | Requires CAIYUN_TOKEN |
gpt3 | ✔️ | | Implements text-davinci-003. Requires OPENAI_API_KEY |
gpt3.5 | ✔️ | | Implements gpt-3.5-turbo. Requires OPENAI_API_KEY |
gpt4 | ✔️ | | Implements gpt-4. Requires OPENAI_API_KEY |
papago | | | |
sakura | | | Requires SAKURA_API_BASE |
offline | | ✔️ | Chooses the most suitable offline translator for the language |
sugoi | | ✔️ | Sugoi V4.0 models |
m2m100 | | ✔️ | Supports every language |
m2m100_big | | ✔️ | |
none | | ✔️ | Translates into empty texts |
original | | ✔️ | Keeps the original texts |
OPENAI_API_KEY=sk-xxxxxxx...
DEEPL_AUTH_KEY=xxxxxxxx...
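On Linux/macOS, a sketch of providing these keys through the environment before a run (key values are placeholders):

$ export OPENAI_API_KEY=sk-xxxxxxx...
$ export DEEPL_AUTH_KEY=xxxxxxxx...
$ python -m manga_translator -v --translator=deepl -l ENG -i <path>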
Offline: Whether the translator can be used offline.

Sugoi is created by mingshiba, please support him at https://www.patreon.com/mingshiba
Used by the --gpt-config argument.
# The prompt being fed into GPT before the text to translate.
# Use {to_lang} to indicate where the target language name should be inserted.
# Note: ChatGPT models don't use this prompt.
prompt_template: >
  Please help me to translate the following text from a manga to {to_lang}
  (if it's already in {to_lang} or looks like gibberish you have to output it as it is instead):\n

# What sampling temperature to use, between 0 and 2.
# Higher values like 0.8 will make the output more random,
# while lower values like 0.2 will make it more focused and deterministic.
temperature: 0.5

# An alternative to sampling with temperature, called nucleus sampling,
# where the model considers the results of the tokens with top_p probability mass.
# So 0.1 means only the tokens comprising the top 10% probability mass are considered.
top_p: 1

# The prompt being fed into ChatGPT before the text to translate.
# Use {to_lang} to indicate where the target language name should be inserted.
# Tokens used in this example: 57+
chat_system_template: >
  You are a professional translation engine,
  please translate the story into a colloquial,
  elegant and fluent content,
  without referencing machine translations.
  You must only translate the story, never interpret it.
  If there is any issue in the text, output it as is.
  Translate to {to_lang}.

# Samples being fed into ChatGPT to show an example conversation.
# In a [prompt, response] format, keyed by the target language name.
#
# Generally, samples should include some examples of translation preferences, and ideally
# some names of characters it's likely to encounter.
#
# If you'd like to disable this feature, just set this to an empty list.
chat_sample:
  Simplified Chinese: # Tokens used in this example: 88 + 84
    - <|1|>恥ずかしい… 目立ちたくない… 私が消えたい…
      <|2|>きみ… 大丈夫⁉
      <|3|>なんだこいつ 空気読めて ないのか…?
    - <|1|>好尴尬…我不想引人注目…我想消失…
      <|2|>你…没事吧⁉
      <|3|>这家伙怎么看不懂气氛的…?

# Overwrite configs for a specific model.
# For now the list is: gpt3, gpt35, gpt4
gpt35:
  temperature: 0.3
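For instance, a sketch of pointing a GPT translator at such a config file (paths are placeholders):

$ python -m manga_translator -v --translator=gpt3.5 -l ENG --gpt-config <path to config yaml> -i <path>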
When the output format is set to one of { xcf, psd, pdf }, Gimp will be used to generate the file.
On Windows, this assumes Gimp 2.x is installed to C:\Users\<Username>\AppData\Local\Programs\Gimp 2.
The generated .xcf file contains the original image as the lowest layer and has the inpainting as a separate layer. The translated text boxes get their own layers, with the original text as the layer name for easy access.
Limitations:

- When saving as a .psd file, Gimp will convert the text layers to regular images.
- The font used for Gimp rendering is controlled separately by the --gimp-font argument.

# use `--mode api` to start an api server.
$ python -m manga_translator -v --mode api --use-gpu
# the api will be serving on http://127.0.0.1:5003
The API accepts json (post) and multipart requests.
The API endpoints are /colorize_translate, /inpaint_translate, /translate and /get_text.
Valid arguments for the api are:
// These are taken from args.py. For more info see README.md
detector: String
ocr: String
inpainter: String
upscaler: String
translator: String
target_language: String
upscale_ratio: Integer
translator_chain: String
selective_translation: String
attempts: Integer
detection_size: Integer // 1024 => 'S', 1536 => 'M', 2048 => 'L', 2560 => 'X'
text_threshold: Float
box_threshold: Float
unclip_ratio: Float
inpainting_size: Integer
det_rotate: Bool
det_auto_rotate: Bool
det_invert: Bool
det_gamma_correct: Bool
min_text_length: Integer
colorization_size: Integer
denoise_sigma: Integer
mask_dilation_offset: Integer
ignore_bubble: Integer
gpt_config: String
filter_text: String
overlay_type: String
// These are api specific args
direction: String // {'auto', 'h', 'v'}
base64Images: String //Image in base64 format
image: Multipart // image upload from multipart
url: String // an url string
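As a hedged example of calling the /translate endpoint with curl: the multipart image field is taken from the list above, while the other form fields and their exact accepted values are assumptions to adapt to your setup:

$ curl -X POST http://127.0.0.1:5003/translate -F image=@manga.jpg -F translator=google -F target_language=ENG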
Manual translation replaces machine translation with human translators. A basic manual translation demo can be found at http://127.0.0.1:5003/manual when using web mode.
The demo provides translation service in two modes: synchronous mode and asynchronous mode.
In synchronous mode, your HTTP POST request finishes once the translation task is done.
In asynchronous mode, your HTTP POST request responds immediately with a task_id; you can use this task_id to poll for the translation task state.
Synchronous mode:

1. POST a form request with form data file:<content-of-image> to http://127.0.0.1:5003/run
2. Wait for the response
3. Use the returned task_id to find the translation result in the result/ directory, e.g. using Nginx to expose result/
Asynchronous mode:

1. POST a form request with form data file:<content-of-image> to http://127.0.0.1:5003/submit
2. Acquire the translation task_id
3. Poll for the translation task state by posting the JSON {"taskid": <task-id>} to http://127.0.0.1:5003/task-state
4. The translation is finished when the returned state is either finished, error or error-lang
5. Find the translation result in the result/ directory, e.g. using Nginx to expose result/
Manual translation mode:

1. POST a form request with form data file:<content-of-image> to http://127.0.0.1:5003/manual-translate and wait for the response.
2. You will obtain a JSON response like this:
{
  "task_id": "12c779c9431f954971cae720eb104499",
  "status": "pending",
  "trans_result": [
    {
      "s": "☆上司来ちゃった……",
      "t": ""
    }
  ]
}
3. Fill in the translated texts:
{
  "task_id": "12c779c9431f954971cae720eb104499",
  "status": "pending",
  "trans_result": [
    {
      "s": "☆上司来ちゃった……",
      "t": "☆Boss is here..."
    }
  ]
}
4. POST the translated JSON to http://127.0.0.1:5003/post-manual-result and wait for the response.
5. Then you can find the translation result in the result/ directory, e.g. using Nginx to expose result/.
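As an illustrative sketch of this workflow with curl (file names are placeholders; the file form field follows the steps above):

# 1. submit the image for manual translation
$ curl -X POST -F file=@manga.jpg http://127.0.0.1:5003/manual-translate
# 2. edit the returned JSON into result.json, then post it back
$ curl -X POST -H "Content-Type: application/json" -d @result.json http://127.0.0.1:5003/post-manual-result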
A list of what needs to be done next; you're welcome to contribute.

GPU servers are not cheap, please consider donating to us.

Ko-fi: https://ko-fi.com/voilelabs
Patreon: https://www.patreon.com/voilelabs
Afdian (爱发电): https://afdian.net/@voilelabs