Translate texts in manga/images.
中文说明 (Chinese documentation) | Change Log
Join us on Discord: https://discord.gg/Ak8APNy4vb
Some manga/images will never be translated, and that is why this project was born.
Please note that the samples may not always be up to date; they may not represent the current main branch version.
Original | Translated |
---|---|
(Source @09ra_19ra) | (Mask) |
(Source @VERTIGRIS_ART) | --detector ctd (Mask) |
(Source @hiduki_yayoi) | --translator none (Mask) |
(Source @rikak) | (Mask) |
Official demo (by zyddnys): https://touhou.ai/imgtrans/
Browser userscript (by QiroNT): https://greasyfork.org/scripts/437569
Successor to MMDOCR-HighPerformance.
This is a hobby project; you are welcome to contribute!
Currently this is just a simple demo with many imperfections; we need your support to make this project better!
Primarily designed for translating Japanese text, but it also supports Chinese, English and Korean.
Supports inpainting, text rendering and colorization.
# First, you need to have Python(>=3.8) installed on your system
# The latest version often does not work with some pytorch libraries yet
$ python --version
Python 3.10.6
# Clone this repo
$ git clone https://github.com/zyddnys/manga-image-translator.git
# Create venv
$ python -m venv venv
# Activate venv
$ source venv/bin/activate
# For --use-gpu option go to https://pytorch.org/ and follow
# pytorch installation instructions. Add `--upgrade --force-reinstall`
# to the pip command to overwrite the currently installed pytorch version.
# Install the dependencies
$ pip install -r requirements.txt
The models will be downloaded into ./models at runtime.
Before starting the pip install, first install Microsoft C++ Build Tools (Download, Instructions), as some pip dependencies will not compile without it (see #114).
To use CUDA on Windows, install the correct PyTorch version by following the instructions on https://pytorch.org/.
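For example, a CUDA-enabled build can be installed with a command of the following shape (a sketch only; the index URL and CUDA version are assumptions, so use the exact command that https://pytorch.org/ generates for your system):

# hypothetical example for CUDA 11.8, check https://pytorch.org/ for your exact command
$ pip install --upgrade --force-reinstall torch torchvision --index-url https://download.pytorch.org/whl/cu118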
Requirements:
- Docker
- Docker Compose (optional, if you want to use the files in the demo/doc folder)
- Nvidia Container Runtime (optional, if you want to use CUDA)

This project has docker support under the zyddnys/manga-image-translator:main image. The docker image contains all dependencies and models required by the project. Note that the image is fairly large (~15GB).
A web server can be hosted (for CPU) with:
docker run -p 5003:5003 -v result:/app/result --ipc=host --rm zyddnys/manga-image-translator:main -l ENG --manga2eng -v --mode web --host=0.0.0.0 --port=5003
or
docker-compose -f demo/doc/docker-compose-web-with-cpu.yml up
depending on which you prefer. The web server should start on port 5003, and images should end up in the /result folder.
To use docker with the CLI (i.e. in batch mode):
docker run -v <targetFolder>:/app/<targetFolder> -v <targetFolder>-translated:/app/<targetFolder>-translated --ipc=host --rm zyddnys/manga-image-translator:main --mode=batch -i=/app/<targetFolder> <cli flags>
Note: If you need to reference files on your host machine, the associated files need to be mounted as volumes into the /app folder inside the container. Paths for the CLI need to be the internal docker paths /app/... and not the paths on your host machine.
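For instance, to translate a folder /home/user/manga (a hypothetical path used for illustration only, substitute your own):

# mounts the source and target folders under /app and runs batch translation
$ docker run -v /home/user/manga:/app/manga -v /home/user/manga-translated:/app/manga-translated --ipc=host --rm zyddnys/manga-image-translator:main --mode=batch -i=/app/manga -l ENG --translator=google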
Some translation services require API keys to function; pass them as environment variables into the docker container. For example:
docker run --env="DEEPL_AUTH_KEY=xxx" --ipc=host --rm zyddnys/manga-image-translator:main <cli flags>
To use with a supported GPU, first read the initial Docker section above; there are some special dependencies you will need. Run the container with the following flags set:
docker run ... --gpus=all ... zyddnys/manga-image-translator:main ... --use-gpu
or (for web server + GPU):
docker-compose -f demo/doc/docker-compose-web-with-gpu.yml up
To build the docker image locally, you can run (you need make on your machine):
make build-image
Then, to test the built image, run:
make run-web-server
# use `--use-gpu` for speedup if you have a compatible NVIDIA GPU.
# use `--target-lang <language_code>` to specify a target language.
# use `--inpainter=none` to disable inpainting.
# use `--translator=none` if you only want to use inpainting (blank bubbles)
# replace <path> with the path to the image folder or file.
$ python -m manga_translator -v --translator=google -l ENG -i <path>
# results can be found under `<path_to_image_folder>-translated`.
# saves singular image into /result folder for demonstration purposes
# use `--mode demo` to enable demo translation.
# replace <path> with the path to the image file.
$ python -m manga_translator --mode demo -v --translator=google -l ENG -i <path>
# result can be found in `result/`.
# use `--mode web` to start a web server.
$ python -m manga_translator -v --mode web --use-gpu
# the demo will be serving on http://127.0.0.1:5003
# use `--mode api` to start a web api server.
$ python -m manga_translator -v --mode api --use-gpu
# the api will be serving on http://127.0.0.1:5003
GUI implementation: BallonsTranslator
Detector: --detector ctd can increase the amount of text lines detected
OCR:
Translator:
Inpainter: ??
Colorizer: mc2

Tips to improve translation quality:
- Use --upscale-ratio 2 or any other value to run the upscaler before detection
- If the rendered text is too small to read, specify --font-size-minimum 30 or use the --manga2eng renderer, which will try to fit the detected text bubbles
- Specify a font with --font-path fonts/anime_ace_3.ttf

-h, --help show this help message and exit
-m, --mode {demo,batch,web,web_client,ws,api}
Run demo in single image demo mode (demo), batch
translation mode (batch), web service mode (web)
-i, --input INPUT [INPUT ...] Path to an image file if using demo mode, or path to an
image folder if using batch mode
-o, --dest DEST Path to the destination folder for translated images in
batch mode
-l, --target-lang {CHS,CHT,CSY,NLD,ENG,FRA,DEU,HUN,ITA,JPN,KOR,PLK,PTB,ROM,RUS,ESP,TRK,UKR,VIN,ARA,CNR,SRP,HRV,THA,IND,FIL}
Destination language
-v, --verbose Print debug info and save intermediate images in result
folder
-f, --format {png,webp,jpg,xcf,psd,pdf} Output format of the translation.
--attempts ATTEMPTS Retry attempts on encountered error. -1 means infinite
times.
--ignore-errors Skip image on encountered error.
--overwrite Overwrite already translated images in batch mode.
--skip-no-text Skip image without text (Will not be saved).
--model-dir MODEL_DIR Model directory (by default ./models in project root)
--use-gpu Turn on/off gpu
--use-gpu-limited Turn on/off gpu (excluding offline translator)
--detector {default,ctd,craft,none} Text detector used for creating a text mask from an
image, DO NOT use craft for manga, it's not designed
for it
--ocr {32px,48px,48px_ctc,mocr} Optical character recognition (OCR) model to use
--use-mocr-merge Use bbox merge during Manga OCR inference.
--inpainter {default,lama_large,lama_mpe,sd,none,original}
Inpainting model to use
--upscaler {waifu2x,esrgan,4xultrasharp} Upscaler to use. --upscale-ratio has to be set for it
to take effect
--upscale-ratio UPSCALE_RATIO Image upscale ratio applied before detection. Can
improve text detection.
--colorizer {mc2} Colorization model to use.
--translator {google,youdao,baidu,deepl,papago,caiyun,gpt3,gpt3.5,gpt4,none,original,offline,nllb,nllb_big,sugoi,jparacrawl,jparacrawl_big,m2m100,m2m100_big,sakura}
Language translator to use
--translator-chain TRANSLATOR_CHAIN Output of one translator goes in another. Example:
--translator-chain "google:JPN;sugoi:ENG".
--selective-translation SELECTIVE_TRANSLATION
Select a translator based on detected language in
image. Note the first translation service acts as
default if the language isn't defined. Example:
--translator-chain "google:JPN;sugoi:ENG".
--revert-upscaling Downscales the previously upscaled image after
translation back to original size (Use with --upscale-
ratio).
--detection-size DETECTION_SIZE Size of image used for detection
--det-rotate Rotate the image for detection. Might improve
detection.
--det-auto-rotate Rotate the image for detection to prefer vertical
textlines. Might improve detection.
--det-invert Invert the image colors for detection. Might improve
detection.
--det-gamma-correct Applies gamma correction for detection. Might improve
detection.
--unclip-ratio UNCLIP_RATIO How much to extend text skeleton to form bounding box
--box-threshold BOX_THRESHOLD Threshold for bbox generation
--text-threshold TEXT_THRESHOLD Threshold for text detection
--min-text-length MIN_TEXT_LENGTH Minimum text length of a text region
--no-text-lang-skip Don't skip text that is seemingly already in the target
language.
--inpainting-size INPAINTING_SIZE Size of image used for inpainting (too large will
result in OOM)
--inpainting-precision {fp32,fp16,bf16} Inpainting precision for lama, use bf16 while you can.
--colorization-size COLORIZATION_SIZE Size of image used for colorization. Set to -1 to use
full image size
--denoise-sigma DENOISE_SIGMA Used by colorizer and affects color strength, range
from 0 to 255 (default 30). -1 turns it off.
--mask-dilation-offset MASK_DILATION_OFFSET By how much to extend the text mask to remove left-over
text pixels of the original image.
--font-size FONT_SIZE Use fixed font size for rendering
--font-size-offset FONT_SIZE_OFFSET Offset font size by a given amount, positive values
increase the font size and vice versa
--font-size-minimum FONT_SIZE_MINIMUM Minimum output font size. Default is
image_sides_sum/200
--font-color FONT_COLOR Overwrite the text fg/bg color detected by the OCR
model. Use hex string without the "#" such as FFFFFF
for a white foreground or FFFFFF:000000 to also have a
black background around the text.
--line-spacing LINE_SPACING Line spacing is font_size * this value. Default is 0.01
for horizontal text and 0.2 for vertical.
--force-horizontal Force text to be rendered horizontally
--force-vertical Force text to be rendered vertically
--align-left Align rendered text left
--align-center Align rendered text centered
--align-right Align rendered text right
--uppercase Change text to uppercase
--lowercase Change text to lowercase
--no-hyphenation Stop the renderer from splitting up words using a
hyphen character (-)
--manga2eng Render english text translated from manga with some
additional typesetting. Ignores some other argument
options
--gpt-config GPT_CONFIG Path to GPT config file, more info in README
--use-mtpe Turn on/off machine translation post editing (MTPE) on
the command line (works only on linux right now)
--save-text Save extracted text and translations into a text file.
--save-text-file SAVE_TEXT_FILE Like --save-text but with a specified file path.
--filter-text FILTER_TEXT Filter regions by their text with a regex. Example
usage: --filter-text ".*badtext.*"
--pre-dict FILE_PATH Path to the pre-translation dictionary file. One entry per line.
Comments can be added with `#` and `//`.
usage: //Example
dog cat #Example
abc def
abc
--post-dict FILE_PATH Path to the post-translation dictionary file. Same as above.
--skip-lang Skip translation if the source image is in one of the provided
languages. Use commas to separate multiple languages. Example: JPN,ENG
--prep-manual Prepare for manual typesetting by outputting blank,
inpainted images, plus copies of the original for
reference
--font-path FONT_PATH Path to font file
--gimp-font GIMP_FONT Font family to use for gimp rendering.
--host HOST Used by web module to decide which host to attach to
--port PORT Used by web module to decide which port to attach to
--nonce NONCE Used by web module as secret for securing internal web
server communication
--ws-url WS_URL Server URL for WebSocket mode
--save-quality SAVE_QUALITY Quality of saved JPEG image, range from 0 to 100 with
100 being best
--ignore-bubble IGNORE_BUBBLE The threshold for ignoring text in non-bubble areas,
with valid values ranging from 1 to 50. Recommended
range: 5 to 10. If set too low, normal bubble areas
may be ignored; if set too high, non-bubble areas may
be treated as normal bubbles
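As a sketch of how several of the flags above combine into one invocation (values are illustrative, adjust them to your needs):

# batch translation with upscaling before detection, the ctd detector and a minimum font size
$ python -m manga_translator -v --mode batch --use-gpu --detector ctd --upscale-ratio 2 --font-size-minimum 30 --translator=offline -l ENG -i <path>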
Used by the --target-lang or -l argument.
CHS : Chinese (Simplified)
CHT : Chinese (Traditional)
CSY : Czech
NLD : Dutch
ENG : English
FRA : French
DEU : German
HUN : Hungarian
ITA : Italian
JPN : Japanese
KOR : Korean
PLK : Polish
PTB : Portuguese (Brazil)
ROM : Romanian
RUS : Russian
ESP : Spanish
TRK : Turkish
UKR : Ukrainian
VIN : Vietnamese
ARA : Arabic
CNR : Montenegrin
SRP : Serbian
HRV : Croatian
THA : Thai
IND : Indonesian
FIL : Filipino (Tagalog)
Name | API Key | Offline | Note |
---|---|---|---|
google | | | Temporarily disabled |
youdao | ✔️ | | Requires YOUDAO_APP_KEY and YOUDAO_SECRET_KEY |
baidu | ✔️ | | Requires BAIDU_APP_ID and BAIDU_SECRET_KEY |
deepl | ✔️ | | Requires DEEPL_AUTH_KEY |
caiyun | ✔️ | | Requires CAIYUN_TOKEN |
gpt3 | ✔️ | | Implements text-davinci-003. Requires OPENAI_API_KEY |
gpt3.5 | ✔️ | | Implements gpt-3.5-turbo. Requires OPENAI_API_KEY |
gpt4 | ✔️ | | Implements gpt-4. Requires OPENAI_API_KEY |
papago | | | |
sakura | | | Requires SAKURA_API_BASE |
offline | | ✔️ | Chooses the most suitable offline translator for the language |
sugoi | | ✔️ | Sugoi V4.0 models |
m2m100 | | ✔️ | Supports every language |
m2m100_big | | ✔️ | |
none | | ✔️ | Translate to empty texts |
original | | ✔️ | Keep original texts |
OPENAI_API_KEY = sk-xxxxxxx...
DEEPL_AUTH_KEY = xxxxxxxx...
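A minimal sketch of how the keys are then consumed: export them in the shell and pick a translator that needs them (the key value is a placeholder):

$ export OPENAI_API_KEY=sk-xxxxxxx...
$ python -m manga_translator --translator=gpt3.5 -l ENG -i <path>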
Offline: Whether the translator can be used offline.
Sugoi is created by mingshiba, please support him at https://www.patreon.com/mingshiba
Used by the --gpt-config argument.
# The prompt being fed into GPT before the text to translate.
# Use {to_lang} to indicate where the target language name should be inserted.
# Note: ChatGPT models don't use this prompt.
prompt_template: >
  Please help me to translate the following text from a manga to {to_lang}
  (if it's already in {to_lang} or looks like gibberish you have to output it as it is instead):\n
# What sampling temperature to use, between 0 and 2.
# Higher values like 0.8 will make the output more random,
# while lower values like 0.2 will make it more focused and deterministic.
temperature: 0.5
# An alternative to sampling with temperature, called nucleus sampling,
# where the model considers the results of the tokens with top_p probability mass.
# So 0.1 means only the tokens comprising the top 10% probability mass are considered.
top_p: 1
# The prompt being fed into ChatGPT before the text to translate.
# Use {to_lang} to indicate where the target language name should be inserted.
# Tokens used in this example: 57+
chat_system_template: >
  You are a professional translation engine,
  please translate the story into a colloquial,
  elegant and fluent content,
  without referencing machine translations.
  You must only translate the story, never interpret it.
  If there is any issue in the text, output it as is.
  Translate to {to_lang}.
# Samples being fed into ChatGPT to show an example conversation.
# In a [prompt, response] format, keyed by the target language name.
#
# Generally, samples should include some examples of translation preferences, and ideally
# some names of characters it's likely to encounter.
#
# If you'd like to disable this feature, just set this to an empty list.
chat_sample:
  Simplified Chinese: # Tokens used in this example: 88 + 84
    - <|1|>恥ずかしい… 目立ちたくない… 私が消えたい…
      <|2|>きみ… 大丈夫⁉
      <|3|>なんだこいつ 空気読めて ないのか…?
    - <|1|>好尴尬…我不想引人注目…我想消失…
      <|2|>你…没事吧⁉
      <|3|>这家伙怎么看不懂气氛的…?
# Overwrite configs for a specific model.
# For now the list is: gpt3, gpt35, gpt4
gpt35:
  temperature: 0.3
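To apply such a config, point the --gpt-config flag at the YAML file when using one of the GPT translators (a sketch; gpt_config.yaml is a hypothetical file name):

$ python -m manga_translator --translator=gpt3.5 --gpt-config gpt_config.yaml -l ENG -i <path>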
When setting the output format to {xcf, psd, pdf}, Gimp will be used to generate the file.
On Windows, this assumes Gimp 2.x is installed to C:\Users\<Username>\AppData\Local\Programs\Gimp 2.
The resulting .xcf file contains the original image as the lowest layer and the inpainting as a separate layer. The translated text boxes have their own layers, with the original text as the layer name for easy access.
Limitations:
- Gimp will turn text layers into regular images when saving .psd files.
- The font family is controlled separately, with the --gimp-font argument.

# use `--mode api` to start a web api server.
$ python -m manga_translator -v --mode api --use-gpu
# the api will be serving on http://127.0.0.1:5003
The API accepts json (post) and multipart requests.
The API endpoints are /colorize_translate, /inpaint_translate, /translate and /get_text.
Valid arguments for the api are:
// These are taken from args.py. For more info see README.md
detector: String
ocr: String
inpainter: String
upscaler: String
translator: String
target_language: String
upscale_ratio: Integer
translator_chain: String
selective_translation: String
attempts: Integer
detection_size: Integer // 1024 => 'S', 1536 => 'M', 2048 => 'L', 2560 => 'X'
text_threshold: Float
box_threshold: Float
unclip_ratio: Float
inpainting_size: Integer
det_rotate: Bool
det_auto_rotate: Bool
det_invert: Bool
det_gamma_correct: Bool
min_text_length: Integer
colorization_size: Integer
denoise_sigma: Integer
mask_dilation_offset: Integer
ignore_bubble: Integer
gpt_config: String
filter_text: String
overlay_type: String
// These are api specific args
direction: String // {'auto', 'h', 'v'}
base64Images: String // image in base64 format
image: Multipart // image upload from multipart
url: String // a url string
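As a sketch of a multipart request built from the arguments above (assuming a server started with --mode api is listening locally; manga.png is a hypothetical file):

$ curl -F "image=@manga.png" -F "translator=google" -F "target_language=ENG" http://127.0.0.1:5003/translate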
Manual translation replaces machine translation with human translators. A basic manual translation demo is available at http://127.0.0.1:5003/manual when using web mode.
The demo provides translation services in two modes: synchronous and asynchronous.
In synchronous mode, your HTTP POST request finishes once the translation task is done.
In asynchronous mode, your HTTP POST request is immediately answered with a task_id, which you can use to poll for the translation task state.
Synchronous mode:
1. POST a form request with form data file:<content-of-image> to http://127.0.0.1:5003/run
2. Wait for the response
3. Use the returned task_id to find the translation result in the result/ directory, e.g. by using Nginx to expose result/

Asynchronous mode:
1. POST a form request with form data file:<content-of-image> to http://127.0.0.1:5003/submit
2. Acquire the translation task_id
3. Poll for the translation task state by posting the JSON {"taskid": <task-id>} to http://127.0.0.1:5003/task-state
4. The translation is finished when the returned state is finished, error, or error-lang
5. Find the translation result in the result/ directory, e.g. by using Nginx to expose result/

Manual translation mode:
1. POST a form request with form data file:<content-of-image> to http://127.0.0.1:5003/manual-translate and wait for the response.
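A minimal sketch of the asynchronous flow with curl (manga.png is a hypothetical file name):

# submit the image and receive a task id
$ curl -F "file=@manga.png" http://127.0.0.1:5003/submit
# poll the task state with the returned id
$ curl -X POST -H "Content-Type: application/json" -d '{"taskid": "<task-id>"}' http://127.0.0.1:5003/task-state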
You will get a JSON response like this:
{
  "task_id": "12c779c9431f954971cae720eb104499",
  "status": "pending",
  "trans_result": [
    {
      "s": "☆上司来ちゃった……",
      "t": ""
    }
  ]
}
Fill in the translated texts:
{
  "task_id": "12c779c9431f954971cae720eb104499",
  "status": "pending",
  "trans_result": [
    {
      "s": "☆上司来ちゃった……",
      "t": "☆Boss is here..."
    }
  ]
}
POST the translated JSON to http://127.0.0.1:5003/post-manual-result and wait for the response.
Then you can find the translation result in the result/ directory, e.g. by using Nginx to expose result/.
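A sketch of that POST with curl, reusing the filled-in JSON from above (the task_id must match the one returned by /manual-translate):

$ curl -X POST -H "Content-Type: application/json" -d '{"task_id": "12c779c9431f954971cae720eb104499", "status": "pending", "trans_result": [{"s": "☆上司来ちゃった……", "t": "☆Boss is here..."}]}' http://127.0.0.1:5003/post-manual-result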
A list of what needs to be done next; you are welcome to contribute.
GPU servers are not cheap, please consider donating to us.
Ko-fi: https://ko-fi.com/voilelabs
Patreon: https://www.patreon.com/voilelabs
爱发电 (afdian): https://afdian.net/@voilelabs