SadTalker

Wenxuan Zhang*,¹,² Xiaodong Cun*,² Xuan Wang³ Yong Zhang² Xi Shen²
Yu Guo¹ Ying Shan² Fei Wang¹

¹Xi'an Jiaotong University ²Tencent AI Lab ³Ant Group

CVPR 2023

TL;DR: single portrait image + audio = talking head video.

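Highlights

The license has been updated to Apache 2.0, and the non-commercial restriction has been removed.
SadTalker is now officially integrated into Discord, where you can use it for free by sending files. You can also generate high-quality videos from text prompts.
We have released a stable-diffusion-webui extension (see the WebUI docs for details).
Full image mode is now available.

still + enhancer in v0.0.1	still + enhancer in v0.0.2	input image (@bagbag1815)
still_e_n.mp4	full_body_2.bus_chinese_enhanced.mp4

Several new modes (still, reference, and resize) are now available.
We are happy to see more community demos on bilibili, YouTube, and X (#sadtalker).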

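Changelog

Only recent entries are listed here; earlier entries are kept in the previous changelog.

[2023.06.12]: Added more new features to the WebUI extension; see the related discussion.
[2023.06.05]: Released a new 512x512 px (beta) face model. Fixed some bugs and improved performance.
[2023.04.15]: Added a WebUI Colab notebook by @camenduru.
[2023.04.12]: Added more detailed WebUI installation documentation and fixed a problem when reinstalling.
[2023.04.12]: Fixed WebUI safety issues caused by third-party packages and optimized the output path in sd-webui-extension.
[2023.04.08]: In v0.0.2, we added a logo watermark to the generated video to prevent abuse. This watermark has been removed in later releases.
[2023.04.08]: In v0.0.2, we added full image animation and a link to download checkpoints from Baidu. We also optimized the enhancer logic.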

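To-Do

We are tracking new updates in issue #280.

Troubleshooting

If you run into a problem, please read our FAQ before opening an issue.

1. Installation

Community tutorials: 中文Windows教程 (Chinese Windows tutorial) | 日本語コース (Japanese tutorial).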

Linux/Unix

Install Anaconda, Python, and git.
Create the environment and install the requirements.

git clone https://github.com/OpenTalker/SadTalker.git

cd SadTalker 

conda create -n sadtalker python=3.8

conda activate sadtalker

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install ffmpeg

pip install -r requirements.txt

# Coqui TTS is optional for the Gradio demo.
# pip install TTS
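
Before moving on, it can help to confirm that the prerequisites are visible from the new environment. The following is a minimal sketch and not part of SadTalker: the helper name `check_prerequisites` is hypothetical, and it assumes only that `git` and `ffmpeg` should be on PATH and that the interpreter should be Python 3.8, as in the `conda create` command above.

```python
import shutil
import sys


def check_prerequisites(tools=("git", "ffmpeg"), expected_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration; it is not shipped with SadTalker.
    """
    problems = []
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # Compare only major.minor, e.g. (3, 8).
    if sys.version_info[:2] != tuple(expected_python):
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, expected_python))
        problems.append(f"Python {wanted} expected, found {found}")
    return problems


if __name__ == "__main__":
    for line in check_prerequisites() or ["all prerequisite checks passed"]:
        print(line)
```

Run it inside the activated `sadtalker` environment; any line it prints other than the success message points at a step above that may need repeating.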

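Windows

A video tutorial in Chinese is available. You can also follow the instructions below:

Install Python 3.8 and check "Add Python to PATH".
Install git manually or with Scoop: scoop install git.
Install ffmpeg by following the ffmpeg tutorial or with Scoop: scoop install ffmpeg.
Download the SadTalker repository by running git clone https://github.com/Winfredy/SadTalker.git.
Download the checkpoints and gfpgan models as described in the download section.
Run start.bat from Windows Explorer as a normal, non-administrator user; a Gradio-powered WebUI demo will start.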

macOS

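A tutorial on installing SadTalker on macOS is available separately.

Docker, WSL, etc.

Please see the additional tutorials for these setups.

2. Download Models

On Linux/macOS, you can run the following script to download all the models automatically: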

bash scripts/download_models.sh

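We also provide an offline patch (gfpgan/), so that no models are downloaded during generation.

Pre-trained models

Google Drive
GitHub Releases
Baidu (百度云盘) (password: sadt)

GFPGAN offline patch

Google Drive
GitHub Releases
Baidu (百度云盘) (password: sadt)

Model details

Model explanation: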

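New version

Model	Description
checkpoints/mapping_00229-model.pth.tar	Pre-trained MappingNet in SadTalker.
checkpoints/mapping_00109-model.pth.tar	Pre-trained MappingNet in SadTalker.
checkpoints/SadTalker_V0.0.2_256.safetensors	Packaged SadTalker checkpoints from the old version (256 face render).
checkpoints/SadTalker_V0.0.2_512.safetensors	Packaged SadTalker checkpoints from the old version (512 face render).
gfpgan/weights	Face detection and enhancement models used in `facexlib` and `gfpgan`.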

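Old version

Model	Description
checkpoints/auido2exp_00300-model.pth	Pre-trained ExpNet in SadTalker.
checkpoints/auido2pose_00140-model.pth	Pre-trained PoseVAE in SadTalker.
checkpoints/mapping_00229-model.pth.tar	Pre-trained MappingNet in SadTalker.
checkpoints/mapping_00109-model.pth.tar	Pre-trained MappingNet in SadTalker.
checkpoints/facevid2vid_00189-model.pth.tar	Pre-trained face-vid2vid model from the face-vid2vid reproduction.
checkpoints/epoch_20.pth	Pre-trained 3DMM extractor in Deep3DFaceReconstruction.
checkpoints/wav2lip.pth	Highly accurate lip-sync model in Wav2lip.
checkpoints/shape_predictor_68_face_landmarks.dat	Face landmark model used in dlib.
checkpoints/BFM	3DMM library files.
checkpoints/hub	Face detection models used in face alignment.
gfpgan/weights	Face detection and enhancement models used in `facexlib` and `gfpgan`.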

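After downloading, the final folder should contain the files listed above under checkpoints/ and gfpgan/weights.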
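
One way to confirm the download step worked is to check for the new-version files from the table. This is a minimal sketch under stated assumptions: the paths are taken to be relative to the repository root, with `checkpoints/` and `gfpgan/weights` as the directory names, and `missing_models` is a hypothetical helper, not a SadTalker function.

```python
from pathlib import Path

# Relative paths from the new-version model table; assumed to sit under the repo root.
EXPECTED_FILES = (
    "checkpoints/mapping_00229-model.pth.tar",
    "checkpoints/mapping_00109-model.pth.tar",
    "checkpoints/SadTalker_V0.0.2_256.safetensors",
    "checkpoints/SadTalker_V0.0.2_512.safetensors",
)
EXPECTED_DIRS = ("gfpgan/weights",)


def missing_models(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing


if __name__ == "__main__":
    for path in missing_models():
        print("missing:", path)
```

The check only looks at names, not file contents, so a truncated download would still pass; it is meant as a quick sanity check rather than verification.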

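3. Quick Start

Please read our documentation on best practices and configuration tips.

WebUI Demos

Online demos: HuggingFace | SDWebUI-Colab | Colab

Local WebUI extension: please refer to the WebUI docs.

Local Gradio demo (recommended): a Gradio instance similar to our Hugging Face demo can be run locally: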

# Optional: to use TTS in the demo, install it manually in advance with `pip install tts` (https://github.com/coqui-ai/TTS).
python app_sadtalker.py

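You can also start it more easily:

Windows: double-click webui.bat; the requirements are installed automatically.
Linux/macOS: run bash webui.sh to start the WebUI.

CLI usage

Animate a portrait image with the default configuration: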

python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --enhancer gfpgan

The results are saved in results/$SOME_TIMESTAMP/*.mp4.
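
Because each run writes into a new timestamped folder, a small helper can locate the newest output. This is a sketch, assuming only the results/<timestamp>/*.mp4 layout described above; `latest_result` is a hypothetical name and is not part of SadTalker.

```python
from pathlib import Path


def latest_result(results_dir="results"):
    """Return the most recently modified .mp4 one level below results_dir, or None."""
    # "*/*.mp4" matches results/<any folder>/<any file>.mp4 and nothing deeper.
    videos = list(Path(results_dir).glob("*/*.mp4"))
    if not videos:
        return None
    return max(videos, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_result())
```

Sorting by modification time avoids depending on the exact timestamp format of the folder names.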

Full body/image generation:

Use --still to generate a natural full-body video. You can add --enhancer to improve the quality of the generated video.

python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <a directory to store results> \
                    --still \
                    --preprocess full \
                    --enhancer gfpgan
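
When calling the CLI from a script, building the argument list in one place helps avoid quoting and line-continuation mistakes. The sketch below uses only the flags shown in the two commands above; `build_inference_command` and its parameter names are hypothetical and not part of SadTalker.

```python
import sys


def build_inference_command(audio, image, result_dir=None, full_image=False, enhancer="gfpgan"):
    """Assemble the argv list for inference.py from the flags shown above."""
    cmd = [
        sys.executable, "inference.py",
        "--driven_audio", str(audio),
        "--source_image", str(image),
    ]
    if result_dir is not None:
        cmd += ["--result_dir", str(result_dir)]
    if full_image:
        # Full body/image generation combines --still with --preprocess full.
        cmd += ["--still", "--preprocess", "full"]
    if enhancer:
        cmd += ["--enhancer", enhancer]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; the file names are placeholders.
    print(" ".join(build_inference_command("audio.wav", "picture.png", full_image=True)))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` from the repository root with the `sadtalker` environment active. Passing a list rather than a single string means paths with spaces need no extra quoting.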

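More examples, configuration options, and tips can be found in the best practice documentation.

Citation

If you find our work useful in your research, please consider citing: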

@article{zhang2022sadtalker,
  title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
  author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
  journal={arXiv preprint arXiv:2211.12194},
  year={2022}
}

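Acknowledgements

The Facerender code borrows heavily from zhanglonghao's reproduction of face-vid2vid and from PIRender. We thank the authors for sharing their wonderful code. In the training process, we also used models from Deep3DFaceReconstruction and Wav2lip. We thank them for their excellent work.

We also use the following third-party libraries:

Face utilities: https://github.com/xinntao/facexlib
Face enhancement: https://github.com/TencentARC/GFPGAN
Image/video enhancement: https://github.com/xinntao/Real-ESRGAN

Extensions:

SadTalker-Video-Lip-Sync from @Zz-ww: SadTalker for video lip editing

Disclaimer

This is not an official product of Tencent.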

1. Please carefully read and comply with the open-source license applicable to this code before using it.
2. Please carefully read and comply with the intellectual property declaration applicable to this code before using it.
3. This open-source code runs completely offline and does not collect any personal information or other data. If you use this code to provide services to end-users and collect related data, please take necessary compliance measures according to applicable laws and regulations (such as publishing privacy policies, adopting necessary data security strategies, etc.). If the collected data involves personal information, user consent must be obtained (if applicable). Any legal liabilities arising from this are unrelated to Tencent.
4. Without Tencent's written permission, you are not authorized to use the names or logos legally owned by Tencent, such as "Tencent." Otherwise, you may be liable for legal responsibilities.
5. This open-source code does not have the ability to directly provide services to end-users. If you need to use this code for further model training or demos, as part of your product to provide services to end-users, or for similar use, please comply with applicable laws and regulations for your product or service. Any legal liabilities arising from this are unrelated to Tencent.
6. It is prohibited to use this open-source code for activities that harm the legitimate rights and interests of others (including but not limited to fraud, deception, infringement of others' portrait rights, reputation rights, etc.), or other behaviors that violate applicable laws and regulations or go against social ethics and good customs (including providing incorrect or false information, spreading pornographic, terrorist, and violent information, etc.). Otherwise, you may be liable for legal responsibilities.

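LOGO: color and font suggestions: ChatGPT; logo font: Montserrat Alternates.

All the copyrights of the demo images and audio belong to community users or come from Stable Diffusion generations. If you would like any of them removed, please feel free to contact us.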

Additional information

Version: v0.0.2 rc Release Note
Category: Other source code
Updated: 2024-12-05
Size: 50 MB
Source: GitHub
