SadTalker 다운로드 - SadTalker 소스 코드 다운로드

SadTalker

기타 소스코드

v0.0.2 rc Release Note

다운로드

Wenxuan Zhang ^*,1,2 Xiaodong Cun ^*,2 Xuan Wang ³ Yong Zhang ² Xi Shen ²
위궈 ¹ 잉샨 ² 페이왕 ¹

¹ 시안교통대학교 ² 텐센트 AI 연구소 ³ 앤트그룹

CVPR 2023

슬픈 말을 하는 사람

TL;DR: 단일 인물 이미지 ?‍♂️ + 오디오 ? = 말하는 머리 영상?.

하이라이트

라이센스가 Apache 2.0으로 업데이트되었으며 비상업적 제한이 제거되었습니다.
SadTalker는 이제 공식적으로 Discord에 통합되어 파일 전송을 통해 무료로 사용할 수 있습니다. 텍스트 프롬프트에서 고품질 비디오를 생성할 수도 있습니다. 가입하다:
stable-diffusion-webui 확장을 게시했습니다. 자세한 내용은 여기에서 확인하세요. 데모 비디오
이제 전체 이미지 모드를 사용할 수 있습니다! 자세한 내용은...

v0.0.1의 Still+Enhancer	v0.0.2의 Still + Enhancer	입력 이미지 @bagbag1815
still_e_n.mp4	full_body_2.bus_chinese_enhanced.mp4

이제 여러 가지 새로운 모드(스틸 모드, 참조 모드, 크기 조정 모드)를 사용할 수 있습니다!
bilibili, YouTube 및 X(#sadtalker)에서 더 많은 커뮤니티 데모를 볼 수 있게 되어 기쁩니다.

변경 내역

이전 변경 내역은 여기에서 확인할 수 있습니다.

[2023.06.12] : WebUI 확장에 새로운 기능이 추가되었습니다. 여기에서 토론을 참조하세요.
[2023.06.05] : 새로운 512x512px(베타) 얼굴 모델을 출시했습니다. 일부 버그를 수정하고 성능을 개선했습니다.
[2023.04.15] : @camenduru의 WebUI Colab 노트북이 추가되었습니다.
[2023.04.12] : WebUI 설치 문서를 좀 더 자세하게 추가하고, 재설치 시 발생하는 문제를 수정했습니다.
[2023.04.12] : 타사 패키지로 인한 WebUI 안전 문제를 수정하고 sd-webui-extension 의 출력 경로를 최적화했습니다.
[2023.04.08] : v0.0.2에서는 악용 방지를 위해 생성된 영상에 로고 워터마크를 추가하였습니다. 이 워터마크는 이후 릴리스에서 제거되었습니다.
[2023.04.08] : v0.0.2에서는 전체 이미지 애니메이션 기능과 바이두에서 체크포인트를 다운로드할 수 있는 링크를 추가했습니다. 또한 강화 로직도 최적화했습니다.

할 일

문제 #280의 새로운 업데이트를 추적하고 있습니다.

문제 해결

문제가 있는 경우 문제를 열기 전에 FAQ를 읽어보세요.

1. 설치.

커뮤니티 튜토리얼: 中文Windows教程(중국어 Windows 튜토리얼) | 일본어코스(일본어 튜토리얼).

리눅스/유닉스

Anaconda, Python 및 git 설치합니다.
환경을 생성하고 요구사항을 설치합니다.

git clone https://github.com/OpenTalker/SadTalker.git

cd SadTalker 

conda create -n sadtalker python=3.8

conda activate sadtalker

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install ffmpeg

pip install -r requirements.txt

# ## Coqui TTS is optional for gradio demo. 
# ## pip install TTS

윈도우

여기에서 중국어로 된 비디오 튜토리얼을 볼 수 있습니다. 다음 지침을 따를 수도 있습니다.

Python 3.8을 설치하고 "PATH에 Python 추가"를 확인하십시오.
git을 수동으로 설치하거나 Scoop: scoop install git 사용하여 설치합니다.
이 튜토리얼을 따르거나 Scoop: scoop install ffmpeg 사용하여 ffmpeg 설치하세요.
git clone https://github.com/Winfredy/SadTalker.git 실행하여 SadTalker 저장소를 다운로드하세요.
다운로드 섹션에서 체크포인트와 gfpgan 모델을 다운로드하세요.
일반, 비관리자, 사용자로 Windows 탐색기에서 start.bat 실행하면 Gradio 기반 WebUI 데모가 시작됩니다.

macOS

macOS에 SadTalker를 설치하는 방법에 대한 튜토리얼은 여기에서 찾을 수 있습니다.

도커, WSL 등

여기에서 추가 튜토리얼을 확인하세요.

2. 모델 다운로드

Linux/macOS에서 다음 스크립트를 실행하여 모든 모델을 자동으로 다운로드할 수 있습니다.

bash scripts/download_models.sh

또한 오프라인 패치( gfpgan/ )를 제공하므로 생성 시 모델이 다운로드되지 않습니다.

사전 훈련된 모델

구글 드라이브
GitHub 릴리스
Baidu (百島云盘) (비밀번호: sadt )

GFPGAN 오프라인 패치

구글 드라이브
GitHub 릴리스
Baidu (百島云盘) (비밀번호: sadt )

모델 세부정보

모델은 다음과 같이 설명합니다.

새 버전

모델	설명
체크포인트/mapping_00229-model.pth.tar	Sadtalker에서 사전 훈련된 MappingNet.
체크포인트/mapping_00109-model.pth.tar	Sadtalker에서 사전 훈련된 MappingNet.
체크포인트/SadTalker_V0.0.2_256.safetensors	이전 버전의 패키지된 sadtalker 체크포인트, 256개의 얼굴 렌더링).
체크포인트/SadTalker_V0.0.2_512.safetensors	이전 버전의 패키지된 sadtalker 체크포인트, 512 얼굴 렌더링).
gfpgan/가중치	`facexlib` 및 `gfpgan` 에 사용되는 얼굴 감지 및 향상된 모델입니다.

이전 버전

모델	설명
체크포인트/auido2exp_00300-model.pth	Sadtalker에서 사전 훈련된 ExpNet.
체크포인트/auido2pose_00140-model.pth	Sadtalker에서 사전 훈련된 PoseVAE.
체크포인트/mapping_00229-model.pth.tar	Sadtalker에서 사전 훈련된 MappingNet.
체크포인트/mapping_00109-model.pth.tar	Sadtalker에서 사전 훈련된 MappingNet.
체크포인트/facevid2vid_00189-model.pth.tar	Face-vid2vid의 재현으로 사전 훈련된 Face-vid2vid 모델입니다.
체크포인트/epoch_20.pth	Deep3DFaceReconstruction의 사전 훈련된 3DMM 추출기.
체크포인트/wav2lip.pth	Wav2lip의 매우 정확한 립싱크 모델.
체크포인트/shape_predictor_68_face_landmarks.dat	dilb에서 사용되는 얼굴 랜드마크 모델.
검문소/BFM	3DMM 라이브러리 파일.
체크포인트/허브	얼굴 정렬에 사용되는 얼굴 감지 모델.
gfpgan/가중치	`facexlib` 및 `gfpgan` 에 사용되는 얼굴 감지 및 향상된 모델입니다.

최종 폴더는 다음과 같이 표시됩니다.

3. 빠른 시작

모범 사례 및 구성 팁에 대한 문서를 읽어 보십시오.

WebUI 데모

온라인 데모 : HuggingFace | SDWebUI-Colab | 코랩

로컬 WebUI 확장 : WebUI 문서를 참고하세요.

로컬 그라디오 데모(권장) : Hugging Face 데모와 유사한 Gradio 인스턴스를 로컬에서 실행할 수 있습니다.

 # # you need manually install TTS(https://github.com/coqui-ai/TTS) via `pip install tts` in advanced.
python app_sadtalker.py

더 쉽게 시작할 수도 있습니다.

windows: webui.bat 두 번 클릭하면 요구 사항이 자동으로 설치됩니다.
Linux/Mac OS: bash webui.sh 실행하여 webui를 시작합니다.

CLI 사용법

기본 구성에서 세로 이미지 애니메이션:

python inference.py --driven_audio < audio.wav > 
                    --source_image < video.mp4 or picture.png > 
                    --enhancer gfpgan

결과는 results/$SOME_TIMESTAMP/*.mp4 에 저장됩니다.

전신/이미지 생성:

--still 사용하여 자연스러운 전신 영상을 생성합니다. enhancer 추가하여 생성된 비디오의 품질을 향상할 수 있습니다.

python inference.py --driven_audio < audio.wav > 
                    --source_image < video.mp4 or picture.png > 
                    --result_dir < a file to store results > 
                    --still 
                    --preprocess full 
                    --enhancer gfpgan

>>> 모범 사례 문서 <<<에서 더 많은 예제와 구성 및 팁을 확인할 수 있습니다.

소환

귀하의 연구에 우리의 연구가 유용하다고 생각되면 다음을 인용해 보십시오.

 @article { zhang2022sadtalker ,
  title = { SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation } ,
  author = { Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei } ,
  journal = { arXiv preprint arXiv:2211.12194 } ,
  year = { 2022 }
}

감사의 말

Facerender 코드는 zhanglonghao의 Face-vid2vid 및 PIRender 재현에서 많이 차용했습니다. 훌륭한 코드를 공유해 주신 작성자에게 감사드립니다. 훈련 과정에서 우리는 Deep3DFaceReconstruction과 Wav2lip의 모델도 사용했습니다. 그들의 훌륭한 작업에 감사드립니다.

또한 다음과 같은 타사 라이브러리도 사용합니다.

페이스 유틸리티 : https://github.com/xinntao/facexlib
얼굴 강화 : https://github.com/TencentARC/GFPGAN
이미지/비디오 향상 :https://github.com/xinntao/Real-ESRGAN

확장:

@Zz-ww의 SadTalker-Video-Lip-Sync: 비디오 립 편집을 위한 SadTalker

부인 성명

이것은 Tencent의 공식 제품이 아닙니다.

 1. Please carefully read and comply with the open-source license applicable to this code before using it. 
2. Please carefully read and comply with the intellectual property declaration applicable to this code before using it.
3. This open-source code runs completely offline and does not collect any personal information or other data. If you use this code to provide services to end-users and collect related data, please take necessary compliance measures according to applicable laws and regulations (such as publishing privacy policies, adopting necessary data security strategies, etc.). If the collected data involves personal information, user consent must be obtained (if applicable). Any legal liabilities arising from this are unrelated to Tencent.
4. Without Tencent's written permission, you are not authorized to use the names or logos legally owned by Tencent, such as "Tencent." Otherwise, you may be liable for legal responsibilities.
5. This open-source code does not have the ability to directly provide services to end-users. If you need to use this code for further model training or demos, as part of your product to provide services to end-users, or for similar use, please comply with applicable laws and regulations for your product or service. Any legal liabilities arising from this are unrelated to Tencent.
6. It is prohibited to use this open-source code for activities that harm the legitimate rights and interests of others (including but not limited to fraud, deception, infringement of others' portrait rights, reputation rights, etc.), or other behaviors that violate applicable laws and regulations or go against social ethics and good customs (including providing incorrect or false information, spreading pornographic, terrorist, and violent information, etc.). Otherwise, you may be liable for legal responsibilities.

로고: 색상 및 글꼴 제안: ChatGPT, 로고 글꼴: Montserrat Alternates .

확장하다

추가 정보

버전 v0.0.2 rc Release Note
유형 기타 소스코드
업데이트 시간 2024-12-05
크기 50MB
출처 Github

SadTalker

하이라이트

변경 내역

할 일

문제 해결

1. 설치.

리눅스/유닉스

윈도우

macOS

도커, WSL 등

2. 모델 다운로드

사전 훈련된 모델

GFPGAN 오프라인 패치

새 버전

이전 버전

3. 빠른 시작

WebUI 데모

CLI 사용법

기본 구성에서 세로 이미지 애니메이션:

전신/이미지 생성:

소환

감사의 말

확장:

관련 작품

부인 성명

waymo open dataset

SmartTube

Sunamu

MySchedule.py

viptools for eslam

VITAident

chat.petals.dev

GPT Prompt Templates

GPTyped

waymo open dataset

SmartTube

Sunamu

waymo open dataset

wp functions

termwind