English | 中文
Original repository: LivePortrait. Thanks to the authors for sharing.
New features:
Achieved real-time running of LivePortrait on an RTX 3090 GPU using TensorRT, reaching 30+ FPS. This is the end-to-end speed for rendering a single frame, including pre- and post-processing, not just the model inference speed.
Implemented conversion of the LivePortrait model to ONNX, reaching about 70ms/frame (~12 FPS) with onnxruntime-gpu on an RTX 3090, which makes cross-platform deployment easier (a minimal loading sketch follows this list).
Seamless support for the native Gradio app, running several times faster and supporting simultaneous inference on multiple faces as well as the animal model.
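Because the converted files are standard ONNX models, they can be loaded with plain onnxruntime on any supported platform. Below is a minimal, illustrative sketch of creating a GPU-backed session; the checkpoint name is a placeholder for whichever converted file you want to run, and this is not the project's actual inference code.

```python
import numpy as np
import onnxruntime as ort

# Prefer the CUDA backend, falling back to CPU if onnxruntime-gpu is absent.
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]

# Placeholder checkpoint name; substitute any of the converted ONNX files.
sess = ort.InferenceSession("./checkpoints/motion_extractor.onnx",
                            providers=providers)

# Build a dummy input matching the model's declared shape (dynamic dims -> 1).
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.random.rand(*shape).astype(np.float32)

outputs = sess.run(None, {inp.name: dummy})
print([o.shape for o in outputs])
```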
If you find this project useful, please give it a star ✨✨
Changelog
2024/08/11: Optimized paste_back speed and fixed some bugs.
Used torchgeometry + CUDA to optimize the paste_back function, significantly improving speed (see the sketch after this entry). Example: python run.py --src_image assets/examples/source/s39.jpg --dri_video assets/examples/driving/d0.mp4 --cfg configs/trt_infer.yaml --paste_back --animal
Fixed issues with XPose ops causing errors on some GPUs, plus other bugs. Please use the latest Docker image: docker pull shaoguo/faster_liveportrait:v3
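For context, paste-back warps the generated face crop back into the original frame and blends it in. The sketch below shows the idea on the GPU with torchgeometry.warp_affine; it is an illustration under assumed inputs (a crop-to-original affine matrix M_c2o and a soft mask), not the repository's exact implementation.

```python
import torch
import torchgeometry as tgm

def paste_back_gpu(crop, M_c2o, frame, mask):
    """Warp a generated crop back into the full frame and alpha-blend it.

    crop:  (1, 3, Hc, Wc) generated face crop (GPU float tensor)
    M_c2o: (1, 2, 3) affine matrix mapping crop coords to frame coords
    frame: (1, 3, H, W) original frame
    mask:  (1, 1, Hc, Wc) soft blending mask for the crop region
    """
    h, w = frame.shape[2:]
    warped = tgm.warp_affine(crop, M_c2o, (h, w))       # crop -> frame space
    warped_mask = tgm.warp_affine(mask, M_c2o, (h, w))  # mask -> frame space
    return warped_mask * warped + (1.0 - warped_mask) * frame
```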
2024/08/07: Added support for animal models and MediaPipe models, so you no longer need to worry about copyright issues.
Added support for animal models:
Download the animal ONNX files: huggingface-cli download warmshao/FasterLivePortrait --local-dir ./checkpoints, then convert them to TRT format.
Update the Docker image: docker pull shaoguo/faster_liveportrait:v3. Using the animal model: python run.py --src_image assets/examples/source/s39.jpg --dri_video 0 --cfg configs/trt_infer.yaml --realtime --animal
Windows users can download the latest Windows all-in-one package from the release page, then unzip and use it.
Simple usage tutorial:
Using the MediaPipe model to replace InsightFace (a minimal landmark sketch follows this entry):
For web usage: python app.py --mode trt --mp or python app.py --mode onnx --mp
For local webcam: python run.py --src_image assets/examples/source/s12.jpg --dri_video 0 --cfg configs/trt_mp_infer.yaml
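As an illustration of the MediaPipe path, the sketch below extracts face landmarks with MediaPipe's FaceMesh solution, which avoids InsightFace's licensing constraints. It is a minimal standalone example, not the repository's actual detector code.

```python
import cv2
import mediapipe as mp

# Minimal MediaPipe FaceMesh landmark extraction (illustrative only).
face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=True, max_num_faces=2, refine_landmarks=True)

img = cv2.imread("assets/examples/source/s12.jpg")
res = face_mesh.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

if res.multi_face_landmarks:
    h, w = img.shape[:2]
    for face in res.multi_face_landmarks:
        # 478 normalized landmarks per face when refine_landmarks=True.
        pts = [(int(p.x * w), int(p.y * h)) for p in face.landmark]
        print(f"detected face with {len(pts)} landmarks")
```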
2024/07/24: Windows integration package, no installation required, one-click run, supporting TensorRT and onnxruntime-gpu. Thanks to @zhanghongyong123456 for their contribution in this issue.
[Optional] If you have already installed CUDA and cuDNN on your Windows computer, please skip this step. I have only verified on CUDA 12.2. If you haven't installed CUDA or encounter CUDA-related errors, you need to follow these steps:
Download CUDA 12.2, double-click the exe, and install following the default settings step by step.
Download the cuDNN zip file, extract it, and copy the lib, bin, and include folders from the cuDNN folder into the CUDA 12.2 folder (default: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2).
Download the installation-free Windows integration package from the release page and extract it.
Enter FasterLivePortrait-windows and double-click all_onnx2trt.bat to convert the ONNX files; this will take some time.
For the web demo: double-click app.bat and open the webpage http://localhost:9870/
For real-time camera operation: double-click camera.bat; press q to stop. If you want to change the target image, run from the command line: camera.bat assets/examples/source/s9.jpg
2024/07/18: Added macOS support (no Docker needed, Python is enough). M1/M2 chips are faster, but it's still quite slow.
Install ffmpeg: brew install ffmpeg
Set up a Python 3.10 virtual environment. Recommend using miniforge: conda create -n flip python=3.10 && conda activate flip
Install requirements: pip install -r requirements_macos.txt
Download ONNX files: huggingface-cli download warmshao/FasterLivePortrait --local-dir ./checkpoints
Test: python app.py --mode onnx
2024/07/17: Added support for Docker environment, providing a runnable image.
Option 1: Docker (recommended). A Docker image is provided, eliminating the need to install onnxruntime-gpu and TensorRT manually.
Install Docker according to your system.
Download the image: docker pull shaoguo/faster_liveportrait:v3
Execute the command below, replacing $FasterLivePortrait_ROOT with the local directory where you downloaded FasterLivePortrait:
docker run -it --gpus=all --name faster_liveportrait -v $FasterLivePortrait_ROOT:/root/FasterLivePortrait --restart=always -p 9870:9870 shaoguo/faster_liveportrait:v3 /bin/bash
Option 2: Create a new Python virtual environment and install the necessary Python packages manually.
First, install ffmpeg
Run pip install -r requirements.txt
Then follow the tutorials below to install onnxruntime-gpu or TensorRT. Note that this has only been tested on Linux systems.
First, download the converted ONNX model files: huggingface-cli download warmshao/FasterLivePortrait --local-dir ./checkpoints
(Ignored in Docker) If you want to use onnxruntime CPU inference, simply pip install onnxruntime. However, CPU inference is extremely slow and not recommended. The latest onnxruntime-gpu still doesn't support grid_sample with CUDA (a short illustration of this op follows the test step below), but I found a branch that supports it. Follow these steps to install onnxruntime-gpu from source:
git clone https://github.com/microsoft/onnxruntime
git checkout liqun/ImageDecoder-cuda. Thanks to liqun for the grid_sample with CUDA implementation!
Run the following commands to compile, changing cuda_version and CMAKE_CUDA_ARCHITECTURES according to your machine:
./build.sh --parallel --build_shared_lib --use_cuda --cuda_version 11.8 --cuda_home /usr/local/cuda --cudnn_home /usr/local/cuda/ --config Release --build_wheel --skip_tests --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="60;70;75;80;86" --cmake_extra_defines CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc --disable_contrib_ops --allow_running_as_root
pip install build/Linux/Release/dist/onnxruntime_gpu-1.17.0-cp310-cp310-linux_x86_64.whl
Test the pipeline using onnxruntime:
python run.py --src_image assets/examples/source/s10.jpg --dri_video assets/examples/driving/d14.mp4 --cfg configs/onnx_infer.yaml
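For background: the op in question is grid_sample on 5D inputs, since LivePortrait-style warping samples a 3D feature volume (this is also why the TensorRT plugin below is needed). A tiny PyTorch illustration with hypothetical shapes:

```python
import torch
import torch.nn.functional as F

# Warping a 3D feature volume means grid_sample sees 5D input (N, C, D, H, W),
# the variant that needs special support in onnxruntime-gpu and TensorRT.
volume = torch.randn(1, 32, 16, 64, 64)        # hypothetical feature volume
grid = torch.rand(1, 16, 64, 64, 3) * 2 - 1    # sampling grid in [-1, 1]
warped = F.grid_sample(volume, grid, align_corners=False)
print(warped.shape)  # torch.Size([1, 32, 16, 64, 64])
```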
(Ignored in Docker) Install TensorRT. Remember the installation path of TensorRT.
(Ignored in Docker) Install the grid_sample TensorRT plugin, because the model uses a grid_sample op with 5D input, which the native grid_sample operator does not support.
git clone https://github.com/SeanWangJS/grid-sample3d-trt-plugin
Modify line 30 of CMakeLists.txt to: set_target_properties(${PROJECT_NAME} PROPERTIES CUDA_ARCHITECTURES "60;70;75;80;86")
export PATH=/usr/local/cuda/bin:$PATH
mkdir build && cd build
cmake .. -DTensorRT_ROOT=$TENSORRT_HOME, replacing $TENSORRT_HOME with your own TensorRT root directory.
make, then note the path of the built .so file and replace /opt/grid-sample3d-trt-plugin/build/libgrid_sample_3d_plugin.so in scripts/onnx2trt.py and src/models/predictor.py with your own .so file path (see the loading sketch after these steps).
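For context on what that .so path is used for: a custom TensorRT plugin must be loaded into the process before an engine that uses it can be deserialized. Below is a minimal sketch assuming the default plugin path from the step above; the .trt file name is a placeholder, and this is an illustration rather than the repository's actual predictor code.

```python
import ctypes
import tensorrt as trt

# Load the compiled grid_sample 3D plugin so TensorRT can resolve the op.
PLUGIN_PATH = "/opt/grid-sample3d-trt-plugin/build/libgrid_sample_3d_plugin.so"
ctypes.CDLL(PLUGIN_PATH)

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")  # register all loaded plugin creators

# Deserialize a converted engine; the file name here is a placeholder.
with open("./checkpoints/warping_spade.trt", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
print(engine is not None)
```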
Download the ONNX model files: huggingface-cli download warmshao/FasterLivePortrait --local-dir ./checkpoints. To convert all ONNX models to TensorRT, run sh scripts/all_onnx2trt.sh and sh scripts/all_onnx2trt_animal.sh
Test the pipeline using tensorrt:
python run.py --src_image assets/examples/source/s10.jpg --dri_video assets/examples/driving/d14.mp4 --cfg configs/trt_infer.yaml
To run in real-time using a camera:
python run.py --src_image assets/examples/source/s10.jpg --dri_video 0 --cfg configs/trt_infer.yaml --realtime
To launch the Gradio app:
onnxruntime: python app.py --mode onnx
tensorrt: python app.py --mode trt
The default port is 9870. Open the webpage: http://localhost:9870/
Follow my shipinhao (WeChat Channels) account for continuous updates on my AIGC content. Feel free to message me for collaboration opportunities.