DL-C & partial DL-D demonstration • AI Society
This open-source release is DL-B, a digital avatar solution based on ChatGLM, Wav2Lip, and So-VITS-SVC. The code base was written in mid-March 2023 and has not been optimized or updated since.
This project is currently entered in a competition, which moves to the provincial stage in late June. The project team is working on further optimization and improvement of DL-C and on the testing and development of DL-D. No code or details about DL-C and DL-D will be released until the competition ends; the code and detailed framework will be organized and published afterwards. Thank you for your understanding.
The current code is fairly crude. I am a second-year undergraduate majoring in finance, with no aesthetic sense or engineering skill in code writing (mostly copy-and-paste), so please go easy on the criticism.
After the competition, the project will be taken over by the AI Society, which will produce a more user-friendly framework along with an all-in-one "lazy" installation package.
The hardware used to build DL-B is listed below as a reference (suggestions for lower-spec configurations that still run are welcome):
| Graphics card | CPU | Memory | Hard disk |
| --- | --- | --- | --- |
| RTX 3060 12 GB | Intel i5-12400F | 16 GB | 30 GB |
The test environment is based on Python 3.9.13 (64-bit). Install the dependencies with pip:

```
pip install -r requirements.txt
```
Note that you still need to download a separate Python 3.8 environment package for running So-VITS (click the environment package link). Don't worry, it is already configured; just download it, unzip it into the DL-B folder, and keep the file layout as follows:
```
DL-B
├───python3.8
├───Lib
├───libs
├───···
└───Tools
```
In addition, you also need to install FFmpeg. If you don't want to install it manually, you can try the lazy package we provide.
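If you are not sure whether FFmpeg is actually reachable from your environment, a quick check such as the sketch below can save debugging time later (this snippet is only illustrative and is not part of the DL-B code base):

```python
import shutil
import subprocess

# Illustrative sanity check (not part of DL-B): confirm an ffmpeg executable
# is on PATH before running the audio/video processing steps.
ffmpeg = shutil.which("ffmpeg")
if ffmpeg is None:
    raise RuntimeError("ffmpeg not found on PATH; install it or use the lazy package")
version_line = subprocess.run([ffmpeg, "-version"], capture_output=True, text=True).stdout.splitlines()[0]
print(version_line)
```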
ChatGLM offers many fine-tuning methods, and you can choose the one that fits your situation. The Tsinghua University team gives a detailed explanation of fine-tuning ChatGLM with P-tuning. There is also a good fine-tuning example repository on GitHub that uses Zhen Huan as the fine-tuning example. That repository contains the P-tuning fine-tuning code, but does not include the ChatGLM pre-trained model.
The program will automatically download the model implementation and parameters via transformers. The complete model implementation is available on the Hugging Face Hub. If your network is slow, downloading the model parameters may take a long time or even fail; in that case you can download the model locally first and then load it from the local path.
To download the model from the Hugging Face Hub, install Git LFS first and then run:

```
git clone https://huggingface.co/THUDM/chatglm-6b
```
If downloading the checkpoint from the Hugging Face Hub is slow, you can download only the model implementation:

```
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm-6b
```
Then manually download the model parameter files from here and replace the downloaded files in the local `module/chatglm-6b` directory.
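As a quick check that the local copy works, you can load ChatGLM-6B from that directory using the standard calls from the official ChatGLM-6B examples. This is only a minimal sketch; `module/chatglm-6b` is the local path described above, and you should adjust it if your layout differs:

```python
from transformers import AutoTokenizer, AutoModel

# Load ChatGLM-6B from the local directory instead of pulling from the Hub.
local_path = "module/chatglm-6b"  # adjust if your local path differs
tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
model = AutoModel.from_pretrained(local_path, trust_remote_code=True).half().cuda()
model = model.eval()

# Simple smoke test of the chat interface.
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```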
The model implementation is still evolving. If you want to pin the model implementation to ensure compatibility, you can run:

```
git checkout v0.1.0
```
Use your own dataset
Here is a helpful repository for collecting a personal corpus.
Modify `train_file`, `validation_file` and `test_file` in `train.sh` and `evaluate.sh` to the path of your own JSON-format dataset, and change `prompt_column` and `response_column` to the keys of the input text and output text in the JSON file. You may also need to increase `max_source_length` and `max_target_length` to match the maximum input and output lengths in your own dataset.
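For reference, a single-turn dataset is simply one JSON object per line. The sketch below is only illustrative (the keys `prompt` and `response` and the file name `train.json` are assumptions; they must match whatever you pass to `prompt_column`, `response_column`, and `train_file`):

```python
import json

# Illustrative single-turn samples; replace these with your own corpus.
samples = [
    {"prompt": "你好,你是谁?", "response": "我是DL-B的数字形象。"},
    {"prompt": "今天天气怎么样?", "response": "抱歉,我还不能联网查询天气。"},
]

# Write one JSON object per line, matching prompt_column="prompt" and response_column="response".
with open("train.json", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```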
Conversation dataset
If you need to fine-tune the model with multi-turn conversation data, you can provide the chat history. For example, the following is the training data for a three-turn conversation:
```
{"prompt": "长城h3风扇不转。继电器好的。保险丝好的传感器新的风扇也新的这是为什么。就是继电器缺一个信号线", "response": "用电脑能读数据流吗?水温多少", "history": []}
{"prompt": "95", "response": "上下水管温差怎么样啊?空气是不是都排干净了呢?", "history": [["长城h3风扇不转。继电器好的。保险丝好的传感器新的风扇也新的这是为什么。就是继电器缺一个信号线", "用电脑能读数据流吗?水温多少"]]}
{"prompt": "是的。上下水管都好的", "response": "那就要检查线路了,一般风扇继电器是由电脑控制吸合的,如果电路存在断路,或者电脑坏了的话会出现继电器不吸合的情况!", "history": [["长城h3风扇不转。继电器好的。保险丝好的传感器新的风扇也新的这是为什么。就是继电器缺一个信号线", "用电脑能读数据流吗?水温多少"], ["95", "上下水管温差怎么样啊?空气是不是都排干净了呢?"]]}
```
During training, you need to specify `--history_column` as the key of the chat history in the data (`history` in this example); the chat history will be concatenated automatically. Note that content exceeding the input length `max_source_length` will be truncated.
You can refer to the following command:

```
bash train_chat.sh
```
Of course, you can also mix multi-turn and single-turn dialogue corpora; just add dialogue entries of the following form on top of the data above:
```
{"prompt": "老刘,你知道那个作业要怎么从电脑上保存到手机上吗?", "response": "我也不知道啊", "history": []}
```
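If you build such data from your own chat transcripts, the only subtlety is accumulating the `history` field turn by turn. Here is a small helper sketch (not part of the repository, shown only to illustrate the format above):

```python
import json

# Illustrative helper: turn an ordered list of (prompt, response) pairs into
# multi-turn training rows with the accumulated "history" field shown above.
def conversation_to_rows(turns):
    rows, history = [], []
    for prompt, response in turns:
        rows.append({"prompt": prompt, "response": response, "history": list(history)})
        history.append([prompt, response])
    return rows

turns = [("你好", "你好,有什么可以帮你?"), ("今天是周几?", "抱歉,我无法获取日期。")]
with open("train_chat.json", "w", encoding="utf-8") as f:
    for row in conversation_to_rows(turns):
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```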
So-VITS is already a very popular and mature model, and there are many tutorial videos on Bilibili, so I won't go into detail here. Here are the tutorials that I consider the highest quality. This repository contains the code for basic So-VITS training and clustering training, but it is not very user-friendly, and the So-VITS part of DL-B has not been changed since it was completed in March. Note that this repository does not include tools for data processing and preliminary preparation.
Some model files still need to be supplied: `checkpoint_best_legacy_500.pt`, placed under `hubert`, and the two matching pre-trained models `G_0.pth` and `D_0.pth`, placed under the `pre_trained_model` folder in `./module/So-VITS`.
This is an older method; many optimizations have been made in newer frameworks. This version is based on the original Wav2Lip, and users can choose different pre-trained model weights. The model here is a required download and is placed in the `./module/wav2lip` folder.
| Model | Description | Link |
| --- | --- | --- |
| Wav2Lip | Highly accurate lip sync | Link |
| Wav2Lip + GAN | Slightly inferior lip sync, but better visual quality | Link |
| Expert Discriminator | | Link |
| Visual Quality Discriminator | | Link |
Note that this repository requires some videos, which can be recorded with a phone, computer, or camera; they are used to collect facial information. The recommended format is `.mp4` at `720p` or `480p` resolution. A single video is usually 5–10 s, and multiple videos can be recorded. Store the video files in the `source` folder.
As for optimizing Wav2Lip, many experts on Bilibili have already covered it, so I won't go into detail (laziness). Here is a video.
Note that in addition to the above, you also need to download the model `s3fd.pth`, which is used during inference, and place it in the `./face_detection/detection/sfd` folder.
This repository does not contain any models!! It cannot be used straight after cloning!! You must train the models yourself.
The source code needs to be changed in the following places:
Place all fine-tuned models into the corresponding folders under `module`. Copy all the files that P-tuning training writes to `output` into the corresponding local `output` folder. `So-VITS/44k` stores the So-VITS training models, and the Wav2Lip + GAN model goes in the `wav2lip` folder.
In line 32 of `main_demo.py`, change `CHECKPOINT_PATH` to your own fine-tuned model path:

```python
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
```
Note that you may need to change `pre_seq_len` to the actual value used during your training. If you load the model locally, you also need to change `THUDM/chatglm-6b` to the local model path (note: not the checkpoint path).
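For context, the default (new-checkpoint) loading path in the official ChatGLM-6B P-tuning example looks roughly like the sketch below; `CHECKPOINT_PATH` and `pre_seq_len=128` are placeholders that you would replace with your own output directory and training value:

```python
import os
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Placeholders: point CHECKPOINT_PATH at your P-tuning output folder and set
# pre_seq_len to the value actually used during training.
CHECKPOINT_PATH = "output/your-ptuning-checkpoint"
config = AutoConfig.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, pre_seq_len=128)
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", config=config, trust_remote_code=True)

# Load only the PrefixEncoder weights from the P-tuning checkpoint.
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)
```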
By default, the source code loads a new checkpoint (containing only the PrefixEncoder parameters). If you need to load an old checkpoint (containing both the ChatGLM-6B and PrefixEncoder parameters), or you performed full-parameter fine-tuning, load the entire checkpoint directly:
```python
model = AutoModel.from_pretrained(CHECKPOINT_PATH, trust_remote_code=True)
```
Add the model path and speaker name in `So-VITS_run.py` (depending on your training settings):
```python
parser.add_argument('-m', '--model_path', type=str, default="", help='Model path')
parser.add_argument('-s', '--spk_list', type=str, nargs='+', default=[''], help='Name of the target speaker to synthesize')
```
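For example, with assumed values (the checkpoint name `G_30400.pth`, the `./module/So-VITS/44k` location, and the speaker name `my_speaker` are placeholders; use whatever your So-VITS training actually produced), the two lines might become:

```python
# Hypothetical example values; substitute your own checkpoint path and speaker name.
parser.add_argument('-m', '--model_path', type=str,
                    default="./module/So-VITS/44k/G_30400.pth", help='Model path')
parser.add_argument('-s', '--spk_list', type=str, nargs='+',
                    default=['my_speaker'], help='Name of the target speaker to synthesize')
```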
You also need to make changes in `wav2lip_run.py`:
```python
#VIDEO
face_dir = "./source/"
```
The video loaded here is the one recorded earlier; you can write your own video-selection logic.
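A minimal selection scheme, assuming the recordings sit in `./source/` as described earlier, could simply pick the most recently recorded `.mp4` file (the variable name `face_video` is hypothetical; adapt it to however `wav2lip_run.py` consumes the video):

```python
import glob
import os

# Illustrative sketch: use the most recently modified .mp4 in ./source/.
# Replace this with your own selection logic if you record multiple takes.
candidates = glob.glob(os.path.join("./source/", "*.mp4"))
if not candidates:
    raise FileNotFoundError("No .mp4 files found in ./source/")
face_video = max(candidates, key=os.path.getmtime)
print("Using video:", face_video)
```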
If nothing goes wrong, just run `main_demo.py` directly in VS Code or another editor. Have fun, everyone.
The code in this repository is open-sourced under the GNU GPLv3 license. The use of each model's weights must follow its own open-source license.