Tammy is a Python/PyTorch-based open-source project that uses deep learning models to generate original music videos. It automatically generates videos from text prompt transitions that are synchronized with various aspects of a song, such as its BPM or piano pattern. Deep learning models are used at several stages of the video generation process: audio source separation with LSTMs, frame generation with GANs, spatial upscaling with super-resolution models, and temporal upsampling with frame interpolation models. The aim of the project is to provide an easy-to-use framework for building custom model pipelines to create unique music videos.
- Features
- Quick start
- Dataflow and Code Structure
- Generation Settings
- More Examples
- Contributing
## Quick start

For a quick start:

1. Install `ffmpeg`, `libsndfile1` (`sudo apt-get install ffmpeg libsndfile1`) and `git-lfs`.
2. Install the package: `pip install .`
3. Run `python run_tammy.py`, which will use the default settings in `settings/settings_cpu.yaml` and the default song `thoughtsarebeings_clip.wav`.

The `tammy` package can easily be used in your own script, and other setting files or audio files can be used with the existing `run_tammy.py` script by running `python run_tammy.py --settings_file <path_to_settings_file>`.
## Dataflow and Code Structure

- `tammy.prompthandler` generates the settings for every frame to be generated (e.g. translation or text prompt) based on a more concise description of the generation settings.
- `tammy.sequence_maker` has a `generator` which generates an image sequence based on text prompts. Currently the supported models are VQGAN-CLIP and Stable Diffusion.
- `tammy.upscaling` scales up the generated images with super-resolution. Currently the only supported model is SwinIR.
- `tammy.superslowmo` interpolates generated (optionally upscaled) images to increase the FPS without needing to generate every frame with a `sequence_maker`. Currently the only supported model is SuperSloMo.
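To make the dataflow concrete, here is a minimal sketch of how these stages fit together. The functions below are local stand-ins written for illustration only; they are not the actual `tammy` API, and the real modules (`tammy.prompthandler`, `tammy.sequence_maker`, `tammy.upscaling`, `tammy.superslowmo`) expose their own interfaces.

```python
# Illustrative outline of the tammy dataflow; all functions are placeholders.

def make_frame_settings(settings: dict) -> list[dict]:
    # prompthandler stage: expand a concise description into per-frame settings
    # (e.g. zoom/translation and text prompt for every frame).
    return [{"frame": i, "prompt": settings["prompts"][0]} for i in range(settings["n_frames"])]

def generate_frames(frame_settings: list[dict]) -> list[str]:
    # sequence_maker stage: a generator (VQGAN-CLIP or Stable Diffusion)
    # turns per-frame settings into an image sequence.
    return [f"frame_{s['frame']:05d}.png" for s in frame_settings]

def upscale(frames: list[str]) -> list[str]:
    # upscaling stage: super-resolution (SwinIR) applied to each generated frame.
    return [f.replace(".png", "_hr.png") for f in frames]

def interpolate(frames: list[str], factor: int) -> list[str]:
    # superslowmo stage: stand-in that repeats frames; the real model
    # synthesizes intermediate frames to raise the FPS.
    return [f for f in frames for _ in range(factor)]

settings = {"prompts": ["a forest at night"], "n_frames": 4}
frames = interpolate(upscale(generate_frames(make_frame_settings(settings))), factor=2)
print(frames)
```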
## Generation Settings

The video generation has many configuration settings, which are specified in a `.yaml` settings file. Some example setting files, mostly used for testing, can be found in the `settings` folder. Most setting names (keys in the `settings.yaml`) should be self-explanatory. For clarity, some settings are explained below.
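Since the settings are plain YAML, they can also be inspected or tweaked programmatically before a run. A minimal sketch, assuming PyYAML is installed and using the default CPU settings file mentioned above; the keys printed depend on the actual file contents:

```python
import yaml  # pip install pyyaml

# Load the default CPU settings file (path as used by run_tammy.py above).
with open("settings/settings_cpu.yaml") as f:
    settings = yaml.safe_load(f)

# The file parses into nested dicts, e.g. the sequence settings discussed below.
print(settings.keys())
print(settings.get("sequence_settings", {}))
```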
Instruments are used to steer frame transitions, in particular the zoom in `Animation_2d` mode and the prompt transition speed in `Interpolation` mode. `tammy` has two options to provide instruments:

1. Use source separation on the provided audio file by setting `do_spleet: True` and providing the instrument with `zoom_instrument:`.
2. Provide your own keyframe file and name it `file_name_fps.txt`, where `fps` should correspond with the `fps` value in `sequence_settings.initial_fps`. Keyframes can be manually generated with e.g. https://www.chigozie.co.uk/audio-keyframe-generator/ (a minimal sketch of this is shown below the list).
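As an illustration of what such a keyframe generator does, the sketch below derives one loudness value per video frame from an audio file using `librosa`. This is not part of `tammy`: the audio path and fps are example values, and the printed format is only a rough approximation, so adapt it to the keyframe file format you actually need.

```python
import librosa

fps = 15                                   # should match sequence_settings.initial_fps
audio_path = "thoughtsarebeings_clip.wav"  # example audio file

# Load the audio and compute an RMS loudness envelope with one value per video frame.
y, sr = librosa.load(audio_path, sr=None)
hop_length = int(sr / fps)
rms = librosa.feature.rms(y=y, hop_length=hop_length)[0]
rms = rms / rms.max()  # normalize to [0, 1]

# One keyframe value per frame index; write these to a file named according to
# the convention above (fps suffix matching sequence_settings.initial_fps).
for i, value in enumerate(rms):
    print(f"{i}: ({value:.3f})")
```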
The setting `sequence_settings.initial_fps` determines the number of frames generated, given the length of the audio clip. By using frame interpolation, the frame rate can be increased to a target by setting `do_slowmo: True` and providing a `target_fps`, which must be a multiple of `initial_fps`. This allows producing high-frame-rate videos faster than generating all frames from scratch with the `generator`.
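The relation between clip length, `initial_fps` and `target_fps` boils down to the following arithmetic (all values are made-up examples):

```python
audio_length_s = 30   # length of the audio clip in seconds (example value)
initial_fps = 10      # sequence_settings.initial_fps
target_fps = 30       # must be a multiple of initial_fps when do_slowmo is enabled

generated_frames = audio_length_s * initial_fps   # frames produced by the generator
assert target_fps % initial_fps == 0, "target_fps must be a multiple of initial_fps"
interpolation_factor = target_fps // initial_fps  # extra frames filled in by frame interpolation
final_frames = generated_frames * interpolation_factor

print(generated_frames, interpolation_factor, final_frames)  # 300 3 900
```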
If desired, the number of generated frames can be limited by providing `sequence_settings.max_frames`. In this case the generated video will be shorter than the provided audio clip; its length will be `max_frames / initial_fps` seconds.
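For example, with made-up values:

```python
max_frames = 120   # sequence_settings.max_frames (example value)
initial_fps = 10   # sequence_settings.initial_fps

video_length_s = max_frames / initial_fps  # 12.0 seconds, shorter than the full audio clip
print(f"Generated video length: {video_length_s:.1f} s")
```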
## More Examples

Video generated using VQGAN-CLIP and `Animation_2d` mode from `tammy`.

Full video (watch in 4K for best experience!): https://www.youtube.com/watch?v=T_bii9VLDk0
Videos generated using Stable Diffusion and `Interpolation` mode from `tammy`.
## Contributing

To run the test suite, use `pytest`. To set up the git hooks, run `pre-commit install`.