awesome colab notebooks
1.0.0
The page might not be rendered properly. Please open the README.md file directly.
repositories | papers |
---|---|
|
|
name | description | authors | links | colaboratory | update |
---|---|---|---|---|---|
CoTracker | Architecture that jointly tracks multiple points throughout an entire video |
others |
|
16.10.2024 | |
PIFu | Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization |
|
|
08.10.2024 | |
DifFace | Method that is capable of coping with unseen and complex degradations more gracefully without complicated loss designs |
|
|
05.10.2024 | |
Segment Anything 2 | Foundation model towards solving promptable visual segmentation in images and videos |
others |
|
01.10.2024 | |
Open-Unmix | A deep neural network reference implementation for music source separation, applicable for researchers, audio engineers and artists |
|
|
25.09.2024 | |
Deep Painterly Harmonization | Algorithm that produces significantly better results than photo compositing or global stylization techniques and enables creative painterly edits that would otherwise be difficult to achieve |
|
|
23.09.2024 | |
audio2photoreal | Framework for generating full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction |
others |
|
13.09.2024 | |
Fast Segment Anything | CNN Segment Anything Model trained using only 2% of the SA-1B dataset published by SAM authors |
others |
|
10.09.2024 | |
Neuralangelo | Framework for high-fidelity 3D surface reconstruction from RGB video captures |
others |
|
02.09.2024 | |
BiRefNet | Bilateral reference framework for high-resolution dichotomous image segmentation |
others |
|
23.08.2024 | |
SPIN | Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop |
|
|
21.08.2024 | |
YOLOv10 | Aims to further advance the performance-efficiency boundary of YOLOs from both the post-processing and the model architecture |
others |
|
20.08.2024 | |
SpecVQGAN | Taming the visually guided sound generation by shrinking a training dataset to a set of representative vectors |
|
|
12.07.2024 | |
LivePortrait | Video-driven portrait animation framework with a focus on better generalization, controllability, and efficiency for practical usage |
others |
|
10.07.2024 | |
TAPIR | Tracking Any Point with per-frame Initialization and temporal Refinement |
others |
|
05.07.2024 | |
Wav2Lip | A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild |
|
|
27.06.2024 | |
DeepLabCut | Efficient method for markerless pose estimation based on transfer learning with deep neural networks that achieves excellent results with minimal training data |
others |
|
05.06.2024 | |
PoolFormer | MetaFormer Is Actually What You Need for Vision |
others |
|
01.06.2024 | |
StoryDiffusion | Way of self-attention calculation, termed Consistent Self-Attention, that significantly boosts the consistency between the generated images and augments prevalent pretrained diffusion-based text-to-image models in a zero-shot manner |
|
|
04.05.2024 | |
PuLID | Pure and Lightning ID customization, a tuning-free ID customization method for text-to-image generation |
|
|
03.05.2024 | |
FILM | A frame interpolation algorithm that synthesizes multiple intermediate frames from two input images with large in-between motion |
others |
|
03.05.2024 | |
VoiceCraft | Token-infilling neural codec language model that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech on audiobooks, internet videos, and podcasts |
|
|
21.04.2024 | |
ZeST | Method for zero-shot material transfer to an object in the input image given a material exemplar image |
|
|
16.04.2024 | |
InstantMesh | Feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability |
others |
|
16.04.2024 | |
AlphaFold | Highly accurate protein structure prediction |
others |
|
15.04.2024 | |
Würstchen | Architecture for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness for large-scale text-to-image diffusion models |
|
|
06.04.2024 | |
AQLM | Extreme Compression of Large Language Models via Additive Quantization |
others |
|
08.03.2024 | |
YOLOv9 | Learning What You Want to Learn Using Programmable Gradient Information |
|
|
05.03.2024 | |
Multi-LoRA Composition | LoRA Switch and LoRA Composite, approaches that aim to surpass traditional techniques in terms of accuracy and image quality, especially in complex compositions |
others |
|
03.03.2024 | |
AMARETTO | Multiscale and multimodal inference of regulatory networks to identify cell circuits and their drivers shared and distinct within and across biological systems of human disease |
others |
|
28.02.2024 | |
LIDA | Tool for generating grammar-agnostic visualizations and infographics | Victor Dibia |
|
06.02.2024 | |
ViT | Vision Transformer and MLP-Mixer Architectures |
others |
|
06.02.2024 | |
3D Ken Burns | A reference implementation of 3D Ken Burns Effect from a Single Image using PyTorch - given a single input image, it animates this still image with a virtual camera scan and zoom subject to motion parallax | Manuel Romero | |
24.01.2024 | |
VALL-E X | Cross-lingual neural codec language model for cross-lingual speech synthesis |
others |
|
19.01.2024 | |
PhotoMaker | Efficient personalized text-to-image generation method, which mainly encodes an arbitrary number of input ID images into a stack ID embedding for preserving ID information |
others |
|
18.01.2024 | |
DDColor | End-to-end method with dual decoders for image colorization |
others |
|
15.01.2024 | |
PASD | Pixel-aware stable diffusion network to achieve robust Real-ISR as well as personalized stylization |
|
|
12.01.2024 | |
HandRefiner | Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting |
|
|
08.01.2024 | |
GraphCast | Learning skillful medium-range global weather forecasting |
others |
|
04.01.2024 | |
ESM | Evolutionary Scale Modeling: Pretrained language models for proteins |
others |
|
28.12.2023 | |
LLaVA | Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding |
|
|
22.12.2023 | |
Background Matting V2 | Real-time, high-resolution background replacement technique which operates at 30fps in 4K resolution, and 60fps for HD on a modern GPU |
others |
|
22.12.2023 | |
Gaussian Splatting | State-of-the-art visual quality while maintaining competitive training times and, importantly, allowing high-quality real-time (≥ 100 fps) novel-view synthesis at 1080p resolution |
|
|
19.12.2023 | |
SMPLer-X | Scaling up EHPS towards the first generalist foundation model, with up to ViT-Huge as the backbone and training with up to 4.5M instances from diverse data sources |
others |
|
18.12.2023 | |
DeepCache | Training-free paradigm that accelerates diffusion models from the perspective of model architecture |
|
|
18.12.2023 | |
MagicAnimate | Diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity |
others |
|
18.12.2023 | |
DiffBIR | Towards Blind Image Restoration with Generative Diffusion Prior |
others |
|
18.12.2023 | |
AudioLDM | Text-to-audio system that is built on a latent space to learn the continuous audio representations from contrastive language-audio pretraining latents |
others |
|
02.12.2023 | |
TabPFN | Neural network that learned to do tabular data prediction |
|
|
29.11.2023 | |
Concept Sliders | Plug-and-play low rank adaptors applied on top of pretrained models |
|
|
26.11.2023 | |
Qwen-VL | Set of large-scale vision-language models designed to perceive and understand both text and images |
others |
|
24.11.2023 | |
AnimeGANv3 | Double-tail generative adversarial network for fast photo animation |
|
|
23.11.2023 | |
Ithaca | First Deep Neural Network for the textual restoration, geographical and chronological attribution of ancient Greek inscriptions |
others |
|
21.11.2023 | |
PixArt-Σ | Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation |
others |
|
07.11.2023 | |
Zero123++ | Image-conditioned diffusion model for generating 3D-consistent multi-view images from a single input view |
others |
|
26.10.2023 | |
UniFormerV2 | Unified Transformer for Efficient Spatiotemporal Representation Learning |
others |
|
20.10.2023 | |
Show-1 | Hybrid model, dubbed as Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation |
others |
|
15.10.2023 | |
AudioSep | Foundation model for open-domain audio source separation with natural language queries |
others |
|
12.10.2023 | |
DA-CLIP | Degradation-aware vision-language model to better transfer pretrained vision-language models to low-level vision tasks as a universal framework for image restoration |
|
|
11.10.2023 | |
SadTalker | Generates 3D motion coefficients of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation |
others |
|
10.10.2023 | |
Musika | Music generation system that can be trained on hundreds of hours of music using a single consumer GPU, and that allows for much faster than real-time generation of music of arbitrary length on a consumer CPU |
|
|
09.10.2023 | |
YOLOv6 | Single-stage object detection framework dedicated to industrial applications |
|
|
08.10.2023 | |
DreamGaussian | Algorithm to convert 3D Gaussians into textured meshes and apply a fine-tuning stage to refine the details |
|
|
04.10.2023 | |
ICON | Given a set of images, method estimates a detailed 3D surface from each image and then combines these into an animatable avatar |
|
|
31.08.2023 | |
DINOv2 | Produce high-performance visual features that can be directly employed with classifiers as simple as linear layers on a variety of computer vision tasks; these visual features are robust and perform well across domains without any requirement for fine-tuning |
others |
|
31.08.2023 | |
OWL-ViT | Simple Open-Vocabulary Object Detection with Vision Transformers |
others |
|
21.08.2023 | |
StyleGAN3 | Alias-Free Generative Adversarial Networks |
others |
|
13.08.2023 | |
FateZero | Zero-shot text-based editing method on real-world videos without per-prompt training or use-specific mask |
others |
|
13.08.2023 | |
Big GAN | Large Scale GAN Training for High Fidelity Natural Image Synthesis |
|
03.08.2023 | ||
LaMa | Resolution-robust Large Mask Inpainting with Fourier Convolutions |
others |
|
02.08.2023 | |
MakeItTalk | A method that generates expressive talking-head videos from a single facial image with audio as the only input |
others |
|
27.07.2023 | |
HiDT | A generative image-to-image model and a new upsampling scheme that allows image translation to be applied at high resolution |
|
|
24.07.2023 | |
CutLER | Simple approach for training unsupervised object detection and segmentation models |
|
|
24.07.2023 | |
Recognize Anything & Tag2Text | Vision language pre-training framework, which introduces image tagging into vision-language models to guide the learning of visual-linguistic features |
others |
|
09.07.2023 | |
Thin-Plate Spline Motion Model | End-to-end unsupervised motion transfer framework |
|
|
07.07.2023 | |
DragGAN | Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold |
others |
|
03.07.2023 | |
MobileSAM | Towards Lightweight SAM for Mobile Applications |
others |
|
30.06.2023 | |
Grounding DINO | Marrying DINO with Grounded Pre-Training for Open-Set Object Detection |
others |
|
28.06.2023 | |
T5X | Modular, composable, research-friendly framework for high-performance, configurable, self-service training, evaluation, and inference of sequence models at many scales |
others |
|
27.06.2023 | |
CodeTalker | Cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty |
others |
|
16.06.2023 | |
First Order Motion Model for Image Animation | Transferring facial movements from video to image | Aliaksandr Siarohin |
|
04.06.2023 | |
Parallel WaveGAN | State-of-the-art non-autoregressive models to build your own great vocoder | Tomoki Hayashi |
|
01.06.2023 | |
ECON | Designed for "Human digitization from a color image"; combines the best properties of implicit and explicit representations to infer high-fidelity 3D clothed humans from in-the-wild images, even with loose clothing or in challenging poses |
|
|
31.05.2023 | |
MMS | The Massively Multilingual Speech project expands speech technology from about 100 languages to over 1000 by building a single multilingual speech recognition model supporting over 1100 languages, language identification models able to identify over 4000 languages, pretrained models supporting over 1400 languages, and text-to-speech models for over 1100 languages |
others |
|
26.05.2023 | |
FAB | Flow AIS Bootstrap uses AIS to generate samples in regions where the flow is a poor approximation of the target, facilitating the discovery of new modes |
|
|
29.04.2023 | |
CodeFormer | Transformer-based prediction network to model global composition and context of the low-quality faces for code prediction, enabling the discovery of natural faces that closely approximate the target faces even when the inputs are severely degraded |
|
|
21.04.2023 | |
Text2Video-Zero | Text-to-Image Diffusion Models are Zero-Shot Video Generators |
others |
|
11.04.2023 | |
Segment Anything | The Segment Anything Model produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image |
others |
|
10.04.2023 | |
FollowYourPose | Two-stage training scheme that can utilize image pose pair and pose-free video datasets and the pre-trained text-to-image model to obtain the pose-controllable character videos |
others |
|
07.04.2023 | |
EVA3D | High-quality unconditional 3D human generative model that only requires 2D image collections for training |
|
|
06.04.2023 | |
Stable Dreamfusion | Using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis |
|
|
04.04.2023 | |
PIFuHD | Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization |
|
|
26.03.2023 | |
VideoReTalking | System to edit the faces of a real-world talking head video according to input audio, producing a high-quality and lip-syncing output video even with a different emotion |
others |
|
19.03.2023 | |
Visual ChatGPT | Connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting |
others |
|
15.03.2023 | |
Tune-A-Video | One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation |
others |
|
23.02.2023 | |
GPEN | GAN Prior Embedded Network for Blind Face Restoration in the Wild |
|
|
15.02.2023 | |
PyMAF-X | Regression-based approach to recovering parametric full-body models from monocular images |
others |
|
14.02.2023 | |
Disco Diffusion | A frankensteinian amalgamation of notebooks, models and techniques for the generation of AI Art and Animations |
|
|
11.02.2023 | |
GrooVAE | Some applications of machine learning for generating and manipulating beats and drum performances |
|
|
02.02.2023 | |
Multitrack MusicVAE | The models in this notebook are capable of encoding and decoding single measures of up to 8 tracks, optionally conditioned on an underlying chord |
others |
|
02.02.2023 | |
MusicVAE | A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music |
|
|
02.02.2023 | |
Learning to Paint | Learning to Paint With Model-based Deep Reinforcement Learning | Manuel Romero | |
01.02.2023 | |
Instant-NGP | Instant Neural Graphics Primitives with a Multiresolution Hash Encoding |
|
|
18.01.2023 | |
Fourier Feature Networks | Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains |
others |
|
17.01.2023 | |
AlphaPose | Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time |
others |
|
07.01.2023 | |
HybrIK | Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation |
others |
|
01.01.2023 | |
Score Jacobian Chaining | Apply chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field |
|
|
05.12.2022 | |
Demucs | Hybrid Spectrogram and Waveform Source Separation | Alexandre Défossez |
|
21.11.2022 | |
StyleCLIP | Text-Driven Manipulation of StyleGAN Imagery |
|
|
30.10.2022 | |
MotionDiffuse | The first diffusion model-based text-driven motion generation framework, which demonstrates several desired properties over existing methods |
others |
|
13.10.2022 | |
VToonify | Leverages the mid- and high-resolution layers of StyleGAN to render high-quality artistic portraits based on the multi-scale content features extracted by an encoder to better preserve the frame details |
|
|
07.10.2022 | |
PyMAF | Pyramidal Mesh Alignment Feedback loop in regression network for well-aligned body mesh recovery and extend it for the recovery of expressive full-body models |
others |
|
06.10.2022 | |
AlphaTensor | Discovering faster matrix multiplication algorithms with reinforcement learning |
others |
|
04.10.2022 | |
Swin2SR | Novel Swin Transformer V2, to improve SwinIR for image super-resolution, and in particular, the compressed input scenario |
|
|
03.10.2022 | |
Functa | From data to functa: Your data point is a function and you can treat it like one |
|
|
24.09.2022 | |
Whisper | Automatic speech recognition system trained on 680,000 hours of multilingual and multitask supervised data collected from the web |
others |
|
21.09.2022 | |
DeOldify (video) | Colorize your own videos! | Jason Antic |
|
19.09.2022 | |
DeOldify (photo) | Colorize your own photos! |
|
|
19.09.2022 | |
Real-ESRGAN | Extend the powerful ESRGAN to a practical restoration application, which is trained with pure synthetic data |
|
|
18.09.2022 | |
IDE-3D | Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis |
others |
|
08.09.2022 | |
Decision Transformers | An architecture that casts the problem of RL as conditional sequence modeling |
|