LoRA-deployment
This repository demonstrates how to serve multiple LoRA fine-tuned Stable Diffusion models from the 🤗 Diffusers library on Hugging Face Inference Endpoints. Since fine-tuning with LoRA produces only a few MB of checkpoint weights, we can switch between checkpoints for different fine-tuned Stable Diffusion models in a fast, memory-efficient, and disk-space-efficient way.
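As a sketch of this checkpoint swapping, the snippet below keeps one base pipeline loaded and attaches different LoRA weights to it via diffusers' `load_attn_procs` (the loading API described in the LoRA blog post referenced below). The base model id and the LoRA repository ids in the demo section are placeholders, not necessarily the repos used here.

```python
def swap_lora(pipe, lora_repo_id: str):
    """Attach a LoRA checkpoint (pytorch_lora_weights.bin) to a loaded pipeline.

    load_attn_procs patches the UNet's attention processors in place, so
    switching fine-tunes only moves a few MB instead of reloading the
    multi-GB base model.
    """
    pipe.unet.load_attn_procs(lora_repo_id)
    return pipe


if __name__ == "__main__":
    # Demo only (needs GPU + network); model/repo ids are placeholders.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    swap_lora(pipe, "your-user/pokemon-lora")       # first fine-tune
    image = pipe("a cute pokemon").images[0]
    swap_lora(pipe, "your-user/noto-emoji-lora")    # switch without reloading the base
```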
For demonstration purposes, I have tested the following Hugging Face Model repositories, each of which has a LoRA fine-tuned checkpoint (pytorch_lora_weights.bin):
- ethan_ai
- noto-emoji
- pokemon
Notebook
- Pilot notebook: shows how to write and test a custom handler for Hugging Face Inference Endpoints in a local or Colab environment
- Inference notebook: shows how to send inference requests to the custom handler deployed on a Hugging Face Inference Endpoint
- Multi-workers inference notebook: shows how to run simultaneous requests against the custom handler deployed on a Hugging Face Inference Endpoint in a Colab environment
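The simultaneous-request pattern from the multi-workers notebook can be approximated on the client with a thread pool; `send_one` below is a stand-in for whatever per-request function you use, not a function from the notebooks:

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(send_one, payloads, max_workers=4):
    """Send several requests concurrently and return results in input order.

    Endpoint calls are I/O-bound on the client, so threads are enough to
    keep multiple server-side workers busy at once.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send_one, payloads))
```

For example, `run_parallel(query_fn, prompts, max_workers=4)` fans four prompts out to the endpoint at once.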
Custom Handler
- handler.py: basic handler. This custom handler has been verified to work with this Hugging Face Model repo
- multiworker_handler.py: advanced handler with a pool of multiple workers (Stable Diffusion pipelines). This custom handler has been verified to work with this Hugging Face Model repo
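As a minimal sketch of such a handler: Inference Endpoints imports an `EndpointHandler` class from handler.py and calls it once per request. The base model id and the request fields `inputs`/`lora_repo` below are assumptions for illustration, not necessarily what handler.py actually uses:

```python
import base64
from io import BytesIO
from typing import Any, Dict, List


class EndpointHandler:
    """Sketch of a custom handler for Hugging Face Inference Endpoints."""

    def __init__(self, path: str = ""):
        # Heavy imports kept here so the module can be imported and
        # inspected without torch/diffusers installed.
        import torch
        from diffusers import StableDiffusionPipeline

        # Placeholder base model; the shared base is loaded once at startup.
        self.pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
        ).to("cuda")

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        prompt = data.pop("inputs", "")
        # Swap in the requested few-MB LoRA checkpoint on the shared base.
        lora_repo = data.pop("lora_repo", None)
        if lora_repo:
            self.pipe.unet.load_attn_procs(lora_repo)
        image = self.pipe(prompt).images[0]
        buf = BytesIO()
        image.save(buf, format="PNG")
        return [{"image": base64.b64encode(buf.getvalue()).decode()}]
```

The multi-worker variant follows the same shape but keeps a pool of pipelines and hands each request to a free worker instead of sharing one pipeline.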
Script
- inference.py: standalone Python script to send requests to the custom handler deployed on Hugging Face Inference Endpoint
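A trimmed-down version of such a request script might look like the following; the payload fields and response shape are assumptions and should be matched against the deployed handler:

```python
import json
from urllib.request import Request, urlopen


def build_payload(prompt: str, lora_repo: str) -> bytes:
    # Hypothetical request shape; align the keys with what the handler expects.
    return json.dumps({"inputs": prompt, "lora_repo": lora_repo}).encode()


def query(endpoint_url: str, token: str, payload: bytes) -> dict:
    """POST a JSON payload to the Inference Endpoint and parse the reply."""
    req = Request(
        endpoint_url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())
```

Usage would be something like `query(ENDPOINT_URL, HF_TOKEN, build_payload("a cute pokemon", "your-user/pokemon-lora"))`, with the URL and token taken from your endpoint's settings page.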
Reference
- https://huggingface.co/blog/lora