Stable Diffusion End-to-End Guide - From Noob to Expert
I became interested in using SD to generate images for military applications. Most of the resources are taken from 4chan's NSFW boards, as anons use SD to make hentai. Interestingly, the canonical SD WebUI has built-in functionality with anime/hentai image boards... One of the first use cases of SD right after DALL-E was generating anime girls, so the jump to hentai is not surprising.
Anyhow, the techniques from these weirdos apply to a variety of other domains, most notably LoRAs (Low-Rank Adaptations), which are lightweight fine-tunes that steer a base model toward a specific concept or style. The idea is to work with specific LoRAs (e.g., military vehicles, aircraft, weapons, etc.) to generate synthetic image data for training vision models. Training new, useful LoRAs is also of interest. Later work may include inpainting for perturbation.
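Conceptually, a LoRA leaves the base model's weights frozen and trains two small matrices whose low-rank product is added on top of them, which is why LoRA files are tiny compared to full checkpoints. A minimal pure-Python sketch of that idea (function and variable names are illustrative, not from any SD library):

```python
# Conceptual sketch: a LoRA stores two small matrices A (r x k) and B (d x r)
# whose product B @ A is a low-rank update added to a frozen base weight W (d x k).

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]

def apply_lora(W, A, B, alpha=1.0):
    """Return W + alpha * (B @ A), i.e., the adapted weight."""
    delta = matmul(B, A)
    return [[W[i][j] + alpha * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Tiny example: 2x2 base weight, rank r=1 update, so only 4 numbers
# are trained instead of a full 2x2 weight delta.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]           # r x k
B = [[1.0], [2.0]]         # d x r
print(apply_lora(W, A, B))  # -> [[1.5, 0.5], [1.0, 2.0]]
```

The `alpha` knob mirrors the strength slider you'll see when applying LoRAs in the WebUI: it scales how strongly the update pulls the base model.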
Disclaimer and Sources
Every link here may contain NSFW content, as most of the cutting-edge work on SD and LoRAs happens around porn or hentai, so be wary when working with these resources. ALSO: Rentry.org pages are the main resources linked in this guide. If any rentry page does not work, change the .org to .co and the link should work. Otherwise, use the Wayback Machine.
-TP
Play With It!
What can you actually do with SD? Huggingface and a few other sites host in-browser SD apps. Play around with them to see the power! In this guide, we'll set up the full, extensible WebUI so we can do anything we want.
- Huggingface Text to Image SD Playground
- Dreamstudio Text to Image SD App
- Dezgo Text to Image SD App
- Huggingface Image to Image SD Playground
- Huggingface Inpainting Playground
Table of Contents
- WebUI Basics
- Set up Local GPU usage
- Linux Setup
- Going Deeper
- Prompting
- NovelAI Model
- LoRA
- Playing with Models
- VAEs
- Put it all Together
- The General SD Process
- Saving Prompts
- txt2img Settings
- Regenerating a Previously-Generated Image
- Troubleshooting Errors
- Getting Comfortable
- Testing
- WebUI Advanced
- Prompt Editing
- Xformers
- Img2Img
- Inpainting
- Extras
- ControlNets
- Making New Stuff (WIP)
- Checkpoint Merger
- Training LoRAs
- Training New Models
- Google Colab Setup (WIP)
- Midjourney
- MJ Parameters
- MJ Advanced Prompts
- DreamStudio (WIP)
- Stable Horde (WIP)
- DreamBooth (WIP)
- Video Diffusion (WIP)
WebUI Basics
It's somewhat daunting to get into this... but 4channers have done a good job of making it approachable. Below are the steps I took, in the simplest terms. The goal is to get the Stable Diffusion WebUI (built with Gradio) running locally so you can start prompting and making images.
Set up Local GPU Usage
We will do the Google Colab Pro setup later, so we can run SD on any device, anywhere; but to start, let's get the WebUI set up on a PC. You need at least 16GB of RAM, a GPU with 2GB+ of VRAM, Windows 7 or newer, and 20+GB of disk space.
- Finish the starting setup guide
- I followed this up to step 7, after which it goes into the hentai stuff
- Step 3 takes 15-45 minutes at average Internet speeds, as the models are 5+ GB each
- Step 7 can take upwards of half an hour and may seem "stuck" in the CLI
- In step 3 I downloaded SD1.5, not the 2.x versions, as 1.5 produces much better results
- CivitAI has all the SD models; it's like HuggingFace but for SD specifically
- Verify that the WebUI works
- Copy the URL the CLI prints once it finishes, e.g.,
127.0.0.1:7860
(do NOT press Ctrl + C to copy it from the terminal; that sends an interrupt and can shut down the WebUI)
- Paste into browser and voila; try a prompt and you're off to the races
- Images will be saved automatically when generated to
stable-diffusion-webui\outputs\txt2img-images
- Remember, to update, just open a CLI in the stable-diffusion-webui folder and enter the command
git pull
Linux Setup
Ignore this entirely if you have Windows. I did manage to get it running on Linux too, although it's a bit more complicated. I started by following this guide, but it is rather poorly written, so below are the steps I took to get it running on Linux. I was using Linux Mint 20, a distribution based on Ubuntu 20.04.
- Start by cloning the webui repo:
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
- Get an SD model (e.g., SD 1.5, as in the previous section)
- Put the model ckpt file into
stable-diffusion-webui/models/Stable-diffusion
- Download Python (if you don't already have it):
sudo apt install python3 python3-pip python3-virtualenv wget git
- And the WebUI is very particular, so we need to install Conda, a virtual environment manager, to work inside of:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
- Now create the environment:
conda create --name sdwebui python=3.10.6
- Activate the environment:
conda activate sdwebui
- Navigate to your WebUI folder and type
./webui.sh
- It should execute for a bit until you get an error about not being able to access CUDA/your GPU... this is expected; fixing it is our next step
- Start by wiping any existing Nvidia drivers:
sudo apt update
sudo apt purge '*nvidia*'
- Now, sort of following some bits from this guide, find out what GPU your Linux machine has (easiest way to do this is to open the Driver Manager app and your GPU will be listed; but there are a dozen ways, just Google it)
- Go to this page and click the "Latest New Feature Branch" under Linux x86_64 (for me, it was 530.xx.xx)
- Click the tab "Supported Products" and Ctrl + F to find your GPU; if it is listed, proceed, otherwise back out and try "Latest Production Branch Version"; note the number, e.g., 530
- In a terminal, type:
sudo add-apt-repository ppa:graphics-drivers/ppa
- Update with
sudo apt-get update
- Launch the Driver Manager app and you should see a list of them; do NOT select the recommended one (e.g., nvidia-driver-530-open), select the exact one from earlier (e.g., nvidia-driver-530), and Apply Changes; OR, install it in the terminal with
sudo apt-get install nvidia-driver-530
- AT THIS POINT, you should get a popup through your CLI about Secure Boot, asking you for an 8-digit password: set it and write it down
- Reboot your PC and before your encryption/user login, you should see a BIOS-like screen (I am writing this from memory) with an option to input a MOK key; click it and enter your password, then submit and boot; some info here
- Log in as normal and type the command
nvidia-smi
; if successful, it should print a table; if not, it will say something like "Could not connect to the GPU; please ensure the most up to date driver is installed"
- Now to install CUDA (the last command here should print some info about your new CUDA install); from this guide:
sudo apt update
sudo apt install apt-transport-https ca-certificates gnupg
sudo apt install nvidia-cuda-toolkit
nvcc --version
- Now go back and do steps 7-9; if you get this "ERROR: Cannot activate python venv, aborting...", go to the next step (otherwise, you are off to the races and will copy the IP address from the CLI like normal and can begin playing with SD)
- This Github issue has some troubleshooting for this venv problem... for me, what worked was running
python3 -c 'import venv'
python3 -m venv venv/
Then, from inside the stable-diffusion-webui folder, running:
rm -rf venv/
python3 -m venv venv/
After that, it worked for me.
Going Deeper
- Read up on prompting techniques, because there are lots of things to know (e.g., positive prompt vs. negative prompt, sampling steps, sampling method, etc.)
- OpenArt Promptbook Guide
- Definitive SD Prompting Guide
- A succinct prompting guide
- 4chan prompting tips (NSFW)
- Collection of prompts and images
- Step-by-Step Anime Girl Prompting Guide
- Read up on SD knowledge in general:
- Seminal Stable Diffusion Publication
- CompVis / Stability AI Github (home of the original SD models)
- Stable Diffusion Compendium (good outside resource)
- Stable Diffusion Links Hub (incredible 4chan resource)
- Stable Diffusion Goldmine
- Simplified SD Goldmine
- Random/Misc. SD Links
- FAQ (NSFW)
- Another FAQ
- Join the Stable Diffusion Discord
- Keep up to date with Stable Diffusion news
- Did you know that as of March 2023, a 1.7B parameter text-to-video diffusion model is available?
- Mess around in the WebUI, play with different models, settings, etc.
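As a toy illustration of what the "sampling steps" setting mentioned above means: generation starts from pure noise, and each step removes a fraction of the predicted noise, so more steps land closer to the final image with diminishing returns. This sketch uses a made-up "perfect denoiser"; a real sampler (Euler a, DPM++, etc.) queries a trained U-Net under a noise schedule:

```python
import random

# Toy illustration (NOT a real sampler) of iterative denoising.

def toy_sample(steps, seed=0):
    random.seed(seed)
    x = random.gauss(0, 1)            # start from random noise
    target = 0.7                      # stand-in for the "clean image"
    for _ in range(steps):
        predicted_noise = x - target  # the toy denoiser is exact
        x -= predicted_noise / 4      # remove a quarter of it each step
    return x

# More steps -> closer to the target, with diminishing returns,
# which is why a few dozen steps is usually plenty.
print(abs(toy_sample(20) - 0.7) < abs(toy_sample(5) - 0.7))  # -> True
```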
Prompting
The order of words in a prompt has an effect: words earlier in the prompt are weighted more heavily. The general structure of a good prompt, from here:
And another good guide says the prompt should follow this structure:
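Separately from overall prompt structure, the WebUI also supports per-token emphasis: wrapping a phrase as (phrase:1.3) boosts the attention it receives. Below is a deliberately simplified, illustrative parser for that form only (the WebUI's real parser also handles nesting, plain (word) boosts, and [word] de-emphasis):

```python
import re

# Simplified sketch of "(token:weight)" emphasis parsing:
# plain comma-separated chunks get weight 1.0; "(chunk:1.3)" gets 1.3.

def parse_prompt(prompt):
    """Return a list of (token, weight) pairs."""
    pairs = []
    for chunk in prompt.split(","):
        chunk = chunk.strip()
        m = re.fullmatch(r"\((.+):([\d.]+)\)", chunk)
        if m:
            pairs.append((m.group(1), float(m.group(2))))
        else:
            pairs.append((chunk, 1.0))
    return pairs

print(parse_prompt("masterpiece, (detailed face:1.3), tank"))
# -> [('masterpiece', 1.0), ('detailed face', 1.3), ('tank', 1.0)]
```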