-
Scout Monitoring
Free Django app performance insights with Scout Monitoring. Get Scout setup in minutes, and let us sweat the small stuff. A couple lines in settings.py is all you need to start monitoring your apps. Sign up for our free tier today.
Here's my attempt at an explanation without jargon, you can just read the last paragraph, the first 4 are just context.
These image models are trained on 1000 steps of noise, where at 0 no noise is added to the training image and at 1000 the image is pure noise. The model's goal it to denoise the image, and it does this knowing how much noise the image has, this makes the model learn how much it should change the image, for example at high noise it changes a lot of pixels and starts building the overall "structure" of the image, and a low noise it changes less pixels and focuses on adding details.
To use the model you start with pure noise, then the model iteratively denoises that noise until a clean image shows up. A naive approach would take 1000 steps, this means you run the model 1000 times, each time feeding the previous result and telling the model that the noise decreased by 1 until it reaches 0 noise. This takes a long time, up to 15 minutes to generate an image on a mid-range consumer GPU.
Turns out when you give the model pure noise and tell it there's 1000 steps of noise, the result is not an image that has 999 steps of noise, but an image that looks like it has much less, this means that you can probably skip 50-100 steps of denoising per iteration and still get a very good picture, the issue is: what steps to pick? You could again take a naive approach and just skip every 50 steps for a total of 20 steps, but turns out there's better ways.
This is where samplers come in, essentially a sampler takes the number of steps you want to take to denoise an image (usually ~20 steps) and it will--among other things--pick which steps to choose each iteration. The most popular samplers are the samplers in the k-diffusion repo[1] or k-samplers for short. Do note that samplers do much more than just pick the steps, they are actually responsible for doing the denoising process itself, some of them even add a small noise after a denoising step among other things.
The newest open source model, SDXL, is actually 2 models. A base model that can generate images as normal, and a refiner model that is specialized on adding details to images. A typical workflow is to ask the base model for 25 steps of denoise, but only run the first 20, then use the refiner model to do the rest. According to the OP, this was being done without keeping the state of the sampler, that is they were running 2 samplers separately, one for the base model and then start one over for the refiner model. Since the samplers use historical data for optimization, the end result was not ideal.
[1] https://github.com/crowsonkb/k-diffusion
Related posts
-
Llama3V is suspected to have been stolen from the MiniCPM-Llama3-v2.5 project
-
[2209.02842] ASR2K: Speech Recognition for Around 2000 Languages without Audio
-
Text-to-Speech with Speaker Diarization
-
Here comes the Muybridge camera moment but for text. Photoshop too
-
Best Deepfake Open Source App ROPE — So Easy To Use Full HD Feceswap DeepFace, Tutorials for Windows and Cloud — No GPU Required