We present DiffBIR, which leverages pretrained text-to-image diffusion models for the blind image restoration problem. Our framework adopts a two-stage pipeline. In the first stage, we pretrain a restoration module across diversified degradations to improve generalization to real-world scenarios. The second stage leverages the generative ability of latent diffusion models to achieve realistic image restoration. Specifically, we introduce an injective modulation sub-network, LAControlNet, for finetuning, while keeping the pre-trained Stable Diffusion fixed to maintain its generative ability. Finally, we introduce a controllable module that allows users to balance quality and fidelity through latent image guidance in the denoising process during inference. Extensive experiments demonstrate DiffBIR's superiority over state-of-the-art approaches on both blind image super-resolution and blind face restoration, on synthetic as well as real-world datasets.
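The quality-fidelity trade-off of the controllable module can be illustrated with a short sketch of one guided denoising step. This is a minimal, hedged sketch in the style of classifier guidance, assuming a diffusers-style scheduler; `denoiser`, `scale`, and the exact step signature are illustrative assumptions, not DiffBIR's official API.

```python
import torch

def guided_denoise_step(denoiser, scheduler, z_t, t, z_reg, scale=0.1):
    """One denoising step with latent image guidance (hedged sketch).

    z_reg is the VAE-encoded latent of the stage-one output I_reg.
    A larger `scale` pulls samples toward z_reg (higher fidelity);
    a smaller one leaves more room for generation (higher quality).
    """
    z_t = z_t.detach().requires_grad_(True)
    eps = denoiser(z_t, t)  # predicted noise (denoiser assumed differentiable)
    # Predicted clean latent from the current noisy latent (DDPM identity).
    alpha_bar = scheduler.alphas_cumprod[t]
    z0_hat = (z_t - (1 - alpha_bar).sqrt() * eps) / alpha_bar.sqrt()
    # L2 distance to the reference latent steers the sampling trajectory.
    loss = ((z0_hat - z_reg) ** 2).mean()
    grad = torch.autograd.grad(loss, z_t)[0]
    eps_guided = eps.detach() + scale * (1 - alpha_bar).sqrt() * grad
    return scheduler.step(eps_guided, t, z_t.detach()).prev_sample
```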
The two-stage pipeline of DiffBIR: (1) pretrain a Restoration Module (RM) for degradation removal to obtain \(I_{reg}\); (2) leverage the fixed Stable Diffusion through our proposed LAControlNet for realistic image reconstruction and obtain \(I_{diff}\). The RM is trained across diversified degradations in a self-supervised manner and is fixed during stage two. LAControlNet contains a parallel module that is partially initialized with the denoiser's checkpoint, plus several fusion layers. It uses the VAE's encoder to project \(I_{reg}\) into the latent space and concatenates the result with the randomly sampled noisy \(z_t\) as the conditioning mechanism.
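The conditioning path described above can be sketched in a few lines. This is a hedged sketch, not the repository's code: `vae_encoder`, `control_module`, `unet`, and the `added_residuals` keyword are assumed names, and the zero-initialized 1x1 fusion conv follows the ControlNet convention that the caption's fusion layers suggest.

```python
import torch
import torch.nn as nn

def make_fusion_layer(channels: int) -> nn.Conv2d:
    """Zero-initialized 1x1 conv: at the start of finetuning the control
    branch contributes nothing, so training begins from the frozen UNet's
    behavior (assumption: fusion layers follow ControlNet's zero-conv)."""
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

def lacontrolnet_forward(z_t, t, i_reg, vae_encoder, control_module, unet):
    """Hedged sketch of the conditioning mechanism in the caption above."""
    z_reg = vae_encoder(i_reg)                   # project I_reg to latent space
    control_in = torch.cat([z_t, z_reg], dim=1)  # concat with noisy latent z_t
    residuals = control_module(control_in, t)    # parallel-module features
    # The frozen Stable Diffusion UNet consumes the fused residuals
    # (`added_residuals` is an illustrative keyword, not a real API).
    return unet(z_t, t, added_residuals=residuals)
```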
@article{2023diffbir,
  author  = {Xinqi Lin and Jingwen He and Ziyan Chen and Zhaoyang Lyu and Ben Fei and Bo Dai and Wanli Ouyang and Yu Qiao and Chao Dong},
  title   = {DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior},
  journal = {arXiv preprint},
  year    = {2023},
}