Inserting Anybody in Diffusion Models via Celeb Basis

NeurIPS 2023 Poster

Ge Yuan¹,² Xiaodong Cun² Yong Zhang² Maomao Li²,* Chenyang Qi³,² Xintao Wang² Ying Shan² Huicheng Zheng¹,*

¹ Sun Yat-sen University ² Tencent AI Lab ³ HKUST

TL;DR: Integrating a unique individual into a pre-trained diffusion model with:

✅ just one facial photograph ✅ only 1024 learnable parameters ✅ 3 minutes of tuning
✅ Textual-Inversion compatibility ✅ Generate and interact with other (new person) concepts

Method

The text embedding space has a useful interpolation property, which inspired us to define a space for human generation.

First, we collect about 1,500 celebrity names as the initial collection. Then, we manually filter this collection down to $m=691$ names, based on the synthesis quality of a text-to-image diffusion model (Stable Diffusion) given the corresponding name prompt. Next, each filtered name is tokenized and encoded into a celeb embedding group $g_i$. Finally, we conduct Principal Component Analysis to build a compact orthogonal basis.
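The final PCA step can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings here are random placeholders standing in for the encoded celebrity names, and the embedding dimension (64), the number of retained components (32), and the helper name `build_basis` are all assumptions chosen for illustration. Only $m=691$ comes from the text above.

```python
import numpy as np

def build_basis(embeddings: np.ndarray, num_components: int):
    """PCA via SVD (hypothetical helper).

    embeddings: array of shape (m, d), one row per name embedding.
    Returns the mean embedding (d,) and an orthonormal basis (num_components, d).
    """
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are orthonormal principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:num_components]

rng = np.random.default_rng(0)
# Placeholder data: 691 filtered names, toy embedding dimension of 64 (assumption).
celeb_embeddings = rng.normal(size=(691, 64))

mean, basis = build_basis(celeb_embeddings, num_components=32)

# The basis rows are mutually orthogonal and unit length.
print(np.allclose(basis @ basis.T, np.eye(32)))
```

Keeping only the leading components is what makes the basis compact: any embedding in the span can then be described by one coefficient per retained component.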

During training (left), we optimize the coefficients of the celeb basis with the help of a fixed face encoder. During inference (right), we combine the learned personalized weights and shared celeb basis to generate images with the input identity.
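The idea that only the basis coefficients are learned, while the basis itself stays shared and fixed, can be sketched as follows. This is a toy illustration under stated assumptions: the real training objective and the fixed face encoder are not reproduced, and a simple squared-error loss against a synthetic target embedding is swapped in as a stand-in. The sizes, the `compose` helper, and the random basis are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 32  # toy embedding dimension and number of basis vectors (assumptions)

# Stand-in for the shared basis: orthonormal rows obtained via QR decomposition.
q, _ = np.linalg.qr(rng.normal(size=(d, k)))
basis = q.T                     # shape (k, d), fixed during training
mean = rng.normal(size=d)       # stand-in for the mean embedding, also fixed

def compose(coeffs: np.ndarray) -> np.ndarray:
    """Combine personalized coefficients with the shared basis (hypothetical helper)."""
    return mean + coeffs @ basis

# Synthetic target embedding that lies in the span of the basis.
target = compose(rng.normal(size=k))

# "Training": only the k coefficients are updated; the basis and mean never change.
coeffs = np.zeros(k)
lr = 0.1
for _ in range(200):
    residual = compose(coeffs) - target
    grad = 2.0 * residual @ basis.T   # gradient of the squared error w.r.t. coeffs
    coeffs -= lr * grad

# "Inference": reuse the learned coefficients with the same shared basis.
identity_embedding = compose(coeffs)
print(np.allclose(identity_embedding, target))
```

Because the basis is shared, a new identity is stored as its coefficient vector alone, which is what keeps the number of learnable parameters small.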

Comparisons using StyleGAN synthetic faces as training samples.

Single-person comparisons using real identities as training samples.

Multiple Persons' Personalization on Real Identities

More Evaluation

Two-person interaction.

Personalization for single person.

Expression control.

BibTeX


        @article{yuan2023celebbasis,
          title={Inserting Anybody in Diffusion Models via Celeb Basis},
          author={Yuan, Ge and Cun, Xiaodong and Zhang, Yong and Li, Maomao and Qi, Chenyang and Wang, Xintao and Shan, Ying and Zheng, Huicheng},
          journal={arXiv preprint arXiv:2306.00926},
          year={2023}
        }