The text embedding space has a useful interpolation property, which inspired us to define an embedding space for human generation.
First, we collect about 1,500 celebrity names as an initial collection. Then, we manually filter this collection down to $m=691$ names. The filtering is based on the quality of the images that the text-to-image diffusion model (Stable Diffusion) synthesizes when prompted with each name. Next, each filtered name is tokenized and encoded into a celeb embedding group $g_i$. Finally, we apply Principal Component Analysis (PCA) to build a compact orthogonal basis.
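The PCA step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It treats one token position of each group $g_i$ as a single $d$-dimensional vector, with $d=768$ assumed (the width of the CLIP text encoder used by Stable Diffusion v1). Random vectors stand in for the real text-encoder outputs. The function name `build_celeb_basis` and the number of components `k=64` are hypothetical choices for illustration.

```python
import numpy as np

def build_celeb_basis(embeddings: np.ndarray, num_components: int):
    """Hypothetical helper: PCA over celeb name embeddings.

    embeddings: (m, d) array, one row per filtered celebrity name.
    Returns (mean, basis), where basis has orthonormal rows
    ordered by decreasing explained variance.
    """
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are the principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:num_components]  # (num_components, d)
    return mean, basis

rng = np.random.default_rng(0)
m, d, k = 691, 768, 64  # m from the text; d and k are assumptions
celeb_embeddings = rng.normal(size=(m, d))  # stand-in for real embeddings
mean, basis = build_celeb_basis(celeb_embeddings, k)
print(basis.shape)                              # (64, 768)
print(np.allclose(basis @ basis.T, np.eye(k)))  # True
```

Taking the SVD of the centered matrix yields the principal directions directly. This avoids forming the $d \times d$ covariance matrix, and the resulting rows are orthonormal by construction.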
During training (left), we optimize the coefficients of the celeb basis with the help of a fixed face encoder. During inference (right), we combine the learned personalized weights with the shared celeb basis to generate images of the input identity.
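The sketch below shows one way the personalized weights and the shared basis could combine into a new identity embedding. It assumes the embedding is the PCA mean plus a weighted sum of basis rows. It also assumes a hypothetical linear head (`face_to_weights`) that maps a frozen face-encoder feature to the coefficients. The 512-dimensional face feature, the random orthonormal stand-in basis, and all function names are illustrative assumptions. The diffusion training loss that would actually fit these weights is omitted.

```python
import numpy as np

def face_to_weights(face_feature: np.ndarray, proj: np.ndarray) -> np.ndarray:
    # Hypothetical trainable head: frozen face feature -> basis coefficients.
    return face_feature @ proj

def combine(mean: np.ndarray, basis: np.ndarray, weights: np.ndarray) -> np.ndarray:
    # Personalized embedding = shared mean + weighted sum of shared basis rows.
    return mean + weights @ basis

def project(mean: np.ndarray, basis: np.ndarray, embedding: np.ndarray) -> np.ndarray:
    # Recover coefficients; valid because the basis rows are orthonormal.
    return (embedding - mean) @ basis.T

rng = np.random.default_rng(1)
d, k, f = 768, 64, 512  # embedding width, basis size, face-feature size (assumed)
q, _ = np.linalg.qr(rng.normal(size=(d, k)))  # (d, k), orthonormal columns
basis = q.T                                   # stand-in for the PCA celeb basis
mean = rng.normal(size=d)

face_feature = rng.normal(size=f)             # stand-in for a face-encoder output
proj = 0.01 * rng.normal(size=(f, k))         # stand-in for learned head weights
weights = face_to_weights(face_feature, proj)
new_id_embedding = combine(mean, basis, weights)
print(new_id_embedding.shape)                                    # (768,)
print(np.allclose(project(mean, basis, new_id_embedding), weights))  # True
```

Because only the `k` coefficients (or the small head producing them) are identity-specific, each new person adds few parameters, while the basis itself is shared across identities.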
@article{yuan2023celebbasis,
title={Inserting Anybody in Diffusion Models via Celeb Basis},
author={Yuan, Ge and Cun, Xiaodong and Zhang, Yong and Li, Maomao and Qi, Chenyang and Wang, Xintao and Shan, Ying and Zheng, Huicheng},
journal={arXiv preprint arXiv:2306.00926},
year={2023}
}