TY - JOUR
T1 - AnimeDiff
T2 - Customized Image Generation of Anime Characters Using Diffusion Model
AU - Jiang, Yuqi
AU - Liu, Qiankun
AU - Chen, Dongdong
AU - Yuan, Lu
AU - Fu, Ying
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Due to the unprecedented power of text-to-image diffusion models, customizing these models to generate new concepts has gained increasing attention. Existing works have achieved some success on real-world concepts but fail on anime characters. We empirically find that this low quality stems from the newly introduced identifier text tokens, which are optimized to identify different characters. In this paper, we propose AnimeDiff, which focuses on customized image generation of anime characters. AnimeDiff directly binds anime characters to their names and keeps the embeddings of text tokens unchanged. Furthermore, when composing multiple characters in a single image, the model tends to confuse the properties of those characters. To address this issue, AnimeDiff incorporates a Cut-and-Paste data augmentation strategy that produces multi-character training images by cutting and pasting multiple characters onto background images. Experiments demonstrate the superiority of AnimeDiff over other methods.
AB - Due to the unprecedented power of text-to-image diffusion models, customizing these models to generate new concepts has gained increasing attention. Existing works have achieved some success on real-world concepts but fail on anime characters. We empirically find that this low quality stems from the newly introduced identifier text tokens, which are optimized to identify different characters. In this paper, we propose AnimeDiff, which focuses on customized image generation of anime characters. AnimeDiff directly binds anime characters to their names and keeps the embeddings of text tokens unchanged. Furthermore, when composing multiple characters in a single image, the model tends to confuse the properties of those characters. To address this issue, AnimeDiff incorporates a Cut-and-Paste data augmentation strategy that produces multi-character training images by cutting and pasting multiple characters onto background images. Experiments demonstrate the superiority of AnimeDiff over other methods.
KW - customized image generation
KW - diffusion model
KW - Text-to-image synthesis
UR - http://www.scopus.com/inward/record.url?scp=85198260830&partnerID=8YFLogxK
U2 - 10.1109/TMM.2024.3415357
DO - 10.1109/TMM.2024.3415357
M3 - Article
AN - SCOPUS:85198260830
SN - 1520-9210
VL - 26
SP - 10559
EP - 10572
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -