Latent Interpretation for Diffusion Autoencoders via Integrated Semantic Reconstruction

dc.contributor.authorJu, Yixuanen_US
dc.contributor.authorTan, Xuanen_US
dc.contributor.authorZhu, Zhenyangen_US
dc.contributor.authorLi, Jiyien_US
dc.contributor.authorMao, Xiaoyangen_US
dc.contributor.editorChristie, Marcen_US
dc.contributor.editorHan, Ping-Hsuanen_US
dc.contributor.editorLin, Shih-Syunen_US
dc.contributor.editorPietroni, Nicoen_US
dc.contributor.editorSchneider, Teseoen_US
dc.contributor.editorTsai, Hsin-Rueyen_US
dc.contributor.editorWang, Yu-Shuenen_US
dc.contributor.editorZhang, Eugeneen_US
dc.date.accessioned2025-10-07T06:03:39Z
dc.date.available2025-10-07T06:03:39Z
dc.date.issued2025
dc.description.abstractLatent interpretation enables controllable image editing by discovering semantic components in the latent space of generative models. While prior works have primarily focused on GANs, their limited inversion capabilities and generation quality hinder their applicability to diverse editing tasks. In this paper, we propose a new framework for latent interpretation on pretrained diffusion autoencoders, combining the editing flexibility of latent-based methods with the generation quality of diffusion models. Our key insight is to perform semantic guidance directly in the latent space, thereby avoiding costly pixel-space feedback and enabling end-to-end training. To this end, we introduce a bidirectional editing strategy and an integrated lightweight semantic autoencoder to effectively constrain semantic directions. Our method enables fine-grained and disentangled manipulation across various image editing tasks, including facial attributes, face pose, and style transfer. Extensive experiments demonstrate state-of-the-art performance in both visual quality and editing disentanglement compared to widely used GAN-based and diffusion-based baselines. To the best of our knowledge, this work represents a novel step toward identifying explicit semantic directions in the latent space of diffusion models, extending research on latent interpretation beyond GANs toward more flexible and precise image editing. Our code is available at https://github.com/Xenithon/LIDA.en_US
dc.description.sectionheadersImage Creation & Augmentation
dc.description.seriesinformationPacific Graphics Conference Papers, Posters, and Demos
dc.identifier.doi10.2312/pg.20251278
dc.identifier.isbn978-3-03868-295-0
dc.identifier.pages11 pages
dc.identifier.urihttps://doi.org/10.2312/pg.20251278
dc.identifier.urihttps://diglib.eg.org/handle/10.2312/pg20251278
dc.publisherThe Eurographics Associationen_US
dc.rightsAttribution 4.0 International License
dc.rights.urihttps://creativecommons.org/licenses/by/4.0/
dc.subjectCCS Concepts: Computing methodologies → Image manipulation
dc.titleLatent Interpretation for Diffusion Autoencoders via Integrated Semantic Reconstructionen_US
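The abstract above outlines the core mechanism: semantic directions are applied directly in the latent space of a pretrained diffusion autoencoder, edits are pushed in both directions along each direction, and a lightweight semantic autoencoder constrains the edited latents. Below is a minimal illustrative sketch of that general idea in PyTorch; the names (SemanticAutoencoder, edit_bidirectional), the latent dimension, and the linear architecture are all assumptions for illustration, not the paper's actual implementation (see https://github.com/Xenithon/LIDA for the real code).

import torch
import torch.nn as nn

class SemanticAutoencoder(nn.Module):
    """Hypothetical lightweight autoencoder over the semantic latent z.
    Reconstructing edited latents can act as a constraint that keeps
    edits on the semantic manifold (an assumption, not the paper's design)."""
    def __init__(self, latent_dim: int = 512, bottleneck: int = 64):
        super().__init__()
        self.encoder = nn.Linear(latent_dim, bottleneck)
        self.decoder = nn.Linear(bottleneck, latent_dim)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.decoder(torch.relu(self.encoder(z)))

def edit_bidirectional(z: torch.Tensor, direction: torch.Tensor, alpha: float):
    """Shift a latent code along a unit-normalized semantic direction
    in both the positive and negative sense."""
    d = direction / direction.norm()
    return z + alpha * d, z - alpha * d

# Toy usage: one training step of a reconstruction constraint on edited latents.
latent_dim = 512
z = torch.randn(8, latent_dim)                      # stand-in for encoder outputs
direction = nn.Parameter(torch.randn(latent_dim))   # learnable semantic direction
ae = SemanticAutoencoder(latent_dim)
opt = torch.optim.Adam(list(ae.parameters()) + [direction], lr=1e-3)

z_pos, z_neg = edit_bidirectional(z, direction, alpha=3.0)
loss = ((ae(z_pos) - z_pos) ** 2).mean() + ((ae(z_neg) - z_neg) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()

Because both the edit and the constraint live in latent space, no decoding to pixels is needed during training, which is the efficiency point the abstract highlights.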
Files
Original bundle
Name: pg20251278.pdf
Size: 56.41 MB
Format: Adobe Portable Document Format