Latent Interpretation for Diffusion Autoencoders via Integrated Semantic Reconstruction
| dc.contributor.author | Ju, Yixuan | en_US |
| dc.contributor.author | Tan, Xuan | en_US |
| dc.contributor.author | Zhu, Zhenyang | en_US |
| dc.contributor.author | Li, Jiyi | en_US |
| dc.contributor.author | Mao, Xiaoyang | en_US |
| dc.contributor.editor | Christie, Marc | en_US |
| dc.contributor.editor | Han, Ping-Hsuan | en_US |
| dc.contributor.editor | Lin, Shih-Syun | en_US |
| dc.contributor.editor | Pietroni, Nico | en_US |
| dc.contributor.editor | Schneider, Teseo | en_US |
| dc.contributor.editor | Tsai, Hsin-Ruey | en_US |
| dc.contributor.editor | Wang, Yu-Shuen | en_US |
| dc.contributor.editor | Zhang, Eugene | en_US |
| dc.date.accessioned | 2025-10-07T06:03:39Z | |
| dc.date.available | 2025-10-07T06:03:39Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | Latent interpretation enables controllable image editing by discovering semantic components in the latent space of generative models. While prior works have primarily focused on GANs, their limited inversion capabilities and generation quality hinder their applicability in diverse editing tasks. In this paper, we propose a new framework for latent interpretation on pretrained diffusion autoencoders, combining the editing flexibility of latent-based methods with the generation quality of diffusion models. Our key insight is to perform semantic guidance directly in the latent space, thereby avoiding costly pixel-space feedback and enabling end-to-end training. To this end, we introduce a bidirectional editing strategy and an integrated lightweight semantic autoencoder to effectively constrain semantic directions. Our method enables fine-grained and disentangled manipulation across various image editing tasks, including facial attributes, face pose, and style transfer. Extensive experiments demonstrate state-of-the-art performance in both visual quality and editing disentanglement, compared to widely-used GAN-based and diffusion-based baselines. To the best of our knowledge, this work represents a novel step toward identifying explicit semantic directions in the latent space of diffusion models, extending research on latent interpretation beyond GANs toward more flexible and precise image editing. Our code is available at https://github.com/Xenithon/LIDA. | en_US |
| dc.description.sectionheaders | Image Creation & Augmentation | |
| dc.description.seriesinformation | Pacific Graphics Conference Papers, Posters, and Demos | |
| dc.identifier.doi | 10.2312/pg.20251278 | |
| dc.identifier.isbn | 978-3-03868-295-0 | |
| dc.identifier.pages | 11 pages | |
| dc.identifier.uri | https://doi.org/10.2312/pg.20251278 | |
| dc.identifier.uri | https://diglib.eg.org/handle/10.2312/pg20251278 | |
| dc.publisher | The Eurographics Association | en_US |
| dc.rights | Attribution 4.0 International License | |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
| dc.subject | CCS Concepts: Computing methodologies → Image manipulation | |
| dc.title | Latent Interpretation for Diffusion Autoencoders via Integrated Semantic Reconstruction | en_US |