Latent Interpretation for Diffusion Autoencoders via Integrated Semantic Reconstruction
| dc.contributor.author | Ju, Yixuan | en_US |
| dc.contributor.author | Tan, Xuan | en_US |
| dc.contributor.author | Zhu, Zhenyang | en_US |
| dc.contributor.author | Li, Jiyi | en_US |
| dc.contributor.author | Mao, Xiaoyang | en_US |
| dc.contributor.editor | Christie, Marc | en_US |
| dc.contributor.editor | Han, Ping-Hsuan | en_US |
| dc.contributor.editor | Lin, Shih-Syun | en_US |
| dc.contributor.editor | Pietroni, Nico | en_US |
| dc.contributor.editor | Schneider, Teseo | en_US |
| dc.contributor.editor | Tsai, Hsin-Ruey | en_US |
| dc.contributor.editor | Wang, Yu-Shuen | en_US |
| dc.contributor.editor | Zhang, Eugene | en_US |
| dc.date.accessioned | 2025-10-07T06:03:39Z | |
| dc.date.available | 2025-10-07T06:03:39Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | Latent interpretation enables controllable image editing by discovering semantic components in the latent space of generative models. While prior works have primarily focused on GANs, their limited inversion capabilities and generation quality hinder their applicability in diverse editing tasks. In this paper, we propose a new framework for latent interpretation on pretrained diffusion autoencoders, combining the editing flexibility of latent-based methods with the generation quality of diffusion models. Our key insight is to perform semantic guidance directly in the latent space, thereby avoiding costly pixel-space feedback and enabling end-to-end training. To this end, we introduce a bidirectional editing strategy and an integrated lightweight semantic autoencoder to effectively constrain semantic directions. Our method enables fine-grained and disentangled manipulation across various image editing tasks, including facial attributes, face pose, and style transfer. Extensive experiments demonstrate state-of-the-art performance in both visual quality and editing disentanglement, compared to widely-used GAN-based and diffusion-based baselines. To the best of our knowledge, this work represents a novel step toward identifying explicit semantic directions in the latent space of diffusion models, extending research on latent interpretation beyond GANs toward more flexible and precise image editing. Our code is available at https://github.com/Xenithon/LIDA. | en_US |
| dc.description.sectionheaders | Image Creation & Augmentation | |
| dc.description.seriesinformation | Pacific Graphics Conference Papers, Posters, and Demos | |
| dc.identifier.doi | 10.2312/pg.20251278 | |
| dc.identifier.isbn | 978-3-03868-295-0 | |
| dc.identifier.pages | 11 pages | |
| dc.identifier.uri | https://doi.org/10.2312/pg.20251278 | |
| dc.identifier.uri | https://diglib.eg.org/handle/10.2312/pg20251278 | |
| dc.publisher | The Eurographics Association | en_US |
| dc.rights | Attribution 4.0 International License | |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
| dc.subject | CCS Concepts: Computing methodologies → Image manipulation | |
| dc.title | Latent Interpretation for Diffusion Autoencoders via Integrated Semantic Reconstruction | en_US |