Latent Interpretation for Diffusion Autoencoders via Integrated Semantic Reconstruction
Date
2025
Publisher
The Eurographics Association
Abstract
Latent interpretation enables controllable image editing by discovering semantic components in the latent space of generative models. While prior works have primarily focused on GANs, their limited inversion capabilities and generation quality hinder their applicability in diverse editing tasks. In this paper, we propose a new framework for latent interpretation on pretrained diffusion autoencoders, combining the editing flexibility of latent-based methods with the generation quality of diffusion models. Our key insight is to perform semantic guidance directly in the latent space, thereby avoiding costly pixel-space feedback and enabling end-to-end training. To this end, we introduce a bidirectional editing strategy and an integrated lightweight semantic autoencoder to effectively constrain semantic directions. Our method enables fine-grained and disentangled manipulation across various image editing tasks, including facial attributes, face pose, and style transfer. Extensive experiments demonstrate state-of-the-art performance in both visual quality and editing disentanglement, compared to widely used GAN-based and diffusion-based baselines. To the best of our knowledge, this work represents a novel step toward identifying explicit semantic directions in the latent space of diffusion models, and it extends research on latent interpretation beyond GANs toward more flexible and precise image editing. Our code is available at https://github.com/Xenithon/LIDA.
Description
CCS Concepts: Computing methodologies → Image manipulation
@inproceedings{10.2312:pg.20251278,
booktitle = {Pacific Graphics Conference Papers, Posters, and Demos},
editor = {Christie, Marc and Han, Ping-Hsuan and Lin, Shih-Syun and Pietroni, Nico and Schneider, Teseo and Tsai, Hsin-Ruey and Wang, Yu-Shuen and Zhang, Eugene},
title = {{Latent Interpretation for Diffusion Autoencoders via Integrated Semantic Reconstruction}},
author = {Ju, Yixuan and Tan, Xuan and Zhu, Zhenyang and Li, Jiyi and Mao, Xiaoyang},
year = {2025},
publisher = {The Eurographics Association},
ISBN = {978-3-03868-295-0},
DOI = {10.2312/pg.20251278}
}
