Latent Interpretation for Diffusion Autoencoders via Integrated Semantic Reconstruction

Ju, Yixuan; Tan, Xuan; Zhu, Zhenyang; Li, Jiyi; Mao, Xiaoyang

Files

pg20251278.pdf (56.41 MB)

Date

2025

Authors

Ju, Yixuan
Tan, Xuan
Zhu, Zhenyang
Li, Jiyi
Mao, Xiaoyang

Publisher

The Eurographics Association

Abstract

Latent interpretation enables controllable image editing by discovering semantic components in the latent space of generative models. While prior works have primarily focused on GANs, their limited inversion capabilities and generation quality hinder their applicability in diverse editing tasks. In this paper, we propose a new framework for latent interpretation on pretrained diffusion autoencoders, combining the editing flexibility of latent-based methods with the generation quality of diffusion models. Our key insight is to perform semantic guidance directly in the latent space, thereby avoiding costly pixel-space feedback and enabling end-to-end training. To this end, we introduce a bidirectional editing strategy and an integrated lightweight semantic autoencoder to effectively constrain semantic directions. Our method enables fine-grained and disentangled manipulation across various image editing tasks, including facial attributes, face pose, and style transfer. Extensive experiments demonstrate state-of-the-art performance in both visual quality and editing disentanglement, compared to widely used GAN-based and diffusion-based baselines. To the best of our knowledge, this work represents a novel step toward identifying explicit semantic directions in the latent space of diffusion models, extending research on latent interpretation beyond GANs toward more flexible and precise image editing. Our code is available at https://github.com/Xenithon/LIDA.

CCS Concepts: Computing methodologies → Image manipulation

@inproceedings{10.2312:pg.20251278,
  booktitle = {Pacific Graphics Conference Papers, Posters, and Demos},
  editor = {Christie, Marc and Han, Ping-Hsuan and Lin, Shih-Syun and Pietroni, Nico and Schneider, Teseo and Tsai, Hsin-Ruey and Wang, Yu-Shuen and Zhang, Eugene},
  title = {{Latent Interpretation for Diffusion Autoencoders via Integrated Semantic Reconstruction}},
  author = {Ju, Yixuan and Tan, Xuan and Zhu, Zhenyang and Li, Jiyi and Mao, Xiaoyang},
  year = {2025},
  publisher = {The Eurographics Association},
  ISBN = {978-3-03868-295-0},
  DOI = {10.2312/pg.20251278}
}

URI

https://doi.org/10.2312/pg.20251278
https://diglib.eg.org/handle/10.2312/pg20251278

Collections

PG2025 Conference Papers, Posters, and Demos
