Title: Latent Interpretation for Diffusion Autoencoders via Integrated Semantic Reconstruction
Authors: Ju, Yixuan; Tan, Xuan; Zhu, Zhenyang; Li, Jiyi; Mao, Xiaoyang
Editors: Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
Date issued: 2025 (accessioned/available 2025-10-07)
ISBN: 978-3-03868-295-0
DOI: https://doi.org/10.2312/pg.20251278
URI: https://diglib.eg.org/handle/10.2312/pg20251278
Pages: 11
License: Attribution 4.0 International License
CCS Concepts: Computing methodologies → Image manipulation

Abstract: Latent interpretation enables controllable image editing by discovering semantic components in the latent space of generative models. While prior works have primarily focused on GANs, their limited inversion capabilities and generation quality hinder their applicability to diverse editing tasks. In this paper, we propose a new framework for latent interpretation on pretrained diffusion autoencoders, combining the editing flexibility of latent-based methods with the generation quality of diffusion models. Our key insight is to perform semantic guidance directly in the latent space, thereby avoiding costly pixel-space feedback and enabling end-to-end training. To this end, we introduce a bidirectional editing strategy and an integrated lightweight semantic autoencoder to effectively constrain semantic directions. Our method enables fine-grained and disentangled manipulation across various image editing tasks, including facial attributes, face pose, and style transfer. Extensive experiments demonstrate state-of-the-art performance in both visual quality and editing disentanglement compared to widely used GAN-based and diffusion-based baselines. To the best of our knowledge, this work represents a novel step toward identifying explicit semantic directions in the latent space of diffusion models, complementing research on latent interpretation beyond GANs toward more flexible and precise image editing. Our code is available at https://github.com/Xenithon/LIDA.
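
The abstract describes editing by moving a semantic latent code along learned directions, constrained by a lightweight semantic autoencoder. The following is a minimal, hedged sketch of that general idea in PyTorch; all class names, shapes, and the `diffae.encode_semantic` / `diffae.decode` calls in the usage comments are illustrative assumptions, not the authors' actual LIDA implementation.

import torch
import torch.nn as nn

class SemanticAutoencoder(nn.Module):
    """Hypothetical lightweight autoencoder over a semantic latent code z.

    A narrow bottleneck like this is one way to constrain the space in which
    semantic editing directions are learned (an assumption, not the paper's spec).
    """
    def __init__(self, latent_dim: int = 512, bottleneck_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(latent_dim, bottleneck_dim), nn.ReLU())
        self.decoder = nn.Linear(bottleneck_dim, latent_dim)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Reconstruct z from its bottleneck representation.
        return self.decoder(self.encoder(z))

def edit_latent(z: torch.Tensor, direction: torch.Tensor, alpha: float) -> torch.Tensor:
    """Shift a semantic latent code along a unit-normalised direction.

    Positive and negative alpha correspond to the two sides of a bidirectional edit
    (e.g. adding vs. removing an attribute).
    """
    d = direction / direction.norm()
    return z + alpha * d

# Usage sketch (hypothetical diffusion-autoencoder interface):
# z            = diffae.encode_semantic(image)          # semantic code of the input image
# z_plus       = edit_latent(z, smile_direction, +2.0)  # strengthen the attribute
# z_minus      = edit_latent(z, smile_direction, -2.0)  # weaken the attribute
# image_edited = diffae.decode(z_plus, x_T)             # decode with the stochastic code x_T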