Title: Latent Diffusion-GAN: Adversarial Learning in the Autoencoded Latent Space
Authors: Jun, U-Chae; Ko, Jaeeun; Kang, Jiwoo; Masia, Belen; Thies, Justus
Date issued: 2026-04-17
Year: 2026
ISSN: 1467-8659
Handle: https://diglib.eg.org/handle/10.1111/cgf70409
DOI: https://doi.org/10.1111/cgf.70409
License: CC-BY-4.0
Keywords: Computer vision
Extent: 21 pages

Abstract: Diffusion models are powerful generative frameworks for producing high-quality images by denoising latent variables from random noise. However, training with likelihood-based objectives can lead to oversmoothed high-frequency details such as textures and sharp edges. Adversarial training with GANs enhances sharpness but usually requires additional discriminator networks. We propose Latent Diffusion Generative Adversarial Networks (LD-GAN), a framework that integrates adversarial learning into diffusion models without modifying their pipeline. LD-GAN leverages the pretrained variational autoencoder as an energy-based discriminator, enabling adversarial training without extra parameters while preserving the latent priors learned from large datasets. We also introduce a structural consistency energy that aligns encoder and decoder representations, improving perceptual quality. Experiments show improved sample fidelity, sharpness, and diversity across multiple generation tasks while maintaining efficient training dynamics.
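The abstract's central idea, reusing a pretrained autoencoder's reconstruction error as an energy-based discriminator so that no extra discriminator parameters are needed, can be illustrated with a toy NumPy sketch. Everything below is a hypothetical stand-in (a linear encoder/decoder, arbitrary dimensions, a squared-error energy), not the paper's actual architecture or loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "pretrained" linear autoencoder standing in for the frozen VAE:
# encode projects data (dim 8) to a latent (dim 4); decode maps back.
W_enc = rng.standard_normal((4, 8)) * 0.5
W_dec = np.linalg.pinv(W_enc)  # decoder as pseudo-inverse of encoder

def energy(x):
    """Reconstruction energy: low for samples the autoencoder models well,
    high for off-manifold samples. This is the discriminator signal."""
    x_hat = W_dec @ (W_enc @ x)
    return float(np.sum((x - x_hat) ** 2))

# A "real" sample lying in the autoencoder's learned range...
x_real = W_dec @ rng.standard_normal(4)
# ...versus an off-manifold "generated" sample.
x_fake = rng.standard_normal(8)

# Energy-based adversarial signal: a generator would be trained to lower
# energy(x_fake), with no additional discriminator network introduced.
print(energy(x_real) < energy(x_fake))
```

In this toy setting the real sample reconstructs almost exactly (energy near zero) while the random sample does not, which is the separation an energy-based discriminator exploits.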