Title: DViTGAN: Training ViTGANs with Diffusion
Authors: Tong, Mengjun; Rao, Hong; Yang, Wenji; Chen, Shengbo; Zuo, Fang; Chen, Renjie
Editors: Ritschel, Tobias; Whiting, Emily
Date: 2024-10-13
ISBN: 978-3-03868-250-9
DOI: https://doi.org/10.2312/pg.20241305
URL: https://diglib.eg.org/handle/10.2312/pg20241305
Pages: 10 pages
License: Attribution 4.0 International License

Abstract: Recent research findings indicate that injecting noise via diffusion can effectively improve the stability of GANs for image generation tasks. Although ViTGAN, based on the Vision Transformer, offers certain performance advantages over traditional GANs, it still suffers from unstable training and insufficiently rich detail in generated images. In this paper, we therefore propose a novel model, DViTGAN, which leverages a diffusion model to generate instance noise that facilitates ViTGAN training. Specifically, we employ forward diffusion to progressively generate noise following a Gaussian mixture distribution, and then inject the generated noise into the discriminator's input image. The generator incorporates the discriminator's feedback by backpropagating through the forward diffusion process to improve its performance. In addition, we observe that the ViTGAN generator lacks positional information, leading to reduced context-modeling ability and slower convergence. To this end, we introduce Fourier embedding and relative positional encoding to enhance the model's expressive ability. Experiments on multiple popular benchmarks demonstrate the effectiveness of our proposed model.

CCS Concepts: Computing methodologies → Collision detection; Hardware → Sensors and actuators; PCB design and layout
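The abstract's core mechanism, diffusing real or generated images forward before they reach the discriminator so that instance noise follows a mixture of Gaussians over timesteps, can be sketched roughly as follows. This is a minimal NumPy sketch, not the authors' implementation; the linear beta schedule, the function names, and the toy image shapes are all assumptions.

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) from a DDPM-style forward diffusion chain:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

def noisy_discriminator_batch(images, betas, rng):
    """Diffuse each image to a randomly sampled timestep before the
    discriminator sees it, so the injected noise is a Gaussian mixture
    across timesteps rather than a single fixed noise level."""
    ts = rng.integers(0, len(betas), size=len(images))
    return np.stack([forward_diffuse(x, t, betas, rng)
                     for x, t in zip(images, ts)])

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 100)  # assumed linear noise schedule
batch = np.zeros((4, 8, 8))           # toy stand-in for a batch of images
noisy = noisy_discriminator_batch(batch, betas, rng)
```

Because each forward-diffusion step is a differentiable affine map of the input, gradients from the discriminator can be backpropagated through it to the generator, which is how the abstract describes the generator incorporating the discriminator's feedback.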