Title: Text-Guided Interactive Scene Synthesis with Scene Prior Guidance
Authors: Fang, Shaoheng; Yang, Haitao; Mooney, Raymond; Huang, Qixing
Editors: Bousseau, Adrien; Day, Angela
Date: 2025-05-09
Year: 2025
ISSN: 1467-8659
DOI: https://doi.org/10.1111/cgf.70039
URI: https://diglib.eg.org/handle/10.1111/cgf70039
Pages: 12
License: Creative Commons Attribution 4.0 International License
CCS Concepts: Computing methodologies → Computer graphics; Natural language processing; Computer systems organization → Neural networks

Abstract: 3D scene synthesis from natural language instructions has become a popular direction in computer graphics, and data-driven generative models have recently made significant progress on it. However, previous methods have mainly focused on one-time scene generation and lack the interactive capability to generate, update, or correct scenes according to user instructions. To overcome this limitation, this paper focuses on text-guided interactive scene synthesis. First, we introduce the SceneMod dataset, which comprises 168k paired scenes with textual descriptions of the modifications between them. To support the interactive scene synthesis task, we propose a two-stage diffusion generative model that integrates scene-prior guidance into the denoising process to explicitly enforce physical constraints and produce more realistic scenes. Experimental results demonstrate that our approach outperforms baseline methods on text-guided scene synthesis tasks. Our system expands the scope of data-driven scene synthesis and provides a novel, more flexible tool for users and designers in 3D scene generation. Code and dataset are available at https://github.com/bshfang/SceneMod.
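
The abstract describes scene-prior guidance only at a high level. As an illustration, the following is a minimal, hypothetical sketch of how a differentiable scene prior (here, a pairwise object-overlap penalty) could be injected into a reverse-diffusion step, in the style of classifier guidance. Everything in it (overlap_penalty, guided_denoise_step, the stand-in denoiser, and the toy schedule) is an assumption for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch: steering a diffusion denoising step with the gradient
# of a physical scene prior. Not the paper's implementation.
import torch

def overlap_penalty(boxes: torch.Tensor) -> torch.Tensor:
    """Differentiable pairwise-overlap penalty for axis-aligned 2D boxes.

    boxes: (N, 4) tensor of (cx, cy, w, h). Returns a scalar that grows as
    objects interpenetrate, standing in for a physical scene prior.
    """
    centers = boxes[:, :2]
    sizes = boxes[:, 2:].abs()  # guard against negative sizes in noisy states
    # Pairwise center distances and combined half-extents per axis.
    diff = (centers[:, None, :] - centers[None, :, :]).abs()
    half = (sizes[:, None, :] + sizes[None, :, :]) / 2
    overlap = torch.clamp(half - diff, min=0.0)   # per-axis overlap extent
    area = overlap[..., 0] * overlap[..., 1]      # pairwise overlap area
    n = boxes.shape[0]
    mask = 1.0 - torch.eye(n)                     # ignore self-pairs
    return (area * mask).sum() / 2

def guided_denoise_step(x_t, t, denoiser, guidance_scale=0.1):
    """One reverse-diffusion step with prior guidance (classifier-guidance style).

    `denoiser(x_t, t)` is assumed to return the model's estimate of the clean
    scene x0; the prior gradient nudges the update away from collisions.
    """
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, t)
    grad = torch.autograd.grad(overlap_penalty(x0_hat), x_t)[0]
    alpha = 1.0 / (t + 1)  # toy schedule, for illustration only
    # Move toward the clean estimate, shifted down the prior gradient.
    x_next = x_t + alpha * (x0_hat - x_t) - guidance_scale * grad
    return x_next.detach()

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(5, 4)                   # five noisy object boxes
    identity_denoiser = lambda x_t, t: x_t  # stand-in for a learned model
    for t in reversed(range(10)):
        x = guided_denoise_step(x, t, identity_denoiser)
    print("final overlap penalty:", overlap_penalty(x).item())
```

In a full system the denoiser would be the learned scene-modification model and the prior could encode richer constraints (floor support, room bounds), but the pattern of steering each denoising update with the gradient of a physical penalty is the same.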