Title: PBR-Inspired Controllable Diffusion for Image Generation
Authors: Xue, Bowen; Guarnera, Giuseppe Claudio; Zhao, Shuang; Montazeri, Zahra; Masia, Belen; Thies, Justus
Date: 2026-04-20
Year: 2026
ISSN: 1467-8659
URL: https://diglib.eg.org/handle/10.1111/cgf70329
DOI: 10.1111/cgf.70329 (https://doi.org/10.1111/cgf.70329)
License: CC-BY-4.0
Pages: 13

Abstract:
Despite recent advances in text-to-image generation, controlling geometric layout and PBR material properties in synthesized scenes remains challenging. We present a pipeline that first produces a G-buffer (albedo, normals, depth, roughness, shading, and metallic) from a text prompt and then renders a final image through a PBR-inspired branch network. This intermediate representation enables fine-grained control: users can copy and paste within specific G-buffer channels to insert or reposition objects, or apply masks to the irradiance channel to adjust lighting locally. As a result, real objects can be seamlessly integrated into virtual scenes. By separating user-friendly scene description from image rendering, our method offers a practical balance between detailed post-generation control and efficient text-driven synthesis. We demonstrate its effectiveness through quantitative evaluations and a user study with 156 participants, showing consistent human preference over strong baselines and confirming that G-buffer control extends the flexibility of text-guided image generation.

CCS Concepts: Computing methodologies → Reflectance modeling; Image-based rendering