42-Issue 7
Browsing 42-Issue 7 by Issue Date
Now showing 1-20 of 57
Item Neural Impostor: Editing Neural Radiance Fields with Explicit Shape Manipulation (The Eurographics Association and John Wiley & Sons Ltd., 2023) Liu, Ruiyang; Xiang, Jinxu; Zhao, Bowen; Zhang, Ran; Yu, Jingyi; Zheng, Changxi; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Neural Radiance Fields (NeRF) have significantly advanced the generation of highly realistic and expressive 3D scenes. However, editing NeRF, particularly modifying its geometry, poses a significant challenge that has obstructed NeRF's wider adoption across applications. To tackle the problem of efficiently editing neural implicit fields, we introduce Neural Impostor, a hybrid representation that pairs an explicit tetrahedral mesh with a multigrid implicit field designated for each tetrahedron of the mesh. Our framework bridges explicit shape manipulation and geometric editing of implicit fields through multigrid barycentric coordinate encoding, offering a pragmatic solution for deforming, compositing, and generating neural implicit fields while maintaining a complex volumetric appearance. Furthermore, we propose a comprehensive pipeline for editing neural implicit fields based on a set of explicit geometric editing operations. We show the robustness and adaptability of our system through diverse examples and experiments, including the editing of both synthetic objects and real captured data. Finally, we demonstrate the authoring of a hybrid synthetic-captured object using a variety of editing operations, underlining the transformative potential of Neural Impostor in 3D content creation and manipulation.
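The encoding above builds on barycentric coordinates within tetrahedra. A minimal sketch of the underlying computation, not the paper's multigrid encoding itself, with an illustrative function name:

```python
import numpy as np

def tet_barycentric(p, v0, v1, v2, v3):
    """Barycentric coordinates of point p in tetrahedron (v0..v3).

    Solves [v1-v0, v2-v0, v3-v0] @ [b1, b2, b3] = p - v0,
    with b0 = 1 - b1 - b2 - b3.
    """
    T = np.column_stack([v1 - v0, v2 - v0, v3 - v0])  # 3x3 edge matrix
    b123 = np.linalg.solve(T, p - v0)
    return np.concatenate([[1.0 - b123.sum()], b123])

# A point at the centroid has all four coordinates equal to 1/4.
v = [np.array(x, float) for x in [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]]
p = sum(v) / 4.0
print(tet_barycentric(p, *v))  # -> [0.25 0.25 0.25 0.25]
```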
Item IBL-NeRF: Image-Based Lighting Formulation of Neural Radiance Fields (The Eurographics Association and John Wiley & Sons Ltd., 2023) Choi, Changwoon; Kim, Juhyeon; Kim, Young Min; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
We propose IBL-NeRF, which decomposes the neural radiance fields (NeRF) of large-scale indoor scenes into intrinsic components. Recent approaches further decompose the baked radiance of the implicit volume into intrinsic components so that the rendering equation can be partially approximated. However, they are limited to isolated objects under a shared environment lighting and suffer from the computational burden of aggregating rays with Monte Carlo integration. In contrast, our prefiltered radiance field extends the original NeRF formulation to capture the spatial variation of lighting within the scene volume, in addition to surface properties. Specifically, scenes of diverse materials are decomposed into the intrinsic components for rendering, namely albedo, roughness, surface normal, irradiance, and prefiltered radiance. All of the components are inferred as neural images from an MLP, which can model large-scale general scenes. In particular, the prefiltered radiance effectively models the volumetric light field and captures spatial variation beyond a single environment light. The prefiltering aggregates rays over a set of predefined neighborhood sizes so that the costly Monte Carlo integration of global illumination can be replaced with a simple query into a neural image. By adopting NeRF, our approach inherits superior visual quality and multi-view consistency for the synthesized images as well as the intrinsic components. We demonstrate the performance on scenes with complex object layouts and light configurations that could not be processed by any previous work.

Item Groupwise Shape Correspondence Refinement with a Region of Interest Focus (The Eurographics Association and John Wiley & Sons Ltd., 2023) Galmiche, Pierre; Seo, Hyewon; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
While collections of scan shapes are becoming more prevalent in many real-world applications, finding accurate and dense correspondences across multiple shapes remains a challenging task. In this work, we introduce a new approach for refining non-rigid correspondences among a collection of 3D shapes undergoing non-rigid deformation. Our approach incorporates a Region Of Interest (ROI) into the refinement process, specified by the user on one shape within the collection. Based on the functional map framework, and more specifically on the notion of cycle consistency, our formulation improves the overall matching consistency while prioritizing that of the region of interest. Specifically, the initial pairwise correspondences are refined by first defining localized harmonics confined within the transferred ROI on each shape, and then applying the CCLB (Canonical Consistent Latent Basis) framework to both the global and the localized harmonics. This leads to enhanced matching accuracy for both the ROIs and the overall shapes across the collection. We evaluate our method on various synthetic and real scan datasets, in comparison with state-of-the-art techniques.
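The refinement above hinges on cycle consistency of functional maps: maps composed around a cycle of shapes should return to the identity. A toy sketch of that consistency measure, where the matrix sizes and the three-shape cycle are assumptions rather than the paper's CCLB formulation:

```python
import numpy as np

def cycle_consistency_error(C_ij, C_jk, C_ki):
    """Deviation of a 3-cycle of functional maps from the identity.

    For perfectly consistent maps, C_ki @ C_jk @ C_ij equals I.
    """
    k = C_ij.shape[0]
    composed = C_ki @ C_jk @ C_ij
    return np.linalg.norm(composed - np.eye(k), "fro")

# Consistent toy example: build two random maps, then close the cycle.
rng = np.random.default_rng(0)
A = np.linalg.qr(rng.normal(size=(20, 20)))[0]  # random orthogonal map
B = np.linalg.qr(rng.normal(size=(20, 20)))[0]
C = np.linalg.inv(B @ A)                        # closes the cycle exactly
print(cycle_consistency_error(A, B, C))         # ~0
```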
Item SVBRDF Reconstruction by Transferring Lighting Knowledge (The Eurographics Association and John Wiley & Sons Ltd., 2023) Zhu, Pengfei; Lai, Shuichang; Chen, Mufan; Guo, Jie; Liu, Yifan; Guo, Yanwen; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
The problem of reconstructing spatially-varying BRDFs from RGB images has been studied for decades. Researchers have faced a dilemma: opting either for higher quality at the inconvenience of camera and light calibration, or for greater convenience at the expense of quality by forgoing complex setups. We address this challenge by introducing a two-branch network to learn the lighting effects in images. The two branches, referred to as Light-known and Light-aware, diverge in their need for light information. The Light-aware branch is guided by the Light-known branch to acquire the knowledge of discerning light effects and surface reflectance properties, but without relying on light positions. Both branches are trained on a synthetic dataset, but during testing on real-world cases without calibration, only the Light-aware branch is activated. To utilize various light conditions more effectively, we employ gated recurrent units (GRUs) to fuse the features extracted from different images. The two modules mutually benefit when multiple inputs are provided. We present reconstruction results on both synthetic and real-world examples, demonstrating high quality while remaining lightweight in comparison to previous methods.
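The abstract describes fusing features from a variable number of input images with GRUs. A minimal PyTorch sketch of that fusion pattern; the feature dimensions and module layout are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class GRUFeatureFusion(nn.Module):
    """Fuse per-image feature vectors of a variable-length capture set."""
    def __init__(self, feat_dim=256, hidden_dim=256):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)

    def forward(self, feats):        # feats: (batch, n_images, feat_dim)
        _, h_n = self.gru(feats)     # h_n: (1, batch, hidden_dim)
        return h_n.squeeze(0)        # one fused feature per capture set

fusion = GRUFeatureFusion()
feats = torch.randn(4, 5, 256)       # 4 capture sets, 5 images each
print(fusion(feats).shape)           # torch.Size([4, 256])
```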
Item A Surface Subdivision Scheme Based on Four-Directional S^1_3 Non-Box Splines (The Eurographics Association and John Wiley & Sons Ltd., 2023) Huang, Zhangjin; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
In this paper, we propose a novel surface subdivision scheme called non-box subdivision, generalized from four-directional S^1_3 non-box splines. The resulting subdivision surfaces achieve C^1 continuity with the convex hull property. This scheme can be regarded as either a four-directional subdivision or a special quadrilateral subdivision. When used as a quadrilateral subdivision, the proposed scheme can control the shape of the limit surface more flexibly than traditional schemes thanks to the natural introduction of auxiliary face control vertices.

Item Multi-scale Iterative Model-guided Unfolding Network for NLOS Reconstruction (The Eurographics Association and John Wiley & Sons Ltd., 2023) Su, Xiongfei; Hong, Yu; Ye, Juntian; Xu, Feihu; Yuan, Xin; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Non-line-of-sight (NLOS) imaging can reconstruct hidden objects by analyzing the diffuse reflections off relay surfaces, with potential applications in autonomous driving, medical imaging, and national defense. Despite the challenges of low signal-to-noise ratio (SNR) and ill-conditioned reconstruction, NLOS imaging has developed rapidly in recent years. While deep neural networks have achieved impressive success in NLOS imaging, most lack flexibility when dealing with multiple spatio-temporal resolutions and multi-scene images in practical applications. To bridge the gap between learning methods and physical priors, we present a novel end-to-end Multi-scale Iterative Model-guided Unfolding (MIMU) network with superior performance and strong flexibility. Furthermore, we overcome the lack of real training data with a general architecture that can be trained in simulation. Unlike existing encoder-decoder architectures and generative adversarial networks, the proposed method allows a single trained model to adapt to various dimensions, such as different temporal sampling resolutions, different spatial resolutions, and multiple channels for color scenes. Simulation and real-data experiments verify that the proposed method achieves better reconstruction results, both qualitatively and quantitatively, than existing methods.

Item Robust Novel View Synthesis with Color Transform Module (The Eurographics Association and John Wiley & Sons Ltd., 2023) Kim, Sang Min; Choi, Changwoon; Heo, Hyeongjun; Kim, Young Min; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
The advancements of the Neural Radiance Field (NeRF) and its variants have demonstrated remarkable capabilities in generating photo-realistic novel views from a small set of input images. While recent works suggest various techniques and model architectures that enhance speed or reconstruction quality, little attention has been paid to exploring the RGB color space of the input images. In this paper, we propose a universal color transform module that maximally harnesses the captured evidence for the neural network at hand. The color transform module uses an encoder-decoder framework that maps the RGB color space into a new latent space, enhancing the expressiveness of the input domain. We attach the encoder and the decoder at the input and output of a NeRF model of choice, respectively, and jointly optimize them to maintain the cycle consistency of the proposed transform, in addition to minimizing the reconstruction errors in the feature domain. Our comprehensive experiments demonstrate that the learned color space can significantly improve the quality of reconstructions compared to the conventional RGB representation. The benefits are particularly pronounced in challenging scenarios with low-light environments and low-textured regions. The proposed color transform pushes the boundaries of the input domain and offers a promising avenue for advancing the reconstruction capabilities of various neural representations. Source code is available at https://github.com/sangminkim-99/ColorTransformModule.
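A schematic of the cycle-consistency term described in the abstract: an encoder maps RGB to a latent color space, a decoder maps it back, and the round trip should reproduce the input. Network sizes and the latent dimension are assumptions:

```python
import torch
import torch.nn as nn

def mlp(d_in, d_out, width=64):
    return nn.Sequential(nn.Linear(d_in, width), nn.ReLU(),
                         nn.Linear(width, d_out))

encoder = mlp(3, 8)   # RGB -> latent color space (dims are assumptions)
decoder = mlp(8, 3)   # latent color space -> RGB

rgb = torch.rand(1024, 3)                     # sampled pixel colors
latent = encoder(rgb)
cycle_loss = (decoder(latent) - rgb).pow(2).mean()
# During training, this term would be added to the NeRF reconstruction
# loss, which the abstract states is computed in the feature domain.
print(cycle_loss.item())
```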
Item Efficient Caustics Rendering via Spatial and Temporal Path Reuse (The Eurographics Association and John Wiley & Sons Ltd., 2023) Xu, Xiaofeng; Wang, Lu; Wang, Beibei; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Caustics are complex optical effects caused by light being concentrated in a small area through reflection or refraction on surfaces with low roughness, typically under a sharp light source. Rendering caustics is challenging for Monte Carlo-based approaches because of the difficulty of sampling specular paths. One effective solution is to use the specular manifold to locate these valid specular paths. Unfortunately, many iterations are needed to find such paths, leading to long rendering times. To address this issue, our key insight is that specular paths tend to be similar for neighboring shading points. We therefore propose to reuse specular paths spatially: we generate specular path samples at a low sample rate and then reuse them as initializations for specular manifold walks at neighboring shading points. In this way, far fewer path-searching iterations are performed, since the initialization is already close to the final solution. Furthermore, this reuse strategy extends temporally to dynamic scenes, such as a moving light or deforming specular geometry. Our method outperforms current state-of-the-art methods and can handle multiple bounces of light and a variety of scenes.

Item MOVIN: Real-time Motion Capture using a Single LiDAR (The Eurographics Association and John Wiley & Sons Ltd., 2023) Jang, Deok-Kyeong; Yang, Dongseok; Jang, Deok-Yun; Choi, Byeoli; Jin, Taeil; Lee, Sung-Hee; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Recent advancements in technology have brought forth new forms of interactive applications, such as the social metaverse, where end users interact with each other through their virtual avatars. In such applications, precise full-body tracking is essential for an immersive experience and a sense of embodiment with the virtual avatar. However, current motion capture systems are not easily accessible to end users due to their high cost, the special skills required to operate them, or the discomfort associated with wearable devices. In this paper, we present MOVIN, a data-driven generative method for real-time motion capture with global tracking, using a single LiDAR sensor. Our autoregressive conditional variational autoencoder (CVAE) model learns the distribution of pose variations conditioned on the 3D point cloud from the LiDAR. As a central factor for high-accuracy motion capture, we propose a novel feature encoder that learns the correlation between historical 3D point cloud data and global and local pose features, resulting in effective learning of the pose prior. Global pose features include root translation, rotation, and foot contacts, while local features comprise joint positions and rotations. A pose generator then takes the sampled latent variable along with the features from the previous frame to generate a plausible current pose. Our framework accurately predicts the performer's global 3D information and local joint details while effectively handling temporally coherent movements across frames. We demonstrate the effectiveness of our architecture through quantitative and qualitative evaluations against state-of-the-art methods, and we implement a real-time application to showcase our method in real-world scenarios. The MOVIN dataset is available at https://movin3d.github.io/movin_pg2023/.
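The pose generator described above consumes a sampled latent variable plus conditioning features. A schematic sketch of such a CVAE decoding step; all dimensions, names, and the module layout are assumptions, not the MOVIN architecture:

```python
import torch
import torch.nn as nn

class PoseDecoder(nn.Module):
    """Schematic CVAE decoder: latent + condition -> current pose."""
    def __init__(self, z_dim=32, cond_dim=256, pose_dim=135):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + cond_dim, 512), nn.ReLU(),
                                 nn.Linear(512, pose_dim))

    def forward(self, z, cond):
        return self.net(torch.cat([z, cond], dim=-1))

decoder = PoseDecoder()
cond = torch.randn(1, 256)   # fused LiDAR + previous-frame features (assumed)
z = torch.randn(1, 32)       # latent sampled from the learned prior
pose = decoder(z, cond)      # e.g. root, joint rotations/positions, contacts
print(pose.shape)
```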
Item DAFNet: Generating Diverse Actions for Furniture Interaction by Learning Conditional Pose Distribution (The Eurographics Association and John Wiley & Sons Ltd., 2023) Jin, Taeil; Lee, Sung-Hee; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
We present DAFNet, a novel data-driven framework capable of generating various actions for indoor environment interactions. Taking desired root and upper-body poses as control inputs, DAFNet generates whole-body poses suitable for furniture of various shapes and combinations. To enable the generation of diverse actions, we introduce an action predictor that automatically infers the probabilities of individual action types based on the control input and the environment. The action predictor is learned in an unsupervised manner by training a Gaussian Mixture Variational Autoencoder (GMVAE). Additionally, we propose a two-part normalizing-flow-based pose generator that sequentially generates upper- and lower-body poses. This two-part model improves motion quality and the accuracy of condition satisfaction over a single model generating the whole body. Our experiments show that DAFNet can create continuous character motion for indoor scene scenarios, and both qualitative and quantitative evaluations demonstrate the effectiveness of our framework.

Item D-Cloth: Skinning-based Cloth Dynamic Prediction with a Three-stage Network (The Eurographics Association and John Wiley & Sons Ltd., 2023) Li, Yu Di; Tang, Min; Chen, Xiao Rui; Yang, Yun; Tong, Ruo Feng; An, Bai Lin; Yang, Shuang Cai; Li, Yao; Kou, Qi Long; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
We propose a three-stage network that utilizes a skinning-based model to accurately predict dynamic cloth deformation. Our approach decomposes cloth deformation into three distinct components: a static component, a coarse dynamic component, and a wrinkle dynamic component, and trains the three stages of the network accordingly. In the first stage, the static component is predicted by constructing a static skinning model that incorporates learned joint increments and skinning weight increments. In the second stage, the coarse dynamic component is added to the static skinning model by incorporating serialized skeleton information. Finally, in the third stage, the prediction is refined with the wrinkle dynamic component using serialized mesh information. We have implemented our network in a Unity game scene, enabling real-time prediction of cloth dynamics, with prediction times of approximately 3.65 ms on an NVIDIA GeForce RTX 3090 GPU and 9.66 ms on an Intel i7-7700 CPU. Compared to state-of-the-art methods, our network excels at accurately capturing fine dynamic cloth deformations.
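The static stage above presumably builds on standard linear blend skinning (LBS), with the learned joint and weight increments applied on top. A sketch of plain LBS for reference; the increments themselves are not modeled here:

```python
import numpy as np

def linear_blend_skinning(rest_verts, weights, bone_transforms):
    """Standard LBS: v' = sum_j w_j * (R_j v + t_j).

    rest_verts: (V, 3), weights: (V, J), bone_transforms: (J, 4, 4).
    """
    V = rest_verts.shape[0]
    homo = np.concatenate([rest_verts, np.ones((V, 1))], axis=1)  # (V, 4)
    per_bone = np.einsum("jab,vb->jva", bone_transforms, homo)    # (J, V, 4)
    blended = np.einsum("vj,jva->va", weights, per_bone)          # (V, 4)
    return blended[:, :3]

verts = np.random.rand(100, 3)
w = np.random.rand(100, 2)
w /= w.sum(1, keepdims=True)            # weights sum to 1 per vertex
T = np.stack([np.eye(4), np.eye(4)])
T[1, :3, 3] = [0.0, 1.0, 0.0]           # second bone translates upward
print(linear_blend_skinning(verts, w, T).shape)  # (100, 3)
```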
Item Precomputed Radiative Heat Transport for Efficient Thermal Simulation (The Eurographics Association and John Wiley & Sons Ltd., 2023) Freude, Christian; Hahn, David; Rist, Florian; Lipp, Lukas; Wimmer, Michael; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Architectural design and urban planning are complex design tasks. Predicting the thermal impact of design choices at interactive rates enhances the ability of designers to improve energy efficiency and avoid problematic heat islands while maintaining design quality. We show how to use and adapt methods from computer graphics to efficiently simulate heat transfer via thermal radiation, thereby improving user guidance in the early design phase of large-scale construction projects and helping to increase energy efficiency and outdoor comfort. Our method combines a hardware-accelerated photon tracing approach with a carefully selected finite element discretization, inspired by precomputed radiance transfer. This combination allows us to precompute a radiative transport operator, which we then use to rapidly solve either steady-state or transient heat transport throughout the entire scene. Our formulation integrates time-dependent solar irradiation data without requiring changes to the transport operator, allowing us to quickly analyze many different scenarios such as common weather patterns, monthly or yearly averages, or transient simulations spanning multiple days or weeks. We show how our approach can be used in interactive design workflows such as city planning, via fast feedback in the early design phase.
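A toy illustration of why precomputing a transport operator pays off: once a linear operator is fixed, many emission scenarios can be solved cheaply against it. The operator here is a random stand-in, not a physically derived one, and the fixed-point solve is only one of several ways to use such an operator:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                                    # number of surface elements
T = rng.random((n, n))
T /= T.sum(1, keepdims=True) * 1.25        # row sums 0.8 < 1: converges

def steady_state(emission, T, iters=200):
    """Solve x = e + T x by fixed-point iteration (Neumann series)."""
    x = emission.copy()
    for _ in range(iters):
        x = emission + T @ x
    return x

# The same precomputed T is reused across many emission scenarios,
# e.g. solar irradiation at different times of day (placeholder data).
for hour in (9, 12, 15):
    e = rng.random(n) * hour / 12.0
    print(hour, steady_state(e, T).mean())
```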
Item Sharing Model Framework for Zero-Shot Sketch-Based Image Retrieval (The Eurographics Association and John Wiley & Sons Ltd., 2023) Ho, Yi-Hsuan; Way, Der-Lor; Shih, Zen-Chung; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Sketch-based image retrieval (SBIR) is an emerging task in computer vision, and research interest has arisen in solving it under the realistic and challenging setting of zero-shot learning. Given a sketch as a query, the goal is to retrieve the corresponding photographs in a zero-shot scenario. In this paper, we divide this challenging problem into three tasks and propose a sharing-model framework that addresses them. First, the shared weights of the proposed model effectively reduce the modality gap between sketches and photographs. Second, semantic information is used to handle the different label spaces of the training and testing stages; the sketch and photograph domains share this semantic information. Finally, a memory mechanism reduces the intrinsic variety among sketches of the same class, with sketches and photographs dominating the embeddings in turn. Because sketches are not limited by language, our ultimate goal is a method that can replace text search. We also designed a demonstration program to show the use of the proposed method in real-world applications. Our results indicate that the proposed method exhibits considerably higher zero-shot SBIR performance than other state-of-the-art methods on the challenging Sketchy, TU-Berlin, and QuickDraw datasets.
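Once sketches and photographs live in a shared embedding space, retrieval typically reduces to nearest-neighbor search under cosine similarity. A sketch of that final step with random stand-in embeddings; the paper's actual retrieval specifics are not given in the abstract:

```python
import numpy as np

def retrieve(sketch_emb, photo_embs, k=5):
    """Rank gallery photos by cosine similarity to a sketch embedding."""
    s = sketch_emb / np.linalg.norm(sketch_emb)
    p = photo_embs / np.linalg.norm(photo_embs, axis=1, keepdims=True)
    scores = p @ s
    top = np.argsort(-scores)[:k]
    return top, scores[top]

rng = np.random.default_rng(0)
sketch = rng.normal(size=128)            # embedding from the shared model
photos = rng.normal(size=(1000, 128))    # pre-embedded photo gallery
idx, scores = retrieve(sketch, photos)
print(idx, scores)
```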
Item Generating Parametric BRDFs from Natural Language Descriptions (The Eurographics Association and John Wiley & Sons Ltd., 2023) Memery, Sean; Cedron, Osmar; Subr, Kartic; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Artistic authoring of 3D environments is a laborious enterprise that also requires skilled content creators. There have been impressive improvements in using machine learning to address different aspects of generating 3D content, such as generating meshes, arranging geometry, and synthesizing textures. In this paper we develop a model that generates Bidirectional Reflectance Distribution Functions (BRDFs) from descriptive textual prompts. BRDFs are four-dimensional probability distributions that characterize the interaction of light with surface materials. They are represented either parametrically, which lends itself to artistic editing, or by tabulating the probability density associated with every pair of incident and outgoing angles, which is used when measuring the appearance of real materials. Whereas numerous works have focused on hypothesizing BRDF models from images of materials, we learn a mapping from textual descriptions of materials to parametric BRDFs. Our model is first trained using a semi-supervised approach and then tuned via an unsupervised scheme. Although our model is general, in this paper we specifically generate parameters for MDL materials, conditioned on natural language descriptions, within NVIDIA's Omniverse platform. This enables use cases such as real-time text prompts that change the materials of objects in 3D environments to, for example, "dull plastic" or "shiny iron". Since the output of our model is a parametric BRDF rather than an image of the material, it can be used to render the material on any shape under arbitrary viewing and lighting conditions.

Item Learning to Generate and Manipulate 3D Radiance Field by a Hierarchical Diffusion Framework with CLIP Latent (The Eurographics Association and John Wiley & Sons Ltd., 2023) Wang, Jiaxu; Zhang, Ziyi; Xu, Renjing; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
3D-aware generative adversarial networks (GANs) are widely adopted for generating and editing neural radiance fields (NeRF). However, these methods still suffer from GAN-related issues, including degraded diversity and training instability. Moreover, 3D-aware GANs treat the NeRF pipeline as a regularizer and do not operate directly on 3D assets, leading to imperfect 3D consistency. In addition, independent changes in disentangled editing cannot be ensured, because some shallow hidden features are shared across the generator. To address these challenges, we propose the first purely diffusion-based three-stage framework for generative and editing tasks, with a series of well-designed loss functions that can directly handle 3D models. We also present a generalizable neural point field as our 3D representation, which explicitly disentangles geometry and appearance in feature space and simplifies the dataset preparation pipeline for 3D data conversion. Assisted by this representation, our diffusion model can separately manipulate shape and appearance in a hierarchical manner through image or text prompts encoded by CLIP, and it can generate new samples by adding a simple generative head. Experiments show that our approach outperforms state-of-the-art work in the generative tasks of direct 3D representation generation and novel image synthesis, and completely disentangles the manipulation of shape and appearance with correct semantic correspondence in the editing tasks.

Item Efficient Neural Representation of Volumetric Data using Coordinate-Based Networks (The Eurographics Association and John Wiley & Sons Ltd., 2023) Devkota, Sudarshan; Pattanaik, Sumant; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
In this paper, we propose an efficient approach for the compression and representation of volumetric data using coordinate-based networks and multi-resolution hash encoding. Efficient compression of volumetric data is crucial for applications such as medical imaging and scientific simulation. Our approach enables effective compression by learning a mapping between spatial coordinates and intensity values. We compare different encoding schemes and demonstrate the superiority of multi-resolution hash encoding in terms of compression quality and training efficiency. Furthermore, we leverage optimization-based meta-learning, specifically the Reptile algorithm, to learn weight initializations for neural representations tailored to volumetric data, enabling faster convergence during optimization. Additionally, we compare our approach with state-of-the-art methods to showcase improved image quality and compression ratios. These findings highlight the potential of coordinate-based networks and multi-resolution hash encoding for efficient and accurate representation of volumetric data, paving the way for advancements in large-scale data visualization and other applications.
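Multi-resolution hash encoding, as popularized by Müller et al.'s Instant-NGP, hashes each grid cell to a slot in a learned feature table, one table per resolution level. A minimal sketch of the spatial hash for a single level, assuming the Instant-NGP construction; table size, level resolution, and feature width are illustrative choices:

```python
import numpy as np

PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_grid_index(cells, table_size):
    """Spatial hash of integer grid cells: XOR of coordinate-prime
    products, modulo the table size (Instant-NGP style)."""
    h = cells[:, 0] * PRIMES[0]
    h ^= cells[:, 1] * PRIMES[1]
    h ^= cells[:, 2] * PRIMES[2]
    return h % np.uint64(table_size)

# One level of the encoding: snap points to this level's grid and look
# up learned feature vectors (trilinear interpolation over the 8 cell
# corners is omitted for brevity).
table = np.random.rand(2**14, 2)              # 2^14 entries, 2 features
res = 64                                      # this level's resolution
pts = np.array([[0.37, 0.52, 0.81]])          # points in [0, 1)^3
cells = np.floor(pts * res).astype(np.uint64)
print(table[hash_grid_index(cells, table.shape[0])])
```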
Item Balancing Rotation Minimizing Frames with Additional Objectives (The Eurographics Association and John Wiley & Sons Ltd., 2023) Mossman, Christopher; Bartels, Richard H.; Samavati, Faramarz F.; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
When moving along 3D curves, one may require local coordinate frames for the visited points, such as for animating virtual cameras, controlling robotic motion, or constructing sweep surfaces. Often, consecutive coordinate frames should be similar, avoiding sharp twists. Previous work achieved this goal by using various methods to approximate rotation minimizing frames (RMFs) with respect to a curve's tangent. In this work, we use Householder transformations to construct preliminary tangent-aligned coordinate frames and then optimize these initial frames under the constraint that they remain tangent-aligned. This optimization minimizes the weighted sum of squared distances between selected vectors within the new frames and fixed vectors outside them (such as the axes of previous frames). By selecting different vectors for this objective function, we reproduce existing RMF approximation methods and modify them to consider additional objectives beyond rotation minimization. We also provide example computer graphics use cases for this new frame tracking.

Item 3D Object Tracking for Rough Models (The Eurographics Association and John Wiley & Sons Ltd., 2023) Song, Xiuqiang; Xie, Weijian; Li, Jiachen; Wang, Nan; Zhong, Fan; Zhang, Guofeng; Qin, Xueying; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Visual monocular 6D pose tracking methods for textureless or weakly-textured objects rely heavily on contour constraints established from a precise 3D model. However, precise models are not always available in practice, and rough models can degrade tracking performance and impede the widespread use of 3D object tracking. To address this new problem, we propose a novel tracking method that handles rough models. We reshape the rough contour through a probability map, which avoids explicitly processing the rough 3D model itself. We further exploit the inner region of the object, where points are sampled to provide color constraints. To satisfy the assumption of small displacement between frames, the 2D translation of the object is pre-searched to obtain a better initial pose. Finally, we combine constraints from both the contour and the inner region to optimize the object pose. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on both roughly and precisely modeled objects; for highly rough models in particular, the accuracy is significantly improved (40.4% vs. 16.9%).

Item Refinement of Hair Geometry by Strand Integration (The Eurographics Association and John Wiley & Sons Ltd., 2023) Maeda, Ryota; Takayama, Kenshi; Taketomi, Takafumi; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Reconstructing 3D hair is challenging due to its complex micro-scale geometry, and it is of essential importance for the efficient creation of high-fidelity virtual humans. Existing hair capture methods based on multi-view stereo tend to generate noisy and inaccurate results. In this study, we propose a refinement method for hair geometry that incorporates the gradient of strands into the computation of their positions, formulating a gradient integration strategy for hair strands. We evaluate the performance of our method on a synthetic multi-view dataset containing four hairstyles and show that our refinement produces more accurate hair geometry. Furthermore, we tested our method on real image input, where it also produces plausible results. Our source code is publicly available at https://github.com/elerac/strand_integration.

Item BubbleFormer: Bubble Diagram Generation via Dual Transformer Models (The Eurographics Association and John Wiley & Sons Ltd., 2023) Sun, Jiahui; Zheng, Liping; Zhang, Gaofeng; Wu, Wenming; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Bubble diagrams serve as a crucial tool in architectural planning and graphic design. With the surge of Artificial Intelligence Generated Content (AIGC), research and development efforts have continuously emerged that use bubble diagrams for layout design and generation, yet little research has focused on generating the bubble diagrams themselves. In this paper, we propose a novel generative model, BubbleFormer, for generating diverse and plausible bubble diagrams. BubbleFormer consists of two improved Transformer networks, NodeFormer and EdgeFormer, which generate the nodes and edges of the bubble diagram, respectively. To enhance generation diversity, a VAE module is incorporated into BubbleFormer, allowing for the sampling and generation of numerous high-quality bubble diagrams. BubbleFormer is trained end-to-end and evaluated through qualitative and quantitative experiments. The results demonstrate that BubbleFormer can generate convincing and diverse bubble diagrams, which in turn drive downstream tasks to produce high-quality layout plans. The model also shows generalization capabilities in other layout generation tasks and outperforms state-of-the-art techniques in terms of quality and diversity. In previous work, bubble diagrams were provided as input by users; our generative model thus fills a significant gap in bubble-diagram-driven automated layout generation, enabling end-to-end layout design and generation. Code for this paper is at https://github.com/cgjiahui/BubbleFormer.
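A bubble diagram, as generated above, is essentially a small graph: nodes are rooms and edges mark required adjacencies. A hypothetical sketch of the data structure such a node-and-edge generator would output for downstream layout tasks; the field names and room attributes are assumptions, not BubbleFormer's actual output format:

```python
from dataclasses import dataclass, field

@dataclass
class BubbleDiagram:
    """Rooms as nodes, required adjacencies as undirected edges."""
    rooms: list = field(default_factory=list)    # (name, type, area_m2)
    adjacency: set = field(default_factory=set)  # pairs of room indices

    def add_room(self, name, rtype, area):
        self.rooms.append((name, rtype, area))
        return len(self.rooms) - 1

d = BubbleDiagram()
living = d.add_room("living", "public", 25.0)
kitchen = d.add_room("kitchen", "service", 12.0)
bedroom = d.add_room("bedroom", "private", 14.0)
d.adjacency |= {(living, kitchen), (living, bedroom)}
print(d)   # a downstream layout generator would consume this graph
```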