Title: Exploring Contextual Relationships in 3D Cloud Points by Semantic Knowledge Mining
Authors: Chen, Lianggangxu; Lu, Jiale; Cai, Yiqing; Wang, Changbo; He, Gaoqi
Editors: Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Date: 2022-10-04
Year: 2022
ISSN: 1467-8659
DOI: https://doi.org/10.1111/cgf.14658
URI: https://diglib.eg.org:443/handle/10.1111/cgf14658
Pages: 75-86 (12 pages)
CCS Concepts: Computing methodologies → 3D point cloud understanding; Graph convolution network

Abstract:
3D scene graph generation (SGG) aims to predict the class of objects and predicates simultaneously in one 3D point cloud scene with instance segmentation. Since the underlying semantic of 3D point clouds is spatial information, recent ideas of the 3D SGG task usually face difficulties in understanding global contextual semantic relationships and neglect the intrinsic 3D visual structures. To build the global scope of semantic relationships, we first propose two types of Semantic Clue (SC) from entity level and path level, respectively. SC can be extracted from the training set and modeled as the co-occurrence probability between entities. Then a novel Semantic Clue aware Graph Convolution Network (SC-GCN) is designed to explicitly model each SC of which the message is passed in their specific neighbor pattern. For constructing the interactions between the 3D visual and semantic modalities, a visual-language transformer (VLT) module is proposed to jointly learn the correlation between 3D visual features and class label embeddings. Systematic experiments on the 3D semantic scene graph (3DSSG) dataset show that our full method achieves state-of-the-art performance.
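
The abstract states that a Semantic Clue can be extracted from the training set and modeled as the co-occurrence probability between entities, but it gives no formula. The following is a minimal sketch of one plausible entity-level estimate, not the authors' implementation. It assumes the probability is the conditional frequency P(b present | a present) counted over scenes. The function name `cooccurrence_probabilities` and the toy scene labels are hypothetical, and the path-level clue is not covered.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_probabilities(scenes):
    """Estimate P(b present | a present) for every ordered label pair (a, b).

    `scenes` is an iterable of per-scene entity label lists. This counting
    scheme is an assumption made for illustration; the abstract does not
    specify how the co-occurrence probability is computed.
    """
    label_counts = Counter()  # number of scenes containing each label
    pair_counts = Counter()   # number of scenes containing both labels

    for labels in scenes:
        present = set(labels)  # count each label once per scene
        label_counts.update(present)
        pair_counts.update(permutations(sorted(present), 2))

    return {
        (a, b): count / label_counts[a]
        for (a, b), count in pair_counts.items()
    }


# Hypothetical toy "training set" of scene label lists.
scenes = [
    ["chair", "table", "floor"],
    ["chair", "floor"],
    ["bed", "floor", "pillow"],
    ["table", "floor"],
]

probs = cooccurrence_probabilities(scenes)
print(probs[("chair", "table")])  # 0.5: one of the two chair scenes has a table
print(probs[("chair", "floor")])  # 1.0: every chair scene has a floor
```

Because the estimate is conditional, it is asymmetric: `probs[("floor", "chair")]` is 0.5 here even though `probs[("chair", "floor")]` is 1.0. Pairs that never share a scene are simply absent from the result.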