Title: Adversarial Unsupervised Domain Adaptation for 3D Semantic Segmentation with 2D Image Fusion of Dense Depth
Authors: Zhang, Xindan; Li, Ying; Sheng, Huankun; Zhang, Xinnian
Editors: Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Date: 2024-10-13
Year: 2024
ISSN: 1467-8659
DOI: https://doi.org/10.1111/cgf.15250
URL: https://diglib.eg.org/handle/10.1111/cgf15250
Pages: 11 pages
License: Attribution 4.0 International License
CCS Concepts: Computing methodologies → Point-based models

Abstract: Unsupervised domain adaptation (UDA) is increasingly used for 3D point cloud semantic segmentation because it addresses the problem of missing labels in new domains. However, most existing UDA methods focus only on uni-modal data and are rarely applied to multi-modal data. We therefore propose a cross-modal UDA method for 3D semantic segmentation on multi-modal datasets that contain 3D point clouds and 2D images. Specifically, we first propose a Dual discriminator-based Domain Adaptation (Dd-bDA) module to enhance adaptability across domains. Second, because depth information is robust to domain shifts and can provide more details for semantic segmentation, we further employ a Dense depth Feature Fusion (DdFF) module to extract image features with rich depth cues. We evaluate our model in four unsupervised domain adaptation scenarios: dataset-to-dataset (A2D2→SemanticKITTI), Day-to-Night, country-to-country (USA→Singapore), and synthetic-to-real (VirtualKITTI→SemanticKITTI). In all settings, our method achieves significant improvements and surpasses state-of-the-art models.