PF-UCDR: A Local-Aware RGB-Phase Fusion Network with Adaptive Prompts for Universal Cross-Domain Retrieval

dc.contributor.author: Wu, Yiqi [en_US]
dc.contributor.author: Hu, Ronglei [en_US]
dc.contributor.author: Wu, Huachao [en_US]
dc.contributor.author: He, Fazhi [en_US]
dc.contributor.author: Zhang, Dejun [en_US]
dc.contributor.editor: Christie, Marc [en_US]
dc.contributor.editor: Han, Ping-Hsuan [en_US]
dc.contributor.editor: Lin, Shih-Syun [en_US]
dc.contributor.editor: Pietroni, Nico [en_US]
dc.contributor.editor: Schneider, Teseo [en_US]
dc.contributor.editor: Tsai, Hsin-Ruey [en_US]
dc.contributor.editor: Wang, Yu-Shuen [en_US]
dc.contributor.editor: Zhang, Eugene [en_US]
dc.date.accessioned: 2025-10-07T06:03:42Z
dc.date.available: 2025-10-07T06:03:42Z
dc.date.issued: 2025
dc.description.abstract: Universal Cross-Domain Retrieval (UCDR) aims to match semantically related images across domains and categories not seen during training. While vision-language pre-trained models offer strong global alignment, we are inspired by the observation that local structures, such as shapes, contours, and textures, often remain stable across domains, and thus propose to model them explicitly at the patch level. We present PF-UCDR, a framework built upon frozen vision-language backbones that performs patch-wise fusion of RGB and phase representations. Central to our design is a Fusing Vision Encoder, which applies masked cross-attention to spatially aligned RGB and phase patches, enabling fine-grained integration of complementary appearance and structural cues. Additionally, we incorporate adaptive visual prompts that condition image encoding based on domain and class context. Local and global fusion modules aggregate these enriched features, and a two-stage training strategy progressively optimizes alignment and retrieval objectives. Experiments on standard UCDR benchmarks demonstrate that PF-UCDR significantly outperforms existing methods, validating the effectiveness of structure-aware local fusion grounded in multimodal pretraining. Our code is publicly available at https://github.com/djzgroup/PF-UCDR. [en_US]
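The abstract's two core ideas — a phase representation that isolates structural cues, and masked cross-attention that fuses each RGB patch only with its spatially aligned phase patch — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (their code is at the repository linked above); the function names, the single-head attention, and the unit-magnitude phase reconstruction are illustrative assumptions.

```python
import numpy as np


def phase_image(img: np.ndarray) -> np.ndarray:
    """Phase-only reconstruction of a grayscale image (illustrative).

    Setting every Fourier magnitude to 1 while keeping the original
    phase and inverting the FFT preserves edges and contours -- the
    kind of domain-stable structural cue the paper fuses with RGB.
    """
    spectrum = np.fft.fft2(img)
    phase_only = np.exp(1j * np.angle(spectrum))  # unit magnitude, same phase
    return np.real(np.fft.ifft2(phase_only))


def masked_cross_attention(q: np.ndarray, kv: np.ndarray,
                           mask: np.ndarray) -> np.ndarray:
    """Single-head cross-attention restricted by a boolean (N, N) mask.

    q:  (N, d) query patch tokens (e.g. RGB patches)
    kv: (N, d) key/value patch tokens (e.g. phase patches)
    An identity mask makes each query attend only to its spatially
    aligned partner, mimicking patch-wise alignment.
    """
    d = q.shape[-1]
    scores = q @ kv.T / np.sqrt(d)          # (N, N) attention logits
    scores = np.where(mask, scores, -1e9)   # block non-aligned pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv                      # (N, d) fused tokens


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_tokens = rng.normal(size=(4, 8))
    phase_tokens = rng.normal(size=(4, 8))
    # With an identity mask, patch i fuses only with phase patch i.
    fused = masked_cross_attention(rgb_tokens, phase_tokens,
                                   np.eye(4, dtype=bool))
    print(fused.shape)  # (4, 8)
```

With the identity mask each attention row collapses onto a single key, so the fused token for patch *i* is exactly its aligned phase token; relaxing the mask to a local neighborhood would let patches also draw on nearby structure.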
dc.description.sectionheaders: Multi-Modality
dc.description.seriesinformation: Pacific Graphics Conference Papers, Posters, and Demos
dc.identifier.doi: 10.2312/pg.20251279
dc.identifier.isbn: 978-3-03868-295-0
dc.identifier.pages: 10 pages
dc.identifier.uri: https://doi.org/10.2312/pg.20251279
dc.identifier.uri: https://diglib.eg.org/handle/10.2312/pg20251279
dc.publisher: The Eurographics Association [en_US]
dc.rights: Attribution 4.0 International License
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: CCS Concepts: Computing methodologies → Computer vision tasks; Visual content-based indexing and retrieval; Image representations
dc.subject: Computing methodologies → Computer vision tasks
dc.subject: Visual content-based indexing and retrieval
dc.subject: Image representations
dc.title: PF-UCDR: A Local-Aware RGB-Phase Fusion Network with Adaptive Prompts for Universal Cross-Domain Retrieval [en_US]
Files
Original bundle
Name: pg20251279.pdf
Size: 5.84 MB
Format: Adobe Portable Document Format