Multi-instance Referring Image Segmentation of Scene Sketches based on Global Reference Mechanism

Authors: Ling, Peng; Mo, Haoran; Gao, Chengying
Editors: Yang, Yin; Parakkat, Amal D.; Deng, Bailin; Noh, Seung-Tak
Date issued: 2022-10-04 (2022)
ISBN: 978-3-03868-190-8
DOI: https://doi.org/10.2312/pg.20221238
URL: https://diglib.eg.org:443/handle/10.2312/pg20221238
Pages: 7-12 (6 pages)
License: Attribution 4.0 International License
CCS Concepts: Computing methodologies → Scene understanding; Image Segmentation

Abstract: Scene sketch segmentation based on referring expressions plays an important role in sketch editing for the anime industry. While most existing referring image segmentation approaches are designed for the standard task of generating a binary segmentation mask for a single target or a group of targets, we argue that such models should also be equipped for multi-instance segmentation. To this end, we propose GRM-Net, a one-stage framework tailored for multi-instance referring image segmentation of scene sketches. We extract language features from the expression and fuse them into a conventional instance segmentation pipeline, filtering out undesired instances in a coarse-to-fine manner while keeping the matched ones. To model the relative arrangement of the objects and the relationships among them from a global view, we propose a global reference mechanism (GRM) that assigns references to each detected candidate to identify its position. We compare our approach with existing methods designed both for multi-instance referring image segmentation of scene sketches and for the standard referring image segmentation task, and the results demonstrate its effectiveness and superiority.
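The abstract describes filtering instance candidates against the language features of a referring expression so that several matching instances can be kept at once. The following is a minimal, hypothetical sketch of that selection step only, not the authors' GRM-Net implementation: it assumes each detected candidate and the expression have already been embedded into a shared feature space, and it keeps every candidate whose cosine similarity to the expression exceeds a threshold (the function name, feature shapes, and threshold are all illustrative assumptions).

```python
import numpy as np

def select_matching_instances(instance_feats, text_feat, threshold=0.5):
    """Keep all instance candidates that match a referring expression.

    instance_feats: (N, D) array, one embedding per detected candidate
                    (hypothetical, assumed precomputed by a vision backbone)
    text_feat:      (D,) embedding of the referring expression
                    (hypothetical, assumed precomputed by a language encoder)
    Returns the indices of matched instances and all similarity scores.
    """
    # L2-normalize so the dot product below is a cosine similarity
    inst = instance_feats / np.linalg.norm(instance_feats, axis=1, keepdims=True)
    txt = text_feat / np.linalg.norm(text_feat)
    scores = inst @ txt
    # Unlike single-target referring segmentation (argmax over candidates),
    # a threshold keeps every sufficiently similar instance, allowing
    # multi-instance selection.
    return np.nonzero(scores >= threshold)[0], scores
```

In a real pipeline, this coarse thresholding would be followed by a finer matching stage; here it only illustrates why a per-candidate score with a threshold, rather than a single binary mask, enables multi-instance output.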