Evaluating Zero-Shot Monocular Depth Estimation Models for Tactile Rendering of Paintings

Magherini, Roberto; Servi, Michaela; Buonamici, Francesco; Furferi, Rocco

dc.contributor.author	Magherini, Roberto	en_US
dc.contributor.author	Servi, Michaela	en_US
dc.contributor.author	Buonamici, Francesco	en_US
dc.contributor.author	Furferi, Rocco	en_US
dc.contributor.editor	Campana, Stefano	en_US
dc.contributor.editor	Ferdani, Daniele	en_US
dc.contributor.editor	Graf, Holger	en_US
dc.contributor.editor	Guidi, Gabriele	en_US
dc.contributor.editor	Hegarty, Zackary	en_US
dc.contributor.editor	Pescarin, Sofia	en_US
dc.contributor.editor	Remondino, Fabio	en_US
dc.date.accessioned	2025-09-05T20:56:56Z
dc.date.available	2025-09-05T20:56:56Z
dc.date.issued	2025
dc.description.abstract	Access to pictorial art remains a significant challenge for visually impaired individuals, as 2D paintings require transformation into tactile 2.5D/3D models. While deep learning offers promising tools for monocular depth estimation (MDE), applying state-of-the-art zero-shot models to artworks presents unique difficulties due to artistic conventions (perspective, lighting, texture) and the lack of ground truth, especially concerning details crucial for tactile perception. This paper addresses this gap by qualitatively evaluating a wide range of state-of-the-art zero-shot MDE models - including DepthAnything (v1/v2), Marigold, Metric3D v2, ZoeDepth, UniDepth (v1/v2/v2_old), GeoWizard (v1/v2), and Depth-Pro - on their ability to generate depth maps suitable for tactile rendering from two 20th-century Italian paintings with distinct styles and input qualities. The assessment, based on criteria such as detail preservation, contour definition, spatial coherence, and artifact absence, reveals that while zero-shot models can interpret basic spatial structures, performance varies considerably. DepthAnything v2 and GeoWizard v2 were better at preserving the features that matter for tactile use, emerging as promising candidates. However, no model produced a directly usable output, highlighting persistent challenges in handling artistic styles and pictorial textures. This study provides the first systematic comparison in this niche application, offering practical insights for cultural institutions aiming to leverage AI for accessibility. It concludes that current zero-shot models show significant potential as starting points but still require validation and refinement, and it underscores the need for further research in areas such as targeted post-processing, art-specific metrics, and user-centered validation to make cultural heritage truly accessible to all.	en_US
dc.description.sectionheaders	Digital Technologies for CHANGES (CHANGES SESSION) - Part 3
dc.description.seriesinformation	Digital Heritage
dc.identifier.doi	10.2312/dh.20253048
dc.identifier.isbn	978-3-03868-277-6
dc.identifier.pages	8 pages
dc.identifier.uri	https://doi.org/10.2312/dh.20253048
dc.identifier.uri	https://diglib.eg.org/handle/10.2312/dh20253048
dc.publisher	The Eurographics Association	en_US
dc.rights	Attribution 4.0 International License
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.title	Evaluating Zero-Shot Monocular Depth Estimation Models for Tactile Rendering of Paintings	en_US

Files

Original bundle

Name: dh20253048.pdf
Size: 1.08 MB
Format: Adobe Portable Document Format

Collections

Track 12 – Digital Technologies for CHANGES