R-CNN based PolygonalWedge Detection Learned from Annotated 3D Renderings and Mapped Photographs of Open Data Cuneiform Tablets

Stötzner, ErnstHomburg, TimoBullenkamp, Jan PhilippMara, HubertBucciero, AlbertoFanini, BrunoGraf, HolgerPescarin, SofiaRizvic, Selma2023-09-022023-09-022023978-3-03868-217-22312-6124https://doi.org/10.2312/gch.20231157https://diglib.eg.org:443/handle/10.2312/gch20231157Motivated by the demands of Digital Assyriology and the challenges of detecting cuneiform signs, we propose a new approach using R-CNN architecture to classify and localize wedges. We utilize the 3D models of 1977 cuneiform tablets from the Frau Professor Hilprecht Collection available as pen data. About 500 of these tablets have a transcription available in the Cuneiform Digital Library Initiative (CDLI) database. We annotated 21.000 cuneiform signs as well as 4.700 wedges resulting in the new open data Mainz Cuneiform Benchmark Dataset (MaiCuBeDa), including metadata, cropped signs, and partially wedges. The latter is also a good basis for manual paleography. Our inputs are MSII renderings computed using the GigaMesh Software Framework and photographs having the annotations automatically transferred from the renderings. Our approach consists of a pipeline with two components: a sign detector and a wedge detector. The sign detector uses a RepPoints model with a ResNet18 backbone to locate individual cuneiform characters in the tablet segment image. The signs are then cropped based on the sign locations and fed into the wedge detector. The wedge detector is based on the idea of Point RCNN approach. It uses a Feature Pyramid Network (FPN) and RoI Align to predict the positions and classes of the wedges. The method is evaluated using different hyperparameters, and post-processing techniques such as Non-Maximum Suppression (NMS) are applied for refinement. The proposed method shows promising results in cuneiform wedge detection. Our detector was evaluated using the Gottstein system and with the PaleoCodage encoding. Our results show that the sign detector performs better when trained on 3D renderings than photographs. We showed that detectors trained on photographs are usually less accurate. The accuracy on photographs improves when trained, including 3D renderings. Overall, our pipeline achieves decent results, with some limitations due to the relatively small amount of data. However, even small amounts of high-quality renderings of 3D datasets with expert annotations dramatically improved sign detection.Attribution 4.0 International LicenseCCS Concepts: Computing methodologies → Object detection; Machine learning; Applied computing → ArchaeologyComputing methodologies → Object detectionMachine learningApplied computing → ArchaeologyR-CNN based PolygonalWedge Detection Learned from Annotated 3D Renderings and Mapped Photographs of Open Data Cuneiform Tablets10.2312/gch.2023115747-5610 pages