Chen, Chen; Liu, Zhicheng; Bruckner, Stefan; Raidou, Renata G.; Turkay, Cagatay
2023-06-10; 2023-06-10; 2023
1467-8659
https://doi.org/10.1111/cgf.14855
https://diglib.eg.org:443/handle/10.1111/cgf14855

We present a state-of-the-art report on visualization corpora in automated chart analysis research. We survey 56 papers that created or used a visualization corpus as the input of their research techniques or systems. Based on a multi-level task taxonomy that identifies the goal, method, and outputs of automated chart analysis, we examine the property space of existing chart corpora along five dimensions: format, scope, collection method, annotations, and diversity. Through the survey, we summarize common patterns and practices of creating chart corpora, identify research gaps and opportunities, and discuss the desired properties of future benchmark corpora and the required tools to create them.

CCS Concepts: Computing methodologies -> Machine learning; Human-centered computing -> Visualization
Computing methodologies; Machine learning; Human centered computing; Visualization

The State of the Art in Creating Visualization Corpora for Automated Chart Analysis
10.1111/cgf.14855
449-470
22 pages