SimBaTex: Similarity-based Text Exploration

Authors: Witschard, Daniel; Jusufi, Ilir; Kerren, Andreas
Editors: Byška, Jan; Jänicke, Stefan; Schmidt, Johanna
Date: 2021-06-12
ISBN: 978-3-03868-144-1
DOI: https://doi.org/10.2312/evp.20211067
URL: https://diglib.eg.org:443/handle/10.2312/evp20211067
Pages: 5-7
Subjects: Human centered computing; Visual analytics; Information systems; Content analysis and feature selection

Abstract: Natural language processing in combination with visualization can provide efficient ways to discover latent patterns of similarity which can be useful for exploring large sets of text documents. In this poster abstract, we describe the ongoing work on a visual analytics application, called SimBaTex, which is based on embedding technology, dynamic specification of similarity criteria, and a novel approach for similarity-based clustering. The goal of SimBaTex is to provide search-and-explore functionality to enable the user to identify items of interest in a large set of text documents by interactive assessment of both high-level similarity patterns and pairwise similarity of chosen texts.
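
Illustrative sketch (not part of the original record): the abstract refers to pairwise similarity of texts represented by embeddings. The record does not say which embedding model or similarity measure SimBaTex uses, so the following assumes cosine similarity, a common choice, and uses small hand-written vectors in place of real embeddings. The function names `cosine_similarity` and `rank_by_similarity` and the document labels are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_id, embeddings):
    """Return (doc_id, similarity) pairs for all other documents, most similar first."""
    query = embeddings[query_id]
    scores = [
        (doc_id, cosine_similarity(query, vec))
        for doc_id, vec in embeddings.items()
        if doc_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for document embeddings (assumption: real ones would come
# from a language model and have hundreds of dimensions).
toy_embeddings = {
    "doc_a": [1.0, 0.0, 1.0],
    "doc_b": [1.0, 0.0, 0.9],
    "doc_c": [0.0, 1.0, 0.0],
}

for doc_id, score in rank_by_similarity("doc_a", toy_embeddings):
    print(f"{doc_id}: {score:.3f}")
```

In this toy data, `doc_b` ranks above `doc_c` for the query `doc_a` because its vector points in nearly the same direction. A search-and-explore tool could present such a ranking for a chosen text, although how SimBaTex does so is not stated in this record.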