    •   Eurographics DL Home
    • Eurographics Workshops and Symposia
    • EG GCH: EUROGRAPHICS Workshop on Graphics and Cultural Heritage
    • GCH 2020 - Eurographics Workshop on Graphics and Cultural Heritage
    • View Item

    A NLP Enhanced Visual Analytics Tool for Archives Metadata

    Date
    2020
    Author
    Ozdemir, Anil
    Müstecep, Dilara
    Agaoglu, Orhan
    Balcisoy, Selim

    Abstract
    Today, almost all cultural heritage (CH) institutions are starting to digitize parts of their collections and archives to improve accessibility, preserve the originals, and raise the institution's publicity and visibility on the Internet. As a result, digital document collections have been multiplying. These collections span many domains, including art, history, mathematics, and physics, which makes a substantial volume of documents digitally available. It also creates a need for approaches that let users understand latent meanings in collections, discover and investigate relationships, and extract the information they need. To address this need, we introduce a visual exploratory tool that helps uncover the hidden information and stories underlying documents. The tool extracts the key individuals, temporal expressions, locations, entities, and keywords within the documents and establishes a network between documents, allowing researchers and archivists to form and test hypotheses and to observe the individual relationships, networks, and stories present in archive metadata collections. Specifically, we have designed and developed a visual exploration tool for large archives with limited metadata, employing state-of-the-art Natural Language Processing (NLP) techniques to assist cultural heritage researchers. To design the tool, we collaborated with archive professionals from a cultural institution, SALT (https://saltonline.org/), which focuses on public service and produces research-based exhibitions, publications, and digitization projects. Following our conversations with the SALT team, we decided to use Waqfs of Crete, an archive consisting of official records of the Muslim inhabitants of Crete.
The documents, spanning the period from 1825 to 1928 and written in Ottoman Turkish and Greek, provide an opportunity to examine the multi-layered social structure of the island, especially from a cultural and economic perspective. The metadata covers approximately 10 thousand documents and includes, for each one, a summary, the year of publication, the location, the language used, and a picture of the document. We extracted various features, including locations, key individuals, dates, entities, and keywords, from the document summaries in the metadata using NLP methods, including regular expressions for extraction and word embedding models for capturing similarities between documents. We integrated all of these features into the tool so that the user can see networks representing the relationships between documents and easily access similar documents in the archive. In the network, each node corresponds to a document. To assign a weighted edge between two documents, the total number of individuals and keywords they share is computed, and edges are set based on a predetermined threshold value. This threshold was found by manual tuning, considering both the speed at which the result is reflected in the application and the average number of shared attributes. To capture similarity between documents, we used state-of-the-art word embedding models, including Word2vec, FastText, and Transformer, which provide a way to compute dense vector representations of documents. Each document was thus represented as a fixed-size vector output by each model, and the similarity between documents was calculated as the cosine similarity of their vectors.
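The similarity step described above can be illustrated with a minimal sketch. It assumes each document has already been turned into a fixed-size vector by some embedding model; the function names `cosine_similarity` and `most_similar`, the toy vectors, and the handling of all-zero vectors are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: an all-zero vector is
        # treated as similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(doc_id, vectors, top_k=3):
    """Rank the other documents by cosine similarity to doc_id."""
    query = vectors[doc_id]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != doc_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Made-up 3-dimensional vectors standing in for embedding model output.
toy_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [2.0, 0.0, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

if __name__ == "__main__":
    print(most_similar("doc_a", toy_vectors, top_k=2))
```

Because cosine similarity ignores vector length, `doc_b` ranks as the closest match to `doc_a` in this toy data even though its vector is twice as long.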
The interface consists of six components, including an interactive map that lets the user view documents in different locations and view the document networks formed by calculating the total number of attributes shared between documents. The remaining components include an information box that contains document-specific attributes such as location, time, person, entities, and keywords; a document browser that lets users and researchers browse documents easily; an individual and keyword search menu; and a filtering panel. In this way, users can quickly find documents that are roughly related to each other. The user can then browse each document in its network and view the documents that share individuals and keywords with it. Thus, the user can follow the interactions between documents like a story, and can do so for all the people who lived on the island of Crete in the 19th century.
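The document network built from shared attributes can be sketched as follows. This is a hedged illustration only: the function name `build_edges`, the record layout, the placeholder attribute values, the threshold of 2, and the choice of "greater than or equal to" for the threshold test are all assumptions, since the abstract does not specify them.

```python
from itertools import combinations


def build_edges(documents, threshold=2):
    """Return {(id_a, id_b): weight} for document pairs that share enough attributes.

    The weight is the number of shared individuals plus the number of
    shared keywords. A pair gets an edge only if the weight reaches the
    threshold (assumed here to be an inclusive lower bound).
    """
    edges = {}
    for id_a, id_b in combinations(sorted(documents), 2):
        doc_a, doc_b = documents[id_a], documents[id_b]
        shared_individuals = doc_a["individuals"] & doc_b["individuals"]
        shared_keywords = doc_a["keywords"] & doc_b["keywords"]
        weight = len(shared_individuals) + len(shared_keywords)
        if weight >= threshold:
            edges[(id_a, id_b)] = weight
    return edges


# Placeholder records; the identifiers and attribute values are made up.
toy_documents = {
    "d1": {"individuals": {"person_1", "person_2"}, "keywords": {"kw_1", "kw_2"}},
    "d2": {"individuals": {"person_1"}, "keywords": {"kw_1", "kw_3"}},
    "d3": {"individuals": {"person_3"}, "keywords": {"kw_3"}},
}

if __name__ == "__main__":
    for pair, weight in build_edges(toy_documents, threshold=2).items():
        print(pair, weight)
```

Comparing every pair of documents grows quadratically with the collection size, which is one plausible reason a threshold would be tuned against application speed as well as against the average number of shared attributes.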
    BibTeX
    @inproceedings {h.20201297,
    booktitle = {Eurographics Workshop on Graphics and Cultural Heritage},
    editor = {Spagnuolo, Michela and Melero, Francisco Javier},
    title = {{A NLP Enhanced Visual Analytics Tool for Archives Metadata}},
    author = {Ozdemir, Anil and Müstecep, Dilara and Agaoglu, Orhan and Balcisoy, Selim},
    year = {2020},
    publisher = {The Eurographics Association},
    ISSN = {2312-6124},
    ISBN = {978-3-03868-110-6},
    DOI = {10.2312/gch.20201297}
    }
    URI
    https://doi.org/10.2312/gch.20201297
    https://diglib.eg.org:443/handle/10.2312/gch20201297
    Collections
    • GCH 2020 - Eurographics Workshop on Graphics and Cultural Heritage

    Eurographics Association copyright © 2013 - 2020 
    Send Feedback | Contact - Imprint | Data Privacy Policy | Disable Google Analytics
    Theme by @mire NV
    System hosted at  Graz University of Technology.