Large Data Scalability in Interactive Visual Analysis

Piringer, Harald

Date

2011

Authors

Piringer, Harald

Publisher

Piringer

Text.PhDThesis

Abstract

In many areas of science and industry, the amount of data is growing fast and often already exceeds the ability to evaluate it. On the other hand, the unprecedented amount of available data holds enormous potential for supporting decision-making. Turning data into comprehensible knowledge is thus a key challenge of the 21st century. The power of the human visual system makes visualization an appropriate method for comprehending large data. In particular, interactive visualization enables a discourse between the human brain and the data that can transform a cognitive problem into a perceptual one. However, the visual analysis of large and complex datasets involves both visual and computational challenges. Visual limits involve perceptual and cognitive limitations of the user and restrictions of the display devices, while computational limits are related to the computational complexity of the algorithms involved.

The goal of this thesis is to advance the state of the art in visual analysis with respect to scalability to large datasets. Because scalability is multifaceted, the contributions span a broad range: they enhance computational scalability, improve the visual scalability of selected visualization approaches, and support the analysis of high-dimensional data.

Concerning computational scalability, this thesis describes a generic architecture that uses multi-threading to facilitate the development of highly interactive visual analysis tools. The architecture builds on the separation of the main application thread from dedicated visualization threads, which can be cancelled early in response to user interaction. A quantitative evaluation shows fast visual feedback during continuous interaction even for millions of entries.

Two variants of scatterplots address the visual scalability of different types of data and tasks. For continuous data, a combination of 2D and 3D scatterplots aims to unite the advantages of 2D interaction and 3D visualization. Several extensions improve depth perception in 3D and address the problem of unrecognizable point densities in both 2D and 3D. For partly categorical data, the thesis contributes Hierarchical Difference Scatterplots, which relate multiple hierarchy levels and explicitly visualize differences between them in the context of the absolute position of pivoted values. While comparisons in Hierarchical Difference Scatterplots are only qualitative, this thesis also contributes an approach for quantifying subsets of the data by means of statistical moments for a potentially large number of dimensions. This approach has proven useful as an initial overview as well as for a quantitative comparison of local features such as clusters.

As an important application of visual analysis, the validation of regression models also involves scalability to multi-dimensional data. This thesis describes a design study of an approach called HyperMoVal for this task. The key idea is to visually relate n-dimensional scalar functions to known validation data within a combined visualization. The integration with other multivariate views is a step towards a user-centric workflow for model building. As the result of a collaboration with experts in engine design, HyperMoVal demonstrates how visual analysis can significantly improve real-world tasks.

Positive user feedback suggests a high impact of the contributions of this thesis outside the visualization research community as well. Moreover, most contributions of this thesis have been combined in a commercially distributed software framework for engineering applications that will hopefully raise awareness of visual analysis and promote its use in multiple application domains.
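
The multi-threading idea mentioned in the abstract (a main application thread plus dedicated visualization threads that can be cancelled early when the user interacts) can be illustrated with a minimal sketch. This is not the architecture from the thesis. The names `render_points` and `VisualizationWorker`, the chunk size, and the toy histogram "rendering" are all assumptions made purely for illustration.

```python
import threading


def render_points(points, cancel, chunk_size=1000):
    """Toy stand-in for a rendering pass (hypothetical, for illustration).

    Builds a 10-bin histogram of values assumed to lie in [0, 1).
    The cancel flag is checked between chunks, so a cancelled pass
    stops early and reports that it did not complete.
    """
    bins = [0] * 10
    for start in range(0, len(points), chunk_size):
        if cancel.is_set():
            return bins, False
        for value in points[start:start + chunk_size]:
            bins[min(int(value * 10), 9)] += 1
    return bins, True


class VisualizationWorker:
    """Runs at most one rendering pass at a time on a dedicated thread."""

    def __init__(self):
        self._thread = None
        self._cancel = threading.Event()
        self.result = None  # last completed result, read by the main thread

    def request(self, points):
        """Called by the main thread whenever interaction invalidates the view."""
        self.cancel()
        self._cancel = threading.Event()
        cancel = self._cancel  # each pass keeps its own cancel flag

        def run():
            bins, completed = render_points(points, cancel)
            if completed:  # results of cancelled passes are discarded
                self.result = bins

        self._thread = threading.Thread(target=run, daemon=True)
        self._thread.start()

    def cancel(self):
        """Signal the running pass to stop; waits at most about one chunk."""
        if self._thread is not None:
            self._cancel.set()
            self._thread.join()

    def wait(self):
        """Block until the current pass has finished."""
        if self._thread is not None:
            self._thread.join()
```

In this sketch, calling `request` a second time supersedes the first pass, so after `wait()` the `result` attribute always reflects the most recent request. Checking the flag only between chunks is a trade-off: smaller chunks react faster to interaction but add more checking overhead.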