Visual Analytics Approaches for Descriptor Space Comparison and the Exploration of Time Dependent Data

Bremm, Sebastian
2015-01-21
2015-01-21
2013-12-02
https://diglib.eg.org/handle/10.2312/8299
https://doi.org/10.2312/diss.20138299

Modern technologies allow us to collect and store increasing amounts of data. However, their analysis is often difficult. For that reason, Visual Analytics combines data mining and visualization techniques to explore and analyze large amounts of complex data. Visual Analytics approaches exist for various problems and applications, but all share the idea of a tight combination of visualization and automatic analysis. Their respective implementations are highly specialized for the given data and the analytical task. In this thesis I present new approaches for two specific topics: visual descriptor space comparison and the analysis of time series.

Visual descriptor space comparison enables the user to analyze different representations of complex datasets, e.g., phylogenetic trees or chemical compounds. I propose approaches for datasets with hierarchical or unknown structure, each combining an automatic analysis with interactive visualization. For hierarchically organized data, I suggest a novel similarity score embedded in an interactive analysis framework linking different views, each specialized for a particular analytical task. This analysis framework is evaluated in cooperation with biologists in the area of phylogenetic research. To extend the scalability of my approach, I introduce CloudTrees, a new visualization technique for the comparison of large trees with thousands of leaves. It reduces overplotting problems by ensuring the visibility of small but important details like high-scoring subtrees.

For the comparison of data with unknown structure, I assess several state-of-the-art projection quality measures to analyze their capability for descriptor comparison. For the creation of appropriate ground truth test data, I suggest an interactive tool called PCDC for the controlled creation of high-dimensional data with different properties like data distribution or number and size of contained clusters. For the visual comparison of data with unknown structure, I introduce a technique based on comparing two-dimensional projections of the descriptors using a two-dimensional colormap. I present the approach for scatterplots and extend it to Self-Organizing Maps (SOMs), including reliability encoding. I embed the automatic and visual comparison in an interactive analysis pipeline, which automatically calculates a set of representative descriptors out of a larger collection of descriptors. For a deeper analysis of the proposed result and the underlying characteristics of the input data, the analyst can follow each step of the pipeline. The approach is applied to a large set of chemical data in a high-throughput screening analysis scenario.

For the analysis of time-dependent categorical data, I propose a new approach called Time Parallel Sets (TIPS). It focuses on the analysis of group changes of objects in large datasets. Different automatic algorithms identify and select potentially interesting points in time for a detailed analysis. The user can interactively track groups or single objects, add or remove selected points in time, or change parameters of the detection algorithms according to the analytical goal. The approach is applied to two scenarios: emergency evacuation of buildings and tracking of mobile phone calls over long time periods.

Large time series can be compressed by transforming them into sequences of symbols, where each symbol represents a set of similar subsequences in time. For these time sequences, I propose new visual-analytical tools, starting with an interactive, semi-automatic definition of symbol similarity. Based on this, the sequences are visualized using different linked views, each specialized for a different analytical problem.
As an example use case, a financial dataset containing the risk estimations and return values of 60 companies over 500 days is analyzed.

application/pdf
Text.PhDThesis