HardVis: Visual Analytics to Handle Instance Hardness Using Undersampling and Oversampling Techniques

dc.contributor.author: Chatzimparmpas, A.
dc.contributor.author: Paulovich, F. V.
dc.contributor.author: Kerren, A.
dc.contributor.editor: Hauser, Helwig and Alliez, Pierre
dc.date.accessioned: 2023-03-22T15:07:13Z
dc.date.available: 2023-03-22T15:07:13Z
dc.date.issued: 2023
dc.description.abstract: Despite the tremendous advances in machine learning (ML), training with imbalanced data still poses challenges in many real‐world applications. Among a series of diverse techniques to solve this problem, sampling algorithms are regarded as an efficient solution. However, the problem is more fundamental, with many works emphasizing the importance of instance hardness. This issue refers to the significance of managing unsafe or potentially noisy instances that are more likely to be misclassified and serve as the root cause of poor classification performance. This paper introduces HardVis, a visual analytics system designed to handle instance hardness mainly in imbalanced classification scenarios. Our proposed system assists users in visually comparing different distributions of data types, selecting types of instances based on local characteristics that will later be affected by the active sampling method, and validating which suggestions from undersampling or oversampling techniques are beneficial for the ML model. Additionally, rather than uniformly undersampling/oversampling a specific class, we allow users to find and sample easy and difficult to classify training instances from all classes. Users can explore subsets of data from different perspectives to decide all those parameters, while HardVis keeps track of their steps and evaluates the model's predictive performance in a test set separately. The end result is a well‐balanced data set that boosts the predictive power of the ML model. The efficacy and effectiveness of HardVis are demonstrated with a hypothetical usage scenario and a use case. Finally, we also look at how useful our system is based on feedback we received from ML experts.
dc.description.number: 1
dc.description.sectionheaders: Articles
dc.description.seriesinformation: Computer Graphics Forum
dc.description.volume: 42
dc.identifier.doi: 10.1111/cgf.14726
dc.identifier.issn: 1467-8659
dc.identifier.pages: 135-154
dc.identifier.uri: https://doi.org/10.1111/cgf.14726
dc.identifier.uri: https://diglib.eg.org:443/handle/10.1111/cgf14726
dc.publisher: Eurographics ‐ The European Association for Computer Graphics and John Wiley & Sons Ltd.
dc.rights: Attribution 4.0 International License
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: instance hardness
dc.subject: imbalanced data
dc.subject: sampling techniques
dc.subject: machine learning
dc.subject: visual analytics
dc.subject: visualization
dc.title: HardVis: Visual Analytics to Handle Instance Hardness Using Undersampling and Oversampling Techniques
Files
Original bundle
Name: v42i1pp135-154-cgf14726.pdf
Size: 2.11 MB
Format: Adobe Portable Document Format