Visualization Model Validation via Inline Replication

Abstract

Data visualizations typically show a representation of a data set with little to no focus on the repeatability or generalizability of the displayed trends and patterns. However, insights gleaned from these visualizations are often used as the basis for decisions about future events. Visualizations of retrospective data therefore often serve as “visual predictive models.” However, this visual predictive model approach can lead to invalid inferences. In this article, we describe an approach to visual model validation called Inline Replication. Inline Replication is closely related to the statistical techniques of bootstrap sampling and cross-validation and, like those methods, provides a non-parametric and broadly applicable technique for assessing the variance of findings from visualizations. This article describes the overall Inline Replication process and outlines how it can be integrated into both traditional and emerging “big data” visualization pipelines. It also provides examples of how Inline Replication can be integrated into common visualization techniques such as bar charts and linear regression lines. Results from an empirical evaluation of the technique and two prototype Inline Replication–based visual analysis systems are also described. The empirical evaluation demonstrates the impact of Inline Replication under different conditions, showing that both (1) the level of partitioning and (2) the approach to aggregation have a major influence over its behavior. The results highlight the trade-offs in choosing Inline Replication parameters but suggest that using 𝑛=5 partitions is a reasonable default.

Publication
Information Visualization (Volume 18, Issue 4)