Visual Analytics to Combat Selection Bias in Retrospective EHR Data Analyses


Retrospective analyses of electronic health records (EHRs) and other health data sources are increasingly common as investigators seek to use data collected during routine health care delivery to learn about health practices and outcomes. The data analysis process typically includes cohort selection, in which a group of patients (and their corresponding data) is identified from within a health organization’s overall patient population. However, with such data, the lack of randomization, combined with the high degree of interdependence expected between variables, can produce cohorts that are highly skewed in ways the analyst does not anticipate. To combat these effects, we are developing new methods and tools that make selection bias effects more transparent to analysts. In this submission, we provide a high-level overview of a new form of visualization-based selection bias report which, when integrated within a cohort selection system, makes it easy for analysts to discover unexpected shifts in variable distributions between a selected cohort and a baseline population, shifts that might otherwise go unnoticed.
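To make the idea of a "shift in variable distributions" concrete, the sketch below compares, for each categorical variable, the proportion of patients taking each value in a selected cohort against the baseline population, and flags variables whose largest per-value difference exceeds a threshold. This is an illustrative assumption only, not the report described in this abstract: the function names (`distribution`, `distribution_shift`, `flag_shifts`), the simple difference-in-proportions measure, the 0.1 threshold, and the patient records are all hypothetical.

```python
from collections import Counter

def distribution(records, variable):
    """Return the proportion of records taking each value of `variable`."""
    counts = Counter(r[variable] for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}

def distribution_shift(cohort, baseline, variable):
    """Per-value difference in proportion (cohort minus baseline).

    Values absent from one group are treated as having proportion 0.
    """
    p_cohort = distribution(cohort, variable)
    p_base = distribution(baseline, variable)
    values = set(p_cohort) | set(p_base)
    return {v: p_cohort.get(v, 0.0) - p_base.get(v, 0.0) for v in sorted(values)}

def flag_shifts(cohort, baseline, variables, threshold=0.1):
    """Return {variable: shift} for variables whose largest absolute
    per-value shift exceeds `threshold` (hypothetical default)."""
    flagged = {}
    for var in variables:
        shift = distribution_shift(cohort, baseline, var)
        largest = max((abs(d) for d in shift.values()), default=0.0)
        if largest > threshold:
            flagged[var] = shift
    return flagged

# Hypothetical baseline population: the cohort is selected on diagnosis only,
# yet its age distribution shifts because the two variables are correlated.
baseline = [
    {"diabetes": True, "age_group": "65+"},
    {"diabetes": True, "age_group": "65+"},
    {"diabetes": True, "age_group": "<65"},
    {"diabetes": False, "age_group": "<65"},
    {"diabetes": False, "age_group": "<65"},
    {"diabetes": False, "age_group": "<65"},
]
cohort = [r for r in baseline if r["diabetes"]]

if __name__ == "__main__":
    print(flag_shifts(cohort, baseline, ["age_group"]))
```

In this toy data, selecting on diagnosis alone raises the share of patients aged 65+ from one third of the baseline to two thirds of the cohort. This is the kind of indirect, unrequested shift that a selection bias report aims to surface. A real report would also need to handle numeric variables and weigh shifts against sample size.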

American Medical Informatics Association (AMIA) Annual Symposium Podium Abstract