Statistical Thinking for Data Science

[caption id=“attachment_631” align=“alignright” width=“150”]SciPy 2015 Conference[/caption]

In some of my recent research, I’ve been thinking about ways to expand the typical exploratory visualization process to incorporate more rigorous statistical assessments of the underlying samples being visualized. The power of visualization is that it can help people make quick inferences about complex data. Of course, this is also the danger. Quick inferences can overlook subtle, often invisible issues such as selection bias or confounding factors. This talk by Chris Fonnesbeck at SciPy 2015 makes several good points emphasizing that “more data” isn’t the solution for all problems. His talk doesn’t directly address visualization, but the issues he raises are broadly important.