Big Data Hubris?

A recent article in the journal Science presents an important warning to those who see Big Data as the solution to all problems. The article (“The Parable of Google Flu: Traps in Big Data Analysis” by Lazer, Kennedy, King, and Vespignani) describes serious flaws in what has been one of the canonical examples of Big Data at work. The article was also featured in a recent blog post at the NY Times.

Data Scientists as Editors

I came across this interesting essay from Moritz Stefaner in which he outlines his perspective on the recent focus on “storytelling” within the data visualization community. I largely agree with his thesis, especially his point about viewing a visualization designer as an author, one who brings to bear editorial decisions about what data to use, where to focus, and what to exclude. There are degrees of editorialship, but even the most quantitative methods include some degree of human decision.

Statistics and Intuition

This morning I came across this TED talk by Peter Donnelly. It was originally posted way back in 2006, so this isn’t something new. However, it is worth a listen for anyone thinking about data science and its applications. The moral of the story is that statistics are often counter-intuitive, and poor applications of statistical methods or invalid assumptions can lead to dramatically incorrect conclusions.

P Values and Statistical Errors

Are P values the “gold standard” for statistically significant findings? From an article that appeared today in Nature: "The irony is that when UK statistician Ronald Fisher introduced the P value in the 1920s, he did not mean it to be a definitive test. He intended it simply as an informal way to judge whether evidence was significant in the old-fashioned sense: worthy of a second look." In the era of “big data” and automated algorithms that can test thousands of correlations, we are essentially guaranteed to find statistically significant P values in our analyses.
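
To put a rough number on that last point, here is a minimal sketch of the multiple-comparisons arithmetic. It assumes the tests are independent, that every null hypothesis is actually true (so each P value is uniform on [0, 1]), and that the conventional 0.05 threshold is used. None of these assumptions comes from the Nature article, and the function names are made up for illustration.

```python
import random


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one P value below alpha across independent tests
    when every null hypothesis is true (illustrative assumption)."""
    return 1.0 - (1.0 - alpha) ** num_tests


def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected number of 'significant' results that are pure chance."""
    return num_tests * alpha


def count_significant_by_chance(num_tests: int, alpha: float = 0.05, seed: int = 0) -> int:
    """Simulate P values under the null (uniform on [0, 1]) and count
    how many fall below alpha."""
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(num_tests))


if __name__ == "__main__":
    for m in (1, 10, 100, 1000):
        print(
            f"{m:>5} tests: "
            f"P(at least one false positive) = {familywise_error_rate(m):.4f}, "
            f"expected false positives = {expected_false_positives(m):.1f}, "
            f"simulated count = {count_significant_by_chance(m)}"
        )
```

Under these assumptions a single test has a 5% chance of a false positive, but the chance of at least one grows quickly as tests are added. Real correlations in a data set are rarely independent, so treat the exact figures as illustrative rather than as a description of any particular analysis.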

Racial Stereotypes and Survey Bias

We generally tend to think of racial identifiers as static labels, an attribute we are born with like gender. I am—and have always been—a white male. Moreover, we know that cultural stereotypes around race have a strong influence on how people are perceived. For this reason, studies are often conducted to capture the degree to which racial identification leads to measurable disparities. We use these presumably static labels to report on differences in health, economics, education, and more.