Because data analysis is central to assessing scientific validity, low probability values (p-values) remain the gold standard for judging the quality of research results. But should they be? Christie Aschwanden summarized this quandary in a fivethirtyeight.com blog post last year.
Analyzing the same numerical results with different statistical methods can produce different outcomes. This opens the door to choosing, consciously or unconsciously, whichever method yields a p-value low enough to validate a study's findings. Some data analysts suspect that widespread so-called p-hacking (tweaking data or analyses to obtain a p-value of less than 0.05) calls into question the validity of many research studies.
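Why does trying multiple analyses undermine the 0.05 threshold? The sketch below simulates one common form of p-hacking: measuring several outcome variables and reporting whichever one "worked." All data here are pure noise (the two groups are drawn from the same distribution), so every significant result is a false positive. The simulation setup, sample sizes, and the normal approximation to the test statistic are all illustrative assumptions, not anything from the article or the Nosek project.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a z statistic, standard normal approximation."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

def mean_diff_z(a, b):
    """Difference-in-means z statistic (large-sample approximation)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

random.seed(42)
TRIALS, N, K = 2000, 50, 10   # experiments, samples per group, outcomes tried
honest = hacked = 0
for _ in range(TRIALS):
    any_significant = False
    for k in range(K):
        # Every outcome is noise: both groups come from the same
        # distribution, so a "significant" difference is a false positive.
        a = [random.gauss(0, 1) for _ in range(N)]
        b = [random.gauss(0, 1) for _ in range(N)]
        p = two_sided_p(mean_diff_z(a, b))
        if k == 0:
            honest += p < 0.05          # report only the first, pre-chosen outcome
        any_significant = any_significant or p < 0.05
    hacked += any_significant           # report "whichever outcome worked"

print(f"pre-registered false-positive rate: {honest / TRIALS:.3f}")
print(f"best-of-{K} false-positive rate:    {hacked / TRIALS:.3f}")
```

The honest strategy stays near the nominal 5% rate, while picking the best of ten outcomes pushes the false-positive rate toward 1 - 0.95^10, roughly 40%, without any deliberate fraud at any single step.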
To examine this variability, Brian Nosek, co-founder of the Center for Open Science, ran a crowdsourcing project that compared different ways of analyzing the same set of data. The team invited highly competent data scientists to take part.
The experiment asked each team to analyze the same data set to answer one question: do soccer referees give more red cards to dark-skinned players than to light-skinned ones? Twenty-nine teams, comprising 61 analysts in total, used a wide variety of methods, ranging from linear regression to Bayesian approaches, and each team was free to decide which secondary variables to include in its analysis.
Surprisingly, the researchers got a variety of results:
- 20 teams found that the referees did in fact give more red cards to dark-skinned players
- 9 teams found no significant relationship between red cards and skin color
Such disparate results call into question the practice of relying on a single analytic approach to answer a research question, and they highlight the subjective choices that even skilled researchers must make when analyzing data. Studies like this reinforce the need to replicate and extend scientific inquiry in the search for truth.