There have been a number of papers in recent years addressing the use and misuse of statistics in the neuroscience literature. The inappropriate application of statistical methods and errors in interpretation appear to be widespread (Nieuwenhuis et al., 2011). Furthermore, it is often difficult to judge from a paper whether the data have been treated and interpreted correctly (sometimes because the report lacks the necessary information), and it can be equally difficult to get this right when analyzing data and writing papers. (As a disclaimer, I have to admit that I have probably made some of the same mistakes, and I still find choosing the appropriate tests and experimental designs a challenge. We do our best, and always try to do better, but reader beware.)
Unfortunately, this problem is embedded in the collective culture of the field, and in the past it has often gone unrecognized. As a journal editor and reviewer, I am surprised by how many submitted papers have not applied the appropriate statistical tests (especially corrections for performing multiple tests on a data set), have misused tests, or have failed to report enough information for the reader to evaluate the results independently. This continues to happen even when the journal’s instructions to authors are explicit about reporting requirements. When reporting is incomplete, the reader is forced to make assumptions about the authors’ approach and to take on faith that they have applied it correctly, consistently, and fairly.
In addition, the commonly used “p < 0.05” standard can be misleading: applying such a fixed threshold to drive the scientific interpretation of a series of experiments can cause plausible interpretations to be missed and can lead to the rejection of hypotheses that should not yet be discarded (Higgs, 2013). Measurements in biological systems usually show substantial “biological variability”, which compounds the problem. Scientists want to see an effect of a manipulation that supports or refutes their hypothesis, while doing as few experiments as possible. Experiments are time-consuming and expensive, grant money is scarce, there is pressure to reduce animal use, and the pressure to publish quickly is high. Over the years this has fostered a culture of small, stepwise experiments, which are often interpreted by walking down a path defined by “p < 0.05”. Although this approach may be adequate for testing whether a hypothesis is reasonable, it is fraught with traps, can easily lead to an answer that does not represent the underlying mechanisms, and contributes to the larger problem of irreproducibility, a topic of concern at the top levels of funding agencies.
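To illustrate why corrections for multiple tests matter, here is a minimal Python sketch (an illustration, not a procedure taken from this article). If k independent tests are each run at α = 0.05 and every null hypothesis is true, the probability of at least one false positive is 1 − (1 − α)^k, about 0.64 for k = 20. The Holm-Bonferroni step-down adjustment shown here is one standard correction among several; the function names and p-values are hypothetical, and for real analyses a vetted routine such as `multipletests` in statsmodels or `p.adjust` in R is preferable to hand-rolled code.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one false positive among k independent
    tests, each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


def holm_adjust(pvalues: list[float]) -> list[float]:
    """Holm-Bonferroni step-down adjusted p-values, returned in the
    original order. Reject hypothesis i when adjusted[i] <= alpha."""
    m = len(pvalues)
    # Indices sorted so that the smallest raw p-value comes first.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: a less significant hypothesis can never
        # end up with a smaller adjusted p than a more significant one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


if __name__ == "__main__":
    print(f"FWER for 20 tests at alpha = 0.05: {familywise_error_rate(0.05, 20):.2f}")
    # Hypothetical p-values from five comparisons on one data set.
    raw = [0.010, 0.040, 0.030, 0.005, 0.200]
    for p, adj in zip(raw, holm_adjust(raw)):
        print(f"raw p = {p:.3f}  Holm-adjusted p = {adj:.3f}")
```

With these hypothetical values, four of the five raw p-values fall below 0.05, but only two of the Holm-adjusted values do.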
A related concern, the need for appropriate sample sizes, has also been discussed recently (Krzywinski and Altman, 2013).
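As a rough illustration of how the required sample size depends on the expected effect and on variability, the sketch below uses the common normal approximation for a two-sided, two-sample comparison of means with equal group sizes, n ≈ 2 (z_{1−α/2} + z_{1−β})^2 / d^2, where d is the expected difference divided by the standard deviation. The function name and default values are assumptions for illustration; a t-based calculation (as in dedicated power-analysis software) gives slightly larger numbers, and the effect size has to be estimated in advance, for example from pilot data.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means with equal group sizes.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2,
    where d is the standardized effect size (difference / SD). This assumes
    equal, known variances and ignores the t-distribution correction."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)  # quantile corresponding to the desired power
    # Round up: a fractional subject is not possible.
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    for d in (0.5, 0.8, 1.0):
        print(f"d = {d}: about {n_per_group(d)} per group")
```

Because n scales with 1/d², and greater biological variability shrinks d for a given difference, halving the expected effect size roughly quadruples the number of animals or cells needed per group (about 63 versus 16 here for d = 0.5 versus d = 1.0).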
This larger problem has many facets, and various initiatives have been announced and implemented to address it. Three areas seem especially important.
First, the main issue really comes down to how we, as neuroscientists (or scientists in general), teach our students and postdocs to choose appropriate approaches, to design experiments, and to present their data. In the past this was solely the domain of the trainee’s lab, but it has become part of the curriculum in many graduate programs. In addition, we need our trainees to recognize that statistics is an evolving science, and a bit of an art. It can be informative to read the R discussion groups to get a sense of how different statisticians approach problems. We need to teach our students that hypotheses are never “proven”, and that most arguments follow “patterns of plausible inference” (Polya, 1954), in which every conclusion carries an implicit uncertainty, rather than binary logic. We need to teach them to hold themselves to high standards in the design and interpretation of experiments. We need to teach them to be skeptical when reading the literature. We need to teach them to be cautious in interpreting experimental outcomes and to follow best practices in experimental design and execution. These are not new areas, but I sense that they are being neglected in some quarters.
Second, the gatekeepers of the published literature, the editors and reviewers, have to take responsibility for improving the state of the field. Many journals have added specific instructions to authors to partially address the problems in reporting statistics; see, for example, the guidelines adopted by the Journal of Neurophysiology in the mid-2000s (Curran-Everett and Benos, 2004). Unfortunately, reviewers and editors are inconsistent in enforcing these requirements, and few journals can afford to retain a team of statisticians to evaluate every paper under review. However, some journals are now attempting to address the problems. For example, the Nature journals, which (among others) have come under fire for this very issue, have published papers that investigate and reveal the classes of problems in published work, and have issued new reporting guidelines (Nature, 2013) that should make the application of statistical methods, and of some other data analyses, more transparent, at least in part. Other journals have already raised the bar (several a couple of years ahead of the Nature journals), and I suspect that it will not be long before full disclosure is required by most, if not all, journals.
Third, the state of research funding affects both our ability to achieve statistical nirvana and our ability to reduce irreproducibility. In the biomedical sciences, most funding comes from the NIH, and the NIH budget has not kept pace with inflation for the past several years. This has made competition for resources tight, and even when grants are awarded, the budgets are often cut from what was requested, leaving labs short of personnel and supplies. The Institutes have good reasons for making these decisions in times of tight funding, but the consequences affect the quality (and quantity) of the resulting science, and ultimately the progress that can be made over the lifetime of a grant. However, this is a complex issue that deserves much more discussion than can be covered here.
Polya, G. Patterns of Plausible Inference. Princeton University Press, 1954. ISBN:9780691025100.