In my May post for this blog, I wrote about a piece by Stanford professor Dr. John Ioannidis and his colleagues, detailing why, as they put it, "small sample size undermines the reliability of neuroscience." [See previous blog post: Why Most Published Neuroscience Findings are False] As you might imagine, Ioannidis's piece ruffled some feathers. In this month's issue of Nature Reviews Neuroscience, the rest of the neuroscience community has its rejoinder.
Here is a brief play-by-play.
Neuroscience needs a theory.
First up: John Ashton of the University of Otago, New Zealand. He argues that small sample size is not the most important problem facing the analysis and interpretation of neuroscience experiments. In fact, he says, increasing the sample size just encourages hunting around for ever-smaller and ever-less-meaningful effects: with enough samples, any effect, no matter how small, will eventually reach statistical significance. Instead, he believes neuroscientists should focus on experiments that directly test a theoretical model. We should conduct experiments that have clear, readily falsifiable hypotheses and a predictable effect size (based on the theoretical model). Continuing to chase after smaller and smaller effects, without linking them to a larger framework, he argues, will cause neuroscience research to degenerate into "mere stamp collecting" (a phrase he borrows from Ernest Rutherford, who believed that "all science is either physics or stamp collecting").
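Ashton's worry is easy to demonstrate. Here is a minimal simulation (my own hypothetical sketch, not from the correspondence): a vanishingly small true effect (Cohen's d = 0.05) fails a t-test at a typical sample size but sails past p < 0.05 once the sample is large enough.

```python
import numpy as np
from scipy import stats

# Simulate a tiny true effect (Cohen's d = 0.05) at a small and a very
# large sample size, and test each against a null of zero.
rng = np.random.default_rng(0)
true_effect = 0.05

p_values = {}
for n in (50, 50_000):
    sample = rng.normal(loc=true_effect, scale=1.0, size=n)
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    p_values[n] = p
    print(f"n = {n:6d}  ->  p = {p:.2e}")
```

At n = 50,000 the same trivial effect is overwhelmingly "significant" — which is exactly Ashton's point: significance alone says nothing about whether an effect is worth caring about.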
Ioannidis and company reply by agreeing that a theoretical framework and a good estimate of effect size would be ideal, but noting that these ideals are not always attainable. They also point out that sometimes very small effects are meaningful, as in genome-wide association studies, and that a larger sample size provides a better estimate of those effect sizes.
“Surely God loves the 0.06 nearly as much as the 0.05”
Next up: Peter Bacchetti of the University of California, San Francisco. Like Ashton, Bacchetti believes that small sample size is not the real problem in neuroscience research. He identifies a different issue in our research practices, however, arguing that the real problem is blind adherence to the p = 0.05 standard. Dichotomizing experimental findings into successful and unsuccessful bins (read: publishable and essentially unpublishable bins) based on this arbitrary cutoff leads to publication bias, misinterpretation of the state of the field, and difficulty generating meaningful meta-analyses (not to mention the terrible incentive placed on scientists to cherry-pick data, experiments, animals, analyses, etc. that "work").
Ioannidis and colleagues essentially agree, saying that a more reasonable publication model would involve publishing all experiments’ effect sizes with confidence intervals, rather than just p-values. As this "would require a major restructuring of the incentives for publishing papers" and "has not happened," however, Ioannidis and company argue that we should fix a tractable research/analysis problem and do our experiments with a more reasonable sample size.
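For the curious, here is what that proposed report might look like in practice — a minimal sketch with made-up data (the group sizes, effect, and the Welch-style normal-approximation interval are all illustrative assumptions, not taken from the exchange):

```python
import numpy as np
from scipy import stats

# Made-up control and treated groups for illustration.
rng = np.random.default_rng(1)
control = rng.normal(loc=0.0, scale=1.0, size=40)
treated = rng.normal(loc=0.4, scale=1.0, size=40)

# Report the effect estimate with a confidence interval, not just p.
diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / 40 + control.var(ddof=1) / 40)
ci = (diff - 1.96 * se, diff + 1.96 * se)  # approximate 95% CI
_, p = stats.ttest_ind(treated, control)

print(f"mean difference = {diff:.2f}, "
      f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p:.3f}")
```

A reader of this report learns how big the effect plausibly is, not merely whether it crossed an arbitrary threshold — and a meta-analyst can pool the interval directly.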
Mo samples mo problems.
Finally: Philip Quinlan of the University of York, UK. Quinlan cites a paper titled "Ten ironic rules for non-statistical reviewers" to argue that small-sample studies really aren't so bad after all. Besides, he says, experiments that require a large sample size are just hunting for very small effects.
Ioannidis and company essentially dismiss Quinlan's critique. They respond that underpowered studies will miss all but the largest effects; that larger studies allow more precise estimation of effect size, which is useful whether the effect is large or small; and that what constitutes a "meaningful" effect size is often not known in advance, since such an assessment depends entirely on the question and the data already at hand.
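The power argument can be made concrete with a standard textbook normal approximation (my own sketch, not a calculation from the correspondence): the power to detect a standardized effect d grows with the square root of the sample size, so a study that is hopeless at n = 10 can be nearly certain to detect the same effect at n = 400.

```python
from scipy.stats import norm

def approx_power(d, n, alpha=0.05):
    """Normal-approximation power for a two-sided one-sample test
    of a standardized effect d with n observations."""
    z_crit = norm.ppf(1 - alpha / 2)
    shift = d * n ** 0.5
    # Probability the test statistic lands in either rejection region.
    return norm.sf(z_crit - shift) + norm.cdf(-z_crit - shift)

for n in (10, 30, 100, 400):
    print(f"n = {n:4d}  power to detect d = 0.3: {approx_power(0.3, n):.2f}")
```

At n = 10 such a study detects a moderate d = 0.3 effect only a small fraction of the time — it is blind to everything but huge effects, which is precisely the rejoinder to Quinlan.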
There you have it, folks! If you have any of your own correspondence, feel free to post it in the comments section.