Reproducibility Crisis: Fallacies to Be Wary of and Ways to De-Bias

Posted by Michele Mei on Aug 23, 2018 9:30:00 AM

While the scientific community is enveloped in a reproducibility crisis (and debates as to whether there is one), there are certainly steps life science researchers can take to ensure more reproducible outcomes. We can start by limiting self-bias and improving reporting standards. But first, what is reproducibility and why is there a crisis?

Reproducibility is a foundation of experimental science that serves to validate results, much like the idea of (but not to be confused with) replicability.

Reproducibility refers to whether a study can be repeated, and the same results obtained, using the same protocol even when experimental conditions (e.g., researcher, location, or facilities) vary to some degree.

In other words, reproducibility is a way for science to check itself. It ensures that results are not merely a product of chance. Successfully reproduced results serve to validate findings, while failures raise questions about the research practices behind the original work. Indeed, there has been a growing narrative of a reproducibility crisis in the scientific community. In a survey conducted by Nature, 90% of more than 1,500 researchers indicated that there was a “significant” or “slight” reproducibility crisis. However, outright fraud or fabrication appears to be only a tiny piece (1-2%) of the puzzle. Despite concerns about a reproducibility crisis, a majority of the same Nature survey respondents still indicated that they trust the existing published literature.

As someone who has spent nearly five years surrounded by and working in academic research, I can understand this sentiment. Scientists seek to discover truth; for the majority, deliberate falsification would be unthinkable.

Instead, what is more likely affecting the reproducibility of studies is unconscious bias and insufficient reporting. When are we vulnerable to these fallacies and what can we do to catch ourselves?

Ways to de-bias and improve reporting

As humans, we all have cognitive biases, even unconsciously (especially unconsciously). When it comes to research, pressure for “significance” or partiality for any hypothesis may lead to practices that are subconsciously geared toward that outcome.

1. Finding Random Patterns


It is human nature to find patterns in randomness. Think of a Rorschach test or when you see images in the clouds. The human brain tends to pick up on patterns according to an expected model. When it comes to research, this fallacy often happens during the data collection and analysis stages. Personally, I was always eager to graph my data and would plot each point as my experiments progressed. But in doing so, I inevitably saw trends that were, in the end, random.
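A quick simulation illustrates how easily pure noise can look like a trend. The sketch below (hypothetical numbers, standard library only) draws many small "experiments" of random data and counts how often the points show an apparent linear trend to the eye:

```python
import random
import statistics

random.seed(1)

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Simulate many small "experiments" consisting of pure noise and count
# how often the points show a correlation strong enough to look like a trend.
trendy = 0
n_runs, n_points = 1000, 8
for _ in range(n_runs):
    xs = list(range(n_points))               # e.g., successive time points
    ys = [random.gauss(0, 1) for _ in range(n_points)]  # no real signal
    if abs(pearson_r(xs, ys)) > 0.5:         # "looks like a trend" to the eye
        trendy += 1

print(f"{trendy} of {n_runs} pure-noise runs show |r| > 0.5")
```

With only eight points per run, a substantial fraction of purely random datasets produce a correlation that looks convincing when plotted, which is exactly why eyeballing data mid-experiment is risky.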

This type of bias can be especially dangerous during data analysis if one becomes tied to a preconceived trend (more on that later).

But first, there are a couple of ways to counter this innate bias.

  • Develop an analysis plan before data collection. During the data analysis stage, researchers have a great deal of freedom in how they handle and trim data (cleaning data, removing outliers, and including or excluding data points). Planning and committing to an analysis protocol before starting the experiment can diminish the natural tendency to “see” patterns where they do not exist. Presenting this plan to other researchers and lab mates is another way to hold yourself accountable. There are also online platforms, such as the open-source Open Science Framework, where researchers can upload their study protocols and store data for public access. The platform was created by the Center for Open Science to increase reproducibility in research.
  • Consider blind data analysis. Blind data analysis is a method of temporarily hiding the real data from the researcher while they perform the usual steps of analysis, such as plotting distributions, handling extremes, and applying statistical tests. The idea is similar to that of a double-blind clinical trial, where neither the scientist nor the participant knows which treatment is being administered. In blind data analysis, the researcher conducts the analysis on a perturbed version of the real data. Some ways to blind a dataset include having a computer program or even a colleague gently alter the data (adding random noise, replacing data labels with generic ones, or applying constants to different groups). Because the changes are slight, the pseudo dataset still retains important characteristics of the original, such as its outliers and variance. As a result, researchers can follow their original analysis plans and effectively mitigate bias.
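The blinding step described above can be sketched in a few lines. This is a minimal illustration, not a vetted blinding tool: the group names, offset range, and `blind` helper are all hypothetical, and in practice the perturbation would be done by a colleague or script the analyst cannot inspect:

```python
import random

def blind(data_by_group, rng):
    """Replace group labels with generic codes and add a hidden
    constant offset per group, preserving spread and outliers."""
    codes = [f"group_{i}" for i in range(len(data_by_group))]
    rng.shuffle(codes)
    key, blinded = {}, {}
    for code, (label, values) in zip(codes, data_by_group.items()):
        offset = rng.uniform(-5, 5)      # hidden constant for this group
        key[code] = (label, offset)      # kept secret until analysis is final
        blinded[code] = [v + offset for v in values]
    return blinded, key

rng = random.Random(42)
# Hypothetical measurements for two experimental groups
raw = {"control": [9.8, 10.1, 10.0, 9.7],
       "treated": [11.2, 10.9, 11.5, 11.1]}

blinded, key = blind(raw, rng)
# The analyst plots, trims outliers, and chooses tests using `blinded` only;
# `key` is used afterwards to undo the offsets and restore the true labels.
```

Because each group is shifted by a constant, within-group variance and outliers survive the blinding, so the analysis choices made on the blinded copy carry over unchanged to the real data.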

2. (Hypothesis) Attachment Issues


Another problem that arises early in a study is when researchers become attached to an expected outcome or hypothesis – and attachment issues are never healthy. When an outcome is already expected (even believed) to be true, any evidence that supports it may go unscrutinized, while alternative hypotheses are ignored and forgotten. For instance, literature research often gives the researcher a general idea of what other experts agree on. After days or even weeks of reading other studies, you will gain a sense of which outcome agrees with the overall consensus. Subsequently, you may already form ideas about how your data will turn out and forget to consider other outcomes. While hypothesis development is important, attachment to a particular result can be dangerous and lead to biased data analysis.

  • Resist the inclination to “p-hack.” As mentioned previously, researchers have a lot of flexibility when it comes to data analysis. It has even been argued that almost any dataset can be presented as statistically significant, whether by cherry-picking data, choosing which measurements to analyze, running post-hoc tests, or switching to non-parametric tests. A commonly reported practice is “p-hacking.” The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. In the life sciences, the cutoff for a significant result is usually set at p<0.05, while p>0.05 is non-significant (and probably more difficult to publish). Desire for a significant result can lead to p-hacking, in which researchers exploit these researcher degrees of freedom until they cross the p<0.05 threshold.
  • Don’t selectively report. For researchers, it is critical to report the whole truth if we expect to have reproducible results. Even though fabrication is not common, having a stake in an outcome can lead to selective reporting, which ultimately skews opinion and creates false leads. Selective reporting practices tend to highlight supporting evidence while obscuring unconvincing data. A simple but common example is the decision to report exact vs. non-exact p-values: a barely significant value of p=0.045 can be reported simply as p<0.05, obscuring how marginal the result is and potentially leading other researchers astray. Another example of selective reporting is when researchers only pursue studies whose preliminary results favored their hypothesis.
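One well-documented form of p-hacking is "optional stopping": collecting data in batches and testing after each batch, stopping as soon as the test comes out significant. The sketch below (assumed batch sizes and simulation counts, standard library only) shows how this inflates the false-positive rate well above the nominal 5%, even though both groups are drawn from the same distribution:

```python
import random
from statistics import NormalDist, mean

def two_sample_p(a, b):
    """Two-sided z-test for a difference in means, assuming unit
    variance (true here by construction of the simulation)."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(0)
false_positives = 0
n_sims = 2000
for _ in range(n_sims):
    a, b = [], []
    for _ in range(10):                   # up to 10 "peeks" at the data
        a += [rng.gauss(0, 1) for _ in range(5)]   # both groups sampled
        b += [rng.gauss(0, 1) for _ in range(5)]   # from the SAME distribution
        if two_sample_p(a, b) < 0.05:     # stop as soon as it "works"
            false_positives += 1
            break

print(f"false-positive rate with peeking: {false_positives / n_sims:.2f}")
```

Every "discovery" in this simulation is a false positive, yet repeated peeking pushes the rate several-fold above the 5% that a single pre-planned test would give, which is why committing to a sample size and analysis plan in advance matters.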

3. Expect the Unexpected

This tip goes hand-in-hand with the previous one. Just as researchers may experience confirmation bias, they can also fall victim to disconfirmation bias. This refers to the tendency to distrust and scrutinize unexpected outcomes (while easily accepting expected results). Personally, I imagine that the majority of researchers, myself included, have experienced this self-doubt at least once. For example, if the literature generally supports one outcome, a deviation from the expected result can incite a panicked feeling of having made a mistake.

  • Don’t simply rationalize results. If you have ever taken a laboratory course and attempted to reproduce a classic experiment, you have likely experienced the dread of getting a completely different outcome than the textbook dictates. Subsequently, the rest of your lab report’s discussion section is spent rationalizing the discrepancy as “human error.” Although most life science researchers are not testing established theories, new results that disagree with published literature are often met with extra scrutiny, even by the original researchers themselves. In an effort to rationalize non-significant results, researchers reach for appealing phrases like “almost significant” rather than offering an opposing argument and explanation. This results in a biased interpretation of the data and discourages further exploration.

4. Lastly, Make Your Study Reproducible


A hurdle researchers face when trying to repeat another study is that they often lack the details needed to do so accurately. In 2013, the Reproducibility Project: Cancer Biology (RP:CB) set out to replicate 50 high-impact cancer biology papers, but now expects to complete only 18, largely due to a lack of detailed protocols and reagents. Even though it seems obvious, there is still an evident need for better and more transparent science communication. Here are a few ways to help make your study more reproducible.

  • Standardize lab protocols. As students come and go to work in a lab, copies of protocols may slowly deviate. Even minor differences in a protocol may lead to irreproducible results. For example, when a postdoc in my former lab left to accept a job offer, he did not leave updated copies of the proteomic protocols that were optimized for our study species. As a result, another PhD student spent weeks of unnecessary time, effort, and materials following outdated lab protocols that did not have optimized purification methods. Standardizing and keeping updated, detailed protocols can not only improve lab efficiency, but also serve to improve reproducibility.
  • Make use of collaborative platforms. Because of the reproducibility crisis, various online portals and platforms have been created in the past decade alone that allow for greater transparency and detailed record keeping. I have already mentioned the Open Science Framework, where researchers can post their protocols and data analysis plans for public access. While most of these efforts have not yet gained much attention, it is time to be aware of these resources. For example, antibodies are extensively used in life science research to identify or isolate other molecules. Despite being one of the most commonly used tools, the variability and cross-reactivity of antibodies are sources of concern with regard to reproducibility. To improve transparency, some platforms such as pAbmAbs and Antybuddy act as social recommendation services, allowing users to provide feedback on antibodies obtained from vendors. Other sites, such as antibodypedia.com, help researchers find services or techniques to validate their antibodies. At ABclonal, we try to do our part by providing user reviews, validation data, and links to publications using our reagents. If you are interested in using antibodies in your research, check out our catalogue.

Final Note

The reproducibility crisis has proven to be a sustained concern and source of debate. Failed attempts to reproduce important findings are not only raising doubts about the scientific community, but are also being used to oppose particular concepts such as climate change. The best way for scientists to improve this narrative is to be aware of inherent cognitive biases, actively work to counter them, and be as transparent as possible with reporting.

Tags: Science Communication, Reproducibility Crisis, reproducibility