Introduction
29-Oct-2024
Good practices for working with data
How to use the version control system Git to track changes to code
How to use the package and environment manager Conda
How to use the workflow managers Snakemake and Nextflow
How to generate automated reports using Quarto and Jupyter
How to use Docker and Apptainer to distribute containerised computational environments
National Bioinformatics Infrastructure Sweden
The Reproducibility Project set out to replicate 100 experiments published in high-impact psychology journals.
About one-half to two-thirds of the original findings could not be observed in the replication study.
A survey in Nature revealed that irreproducible experiments are a problem across all domains of science.
Medicine is among the most affected research fields. A study in Nature found that 47 out of 53 medical research papers focused on cancer research were irreproducible.
Replication of 18 articles on microarray-based experiments published in Nature Genetics in 2005 and 2006
The results of only 26% of 204 randomly selected papers in the journal Science could be reproduced.
“Many journals are revising author guidelines to include data and code availability.”
“(…) an improvement over no policy, but currently insufficient for reproducibility.”
Researchers give many excuses for not working reproducibly:
“Thank you for your interest in our paper. For the [redacted] calculations I used my own code, and there is no public version of this code, which could be downloaded. Since this code is not very user-friendly and is under constant development I prefer not to share this code.”
“We do not typically share our internal data or code with people outside our collaboration.”
“When you approach a PI for the source codes and raw data, you better explain who you are, whom you work for, why you need the data and what you are going to do with it.”
“I have to say that this is a very unusual request without any explanation! Please ask your supervisor to send me an email with a detailed, and I mean detailed, explanation.”
| | Same data | Different data |
|---|---|---|
| Same code | Reproducible | Replicable |
| Different code | Robust | Generalisable |
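The matrix above can be read as a lookup on two yes/no questions: was the same data used, and was the same code used? Below is a minimal Python sketch of that reading. The function name `classify_repeat` and its parameters are hypothetical names introduced here for illustration; only the four terms come from the matrix.

```python
# Hypothetical helper: maps the two questions from the matrix to its four terms.
TERMS = {
    # (same_data, same_code): term
    (True, True): "Reproducible",
    (False, True): "Replicable",
    (True, False): "Robust",
    (False, False): "Generalisable",
}


def classify_repeat(same_data: bool, same_code: bool) -> str:
    """Return the term describing a result that holds up in a repeated analysis."""
    return TERMS[(same_data, same_code)]


if __name__ == "__main__":
    # Rerunning the original code on the original data.
    print(classify_repeat(same_data=True, same_code=True))
    # Running the original code on newly collected data.
    print(classify_repeat(same_data=False, same_code=True))
```

The term only applies when the repeated analysis reaches the same result. The sketch names which kind of agreement was tested, not whether it was achieved.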
“Why call the course Reproducible Research, when it could just as well be called Research?”
- Niclas Jareborg, NBIS data management expert
Decent:
Good:
Great:
.csv rather than .xls.
Decent:
Good:
Great:
Decent:
Good:
Great:
Before the project:
During the project:
After the project: