Gets converted to format of choice. Original files (and conversion settings) are lost
Hard-coded in various analysis scripts
First submission
Mailed back and forth between collaborators in ever-changing (but nicely coloured) Excel sheets
Review
Leads a quiet life on the HPC cluster, until the project expires and the data has to be urgently retrieved
Second submission
Ends its days on an external hard drive on the researcher’s desk
Reformatted and included as PDF in the supplementary
Publication
“Data available upon request”
FAIR data
Strive to make your data FAIR¹ for both machines and humans:
Findable
Accessible
Interoperable
Reusable
Data management plan
Check requirements of funding agency and field of research 1
Determine required storage space for short and long term
Provide helpful metadata
Consider legal/ethical restrictions if working with sensitive data
Find suitable data repositories
Strive towards uploading data to its final destination at the beginning of a project
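As a rough starting point for the storage estimate, the size of what already exists on disk can be measured with standard tools. This is a minimal sketch: the `DATA_DIR` variable is a hypothetical name introduced here, and the script falls back to the current directory if it is unset. Long-term needs also depend on things this cannot see, such as intermediate files and backups.

```shell
# Hypothetical variable: point DATA_DIR at the project's data directory.
# Falls back to the current directory if DATA_DIR is not set.
data_dir="${DATA_DIR:-.}"

# Total size on disk, in human-readable units
du -sh "$data_dir"

# The ten largest directories below it (sizes in kilobytes), largest first
du -k "$data_dir" | sort -rn | head -n 10
```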
Data sharing
Why Open Access?
Publicly funded research should be unrestricted
Published results should be verifiable by others
Enables others to build upon previous work
Organising your projects
Which sample file represents the most up to date version?
$ ls -l data/
-rw-r--r-- user staff Nov 12 22:00 samples.tsv
-rw-r--r-- user staff Nov 16 11:39 samplesFinal.tsv
-rw-r--r-- user staff Nov 18 22:41 samplesFinalV2.tsv
-rw-r--r-- user staff Nov 18 13:25 samplesUSE_THIS_ONE.tsv
-rw-r--r-- user staff Nov 15 22:39 samplesV2.tsv
The project directory
The first step towards working reproducibly: Get organised!
Divide your work into distinct projects
Keep all files needed to go from raw data to final results in a dedicated directory
Use relevant subdirectories
There are many ways to organise a project
A simple but effective example is the following:
code/        Code needed to go from input files to final results
data/        Raw data - this should never be edited
doc/         Documentation of the project
env/         Environment-related files, e.g. Conda environments or Dockerfiles
results/     Output from workflows and analyses
README.md    Project description and instructions
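The layout above can be set up in a few lines of shell. This is only a sketch: the project name `my_project` is a placeholder, and the subdirectories are the ones from the example, to be adapted to your own conventions.

```shell
# Placeholder project name; replace with your own.
project="my_project"

# Create one subdirectory per purpose (mkdir -p does not fail if it already exists)
for dir in code data doc env results; do
    mkdir -p "$project/$dir"
done

# Empty README to hold the project description and instructions
touch "$project/README.md"

# Show the resulting top-level layout
ls -1 "$project"
```

Once the raw data is in place, one way to guard against accidental edits is to remove write permission from it, for example with `chmod -R a-w my_project/data`.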