Overview
Teaching: 5 min
Exercises: 0 min
Questions
Why submit my data to a repository?
What is data?
How do I find a suitable repository?
Objectives
Explain why data should be publicly available and what the term data means.
Explain different types of repositories and how to find a suitable one.
Why submit your datasets to a repository?
Why should I share my data?
- Open Science & FAIR
- Reproducibility
- Trail of evidence
- 3rd party access
- Archival
- Publishers may require it for a paper to be published
What is data?
There are different types of data:
- Raw: straight from the instrument, e.g. FASTQ, BAM, CRAM
- Processed: the output of steps such as normalization and removal of outliers, e.g. expression measurements and statistics
- Metadata: the minimum information needed to reproduce the data, e.g. sample information and precise protocols
How to find a suitable repository
Types of repositories
- Domain-specific:
  - Best choice if suitable: long-term plan, typically free of charge, maximum reach.
  - E.g. ENA, ArrayExpress, PRIDE
- General-purpose:
  - Second-best choice: long-term plan, might cost (now or in the future), good reach but less specific metadata, which makes it more difficult for future users to judge if a dataset will be useful.
  - E.g. Zenodo, Figshare, Dryad
- In-house/institutional:
  - Mainly for archive/backup purposes, might cost, limited reach unless also published in a data catalogue.
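The order of preference above can be sketched as a small decision helper. This is only an illustration: the function name `suggest_repository`, the `DOMAIN_REPOSITORIES` lookup and its data-type labels are hypothetical, and the mapping covers just the three example repositories named in the list, not a complete catalogue.

```python
# Illustrative lookup only: maps a data type label (our own wording)
# to one of the domain-specific repositories named above.
DOMAIN_REPOSITORIES = {
    "nucleotide sequence": "ENA",
    "functional genomics": "ArrayExpress",
    "proteomics": "PRIDE",
}


def suggest_repository(data_type, general_purpose_ok=True):
    """Return (repository type, suggestion) following the order of preference:
    domain-specific first, then general-purpose, then in-house/institutional."""
    repo = DOMAIN_REPOSITORIES.get(data_type.strip().lower())
    if repo is not None:
        # A domain-specific repository is the best choice if one is suitable.
        return ("domain-specific", repo)
    if general_purpose_ok:
        # Second best: good reach, but less specific metadata.
        return ("general-purpose", "Zenodo, Figshare or Dryad")
    # Last resort: mainly archive/backup, limited reach.
    return ("in-house/institutional", "institutional archive")


if __name__ == "__main__":
    for dtype in ["Proteomics", "field notes"]:
        print(dtype, "->", suggest_repository(dtype))
```

In practice, the lookup step is what a curated catalogue does for you, so the dictionary here stands in for searching such a catalogue by domain.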
How to find a domain-specific repository?
- EBI wizard - guide depending on data type
- ELIXIR deposition databases - core resources with long-term data preservation and accessibility plans
- FAIRsharing.org/databases - catalogue of many repositories, with possibility to filter on e.g. domain.
Key Points
Sharing data has several benefits, e.g. it supports reproducibility, follows the Open Science directive and meets requirements from publishers.
There are different types of data, e.g. raw data, processed data and metadata.
If possible, use a domain-specific repository since it has maximum reach in the research community.