Introduction to repository submission

Licensed under CC-BY 4.0 and OSI-approved licenses; see licensing.

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • Why submit my data to a repository?

  • What is data?

  • How do I find a suitable repository?

Objectives
  • Explain why data should be made publicly available and what the term "data" means.

  • Explain different types of repositories and how to find a suitable one.

Why submit your datasets to a repository?

Why should I share my data?

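  • Open Science and the FAIR principles (Findable, Accessible, Interoperable, Reusable)
  • Reproducibility
  • Trail of evidence
  • Third-party access: others can find and reuse your data
  • Archival: long-term preservation of the data
  • Publishers require it for publication of the paper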

What is data?

There are different types of data:

  • Raw: straight from the instrument, e.g. FASTQ, BAM or CRAM files
  • Processed: data after e.g. normalization or removal of outliers, expression measurements, statistics
  • Metadata: the minimum information needed to reproduce the data, e.g. sample information and precise protocols

How to find a suitable repository

Types of repositories

  • Domain-specific:
    • Best choice if one suits your data: long-term preservation, typically free of charge, maximum reach.
    • E.g. ENA, ArrayExpress, PRIDE
  • General-purpose:
    • Second best: long-term preservation, may charge fees (now or in the future), good reach, but less specific metadata makes it harder for future users to judge whether a dataset will be useful.
    • E.g. Zenodo, Figshare, Dryad
  • In-house/institutional:
    • Mainly for archive/backup purposes, may charge fees, limited reach unless the data are also published in a data catalogue.
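The order of preference above can be written down as a simple rule: among the repository types that suit your dataset, pick the most preferred one. The following is a minimal sketch, assuming you have already judged which types are suitable; the names RepoType and choose_repository_type are hypothetical and not part of any repository's tooling.

```python
from enum import IntEnum
from typing import Iterable


class RepoType(IntEnum):
    """Repository types, ordered by preference (lower value = preferred)."""
    DOMAIN_SPECIFIC = 1   # e.g. ENA, ArrayExpress, PRIDE
    GENERAL_PURPOSE = 2   # e.g. Zenodo, Figshare, Dryad
    INSTITUTIONAL = 3     # mainly for archive/backup


def choose_repository_type(suitable: Iterable[RepoType]) -> RepoType:
    """Return the most preferred type among those judged suitable for the dataset.

    Hypothetical helper: the suitability judgement itself is left to you.
    """
    candidates = list(suitable)
    if not candidates:
        raise ValueError("no suitable repository type given")
    # IntEnum members compare by value, so min() picks the most preferred type.
    return min(candidates)
```

For example, if no domain-specific repository fits your data, choose_repository_type({RepoType.GENERAL_PURPOSE, RepoType.INSTITUTIONAL}) returns RepoType.GENERAL_PURPOSE.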

How do I find a domain-specific repository?

Key Points

  • Sharing data has several benefits, e.g. it supports reproducibility, follows the Open Science directive and meets publishers' requirements.

  • There are different types of data, e.g. raw data, processed data and metadata.

  • If possible, use a domain-specific repository since it has maximum reach in the research community.