Overview
Teaching: 10 min
Exercises: 0 min
Questions
What is ENA?
What do I submit to ENA?
How can I submit?
Objectives
Explain the parts of the data model: Study, Sample, Experiment, Run, Analysis.
Explain the different types of submissions.
European Nucleotide Archive (ENA)
The ENA is a repository providing submission of, and access to, annotated DNA and RNA sequences.
It also stores complementary information such as experimental procedures, details of sequence assembly and other metadata related to sequencing projects.
Submissions are represented using a number of different metadata objects. Before submitting data to ENA, it is important to familiarise yourself with the ENA metadata model and what parts of your research project can be represented by which metadata objects. This will determine what you need to submit.
- For example, a publication is typically associated with a study (project), sequenced source material is represented using samples, and sequencing experiment details are captured by the experiment object.
- Note that data files are also submitted by associating them with metadata objects. Sequence read data is associated with run objects while other data files are associated with analysis objects.
The full metadata model with relationships between the different types of objects is illustrated below.
- Study: A study (project) groups together data submitted to the archive and controls its release date. A study accession is typically used when citing data submitted to ENA. Note that all associated data and other objects are made public when the study is released.
- Sample: A sample contains information about the sequenced source material. Samples are associated with checklists, which define the fields used to annotate the samples. Samples are always associated with a taxon.
- Experiment: An experiment contains information about a sequencing experiment including library and instrument details.
- Run: A run is part of an experiment and refers to data files containing sequence reads.
- Analysis: An analysis contains secondary analysis results derived from sequence reads (e.g. a genome assembly).
- Submission: A submission contains submission actions to be performed by the archive. A submission can add more objects to the archive, update already submitted objects or make objects publicly available.
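The three kinds of submission action described above (add, update, make public) can be sketched as a small Python helper that builds a submission XML document. The element names `SUBMISSION`, `ACTIONS`, `ACTION`, `ADD`, `MODIFY` and `RELEASE`, and the `target` attribute, reflect one reading of ENA's programmatic submission format and should be checked against the current ENA documentation. The function name `build_submission_xml` and the accession `PRJEB0000` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from the actions in the text to XML element names:
# add -> ADD, update -> MODIFY, make public -> RELEASE.
SUPPORTED_ACTIONS = {"ADD", "MODIFY", "RELEASE"}


def build_submission_xml(action, target=None):
    """Return a submission XML string holding a single action.

    action: one of "ADD", "MODIFY" or "RELEASE".
    target: accession to make public; required for "RELEASE" only.
    """
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    if action == "RELEASE" and not target:
        raise ValueError("RELEASE needs a target accession")

    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    wrapper = ET.SubElement(actions, "ACTION")
    element = ET.SubElement(wrapper, action)
    if action == "RELEASE":
        element.set("target", target)
    return ET.tostring(submission, encoding="unicode")


# Hypothetical usage: one document per action.
print(build_submission_xml("ADD"))
print(build_submission_xml("RELEASE", target="PRJEB0000"))  # placeholder accession
```

Keeping one action per document keeps the sketch short; whether several actions may be combined in a single submission is not covered here.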
What to submit
- Study: placeholder for the project; submit this first.
- Sample: placeholder for the biomaterial information; submit this second.
- Raw reads: both the sequencing information (Experiment) and the data files (Run); submit these last.
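One way to see why this order makes sense is to model the objects as Python dataclasses in which each later object holds a reference to the earlier ones. This is only an illustrative sketch: the class fields, the aliases and the helper `submission_order` are assumptions chosen for the example, not ENA's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Study:
    alias: str
    title: str


@dataclass
class Sample:
    alias: str
    taxon_id: int  # every sample is associated with a taxon


@dataclass
class Experiment:
    alias: str
    study: Study    # the study must exist before the experiment
    sample: Sample  # so must the sample
    instrument: str
    library_strategy: str


@dataclass
class Run:
    alias: str
    experiment: Experiment  # a run is part of an experiment
    files: List[str] = field(default_factory=list)


def submission_order(run):
    """List the objects a run depends on, earliest submission first."""
    experiment = run.experiment
    return [experiment.study, experiment.sample, experiment, run]


# Hypothetical project with a single paired-end run.
study = Study("my_study", "Example sequencing project")
sample = Sample("my_sample", taxon_id=9606)  # 9606 is the NCBI taxon ID for human
experiment = Experiment("my_exp", study, sample, "Illumina MiSeq", "WGS")
run = Run("my_run", experiment, ["reads_1.fastq.gz", "reads_2.fastq.gz"])

for obj in submission_order(run):
    print(type(obj).__name__, obj.alias)
```

Because the experiment refers to both a study and a sample, those two have to be registered before any raw reads can be attached.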
Ways to submit
- Interactive - using browser
- Webin-CLI - command-line submission interface
  - Raw reads only
  - Validate, upload and submit in a single step
  - Write a manifest file
- Programmatic submission - XML document submitted using cURL
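As a rough illustration of the manifest file mentioned for Webin-CLI, the following Python sketch assembles a two-column, tab-separated manifest for a raw reads submission. The field names (`STUDY`, `SAMPLE`, `NAME`, `INSTRUMENT`, `LIBRARY_SOURCE`, `LIBRARY_SELECTION`, `LIBRARY_STRATEGY`, `FASTQ`) are assumed to match the Webin-CLI reads manifest and should be checked against the current ENA documentation. All values, including the accessions and file names, are placeholders, and `build_reads_manifest` is a hypothetical helper.

```python
def build_reads_manifest(fields, fastq_files):
    """Return manifest text: one 'FIELD<TAB>value' pair per line.

    fields: mapping of manifest field name to value.
    fastq_files: read files, each written as its own FASTQ line.
    """
    lines = [f"{key}\t{value}" for key, value in fields.items()]
    lines += [f"FASTQ\t{name}" for name in fastq_files]
    return "\n".join(lines) + "\n"


# Placeholder values; replace them with accessions from your own
# study and sample submissions.
manifest = build_reads_manifest(
    {
        "STUDY": "PRJEB0000",
        "SAMPLE": "ERS0000000",
        "NAME": "example_experiment",
        "INSTRUMENT": "Illumina MiSeq",
        "LIBRARY_SOURCE": "GENOMIC",
        "LIBRARY_SELECTION": "RANDOM",
        "LIBRARY_STRATEGY": "WGS",
    },
    ["reads_1.fastq.gz", "reads_2.fastq.gz"],
)
print(manifest)
```

The resulting text would be saved to a file (for example `manifest.txt`) and passed to Webin-CLI, which then validates, uploads and submits the reads it describes.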
Key Points
The European Nucleotide Archive (ENA) stores annotated DNA and RNA sequences.
Submissions are represented using a number of different metadata objects.
Study, sample and raw reads are the objects to be submitted.
Submissions can be done via browser, command-line interface (raw reads only) or programmatically.