Overview
Teaching: 10 min
Exercises: 40 minQuestions
I have my datasets, what is the submission process?
Objectives
Explain the steps of submission to European Nucleotide Archive.
Do a submission using the Webin Submissions Portal
Prerequisites
This exercise requires a sample metadata spreadsheet previously created in the OpenRefine module,
ENA_samples_workshop_DM_practices.tsv
. Create a folder on your desktop, e.g.Desktop/dm-practices/data
and download the tsv file above by clicking on the link.You also need to create an account at European Nucleotide Archive:
- Go to ENA submit homepage
- Click on Register and fill in the required details
Do a submission to ENA
-
In this exercise you will do an interactive submission of your Study and Samples using the Webin submissions portal. There is also an optional exercise where you will submit your sequence data using the command line submission interface Webin-CLI, which enables automatic validation.
- Use the test submission site when you want to test, and the production site for real submissions:
- Test site: https://wwwdev.ebi.ac.uk/ena/submit/webin
- Production site: https://www.ebi.ac.uk/ena/submit/webin
- Note: The test service is restarted every night, any submissions made to the test service will be removed by the following day. Hence, do not start a test submission one day, and expect to continue the next day.
- Submission steps:
- Login to the Webin Submissions Portal
- Register study - Provide study level information
- Register sample(s) - Provide sample metadata
- Create manifest file(s) - envelope / metadata for sequence files
- Validate and upload manifest file(s) and sequence file(s))
1. Login to the Webin Submissions Portal
- Go to the test service: https://wwwdev.ebi.ac.uk/ena/submit/webin and log in with your Webin username and password.
-
The submissions portal allows for interactive submission, and is the recommended way for registration of your Study and Samples. To the left, in the top of the welcome page, there is a dashboard menu which will expand when you click on it.
2. Register study (project)
-
Click on the Dashboard menu and select Register Study (Project)
Picture
- Enter the following information
- Release date: 10-Oct-2022
- Short descriptive study title: SARS-CoV-2 genomic surveillance in Italy
- Study Name: SARS-CoV-2
- Detailed study abstract: RNA sequencing for genomic surveillance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Italy.
- Click on Submit
-
Verify that the submission was successful in the pop-up Submission window, then click on Close
Solution
3. Register samples
- Click on the Dashboard menu and select Register Samples
Picture
-
This will lead to two options, either download a spreadsheet to register samples or the reverse, i.e., upload a filled spreadsheet. Typically you do not have a spreadsheet to begin with, but since we have produced one in the OpenRefine module, we can skip ahead and select the upload option.
Picture
-
Select the file
ENA_samples_workshop_DM_practices.tsv
from your computer (if you haven’t previously, download the file first from here). -
Click on Submit Completed Spreadsheet.
-
Verify that the submission was successful in the pop-up Submission window, then click on Close
Picture
Optional: Steps to submit samples from scratch
- From the Register Samples menu, select
Download spreadsheet to register samples
.- Select a checklist, for our purpose the
Pathogens Checklists
>ENA virus pathogen reporting standard checklist
is suitable.Picture
- Click on Recommended fields and remove the following fields:
virus identifier
,receipt date
,definition for seropositive sample
,serotype
,host habitat
,host behaviour
, andisolation source non-host-associated
.- Click on Next and then on Download TSV template.
- Open the template in your favorite text editor, fill in all sample information, and then upload the file via
Dashboard > Register Samples > Upload filled spreadsheet to register samples
.- Verify that the submission was successful in the pop-up Submission window, then click on Close.
4. (Optional) Prepare manifest file
-
A manifest file serves as an envelope for the sequence file, and contains the metadata describing how the sequencing was done.
- The manifest file has two columns separated by a tab (or any whitespace characters):
- Field name (first column): case insensitive field name
- Field value (second column): field value
- The following metadata fields are supported in the manifest file:
- STUDY: Study accession or unique name (alias)
- SAMPLE: Sample accession or unique name (alias)
- NAME: Unique experiment name
- PLATFORM: See permitted values. Not needed if INSTRUMENT is provided.
- INSTRUMENT: See permitted values
- INSERT_SIZE: Insert size for paired reads
- LIBRARY_NAME: Library name (optional)
- LIBRARY_SOURCE: See permitted values
- LIBRARY_SELECTION: See permitted values
- LIBRARY_STRATEGY: See permitted values
- DESCRIPTION: free text library description (optional)
- The following file name fields are supported in the manifest file:
- BAM: Single BAM file
- CRAM: Single CRAM file
- FASTQ: Single fastq file
- In a text editor such as Notepad, create a manifest file for the NEBNext_OAS_12 sequence file and put it in the data folder (e.g. Desktop/dm-practices/data/NEBNext_OAS_12_manifest.txt), write the following information:
STUDY
SAMPLE
INSTRUMENT NextSeq 500
LIBRARY_SOURCE VIRAL RNA
LIBRARY_SELECTION size fractionation
LIBRARY_STRATEGY AMPLICON
FASTQ NEBNext_OAS_12.fastq.gzThe field values for STUDY and SAMPLE needs to be collected from your submission:
- In the browser, where you submitted the study and samples, go to the Dashboard menu and click on the
Studies Report
Picture Dashboard Studies Report
- Copy the accession number (starting with PRJEB) into the manifest file as the STUDY field value.
- Go back to the Dashboard menu and click on the
Samples report
Picture Dashboard Samples Report
- Locate the accession number (starting with ERS) for NEBNext_OAS_12 and copy this into the manifest file as the SAMPLE field value.
- In the browser, where you submitted the study and samples, go to the Dashboard menu and click on the
- Save the manifest file.
Solution
See example manifest file NEBNext_OAS_12_manifest.txt but note that STUDY and SAMPLE have no field values since this is unique to each study and sample submission and needs to be entered manually.
5. (Optional) Validate and submit the manifest file and the sequence file
Prerequisites
- A sequence file is required: NEBNext_OAS_12.fastq.gz, download it to you data folder (e.g. Desktop/dm-practices/data/). In order to submit sequence file(s), and accompanying metadata regarding the sequencing, you must install Webin-CLI:
- Webin-CLI requires that you have Java installed before you can run it. You should have version 1.8 or newer, which can be downloaded from Java.
- You will also need to download and install a Java Runtime Environment (JRE) and we recommend Zulu Open JDK available at https://www.azul.com/downloads/?package=jdk
- In the folder where you created the data subfolder, create another subfolder named
prg
, (e.g.Desktop/dm-practices/prg
).- Download Webin-CLI Java jar file from its GitHub repository, and put it in the
prg
subfolder.
Open the Command Prompt window and go to the folder dm-practices
on your Desktop using the command cd Desktop\dm-practices\
.
- Note for Mac users: Please use forward slash, i.e. write the command
cd Desktop/dm-practices/
In order to use Webin-CLI we need to give instructions on who we are and what we want to do. This is done using options
:
-
In order to see which options are available, type the following command in the Command Prompt window
Windows:
java -jar prg\webin-cli-4.1.0.jar -help
Mac:java -jar prg/webin-cli-4.1.0.jar -help
Out of these, we will make use of the following options:
-context
: the submission type (genome, transcriptome, sequence or reads)
-userName
: the Webin submission account name.
-password
: the Webin submission account password.
-manifest
: the manifest file name.
-outputDir
: directory for output files.
-inputDir
: input directory for files declared in manifest file.
-validate
: validates the files defined in the manifest file.
-submit
: validates and submits the files defined in the manifest file.
-test
: use Webin test service instead of the production service.
Please note that the Webin upload area is shared between test and production services, and that test submission files will not be archived.
For more details about the command line options of Webin-CLI, please follow this link: https://ena-docs.readthedocs.io/en/latest/submit/general-guide/webin-cli.html#command-line-options.
First we will do a validation (-validate option) to the test server (-test option), using data
folder as both input and output directory.
-
Copy or type the command below, and replace the
Webin-XXX
andmyPassword
with your own webin username and password, respectively:Windows:
java -jar prg\webin-cli-4.1.0.jar -context reads -userName Webin-XXXXX -password myPassword -manifest NEBNext_OAS_12_manifest.txt -outputDir data -inputDir data -validate -test
Mac:
java -jar prg/webin-cli-4.1.0.jar -context reads -userName Webin-XXXXX -password myPassword -manifest NEBNext_OAS_12_manifest.txt -outputDir data -inputDir data -validate -test
- Press Enter
- If the validation is successful the last output row will read:
INFO : The submission has been validated successfully.
When the validation is successful, it is time to do a submit instead of a validation (-submit option instead of -validate). We will still use the test server, and the same input and output directories as for the validation step. Try to write the command yourself or take a peak at the solution below.
Solution
Windows:
java -jar prg\webin-cli-4.1.0.jar -context reads -userName Webin-XXXXX -password myPassword -manifest NEBNext_OAS_12_manifest.txt -outputDir data -inputDir data -submit -test
Mac:
java -jar prg/webin-cli-4.1.0.jar -context reads -userName Webin-XXXXX -password myPassword -manifest NEBNext_OAS_12_manifest.txt -outputDir data -inputDir data -submit -test
The processing will take a while but since the validation was successful, it is only the upload of the sequence file that might misfire. Well done!
ENA training material
References
The data used in this exercise has been extracted from a project submitted to ENA: https://www.ebi.ac.uk/ena/browser/view/PRJEB42601
Key Points
When in doubt on how to submit, go to the test submission site and do a test submission: https://wwwdev.ebi.ac.uk/ena/submit/webin
The submission process includes registration of study and samples via the browser. The sequences and their metadata is submitted via Webin-CLI, a command-line tool that allows for validation.
The whole process of submission, from file upload to receiving an accession number takes time. Do not do this late in the project, when publishers require that you publish datasets before review and deadline is 24 hours.
If you ever are stuck, contact us data stewards at NBIS by sending an email to data@nbis.se or ask for a consultation via our homepage.