3 Analysis
Sequence data has arrived. What next?
3.1 Checklist
3.2 How to:
Structure data
All data should be structured in the data/
folder as follows to make data findable.
data/
| - deliveries
| - raw-data
| - outputs \ - frozen
data/deliveries
contains read-only folders of deliveries from the sequencing centers.data/raw-data
contains subfolders that indicate the sequence data type, e.g.PacBio-Revio-WGS
and within those folders are symlinks to data in subfolders ofdata/deliveries
. Additional data downloaded from other locations should also be in subfolders and include a shell script that can redownload the file from the original location. e.g.,#! /usr/bin/env bash curl -O ftp://path/to/public/archive/file.ext
data/outputs
contains the results from tools lauched in the analyses folders using the same label as the launching folder. e.g.,analyses/01_assembly-workflow_initial-run_rackham
puts results indata/outputs/01_assembly-workflow_initial-run_rackham
.data/frozen
contains symlinks to folders indata/outputs
which are stage end-points, e.g. the raw-reads have been processed in various ways, and after looking at QC controls, one folder is selected to be used for assembly. This is symlinked in frozen.
- Make a translation table in data/raw-data linking the NGI delivery files to the data we’re going to use. Need a way to mark bad data.
Assemble sequence data
Annotate assemblies
Perform downstream analyses
Integrate new analyses
- Need a protocol to integrate custom scripts into template while it’s not integrated into the workflow.
- Put custom code in
code/scripts
,code/snakemake
,code/nextflow
, and launch scripts undercode/launch_templates
. - Make sure the code uses containers or conda environments to package the software environment.
- Make an issue on the template to integrate the code into the template so that it’s shareable until it’s integrated into a workflow.
- Make an issue on the relevant workflow to integrate the tools.
Troubleshoot
- Need a protocol for troubleshooting. Who to ask