3 Analysis

Sequence data has arrived. What next?

3.2 How to:

All data should be structured in the data/ folder as follows to make data findable.

data/
  | - deliveries
  | - raw-data
  | - outputs
  \ - frozen

data/deliveries contains read-only folders of deliveries from the sequencing centers.
data/raw-data contains subfolders that indicate the sequence data type, e.g. PacBio-Revio-WGS and within those folders are symlinks to data in subfolders of data/deliveries. Additional data downloaded from other locations should also be in subfolders and include a shell script that can redownload the file from the original location. e.g.,
```
#! /usr/bin/env bash
curl -O ftp://path/to/public/archive/file.ext
```
data/outputs contains the results from tools lauched in the analyses folders using the same label as the launching folder. e.g., analyses/01_assembly-workflow_initial-run_rackham puts results in data/outputs/01_assembly-workflow_initial-run_rackham.
data/frozen contains symlinks to folders in data/outputs which are stage end-points, e.g. the raw-reads have been processed in various ways, and after looking at QC controls, one folder is selected to be used for assembly. This is symlinked in frozen.

Data in data/raw-data/* should be symlinked from data/deliveries/**.

E.g.,

cd data/raw-data/PacBio-Hifi
find ../../deliveries -name "*.bam" -o -name "*.pbi" -exec ln -s {} . \;

and

cd data/raw-data/Illumina-HiC
find ../../deliveries -name "*.fastq.gz" -exec ln -s {} . \;

cd analyses/01_ebp-assembly-workflow

Update the assembly_parameters.yml with the paths to input files. Check the YML for a bash snippet to fill out the section.
Update the workflow_parameters.yml with any extra workflow parameters. In particular, check anything marked as TODO, e.g., selecting the mitochondrial code table to use.
Run the workflow.

./run_nextflow.sh

Note

The workflow above only runs until Hi-C mapping. Steps for manual curation onwards are still being implemented.

No protocols as of yet

No protocols as of yet

Custom analyses might be needed. In these cases please make use of the other project folders, and do your best to version control all the steps.

Put custom code in code/scripts, code/snakemake, and code/nextflow.
Make sure the code uses containers or conda environments to package the software environment.
Make an issue on the template to integrate the code into the template so that it’s shareable until it’s integrated into a workflow.
Make an issue on the relevant workflow to integrate the tools.

If you encounter any issues with using these protocols please ask on #vr-accessibility-ebp.