Project standard operating procedures
1 Running assembly projects
If you’re new to these protocols, please see the onboarding material first.
1.1 Quick Start
Make a private project repository from this template repository on GitHub:
- Click the green Use this template button in the upper right corner of the GitHub page.
- Check that NBISweden/assembly-project-template is selected as Repository template.
- Check that Owner is NBISweden.
- Provide a repository name following <project>-<species>-<year>-<short_description>, where
  - <project> is one of:
    - VREBP: for VR-EBP projects
    - ERGA: for ERGA projects
    - BGE: for BGE projects
    - SMS: for NBIS user-fee projects
    - LTS: for NBIS peer-review projects
  - <species>: species name
  - <year>: year the project started
  - <short_description>: short project description
- Ensure the repository is private, then click Create repository.
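Before clicking Create repository, it can help to check a proposed name against the naming convention. The helper below is a hypothetical sketch, not part of the template. It assumes that the fields are separated by single hyphens, that <species> contains only letters, digits and underscores (for example Laetiporus_sulphureus), that <year> has four digits, and that <short_description> is the only field allowed to contain further hyphens.

```shell
#!/bin/sh
# Hypothetical helper (not part of the template): check a proposed repository
# name against <project>-<species>-<year>-<short_description>.
# Assumptions: <species> uses only letters, digits and underscores,
# <year> is four digits, and <short_description> may also contain hyphens.
valid_repo_name() {
    printf '%s\n' "$1" |
        grep -Eq '^(VREBP|ERGA|BGE|SMS|LTS)-[A-Za-z0-9_]+-[0-9]{4}-[A-Za-z0-9_-]+$'
}

# Report whether each name given on the command line follows the convention.
for name in "$@"; do
    if valid_repo_name "$name"; then
        echo "OK:      $name"
    else
        echo "INVALID: $name" >&2
    fi
done
```

For example, saving it as check_repo_name.sh (a hypothetical file name) and running `sh check_repo_name.sh ERGA-Laetiporus_sulphureus-2024-test_assembly` should report the name as OK, while a name with a hyphen inside the species field is reported as INVALID.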
Clone it into the NAISS Storage project or your folder on NAC:
cd <project allocation>
git clone git@github.com:NBISweden/<repo>.git
Update the README in the repository with the project details.
Add references for important information to references.bib.
Copy the NGI deliveries to the data folder (see the launch page).
Link the relevant raw data into data/raw-data.
Update assembly_parameters.yml to point to the files in data/raw-data.
Run the analyses (./run_nextflow.sh).
Refer to the other pages here for more in-depth descriptions of the protocols.
The template provides an organised folder structure and skeleton files so you can start analysing quickly.
Analyses are primarily run on UPPMAX or PDC. GitHub is used as the primary repository, and analysis files should be tracked and pushed regularly.
1.2 Running a test assembly analysis
Follow the steps above to make a repository for a test species. If you would like to use real data, feel free to use Laetiporus sulphureus (Chicken of the Woods).
From the Data tab, download the BAM file for PacBio HiFi into the deliveries folder:
wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR680/ERR6808041/m64229e_210602_121910.ccs.bc1020_BAK8B_OA--bc1020_BAK8B_OA.bam
Then download the FASTQ files for Hi-C (Arima v2) into the deliveries folder:
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR668/000/ERR6688740/ERR6688740_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR668/000/ERR6688740/ERR6688740_2.fastq.gz
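The three downloads above can also be scripted. This sketch only wraps the same wget commands; the function names are hypothetical, and the destination is passed as an argument because the location of your deliveries folder is not fixed here. The wget option -c resumes a partially downloaded file if the connection drops, which helps with a large BAM file, and -P sets the output directory.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the wget commands above for the
# Laetiporus sulphureus test data (PacBio HiFi BAM, Arima v2 Hi-C FASTQ).

# Print the download URLs, one per line.
test_data_urls() {
    cat <<'EOF'
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR680/ERR6808041/m64229e_210602_121910.ccs.bc1020_BAK8B_OA--bc1020_BAK8B_OA.bam
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR668/000/ERR6688740/ERR6688740_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR668/000/ERR6688740/ERR6688740_2.fastq.gz
EOF
}

# Download every URL into the folder given as the first argument.
# -c resumes a partial download; -P sets the output folder.
download_test_data() {
    if [ -z "$1" ]; then
        echo "usage: download_test_data <deliveries-folder>" >&2
        return 2
    fi
    mkdir -p "$1" || return 1
    for url in $(test_data_urls); do
        wget -c -P "$1" "$url" || return 1
    done
}
```

After sourcing the file, run `download_test_data <deliveries folder>`; rerunning it after an interrupted transfer continues the incomplete file instead of starting over.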
Symlink the files into the appropriate folders under raw-data.
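A minimal sketch of the symlinking step, using a hypothetical helper. The sub-folder names under raw-data in the usage comments (PacBio-HiFi and Hi-C) are also hypothetical; use whichever folders your copy of the template provides. The helper links to absolute paths so each link resolves regardless of the directory it is placed in, and the delivered files themselves stay untouched in the deliveries folder.

```shell
#!/bin/sh
# Hypothetical helper: symlink one or more files into a raw-data sub-folder.
# Usage: link_raw_data <target-folder> <file>...
link_raw_data() {
    target=$1
    shift
    mkdir -p "$target" || return 1
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
        # Resolve the file's directory to an absolute path so the link
        # points to the same file no matter where the link lives.
        abs="$(cd "$(dirname "$f")" && pwd)/$(basename "$f")"
        # -f replaces an existing link of the same name.
        ln -sf "$abs" "$target/$(basename "$f")" || return 1
    done
}

# Example (hypothetical folder names):
#   link_raw_data data/raw-data/PacBio-HiFi <deliveries folder>/*.bam
#   link_raw_data data/raw-data/Hi-C <deliveries folder>/ERR6688740_*.fastq.gz
```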
Then edit assembly_parameters.yml to point to the data linked under raw-data, using the bash snippets in assembly_parameters.yml to help you write the input file.
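Before launching, a quick sanity check is that every raw-data path mentioned in assembly_parameters.yml actually resolves. The sketch below is a hypothetical helper that does not parse YAML: it assumes only that paths are written as plain tokens containing raw-data/ (quoted or unquoted), skips commented-out lines, and resolves relative paths from the directory you run it in.

```shell
#!/bin/sh
# Hypothetical sanity check: report raw-data paths in a parameter file
# that do not exist. This is a text search, not a YAML parser.
check_raw_data_paths() {
    file=${1:-assembly_parameters.yml}
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    rc=0
    # Drop comment lines, then extract tokens containing "raw-data/",
    # stopping at whitespace, quotes and commas. The shell expands glob
    # patterns, so a pattern counts as present if it matches a file.
    for p in $(grep -v '^[[:space:]]*#' "$file" |
               grep -oE "[^[:space:]\"',]*raw-data/[^[:space:]\"',]*"); do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            rc=1
        fi
    done
    return $rc
}
```

Running `check_raw_data_paths` from the repository root returns a non-zero status and lists each missing path, which is cheaper to catch here than after the pipeline has started.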
In workflow_parameters.yml, change the mitohifi.code parameter to 4, the mitochondrial genetic code (NCBI translation table) for this species (see the NCBI Taxonomy Browser).
Finally, open a screen session and then run the launch script (./run_nextflow.sh).
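The screen session keeps the pipeline running after you log out. The comments below recap the usual GNU screen commands (the session name "assembly" is only an example), and the function is a hypothetical guard, not part of the template, that you could run before the launch script. It checks the STY variable, which screen sets inside a session, and TMUX, which tmux sets, in case you prefer tmux.

```shell
#!/bin/sh
# Typical GNU screen workflow (run interactively):
#   screen -S assembly      # start a named session
#   ./run_nextflow.sh       # launch the pipeline inside it
#   Ctrl-a d                # detach; the pipeline keeps running
#   screen -ls              # list sessions
#   screen -r assembly      # reattach later

# Hypothetical guard: succeed only inside a screen (or tmux) session.
in_terminal_session() {
    [ -n "${STY:-}" ] || [ -n "${TMUX:-}" ]
}

# Warn rather than refuse, so the check never blocks a deliberate launch.
if ! in_terminal_session; then
    echo "warning: not inside screen/tmux; the run may stop if you log out" >&2
fi
```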