Project standard operating procedures

Author

Mahesh Binzer-Panchal

Published

December 13, 2024

1 Running assembly projects

If you’re new to these protocols, please see the onboarding material first.

1.1 Quick Start

  • Make a private project repository from this template repository on GitHub.

    1. On GitHub, click the green Use this template button in the upper right corner.
    2. Check that NBISweden/assembly-project-template is selected as Repository template.
    3. Check that Owner is NBISweden.
    4. Provide a repository name following <project>-<species>-<year>-<short_description> where
      • <project>:
        • VREBP: For VR-EBP projects
        • ERGA: For ERGA projects
        • BGE: For BGE projects
        • SMS: For NBIS user-fee projects
        • LTS: For NBIS peer-review projects
      • <species>: Species name
      • <year>: Year project started
      • <short_description>: Short project description.
    5. Ensure the repository is private, then click Create repository.
  • Clone it into the NAISS Storage project or your folder on NAC.

    cd <project allocation>
    git clone git@github.com:NBISweden/<repo>.git 
  • Update the README in the repository with project details.

  • Add references for important information to references.bib.

  • Copy NGI deliveries to the data folder (see launch page).

  • Link relevant raw data in data/raw-data.

  • Update assembly_parameters.yml to point to files in data/raw-data.

  • Run the analyses (./run_nextflow.sh).

  • Refer to the other pages here for more in-depth descriptions of the protocols.
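The naming convention in step 4 above can be sketched as a small shell helper. This is only an illustration: make_repo_name is a hypothetical function that is not part of the template, and replacing spaces with underscores is an assumption made so that hyphens only separate the four fields.

```shell
# Hypothetical helper (not part of the template): compose a repository name
# of the form <project>-<species>-<year>-<short_description>.
make_repo_name() {
    # $1 = project prefix, $2 = species, $3 = year, $4 = short description
    case "$1" in
        VREBP|ERGA|BGE|SMS|LTS) ;;   # prefixes listed in the naming convention
        *) echo "unknown project prefix: $1" >&2; return 1 ;;
    esac
    # Assumption: spaces become underscores so hyphens only separate fields.
    printf '%s-%s-%s-%s\n' "$1" "$2" "$3" "$4" | tr ' ' '_'
}

# Example: prints ERGA-Laetiporus_sulphureus-2024-test_assembly
make_repo_name ERGA "Laetiporus sulphureus" 2024 "test assembly"
```

The resulting string can be pasted into the Repository name field and used as <repo> in the git clone command.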

The template provides an organised folder structure and skeleton files so that you can start analysing quickly.

Analyses are primarily run on Uppmax or PDC. GitHub is used as the primary repository, and analysis files should be tracked and pushed regularly.

1.2 Running a test assembly analysis

Follow the steps above to make a repository for a test species. If you would like to use real data, feel free to use Laetiporus sulphureus (Chicken of the Woods).

From the Data tab, download the BAM file for PacBio HiFi into the deliveries folder:

wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR680/ERR6808041/m64229e_210602_121910.ccs.bc1020_BAK8B_OA--bc1020_BAK8B_OA.bam

and the FASTQ files for Hi-C (Arima v2) into the deliveries folder:

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR668/000/ERR6688740/ERR6688740_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR668/000/ERR6688740/ERR6688740_2.fastq.gz
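Large FTP downloads are occasionally truncated, so it can be worth checking the compressed FASTQ files before using them. The following is an optional sketch, not a step from the protocol: check_gz is a hypothetical helper that relies only on gzip -t, which tests the integrity of a compressed file without extracting it.

```shell
# Hypothetical helper (not part of the template): report whether each
# gzip-compressed file passes gzip's built-in integrity test.
check_gz() {
    status=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "CORRUPT $f"
            status=1          # remember that at least one file failed
        fi
    done
    return "$status"
}

# Usage, assuming the files were downloaded into the current folder:
#   check_gz ERR6688740_1.fastq.gz ERR6688740_2.fastq.gz
```

A non-zero exit status means at least one file should be downloaded again.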

Symlink the files into the appropriate folders under data/raw-data.
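As a minimal sketch of the symlinking step, the helper below links one or more delivered files into a destination folder. Both link_into and the subfolder names in the usage comments (pacbio-hifi, hic) are assumptions for illustration; use the folder names that the template actually provides under data/raw-data.

```shell
# Hypothetical helper (not part of the template): symlink files into a
# destination folder, creating the folder if it does not exist.
link_into() {
    # $1 = destination folder, remaining arguments = files to link
    dest="$1"; shift
    mkdir -p "$dest"
    for f in "$@"; do
        # Link to the absolute path so the link resolves from any folder.
        ln -s "$(realpath "$f")" "$dest/$(basename "$f")"
    done
}

# Usage from the repository root (folder names are assumptions):
#   link_into data/raw-data/pacbio-hifi data/deliveries/*.bam
#   link_into data/raw-data/hic data/deliveries/ERR6688740_*.fastq.gz
```

Linking rather than copying keeps the delivered files untouched and avoids duplicating large files within the storage allocation.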

Then edit assembly_parameters.yml to point to the data linked under data/raw-data. The bash snippets included in assembly_parameters.yml can help you write the input entries.

In workflow_parameters.yml, set the mitohifi.code parameter to 4, the mitochondrial genetic code listed for this species (see NCBI Taxonomy Browser).

Finally, open a screen session and then run the launch script (./run_nextflow.sh).