Project standard operating procedures

Author

Mahesh Binzer-Panchal

Published

February 22, 2024

1 Protocols

Here are the standard operating procedures to follow when performing a genome assembly, annotation, and/or further analysis.

1.1 Why do we need these protocols?

  • To make data findable - (strict folder structure)
  • Ease project tracking - (git)
  • Reduce workload - (automation, code sharing)
  • Reproducibility - (workflows, notebooks, git, documentation, containers, interoperability)
  • Documentation - (reporting, summaries, issue tracking)

1.2 Quick Start

  • Make a private Project repository from this template repository on Github.

    1. Click the green Use this template button on Github in the upper right corner.
    2. Check NBISweden/assembly-project-template is selected as Repository template.
    3. Check Owner is NBISweden.
    4. Provide a repository name following <project>-<species>-<year>-<short_description> where
      • <project>:
        • VREBP: For VR-EBP projects
        • ERGA: For ERGA projects
        • BGE: For BGE projects
        • SMS: For NBIS short term projects
      • <species>: Species name
      • <year>: Year project started
      • <short_description>: Short project description.
    5. Ensure repository is private, then click Create repository.
  • Clone it into the NAISS Storage project.

    cd /proj/snic2021-6-194
    git clone git@github.com:NBISweden/<repo>.git 
  • Update README in the repository with project details.

  • Add references to references.bib of important information.

  • Copy NGI deliveries to data folder.

  • Link relevant raw data in data/raw-data.

  • Update assembly_parameters.yml to point to files in data/raw-data.

  • Run analyses, activating any necessary compute environments.

  • Refer to the other pages here for more in-depth descriptions of the protocols.

The template provides an organised folder structure, and skeleton files to quickly start analyzing.

Analyses are primarily run on Uppmax. Github is used as the primary repository, and analysis files should be tracked and pushed regularly.