Variant calling index

List of exercises

Author

Per Unneberg

Published

18-Sep-2025

About

A generic variant calling workflow consists of the following basic steps:

read quality control and filtering
read mapping
removal / marking of duplicate reads
joint / sample-based variant calling and genotyping

Each of these steps can be adjusted or extended, depending on application and method. The variant calling exercises here cover the basic steps needed to go from raw data to variant calls.
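As a rough orientation, the four basic steps listed above can be sketched as a small Python helper that only assembles the command lines for one paired-end sample, without running anything. The tool choices (fastp, bwa, samtools, GATK), the function name `build_plan`, and all file names are assumptions made for illustration; this page does not specify them, and the exercises themselves are the authoritative source for the actual commands.

```python
def build_plan(sample, ref, r1, r2):
    """Return (step, shell command) pairs for one paired-end sample.

    Hypothetical helper: tools and file names are illustrative assumptions.
    """
    trimmed1 = f"{sample}_R1.trimmed.fastq.gz"
    trimmed2 = f"{sample}_R2.trimmed.fastq.gz"
    sorted_bam = f"{sample}.sort.bam"
    dedup_bam = f"{sample}.dedup.bam"
    gvcf = f"{sample}.g.vcf.gz"
    # GATK needs read group tags; bwa expects the literal two characters "\t".
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        ("read quality control and filtering",
         f"fastp -i {r1} -I {r2} -o {trimmed1} -O {trimmed2}"),
        ("read mapping",
         f"bwa mem -R '{read_group}' {ref} {trimmed1} {trimmed2}"
         f" | samtools sort -o {sorted_bam} -"),
        ("marking of duplicate reads",
         f"gatk MarkDuplicates -I {sorted_bam} -O {dedup_bam}"
         f" -M {sample}.dup_metrics.txt"),
        ("sample-based variant calling",
         f"gatk HaplotypeCaller -R {ref} -I {dedup_bam} -O {gvcf} -ERC GVCF"),
    ]


if __name__ == "__main__":
    # Print the plan for a made-up sample; nothing is executed.
    plan = build_plan("sampleA", "ref.fasta",
                      "sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz")
    for step, command in plan:
        print(f"# {step}\n{command}\n")
```

The sketch leaves out the supporting steps that such tools typically require, such as indexing the reference and the alignment files, as well as base quality score recalibration and joint genotyping, which are listed under the intended learning outcomes.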

The exercises are based on the Monkeyflowers dataset. Read the dataset document before running any commands: it provides the biological background and explains where to find the data and how to set it up. In what follows, we focus on the red and yellow ecotypes.

Intended learning outcomes

Perform QC on sequencing reads and interpret results
Prepare reference for read mapping
Map reads to reference
Mark duplicates
Perform raw variant calling to generate a set of sites to exclude from recalibration
Perform base quality score recalibration
Perform variant calling on base recalibrated data
Do genotyping on all samples and combine results to a raw variant call set

Listing

Title	Description
Variant calling introduction	Introduction to variant calling and the command line interface.
Data quality control	Introduction to the command line interface. Preparation of data, raw data quality control and filtering for downstream analyses.
Read mapping and duplicate removal	Read mapping to reference sequence and removal of duplicate reads.
Variant calling workflow	Perform variant calling and genotyping. Introduction to workflow manager systems.

Additional material

Variant calling, long description: Describes all steps of a standard variant calling workflow, from data preparation to final summary QC. All commands are run manually, without the aid of a workflow manager. Taken from an earlier course round.