Course Materials
As the focus of the course is on hands-on work, the topics have been designed to cover the fundamental analyses that are common in many population genomics studies. The course consists of lectures and exercises, with a focus on the practical aspects of analyses. Whereas lectures introduce some background theory, their primary aim is to set the stage for accompanying exercises.
The manuscript route
In principle, you could imagine the course structure to follow that of a manuscript (Fuller et al., 2020)
High-throughput DNA sequencing has now made it possible to generate whole-genome resequencing data for multiple individuals and populations, and a first step is to map sequence data to a reference, perform variant calling and variant filtering.
Once a high-quality variant set has been obtained, a common task is to describe variation, either in terms of summary statistics such as nucleotide diversity (\(\pi\)) or site-frequency spectra (sfs), or as descriptions of population structure in terms of admixture or pca plots.
Genetic diversity is also affected by population history and demographic processes such as population expansion, bottlenecks, migration events and hybridizations.
Finally, it is often of interest to identify adaptive traits, to which end selection tests and scans can be performed. The tests are designed to detect signals of selection, either via direct selection on loci, or by looking at haplotype structures to detect linked selection.
The baseline model
Much of what has been described in The manuscript route has recently been treated in an article on recommendations for improving statistical inference in population genomics (Johri et al., 2022). In it, the authors point out that whereas historically theoretical advances outpaced data production, that is no longer true due to the advent of next-generation sequencing. In particular, they caution researchers to attach too much faith to a test that explains the data well, as there are many alternative hypotheses with equal explanatory power, but with drastically different conclusions. At the very least, a population genomics study should aim at first generating a baseline model consisting of all or several of the following components:
- mutation
- recombination
- gene conversion
- purifying selection acting on functional regions and its effects on linked variants (background selection)
- genetic drift with demographic history and geographic structure
The exercises are designed to address many of the points above, and to highlight cases where competing hypotheses may actually explain data to equal degrees.