Monkeyflowers dataset

Speciation genomics and simulation

Author

Per Unneberg

Published

15-Nov-2023

The allure of monkeyflowers (DOI: 10.1126/science.365.6456.854)

The monkeyflower model system

Monkeyflowers (Mimulus) have recently become a key model in evolution and plant biology (Pennisi, 2019). The monkeyflower system consists of 160–200 species that display an amazing phenotypic variation. The genome is small, only 207Mbp, which makes it an ideal candidate for genomics - and for computer exercises!

The monkeyflower genomic landscape

Recently, Stankowski et al. (2019) used the monkeyflower system to investigate what forces affect the genomic landscape. Burri (2017) has suggested that background selection (BGS) is one of the main causes for correlations between genomic landscapes, and that one way to study this phenomenon is to look at closely related taxa. This is one of the objectives of the Stankowski et al. (2019) paper.

They performed whole-genome resequencing of 37 individuals from 7 subspecies and 2 ecotypes of Mimulus aurantiacus and its sister taxon M. clevelandii (Figure 1), all sampled in California (Figure 2).

Figure 1: Evolutionary relationships across the radiation

Figure 2: Sampling locations

Genomewide statistics, such as diversity (\(\pi\)), divergence (\(d_{XY}\)) and differentiation \(F_{ST}\), were calculated within and between taxa to generate genomic diversity landscapes. The landscapes were highly similar across taxa, and local variation in genomic features, such as gene density and recombination rate, was predictive of variation in landscape patterns. These features suggest the influence of selection, in particular BGS.

Although many characteristics were predicted by a model where BGS is one of the main causes, there were deviations. Therefore, the authors performed simulations in SLiM (Haller & Messer, 2019) with alternative models to see whether other factors could explain the observed patterns.

In all, six scenarios were studied:

  1. neutral evolution
  2. BGS (non-neutral mutations are deleterious)
  3. Bateson-Dobzhansky-Muller incompatibility (BDMI); after split, a fraction variants deleterious in one population, neutral in other
  4. positive selection
  5. BGS and positive selection
  6. local adaptation; as 4 but also after split some variants are beneficial in one population, neutral in other

Figure 3 shows typical results of the simulations.

Figure 3: Genomic landscapes simulated under different divergence histories.

In conclusion, the authors found that although BGS plays a role, it does not sufficiently explain all observations, and that other aspects of natural selection (such as rapid adaptation) are responsible for the similarities between genomic landscapes.

A locus that previously had been associated with differentiation of red and yellow ecotypes was investigated in more detail. The locus is located on linkage group 4 (LG4), and we will be using both a 3Mbp region of interest (ROI) surronding the locus, and the whole linkage group, for different exercises.

Data

The dataset consists of 37 samples (see Table 1 for example information). Raw sequence reads were downloaded from Sequence Read Archive (SRA), bioproject PRJNA549183 and mapped to the reference sequence M_aurantiacus_v1_splitline_ordered.fasta. Reads that mapped to the ROI were extracted and constitute the sequence data that will be used during the exercises.

Table 1: Example of monkeyflower samples. See file sampleinfo.csv in data repository for full listing.
Sample Run ScientificName SampleName Taxon Latitude Longitude
SRS4979271 SRR9309782 Diplacus longiflorus LON-T33_1 ssp. longiflorus 34.3438 -118.5099
SRS4979267 SRR9309785 Diplacus longiflorus LON-T8_8 ssp. longiflorus 34.1347 -118.6452
SRS4979269 SRR9309784 Erythranthe parviflora PAR-KK161 ssp. parviflorus 34.0180 -119.6730
SRS4979266 SRR9309787 Erythranthe parviflora PAR-KK168 ssp. parviflorus 34.0180 -119.6730
SRS4979268 SRR9309786 Erythranthe parviflora PAR-KK180 ssp. parviflorus 34.0180 -119.6730
SRS4979265 SRR9309789 Erythranthe parviflora PAR-KK182 ssp. parviflorus 34.0193 -119.6802

UPPMAX data storage

The monkeyflower dataset is located in UPPMAX project naiss2023-22-1084 at /proj/naiss2023-22-1084/webexport/monkeyflower. In addition to local access, data can be accessed remotely through https://export.uppmax.uu.se/naiss2023-22-1084.

Github

The github repository pgip-data contains reference sequence and read data for 37 monkeyflower individuals for the region LG4:12,000,000-12,100,000. The data resides in the data/monkeyflower/tiny subdirectory. This data set is used as input data to render the website.

The repository hosts a Snakemake workflow to generate all data needed for the exercises.

References

Burri, R. (2017). Interpreting differentiation landscapes in the light of long-term linked selection. Evolution Letters, 1(3), 118–131. https://doi.org/10.1002/evl3.14
Haller, B. C., & Messer, P. W. (2019). SLiM 3: Forward Genetic Simulations Beyond the Wright. Molecular Biology and Evolution, 36(3), 632–637. https://doi.org/10.1093/molbev/msy228
Pennisi, E. (2019). The allure of monkeyflowers. Science, 365(6456), 854–857. https://doi.org/10.1126/science.365.6456.854
Stankowski, S., Chase, M. A., Fuiten, A. M., Rodrigues, M. F., Ralph, P. L., & Streisfeld, M. A. (2019). Widespread selection and gene flow shape the genomic landscape during a radiation of monkeyflowers. PLOS Biology, 17(7), e3000391. https://doi.org/10.1371/journal.pbio.3000391