Data

Short descriptions of the datasets used in the tutorials.

1 Covid-19 data

The data we are using in the first 6 tutorials is 10x data of peripheral blood mononuclear cells (PBMCs) from Covid patients and healthy controls from the paper “Immunophenotyping of COVID-19 and influenza highlights the role of type I interferons in development of severe COVID-19” in Science.

A peripheral blood mononuclear cell (PBMC) is any peripheral blood cell having a round nucleus. These cells consist of lymphocytes (T cells, B cells, NK cells), monocytes and dendritic cells, whereas erythrocytes and platelets have no nuclei, and granulocytes (neutrophils, basophils, and eosinophils) have multi-lobed nuclei.

Data was downloaded from GEO GSE149689 entry. For the tutorials we have selected 4 of the severe patients and 4 controls from that dataset. Each donor was then downsampled to 1500 cells per individual just to speed up the processing times in the labs. The script used to select samples and downsampling can be found in our GitHub repo.

2 Hematopoesis data

In the trajectory exercise we continue with immune cells but to get the full development of the different lineages we need to have bone marrow data. The dataset we are using is an integrated object with bone marrow data from multiple studies.

The data was integrated with Harmony and saved as a Seurat object. We already have subsetted the dataset (with 6688 cells and 3585 genes). In addition there was some manual filtering done to remove clusters that are disconnected and cells that are hard to cluster, which can be seen in this script

3 Spatial transcriptomics data

For the spatial transcriptomics tutorial we are using public Visium data from the 10x website that has been included in the data resources for the Seurat and Scanpy packages. We are using tow sections of the mouse brain (Sagittal).

The single cell data that we are using for mapping of celltypes onto the spatial data is a mouse cortex dataset from Allen brain institute.