path_covid <- "./data/covid/"
if (!dir.exists(path_covid)) dir.create(path_covid, recursive = TRUE)
# mode = "wb" keeps the binary .rds files intact on Windows
download.file("https://lu.box.com/shared/static/205zwfegtinm49yuelbvxawln6r3wyjd", destfile = file.path(path_covid, "seurat_covid_raw.rds"), mode = "wb")
download.file("https://lu.box.com/shared/static/nf5eps2uus9qcjx0ml0n313p9a8r1vb7", destfile = file.path(path_covid, "seurat_covid_qc_dr_clst.rds"), mode = "wb")
# If downloading from within R fails, you can download the files directly using the following links:
# seurat_covid_raw.rds: https://lu.box.com/s/205zwfegtinm49yuelbvxawln6r3wyjd
# seurat_covid_qc_dr_clst.rds: https://lu.box.com/s/nf5eps2uus9qcjx0ml0n313p9a8r1vb7
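If the download step is run more than once, or fails partway, it can help to guard it. The following is a minimal sketch, not part of the original tutorial: `download_if_missing()` is a hypothetical helper name, and it only assumes base R's `download.file()`, `file.exists()` and `tryCatch()`.

```r
# Hypothetical helper (not from the tutorial): download a file only if it is
# not already present, and return TRUE/FALSE instead of stopping on an error.
download_if_missing <- function(url, destfile) {
  if (file.exists(destfile)) {
    return(TRUE)  # already downloaded, nothing to do
  }
  ok <- tryCatch(
    {
      # download.file() returns 0 (invisibly) on success
      status <- download.file(url, destfile = destfile, mode = "wb")
      status == 0
    },
    error = function(e) FALSE
  )
  # Remove a partial file left behind by a failed download
  if (!ok && file.exists(destfile)) file.remove(destfile)
  ok
}

# Usage with the first URL from the tutorial (commented out to avoid a
# network call when this sketch is sourced):
# download_if_missing(
#   "https://lu.box.com/shared/static/205zwfegtinm49yuelbvxawln6r3wyjd",
#   file.path("./data/covid/", "seurat_covid_raw.rds")
# )
```

If the helper returns `FALSE`, the direct links listed above remain the fallback.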
The visualization tutorial here is inspired by the NBIS workshop on Single-cell RNA-seq Analysis, and the visualizations shown here are part of that single-cell workshop as well.
1 Get data
In this tutorial, we will work with a set of 8 PBMC 10x datasets from 4 covid-19 patients and 4 healthy controls. Each sample has been subsampled to 1500 cells. We start by defining our paths and downloading the data.
With the data in place, we can now load the libraries used in this tutorial.
suppressPackageStartupMessages({
library(Seurat)
library(Matrix)
library(ggplot2)
library(patchwork)
})
# Named covid_raw to match the object used in the rest of the tutorial
covid_raw <- readRDS("./data/covid/seurat_covid_raw.rds")
All eight datasets mentioned above are already merged into a single Seurat object. The two files you have downloaded are explained below:
- seurat_covid_raw.rds: Contains all the data in raw format. No pre-processing has been done on this data.
- seurat_covid_qc_dr_clst.rds: In this file, the cells and the genes have gone through a QC process, followed by Dimensionality Reduction and Clustering.
2 Raw data
Let us take a look at the raw data to start with. Here is how to inspect the count matrix and the metadata for every cell.
# RNA count matrix: first 10 genes, first 4 cells
covid_raw[["RNA"]]$counts[1:10, 1:4]
# metadata of the cells
head(covid_raw@meta.data, 10)
When you look at the metadata, you can see that the rownames are the ids of the cells. Each of the columns is explained below.
- orig.ident: represents the patient id.
- nCount_RNA: the total number of molecules detected within a cell
- nFeature_RNA: the number of genes detected in each cell
- type: whether the sample is covid or control
- percent_mito: Mitochondrial gene content in percent
- percent_ribo: Ribosomal gene content in percent
- percent_hb: Percentage hemoglobin genes
- percent_plat: Percentage for some platelet markers
These values are pre-calculated to assist with the visualization.
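The tutorial does not show how these columns were computed. As a minimal sketch of the idea, the example below derives similar per-cell values from a made-up count matrix using base R only. The gene names, the counts, the helper `pct()`, and the patterns `^MT-` (mitochondrial) and `^RP[SL]` (ribosomal) are assumptions for illustration. In Seurat itself, percentages like these are usually obtained with `PercentageFeatureSet()`.

```r
# Toy count matrix with genes in rows and cells in columns (values are made up)
counts <- matrix(
  c( 5,  0, 10,   # MT-CO1
     5, 10, 10,   # MT-ND1
    40, 30, 10,   # RPS3
    50, 60, 70),  # ACTB
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("MT-CO1", "MT-ND1", "RPS3", "ACTB"),
    c("cell1", "cell2", "cell3")
  )
)

# Hypothetical helper: percentage of each cell's counts that come from
# genes whose name matches `pattern`
pct <- function(counts, pattern) {
  sel <- grepl(pattern, rownames(counts))
  100 * colSums(counts[sel, , drop = FALSE]) / colSums(counts)
}

qc <- data.frame(
  nCount_RNA   = colSums(counts),      # total molecules per cell
  nFeature_RNA = colSums(counts > 0),  # genes with at least one count
  percent_mito = pct(counts, "^MT-"),
  percent_ribo = pct(counts, "^RP[SL]")
)
qc
```

Each row of `qc` corresponds to one cell, which mirrors the layout of the metadata table shown above.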
2.1 Plot QC
To plot the raw QC of the dataset, we will use the function VlnPlot(). This is a function from the Seurat package, and it generates a ggplot2 object from the Seurat object. Violin plots are an alternative to boxplots that also show the shape of the distribution of the values.
feats <- c("nFeature_RNA", "nCount_RNA", "percent_mito", "percent_ribo", "percent_hb", "percent_plat")
VlnPlot(covid_raw, group.by = "orig.ident", split.by = "type", features = feats, pt.size = 0.1, ncol = 3)
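To illustrate how a violin plot relates to a boxplot, here is a small base R sketch on simulated values (not the covid data). A boxplot draws a five-number summary, while a violin outlines a kernel density estimate of the same values. The simulated vector `x` and the helper `dens_at()` are assumptions made for this example only.

```r
set.seed(1)
# Made-up values standing in for a per-cell QC metric with two groups of cells
x <- c(rnorm(500, mean = 2, sd = 0.3), rnorm(500, mean = 6, sd = 0.3))

# What a boxplot is built from: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
five

# What a violin is built from: a kernel density estimate of the same values
d <- density(x)

# Hypothetical helper: read the estimated density at given values
dens_at <- function(v) approx(d$x, d$y, xout = v)$y

# Density at the two group centres compared with the density at the median.
# The median lies in the gap between the groups, where few values are found.
dens_at(c(2, median(x), 6))
```

The five-number summary alone gives no hint that the values form two separate groups, whereas the density estimate is low around the median and high around each group centre. This is the extra information a violin adds on top of a boxplot.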