Input files with qc-stats were: data/sum/qc_summary.csv
Sampleinfo file: /proj/b2017179/private/scRNAseq/meta/Meta_data_norun1.csv
Gene expression file: data/sum/merge.rpkmforgenes_rpkm.txt
The file /proj/b2017179/private/analysis/s_petropoulos_1701/source/qc_settings.yaml specifies the settings:
Filtering criteria are:
Distributions within batches are plotted for: Embryo, Ctrl_treat, Characteristics.treatment., Characteristics.inferred.lineage., Characteristics.inferred.trophectoderm.subpopulation., sex, Picking date, Prepping date, RN_LN, dir1
Besides this report, the output consists of 3 files:
First, all the QC-stats that will be used for filtering are plotted according to each criterion. The filtering criteria are defined in /proj/b2017179/private/analysis/s_petropoulos_1701/source/qc_settings.yaml under section filter_settings: filters
Histograms with all QC-data: red for cells that failed and blue for cells that passed each criterion. Cutoffs are given in the heading of each section.
Scatter plots with all QC-data, red for filtered cells and black for non-filtered cells
Overlap of filtered cells per filtering criterion. Each box shows the number of cells in the intersection of 2 different filtering criteria; the diagonal shows the total number of cells below cutoff for a single criterion. The final column “unique” shows the number of cells that are below cutoff only for that particular stat.
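The overlap table described here can be sketched in a few lines of Python. This is a minimal illustration, not the script that produced this report: the function name overlap_table, the criterion names and the cell ids are all hypothetical, and the input is assumed to be one set of failed cell ids per criterion.

```python
def overlap_table(failed):
    """Build the pairwise overlap table.

    failed: dict mapping criterion name -> set of cell ids below cutoff
            (hypothetical input layout, assumed for this sketch).
    Returns a dict of rows; each row maps every criterion name to the size
    of the intersection, plus a "unique" entry.
    """
    names = list(failed)
    table = {}
    for a in names:
        # Off-diagonal: cells failing both a and b. Diagonal: all cells failing a.
        row = {b: len(failed[a] & failed[b]) for b in names}
        # Cells that fail criterion a and no other criterion.
        others = set().union(*(failed[b] for b in names if b != a))
        row["unique"] = len(failed[a] - others)
        table[a] = row
    return table


# Hypothetical example data, not taken from this report.
failed = {
    "n_genes": {"c1", "c2", "c3"},
    "mapped_reads": {"c2", "c4"},
    "spike_in_ratio": {"c3", "c5", "c6"},
}
table = overlap_table(failed)
for name, row in table.items():
    print(name, row)
```

Reading a row: the entry under the row's own name is the diagonal (total failed cells for that criterion), the other named entries are pairwise intersections, and "unique" counts cells that no other criterion would have removed.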
The number of cells per batch (as defined by section “batches” in /proj/b2017179/private/analysis/s_petropoulos_1701/source/qc_settings.yaml) and the proportion of filtered cells per batch.
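As a rough sketch of how such a per-batch summary could be computed, assuming a mapping from cell id to batch label and a set of filtered cell ids (both hypothetical here, as is the function name filtered_per_batch):

```python
from collections import Counter


def filtered_per_batch(batch_of, filtered):
    """Count cells and filtered cells per batch.

    batch_of: dict mapping cell id -> batch label (assumed input layout).
    filtered: set of cell ids removed by any filtering criterion.
    Returns dict: batch -> (n_cells, n_filtered, proportion_filtered).
    """
    total = Counter(batch_of.values())
    removed = Counter(batch_of[cell] for cell in filtered)
    return {
        batch: (n, removed[batch], removed[batch] / n)
        for batch, n in total.items()
    }


# Hypothetical example data, not taken from this report.
batch_of = {"c1": "E1", "c2": "E1", "c3": "E2", "c4": "E2", "c5": "E2", "c6": "E2"}
filtered = {"c2", "c3"}
summary = filtered_per_batch(batch_of, filtered)
for batch, (n, n_filtered, prop) in sorted(summary.items()):
    print(f"{batch}: {n} cells, {n_filtered} filtered ({prop:.0%})")
```

A batch with a much higher proportion of filtered cells than the others is usually worth a closer look before the cells are dropped.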
The distribution of each of the QC-stats, split by batch, is plotted as violin plots, together with a histogram for all samples. If specified, the section “plot” defines which qc-measures to plot stats for; if none is specified, all columns in the input file are plotted.
Scatter plot of the qc-stats colored by batch: Embryo. Colors per batch are defined the same way as in the first violin plot of the previous plots.
Note: the script will only plot this scatter for the first batch defined in section “batches” of /proj/b2017179/private/analysis/s_petropoulos_1701/source/qc_settings.yaml.
Summary of expression data using the scater package. Removing 35 cells from the expression matrix with 834 samples. Keeping 47882 genes out of 58313 that have a summed expression higher than 15.98.
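The two filtering steps reported here (dropping failed cells, then keeping genes whose summed expression exceeds a cutoff) can be illustrated with the sketch below. It is not the report's own code: the function name filter_expression and the data layout are hypothetical, and it assumes the gene sums are computed after the failed cells are removed, which the report does not state. The cutoff 15.98 is reused from the text; the expression values are made up.

```python
def filter_expression(matrix, cells, failed_cells, min_sum):
    """Drop failed cells, then keep genes with summed expression > min_sum.

    matrix: dict mapping gene -> list of expression values, ordered as `cells`
            (hypothetical layout, assumed for this sketch).
    cells: list of cell ids (column order of the matrix).
    failed_cells: set of cell ids to remove.
    """
    keep_idx = [i for i, cell in enumerate(cells) if cell not in failed_cells]
    kept_cells = [cells[i] for i in keep_idx]
    kept_genes = {}
    for gene, values in matrix.items():
        kept = [values[i] for i in keep_idx]
        # Assumption: the sum is taken over the remaining cells only.
        if sum(kept) > min_sum:
            kept_genes[gene] = kept
    return kept_cells, kept_genes


# Hypothetical example data, not taken from this report.
cells = ["c1", "c2", "c3"]
matrix = {
    "G1": [10.0, 5.0, 8.0],
    "G2": [0.0, 100.0, 1.0],
    "G3": [0.5, 0.0, 0.2],
}
kept_cells, kept_genes = filter_expression(matrix, cells, {"c2"}, min_sum=15.98)
print(kept_cells, sorted(kept_genes))
```

Note that in this example G2 is dropped only because its expression came almost entirely from the removed cell c2; the order of the two steps can therefore change which genes survive.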
Highest expressed genes colored by batch: Embryo.
Explained variance for different QC-measures/batch info.
PCA with PC1 and PC2 colored by different QC-measures/batch info.
Top correlated PCs for each of the different QC-measures/batch info