# Clustering

In this tutorial we continue the analysis of the integrated dataset, using the integrated PCA to perform the clustering. First we will construct a \(k\)-nearest neighbour graph and cluster the cells on that graph. We will also show how to perform hierarchical clustering and k-means clustering in PCA space.

Let’s first load all necessary libraries and also the integrated dataset from the previous step.

```r
if (!require(clustree)) {
    install.packages("clustree", dependencies = FALSE)
}
```

```
## Loading required package: clustree
## Loading required package: ggraph
## Loading required package: ggplot2
```
```r
suppressPackageStartupMessages({
    library(scater)
    library(scran)
    library(cowplot)
    library(ggplot2)
    library(rafalib)
    library(pheatmap)
    library(igraph)
})
```

## Graph clustering

Graph-based clustering can be summarized in three main steps:

1. Build a kNN graph from the data.

2. Prune spurious connections from the kNN graph (optional step). The result is a shared nearest neighbour (SNN) graph.

3. Find groups of cells that maximize the connections within each group relative to the connections between groups.
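The three steps above can be sketched on simulated data. This is a minimal illustration, not the method used in the rest of the tutorial: the toy coordinates `pcs`, the Jaccard-based SNN weighting, the pruning cutoff `min_jaccard`, and the choice of Louvain community detection from `igraph` are all assumptions made for the example.

```r
library(igraph)

set.seed(1)
# Simulated "PCA" coordinates: two groups of 20 cells in 2 dimensions
# (hypothetical data, not the tutorial's dataset)
pcs <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 10), ncol = 2))
truth <- rep(c("A", "B"), each = 20)
n <- nrow(pcs)
k <- 5

# Step 1: kNN graph. For each cell, find its k closest cells (Euclidean distance).
d <- as.matrix(dist(pcs))
diag(d) <- Inf  # a cell is not its own neighbour
knn <- t(apply(d, 1, function(x) order(x)[seq_len(k)]))  # n x k index matrix

# Step 2: SNN graph. Weight each pair of cells by the Jaccard overlap of their
# neighbour sets (each set includes the cell itself), then prune weak edges.
nbrs <- lapply(seq_len(n), function(i) c(i, knn[i, ]))
snn <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
        shared <- length(intersect(nbrs[[i]], nbrs[[j]]))
        total <- length(union(nbrs[[i]], nbrs[[j]]))
        snn[i, j] <- snn[j, i] <- shared / total
    }
}
min_jaccard <- 0.2  # illustrative cutoff, chosen for this toy example
snn[snn < min_jaccard] <- 0

# Step 3: find communities that are densely connected internally.
g <- graph_from_adjacency_matrix(snn, mode = "undirected", weighted = TRUE)
clusters <- as.integer(membership(cluster_louvain(g)))

# Cross-tabulate the simulated groups against the detected clusters
table(truth, clusters)
```

Because the two simulated groups are far apart, no cell has a neighbour in the other group, so every detected cluster should contain cells from only one group. A group may still be split into more than one cluster.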

### Building kNN / SNN graph

The first step in graph clustering is to construct a kNN graph, if you don’t already have one. We build it in PCA space and, as for dimensionality reduction, use only the top N PCA dimensions (the same ones used for computing UMAP / tSNE).

```r
# These 2 lines are for demonstration purposes only
# Build a kNN graph with k = 30 neighbours from the "MNN" reduced dimensions of sce
g <- buildKNNGraph(sce, k = 30, use.dimred = "MNN")
```
``````# plot the KNN graph