You have found some older gene expression data based on microarray technology. The data contain measurements for 22215 genes in 189 samples across 7 tissues (kidney, hippocampus, cerebellum, colon, liver, endometrium and placenta).
The data can be loaded and previewed:
          GSM11805.CEL.gz GSM11814.CEL.gz GSM11823.CEL.gz GSM11830.CEL.gz GSM12067.CEL.gz
1007_s_at       10.191267       10.509167       10.272027       10.252952       10.157605
1053_at          6.040463        6.696075        6.144663        6.575153        6.606701
117_at           7.447409        7.775354        7.696235        8.478135        8.116336
121_at          12.025042       12.007817       11.633279       11.075286       10.832528
'data.frame': 189 obs. of 6 variables:
$ filename : chr "GSM11805.CEL.gz" "GSM11814.CEL.gz" "GSM11823.CEL.gz" "GSM11830.CEL.gz" ...
$ DB_ID : chr "GSM11805" "GSM11814" "GSM11823" "GSM11830" ...
$ ExperimentID : chr "GSE781" "GSE781" "GSE781" "GSE781" ...
$ Tissue : chr "kidney" "kidney" "kidney" "kidney" ...
$ SubType : chr "normal" "cancer" "normal" "cancer" ...
$ ClinicalGroup: chr NA NA NA NA ...
Exercise 1 (Partition methods) Use only the first two genes and run k-means clustering.
- Find the optimal number of clusters \(k\) using the silhouette method.
- Plot the samples using the first two genes as x and y coordinates and visualize your clustering results on a scatter plot.
- Use the first 1000 genes and the same value of \(k\). Is this clustering solution better? How can you tell?
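The silhouette method in Exercise 1 can be illustrated with a minimal sketch in plain Python. It is not the course's solution: the toy points are made up, and the k-means routine uses a deterministic farthest-point initialisation (an assumption of this sketch, chosen so the result is reproducible). For each candidate \(k\) it computes the mean silhouette width and keeps the \(k\) with the highest value.

```python
import math

def kmeans(points, k, n_iter=50):
    """Basic k-means; returns one cluster label per point."""
    # Farthest-point initialisation (assumption of this sketch).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def mean_silhouette(points, labels):
    """Average of s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    widths = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:  # singleton clusters get width 0 by convention
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)

# Made-up 2D data: three tight groups, standing in for "gene 1 vs gene 2".
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2),
          (9.0, 1.0), (9.2, 0.9), (8.8, 1.1), (9.1, 1.2)]

scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The same mean silhouette width can also be used to compare the two-gene and 1000-gene solutions for a fixed \(k\): a higher value suggests tighter, better separated clusters.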
Exercise 2 (HCL) Select the samples corresponding to two tissues of your choice. Run hierarchical clustering (HCL) and compare the dendrograms obtained:
- with complete and Ward linkage, using Euclidean distance
- with complete and Ward linkage, using Canberra distance
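To see why the choice of distance measure in Exercise 2 can change a dendrogram, it may help to compute both distances by hand. The sketch below, in plain Python, uses the four probe values shown in the preview for samples GSM11805 and GSM11830. The Canberra formula used here is the common definition, the sum of |x - y| / (|x| + |y|); how a particular clustering package handles edge cases such as zeros is not covered.

```python
import math

# Expression values of the four probes shown in the preview
# (1007_s_at, 1053_at, 117_at, 121_at) for two samples.
gsm11805 = [10.191267, 6.040463, 7.447409, 12.025042]
gsm11830 = [10.252952, 6.575153, 8.478135, 11.075286]

def euclidean(x, y):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def canberra(x, y):
    """Sum of absolute differences, each scaled by the magnitudes involved."""
    # Terms where both values are zero are skipped to avoid 0/0.
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) > 0)

d_euclidean = euclidean(gsm11805, gsm11830)
d_canberra = canberra(gsm11805, gsm11830)
print(f"Euclidean: {d_euclidean:.3f}, Canberra: {d_canberra:.3f}")
```

Canberra distance scales each gene's difference by its magnitude, so a change in a weakly expressed gene counts for more than the same absolute change in a highly expressed one. Euclidean distance treats both changes equally.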
Exercise 3 (Pvclust) Try running pvclust on the samples you’ve chosen above. Which clusters are supported by bootstrapping?
Exercise 4 (Heatmap) Select the 100 genes with the highest variance. Make a heatmap using the ComplexHeatmap package. Group the columns (samples) by tissue and split the rows (genes) using k-means (k = 7).
- Do you see any interesting patterns?
- How would you go about extracting genes belonging to a specific cluster, if you were interested in running functional annotations on those?
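The two data-handling steps in Exercise 4, ranking genes by variance and pulling out the members of one row cluster, can be sketched in plain Python. This is an illustration only: it uses the four probes from the preview rather than all 22215 genes, and the cluster labels in `row_cluster` are hypothetical stand-ins for whatever the k-means row split returns.

```python
from statistics import variance

# Preview values: probe ID -> expression across the five previewed samples.
expr = {
    "1007_s_at": [10.191267, 10.509167, 10.272027, 10.252952, 10.157605],
    "1053_at":   [6.040463, 6.696075, 6.144663, 6.575153, 6.606701],
    "117_at":    [7.447409, 7.775354, 7.696235, 8.478135, 8.116336],
    "121_at":    [12.025042, 12.007817, 11.633279, 11.075286, 10.832528],
}

def top_variable_genes(expr, n):
    """Return the n gene IDs with the highest sample variance."""
    return sorted(expr, key=lambda g: variance(expr[g]), reverse=True)[:n]

top_genes = top_variable_genes(expr, 2)

# Hypothetical cluster labels, one per selected gene (not real results).
row_cluster = dict(zip(top_genes, [1, 2]))

def genes_in_cluster(row_cluster, cluster_id):
    """Collect the gene IDs assigned to one cluster."""
    return [g for g, c in row_cluster.items() if c == cluster_id]

cluster_1_genes = genes_in_cluster(row_cluster, 1)
print(top_genes, cluster_1_genes)
```

Keeping the gene IDs paired with their cluster labels is the key design choice: the resulting list of IDs can be passed directly to a functional annotation tool.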
Answers to exercises
cerebellum  colon  endometrium  hippocampus  kidney  liver  placenta
        38     34           15           31      39     26         6