Introduction to ggplot

Lokesh Mano

NBIS, SciLifeLab

14-Apr-2025

Quick checkups

  • Coffee breaks (Morning and afternoon fika)
  • Webpage structure
  • Plots from drop-down
  • Times mentioned in schedule are super arbitrary

R basics

n <- c(2, 3, 4, 2, 1, 2, 4, 5, 10, 11, 8, 9)
print(n)
#>  [1]  2  3  4  2  1  2  4  5 10 11  8  9
z <- n + 3
print(z)
#>  [1]  5  6  7  5  4  5  7  8 13 14 11 12
mean(z)
#> [1] 8.083333
s <- c("I", "love", "Batman")
print(s)
#> [1] "I"      "love"   "Batman"

Data types

  • int stands for integers
  • dbl stands for doubles, or real numbers
  • chr stands for character vectors, or strings
  • dttm stands for date-times (a date plus a time)
  • lgl stands for logical vectors that contain only TRUE or FALSE
  • fctr stands for factors, which R uses to represent categorical variables
  • date stands for dates

To find out what type of vector you have created or imported, use the function class().
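As a minimal sketch of class() in action, the vectors below are made up for illustration (none of the object names come from the course material). Note that class() reports R's own class names, which differ slightly from the abbreviations listed above.

```r
# Hypothetical example vectors, one per data type
n_reads   <- c(2L, 3L, 4L)                               # int
sepal_len <- c(5.1, 4.9, 4.7)                            # dbl
words     <- c("I", "love", "Batman")                    # chr
passed    <- c(TRUE, FALSE, TRUE)                        # lgl
species   <- factor(c("setosa", "virginica", "setosa"))  # fctr
day       <- as.Date("2025-04-14")                       # date
now       <- Sys.time()                                  # dttm

class(n_reads)    # "integer"
class(sepal_len)  # "numeric"
class(words)      # "character"
class(passed)     # "logical"
class(species)    # "factor"
class(day)        # "Date"
class(now)        # "POSIXct" "POSIXt"
```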

Data Formats

  • Wide format
                 Sample_1  Sample_2  Sample_3  Sample_4
ENSG00000000003       321       303       204       492
ENSG00000000005         0         0         0         0
ENSG00000000419       696       660       472       951
ENSG00000000457        59        54        44       109
ENSG00000000460       399       405       236       445
ENSG00000000938         0         0         0         0
  • Familiarity
  • Convenience
  • You see more data at a glance

Data Formats

  • Long format
Sample_ID  Gene             count
Sample_1   ENSG00000000003    321
Sample_1   ENSG00000000005      0
Sample_1   ENSG00000000419    696
Sample_1   ENSG00000000457     59
Sample_1   ENSG00000000460    399
Sample_1   ENSG00000000938      0

Sample_ID  Sample_Name  Time  Replicate  Cell  Gene             count
Sample_1   t0_A         t0    A          A431  ENSG00000000003    321
Sample_1   t0_A         t0    A          A431  ENSG00000000005      0
Sample_1   t0_A         t0    A          A431  ENSG00000000419    696
Sample_1   t0_A         t0    A          A431  ENSG00000000457     59
Sample_1   t0_A         t0    A          A431  ENSG00000000460    399
Sample_1   t0_A         t0    A          A431  ENSG00000000938      0

  • Easier to add data to an existing dataset without restructuring it
  • Most databases store and maintain data in long format because it is efficient
  • R tools like ggplot2 expect data in long format
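One way to get from wide to long format, sketched here in base R on the first three genes and two samples from the tables above. The object names wide and long are illustrative only, and this is just one of several possible approaches.

```r
# Wide format: genes as row names, one column per sample
wide <- data.frame(
  Sample_1 = c(321, 0, 696),
  Sample_2 = c(303, 0, 660),
  row.names = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419")
)

# Long format: one row per sample/gene combination
long <- data.frame(
  Sample_ID = rep(colnames(wide), each = nrow(wide)),
  Gene      = rep(rownames(wide), times = ncol(wide)),
  count     = unlist(wide, use.names = FALSE),  # stacks the columns in order
  stringsAsFactors = FALSE
)

print(long)
```

The tidyr package offers pivot_longer() for the same task, which is usually more convenient on larger tables.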

Data Frames

  • Let us take a quick look at the data.frame in R
  • Imported files usually end up as a data.frame
  • A table with row names and column names; unlike a matrix, each column can hold a different data type
  • Probably the most used data structure in biology!
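A small sketch of building and inspecting a data.frame, using three rows from the long-format table shown earlier. The object names df and expressed are made up for this example.

```r
# Columns of different types live side by side in one data.frame
df <- data.frame(
  Gene  = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
  count = c(321, 0, 696),
  stringsAsFactors = FALSE
)

dim(df)            # number of rows and columns
colnames(df)       # column names
sapply(df, class)  # data type of each column
str(df)            # compact overview of the whole structure

# Select a column with $ and filter rows with a logical condition
df$count
expressed <- df[df$count > 0, ]
print(expressed)
```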

Important tips

  • ? and ??
    • ?name opens the help page for a particular function, e.g. ?mean
    • ??term searches the help pages of all installed packages for the term
    • vignette("ggplot2") opens a package's long-form guide
  • TAB completion
    • Probably the most useful way to avoid unnecessary error messages (and/or frustration)!
  • R is case sensitive
print(N)
#> Error: object 'N' not found
print(n)
#>  [1]  2  3  4  2  1  2  4  5 10 11  8  9

Reading files

iris-head-corrupted.csv
Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.1,3.5,1.4,0.2,setosa
4.9,3,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5,3.6,#1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
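A hedged sketch of how the stray # in this file shows up after import and one way to repair it. To keep the example self-contained, the file content is passed through the text argument of read.csv() instead of being read from disk; the object names are illustrative.

```r
csv_lines <- c(
  "Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species",
  "5.1,3.5,1.4,0.2,setosa",
  "4.9,3,1.4,0.2,setosa",
  "4.7,3.2,1.3,0.2,setosa",
  "4.6,3.1,1.5,0.2,setosa",
  "5,3.6,#1.4,0.2,setosa",
  "5.4,3.9,1.7,0.4,setosa"
)

iris_head <- read.csv(text = csv_lines, stringsAsFactors = FALSE)

# One bad value is enough to turn the whole column into character
class_before <- class(iris_head$Petal.Length)
print(sapply(iris_head, class))

# Remove the stray "#" and convert the column back to numbers
iris_head$Petal.Length <- as.numeric(
  sub("#", "", iris_head$Petal.Length, fixed = TRUE)
)
class_after <- class(iris_head$Petal.Length)
```

Checking the column classes right after import is a cheap way to catch this kind of problem before plotting.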

Reserved variables

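  • Avoid reusing built-in names such as T, F, character and many others … for your own objects
  • How can you check whether a name is already in use? Try exists("name"), and see ?Reserved for the words R never lets you reassign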
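One possible way to check names before using them, sketched with exists(). The name my_counts is a made-up example that is assumed not to be defined in your session.

```r
# Is the name already taken by something R can find?
candidates <- c("T", "F", "character", "my_counts")
taken <- vapply(candidates, exists, logical(1))
print(taken)

# T and F are ordinary variables, so they can be overwritten by accident
T <- 0
overwritten <- isTRUE(T)   # FALSE: T no longer means TRUE
rm(T)                      # remove our copy; the built-in T is visible again
restored <- isTRUE(T)      # TRUE

# TRUE and FALSE are truly reserved words: assigning to them is an error.
# The full list of reserved words is on the help page ?Reserved
```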

Grammar of Graphics

  • Data: Input data
  • Geom: A geometry representing data. Points, lines, etc.
  • Aesthetics: Visual characteristics of the geometry. Size, color, shape, etc.
  • Scale: How visual characteristics are converted to display values
  • Statistics: Statistical transformations. Counts, means, etc.
  • Coordinates: Numeric system to determine position of geometry. Cartesian, polar, etc.
  • Facets: Split data into subsets
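The components listed above can be mapped onto a single ggplot2 call. This is only a sketch on the built-in iris dataset, with the choice of variables, palette and smoothing method picked for illustration.

```r
library(ggplot2)

p <- ggplot(data = iris,                                    # Data
            mapping = aes(x = Sepal.Length,
                          y = Sepal.Width,
                          color = Species)) +               # Aesthetics
  geom_point() +                                            # Geom
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) + # Statistics
  scale_color_brewer(palette = "Set1") +                    # Scale
  coord_cartesian() +                                       # Coordinates
  facet_wrap(~ Species)                                     # Facets

print(p)
```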

Building a graph

Build-Demo
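A rough sketch of building a plot step by step, assuming the built-in iris dataset. The object names p1 to p4 are illustrative; printing each one shows what that step adds.

```r
library(ggplot2)

# Step 1: data only, gives an empty canvas
p1 <- ggplot(iris)

# Step 2: map variables to x and y, gives axes but no data yet
p2 <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width))

# Step 3: add a geom to draw the data
p3 <- p2 + geom_point()

# Step 4: map another variable to color inside the geom
p4 <- p2 + geom_point(aes(color = Species))

print(p4)
```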

Geoms

Multiple geoms
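Geoms are layered with +, and each layer is drawn on top of the previous one. A minimal sketch on the built-in iris dataset, with the particular geoms and settings chosen only for illustration:

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(outlier.shape = NA) +      # layer 1: summary per species
  geom_jitter(width = 0.2, alpha = 0.5)   # layer 2: the individual points

print(p)
```

The outliers are hidden in the boxplot layer because the jitter layer already shows every observation.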

Aesthetics

  • Aesthetic mapping vs aesthetic parameter
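The difference can be sketched as follows, assuming the built-in iris dataset: a mapping goes inside aes() and ties an aesthetic to a variable, while a parameter goes outside aes() and sets one fixed value for the whole layer. The object names are illustrative.

```r
library(ggplot2)

base <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))

# Aesthetic mapping: color depends on the Species column, a legend is drawn
mapped <- base + geom_point(aes(color = Species))

# Aesthetic parameter: every point gets the same color and size, no legend
fixed <- base + geom_point(color = "steelblue", size = 3)

print(mapped)
print(fixed)
```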

Scales

  • Use scales to change how an aesthetic is displayed instead of accepting the defaults
  • scales: position, color, fill, size, shape, alpha, linetype
  • syntax: scale_<aesthetic>_<type>

Scales • Discrete Colors
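A sketch of overriding the default discrete colors with scale_color_manual(), assuming the built-in iris dataset. The hex codes are arbitrary picks for this example.

```r
library(ggplot2)

species_colors <- c(setosa     = "#1b9e77",
                    versicolor = "#d95f02",
                    virginica  = "#7570b3")

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  scale_color_manual(values = species_colors)  # one color per level

print(p)
```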

Scales • Continuous Colors
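When color is mapped to a numeric variable, a continuous scale such as scale_color_gradient() controls the gradient. A minimal sketch on the built-in iris dataset, with the end colors chosen arbitrarily:

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Petal.Length)) +
  geom_point() +
  scale_color_gradient(low = "yellow", high = "red")  # small to large values

print(p)
```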

Scales • Shape

  • In RStudio, type scale_ and press TAB to list the available scale functions
  • The same trick works for the shape, axis (position), fill and other scales
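For shapes, one option is scale_shape_manual(). The sketch below assumes the built-in iris dataset; the shape codes 16, 17 and 15 (filled circle, triangle and square) are an arbitrary choice.

```r
library(ggplot2)

species_shapes <- c(setosa = 16, versicolor = 17, virginica = 15)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, shape = Species)) +
  geom_point(size = 2) +
  scale_shape_manual(values = species_shapes)  # one shape per level

print(p)
```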

Facets

  • Split to subplots based on variable(s)
  • Facetting in one dimension

Facets • facet_wrap
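A minimal sketch of faceting in one dimension, assuming the built-in iris dataset: facet_wrap() draws one panel per level of the chosen variable.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  facet_wrap(~ Species)  # one panel per species

print(p)
```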

Facets • facet_grid

  • Facetting in two dimensions
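A sketch of faceting in two dimensions with facet_grid(). It uses the built-in mtcars dataset, which is an assumption of this example and not part of the course data, because it has two convenient grouping variables (cyl and am).

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(rows = vars(cyl), cols = vars(am))  # cyl in rows, am in columns

print(p)
```

With three values of cyl and two of am, this gives a grid of 3 x 2 panels.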

Thank you. Questions?