There are several different data structures that are commonly used in R. The different data structures can be seen as different ways to organise data. In this exercise we will focus on vectors, which are the base data structure in R, and will also get an overview of the key data types (modes) that are found in R. At the end of this exercise you should know:
From the lecture you might remember that all elements in any data structure found in R will be of a certain type (or have a certain mode). The four most commonly used data types in R are: logical, integer, double (often called numeric), and character. The names hint at what they are.
In many cases the mode of an entry is determined by its content: if you save the value 5.1 as a variable, R will automatically recognise it as numeric, and a text string like “hello world” will have the mode character. Below you will also see examples of how you can specify the mode yourself rather than rely on R inferring the right mode from the content.
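As a minimal sketch of the paragraph above (the variable names a, b, d, e, f and g are only illustrative), typeof() reports the type R has inferred, while the L suffix or one of the as.*() functions can be used to request a type explicitly:

```r
# R infers the type from the content
a <- 5.1
b <- "hello world"
typeof(a)   # "double"
typeof(b)   # "character"

# Whole numbers are stored as double unless an integer is requested
d <- 5
e <- 5L               # the L suffix asks for an integer
typeof(d)   # "double"
typeof(e)   # "integer"

# Explicit conversion with the as.*() family
f <- as.integer(d)
g <- as.character(a)
typeof(f)   # "integer"
g           # "5.1"
```

Note that class() reports both integer-free doubles and decimals as "numeric", which is why typeof() is used here to show the difference between double and integer.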
Depending on the type of data one needs to store in R, different data structures can be used. The four most commonly used data structures in R are vectors, lists, matrices and data frames. In this exercise we will work only with vectors.
The most basic data structure in R is the vector. Vectors are 1-dimensional data structures that contain only one type of data (i.e. all entries must have the same mode). To create a vector in R one can use the function c() (concatenate or combine) as seen below. This example creates a vector named example.vector with 3 entries in it.
example.vector <- c(10, 20, 30)
If you need more information about the function c() you can always use the built-in manual in R. Typing ?c() will bring up the documentation for the function c().
Once you have created this vector in R, you can access it by simply typing its name in an interactive session.
example.vector
## [1] 10 20 30
The output generated on screen shows the entries in your vector, and the 1 in square brackets indicates the position in the vector of the entry immediately to its right. In this case 10 is the first entry of the vector.
If we only want to extract the value 10 from this vector, we can use the fact that we know it is in the first position.
example.vector[1]
## [1] 10
Since a vector can only contain one data type, all its members must be of the same type. If you combine data of different types in the same vector, R will not warn you but will instead coerce all entries to the most flexible type (from least to most flexible: logical, integer, double, character). Hence, adding a number to a logical vector turns the whole vector into a numeric vector.
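The coercion rule can be checked with a short sketch like the one below (the names lgl.num and num.chr are only illustrative):

```r
# Logical combined with a number: TRUE becomes 1, FALSE becomes 0
lgl.num <- c(TRUE, FALSE, 3)
lgl.num          # 1 0 3
class(lgl.num)   # "numeric"

# Anything combined with a character string becomes character
num.chr <- c(1.5, TRUE, "a")
num.chr          # "1.5" "TRUE" "a"
class(num.chr)   # "character"
```

In both cases R performs the conversion silently, so it is worth checking the class of a vector when its content comes from mixed sources.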
To check what data type an object has, run the built-in R function class() with the object as the only argument.
class(example.vector)
## [1] "numeric"
If you want more information about any object you have stored in your R session, the command str() is very helpful.
str(example.vector)
## num [1:3] 10 20 30
As in other programming languages, there is a set of basic operators in R.
Operation | Description | Example | Example Result |
---|---|---|---|
x + y | Addition | 1 + 3 | 4 |
x - y | Subtraction | 1 - 3 | -2 |
x * y | Multiplication | 2 * 3 | 6 |
x / y | Division | 1 / 2 | 0.5 |
x ^ y | Exponent | 2 ^ 2 | 4 |
x %% y | Modular arithmetic | 1 %% 2 | 1 |
x %/% y | Integer division | 1 %/% 2 | 0 |
x == y | Test for equality | 1 == 1 | TRUE |
x <= y | Test for less or equal | 1 <= 1 | TRUE |
x >= y | Test for greater or equal | 1 >= 2 | FALSE |
x && y | Non-vectorized boolean AND (single values only; vectors longer than one are an error in R 4.3.0 and later) | TRUE && FALSE | FALSE |
x & y | Vectorized boolean AND | c(T,F) & c(T,T) | TRUE FALSE |
x \|\| y | Non-vectorized boolean OR (single values only; vectors longer than one are an error in R 4.3.0 and later) | TRUE \|\| FALSE | TRUE |
x \| y | Vectorized boolean OR | c(T,F) \| c(T,T) | TRUE TRUE |
!x | Boolean not | !(1 == 2) | TRUE |
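To illustrate the difference between the vectorized and non-vectorized logical operators in the table, here is a small sketch. The names p, q, n and in.range are made up for the example. It assumes the common convention of using & and | for element-wise work on vectors and && and || for single-value conditions such as those in if().

```r
p <- c(TRUE, FALSE)
q <- c(TRUE, TRUE)

# Element-wise comparison returns one result per position
p & q    # TRUE FALSE
p | q    # TRUE TRUE

# && and || expect single values, which suits if() conditions
n <- 5
if (n > 0 && n < 10) {
  in.range <- TRUE
} else {
  in.range <- FALSE
}
in.range   # TRUE

# Two of the arithmetic operators from the table
7 %% 3     # 1 (remainder)
7 %/% 3    # 2 (integer quotient)
```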
Besides these, there are of course numerous more or less simple functions available in any R session. For example, if we want to add all values in the example.vector that we discussed earlier, we can do that using addition:
example.vector[1] + example.vector[2] + example.vector[3]
## [1] 60
But we can also use the function sum(), which adds all numeric values passed as arguments.
sum(example.vector)
## [1] 60
To learn more about a function, use the built-in R manual as described earlier. If you do not know the name of a function that you believe should exist in R, use the function help.search() or use Google to try to identify the name of the command.
In all exercises on this course it is important that, before running the commands in R, you try to figure out what you expect the result to be. You should then verify that this is indeed the result by running the command in an R session. If there is a discrepancy between your expectations and the actual output, make sure you understand why before you move on. If you cannot figure out how to proceed, or which command to run, you can click the key to reveal example code including expected output. Also note that in many cases there are multiple solutions that solve the problem equally well.
Open RStudio and create two numeric vectors named x and y that are of equal length. Use these vectors to answer the questions below.
x <- c(2, 4, 7)
y <- c(1, 5, 11)
length(x)
## [1] 3
length(x + y)
## [1] 3
sum(x)
## [1] 13
sum(y*y)
## [1] 147
x + y
## [1] 3 9 18
z <- x * 2
length(z)
## [1] 3
z.mean <- mean(z)
length(z.mean)
## [1] 1
vec.tmp <- 5:107
vec.tmp
## [1] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
## [19] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
## [37] 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58
## [55] 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
## [73] 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
## [91] 95 96 97 98 99 100 101 102 103 104 105 106 107
vec.tmp2 <- rep(3, length(vec.tmp))
vec.tmp2
## [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [38] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [75] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
rep(1:17, 1:17)
## [1] 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7
## [26] 7 7 7 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 10 10 10 10 10
## [51] 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12
## [76] 12 12 12 13 13 13 13 13 13 13 13 13 13 13 13 13 14 14 14 14 14 14 14 14 14
## [101] 14 14 14 14 14 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 16 16 16 16 16
## [126] 16 16 16 16 16 16 16 16 16 16 16 17 17 17 17 17 17 17 17 17 17 17 17 17 17
## [151] 17 17 17
c(1, 3, 5) + c(2, 4, 6)
c(1, 3, 5) + c(2, 4, 6, 8)
c(1, 3) - c(2, 4, 6, 8)
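The three expressions above differ in how the vector lengths match. As a sketch of what R does in each case, the shorter vector is recycled until it is as long as the longer one. The name res and the use of suppressWarnings() are only there to keep the example quiet:

```r
# Equal lengths: element-wise addition
c(1, 3, 5) + c(2, 4, 6)        # 3 7 11

# Length 2 recycled over length 4: no warning, since 4 is a multiple of 2
c(1, 3) - c(2, 4, 6, 8)        # -1 -1 -5 -5

# Length 3 against length 4: R still recycles but emits a warning
res <- suppressWarnings(c(1, 3, 5) + c(2, 4, 6, 8))
res                            # 3 7 11 9
```

Recycling is convenient when one operand is a single value, as in x * 2, but a length mismatch that triggers the warning is usually a sign of a mistake.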
Create a new character vector that contains the following words and save it using a suitable name: apple, banana, orange, kiwi, potato.
veggies <- c("apple", "banana", "orange", "kiwi", "potato")
Do the following on your newly created vector.
veggies[3]
## [1] "orange"
veggies[-5]
## [1] "apple" "banana" "orange" "kiwi"
veggies[1:4]
## [1] "apple" "banana" "orange" "kiwi"
veggies[veggies == "apple" | veggies == "banana" | veggies == "orange" | veggies == "kiwi"]
## [1] "apple" "banana" "orange" "kiwi"
veggies[veggies != "potato"]
## [1] "apple" "banana" "orange" "kiwi"
as.numeric(veggies)
## [1] NA NA NA NA NA
selection <- c(FALSE, TRUE, FALSE, TRUE, FALSE)
veggies[selection]
## [1] "banana" "kiwi"
Alternative solution. Why does this work?
selection2 <- c(FALSE, TRUE)
veggies[selection2]
## [1] "banana" "kiwi"
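One way to see why the shorter logical vector gives the same result is that R recycles a logical index to the length of the vector being subset. The sketch below spells that out with rep(). The name recycled is only illustrative.

```r
veggies <- c("apple", "banana", "orange", "kiwi", "potato")
selection2 <- c(FALSE, TRUE)

# Recycling selection2 to the length of veggies gives the effective index
recycled <- rep(selection2, length.out = length(veggies))
recycled              # FALSE TRUE FALSE TRUE FALSE
veggies[recycled]     # "banana" "kiwi"

identical(veggies[selection2], veggies[recycled])   # TRUE
```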
names(veggies) <- c("a", "b", "o", "k", "p")
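Once a vector has names, its entries can be selected by name as well as by position. A brief sketch, which repeats the vector definition so that it runs on its own:

```r
veggies <- c("apple", "banana", "orange", "kiwi", "potato")
names(veggies) <- c("a", "b", "o", "k", "p")

veggies["k"]            # the element named k, printed with its name
veggies[c("a", "p")]    # several names at once
unname(veggies["k"])    # "kiwi" without the name attached
```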
letters
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
letter.sample <- sample(letters, size = 30, replace = TRUE)
letter.sample <- factor(letter.sample)
summary(letter.sample)
## a b d e f g h i j l n o q r s t x
## 1 2 2 1 2 1 3 1 1 1 1 1 2 2 4 2 3
letters[14:19]
## [1] "n" "o" "p" "q" "r" "s"
letters[1:(length(letters) - 1)]
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y"
letters[-length(letters)]
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y"
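A detail worth checking when building index ranges is operator precedence: the colon operator binds more tightly than minus, so an expression such as 1:n - 1 is read as (1:n) - 1. The sketch below (with illustrative names n, idx.a and idx.b) shows that the two forms produce different index vectors even though they happen to select the same letters:

```r
n <- length(letters)    # 26

idx.a <- 1:n - 1        # parsed as (1:n) - 1, i.e. 0, 1, ..., 25
idx.b <- 1:(n - 1)      # 1, 2, ..., 25

range(idx.a)            # 0 25
range(idx.b)            # 1 25

# Index 0 is silently dropped, so both select the same 25 letters
identical(letters[idx.a], letters[idx.b])   # TRUE
```

Relying on the dropped zero is fragile, so writing the parentheses explicitly is the safer habit.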
which(letters=="u")
## [1] 21
paste(letters, sep = "", collapse = "")
## [1] "abcdefghijklmnopqrstuvwxyz"
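The two arguments of paste() used above play different roles, which a few small calls can make clearer. This is only a sketch with made-up inputs:

```r
# sep is placed between the arguments, element by element
paste("a", "b", sep = "-")                  # "a-b"
paste(c("a", "b"), 1:2, sep = "-")          # "a-1" "b-2"

# collapse joins the resulting elements into a single string
paste(c("a", "b"), 1:2, sep = "-", collapse = "+")   # "a-1+b-2"
paste(letters[1:5], collapse = "")          # "abcde"
```

With a single vector as input, sep has nothing to separate, so collapse alone is enough to produce one string.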
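Create a vector of 100 random numbers from a normal distribution with mean 2 and standard deviation 4 using the function rnorm().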
norm.rand <- rnorm(100, mean = 2, sd = 4)
length(norm.rand[norm.rand<0])
## [1] 34
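Because rnorm() draws random numbers, the values you get will differ from those shown here and from one run to the next. If reproducible numbers are wanted, one option is to call set.seed() first. In the sketch below the seed 42 and the names first.draw and second.draw are arbitrary choices.

```r
set.seed(42)                       # 42 is an arbitrary seed
first.draw <- rnorm(100, mean = 2, sd = 4)

set.seed(42)                       # same seed, same sequence
second.draw <- rnorm(100, mean = 2, sd = 4)

identical(first.draw, second.draw) # TRUE

# Counting negatives: sum() over a logical vector counts the TRUE values
sum(first.draw < 0) == length(first.draw[first.draw < 0])   # TRUE
```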
sd(norm.rand)
## [1] 3.790974
mean(norm.rand)
## [1] 1.628312
median(norm.rand)
## [1] 1.478818
norm.rand[11] <- NA
sd(norm.rand, na.rm = TRUE)
## [1] 3.770706
mean(norm.rand, na.rm = TRUE)
## [1] 1.682799
median(norm.rand, na.rm = TRUE)
## [1] 1.518462
norm.rand[100] <- "L"
sd(norm.rand, na.rm = TRUE)
## [1] 3.747392
mean(norm.rand, na.rm = TRUE)
## [1] NA
median(norm.rand, na.rm = TRUE)
## [1] "1.87388843472795"
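The odd results above follow from the coercion rule for vectors: assigning a single character string converts every entry to character. A small sketch with a three-element vector (the names v and v.num are illustrative) shows the change and one possible way to get back to numbers:

```r
v <- c(1.5, 2.5, 3.5)
class(v)            # "numeric"

v[3] <- "L"         # assigning a string converts every element
class(v)            # "character"
v                   # "1.5" "2.5" "L"

# Converting back turns non-numeric entries into NA (with a warning)
v.num <- suppressWarnings(as.numeric(v))
v.num                       # 1.5 2.5 NA
mean(v.num, na.rm = TRUE)   # 2
```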
geno <- rep("Geno", 57)
needed.letters <- rep(letters[1:19], 3)
needed.numbers <- rep(1:3, 19)
temp <- paste(geno, needed.letters, needed.numbers, sep = "_")
sort(temp)
## [1] "Geno_a_1" "Geno_a_2" "Geno_a_3" "Geno_b_1" "Geno_b_2" "Geno_b_3"
## [7] "Geno_c_1" "Geno_c_2" "Geno_c_3" "Geno_d_1" "Geno_d_2" "Geno_d_3"
## [13] "Geno_e_1" "Geno_e_2" "Geno_e_3" "Geno_f_1" "Geno_f_2" "Geno_f_3"
## [19] "Geno_g_1" "Geno_g_2" "Geno_g_3" "Geno_h_1" "Geno_h_2" "Geno_h_3"
## [25] "Geno_i_1" "Geno_i_2" "Geno_i_3" "Geno_j_1" "Geno_j_2" "Geno_j_3"
## [31] "Geno_k_1" "Geno_k_2" "Geno_k_3" "Geno_l_1" "Geno_l_2" "Geno_l_3"
## [37] "Geno_m_1" "Geno_m_2" "Geno_m_3" "Geno_n_1" "Geno_n_2" "Geno_n_3"
## [43] "Geno_o_1" "Geno_o_2" "Geno_o_3" "Geno_p_1" "Geno_p_2" "Geno_p_3"
## [49] "Geno_q_1" "Geno_q_2" "Geno_q_3" "Geno_r_1" "Geno_r_2" "Geno_r_3"
## [55] "Geno_s_1" "Geno_s_2" "Geno_s_3"
# Find the position of s in the alphabet
which(letters == "s")
## [1] 19
# One-line solution that avoids the need to know length(geno) and to sort
paste("Geno", rep(letters[1:19], rep(3, 19)), 1:3, sep = "_")
## [1] "Geno_a_1" "Geno_a_2" "Geno_a_3" "Geno_b_1" "Geno_b_2" "Geno_b_3"
## [7] "Geno_c_1" "Geno_c_2" "Geno_c_3" "Geno_d_1" "Geno_d_2" "Geno_d_3"
## [13] "Geno_e_1" "Geno_e_2" "Geno_e_3" "Geno_f_1" "Geno_f_2" "Geno_f_3"
## [19] "Geno_g_1" "Geno_g_2" "Geno_g_3" "Geno_h_1" "Geno_h_2" "Geno_h_3"
## [25] "Geno_i_1" "Geno_i_2" "Geno_i_3" "Geno_j_1" "Geno_j_2" "Geno_j_3"
## [31] "Geno_k_1" "Geno_k_2" "Geno_k_3" "Geno_l_1" "Geno_l_2" "Geno_l_3"
## [37] "Geno_m_1" "Geno_m_2" "Geno_m_3" "Geno_n_1" "Geno_n_2" "Geno_n_3"
## [43] "Geno_o_1" "Geno_o_2" "Geno_o_3" "Geno_p_1" "Geno_p_2" "Geno_p_3"
## [49] "Geno_q_1" "Geno_q_2" "Geno_q_3" "Geno_r_1" "Geno_r_2" "Geno_r_3"
## [55] "Geno_s_1" "Geno_s_2" "Geno_s_3"
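The one-line solution uses rep() with a vector of counts as its second argument. As a sketch of an alternative spelling, the each argument of rep() gives the same element-wise repetition. The names by.times, by.each and geno.labels are only illustrative.

```r
by.times <- rep(letters[1:3], rep(2, 3))   # one count per element
by.each  <- rep(letters[1:3], each = 2)    # same result via each
by.each                                    # "a" "a" "b" "b" "c" "c"
identical(by.times, by.each)               # TRUE

# Whole-vector repetition, for comparison
rep(letters[1:3], times = 2)               # "a" "b" "c" "a" "b" "c"

# each-based version of the genotype labels
geno.labels <- paste("Geno", rep(letters[1:19], each = 3), 1:3, sep = "_")
length(geno.labels)                        # 57
geno.labels[1:4]   # "Geno_a_1" "Geno_a_2" "Geno_a_3" "Geno_b_1"
```

In the last call the short vectors "Geno" and 1:3 are recycled to the length of the repeated letters, which is why no explicit length is needed.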