
Vectors in R

There are several different data structures that are commonly used in R. The different data structures can be seen as different ways to organise data. In this exercise we will focus on vectors, which are the base data structure in R, and we will also repeat some of the information on the key data types found in R. At the end of this exercise you should know:

Data types

From the lectures and the earlier exercise you might remember that all elements in any data structure found in R will be of a certain type (or have a certain mode). The four most commonly used data types in R are: logical, integer, double (often called numeric), and character. The names hint at what they are.

Vectors in R

Depending on the type of data one needs to store, different data structures can be used in R. The four most commonly used data structures in R are vectors, lists, matrices and data frames. In this exercise we will work only with vectors.

Vectors are the most basic data structure in R, and even single-entry data are stored as vectors. Vectors are 1-dimensional data structures that contain only one type of data (i.e. all entries must have the same mode). To create a vector in R one can use the function c() (concatenate or combine) as seen below. This example will create a vector named example.vector with 3 entries in it.

example.vector <- c(10, 20, 30)
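Because even a single value is stored as a vector, you can check this yourself with the base R functions is.vector() and length(). A minimal sketch; the name single.value is only an illustrative choice:

```r
# A single number is stored as a vector of length 1
single.value <- 42
is.vector(single.value)    # TRUE
length(single.value)       # 1

# The three-entry vector created above
example.vector <- c(10, 20, 30)
is.vector(example.vector)  # TRUE
length(example.vector)     # 3
```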

NB! If you need more information about the function c() you can always use the built-in manual in R. Typing ?c will bring up the documentation for the function c().

Once you have created this vector in R, you can access it by simply typing its name in an interactive session.

example.vector

[1] 10 20 30

The output generated on screen shows the entries in your vector, and the 1 in square brackets indicates the position in the vector of the entry to its right. In this case 10 is the first entry of the vector.

If we only want to extract the value 10 from this vector, we can use the fact that we know it is in the first position:

example.vector[1]

[1] 10
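Square-bracket indexing is not limited to a single position. As a sketch of the same idea, a vector of positions selects several entries at once, and a negative position drops an entry instead of selecting it (both forms also appear in the exercises below):

```r
example.vector <- c(10, 20, 30)

# A vector of positions selects several entries at once
example.vector[c(1, 3)]   # 10 30

# The : operator creates a range of positions
example.vector[2:3]       # 20 30

# A negative position drops that entry instead of selecting it
example.vector[-1]        # 20 30
```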

Since a vector can contain only one data type, all members need to be of the same type. If you try to combine data of different types into the same vector, R will not warn you; instead it will coerce all entries to the most flexible type (from least to most flexible: logical, integer, double, character). Hence, adding a number to a logical vector will turn the whole vector into a numeric vector.
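A short sketch of this silent coercion in practice; the vector names mixed.num and mixed.chr are only illustrative:

```r
# A logical combined with a number: TRUE becomes 1 and FALSE becomes 0
mixed.num <- c(TRUE, FALSE, 5)
mixed.num          # 1 0 5
class(mixed.num)   # "numeric"

# A number combined with text: everything becomes character
mixed.chr <- c(1, "a")
mixed.chr          # "1" "a"
class(mixed.chr)   # "character"
```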

To check what data type an object is, run the R built-in function class(), with the object as the only parameter.

class(example.vector)

[1] "numeric"
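Note that class() reports the object's class, which for double vectors is "numeric". If you want the underlying storage type, typeof() returns it directly. A small comparison using only base R:

```r
example.vector <- c(10, 20, 30)
class(example.vector)     # "numeric"
typeof(example.vector)    # "double"

# Whole numbers are stored as doubles unless created as integers
class(c(10L, 20L, 30L))   # "integer" (the L suffix makes an integer)
class(5:107)              # "integer" (the : operator also gives integers)
class(c(TRUE, FALSE))     # "logical"
class("apple")            # "character"
```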

If you want more information about any object stored in your R session, the function str() is very helpful.

str(example.vector)

num [1:3] 10 20 30

Vectors have a length, which corresponds to the number of entries they contain. To obtain the length of a vector, use the function length().

length(example.vector)

[1] 3

Besides these, there are of course numerous more or less simple functions available in any R session. For example, if we want to add up all values in example.vector, we can do that using addition:

example.vector[1] + example.vector[2] + example.vector[3]

[1] 60

But we can also use the function sum() that adds all numeric values present as arguments.

sum(example.vector)

[1] 60

To learn more about a function, use the built-in R manual as described earlier. If you do not know the name of a function that you believe should exist in R, use the function help.search() or search Google to identify the name of the command.

Try to see if you can use Google to find the R command that returns the minimum value in a vector.

You can also assign names to the entries in a vector with the function names():

names(example.vector) <- c("A", "B", "C")

and the same function without any assignment will instead return the names of the entries in the vector.

names(example.vector)

[1] "A" "B" "C"

With a named vector, the names can be used to select entries. Hence one can do this:

example.vector["A"]

which gives the same result as

example.vector[1]
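To confirm that the two selections above are equivalent, you can compare them with identical(). The sketch also shows that names can be given directly inside c(); this is standard base R, although it is not used elsewhere in this exercise, and named.vector is only an illustrative name:

```r
example.vector <- c(10, 20, 30)
names(example.vector) <- c("A", "B", "C")

# Selecting by name and by position returns the same named entry
example.vector["A"]
example.vector[1]
identical(example.vector["A"], example.vector[1])   # TRUE

# Several entries can be selected by name at once
example.vector[c("A", "C")]

# Names can also be set when the vector is created
named.vector <- c(A = 10, B = 20, C = 30)
identical(named.vector, example.vector)             # TRUE
```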

Exercise: Creating and working with vectors

As with yesterday's exercises, it is important that you think about the expected output before running the commands. In addition, asking your fellow course participants (and the TAs) is often a good way to formulate your thinking around problems.

Create and modify vectors

Open RStudio and create two numeric vectors named x and y that are of equal length. Use these vectors to answer the questions below.

Create vectors

Example R code to generate the vectors:
x <- c(2, 4, 7)
y <- c(1, 5, 11)


  1. How many numbers are there in the vector x?
    Solution:
    length(x)
    
    [1] 3
    


  2. How many numbers will x + y generate?
    Solution:
    length(x + y)
    
    [1] 3
    


  3. What is the sum of all values in x?
    Solution:
    sum(x)
    
    [1] 13
    


  4. What is the sum of y times y?
    Solution:
    sum(y*y)
    
    [1] 147
    


  5. What do you get if you add x and y?
    Solution:
    x + y
    
    [1]  3  9 18
    


  6. Assign x times 2 to a new vector named z
    Solution:
    z <- x * 2
    


  7. How many numbers will z have, and why?
    Solution:
    length(z)
    
    [1] 3
    


  8. Assign the mean of z to a new vector named z.mean and determine the length of z.mean
    Solution:
    z.mean <- mean(z)
    length(z.mean)
    
    [1] 1
    


  9. Create a numeric vector with all integers from 5 to 107
    Solution:
    vec.tmp <- 5:107
    vec.tmp
    
    [1]   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22
    [19]  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
    [37]  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58
    [55]  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76
    [73]  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94
    [91]  95  96  97  98  99 100 101 102 103 104 105 106 107
    


  10. Create a numeric vector with the same length as the previous one, but only containing the number 3
    Solution:
    vec.tmp2 <- rep(3, length(vec.tmp))
    vec.tmp2
    
    [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
    [38] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
    [75] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
    


  11. Create a vector that contains all numbers from 1 to 17, where each number occurs the same number of times as the number itself, e.g. 1, 2, 2, 3, 3, 3…
    Solution:
    rep(1:17, 1:17)
    
    [1]  1  2  2  3  3  3  4  4  4  4  5  5  5  5  5  6  6  6  6  6  6  7  7  7  7
    [26]  7  7  7  8  8  8  8  8  8  8  8  9  9  9  9  9  9  9  9  9 10 10 10 10 10
    [51] 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12
    [76] 12 12 12 13 13 13 13 13 13 13 13 13 13 13 13 13 14 14 14 14 14 14 14 14 14
    [101] 14 14 14 14 14 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 16 16 16 16 16
    [126] 16 16 16 16 16 16 16 16 16 16 16 17 17 17 17 17 17 17 17 17 17 17 17 17 17
    [151] 17 17 17
    
  12. What will be the result of the following calculations?
    • c(1, 3, 5) + c(2, 4, 6)
    • c(1, 3, 5) + c(2, 4, 6, 8)
    • c(1, 3) - c(2, 4, 6, 8)
  13. Print a truth table for OR (in R, the operator |) for the three distinct logical values TRUE, FALSE and NA. Read about truth tables here: https://en.wikipedia.org/wiki/Truth_table A truth table is the pairwise comparison of the different Boolean values available. It should hence contain the output from:
    TRUE or FALSE
    TRUE or NA
    TRUE or TRUE
    FALSE or FALSE
    FALSE or NA
    NA or NA

    You can create the table manually by doing the comparisons one by one in R. The solution below uses a more advanced approach and is likely a bit overwhelming, but look at the commands and read about the function outer() in the R manual to see if you can work out how it works.

    Solution:
    x <- c(NA, FALSE, TRUE)
    names(x) <- as.character(x)
    outer(x, x, "|")
    
          NA   FALSE TRUE
    NA    NA   NA    TRUE
    FALSE NA   FALSE TRUE
    TRUE  TRUE TRUE  TRUE
    


  14. Create two numeric vectors of length 4 and try all the basic operators (as seen in the table in the data types exercise) with these two as arguments. Make sure you understand the output generated by R.
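Questions 12 and 14 both rely on vectorised arithmetic. When two vectors of different lengths are combined, R recycles the shorter one, and it gives a warning only if the longer length is not a multiple of the shorter one. A sketch to check your answers after you have tried them yourself; the names a, b, d, v1 and v2 are arbitrary:

```r
# Question 12: equal lengths give element-wise addition
a <- c(1, 3, 5) + c(2, 4, 6)
a        # 3 7 11

# Lengths 3 and 4: c(1, 3, 5) is recycled to c(1, 3, 5, 1),
# and R warns because 4 is not a multiple of 3
d <- c(1, 3, 5) + c(2, 4, 6, 8)
d        # 3 7 11 9

# Lengths 2 and 4: c(1, 3) is recycled to c(1, 3, 1, 3), no warning
b <- c(1, 3) - c(2, 4, 6, 8)
b        # -1 -1 -5 -5

# Question 14: a few basic operators on two length-4 vectors
v1 <- c(2, 4, 6, 8)
v2 <- c(1, 2, 3, 4)
v1 + v2  # 3 6 9 12
v1 * v2  # 2 8 18 32
v1 / v2  # 2 2 2 2
v1 ^ v2  # 2 16 216 4096
v1 %% 3  # 2 1 0 2 (a single number is recycled too)
v1 > v2  # TRUE TRUE TRUE TRUE
```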

Modify and subset vectors

Create a new character vector that contains the following words and save it using a suitable name: apple, banana, orange, kiwi, potato

Solution:
veggies <- c("apple", "banana", "orange", "kiwi", "potato")


Do the following on your newly created vector.

  1. Select orange from the vector
    Solution:
    veggies[3]
    
    [1] "orange"
    


  2. Select all fruits from the vector
    Solution:
    veggies[-5]
    
    [1] "apple"  "banana" "orange" "kiwi"
    
    veggies[1:4]
    
    [1] "apple"  "banana" "orange" "kiwi"
    


  3. Do the same selection as in question 2 without using index positions
    Solution:
    veggies[veggies == "apple" | veggies == "banana" | veggies == "orange" | veggies == "kiwi"]
    
    [1] "apple"  "banana" "orange" "kiwi"
    
    veggies[veggies != "potato"]
    
    [1] "apple"  "banana" "orange" "kiwi"
    


  4. Convert the character vector to a numeric vector
    Solution:
    as.numeric(veggies)
    
    [1] NA NA NA NA NA
    Warning message:
    NAs introduced by coercion
    


  5. Create a vector of logical values that can be used to extract every second value from your character vector
    Solution:
    selection <- c(FALSE, TRUE, FALSE, TRUE, FALSE)
    veggies[selection]
    
    [1] "banana" "kiwi"
    


    Alternative solution (why does this work?):
    selection2 <- c(FALSE, TRUE)
    veggies[selection2]
    
    [1] "banana" "kiwi"
    


  6. Add the names a, b, o, k and p to the vector
    Solution:
    names(veggies) <- c("a", "b", "o", "k", "p")
    


  7. Create a vector containing all the letters in the alphabet (NB! This can be done without typing all the letters; Google is your friend.)
    Solution:
    letters
    
    [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
    [20] "t" "u" "v" "w" "x" "y" "z"
    


  8. Sample 30 values randomly with replacement from your letter vector and convert the character vector to a factor. Which of the levels has the most entries in the vector? (Since the sampling is random, your counts will differ from those shown.)
    Solution:
    letter.sample <- sample(letters, size = 30, replace = TRUE)
    letter.sample <- factor(letter.sample)
    summary(letter.sample)
    
    a b c e g k l m n o q r t v w x z
    3 1 2 1 3 1 1 1 3 1 2 2 1 3 2 1 2
    


  9. Extract letters 14 to 19 from the created vector
    Solution:
    letters[14:19]
    
    [1] "n" "o" "p" "q" "r" "s"
    


  10. Extract all but the last letter
    Solution:
    letters[1:(length(letters) - 1)]  # the parentheses make the intended range 1:25 explicit
    
    [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
    [20] "t" "u" "v" "w" "x" "y"
    
    letters[-length(letters)]
    
    [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
    [20] "t" "u" "v" "w" "x" "y"
    


  11. Which is the index position of the letter u in the vector?
    Solution:
    which(letters == "u")
    
    [1] 21
    


  12. Create a new vector of length one that holds the whole alphabet as a single entry
    Solution:
    paste(letters, sep = "", collapse = "")
    
    [1] "abcdefghijklmnopqrstuvwxyz"
    


  13. Create a numeric vector by sampling 100 numbers from a normal distribution with mean 2 and standard deviation 4. Hint! Check the function rnorm(). Since the numbers are random, your results in the following questions will differ from those shown.
    Solution:
    norm.rand <- rnorm(100, mean = 2, sd = 4)
    


  14. How many of the generated values are negative?
    Solution:
    length(norm.rand[norm.rand < 0])
    
    [1] 23
    


  15. Calculate the standard deviation, mean and median of your random numbers
    Solution:
    sd(norm.rand)
    mean(norm.rand)
    median(norm.rand)
    
    [1] 3.541989
    [1] 1.910667
    [1] 1.631083
    


  16. Replace the 11th value in your random number vector with NA and calculate the same summary statistics again
    Solution:
    norm.rand[11] <- NA
    sd(norm.rand, na.rm = TRUE)
    mean(norm.rand, na.rm = TRUE)
    median(norm.rand, na.rm = TRUE)
    
    [1] 3.553763
    [1] 1.889685
    [1] 1.62893
    


  17. Replace the last position in the vector with the letter L and calculate the same summary statistics.
    Solution:
    norm.rand[100] <- "L"
    sd(norm.rand, na.rm = TRUE)
    mean(norm.rand, na.rm = TRUE)
    median(norm.rand, na.rm = TRUE)
    
    Warning message:
    In var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
    NAs introduced by coercion
    [1] NA
    Warning message:
    In mean.default(norm.rand, na.rm = TRUE) :
    argument is not numeric or logical: returning NA
    [1] NA
    Warning message:
    In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
    argument is not numeric or logical: returning NA
    


  18. In many cases one has data from multiple replicates and different treatments. In such cases it can be useful to have names of the type: Geno_a_1, Geno_a_2, Geno_a_3, Geno_b_1, Geno_b_2…, Geno_s_3. Try to create such a vector without manually typing it all in.
    Solution:
    geno <- rep("Geno", 57)
    needed.letters <- rep(letters[1:19], 3)
    needed.numbers <- rep(1:3, 19)
    temp <- paste(geno, needed.letters, needed.numbers, sep = "_")
    sort(temp)
    
    [1] "Geno_a_1" "Geno_a_2" "Geno_a_3" "Geno_b_1" "Geno_b_2" "Geno_b_3"
    [7] "Geno_c_1" "Geno_c_2" "Geno_c_3" "Geno_d_1" "Geno_d_2" "Geno_d_3"
    [13] "Geno_e_1" "Geno_e_2" "Geno_e_3" "Geno_f_1" "Geno_f_2" "Geno_f_3"
    [19] "Geno_g_1" "Geno_g_2" "Geno_g_3" "Geno_h_1" "Geno_h_2" "Geno_h_3"
    [25] "Geno_i_1" "Geno_i_2" "Geno_i_3" "Geno_j_1" "Geno_j_2" "Geno_j_3"
    [31] "Geno_k_1" "Geno_k_2" "Geno_k_3" "Geno_l_1" "Geno_l_2" "Geno_l_3"
    [37] "Geno_m_1" "Geno_m_2" "Geno_m_3" "Geno_n_1" "Geno_n_2" "Geno_n_3"
    [43] "Geno_o_1" "Geno_o_2" "Geno_o_3" "Geno_p_1" "Geno_p_2" "Geno_p_3"
    [49] "Geno_q_1" "Geno_q_2" "Geno_q_3" "Geno_r_1" "Geno_r_2" "Geno_r_3"
    [55] "Geno_s_1" "Geno_s_2" "Geno_s_3"
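A slightly shorter variant of the solution to question 18, sketched below, uses the each argument of rep() so that the names come out in the desired order without sorting. It also relies on paste() recycling the single string "Geno", so rep("Geno", 57) is not needed. geno.names is only an illustrative name:

```r
# Each of the 19 letters a to s repeated three times in a row: a a a b b b ...
needed.letters <- rep(letters[1:19], each = 3)
# Replicate numbers 1 2 3, repeated once for every letter: 1 2 3 1 2 3 ...
needed.numbers <- rep(1:3, times = 19)
# paste() recycles the length-1 string "Geno" to match the other vectors
geno.names <- paste("Geno", needed.letters, needed.numbers, sep = "_")
head(geno.names)    # "Geno_a_1" "Geno_a_2" "Geno_a_3" "Geno_b_1" "Geno_b_2" "Geno_b_3"
length(geno.names)  # 57
```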