4  Wilcoxon rank sum test

two unrelated groups

The Wilcoxon rank sum test is used to assess whether an outcome variable differs between two exposure groups, so it is the non-parametric equivalent of the two-sample t test. It examines whether the median difference between two groups is equal to zero. Let’s follow an example to get a better idea of how it works.

Example 4.1 (Wilcoxon rank sum test) We have weighed newborn babies born to 5 non-smokers and 6 smokers, recording the weight in kg. Let’s see if there is enough evidence to reject the null hypothesis that the median difference between the groups equals zero.

The data are shown below:

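# load dplyr for the pipe (%>%) and the data manipulation verbs used below
library(dplyr)

# input data
bw.nonsmokers <- c(3.99, 3.89, 3.6, 3.73, 3.31)
bw.smokers <- c(3.18, 2.74, 2.9, 3.27, 3.15, 2.42)

# group labels (one label per observation)
grp.nonsmokers <- rep("No", times = length(bw.nonsmokers))
grp.smokers <- rep("Yes", times = length(bw.smokers))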

# no. of observations per group
n.nonsmokers <- length(bw.nonsmokers)
n.smokers <- length(bw.smokers)

# put data into one data frame
data.babies <- data.frame(id = 1:(n.nonsmokers + n.smokers),
                          weight = c(bw.nonsmokers, bw.smokers),
                          smoking = c(grp.nonsmokers, grp.smokers))

# print data
data.babies %>%
  print()
   id weight smoking
1   1   3.99      No
2   2   3.89      No
3   3   3.60      No
4   4   3.73      No
5   5   3.31      No
6   6   3.18     Yes
7   7   2.74     Yes
8   8   2.90     Yes
9   9   3.27     Yes
10 10   3.15     Yes
11 11   2.42     Yes

4.1 Define the null and alternative hypotheses under study

\(H_0:\) the difference between the medians of the two groups equals zero

\(H_1:\) the difference between the medians of the two groups does not equal zero

4.2 Test statistics: rank the values

We rank the values of the weights from both groups together in ascending order of magnitude. If any of the values are equal, we average their ranks.

# rank weight variable in ascending order
df.wilcoxon.rank.sum <- data.babies %>%
  mutate(rank = rank(weight)) %>%
  print()
   id weight smoking rank
1   1   3.99      No   11
2   2   3.89      No   10
3   3   3.60      No    8
4   4   3.73      No    9
5   5   3.31      No    7
6   6   3.18     Yes    5
7   7   2.74     Yes    2
8   8   2.90     Yes    3
9   9   3.27     Yes    6
10 10   3.15     Yes    4
11 11   2.42     Yes    1
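
The example data contain no tied weights, so the averaging rule does not show up in the table above. A minimal sketch with made-up values (the vector `x.ties` is hypothetical and not part of the birth-weight data) illustrates how `rank()` treats ties by default:

```r
# hypothetical values with one tie (3.1 appears twice)
x.ties <- c(2.9, 3.1, 3.1, 3.5)

# default ties.method = "average": tied values share the mean of their ranks
r.ties <- rank(x.ties)
print(r.ties)
# the two 3.1 values would occupy ranks 2 and 3, so each receives (2 + 3) / 2 = 2.5
```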

4.3 Test statistics: sum up the ranks in the smaller group

We add up the ranks in the group with the smaller sample size. If both groups have an equal number of measurements, just pick one group. Here, the smaller group is the non-smokers, and their ranks sum to \(T=45\).

# sum up ranks for the smaller group
data.sumrank <- df.wilcoxon.rank.sum %>%
  group_by(smoking) %>%
  summarize(T = sum(rank)) %>% 
  filter(smoking == "No") %>%
  pull(T) %>%
  print()
[1] 45

4.4 Test statistics: find & interpret the P-value

We compare the \(T\) value with the values in the “Critical range for the Wilcoxon rank sum test” table, found online or here. For sample sizes 5 and 6, the range shown for \(P=0.05\) is from 18 to 42. A \(T\) value below 18 or above 42 corresponds to \(P < 0.05\). In our case \(T=45\), which is above 42, so we have enough evidence to reject the null hypothesis that the median birth weight of children born to smokers is the same as that of children born to non-smokers.
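
For samples this small, the P-value behind such a table can also be obtained by brute force. Under the null hypothesis every choice of 5 of the 11 ranks for the smaller group is equally likely, so we can enumerate all of them and count how many rank sums lie at least as far from the expected value as the observed \(T=45\). The sketch below assumes no ties and a two-sided test; the variable names are ours:

```r
n.small <- 5    # size of the smaller group (non-smokers)
n.total <- 11   # total number of observations
t.obs <- 45     # observed rank sum in the smaller group

# rank sum for every possible assignment of 5 ranks out of 1..11
rank.sums <- combn(n.total, n.small, FUN = sum)

# expected rank sum of the smaller group under the null hypothesis
t.expected <- n.small * (n.total + 1) / 2

# two-sided exact P-value: share of assignments at least as extreme as observed
p.exact <- mean(abs(rank.sums - t.expected) >= abs(t.obs - t.expected))

print(length(rank.sums))  # number of possible assignments, choose(11, 5)
print(p.exact)
```

Only the two most extreme assignments (rank sums 15 and 45) qualify, giving 2/462, or about 0.0043, which agrees with the exact P-value that R reports in the next section.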

4.5 In R

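In R we compute the test with the wilcox.test() function. The paired argument stays at its default, FALSE, because the two groups are independent.

# compute Wilcoxon rank sum test
# (paired = FALSE is the default, so it is not passed explicitly;
#  R 4.4.0 and later raise an error if `paired` is given to the formula interface)
wilcox.test(data.babies$weight ~ data.babies$smoking,
            exact = TRUE)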

    Wilcoxon rank sum exact test

data:  data.babies$weight by data.babies$smoking
W = 30, p-value = 0.004329
alternative hypothesis: true location shift is not equal to 0
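
Note that R reports W = 30 rather than the rank sum \(T=45\) that we computed by hand. The two differ only by a constant: W is the rank sum of the first group minus its smallest possible value, \(n_1(n_1+1)/2\), which (without ties) equals the number of non-smoker/smoker pairs in which the non-smoker's baby is heavier. A short check, re-entering the data so that the snippet runs on its own (the variable names `t.sum`, `w` and `w.pairs` are ours):

```r
bw.nonsmokers <- c(3.99, 3.89, 3.6, 3.73, 3.31)
bw.smokers <- c(3.18, 2.74, 2.9, 3.27, 3.15, 2.42)

# joint ranks; the first n1 entries belong to the non-smokers
ranks <- rank(c(bw.nonsmokers, bw.smokers))
n1 <- length(bw.nonsmokers)

t.sum <- sum(ranks[seq_len(n1)])    # rank sum T of the non-smokers
w <- t.sum - n1 * (n1 + 1) / 2      # W as reported by wilcox.test()

# equivalent definition: count pairs where the non-smoker value is larger
w.pairs <- sum(outer(bw.nonsmokers, bw.smokers, ">"))

print(c(T = t.sum, W = w, W.pairs = w.pairs))
```

Here every non-smoker weight exceeds every smoker weight, so W reaches its maximum of 5 × 6 = 30.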

4.6 Note on confidence intervals

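To get the confidence intervals we could set conf.int = TRUE:

# compute Wilcoxon rank sum test incl. CI
# (exact = FALSE uses the normal approximation with continuity correction;
#  paired is omitted because FALSE is the default)
wilcox.test(data.babies$weight ~ data.babies$smoking,
            exact = FALSE,
            conf.int = TRUE)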

    Wilcoxon rank sum test with continuity correction

data:  data.babies$weight by data.babies$smoking
W = 30, p-value = 0.008113
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 0.3300051 1.2499709
sample estimates:
difference in location 
             0.7224397 
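
The P-value changes from 0.004329 to 0.008113 because exact = FALSE replaces the exact permutation distribution with a normal approximation with continuity correction. The sketch below reproduces that approximation by hand (valid here because there are no ties) and also computes the median of all pairwise differences, which is the quantity that the “difference in location” estimates. The variable names are ours:

```r
bw.nonsmokers <- c(3.99, 3.89, 3.6, 3.73, 3.31)
bw.smokers <- c(3.18, 2.74, 2.9, 3.27, 3.15, 2.42)
n1 <- length(bw.nonsmokers)
n2 <- length(bw.smokers)

# W statistic: number of pairs where the non-smoker value is larger
w <- sum(outer(bw.nonsmokers, bw.smokers, ">"))

# mean and standard deviation of W under the null hypothesis (no ties)
w.mean <- n1 * n2 / 2
w.sd <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

# continuity-corrected z score and two-sided P-value
z <- (abs(w - w.mean) - 0.5) / w.sd
p.approx <- 2 * pnorm(z, lower.tail = FALSE)
print(p.approx)

# median of all 30 pairwise differences (non-smoker minus smoker)
pair.diffs <- outer(bw.nonsmokers, bw.smokers, "-")
hl.estimate <- median(pair.diffs)
print(hl.estimate)
```

The median of the pairwise differences is 0.73. The value R reports, 0.7224, is close but not identical, presumably because with exact = FALSE the estimate is also obtained from the normal approximation. In either case it estimates the median of the difference between a non-smoker and a smoker observation, not the difference between the two group medians.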

Alternatively, we could obtain the CI via bootstrapping, as we have seen earlier today:

# calculate bootstrapping CI
n <- 1000 # number of bootstrapped samples
v.mdiff <- numeric(n) # vector to hold difference in medians for each iteration

for (i in seq_len(n)){
  s.nonsmokers <- sample(bw.nonsmokers, replace = TRUE) # resample nonsmokers with replacement
  s.smokers <- sample(bw.smokers, replace = TRUE) # resample smokers with replacement

  m.nonsmokers <- median(s.nonsmokers) # calculate median of nonsmokers
  m.smokers <- median(s.smokers) # calculate median of smokers

  v.mdiff[i] <- m.nonsmokers - m.smokers # difference in medians
}

# use percentiles to calculate 95% CI, top and bottom 2.5%
CI.95 <- quantile(v.mdiff, probs = c(0.025, 0.975))
print(CI.95)
    2.5%    97.5% 
0.268875 1.230000