Exercises: Hypothesis tests, resampling
Exercise 1 (Pollen) You believe that the proportion of Swedish students allergic to pollen is greater than 0.3 (the proportion allergic to pollen in Sweden). To test this, you observe 20 people in a student group at BMC in Uppsala; 9 of them are allergic to pollen.
Is this reason to believe that the proportion of Swedish students allergic to pollen is greater than 0.3? Perform a hypothesis test to answer the question.
Can you identify any problems with this study setup?
\(H_0: \pi=0.3\) \(H_1: \pi>0.3\)
Set the significance level to \(\alpha=0.05\).
Test statistic: \(X\), the number of allergic people in a sample of size 20.
\(x_{obs} = 9\)
Simulate null distribution
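The code that creates `xnull` is not shown here. A minimal sketch, assuming the null distribution is simulated by drawing the number of allergic people in a sample of 20 directly from a binomial distribution with \(\pi=0.3\). The 10000 replicates (borrowed from Exercise 2) and the seed are assumptions, not taken from the original solution:

```r
set.seed(1)  # assumed seed, only for reproducibility

# Each draw is the number of allergic people among 20 when H0 (pi = 0.3) is true
xnull <- rbinom(10000, size = 20, prob = 0.3)

# Inspect the simulated null distribution (one bar per possible count 0..20)
hist(xnull, breaks = seq(-0.5, 20.5, by = 1))
```

Because the null distribution is simulated, a different seed or number of replicates will give a p-value that differs slightly from the one reported below.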
Compute the p-value, i.e. if the null hypothesis is true, what is the probability of observing \(x_{obs}\) or higher?
xobs <- 9
## Proportion of simulated values at least as large as the observed value
(p <- mean(xnull >= xobs))
[1] 0.1119
As \(p>\alpha\), we do not reject the null hypothesis, i.e. the data give no reason to believe that students are more often allergic to pollen than the general Swedish population.
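As a cross-check on the simulated p-value, the exact tail probability can be computed, since under \(H_0\) the test statistic follows a binomial distribution with \(n=20\) and \(\pi=0.3\). This is a sketch of an alternative to the resampling approach, not part of the original solution; the name `p_exact` is introduced here for illustration:

```r
xobs <- 9

# P(X >= 9) = P(X > 8) for X ~ Bin(20, 0.3)
p_exact <- pbinom(xobs - 1, size = 20, prob = 0.3, lower.tail = FALSE)
p_exact

# The same one-sided exact test using binom.test from the stats package
binom.test(xobs, n = 20, p = 0.3, alternative = "greater")$p.value
```

The exact value should be close to the simulated one, and the two agree better as the number of simulated replicates grows.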
Problems with the study: Discuss in your group! Is it reasonable to select 20 students at BMC to answer a question about all students in Sweden?
Exercise 2 (Diet) A diet study aims to investigate how the hemoglobin (Hb) levels in blood are affected by an iron-rich diet consisting of tofu, soybeans, broccoli, lentils and peas. To perform the study, the dietician has recruited 40 male participants, who are randomly assigned to the iron-rich diet or the control group (no change in the participants' diet), with 20 participants in each group.
The observed Hb levels (in g/L):
ctrl <- c(197, 186, 157, 170, 193, 188, 175, 186, 177, 191, 168, 193, 191, 189, 188, 192, 179, 186, 197, 203)
iron <- c(187, 218, 196, 210, 206, 178, 181, 193, 172, 202, 169, 221, 183, 222, 185, 174, 192, 192, 162, 211)
Perform a hypothesis test to investigate if the Hb level is affected (increased or decreased) by the iron-rich diet.
Define \(H_0\) and \(H_1\)
\(H_0: \mu_{diet} = \mu_{ctrl}\) No difference in mean Hb level between the control group and the iron-rich diet group
\(H_1: \mu_{diet} \neq \mu_{ctrl}\)
Use the significance level \(\alpha=0.05\).
Select test statistic \(D = \bar X_d - \bar X_c\), where \(\bar X_c\) is the mean Hb level in a control group of 20 people and \(\bar X_d\) is the mean Hb level in a diet group of 20 people.
The observed value, \(d_{obs}\):
mdiet <- mean(iron)
mctrl <- mean(ctrl)
(dobs <- mdiet - mctrl)
[1] 7.4
Compute null distribution using permutation.
## Under the null hypothesis all observations are exchangeable
allobs <- c(iron, ctrl)
dnull <- replicate(10000, {
  ## Permute the 40 observations and assign the first 20 to the iron group
  x <- sample(allobs)
  d <- mean(x[1:20]) - mean(x[21:40])
  d
})
hist(dnull)
Compute the p-value (two-sided, so compare absolute values):
(p <- mean(abs(dnull) >= abs(dobs)))
[1] 0.1257
As \(p>\alpha\), the null hypothesis is not rejected, i.e. the data give no reason to believe that the iron-rich diet affects the blood Hb level.
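For comparison, the same two-sided question can be put to a parametric test. The sketch below uses Welch's two-sample t-test (`t.test` with its default `var.equal = FALSE`), which assumes approximately normally distributed group means instead of relying on permutation. It is a swapped-in technique for cross-checking, not the method used in this exercise; the name `tt` is introduced here for illustration:

```r
ctrl <- c(197, 186, 157, 170, 193, 188, 175, 186, 177, 191,
          168, 193, 191, 189, 188, 192, 179, 186, 197, 203)
iron <- c(187, 218, 196, 210, 206, 178, 181, 193, 172, 202,
          169, 221, 183, 222, 185, 174, 192, 192, 162, 211)

# Two-sided Welch t-test of H0: mu_diet = mu_ctrl
tt <- t.test(iron, ctrl, alternative = "two.sided")

tt$statistic  # t statistic for mean(iron) - mean(ctrl)
tt$p.value    # compare with the permutation p-value above
```

If the parametric assumptions are reasonable, the two p-values should be of similar size and lead to the same conclusion at \(\alpha=0.05\).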