1 Probability theory

Learning outcomes

  • understand the concept of random variables
  • understand the concept of probability
  • understand and learn to use resampling to compute probabilities
  • understand the concept of a probability mass function
  • understand the concept of a probability density function
  • understand the concept of cumulative distribution functions
  • use the normal distribution
  • understand the central limit theorem

1.1 Introduction to probability

Some things are more likely to occur than others. Compare:

  • the chance of the sun rising tomorrow with the chance that no-one is infected with COVID-19 tomorrow
  • the chance of a cold dark winter in Stockholm with the chance of no rainy days over the summer months in Stockholm

We intuitively believe that the chance of the sun rising, or of a dark winter occurring, is enormously higher than the chance of COVID-19 disappearing overnight or of a summer without a single rainy day. Probability gives us a scale for measuring how likely events are to occur. Probability rules enable us to reason about uncertain events. The probability rules are expressed in terms of sets, where a set is a well-defined collection of distinct objects.

Suppose we perform an experiment that we do not know the outcome of, i.e. we are uncertain about the outcome. We can however list all the outcomes that might occur.

  • the sample space is the set \(S\) of all possible outcomes of the experiment, e.g. getting 1, 2 etc. on a 6-sided die, \(S=\{1,2,3,4,5,6\}\)
  • an event is a subset of the sample space
  • an event is said to occur if the outcome of the experiment belongs to this set
  • the complement \(E'\) of the event \(E\) contains all the outcomes in \(S\) which are not in \(E\)
  • two events \(E\) and \(F\) are said to be disjoint if \(E \cap F = \emptyset\)

1.1.1 Axioms of probability

  1. \(0 \leq P(E) \leq 1\) for any event \(E \subseteq S\)
  2. \(P(S) = 1\)
  3. if \(E\), \(F\) are disjoint events, then \(P(E \cup F) = P(E) + P(F)\)

1.1.2 Common rules and theorems

  • probability of the complement: let \(E \subseteq S\) be any event, then \(P(E') = 1 - P(E)\)
  • probability of the impossible event: \(P(\emptyset)=0\)
  • inequality of the probability of a subset: let \(E,F \subseteq S\) be events such that \(E \subseteq F\) then \(P(F) \geq P(E)\)
  • the probability of a union: let \(E,F \subseteq S\) be any two events, then \(P(E \cup F) = P(E) + P(F) - P(E \cap F)\)

1.1.3 Conditional probability

  • let \(E,F \subseteq S\) be two events such that \(P(E)>0\), then the conditional probability of \(F\) given that \(E\) occurs is defined to be: \[P(F|E) = \frac{P(E\cap F)}{P(E)}\]

The product rule follows from the definition of conditional probability: let \(E,F \subseteq S\) be events such that \(P(E)>0\), then: \[P(E \cap F) = P(F|E)P(E)\]

1.1.4 Random variables

The outcome of a random experiment can be described by a random variable.

Whenever chance is involved in the outcome of an experiment the outcome is a random variable.

A random variable cannot be predicted exactly, but the probability of all possible outcomes can be described.

A random variable is usually denoted by a capital letter, \(X, Y, Z, \dots\). Values collected in an experiment are observations of the random variable, usually denoted by lowercase letters \(x, y, z, \dots\).

The population is the collection of all possible observations of the random variable. Note, the population is not always countable.

A sample is a subset of the population.

Example random variables and probabilities:

  • The weight of a random newborn baby, \(W\). \(P(W>4.0kg)\)
  • The smoking status of a random mother, \(S\). \(P(S=1)\)
  • The hemoglobin concentration in blood, \(Hb\). \(P(Hb<125 g/L)\)
  • The number of mutations in a gene
  • BMI of a random man
  • Weight status of a random man (underweight, normal weight, overweight, obese)
  • The result of throwing a die

Conditional probability can be written for example \(P(W \geq 3.5 | S = 1)\), which is the probability that \(W \geq 3.5\) given that \(S = 1\), in words “the probability that a smoking mother has a baby with birth weight of 3.5 kg or more”.

1.1.5 The urn model

The urn model is a simple model commonly used in statistics and probability. In the urn model real objects (such as people, mice, cells, genes, molecules, etc) are represented by balls of different colors. A fair coin can be represented by an urn with two balls representing the coin's two sides. A group of people can also be modelled in an urn model; if age is the variable of interest, we write the age of each person on the balls. If we instead are interested in whether the people are allergic to pollen or not, we color the balls according to allergy status.


Figure 1.1: Urn models of a fair coin, age of a group of people, pollen allergy status of a group of people.

By drawing balls from the urn, with or without replacement, probabilities and other properties of the model can be inferred.

1.2 Discrete random variables

A discrete random variable has a countable number of possible values, such as {1,2,3,4,5,6}; {red, blue, green}; {tiny, small, average, large, huge} or all integers.

A discrete random variable can be described by its probability mass function, pmf.

The probability that the random variable, \(X\), takes the value \(x\) is denoted \(P(X=x) = p(x)\). Note that:

  1. \(0 \leq p(x) \leq 1\), a probability is always between 0 and 1.
  2. \(\sum p(x) = 1\), the sum over all possible outcomes is 1.
Example 1.1 (The number of dots on a die) When rolling a die there are six possible outcomes; 1, 2, 3, 4, 5 and 6, each of which has the same probability if the die is fair. The outcome of one roll of the die can be described by a random variable \(X\). The probability of a particular outcome \(x\) is denoted \(P(X=x)\) or \(p(x)\).

The probability mass function of a fair six-sided die can be summarized in a table;

x     1    2    3    4    5    6
p(x)  0.17 0.17 0.17 0.17 0.17 0.17

or in a barplot;


Figure 1.2: Probability mass function of a die.

The smoking status of a random mother, \(S\), coded as 0 (non-smoker) or 1 (smoker), can be summarized in the same way;

      non-smoker  smoker
x          0         1
p(x)     0.61      0.39
Example 1.2 (CFU) The number of bacterial colonies on a plate is a discrete random variable.

Figure 1.3: Probability mass distribution of the number of bacterial colonies on an agar plate.

1.2.1 Expected value

When the probability mass function is known, the expected value of the random variable can be computed.

\[E[X] = \mu = \sum_{i=1}^N x_i p(x_i)\] For a uniform distribution, where every object has the same probability (in the urn model, every object is represented by one ball), the expected value can be computed as the sum of all objects divided by the total number of objects;

\[E[X] = \mu = \frac{1}{N}\sum_{i=1}^N x_i\]
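For the fair die this gives \(E[X] = (1+2+3+4+5+6)/6 = 3.5\); a quick check in R:

```r
# Expected value of a fair die, via the pmf and via the plain mean
x <- 1:6
sum(x * rep(1/6, 6))  # 3.5, using the pmf definition
mean(x)               # same, since the distribution is uniform
```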

1.2.2 Variance

The variance is a measure of spread and is defined as the expected value of the squared distance from the population mean;

\[var(X) = \sigma^2 = E[(X-\mu)^2] = \sum_{i=1}^N (x_i-\mu)^2 p(x_i)\]
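For the fair die the variance is \(\sum_i (x_i - 3.5)^2 \cdot \frac{1}{6} \approx 2.92\); in R:

```r
# Variance of a fair die
x <- 1:6
mu <- sum(x) / 6
sum((x - mu)^2 * rep(1/6, 6))  # approximately 2.92
```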

1.2.3 Simulate distributions

Once the distribution is known, we can compute probabilities, such as \(P(X=x), P(X<x)\) and \(P(X \geq x)\). If the distribution is not known, simulation might be the solution.

Example 1.3 (Simulate coin toss) In a single coin toss the probability of heads is 0.5. In 20 coin tosses, what is the probability of at least 15 heads?

The outcome of a single coin toss is a random variable, \(X\) with two possible outcomes \(\{H, T\}\). We know that \(P(X=H) = 0.5\). The random variable of interest is the number of heads in 20 coin tosses, \(Y\). The probability that we need to compute is \(P(Y \geq 15)\).


Figure 1.4: A coin toss. Urn model with one black ball (heads) and one white ball (tails).

A single coin toss can be modelled by an urn with two balls. When a ball is drawn randomly from the urn, the probability to get the black ball (heads) is \(P(X=H) = 0.5\).

If we want to simulate tossing 20 coins (or one coin 20 times) we can use the same urn model, if the ball is replaced after each draw.

In R we can simulate random draws from an urn model using the function sample.
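A minimal sketch of such a draw (one ball from the two-ball urn; the object name urn is a choice made here):

```r
# The urn: one ball for heads (H) and one for tails (T)
urn <- c("H", "T")
# Draw one ball at random; this simulates a single coin toss
sample(urn, size = 1)
```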

## [1] "H"
## [1] "H"

Every time you run sample a new coin toss is simulated.

The argument size tells the function how many balls we want to draw from the urn. To draw 20 balls from the urn, set size=20; remember to replace the ball after each draw!
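A sketch, reusing the urn above (the object name tosses is a choice made here):

```r
# Draw 20 balls with replacement, i.e. simulate 20 coin tosses
tosses <- sample(urn, size = 20, replace = TRUE)
tosses
```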

##  [1] "T" "T" "H" "T" "H" "T" "H" "H" "H" "T" "H" "H" "H" "T" "T" "H" "T" "T" "H"
## [20] "H"

How many heads did we get in the 20 random draws?
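Comparing each draw to "H" and summing counts the heads; a sketch:

```r
# Count the number of heads among the 20 draws
sum(tosses == "H")
```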

## [1] 11

We can repeat this experiment (toss 20 coins and count the number of heads) several times to estimate the distribution of number of heads in 20 coin tosses.

To do the same thing several times we use the function replicate.

To simulate tossing 20 coins and counting the number of heads 10000 times, do the following;
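A sketch (the result vector name, Nheads, is a choice made here):

```r
# Each replicate tosses 20 coins and counts the heads
Nheads <- replicate(10000, {
  tosses <- sample(c("H", "T"), size = 20, replace = TRUE)
  sum(tosses == "H")
})
```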

Plot distribution of the number of heads in a histogram.
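One way to draw the histogram, with one bar per possible count:

```r
# Histogram of the 10000 simulated numbers of heads
hist(Nheads, breaks = seq(-0.5, 20.5, by = 1), xlab = "Number of heads")
```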

Now, let’s get back to the question; when tossing 20 coins, what is the probability of at least 15 heads?

\(P(Y \geq 15)\)

Count how many times out of our 10000 experiments the number is 15 or greater;
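Continuing the sketch above;

```r
# Number of replicates with at least 15 heads
sum(Nheads >= 15)
```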

## [1] 185

From this we conclude that

\(P(Y \geq 15) =\) 185/10000 = 0.02
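The same estimate can be computed directly as a proportion;

```r
# Estimated P(Y >= 15)
mean(Nheads >= 15)
```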

1.2.4 Parametric discrete distributions

1.2.4.1 Bernoulli trial

A Bernoulli trial is a random experiment with two outcomes; success and failure. The probability of success, \(P(success) = p\), is constant. The probability of failure is \(P(failure) = 1-p\).

When coding it is convenient to code success as 1 and failure as 0.

The outcome of a Bernoulli trial is a discrete random variable, \(X\).

\[p(x) = \left\{ \begin{array}{ll} p & \textrm{if } x = 1 \textrm{ (success)}\\ 1-p & \textrm{if } x = 0 \textrm{ (failure)} \end{array} \right.\]

Using the definitions of expected value and variance it can be shown that;

\[E[X] = p\\ var(X) = p(1-p)\]
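A Bernoulli trial can be simulated in R, for example with rbinom; a minimal sketch with \(p = 0.3\) (a value chosen here for illustration):

```r
# One Bernoulli trial with success probability 0.3,
# coded as 1 (success) or 0 (failure)
rbinom(1, size = 1, prob = 0.3)
```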

1.2.4.2 Binomial distribution

The number of successes in a series of independent and identical Bernoulli trials is a discrete random variable, \(X\).

\(X = \sum_{i=1}^n Z_i,\)

where all \(Z_i\) describe the outcome of independent and identical Bernoulli trials with probability \(p\) for success (\(P(Z_i=1) = p\)).

The probability mass function of \(X\) is called the binomial distribution. In short we use the notation;

\[X \in Bin(n, p)\]

The probability mass function is

\[P(X=k) = {n \choose k} p^k (1-p)^{n-k}\] It can be shown that

\[E[X] = np\\ var(X) = np(1-p)\]

The binomial distribution occurs when sampling \(n\) objects with replacement from an urn with objects of two types, of which the interesting type has probability \(p\).

The probability mass function, \(P(X=k)\) can be computed using the R function dbinom and the cumulative distribution function \(P(X \leq k)\) can be computed using pbinom.
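For example, for the 20 coin tosses above, the exact counterparts to the simulated probabilities are;

```r
# P(X = 15) for X ~ Bin(20, 0.5)
dbinom(15, size = 20, prob = 0.5)
# P(X >= 15) = 1 - P(X <= 14)
1 - pbinom(14, size = 20, prob = 0.5)
```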

1.2.4.3 Hypergeometric distribution

The hypergeometric distribution occurs when sampling \(n\) objects without replacement from an urn with \(N\) objects of two types, of which a proportion \(p\) are of the interesting type.

The probability mass function

\[P(X=k) = \frac{\binom{Np}{k} \binom{N-Np}{n-k}}{\binom{N}{n}}\] can be computed in R using dhyper and the cumulative distribution function \(P(X \leq k)\) can be computed using phyper.
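In dhyper the urn is described by m, the number of objects of the interesting type, and n, the number of other objects; a sketch with made-up numbers:

```r
# P(X = 2) when drawing k = 3 balls without replacement from an urn
# with m = 4 balls of the interesting type and n = 6 other balls
dhyper(2, m = 4, n = 6, k = 3)
```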

1.2.4.4 Poisson distribution

The Poisson distribution describes the number of times a rare event occurs in a large number of trials.

A rare disease has a very low probability for a single individual. The number of individuals in a large population that catch the disease in a certain time period can be modelled using the Poisson distribution.

The probability mass function;

\[P(X=k) = \frac{\mu^k}{k!}e^{-\mu},\] where \(\mu\) is the expected value, which is \(\mu = n \pi\), where \(n\) is the number of objects sampled from the population and \(\pi\) is the probability for a single object.

The Poisson distribution can approximate the binomial distribution if \(n\) is large (\(n>10\)) and \(\pi\) is small (\(\pi < 0.1\)).
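A quick numerical check of this approximation, with values chosen here for illustration:

```r
# Binomial vs Poisson approximation: n = 1000, pi = 0.003
n <- 1000
p <- 0.003
dbinom(2, size = n, prob = p)  # exact binomial P(X = 2)
dpois(2, lambda = n * p)       # Poisson approximation with mu = n * pi
```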

1.2.4.5 Distributions in R

Probability mass functions, \(P(X=x)\), for the binomial, hypergeometric and Poisson distributions can be computed in R using the functions dbinom, dhyper, and dpois, respectively.

Cumulative distribution functions, \(P(X \leq x)\) can be computed using pbinom, phyper and ppois.

Also, functions for computing an \(x\) such that \(P(X \leq x) = q\), where \(q\) is a probability of interest are available using qbinom, qhyper, and qpois.
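For instance;

```r
# Smallest x such that P(X <= x) >= 0.95 for X ~ Bin(20, 0.5)
qbinom(0.95, size = 20, prob = 0.5)
```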

Exercises: Discrete random variables

Introduction to probability

Exercise 1.1 When tossing a fair coin a) what is the probability of heads? b) what is the probability of tails?
Exercise 1.2 When tossing a fair six-sided die a) what is the probability of getting 6? b) what is the probability of an even number? c) what is the probability of getting 3 or more? d) what is the expected value of dots on the die?

Simulation

Exercise 1.3 In a single coin toss the probability of heads is 0.5.

In 20 coin tosses,

  1. what is the probability of exactly 15 heads?
  2. what is the probability of less than 7 heads?
  3. What is the most probable number of heads?
  4. what is the probability of 5 tails or less?
  5. what is the probability of 2 heads or less?

Exercise 1.4 When rolling 10 six-sided dice, study the number of sixes.

  1. Define the random variable of interest
  2. What are the possible outcomes?
  3. Using simulation, estimate the probability mass function
  4. what is the probability to get at least 5 sixes?
  5. Which is the most likely number of sixes?
  6. What is the probability to get exactly 2 sixes?
  7. On average how many sixes do you get when throwing ten dice?
Exercise 1.5 A deck of cards consists of 52 cards; 13 diamonds, 13 spades, 13 hearts and 13 clubs. When five cards are randomly selected (a poker hand), what is the probability of getting all hearts?
Exercise 1.6 30% of a large population is allergic to pollen. If you randomly select 3 people to participate in your study, what is the probability that none of them will be allergic to pollen?
Exercise 1.7 In a class of 20 students, 6 are allergic to pollen. If you randomly select 3 of the students to participate in your study, what is the probability that none of them will be allergic to pollen?
Exercise 1.8 Of the 200 persons working at a company, 60 are allergic to pollen. If you randomly select 3 people to participate in your study, what is the probability that none of them are allergic to pollen?
Exercise 1.9 Compare your results in 1.6, 1.7 and 1.8. Did you get the same results? Why/why not?

Parametric discrete distributions

Exercise 1.10 Do exercises 1.6, 1.7 and 1.8 again, but using parametric distributions. Compare your results.

Exercise 1.11 (Gene set enrichment analysis) You have analyzed 20000 genes and a bioinformatician you are collaborating with has sent you a list of 1000 genes that she says are important. You are interested in a particular pathway A. 200 genes in pathway A are represented among the 20000 genes, 20 of these are in the bioinformatician's important list.

If the bioinformatician selected the 1000 genes at random, what is the probability to see 20 or more genes from pathway A in this list?

Conditional probability

Exercise 1.12 Diagnostic tests
A study of a method for cancer prediction is evaluated. Based on the results in the table below, estimate;

             pos  neg  total
not cancer    98  882    980
cancer        16    4     20
total        114  886   1000
  1. the probability of a positive test result from a person with cancer?
  2. the probability of a negative test result from a person without cancer?
  3. the probability of having cancer, if the test is positive?
  4. the probability of not having cancer, if the test is negative?

  5. Connect the four computed probabilities with the following four terms;

  • Sensitivity
  • Specificity
  • Positive predictive value (PPV)
  • Negative predictive value (NPV)

Solutions: Discrete random variables

1.1

  1. 0.5
  2. 0.5

1.2

  1. \(\frac{1}{6}\)
  2. \(\frac{1}{2}\)
  3. \(\frac{2}{3}\)
  4. 3.5

1.3

  1. 0.015
  2. 0.058
  3. 10
  4. 0.020
  5. 0.0002

1.4

  1. \(X\), the number of sixes when rolling 10 dice.
  2. \(\{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}\)
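A sketch of the stripped simulation chunk (the counts below come from 100000 replicates):

```r
# Roll 10 dice 100000 times; count the sixes in each replicate
N <- replicate(100000, sum(sample(1:6, size = 10, replace = TRUE) == 6))
table(N)
```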
## N
##     0     1     2     3     4     5     6     7     8 
## 16036 32311 29260 15481  5322  1353   212    23     2

  4. 0.015 (the simulation above gives 1590/100000 ≈ 0.016)
## [1] 1590
## [1] 0.016
  5. 1
  6. 0.29
## [1] 0.29
  7. 1.7
## [1] 1.7

1.5

0.00049

## [1] 0.00044
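The exact value can also be computed with the hypergeometric distribution;

```r
# P(all 5 cards are hearts): m = 13 hearts, n = 39 other cards, k = 5 drawn
dhyper(5, m = 13, n = 39, k = 5)  # approximately 0.00049
```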

1.6

## x
##  0  1  2  3 
## 29 45 24  2
## [1] 0.29
## x
##   0   1   2   3 
## 363 441 161  35
## [1] 0.36
## x
##     0     1     2     3 
## 34465 44125 18752  2658
## [1] 0.34
## [1] 0.34
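A sketch of how these results might have been produced (a simulation, here with 100000 replicates, plus the exact value \(0.7^3\)):

```r
# Select 3 people from a large population where 30% are allergic
x <- replicate(100000, sum(sample(c(0, 1), size = 3,
                                  replace = TRUE, prob = c(0.7, 0.3))))
table(x)
mean(x == 0)  # simulated P(none allergic)
0.7^3         # exact value, approximately 0.34
```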

1.7

## x
##     0     1     2     3 
## 31778 47960 18512  1750
## [1] 0.32
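Here the sampling is without replacement, so the exact value is hypergeometric;

```r
# 20 students of whom 6 are allergic; select 3, none allergic
dhyper(0, m = 6, n = 14, k = 3)  # approximately 0.32
```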

1.8

## x
##     0     1     2     3 
## 34463 44211 18648  2678
## [1] 0.34
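Again hypergeometric, but with a larger population;

```r
# 200 persons of whom 60 are allergic; select 3, none allergic
dhyper(0, m = 60, n = 140, k = 3)  # approximately 0.34
```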

Parametric discrete distributions

1.11

## [1] 0.0011
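A sketch of the computation;

```r
# P(X >= 20) when 1000 of 20000 genes are picked at random
# and 200 of the 20000 genes belong to pathway A
1 - phyper(19, m = 200, n = 19800, k = 1000)
```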

Conditional probability

1.12

  1. \(P(pos|cancer) = 16/20\), sensitivity
  2. \(P(neg|not cancer) = 882/980\), specificity
  3. \(P(cancer|pos) = 16/114\), PPV
  4. \(P(not cancer|neg) = 882/886\), NPV

1.3 Continuous random variables

A continuous random variable is not limited to discrete values; any continuous number within one or several ranges is possible.

Examples: weight, height, speed, intensity, …

A continuous random variable can be described by its probability density function, pdf.


Figure 1.5: Probability density function of the weight of a newborn baby.

The probability density function, \(f(x)\), is defined such that the total area under the curve is 1.

\[ \int_{-\infty}^{\infty} f(x) dx = 1 \]

The area under the curve from a to b is the probability that the random variable \(X\) takes a value between a and b.

\(P(a \leq X \leq b) = \int_a^b f(x) dx\)

The cumulative distribution function, cdf, sometimes called just the distribution function, \(F(x)\), is defined as:

\[F(x) = P(X \leq x) = \int_{-\infty}^x f(t) dt\]

As we know that the total probability (over all x) is 1, we can conclude that

\[P(X > x) = 1 - F(x)\] and thus

\[P(a < X \leq b) = F(b) - F(a)\]

1.3.1 Parametric continuous distributions

Two important parameters of a distribution are the expected value, \(\mu\), which describes the distribution's location, and the variance, \(\sigma^2\), which describes its spread.

The expected value, or population mean, is defined as;

\[E[X] = \mu = \int_{-\infty}^\infty x f(x) dx\] We will learn more about the expected value and how to estimate a population mean from a sample later in the course.

The variance is defined as the expected value of the squared distance from the population mean;

\[\sigma^2 = E[(X-\mu)^2] = \int_{-\infty}^\infty (x-\mu)^2 f(x) dx\]

The square root of the variance is called the standard deviation, \(\sigma\).

1.3.2 Normal distribution

The normal distribution (sometimes referred to as the Gaussian distribution) is a common probability distribution and many continuous random variables can be described by the normal distribution or be approximated by the normal distribution.

The normal probability density function

\[f(x) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2}\]

describes the distribution of a normal random variable, \(X\), with expected value \(\mu\) and standard deviation \(\sigma\), \(e\) and \(\pi\) are two common mathematical constants, \(e \approx 2.71828\) and \(\pi \approx 3.14159\).

In short we write \(X \sim N(\mu, \sigma)\).

The bell-shaped normal distribution is symmetric around \(\mu\) and \(f(x) \rightarrow 0\) as \(x \rightarrow \infty\) and as \(x \rightarrow -\infty\).

As \(f(x)\) is well defined, values for the cumulative distribution function \(F(x) = \int_{-\infty}^x f(t) dt\) can be computed.

Using transformation rules we can define

\[Z = \frac{X-\mu}{\sigma}, \, Z \sim N(0,1)\]

Values for the cumulative standard normal distribution, \(F(z)\), are tabulated and easy to compute in R using the function pnorm.
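For example (the second call uses made-up values \(\mu = 170\) and \(\sigma = 10\) to show that pnorm can do the standardization for you):

```r
# P(Z <= 1.96) for Z ~ N(0, 1)
pnorm(1.96)
# P(X <= 180) for X ~ N(170, 10), same as pnorm((180 - 170)/10)
pnorm(180, mean = 170, sd = 10)
```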


Figure 1.6: The shaded area under the curve is the tabulated value \(P(Z \leq z) = F(z)\).

Table 1.1: Normal distribution. The table gives \(F(z) = P(Z \leq z)\) for \(Z \in N(0,1)\).
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998

Some values of particular interest:

\[F(1.64) = 0.95\\ F(1.96) = 0.975\]

As the normal distribution is symmetric, \(F(-z) = 1 - F(z)\);

\[F(-1.64) = 0.05\\ F(-1.96) = 0.025\]

\[P(-1.96 < Z < 1.96) = 0.95\]

1.3.2.1 Sum of two normal random variables

If \(X \sim N(\mu_1, \sigma_1)\) and \(Y \sim N(\mu_2, \sigma_2)\) are two independent normal random variables, then their sum is also a normally distributed random variable:

\[X + Y \sim N(\mu_1 + \mu_2, \sqrt{\sigma_1^2 + \sigma_2^2})\]

and

\[X - Y \sim N(\mu_1 - \mu_2, \sqrt{\sigma_1^2 + \sigma_2^2})\]

This can be extended to the case with \(n\) independent and identically distributed random variables \(X_i\) (\(i=1 \dots n\)). If all \(X_i\) are normally distributed with mean \(\mu\) and standard deviation \(\sigma\), \(X_i \in N(\mu, \sigma)\), then the sum of all \(n\) random variables is also normally distributed, with mean \(n\mu\) and standard deviation \(\sqrt{n} \sigma\).
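A quick simulation check of the sum rule, with parameter values chosen here for illustration:

```r
# X ~ N(2, 1) and Y ~ N(3, 2); X + Y should be N(5, sqrt(5))
x <- rnorm(100000, mean = 2, sd = 1)
y <- rnorm(100000, mean = 3, sd = 2)
mean(x + y)  # approximately 5
sd(x + y)    # approximately sqrt(1^2 + 2^2) = 2.24
```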

1.3.3 Central limit theorem

Theorem 1.1 The sum of \(n\) independent and identically distributed random variables is approximately normally distributed, if \(n\) is large enough.

As a result of the central limit theorem, the distribution of sample fractions or sample means follows the normal distribution, at least if the sample is large enough (a rule of thumb is a sample size \(n>30\)).

Example 1.4 (Mean BMI) In a population of 252 men we can study the distribution of BMI.
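The summary statistics below (mean, variance and standard deviation) could have been computed as follows, assuming the 252 BMI values are stored in a vector bmi:

```r
# Population mean, variance and standard deviation of BMI
mean(bmi)
var(bmi)
sd(bmi)
```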

## [1] 25
## [1] 13
## [1] 3.6

Randomly sample 3, 5, 10, 15, 20, 30 men and compute the mean value, \(m\). Repeat many times to get the distribution of mean values.

Note, the mean is just the sum divided by the sample size \(n\).
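A sketch of the resampling, again assuming the vector bmi and shown for \(n = 10\):

```r
# Sample 10 men, compute the mean BMI; repeat 10000 times
m <- replicate(10000, mean(sample(bmi, size = 10)))
hist(m)  # approximately normal, by the central limit theorem
```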

1.3.4 \(\chi^2\)-distribution

The random variable \(Y = \sum_{i=1}^n X_i^2\) is \(\chi^2\)-distributed with \(n\) degrees of freedom, if the \(X_i\) are independent and identically distributed standard normal random variables, \(X_i \in N(0,1)\).

In short \(Y \in \chi^2(n)\).


Figure 1.7: The \(\chi^2\)-distribution.

1.3.5 F-distribution

The ratio of two independent \(\chi^2\)-distributed variables, each divided by its degrees of freedom, is F-distributed.


Figure 1.8: The F-distribution.

Example 1.5 The ratio of two sample variances is F-distributed.

1.3.6 t-distribution

The ratio of a standard normal random variable and the square root of a \(\chi^2\)-distributed variable divided by its degrees of freedom is t-distributed.


Figure 1.9: The t-distribution.

Example 1.6 (t-distribution) The sample mean standardized using the sample standard deviation, \(\frac{\bar{X} - \mu}{s/\sqrt{n}}\), is t-distributed with \(n-1\) degrees of freedom.

1.3.7 Distributions in R

Probability density functions for the normal, t, \(\chi^2\) and F distributions can be computed in R using the functions dnorm, dt, dchisq, and df, respectively.

Cumulative distribution functions can be computed using pnorm, pt, pchisq and pf.

Also, functions for computing an \(x\) such that \(P(X<x) = q\), where \(q\) is a probability of interest are available using qnorm, qt, qchisq and qf.
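For instance;

```r
# The z such that P(Z <= z) = 0.95 for the standard normal
qnorm(0.95)  # approximately 1.64
# The corresponding value for a t-distribution with 10 degrees of freedom
qt(0.95, df = 10)
```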

Exercises: Continuous random variables

Exercise 1.13 (Exercise on using the normal table) Let \(Z \sim N(0,1)\) be a standard normal random variable, and compute;

  1. \(P(Z<1.64)\)
  2. \(P(Z>-1.64)\)
  3. \(P(-1.96<Z)\)
  4. \(P(Z<2.36)\)
  5. An \(a\) such that \(P(Z<a) = 0.95\)
  6. A \(b\) such that \(P(Z>b) = 0.975\)
Note, this exercise can be solved using the standard normal table or using the R functions pnorm and qnorm.

Exercise 1.14 (Exercise in standardization/transformation) If \(X \sim N(3,2)\), compute the probabilities

  1. \(P(X<5)\)
  2. \(P(3<X<5)\)
  3. \(P(X \geq 7)\)
Exercise 1.15 The hemoglobin (Hb) value in a male population is normally distributed with mean 188 g/L and standard deviation 14 g/L. a) Men with Hb below 158 g/L are considered anemic. What is the probability of a random man being anemic? b) When randomly selecting 10 men from the population, what is the probability that none of them are anemic?

Exercise 1.16 A rare disease affects 3 in 100000 in a large population. If 10000 people are randomly selected from the population, what is the probability

  1. that no one in the sample is affected?
  2. that at least two in the sample are affected?
Exercise 1.17 A drug company is producing a pill, with on average 12 mg of active substance. The amount of active substance is normally distributed with mean 12 mg and standard deviation 0.5 mg, if the production is without problems. Sometimes there is a problem with the production and the amount of active substance will be too high or too low, in which case the pill has to be discarded. What should the upper and lower critical values (limits for when a pill is acceptable) be in order not to discard more than 1/20 pills from a problem free production?

Solutions

1.13

  1. 0.9495
  2. 0.9495
  3. 0.975
  4. 0.9909
  5. 1.64
  6. -1.96

1.14

  1. 0.84
  2. 0.34
  3. 0.023

1.15

  1. 0.016
  2. 0.85

1.16

  1. 0.74
## [1] 0.74
  2. 0.037
## [1] 0.037
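A sketch using the Poisson approximation, \(\mu = 10000 \cdot 3/100000 = 0.3\);

```r
mu <- 10000 * 3 / 100000  # expected number of affected, 0.3
dpois(0, mu)              # a) P(X = 0), approximately 0.74
1 - ppois(1, mu)          # b) P(X >= 2), approximately 0.037
```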
