3 Diabetes data

Before we continue with descriptive statistics, let’s introduce an example data set. We will look at data collected in a study of the prevalence of obesity, diabetes, and other cardiovascular risk factors in central Virginia, USA. The study interviewed 403 African Americans, and the data are available as the `diabetes` data set in the `faraway` package. Available variables include:

| Abbreviation | Description |
|---|---|
| id | Subject ID |
| chol | Total cholesterol [mg/dL] |
| stab.glu | Stabilized glucose [mg/dL] |
| hdl | High-density lipoprotein [mg/dL] |
| ratio | Cholesterol/HDL ratio |
| glyhb | Glycosylated hemoglobin [%] |
| location | County: Buckingham or Louisa |
| age | Age [years] |
| gender | Gender |
| height | Height [in] |
| weight | Weight [lb] |
| frame | Body frame: small, medium, or large |
| bp.1s | First systolic blood pressure [mmHg] |
| bp.1d | First diastolic blood pressure [mmHg] |
| bp.2s | Second systolic blood pressure [mmHg] |
| bp.2d | Second diastolic blood pressure [mmHg] |
| waist | Waist [in] |
| hip | Hip [in] |
| time.ppn | Postprandial time when labs were drawn [min] |

The first few observations are:
```r
# load packages; the diabetes data set ships with faraway
library(tidyverse)
library(kableExtra)
library(faraway)

# preview the data
glimpse(diabetes)
```

```
Rows: 403
Columns: 19
$ id <labelled> 1000, 1001, 1002, 1003, 1005, 1008, 1011, 1015, 1016, 10…
$ chol <labelled> 203, 165, 228, 78, 249, 248, 195, 227, 177, 263, 242, 21…
$ stab.glu <labelled> 82, 97, 92, 93, 90, 94, 92, 75, 87, 89, 82, 128, 75, 79,…
$ hdl <labelled> 56, 24, 37, 12, 28, 69, 41, 44, 49, 40, 54, 34, 36, 46, …
$ ratio <labelled> 3.6, 6.9, 6.2, 6.5, 8.9, 3.6, 4.8, 5.2, 3.6, 6.6, 4.5, 6…
$ glyhb <labelled> 4.31, 4.44, 4.64, 4.63, 7.72, 4.81, 4.84, 3.94, 4.84, 5.…
$ location <fct> Buckingham, Buckingham, Buckingham, Buckingham, Buckingham, B…
$ age <int> 46, 29, 58, 67, 64, 34, 30, 37, 45, 55, 60, 38, 27, 40, 36, 3…
$ gender <fct> female, female, female, male, male, male, male, male, male, f…
$ height <int> 62, 64, 61, 67, 68, 71, 69, 59, 69, 63, 65, 58, 60, 59, 69, 6…
$ weight <int> 121, 218, 256, 119, 183, 190, 191, 170, 166, 202, 156, 195, 1…
$ frame <fct> medium, large, large, large, medium, large, medium, medium, l…
$ bp.1s <labelled> 118, 112, 190, 110, 138, 132, 161, NA, 160, 108, 130, 10…
$ bp.1d <labelled> 59, 68, 92, 50, 80, 86, 112, NA, 80, 72, 90, 68, 80, NA,…
$ bp.2s <labelled> NA, NA, 185, NA, NA, NA, 161, NA, 128, NA, 130, NA, NA, …
$ bp.2d <labelled> NA, NA, 92, NA, NA, NA, 112, NA, 86, NA, 90, NA, NA, NA,…
$ waist <int> 29, 46, 49, 33, 44, 36, 46, 34, 34, 45, 39, 42, 35, 37, 36, 3…
$ hip <int> 38, 48, 57, 38, 41, 42, 49, 39, 40, 50, 45, 50, 41, 43, 40, 4…
$ time.ppn <labelled> 720, 360, 180, 480, 300, 195, 720, 1020, 300, 240, 300, …
```
Further:
- Glycosylated hemoglobin above 7.0% is usually taken as a positive diagnosis of diabetes.
- We can calculate BMI as \(BMI = 703 \times weight \; [lb] \; / (height \; [in])^2\) and define obesity as \(BMI \ge 30\).
- Alternatively, we can first convert pounds (lb) to kilograms (kg) by multiplying by 0.45 (more precisely, 1 lb = 0.4536 kg) and inches (in) to meters (m) by multiplying by 0.0254, and then calculate BMI as \(BMI = weight \; [kg] \; / (height \; [m])^2\).
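As a quick check of these rules, here is a minimal base-R sketch that computes BMI both ways for the first subject in the preview (62 in, 121 lb) and classifies a few glycosylated hemoglobin values against the 7.0% threshold. The variable names are illustrative only; the sketch uses the exact conversion factors (0.45359237 kg/lb, 0.0254 m/in) alongside the rounded 0.45 from the text.

```r
# BMI for one subject, computed three ways (base R only)
weight_lb <- 121   # weight of subject 1000 [lb]
height_in <- 62    # height of subject 1000 [in]

# Imperial formula with the 703 conversion factor
bmi_imperial <- 703 * weight_lb / height_in^2

# Metric formula with exact conversion factors
lb2kg <- 0.45359237   # exact definition of the pound in kg
in2m  <- 0.0254       # exact definition of the inch in m
bmi_metric <- (weight_lb * lb2kg) / (height_in * in2m)^2

# Metric formula with the rounded 0.45 factor mentioned in the text
bmi_approx <- (weight_lb * 0.45) / (height_in * in2m)^2

print(round(c(imperial = bmi_imperial, metric = bmi_metric, approx = bmi_approx), 2))

# Diabetes status from glycosylated hemoglobin (> 7.0% counts as positive);
# a missing glyhb value stays missing (NA)
glyhb <- c(4.31, 7.72, NA)
diabetic <- factor(ifelse(glyhb > 7, "Yes", "No"), levels = c("No", "Yes"))
print(diabetic)
```

The 703 factor is simply the two unit conversions combined, \(0.45359237 / 0.0254^2 \approx 703.07\), so the imperial and exact metric formulas agree to about two decimal places, while the rounded 0.45 factor gives a BMI roughly 0.8% lower.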
In R, we can add diabetes and obesity status (yes/no) variables and display the first few values of every variable:
```r
# conversion factors
inch2m <- 2.54 / 100   # 1 in = 2.54 cm = 0.0254 m
pound2kg <- 0.45       # 1 lb is approximately 0.45 kg

# add obesity and diabetes variables
data_diabetes <- diabetes %>%
  mutate(height = height * inch2m, height = round(height, 2)) %>%    # height in m
  mutate(waist = waist * inch2m) %>%                                 # waist in m (hip stays in inches)
  mutate(weight = weight * pound2kg, weight = round(weight, 2)) %>%  # weight in kg
  mutate(BMI = weight / height^2, BMI = round(BMI, 2)) %>%
  # obese if BMI >= 30: right = FALSE gives the intervals [0, 30) and [30, Inf)
  mutate(obese = cut(BMI, breaks = c(0, 30, Inf), right = FALSE, labels = c("No", "Yes"))) %>%
  # diabetic if glycosylated hemoglobin is above 7%
  mutate(diabetic = ifelse(glyhb > 7, "Yes", "No"), diabetic = factor(diabetic, levels = c("No", "Yes")))

# preview data
glimpse(data_diabetes)
```

```
Rows: 403
Columns: 22
$ id <labelled> 1000, 1001, 1002, 1003, 1005, 1008, 1011, 1015, 1016, 10…
$ chol <labelled> 203, 165, 228, 78, 249, 248, 195, 227, 177, 263, 242, 21…
$ stab.glu <labelled> 82, 97, 92, 93, 90, 94, 92, 75, 87, 89, 82, 128, 75, 79,…
$ hdl <labelled> 56, 24, 37, 12, 28, 69, 41, 44, 49, 40, 54, 34, 36, 46, …
$ ratio <labelled> 3.6, 6.9, 6.2, 6.5, 8.9, 3.6, 4.8, 5.2, 3.6, 6.6, 4.5, 6…
$ glyhb <labelled> 4.31, 4.44, 4.64, 4.63, 7.72, 4.81, 4.84, 3.94, 4.84, 5.…
$ location <fct> Buckingham, Buckingham, Buckingham, Buckingham, Buckingham, B…
$ age <int> 46, 29, 58, 67, 64, 34, 30, 37, 45, 55, 60, 38, 27, 40, 36, 3…
$ gender <fct> female, female, female, male, male, male, male, male, male, f…
$ height <dbl> 1.57, 1.63, 1.55, 1.70, 1.73, 1.80, 1.75, 1.50, 1.75, 1.60, 1…
$ weight <dbl> 54.45, 98.10, 115.20, 53.55, 82.35, 85.50, 85.95, 76.50, 74.7…
$ frame <fct> medium, large, large, large, medium, large, medium, medium, l…
$ bp.1s <labelled> 118, 112, 190, 110, 138, 132, 161, NA, 160, 108, 130, 10…
$ bp.1d <labelled> 59, 68, 92, 50, 80, 86, 112, NA, 80, 72, 90, 68, 80, NA,…
$ bp.2s <labelled> NA, NA, 185, NA, NA, NA, 161, NA, 128, NA, 130, NA, NA, …
$ bp.2d <labelled> NA, NA, 92, NA, NA, NA, 112, NA, 86, NA, 90, NA, NA, NA,…
$ waist <dbl> 0.7366, 1.1684, 1.2446, 0.8382, 1.1176, 0.9144, 1.1684, 0.863…
$ hip <int> 38, 48, 57, 38, 41, 42, 49, 39, 40, 50, 45, 50, 41, 43, 40, 4…
$ time.ppn <labelled> 720, 360, 180, 480, 300, 195, 720, 1020, 300, 240, 300, …
$ BMI <dbl> 22.09, 36.92, 47.95, 18.53, 27.52, 26.39, 28.07, 34.00, 24.39…
$ obese <fct> No, Yes, Yes, No, No, No, No, Yes, No, Yes, No, Yes, Yes, Yes…
$ diabetic <fct> No, No, No, No, Yes, No, No, No, No, No, No, No, No, No, No, …
```
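Because obesity is defined as \(BMI \ge 30\), how `cut()` treats values at a break matters. The sketch below uses made-up BMI values near the cut-off to compare right-closed intervals with a break at 29.9 (the default `right = TRUE`) against left-closed intervals (`right = FALSE`) with a break at 30. Only the second rule labels 29.95 as "No" and exactly 30.00 as "Yes", matching the definition.

```r
# Hypothetical BMI values close to the obesity cut-off
bmi <- c(29.85, 29.95, 30.00, 30.50)

# Right-closed intervals (default): (0, 29.9] -> "No", (29.9, 100] -> "Yes"
obese_299 <- cut(bmi, breaks = c(0, 29.9, 100), labels = c("No", "Yes"))

# Left-closed intervals: [0, 30) -> "No", [30, Inf) -> "Yes"
obese_30 <- cut(bmi, breaks = c(0, 30, Inf), right = FALSE, labels = c("No", "Yes"))

print(data.frame(bmi, obese_299, obese_30))
```

An equivalent and arguably more explicit alternative is `factor(ifelse(BMI >= 30, "Yes", "No"), levels = c("No", "Yes"))`, mirroring how `diabetic` is built.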
We can now use descriptive statistics to learn more about the diabetes data set.