3  Diabetes data

Before we continue with the descriptive statistics, let’s introduce an example data set. We will be looking at the data collected in a study to understand the prevalence of obesity, diabetes, and other cardiovascular risk factors in central Virginia, USA.

The data are available as part of the faraway R package. In total, 403 African Americans were interviewed. The available variables are:

Abbreviation Description
id Subject ID
chol Total Cholesterol [mg/dL]
stab.glu Stabilized Glucose [mg/dL]
hdl High Density Lipoprotein [mg/dL]
ratio Cholesterol / HDL Ratio
glyhb Glycosylated Hemoglobin [%]
location County: Buckingham or Louisa
age age [years]
gender gender
height height [in]
weight weight [lb]
frame frame: small, medium or large
bp.1s First Systolic Blood Pressure
bp.1d First Diastolic Blood Pressure
bp.2s Second Systolic Blood Pressure
bp.2d Second Diastolic Blood Pressure
waist waist [in]
hip hip [in]
time.ppn Postprandial Time [min] when labs were drawn

And the first few observations are:

#| echo: false
#| warning: false
#| message: false
#| include: true

library(tidyverse)
library(kableExtra)
library(faraway)

# preview
glimpse(diabetes)
Rows: 403
Columns: 19
$ id       <labelled> 1000, 1001, 1002, 1003, 1005, 1008, 1011, 1015, 1016, 10…
$ chol     <labelled> 203, 165, 228, 78, 249, 248, 195, 227, 177, 263, 242, 21…
$ stab.glu <labelled> 82, 97, 92, 93, 90, 94, 92, 75, 87, 89, 82, 128, 75, 79,…
$ hdl      <labelled> 56, 24, 37, 12, 28, 69, 41, 44, 49, 40, 54, 34, 36, 46, …
$ ratio    <labelled> 3.6, 6.9, 6.2, 6.5, 8.9, 3.6, 4.8, 5.2, 3.6, 6.6, 4.5, 6…
$ glyhb    <labelled> 4.31, 4.44, 4.64, 4.63, 7.72, 4.81, 4.84, 3.94, 4.84, 5.…
$ location <fct> Buckingham, Buckingham, Buckingham, Buckingham, Buckingham, B…
$ age      <int> 46, 29, 58, 67, 64, 34, 30, 37, 45, 55, 60, 38, 27, 40, 36, 3…
$ gender   <fct> female, female, female, male, male, male, male, male, male, f…
$ height   <int> 62, 64, 61, 67, 68, 71, 69, 59, 69, 63, 65, 58, 60, 59, 69, 6…
$ weight   <int> 121, 218, 256, 119, 183, 190, 191, 170, 166, 202, 156, 195, 1…
$ frame    <fct> medium, large, large, large, medium, large, medium, medium, l…
$ bp.1s    <labelled> 118, 112, 190, 110, 138, 132, 161, NA, 160, 108, 130, 10…
$ bp.1d    <labelled> 59, 68, 92, 50, 80, 86, 112, NA, 80, 72, 90, 68, 80, NA,…
$ bp.2s    <labelled> NA, NA, 185, NA, NA, NA, 161, NA, 128, NA, 130, NA, NA, …
$ bp.2d    <labelled> NA, NA, 92, NA, NA, NA, 112, NA, 86, NA, 90, NA, NA, NA,…
$ waist    <int> 29, 46, 49, 33, 44, 36, 46, 34, 34, 45, 39, 42, 35, 37, 36, 3…
$ hip      <int> 38, 48, 57, 38, 41, 42, 49, 39, 40, 50, 45, 50, 41, 43, 40, 4…
$ time.ppn <labelled> 720, 360, 180, 480, 300, 195, 720, 1020, 300, 240, 300, …
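
The preview already shows several NA entries, for example in the second blood pressure readings. As a small sketch of how one might count them, the base R code below builds a toy data frame from the first eight values of bp.1s and bp.2s printed above and counts the missing values per column. The object names toy and na_counts are hypothetical and used only for illustration; in principle the same call could be applied to diabetes itself.

```r
# Toy data: first eight values of bp.1s and bp.2s copied from the preview above
toy <- data.frame(
  bp.1s = c(118, 112, 190, 110, 138, 132, 161, NA),
  bp.2s = c(NA, NA, 185, NA, NA, NA, 161, NA)
)

# is.na() marks missing entries; colSums() counts them per column
na_counts <- colSums(is.na(toy))
na_counts
```

In this toy subset the second systolic reading is missing far more often than the first, which is worth keeping in mind when summarising the blood pressure variables.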

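Next, we convert height, waist and weight to metric units (hip is left in inches), compute the body mass index (BMI), and add obesity and diabetes status (yes/no). Here a subject is labelled obese if BMI exceeds 29.9 kg/m^2 and diabetic if glycosylated hemoglobin exceeds 7%. In R this can be done, and the first few measurements across all variables displayed, as below: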

# conversion factors to metric units
inch2m <- 2.54 / 100  # inches to metres
pound2kg <- 0.45      # pounds to kilograms (rounded; 1 lb is about 0.4536 kg)

# add obesity and diabetes variables
data_diabetes <- diabetes %>%
  mutate(height = round(height * inch2m, 2)) %>%    # height in m
  mutate(waist = waist * inch2m) %>%                # waist in m
  mutate(weight = round(weight * pound2kg, 2)) %>%  # weight in kg
  mutate(BMI = round(weight / height^2, 2)) %>%     # BMI in kg/m^2
  # obese: "No" for BMI in (0, 29.9], "Yes" for BMI in (29.9, 100]
  mutate(obese = cut(BMI, breaks = c(0, 29.9, 100), labels = c("No", "Yes"))) %>%
  # diabetic: "Yes" if glyhb > 7, otherwise "No"
  mutate(diabetic = factor(ifelse(glyhb > 7, "Yes", "No"), levels = c("No", "Yes")))
  
# preview data
glimpse(data_diabetes)
Rows: 403
Columns: 22
$ id       <labelled> 1000, 1001, 1002, 1003, 1005, 1008, 1011, 1015, 1016, 10…
$ chol     <labelled> 203, 165, 228, 78, 249, 248, 195, 227, 177, 263, 242, 21…
$ stab.glu <labelled> 82, 97, 92, 93, 90, 94, 92, 75, 87, 89, 82, 128, 75, 79,…
$ hdl      <labelled> 56, 24, 37, 12, 28, 69, 41, 44, 49, 40, 54, 34, 36, 46, …
$ ratio    <labelled> 3.6, 6.9, 6.2, 6.5, 8.9, 3.6, 4.8, 5.2, 3.6, 6.6, 4.5, 6…
$ glyhb    <labelled> 4.31, 4.44, 4.64, 4.63, 7.72, 4.81, 4.84, 3.94, 4.84, 5.…
$ location <fct> Buckingham, Buckingham, Buckingham, Buckingham, Buckingham, B…
$ age      <int> 46, 29, 58, 67, 64, 34, 30, 37, 45, 55, 60, 38, 27, 40, 36, 3…
$ gender   <fct> female, female, female, male, male, male, male, male, male, f…
$ height   <dbl> 1.57, 1.63, 1.55, 1.70, 1.73, 1.80, 1.75, 1.50, 1.75, 1.60, 1…
$ weight   <dbl> 54.45, 98.10, 115.20, 53.55, 82.35, 85.50, 85.95, 76.50, 74.7…
$ frame    <fct> medium, large, large, large, medium, large, medium, medium, l…
$ bp.1s    <labelled> 118, 112, 190, 110, 138, 132, 161, NA, 160, 108, 130, 10…
$ bp.1d    <labelled> 59, 68, 92, 50, 80, 86, 112, NA, 80, 72, 90, 68, 80, NA,…
$ bp.2s    <labelled> NA, NA, 185, NA, NA, NA, 161, NA, 128, NA, 130, NA, NA, …
$ bp.2d    <labelled> NA, NA, 92, NA, NA, NA, 112, NA, 86, NA, 90, NA, NA, NA,…
$ waist    <dbl> 0.7366, 1.1684, 1.2446, 0.8382, 1.1176, 0.9144, 1.1684, 0.863…
$ hip      <int> 38, 48, 57, 38, 41, 42, 49, 39, 40, 50, 45, 50, 41, 43, 40, 4…
$ time.ppn <labelled> 720, 360, 180, 480, 300, 195, 720, 1020, 300, 240, 300, …
$ BMI      <dbl> 22.09, 36.92, 47.95, 18.53, 27.52, 26.39, 28.07, 34.00, 24.39…
$ obese    <fct> No, Yes, Yes, No, No, No, No, Yes, No, Yes, No, Yes, Yes, Yes…
$ diabetic <fct> No, No, No, No, Yes, No, No, No, No, No, No, No, No, No, No, …
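
To make the derived columns concrete, the short base R sketch below redoes the arithmetic by hand for the first two subjects shown above (62 in and 121 lb; 64 in and 218 lb). It assumes the same rounded conversion factors and cut points as the code above; the names height_in, weight_lb, height_m, weight_kg and bmi are introduced here only for illustration.

```r
# Illustrative values copied from the first two rows of the preview
height_in <- c(62, 64)    # height [in]
weight_lb <- c(121, 218)  # weight [lb]

# Same conversion factors as in the code above
inch2m <- 2.54 / 100
pound2kg <- 0.45

height_m <- round(height_in * inch2m, 2)     # 1.57 1.63
weight_kg <- round(weight_lb * pound2kg, 2)  # 54.45 98.10
bmi <- round(weight_kg / height_m^2, 2)      # 22.09 36.92

# Same cut points as above: (0, 29.9] is "No", (29.9, 100] is "Yes"
obese <- cut(bmi, breaks = c(0, 29.9, 100), labels = c("No", "Yes"))

data.frame(height_m, weight_kg, bmi, obese)
```

The results agree with the first two rows of the preview. Because cut() uses right-closed intervals by default, a BMI of exactly 29.9 is labelled "No", while any larger value, for example 29.95, is labelled "Yes". Similarly, ifelse(glyhb > 7, ...) returns NA when glyhb is missing, so diabetic stays missing for such subjects.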

We can now use descriptive statistics to understand the diabetes data set better.