3 Diabetes data

Before we continue with the descriptive statistics, let’s introduce an example data set. We will be looking at the data collected in a study to understand the prevalence of obesity, diabetes, and other cardiovascular risk factors in central Virginia, USA.

The data are available as the diabetes data frame in the faraway R package. It contains measurements on 403 African Americans who were interviewed in the study. Available variables include:

Abbreviation	Description
id	Subject ID
chol	Total Cholesterol [mg/dL]
stab.glu	Stabilized Glucose [mg/dL]
hdl	High Density Lipoprotein [mg/dL]
ratio	Cholesterol / HDL Ratio
glyhb	Glycosylated Hemoglobin [%]
location	County: Buckingham or Louisa
age	age [years]
gender	gender
height	height [in]
weight	weight [lb]
frame	frame: small, medium or large
bp.1s	First Systolic Blood Pressure
bp.1d	First Diastolic Blood Pressure
bp.2s	Second Systolic Blood Pressure
bp.2d	Second Diastolic Blood Pressure
waist	waist [in]
hip	hip [in]
time.ppn	Postprandial Time [min] when labs were drawn

The first few observations are:


#| echo: false
#| warning: false
#| message: false
#| include: true

library(tidyverse)
library(kableExtra)
library(faraway)

# preview
glimpse(diabetes)

Rows: 403
Columns: 19
$ id       <labelled> 1000, 1001, 1002, 1003, 1005, 1008, 1011, 1015, 1016, 10…
$ chol     <labelled> 203, 165, 228, 78, 249, 248, 195, 227, 177, 263, 242, 21…
$ stab.glu <labelled> 82, 97, 92, 93, 90, 94, 92, 75, 87, 89, 82, 128, 75, 79,…
$ hdl      <labelled> 56, 24, 37, 12, 28, 69, 41, 44, 49, 40, 54, 34, 36, 46, …
$ ratio    <labelled> 3.6, 6.9, 6.2, 6.5, 8.9, 3.6, 4.8, 5.2, 3.6, 6.6, 4.5, 6…
$ glyhb    <labelled> 4.31, 4.44, 4.64, 4.63, 7.72, 4.81, 4.84, 3.94, 4.84, 5.…
$ location <fct> Buckingham, Buckingham, Buckingham, Buckingham, Buckingham, B…
$ age      <int> 46, 29, 58, 67, 64, 34, 30, 37, 45, 55, 60, 38, 27, 40, 36, 3…
$ gender   <fct> female, female, female, male, male, male, male, male, male, f…
$ height   <int> 62, 64, 61, 67, 68, 71, 69, 59, 69, 63, 65, 58, 60, 59, 69, 6…
$ weight   <int> 121, 218, 256, 119, 183, 190, 191, 170, 166, 202, 156, 195, 1…
$ frame    <fct> medium, large, large, large, medium, large, medium, medium, l…
$ bp.1s    <labelled> 118, 112, 190, 110, 138, 132, 161, NA, 160, 108, 130, 10…
$ bp.1d    <labelled> 59, 68, 92, 50, 80, 86, 112, NA, 80, 72, 90, 68, 80, NA,…
$ bp.2s    <labelled> NA, NA, 185, NA, NA, NA, 161, NA, 128, NA, 130, NA, NA, …
$ bp.2d    <labelled> NA, NA, 92, NA, NA, NA, 112, NA, 86, NA, 90, NA, NA, NA,…
$ waist    <int> 29, 46, 49, 33, 44, 36, 46, 34, 34, 45, 39, 42, 35, 37, 36, 3…
$ hip      <int> 38, 48, 57, 38, 41, 42, 49, 39, 40, 50, 45, 50, 41, 43, 40, 4…
$ time.ppn <labelled> 720, 360, 180, 480, 300, 195, 720, 1020, 300, 240, 300, …

Further:

A glycosylated hemoglobin value greater than 7.0% is usually taken as a positive diagnosis of diabetes.
We can calculate BMI as \(BMI = 703 \times (weight \; [lb] \; / (height \;[in])^2)\) and define obesity as \(BMI \ge 30\).
Alternatively, we can first convert pounds (lb) to kilograms (kg) by multiplying by 0.45 (a rounded value of 0.4536) and inches (in) to meters (m) by multiplying by 0.0254, and then calculate BMI as \(BMI = weight \; [kg] \; / (height \;[m])^2\).
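
The two formulas can be compared on a single subject. The sketch below uses the first row of the data (weight 121 lb, height 62 in). The helper names bmi_imperial and bmi_metric are introduced here for illustration only and are not part of any package, and the exact factor 0.45359237 kg/lb is assumed in the metric helper instead of the rounded 0.45.

```r
# Hypothetical helpers for illustration, not part of the faraway package
bmi_imperial <- function(weight_lb, height_in) {
  703 * weight_lb / height_in^2
}

bmi_metric <- function(weight_lb, height_in) {
  weight_kg <- weight_lb * 0.45359237  # exact pound-to-kilogram factor
  height_m  <- height_in * 0.0254      # exact inch-to-meter factor
  weight_kg / height_m^2
}

# First subject in the data: 121 lb, 62 in
bmi_a <- bmi_imperial(121, 62)
bmi_b <- bmi_metric(121, 62)
round(c(imperial = bmi_a, metric = bmi_b), 2)

# Same calculation with the rounded factor 0.45
bmi_c <- (121 * 0.45) / (62 * 0.0254)^2
round(bmi_c, 2)
```

With exact conversion factors the two formulas agree closely, since 703 is itself a rounded constant. Using 0.45 gives a slightly lower BMI, which can matter for subjects close to the obesity threshold.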

In R, we can add obesity and diabetes status (Yes/No) and display the first few values of every variable:


# conversion factors (pound2kg is rounded; the exact value is 0.45359237)
inch2m <- 2.54 / 100
pound2kg <- 0.45

# add BMI, obesity and diabetes variables
data_diabetes <- diabetes %>%
  # height and weight in metric units, rounded to 2 decimals
  mutate(height = round(height * inch2m, 2)) %>%
  # waist in meters (hip is left in inches)
  mutate(waist = waist * inch2m) %>%
  mutate(weight = round(weight * pound2kg, 2)) %>%
  mutate(BMI = round(weight / height^2, 2)) %>%
  # obese: BMI >= 30, using left-closed intervals [0, 30) and [30, Inf)
  mutate(obese = cut(BMI, breaks = c(0, 30, Inf), right = FALSE, labels = c("No", "Yes"))) %>%
  # diabetic: glycosylated hemoglobin > 7
  mutate(diabetic = factor(ifelse(glyhb > 7, "Yes", "No"), levels = c("No", "Yes")))
  
# preview data
glimpse(data_diabetes)

Rows: 403
Columns: 22
$ id       <labelled> 1000, 1001, 1002, 1003, 1005, 1008, 1011, 1015, 1016, 10…
$ chol     <labelled> 203, 165, 228, 78, 249, 248, 195, 227, 177, 263, 242, 21…
$ stab.glu <labelled> 82, 97, 92, 93, 90, 94, 92, 75, 87, 89, 82, 128, 75, 79,…
$ hdl      <labelled> 56, 24, 37, 12, 28, 69, 41, 44, 49, 40, 54, 34, 36, 46, …
$ ratio    <labelled> 3.6, 6.9, 6.2, 6.5, 8.9, 3.6, 4.8, 5.2, 3.6, 6.6, 4.5, 6…
$ glyhb    <labelled> 4.31, 4.44, 4.64, 4.63, 7.72, 4.81, 4.84, 3.94, 4.84, 5.…
$ location <fct> Buckingham, Buckingham, Buckingham, Buckingham, Buckingham, B…
$ age      <int> 46, 29, 58, 67, 64, 34, 30, 37, 45, 55, 60, 38, 27, 40, 36, 3…
$ gender   <fct> female, female, female, male, male, male, male, male, male, f…
$ height   <dbl> 1.57, 1.63, 1.55, 1.70, 1.73, 1.80, 1.75, 1.50, 1.75, 1.60, 1…
$ weight   <dbl> 54.45, 98.10, 115.20, 53.55, 82.35, 85.50, 85.95, 76.50, 74.7…
$ frame    <fct> medium, large, large, large, medium, large, medium, medium, l…
$ bp.1s    <labelled> 118, 112, 190, 110, 138, 132, 161, NA, 160, 108, 130, 10…
$ bp.1d    <labelled> 59, 68, 92, 50, 80, 86, 112, NA, 80, 72, 90, 68, 80, NA,…
$ bp.2s    <labelled> NA, NA, 185, NA, NA, NA, 161, NA, 128, NA, 130, NA, NA, …
$ bp.2d    <labelled> NA, NA, 92, NA, NA, NA, 112, NA, 86, NA, 90, NA, NA, NA,…
$ waist    <dbl> 0.7366, 1.1684, 1.2446, 0.8382, 1.1176, 0.9144, 1.1684, 0.863…
$ hip      <int> 38, 48, 57, 38, 41, 42, 49, 39, 40, 50, 45, 50, 41, 43, 40, 4…
$ time.ppn <labelled> 720, 360, 180, 480, 300, 195, 720, 1020, 300, 240, 300, …
$ BMI      <dbl> 22.09, 36.92, 47.95, 18.53, 27.52, 26.39, 28.07, 34.00, 24.39…
$ obese    <fct> No, Yes, Yes, No, No, No, No, Yes, No, Yes, No, Yes, Yes, Yes…
$ diabetic <fct> No, No, No, No, Yes, No, No, No, No, No, No, No, No, No, No, …
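
The threshold logic can be checked on a few hand-picked values before relying on it for the full data. The values below are made up for illustration and are not taken from the study. The sketch assumes obesity is defined as BMI of at least 30 and diabetes as glycosylated hemoglobin above 7, as stated earlier, and uses base R only.

```r
# Made-up values chosen to sit on and around the thresholds
bmi_toy   <- c(29.95, 30, 30.05)
glyhb_toy <- c(6.9, 7, 7.1, NA)

# Left-closed intervals [0, 30) and [30, Inf) encode BMI >= 30 exactly
obese_toy <- cut(bmi_toy, breaks = c(0, 30, Inf), right = FALSE,
                 labels = c("No", "Yes"))

# Strictly greater than 7; a missing measurement stays missing
diabetic_toy <- factor(ifelse(glyhb_toy > 7, "Yes", "No"),
                       levels = c("No", "Yes"))

data.frame(BMI = bmi_toy, obese = obese_toy)
data.frame(glyhb = glyhb_toy, diabetic = diabetic_toy)
```

A missing glyhb value propagates to NA in the derived status, so counts of diabetic subjects are best read alongside the number of missing measurements.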

We can now use descriptive statistics to explore the diabetes data set further.