Projects

RaukR 2025 • R Beyond the Basics

Datasets for projects
Author

Guilherme Dias

Published

08-Jun-2025

1 Datasets

Hands-on analysis of real data is hands down the best way to learn R programming. This page lists some datasets that you can use to practice what you have learned in this course. For each dataset, a brief description is provided.

Take a chance

The projects might be a good chance to explore parts of the course that didn’t necessarily “click” for you. So instead of going for something familiar, maybe take a chance and try to venture into the topics that challenged you the most.


1.1 Palmer penguins 🐧

penguins <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/heplots/peng.csv", header = TRUE, sep = ",")
str(penguins)

1.2 Drinking habits 🍷

# this will download the csv file directly from the web
drinks <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/stevedata/nesarc_drinkspd.csv", header = TRUE, sep = ",")
str(drinks)

1.3 Car crashes 🚗

crashes <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/DAAG/nassCDS.csv", header = TRUE, sep = ",")
str(crashes)

1.4 Gapminder health and wealth 📈

gapminder <- readr::read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/dslabs/gapminder.csv")
str(gapminder)

1.5 StackOverflow survey 🖥️

stackoverflow <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/modeldata/stackoverflow.csv", header = TRUE, sep = ",")
str(stackoverflow)

1.6 Doctor visits 🤒

doctor <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/AER/DoctorVisits.csv", header = TRUE, sep = ",")
str(doctor)

1.7 Video Game Sales 🎮

# this will download the zip file to your working directory (mode = "wb" keeps the binary file intact)
download.file(url = "https://maven-datasets.s3.amazonaws.com/Video+Game+Sales/Video+Game+Sales.zip", destfile = "video_game_sales.zip", mode = "wb")
# this will read the csv file directly from inside the zip archive
videogames <- read.table(unz("video_game_sales.zip", filename = "vgchartz-2024.csv"), header = TRUE, sep = ",", quote = "\"", fill = TRUE)
str(videogames)

1.8 LEGO Sets 🏗️

# this will download the zip file to your working directory (mode = "wb" keeps the binary file intact)
download.file(url = "https://maven-datasets.s3.amazonaws.com/LEGO+Sets/LEGO+Sets.zip", destfile = "lego.csv.zip", mode = "wb")
# this will read the csv file directly from inside the zip archive
lego <- read.table(unz("lego.csv.zip", filename = "lego_sets.csv"), header = TRUE, sep = ",", quote = "\"", fill = TRUE)
str(lego)

1.9 Shark attacks 🦈

# this will download the zip file to your working directory (mode = "wb" keeps the binary file intact)
download.file(url = "https://maven-datasets.s3.amazonaws.com/Shark+Attacks/attacks.csv.zip", destfile = "attacks.csv.zip", mode = "wb")
# this will read the csv file directly from inside the zip archive
sharks <- read.table(unz("attacks.csv.zip", filename = "attacks.csv"), header = TRUE, sep = ",", quote = "\"", fill = TRUE)
str(sharks)
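
The file name given to `unz()` has to match an entry inside the zip archive, which is hard to guess before looking. The sketch below shows one way to check, using base R's `unzip(list = TRUE)`. The helper name `read_csv_from_zip` is made up for this example, and it assumes the archive has already been downloaded.

```r
# Hypothetical helper: read one CSV entry out of a local zip archive.
# If `entry` is not given, the first file ending in .csv is used.
read_csv_from_zip <- function(zipfile, entry = NULL) {
  # list the archive contents without extracting anything
  contents <- unzip(zipfile, list = TRUE)
  if (is.null(entry)) {
    csv_entries <- grep("\\.csv$", contents$Name, value = TRUE, ignore.case = TRUE)
    if (length(csv_entries) == 0) stop("no .csv file found in ", zipfile)
    entry <- csv_entries[1]
  }
  # unz() opens a connection to a single file inside the archive
  read.csv(unz(zipfile, filename = entry))
}

# Example, assuming "attacks.csv.zip" is in the working directory:
# unzip("attacks.csv.zip", list = TRUE)$Name
# sharks <- read_csv_from_zip("attacks.csv.zip")
```

Note that `read.csv()` differs slightly from `read.table()` in its defaults (for example, it already sets `header = TRUE`, `sep = ","` and `fill = TRUE`).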

2 APIs

Most real-world data-rich services do not provide ready-to-download files like the ones above. Instead, data retrieval usually happens through an API, or Application Programming Interface. An API is a software layer between your code or app and a service or database that lets you retrieve data programmatically. API integration gives you access to large volumes of real-time or near-real-time data, such as stock prices or public social media posts.

R has plenty of support for working with APIs, very often through HTTP requests (httr package). Each API works differently, so you will need to read some documentation before you can interact with it.
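
As a rough sketch of what such a request can look like with httr: a request is a base URL plus query parameters, and the response often arrives as JSON text that still needs parsing. The helper name `get_json` and the `https://api.example.com` address are placeholders made up for this example, and the jsonlite package is assumed to be installed for the parsing step.

```r
# Hypothetical helper: send a GET request and parse a JSON response.
get_json <- function(url, query = list()) {
  # httr appends the query list to the URL as ?name=value&...
  response <- httr::GET(url, query = query)
  # turn HTTP errors (4xx, 5xx) into R errors
  httr::stop_for_status(response)
  # get the body as text, then parse it
  body <- httr::content(response, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(body)
}

# The parsing step on its own, using a small JSON string
parsed <- jsonlite::fromJSON('{"name": "example", "values": [1, 2, 3]}')
str(parsed)

# Example call (placeholder URL, not a real service):
# result <- get_json("https://api.example.com/v1/items", query = list(limit = 10))
```

Not every API returns JSON, so the parsing step may need to change depending on the service.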

Below are some public APIs (free, with rate limits) with lots of data that you can explore. But remember APIs are everywhere, so feel free to find them elsewhere as well.

2.1 The World Bank 🌎

The World Bank has historical data on economic and social development, environment, infrastructure, and governance for many countries around the world, sometimes including regional data (state and city level).

Read about the indicators API: https://datahelpdesk.worldbank.org/knowledgebase/articles/889392-about-the-indicators-api-documentation

Documentation about the call structure: https://datahelpdesk.worldbank.org/knowledgebase/articles/898581
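
A minimal sketch of a call to the indicators API, assuming the `country/.../indicator/...` URL pattern described in the documentation above. The helper name `wb_url` is made up for this example, and the country code `SE` (Sweden) and indicator `SP.POP.TOTL` (total population) are just illustrative choices.

```r
# Hypothetical helper: build an indicators API URL for one country and indicator
wb_url <- function(country, indicator, years = NULL) {
  url <- paste0("https://api.worldbank.org/v2/country/", country,
                "/indicator/", indicator, "?format=json&per_page=100")
  if (!is.null(years)) {
    # a range of years is written as first:last
    url <- paste0(url, "&date=", min(years), ":", max(years))
  }
  url
}

url <- wb_url("SE", "SP.POP.TOTL", years = 2010:2020)
print(url)

# Fetching needs an internet connection, so it is left commented out here.
# The response should be a list of two elements: paging metadata, then the data.
# response <- jsonlite::fromJSON(url, flatten = TRUE)
# population <- response[[2]]
# str(population)
```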

2.2 NASA 🚀

NASA aggregates data from many of its research projects and makes it available through its API portal.

The API key is free and signup is easy. You can browse their data sets here: https://api.nasa.gov/
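
One way to handle the key is to keep it out of your scripts and read it from an environment variable. The sketch below does this for the Astronomy Picture of the Day endpoint. The variable name `NASA_API_KEY` and the helper names `nasa_key` and `get_apod` are choices made for this example; `DEMO_KEY` is NASA's public demo key, which works with stricter rate limits.

```r
# Read the key from an environment variable so it stays out of scripts.
# Falls back to NASA's public demo key if the variable is not set.
nasa_key <- function() {
  key <- Sys.getenv("NASA_API_KEY")
  if (nzchar(key)) key else "DEMO_KEY"
}

# Hypothetical helper: fetch the Astronomy Picture of the Day metadata
get_apod <- function(date = Sys.Date()) {
  response <- httr::GET(
    "https://api.nasa.gov/planetary/apod",
    query = list(api_key = nasa_key(), date = format(date, "%Y-%m-%d"))
  )
  httr::stop_for_status(response)
  # parse the JSON body into an R list
  httr::content(response, as = "parsed", type = "application/json")
}

# Needs an internet connection:
# apod <- get_apod(as.Date("2025-06-08"))
# apod$title
```

The variable can be set for the current session with `Sys.setenv(NASA_API_KEY = "your-key")`, or permanently in your `.Renviron` file.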

2.3 European Central Bank 🏦

This API aggregates monetary data for the EU. It’s the same data displayed in their data portal https://data.ecb.europa.eu/.

Read more here: https://data.ecb.europa.eu/help/api/overview
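
A minimal sketch of a request, assuming the `service/data/<dataflow>/<series key>` URL pattern and the `format=csvdata` option from the API documentation linked above. The helper name `ecb_url` is made up for this example, and the series key used here (daily US dollar to euro reference rate) is an assumption that should be checked against the data portal.

```r
# Hypothetical helper: build a data URL from a dataflow id and a series key
ecb_url <- function(flow, key, start = NULL) {
  url <- paste0("https://data-api.ecb.europa.eu/service/data/", flow, "/", key,
                "?format=csvdata")
  # optionally restrict the series to observations from `start` onwards
  if (!is.null(start)) url <- paste0(url, "&startPeriod=", start)
  url
}

# Daily US dollar / euro reference rate (series key assumed, see above)
url <- ecb_url("EXR", "D.USD.EUR.SP00.A", start = "2025-01-01")
print(url)

# Needs an internet connection. The column names below are assumed:
# rates <- read.csv(url)
# head(rates[, c("TIME_PERIOD", "OBS_VALUE")])
```

Because the response is plain CSV, `read.csv()` can read it straight from the URL without any JSON parsing.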

2.4 Pokemon 🐛

With this API you can retrieve info for each Pokemon. Completely free and no authentication required.

https://pokeapi.co/
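
Since no key is needed, a request is just a URL. The sketch below assumes the `api/v2/pokemon/<name or id>` endpoint and uses jsonlite to read the JSON response; the helper name `pokemon_url` is made up for this example.

```r
# Hypothetical helper: URL for one Pokemon, by name or numeric id
pokemon_url <- function(name_or_id) {
  # names are lowercased here before being added to the URL
  paste0("https://pokeapi.co/api/v2/pokemon/", tolower(name_or_id))
}

url <- pokemon_url("Pikachu")
print(url)

# Needs an internet connection:
# pikachu <- jsonlite::fromJSON(url)
# pikachu$height
# pikachu$weight
# pikachu$types$type$name
```

The response is a deeply nested list, so `str(pikachu, max.level = 1)` is a handy way to see what is available before digging in.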

3 Visualization

Visualization can make datasets easier to understand. For inspiration, look at the amazing visualizations made by Cédric Scherer using the tidyverse: https://www.behance.net/cedscherer.