Instructor Notes

Lesson

The time allotted for teaching and exercises in lessons one through eight totals 155 minutes. This does not include time for installing OpenRefine, which could take an extra 10-30 minutes depending on how many different platforms are in use and how many computers need OpenRefine installed.

Setup

  • There is a separate file for the setup instructions for installing OpenRefine (setup).
  • If Internet Explorer is the default browser for participants, OpenRefine may have trouble opening. In that case, the URL can be copied and pasted into Google Chrome or Firefox. Alternatively, participants can be encouraged in advance of the workshop to set one of these two browsers as their default.

The dataset used

  • A link to the dataset used in this lesson, including a description, can be found on the setup page.
  • It will need to be downloaded to the local machine before it can be loaded into OpenRefine.

The Lessons

Introduction

  • Explains what OpenRefine is, what it is used for and where to get help.

Working with OpenRefine

  • Covers the creation of an OpenRefine project using our dataset.
  • The file is in CSV format and has a single header row.
  • Facets and clustering are introduced, and there is a discussion of the different clustering algorithms and how they may produce different results.
  • Splitting columns is covered, as is undo/redo.
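
When discussing why clustering algorithms can group values differently, a small illustration of key-collision clustering may help. The sketch below is a simplified approximation of a fingerprint-style key (trim, lowercase, strip punctuation, sort unique tokens) and is not OpenRefine's own implementation. The function names and the sample values are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Build a simplified fingerprint-style key for a text value."""
    # Trim, lowercase, and fold accented characters to plain ASCII.
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, then sort the unique whitespace-separated tokens.
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(sorted(set(text.split())))

def cluster(values):
    """Group values that share a key; keep only groups with variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Hypothetical sample values, not taken from the lesson dataset.
clusters = cluster(["Ruaha", "ruaha ", "Ruaha.", "Chirodzo"])
```

A different keying function (for example, one based on phonetic similarity) would place the same values into different groups, which is the point the discussion is meant to draw out.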

Filtering and Sorting

  • Using Include and Exclude from a facet is covered, and the difference between faceting and filtering is explained.
  • The various sort options for single or multiple columns are covered.

Examining Numbers in OpenRefine

  • Explains that OpenRefine treats every value as a string (text) until you change its data type.
  • Explains how to change the data type and the additional faceting options this provides.
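
To show learners why the data type matters, it can help to contrast text and numeric behaviour outside OpenRefine. The following is a minimal Python sketch of the general idea, not of OpenRefine's internals; the cell values and the `to_number` helper are hypothetical.

```python
# Hypothetical cell values as they might arrive from a CSV import: all text.
cells = ["10", "9", "100", "n/a"]

def to_number(cell):
    """Convert a text cell to a float, or None when it is not numeric."""
    try:
        return float(cell)
    except ValueError:
        return None

# Sorted as text, values are compared character by character.
text_sorted = sorted(c for c in cells if to_number(c) is not None)

# Sorted as numbers, values are compared by magnitude.
numbers = sorted(n for n in map(to_number, cells) if n is not None)

# A numeric range selection only makes sense once values are numbers.
in_range = [n for n in numbers if 5 <= n <= 50]
```

Here the text ordering places "100" before "9", while the numeric ordering does not, and the range selection is only possible after conversion.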

Using scripts

  • Explains how the actions performed within a project can be copied to an external file, and how that same file can then be used to re-apply the changes.
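
If learners ask what the external file contains, a short script can summarise it. This sketch assumes the file holds the JSON text that OpenRefine produces when the operation history is extracted. The sample below is hand-written and abbreviated for illustration, the column name is hypothetical, and a real export carries more fields per step.

```python
import json

# Hand-written, abbreviated sample for illustration only; the column name
# "village" is hypothetical and a real export has more fields per step.
SAMPLE_HISTORY = """
[
  {"op": "core/text-transform",
   "description": "Text transform on cells in column village",
   "columnName": "village",
   "expression": "grel:value.trim()"}
]
"""

def summarise_operations(history_json):
    """Return one 'op: description' line per recorded step."""
    steps = json.loads(history_json)
    return [
        f"{step.get('op', '?')}: {step.get('description', '')}"
        for step in steps
    ]

summary = summarise_operations(SAMPLE_HISTORY)
```

Because the history is plain text, it can be kept alongside the data or under version control and pasted back into another project.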

Saving results

  • Covers the overall format of a project ‘file’ and how the components can be viewed.
  • This may require installing additional software on Windows machines (e.g. 7-Zip), as the built-in unzipping facility does not work with tar.gz files.
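
Where installing extra software is not possible, Python's standard `tarfile` module offers another way to view the components of a tar.gz archive on any platform. This is a minimal sketch; the function name and the file name in the example call are hypothetical.

```python
import tarfile

def list_archive_members(archive_path):
    """Return the names of the files stored inside a .tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return archive.getnames()

# Example call (the file name is hypothetical):
# print(list_archive_members("my-project.openrefine.tar.gz"))
```

This only lists the contents; it does not extract or modify the project file.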

Select subsets

  • Covers how to export a file that only contains a subset of the data.

Other resources in OpenRefine

  • A list of various OpenRefine resources available online (taken from the Ecology lessons).