Lesson
The time allotted for teaching and exercises across lessons one through eight in this episode totals 155 minutes. This does not include time for installing OpenRefine, which could take an extra 10-30 minutes depending on how many different platforms are in use and how many computers need OpenRefine installed.
Setup
- The instructions for installing OpenRefine are in a separate file (setup).
- If Internet Explorer is the default browser for participants, OpenRefine may have trouble opening. In that case, the OpenRefine URL can be copied and pasted into Google Chrome or Firefox. Alternatively, participants can be encouraged in advance of the workshop to set one of these two browsers as their default.
The dataset used
- A link to the dataset used in this lesson, including a description, can be found on the setup page.
- It will need to be downloaded to the local machine before it can be loaded into OpenRefine.
The Lessons
- Explains what OpenRefine is, what it is used for and where to get help.
- Covers the creation of an OpenRefine project using our dataset.
- The file is in CSV format and has a single header row.
- Facets and clustering are introduced, and there is a discussion of the different clustering algorithms and how they may produce different results.
- Splitting columns is covered, as is undo/redo.
- Using Include and Exclude from a facet is covered and the difference between faceting and filtering is explained.
- The various sort options for single or multiple columns are covered.
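Instructors who want to explain why different clustering methods may produce different results could use a small sketch like the one below. It is a simplified illustration of the key-collision idea (values that reduce to the same key are grouped together), not OpenRefine's actual implementation. The function names and the sample values are hypothetical and do not come from the lesson dataset.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified token key: trim, lowercase, drop accents and punctuation,
    then sort the unique whitespace-separated tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop the combining marks left behind by the decomposition.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def ngram_fingerprint(value, n=1):
    """Simplified n-gram key: keep only letters and digits, then sort the
    unique n-character substrings."""
    text = unicodedata.normalize("NFKD", value.lower())
    text = "".join(ch for ch in text if ch.isalnum())
    grams = {text[i:i + n] for i in range(len(text) - n + 1)}
    return "".join(sorted(grams))


def cluster(values, key=fingerprint):
    """Group values sharing a key; keep only groups with more than one value."""
    groups = defaultdict(list)
    for value in values:
        groups[key(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample values, chosen only to show the two keys disagreeing.
names = ["Nairobi", "nairobi ", "NAIROBI.", "Nairboi"]
print(cluster(names))                         # token key
print(cluster(names, key=ngram_fingerprint))  # 1-gram key
```

With the token key, the misspelling "Nairboi" stays outside the cluster, whereas the 1-gram key ignores character order and so groups it with the other three values. This is one concrete way to show participants that the choice of method changes which values are proposed for merging.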
Examining Numbers in OpenRefine
- Explains that everything is a string until you change it.
- Explains how to change the data type and the additional faceting ability it provides.
- Explains how the actions performed within a project can be copied to an external file, and how that same file is then used to re-apply the changes.
- Covers the overall format of a project ‘file’ and how the components can be viewed.
- This may require installing additional software on Windows machines (e.g. 7-zip), as the built-in unzipping facility does not work with tar.gz files.
- Covers how to export a file that only contains a subset of the data.
- Lists various OpenRefine resources available online (taken from the Ecology lessons).
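Regarding the note above about tar.gz project files on Windows: if Python happens to be available on the workshop machines, its standard `tarfile` module can list the components of such an archive without additional software. This is a minimal sketch and not part of the lesson itself. The helper name `list_archive` and the file name in the usage comment are hypothetical.

```python
import tarfile


def list_archive(path):
    """Return the member names of a gzip-compressed tar archive."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()


# Usage (hypothetical file name; substitute the exported project archive):
# for name in list_archive("my-project.openrefine.tar.gz"):
#     print(name)
```

Listing the members is enough to show participants how the project is organised. Extracting the contents is a separate step and is not shown here.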