Overview
Teaching: 10 min
Exercises: 5 min
Questions
How can we save and export our cleaned data from OpenRefine?
Objectives
Save an OpenRefine project.
Export cleaned data from an OpenRefine project.
Lesson
Saving and Exporting a Project
In OpenRefine, you can save or export a project. Either way, you keep the data together with a record of all the cleaning and data transformation steps you’ve applied, stored locally on your computer. Once you’ve saved a project, you can open it again and pick up exactly where you left off.
Saving
By default, OpenRefine saves your project continuously. If you close OpenRefine and open it up again, you’ll see a list of your projects. You can click any of them to re-open it.
Exporting
You can also export a project. This is helpful, for instance, if you want to send your raw data and cleaning steps to a collaborator, or share this information as a supplement to a publication.
- Click the Export button in the top right and select OpenRefine project archive to file.
- A tar.gz file will download to your default Download directory. Depending on your browser, you may have to confirm that you want to save the file. The tar.gz extension tells you that this is a compressed file. The downloaded tar.gz file is actually a folder of files bundled and compressed into a single archive. Linux and Mac machines have software installed that automatically extracts this type of file when you double-click on it. On Windows-based machines, you may have to install a utility like ‘7-zip’ in order to extract the file and see the files in the folder (this extraction step is optional).
Note: If you extracted the compressed tar.gz file, look at the files in this folder. What files are here? What information do you think these files contain?
Solution
You should see:
- a history folder which contains a collection of zip files. Each of these files itself contains a change.txt file. These change.txt files are the records of each individual transformation that you did to your data.
- a data.zip file. When extracted, this zip file includes a file called data.txt which is a copy of your raw data.
You may also see other files.
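If you would rather inspect the archive without installing an extraction utility, the short Python sketch below lists what a tar.gz file contains, using only the standard library. The file name `my-project.openrefine.tar.gz` and the function name `list_archive_members` are hypothetical placeholders chosen for illustration; substitute the name of the file you actually downloaded.

```python
import os
import tarfile


def list_archive_members(archive_path):
    """Return the names of all entries in a gzip-compressed tar archive."""
    # "r:gz" opens the file for reading with gzip decompression.
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return archive.getnames()


if __name__ == "__main__":
    # Hypothetical file name: replace it with the archive you downloaded.
    archive_file = "my-project.openrefine.tar.gz"
    if os.path.exists(archive_file):
        for name in list_archive_members(archive_file):
            print(name)
    else:
        print("Archive not found:", archive_file)
```

Listing the entries does not extract or change anything, so it is a safe way to check what is in an archive before sharing it with a collaborator.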
Importing
You can import an existing project into OpenRefine by clicking Open... in the upper right > Import Project and selecting the tar.gz project file (the compressed file). This project will include all the raw data and cleaning steps from the original project.
Exporting Cleaned Data
You can also export just your cleaned data, rather than the entire project.
- Click Export in the top right and select the file type you want to export the data in. In this case, we will choose Comma-separated value (csv).
- The file will be exported to your default Download directory. That file can then be opened in a spreadsheet program or imported into programs like RStudio, which we’ll discuss later in our workshop.
Remember from our lesson on data organisation practices that using widely supported, non-proprietary file formats like tsv or csv improves your and others’ ability to use your data.
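As one illustration of how widely supported csv is, the sketch below reads an exported file with Python's built-in `csv` module, with no extra software needed. The file name `cleaned-data.csv` and the function name `read_exported_rows` are hypothetical, and the sketch assumes the export is UTF-8 encoded with column names in the first row.

```python
import csv
import os


def read_exported_rows(csv_path):
    """Read a csv file into a list of dictionaries, one per data row."""
    # Assumption: the file is UTF-8; "utf-8-sig" also tolerates a leading
    # byte-order mark, which some programs add when saving csv files.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Hypothetical file name: replace it with your exported file.
    export_file = "cleaned-data.csv"
    if os.path.exists(export_file):
        rows = read_exported_rows(export_file)
        print(len(rows), "rows read")
        if rows:
            print("Columns:", list(rows[0].keys()))
    else:
        print("File not found:", export_file)
```

Each row comes back as a dictionary keyed by column name, so a quick check of the row count and the column names is an easy way to confirm that the export contains what you expect.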