Overview
Teaching: 10 min
Exercises: 10 minQuestions
How can we select only a subset of our data to work with?
How can we sort our data?
Objectives
Filter to a subset of rows by text filter or include/exclude.
Sort table by a column.
Sort by multiple columns.
Lesson
Filtering
There are many entries in our data table; we can filter it to work on a subset of the data:
- Click the down arrow next to
disease outcome
and selectText filter
. Adisease outcome
filter will appear on the left margin. - Type in
recovered
. There are 33 matching rows of the original 91 rows (and these rows are selected for the subsequent steps). - At the top, change the view to
Show
50 rows. This way you will see all of the matching rows.
Exercise 2.1
- How can you filter the data to instead only include persons with state
dead
?- How would you further restrict this to only include females or males?
Solution
- In the
disease outcome
filter, typedead
instead of recovered.- To restrict to only one of the sexes, in the
sex
column, run anotherText Filter
and typefemale
. Notice that if you typemale
we also capture all expressions containingmale
, such asfemale
. To filter onlymale
, run the filter forfemale
and then use theinvert
function.
Remove the filters before moving on so that you again have the full dataset of 91 records.
Excluding entries
In addition to the simple text filtering we used above, another way to narrow our filter is to include
and/or exclude
entries in a facet. If you still have your facet for disease outcome
, you can use it. If not, use the drop-down menu > Facet
> Text facet
to create a new facet. If you hover over the name in the facet window, you will see the include
or exclude
options.
Faceting and filtering look very similar. A good distinction is that faceting gives you an overview description of all of the data that is currently selected, while filtering allows you to select a subset of your data for analysis.
Exercise 2.2
Use
include/exclude
in aText Facet
to make the same combination ofdisease outcome
andsex
as in the previous exercise.Solution
- In the facet (left margin), hover on
recovered
. Notice that there are entries to the right foredit
andinclude
.- Click
include
. This will explicitly include those recovered, and exclude others that are not explicitly included. Notice that the option has now been changed toexclude
.- Another way to include entries is to click the name directly.
- Do the same thing with
sex
to limit your selection to a combination of the two states.- Click
include
andexclude
in both facets and notice how the selection in the data changes.
Remove the selections before moving on, so that you again have the full dataset of 91 records, either by closing all facets or pressing the Reset All
button in the top of the left margin.
Sort
You can sort the data in a column by using the drop-down menu available in that column.
You can sort by text
, numbers
, dates
or booleans
(TRUE
or FALSE
values). You can also specify in which order to put Blanks
and Errors
in the sorted results.
If this is your first time sorting this table, then the drop-down menu for the selected column shows Sort...
. Select what you would like to sort by (such as numbers
). Additional options will then appear for you to fine-tune your sorting.
Exercise 2.3
Sort all samples by host age. How can you ensure that ages are in numerical order?
Solution
- Press the down arrow in the
age
column, selectSort...
- In the pop-up that appears, tick
numbers
and selectsmallest first
.
If you try to re-sort a column that you have already used, the drop-down menu changes slightly, to > Sort
without the ...
, to remind you that you have already sorted this column. It will give you additional options:
-
Sort
>Sort...
- This option enables you to modify your original sort. -
Sort
>Reverse
- This option allows you to reverse the order of the sort. -
Sort
>Remove sort
- This option allows you to undo your sort.
Sorting by multiple columns
You can sort by multiple columns by performing sort on additional columns. The sort will depend on the order in which you select columns to sort. To restart the sorting process with a particular column, check the sort by this column alone
box in the Sort
pop-up menu.
If you go back to one of the already sorted columns and select > Sort
> Remove sort
, that column is removed from your multiple sort. If this is the only column sorted, the data reverts to its original order.