Finding ontologies

Licensed under CC-BY 4.0 and OSI-approved licenses, see licensing.

Overview

Teaching: 10 min
Exercises: 20 min
Questions
  • How can I find suitable ontologies to describe my data?

Objectives
  • To be acquainted with some examples of how to find ontologies and ontology terms

As you noticed in the previous exercise, the ENA checklist did not specify any particular standard to use when assigning values to the different fields. However, the more precisely you describe the different metadata aspects of your data, the more FAIR your data becomes. To do this, use controlled vocabularies or ontologies.

As we have said earlier, there are hundreds of metadata standards, controlled vocabularies and ontologies in the Life Sciences alone. So how do you find the ones that are appropriate for your data? Fortunately, there are tools and resources to help you with this.

FAIRsharing.org is a resource that aims at collecting and connecting existing standards and databases. Use it to find recommended standards for different research domains and databases.

The European Bioinformatics Institute (EMBL-EBI) makes available a set of resources for ontologies, e.g.:

Exercise: Find suitable ontologies for your data

To be more specific about the terms used for some of the fields, try to find a suitable ontology and the terms to use for the values that will be in your data file (samples_metadata_lesson.csv), for these fields:

  • illness symptoms, using OLS
  • isolation source host-associated, using FAIRsharing.org

Add the name of the ontology and the terms you have selected to the data dictionary.

You don’t have to stick to the tools specified. Try different ones if you want to. As you will probably notice, there is no perfect way that always finds what you want. It will almost always involve a fair bit of trial and error.

Solution

  • For each field you will have to look into the data file to see what values are there now
  • Then try to find an ontology and the appropriate terms in that ontology
  • Add the name of the ontology and the terms you have chosen to the Allowed values column

illness symptoms

  • Go to OLS, search for one of the symptoms in the samples_metadata_lesson.csv file, e.g. fever
  • Choose one of the search results that seems appropriate, e.g. from the NCI Thesaurus OBO Edition - NCIT
  • The term for fever in NCIT is Fever. Also note that it has an identifier NCIT:C3038
  • Find terms for the other symptoms in NCIT, by either browsing the tree (click on the Show siblings button in the ontology tree to expand the contents of the tree), or by searching
  • See below for the NCIT terms for the symptoms
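Identifiers such as NCIT:C3038 consist of an ontology prefix and a local ID. As a minimal sketch, the Python below splits such an identifier and builds a web address for the term. It assumes the ontology follows the OBO-style address pattern (base address, then prefix, underscore, local ID); the function names `split_curie` and `obo_purl` are hypothetical and not part of OLS or any library. Check the resulting address against the term page in OLS before relying on it.

```python
# Assumption: the ontology publishes its terms under the OBO-style pattern
# http://purl.obolibrary.org/obo/<PREFIX>_<LOCAL_ID>. Not every ontology does.
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def split_curie(curie: str) -> tuple[str, str]:
    """Split an identifier like 'NCIT:C3038' into (prefix, local_id)."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefix:id identifier: {curie!r}")
    return prefix, local_id


def obo_purl(curie: str) -> str:
    """Build an OBO-style address for the term (hypothetical helper)."""
    prefix, local_id = split_curie(curie)
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


if __name__ == "__main__":
    # Identifiers taken from this lesson's solution
    for term in ["NCIT:C3038", "FMA:54880"]:
        print(term, "->", obo_purl(term))
```

Storing the identifier alongside the term name in the data dictionary makes it possible to look the term up unambiguously later, even if its label changes.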

isolation source host-associated

  • Go to FAIRsharing.org
  • Click STANDARDS at the top of the page
  • Search for “anatomy” on the left-hand side of the page
  • Find a suitable ontology, e.g. Foundational Model of Anatomy - FMA, and go to the information page
  • Scroll down to the ADDITIONAL INFORMATION section at the bottom, and click the link to BioPortal
  • Select the classes tab. Look through the tree of terms, or search for a term in the “Jump to” field
  • So FMA would be a suitable ontology. See below for the FMA terms for the tissues

The data dictionary could now look something like this:

| Current variable name | ENA Variable name | Measurement unit | Allowed values | Definition | Description |
|---|---|---|---|---|---|
| sample id | | | | | |
| patient id | host subject id | | | | |
| sex | host sex | | male, female, not collected | Sex of the individual | |
| date | collection date | | format: YYYY-MM-DD, >=proj_start_date & <=today | Date of sampling | |
| location | geographic location (country and/or sea) | | | | |
| location | geographic location (region and locality) | | , , ... | | |
| age | host age | years | | Age of individual at the time of sampling | |
| health state | host health state | | diseased, healthy, not applicable, not collected, not provided, restricted access | Health state of individual at time of sampling | |
| symptoms | illness symptoms | | NCIT ontology: Fever (NCIT:C3038), Sore Throat (NCIT:C50747), Fatigue (NCIT:C3036), Ageusia (NCIT:C116374), not applicable | Symptoms experienced in connection with illness | |
| disease outcome | host disease outcome | | recovered, dead | Final outcome of disease | |
| tissue | isolation source host-associated | | FMA ontology: Laryngopharynx (FMA:54880), Nasopharynx (FMA:54878), Lung (FMA:7195) | Tissue sampled | |
| isolate | isolate | | | Individual isolate from which the sample was obtained | |
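Once the allowed values are written down, they can be used to check the data file. The Python below is a rough sketch of such a check, not a finished validator. It assumes that the file has columns named `symptoms` and `tissue` (the current variable names), that cells hold term labels rather than identifiers, and that multiple values in one cell are separated by semicolons. The sample rows are made up for illustration and are not taken from samples_metadata_lesson.csv.

```python
import csv
import io

# Allowed term labels copied from the data dictionary in this lesson.
ALLOWED = {
    "symptoms": {"Fever", "Sore Throat", "Fatigue", "Ageusia", "not applicable"},
    "tissue": {"Laryngopharynx", "Nasopharynx", "Lung"},
}


def find_invalid_values(csv_text, allowed=ALLOWED, sep=";"):
    """Return (line_number, column, value) for each value not in the allowed set."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so the first data row is line 2.
    for line_number, row in enumerate(reader, start=2):
        for column, allowed_values in allowed.items():
            cell = row.get(column) or ""
            for value in cell.split(sep):
                value = value.strip()
                if value and value not in allowed_values:
                    problems.append((line_number, column, value))
    return problems


# Made-up sample data (assumed column names and separator).
SAMPLE = """sample id,symptoms,tissue
S1,Fever;Fatigue,Lung
S2,fever,Nasopharynx
S3,not applicable,Throat
"""

if __name__ == "__main__":
    for problem in find_invalid_values(SAMPLE):
        print(problem)
```

The comparison is deliberately case-sensitive, so `fever` is reported even though `Fever` is allowed. This matches the idea of a controlled vocabulary, where values should match the ontology term exactly.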

Below are lists of some commonly used ontologies. Please note that they are in no way exhaustive, nor do they form a “standard” list.

“Upper ontologies”

“Domain ontologies”

“Other ontologies”