February 12, 2016

Ontology term extraction using MIREOT

  1. Introduction
  2. Terms
  3. Tools
  4. Steps
  5. References

Some time ago, when I worked on an ontology for Infective Endocarditis, I put together a step-by-step process for extracting a term from an ontology along with its hierarchy, following the Minimum Information to Reference an External Ontology Term (MIREOT) guideline. My old blog entry can be found here, but here are the steps again.

Introduction

When a term needs to be represented in an ontology, it is quite likely that it has already been defined somewhere. Just as in software development, ontologists follow the principle of not reinventing the wheel, unless of course there are proprietary or licensing issues.

Terms from one ontology can be added to another ontology in one of several ways (see my post Reuse, don't reinvent! for more on this):

  1. Single term only
  2. Single term along with its hierarchy
  3. Import a logical segment as a module
  4. Import the entire ontology

The important thing is reuse: the same term URIs are used instead of minting new ones. Each approach has its pros and cons. The focus here is on #2, extracting a term along with its hierarchy.

We need two things before jumping into the term hierarchy extraction process: (1) a list of terms, and (2) a tool that facilitates the extraction.

Terms

For this exercise, I will use a list of terms relevant for Infective Endocarditis. Some examples are below, and a longer list is in this spreadsheet. The terms are to be extracted from the NCBITaxon ontology.

URI                                               Name
http://purl.obolibrary.org/obo/NCBITaxon_78535    Streptococcus viridans
http://purl.obolibrary.org/obo/NCBITaxon_1305     Streptococcus sanguis
http://purl.obolibrary.org/obo/NCBITaxon_545774   Streptococcus bovis
http://purl.obolibrary.org/obo/NCBITaxon_1309     Streptococcus mutans
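
If the term list is kept as plain NCBI Taxonomy IDs, the URIs can be generated rather than typed by hand. Below is a minimal sketch, assuming the IDs are held in a Python dictionary. The names `ncbitaxon_uri` and `low_level_block` are made up for illustration; only the URI pattern and the four example IDs come from the list above.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def ncbitaxon_uri(taxon_id: int) -> str:
    """Return the OBO PURL for an NCBI Taxonomy ID (pattern taken from the examples above)."""
    return f"{OBO_BASE}NCBITaxon_{taxon_id}"


# Example IDs and names copied from the list above.
terms = {
    78535: "Streptococcus viridans",
    1305: "Streptococcus sanguis",
    545774: "Streptococcus bovis",
    1309: "Streptococcus mutans",
}


def low_level_block(taxon_ids) -> str:
    """Join the URIs one per line, ready to paste into a 'one URI per line' field."""
    return "\n".join(ncbitaxon_uri(taxon_id) for taxon_id in taxon_ids)


print(low_level_block(terms))
```

The printed block has one URI per line, which is the shape a "one URI per line" form field expects.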

Tools

OntoFox is a service developed by the He group to extract term hierarchies using the MIREOT principle. There are probably other tools available (Protege plugins, etc.), but I find this to be the simplest one to use.

Steps

OntoFox provides 3 options for extracting terms. Here I will be using option #1, Data input using web forms.
  1. Select one ontology:
    • selected NCBI organismal classification (NCBITaxon)
  2. Term specification: (This is bottom-up term specification)
    1. Include low level source term URIs (one URI per line):
      • entered the 35 NCBITaxon term URIs
    2. Include top level source term URIs and target direct superclass URIs (one URI per line):
      • entered http://purl.obolibrary.org/obo/NCBITaxon_131567 #cellular organisms for the top level term
    3. Select a setting for retrieving intermediate source terms:
      • selected includeAllIntermediates
  3. Annotation/Axiom Specification: Include source annotation URIs (one URI per line):
    • added includeAllAxioms
  4. Annotation/Axiom to be excluded (one URI per line):
    • left it blank
  5. URI of the OWL(RDF/XML) output file:
    • went with the recommendation here and added http://purl.obolibrary.org/obo/your_ontology/external/NCBITaxon_import.owl

Finally, click the Get OWL (RDF/XML) Output File button. That's it!
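
To make the bottom-up specification concrete, here is a minimal sketch of what keeping all intermediates between a low level term and a top level term amounts to. It is not OntoFox's implementation. It assumes every term has a single parent, and it uses a made-up toy hierarchy (`parent_of`) with placeholder names rather than real NCBITaxon data.

```python
def extract_hierarchy(parent_of, low_level_terms, top_term):
    """Collect each low level term plus every ancestor up to and including top_term."""
    kept = set()
    for term in low_level_terms:
        current = term
        # Walk upward one parent at a time until the top term (or the root) is reached.
        while current is not None:
            kept.add(current)
            if current == top_term:
                break
            current = parent_of.get(current)
    return kept


# Hypothetical toy hierarchy (child -> parent), for illustration only.
parent_of = {
    "species_a": "genus_x",
    "species_b": "genus_x",
    "genus_x": "family_y",
    "family_y": "top",
    "top": "root",
    "species_c": "genus_z",  # never requested, so it is never visited
    "genus_z": "family_y",
}

module = extract_hierarchy(parent_of, ["species_a", "species_b"], "top")
print(sorted(module))
# ['family_y', 'genus_x', 'species_a', 'species_b', 'top']
```

Terms above the top level term (`root`) and siblings that were not requested (`species_c`, `genus_z`) stay out of the module, which is why the extracted file is so much smaller than the source ontology.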

In a few seconds the Results page was displayed with links to the output and input files. I downloaded both files. NOTE: the output OWL file may be saved with an .xml extension, so you may have to rename it to .owl.
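
The input file is a plain-text record of the form settings, so it can be useful to keep a copy under version control. The sketch below assembles such a record from the values used in this post. The bracketed section headers are an assumption modelled on the form labels, not a verified copy of OntoFox's format, and `build_ontofox_input` is a hypothetical helper. Compare the result with the input file downloaded from the Results page before relying on it.

```python
def build_ontofox_input(output_uri, source_ontology, low_level_uris,
                        top_level_lines, retrieval_setting, annotation_lines):
    """Assemble a plain-text settings record; the section headers are assumed, not verified."""
    sections = [
        ("URI of the OWL(RDF/XML) output file", [output_uri]),
        ("Source ontology", [source_ontology]),
        ("Low level source term URIs", list(low_level_uris)),
        ("Top level source term URIs and target direct superclass URIs",
         list(top_level_lines)),
        ("Source term retrieval setting", [retrieval_setting]),
        ("Source annotation URIs", list(annotation_lines)),
    ]
    # One block per section: a bracketed header followed by one entry per line.
    blocks = ["\n".join([f"[{header}]", *lines]) for header, lines in sections]
    return "\n\n".join(blocks) + "\n"


text = build_ontofox_input(
    output_uri="http://purl.obolibrary.org/obo/your_ontology/external/NCBITaxon_import.owl",
    source_ontology="NCBITaxon",
    # Only the four example terms are listed here; the actual run used 35 URIs.
    low_level_uris=[
        "http://purl.obolibrary.org/obo/NCBITaxon_78535",
        "http://purl.obolibrary.org/obo/NCBITaxon_1305",
        "http://purl.obolibrary.org/obo/NCBITaxon_545774",
        "http://purl.obolibrary.org/obo/NCBITaxon_1309",
    ],
    top_level_lines=["http://purl.obolibrary.org/obo/NCBITaxon_131567 #cellular organisms"],
    retrieval_setting="includeAllIntermediates",
    annotation_lines=["includeAllAxioms"],
)
print(text)
```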

I opened the OWL file in Protege 5.0 beta, which shows a class count of 109:

[Figure: Infective Endocarditis ontology metrics]

and here is the class hierarchy for the extracted module:

[Figure: Infective Endocarditis class hierarchy]

No object or data properties were extracted, only the following annotation properties:

[Figure: Infective Endocarditis annotation properties]
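
The class and property counts can also be checked outside Protege. Here is a minimal sketch using only the Python standard library. It assumes the output is RDF/XML in which each entity is declared as an `owl:Class`, `owl:ObjectProperty`, `owl:DatatypeProperty` or `owl:AnnotationProperty` element carrying an `rdf:about` attribute. The embedded `sample` document is a made-up fragment, not the real OntoFox output, and Protege's own metrics may count entities differently.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
KINDS = ["Class", "ObjectProperty", "DatatypeProperty", "AnnotationProperty"]


def count_declarations(rdfxml_text):
    """Count distinct rdf:about URIs for each kind of OWL declaration element."""
    root = ET.fromstring(rdfxml_text)
    counts = {}
    for kind in KINDS:
        uris = {
            element.get(f"{{{RDF}}}about")
            for element in root.iter(f"{{{OWL}}}{kind}")
            if element.get(f"{{{RDF}}}about")  # skip anonymous class expressions
        }
        counts[kind] = len(uris)
    return counts


# Hypothetical two-class fragment, for illustration only.
sample = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:AnnotationProperty rdf:about="http://purl.obolibrary.org/obo/IAO_0000115"/>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/NCBITaxon_131567"/>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/NCBITaxon_1309">
    <rdfs:label>Streptococcus mutans</rdfs:label>
  </owl:Class>
</rdf:RDF>"""

print(count_declarations(sample))
# {'Class': 2, 'ObjectProperty': 0, 'DatatypeProperty': 0, 'AnnotationProperty': 1}
```

For a downloaded file, read its text first and pass it in, for example `count_declarations(open("NCBITaxon_import.owl", encoding="utf-8").read())`, where the file name is whatever the download was saved as.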

References

  1. Xiang Z, Courtot M, Brinkman RR, Ruttenberg A, He Y. OntoFox: web-based support for ontology reuse. BMC Research Notes. 2010, 3:175. PMID: 20569493
  2. MIREOT presentation from ICBO 2009