From ontology to knowledge graph with agile methods: the case of COVID-19 CODO knowledge graph

DOI: https://doi.org/10.1108/IJWIS-03-2022-0047
Published date: 05 October 2022
Pages: 432-452
Subject Matter: Information & knowledge management, Information & communications technology, Information systems, Library & information science, Information behaviour & retrieval, Metadata, Internet
Authors: Michael DeBellis, Biswanath Dutta
Michael DeBellis
michaeldebellis.com, San Francisco, California, USA, and
Biswanath Dutta
Documentation Research and Training Centre, Indian Statistical Institute,
Bangalore, India
Abstract
Purpose: The purpose of this paper is to describe the CODO ontology (COviD-19 Ontology), which captures epidemiological data about the COVID-19 pandemic in a knowledge graph that follows the FAIR principles. This study took information from spreadsheets and integrated it into a knowledge graph that can be queried with SPARQL and visualized with the Gruff tool in AllegroGraph.
Design/methodology/approach: The knowledge graph was designed with the Web Ontology Language (OWL). The methodology was a hybrid approach that integrated the YAMO methodology for ontology design with Agile methods to define iterations and the approach to requirements, testing and implementation.
Findings: The hybrid approach demonstrated that Agile can bring the same benefits to knowledge graph projects as it has brought to other projects. The two-person team went from an ontology to a large knowledge graph with approximately 5 million triples in a few months. The authors gathered useful real-world experience on how to most effectively transform "strings to things".
Originality/value: To the best of the authors' knowledge, this study is the only FAIR model to address epidemiological data for the COVID-19 pandemic. It also brought to light several practical issues that generalize to other studies wishing to go from an ontology to a large knowledge graph. This study is one of the first to document how the Agile approach can be used for knowledge graph development.
Keywords: Agile, COVID-19, ETL, Health care, Knowledge graph, OWL, SDLC, SPARQL, Protégé, Triplestore, FAIR
Paper type: Research paper
This work was carried out under the research project entitled "Integrated and Unified Data Model for Publication and Sharing of prolonged pandemic data as FAIR Semantic Data: COVID-19 as a case study", funded by Indian Statistical Institute Kolkata. This work was conducted using the Protégé resource, which is supported by grant GM10331601 from the National Institute of General Medical Sciences of the United States National Institutes of Health. Thanks to Franz Inc. (www.allegrograph.com) for their help with AllegroGraph and Gruff. Thanks to Dr Sivaram Arabandi, MD, for his feedback on the CODO ontology.

International Journal of Web Information Systems (IJWIS), Vol. 18 No. 5/6, 2022, pp. 432-452. © Emerald Publishing Limited. ISSN 1744-0084.
Received 1 March 2022; revised 30 June 2022 and 5 August 2022; accepted 12 September 2022.
The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/1744-0084.htm

1. Introduction
At the beginning of the COVID-19 pandemic (March 2020), we began to develop an ontology called CODO (COviD-19 Ontology) for the collection and analysis of COVID-19 data (Dutta and DeBellis, 2020). The ontology followed the FAIR model (Wilkinson et al., 2016) for representing data. While other COVID-19 ontologies, such as CIDO, VIDO and CoVoc (more detail provided in Section 1.2), focus on analyzing the virus at a biomolecular level, CODO focuses on epidemiological issues, such as tracking how the virus spread based on data about social relations, geography and similar factors. We evolved what started as a small ontology in Protégé into a large knowledge graph in the AllegroGraph triplestore from Franz Inc.
In this paper, we use the term ontology to refer to the CODO Web Ontology Language (OWL) model with only basic example test data (i.e. essentially what the ontology community refers to as T-Box information). We use the term knowledge graph to refer to the ontology populated with large amounts of real-world data (i.e. T-Box data combined with A-Box data from real data sources).
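To make the ontology versus knowledge graph distinction concrete, the following is a minimal Python sketch, not the CODO implementation. It assumes a hypothetical namespace (`http://example.org/codo#`), hypothetical classes (`Person`, `Patient`, `City`), hypothetical properties (`livesIn`, `hasContactWith`) and a made-up three-row spreadsheet, and it stores triples as plain tuples rather than in a triplestore such as AllegroGraph. The T-Box alone plays the role of the ontology; adding A-Box triples generated from spreadsheet rows, where each identifier string is turned into an IRI (the "strings to things" step), yields the knowledge graph. The helper `match` stands in for a single SPARQL triple pattern, and `infer_types` applies the `rdfs:subClassOf` rule once to show how T-Box axioms add meaning to A-Box data.

```python
import csv
import io

# Hypothetical namespace for illustration; the real CODO IRIs may differ.
EX = "http://example.org/codo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"

# T-Box: schema-level triples (hypothetical classes and one subclass axiom).
TBOX = {
    (EX + "Person", RDF_TYPE, OWL_CLASS),
    (EX + "Patient", RDF_TYPE, OWL_CLASS),
    (EX + "City", RDF_TYPE, OWL_CLASS),
    (EX + "Patient", RDFS_SUBCLASS, EX + "Person"),
}

# Made-up spreadsheet extract: every cell arrives as a plain string.
SPREADSHEET = """patient_id,city,contact_of
P1,Bangalore,
P2,Bangalore,P1
P3,Mysore,P2
"""


def to_iri(kind: str, label: str) -> str:
    """Turn a cell string into a 'thing': a stable IRI for an individual."""
    return EX + kind + "_" + label.strip().replace(" ", "_")


def abox_from_rows(text: str) -> set:
    """Generate A-Box (instance-level) triples from CSV rows."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        patient = to_iri("Patient", row["patient_id"])
        city = to_iri("City", row["city"])
        triples.add((patient, RDF_TYPE, EX + "Patient"))
        triples.add((city, RDF_TYPE, EX + "City"))
        triples.add((patient, EX + "livesIn", city))
        if row["contact_of"]:  # an empty cell means no known contact
            source = to_iri("Patient", row["contact_of"])
            triples.add((patient, EX + "hasContactWith", source))
    return triples


def match(graph, s=None, p=None, o=None):
    """Return matching triples, sorted; None acts as a wildcard, like a SPARQL variable."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def infer_types(graph):
    """Apply the rdfs:subClassOf rule once: x type C and C subClassOf D give x type D."""
    inferred = {
        (x, RDF_TYPE, d)
        for (x, p1, c) in graph if p1 == RDF_TYPE
        for (c2, p2, d) in graph if p2 == RDFS_SUBCLASS and c2 == c
    }
    return graph | inferred


ontology = TBOX                                        # T-Box only
knowledge_graph = TBOX | abox_from_rows(SPREADSHEET)   # T-Box plus A-Box

if __name__ == "__main__":
    print(len(ontology), len(knowledge_graph))         # 4 14
    for s, _, o in match(knowledge_graph, p=EX + "hasContactWith"):
        print(s.split("#")[1], "->", o.split("#")[1])
```

Here `match(knowledge_graph, p=EX + "hasContactWith")` returns the two contact triples (P2 to P1 and P3 to P2), roughly what `SELECT ?s ?o WHERE { ?s ex:hasContactWith ?o }` would return from a real triplestore with `ex:` bound to the hypothetical namespace. The sketch ignores spelling variants and duplicate identifiers in the source cells, which a real extraction pipeline would have to reconcile before minting IRIs.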
1.1 Knowledge graphs and health care
There has been extensive work at the intersection of the Semantic Web community and the health-care domain. The Protégé ontology editor was developed at the Stanford medical school, and Protégé was originally funded as a tool to aid in gene research (Musen, 2015).
The major areas where knowledge graphs are used in health care are:
• Drug Discovery and Repurposing: Knowledge graphs are extensively used to analyze the molecular structure and uses of various drugs, both to develop new medications and to investigate how existing medications might be repurposed to treat additional diseases (Zeng et al., 2022).
• Harmonization: Unlike most other domains, health care already has an abundance of open vocabularies to describe medical concepts such as diseases, treatments and symptoms. These vocabularies are used for medical research, for coding procedures reported to insurance companies and for many other purposes. However, because of various local and institutional requirements, a health-care organization often needs to work with data based on different standards. For example, standards such as SNOMED, LOINC, ICD and HL7 FHIR overlap considerably, and it is common for large organizations to use two or more of these and other health-care standards. Harmonization refers to the mapping of these various models to each other. Knowledge graph technology can play a significant role in this harmonization process for two reasons. First, the technology has built-in support for relations such as sameAs in OWL. Second, most health-care vocabularies are implemented as OWL models. Examples of this type of research are Bauer et al. (2021) and Visweswaran et al. (2021).
• Modeling Genetic and Biomolecular Knowledge: The rich semantics of OWL, which is based on Description Logic, provide a powerful language to model biomolecular and genetic knowledge. Such models can be used for artificial intelligence (AI) applications using both deep learning and semantic AI such as rule-based systems. Some representative examples of this work are Callahan et al. (2020) and He et al. (2020).
• Statistics: Knowledge graph technology is well suited to storing and visualizing statistical data about various populations. This capability was used extensively to provide information to the public about the COVID-19 pandemic. Some representative examples of this work are provided in Section 1.2.
• Semantic Search: One of the first applications for OWL was to facilitate a more meaningful search than a simple keyword search. For example, in a keyword search, the keyword "OWL" will primarily return information about the bird with the same name. One can influence the keyword search by adding additional