Keyword-based faceted search interface for knowledge graph construction and exploration

DOI: https://doi.org/10.1108/IJWIS-02-2022-0037
Published date: 25 October 2022
Pages: 453-486
Subject Matter: Information & knowledge management, Information & communications technology, Information systems, Library & information science, Information behaviour & retrieval, Metadata, Internet
Author: Samir Sellami, Nacer Eddine Zarour
Samir Sellami
LIRE Laboratory, Department of Software Technologies and Information Systems,
Abdelhamid Mehri Constantine 2 University Faculty of New Technologies
of Information and Communication, Constantine, Algeria and Department
of Mathematics and Computer Science, Higher Normal School of Technological
Education Skikda, Azzaba, Algeria, and
Nacer Eddine Zarour
LIRE Laboratory, Department of Software Technologies and Information Systems,
Abdelhamid Mehri Constantine 2 University Faculty of New Technologies
of Information and Communication, Constantine, Algeria
Abstract
Purpose: Massive amounts of data, manifesting in various forms, are being produced on the Web
every minute and becoming the new standard. Exploring these information sources distributed in
different Web segments in a unified way is becoming a core task for a variety of users' and companies'
scenarios. However, knowledge creation and exploration from distributed Web data sources is a
challenging task. Several data integration conflicts need to be resolved and the knowledge needs to be
visualized in an intuitive manner. The purpose of this paper is to extend the authors' previous
integration works to address semantic knowledge exploration of enterprise data combined with
heterogeneous social and linked Web data sources.
Design/methodology/approach: The authors synthesize information in the form of a knowledge graph
to resolve interoperability conflicts at integration time. They begin by describing KGMap, a mapping model for
leveraging knowledge graphs to bridge heterogeneous relational, social and linked web data sources. The
mapping model relies on semantic similarity measures to connect the knowledge graph schema with the sources'
metadata elements. Then, based on KGMap, this paper proposes KeyFSI, a keyword-based semantic search
engine. KeyFSI provides a responsive faceted navigating Web user interface designed to facilitate the exploration
and visualization of embedded data behind the knowledge graph. The authors implemented their approach for a
business enterprise data exploration scenario where inputs are retrieved on the fly from a local customer
relationship management database combined with the DBpedia endpoint and the Facebook Web application
programming interface (API).
Findings: The authors conducted an empirical study to test the effectiveness of their approach using different
similarity measures. The observed results showed better efficiency when using a semantic similarity measure. In
addition, a usability evaluation was conducted to compare KeyFSI features with recent knowledge exploration
systems. The obtained results demonstrate the added value and usability of the contributed approach.
Originality/value: Most state-of-the-art interfaces allow users to browse one Web segment at a time. The
originality of this paper lies in proposing a cost-effective virtual on-demand knowledge creation approach, a
method that enables organizations to explore valuable knowledge across multiple Web segments simultaneously.
In addition, the responsive components implemented in KeyFSI allow the interface to adequately handle the
uncertainty imposed by the nature of Web information, thereby providing a better user experience.
Keywords Knowledge exploration, Faceted browsing, Keyword search, Responsive interface,
Virtual data exploration, Knowledge graphs, Linked data
Paper type Research paper
Received 26 February 2022
Revised 17 June 2022
14 August 2022
Accepted 3 September 2022
International Journal of Web
Information Systems
Vol. 18 No. 5/6, 2022
pp. 453-486
© Emerald Publishing Limited
1744-0084
DOI 10.1108/IJWIS-02-2022-0037
The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/1744-0084.htm
1. Introduction
Over the past few decades, the Web has witnessed an unprecedented data avalanche
because of the widespread availability of high-speed internet and the remarkable
progress made in the field of data management. The massive generated data in
different Web segments manifests in various types and formats and rapidly becomes
the new standard on the Web. With the advent of the linked data initiative (Hitzler,
2021), many structured data sources of almost any domain are publicly available. On
the other hand, social media services have grown in popularity, and social data is
becoming more prosperous and important. In many cases, the information scattered
from these various Web segments is complementary and could be connected with
internal enterprise data to find better and more meaningful insights. Enterprises need
to access these Web data sources in an integrated way to make their best-informed
business decisions. However, most state-of-the-art interfaces allow users to
browse only one Web segment at a time. Furthermore, information integration and
exploration from distributed Web data sources is a challenging task, as data
providers publish data in various heterogeneous data models. They use different
structures and attribute names to encode the information. Some encode their
information in simple text or legacy relational databases; some produce semi-
structured JSON or XML objects, while others offer machine-readable resource
description framework (RDF) results. To achieve the creation and exploration of
integrated knowledge, several data heterogeneity problems such as technical,
structural or semantic conflicts need to be resolved. Additionally, exploring multiple
Web segments adds a degree of uncertainty to the user interface (UI), for example,
connectivity problems, longer query execution times and the size of the retrieved
data.
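To make the structural and semantic conflicts mentioned above concrete, the following minimal Python sketch shows the same fact arriving as a relational row, a JSON object and an RDF-style triple, and being normalized into one triple form. It is an illustration only, not part of the authors' system: the attribute names `cust_name` and `fullName`, the `example.org` identifier and the `ATTRIBUTE_MAP` table are hypothetical, and the hand-written attribute mapping stands in for whatever mapping an integration layer would actually compute.

```python
import json

# Hypothetical mapping from source-specific attribute names to one shared
# vocabulary term (assumption: written by hand for this illustration).
ATTRIBUTE_MAP = {
    "cust_name": "name",                        # relational column
    "fullName": "name",                         # JSON field
    "http://xmlns.com/foaf/0.1/name": "name",   # RDF predicate (FOAF)
}


def from_relational(columns, row, key):
    """Turn one relational row into (subject, predicate, object) triples."""
    record = dict(zip(columns, row))
    subject = str(record[key])
    return [(subject, ATTRIBUTE_MAP[c], v)
            for c, v in record.items() if c in ATTRIBUTE_MAP]


def from_json(text, key):
    """Turn one flat JSON object into triples."""
    record = json.loads(text)
    subject = str(record[key])
    return [(subject, ATTRIBUTE_MAP[k], v)
            for k, v in record.items() if k in ATTRIBUTE_MAP]


def from_rdf(triples):
    """Rename the predicates of existing triples to the shared vocabulary."""
    return [(s, ATTRIBUTE_MAP[p], o) for s, p, o in triples if p in ATTRIBUTE_MAP]


relational = from_relational(["id", "cust_name"], (42, "Acme Corp"), key="id")
social = from_json('{"id": "acme", "fullName": "Acme Corp"}', key="id")
linked = from_rdf([("http://example.org/Acme",
                    "http://xmlns.com/foaf/0.1/name",
                    "Acme Corp")])
unified = relational + social + linked
```

After normalization the three records share the predicate `name`, but their subjects still differ, which is the kind of entity-level conflict that a mapping step has to resolve separately.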
In this context, the recent enterprise knowledge graphs (EKGs) concept is
emerging as a leading solution, capable of mediating between different data models
and effectively supporting the business decision-making process (Hogan et al., 2021).
An EKG represents a linked graph of entities, concepts and facts as well as all the
relationships between them, providing a more precise and complete representation of
an enterprise's data (Tiwari et al., 2021). The aim is to connect the concepts and the
elements in the knowledge graph to instance entities in a given collection of
heterogeneous sources to promote data discoverability. The demand for knowledge
graphs in enterprises has been growing in recent years (Noy et al., 2019). Although the
enterprise domain might vary from marketing to manufacturing, knowledge graphs
can improve performance, reduce costs and establish an additional benefit to the
company's services (Hislop et al., 2018). For example, Wikidata and DBpedia have
acquired a central position for the Web of Data (Hogan, 2020), while Google acquired
Freebase and transformed it into its own proprietary Knowledge Graph (Nickel et al.,
2016). Building such a knowledge graph for an enterprise requires addressing data
integration challenges to tackle the variety and the increasing number of publicly
available data sets. Data integration aims to synthesize information from diverse data
sources, usually heterogeneous and independent of each other, into a unified
knowledge graph (Villazón-Terrazas et al., 2021). Equally important, the data under
the knowledge graph must be easily explored to be actionable, that is, capable of
providing valuable information on the fly to support the decision-making process.
This work extends our paper submitted originally to the KGSWC 2021 conference
(Sellami et al., 2021). In the previous work, we presented a set of complementary tools
for leveraging EKGs by bridging between heterogeneous relational, social and linked
data. We suggested MidSemI, a general middleware framework for semantic
integration that resolves interoperability conflicts among heterogeneous sources.
MidSemI defines a knowledge graph as a mediated schema and a metadata model that
allows us to describe source metadata in a flexible way. In this work, we first describe
the KGMap approach, an entity matching approach that exploits the knowledge graph
schema and the different sources' metadata elements to bridge heterogeneous data.
KGMap consists of three sub-models: the knowledge graph representation model, the
source metadata description model and the mapping model. KGMap handles these
models and specifies several possible mapping cases. The goal is to assist an expert in
identifying mapping candidates by relying on an algorithm that computes a semantic
similarity measure to determine the relatedness between entities in the source
metadata and the corresponding elements in the knowledge graph schema. KGMap is
focused on unifying and integrating heterogeneous data into a knowledge graph.
However, querying remains the primary communication mechanism between the
enterprise knowledge graph and external users or applications. Therefore, based on
the KGMap model, we then propose KeyFSI, a keyword-based faceted search
interface. KeyFSI enables organizations to explore valuable knowledge across
multiple Web segments simultaneously. The KeyFSI engine relies on a query-rewriting
algorithm that leverages Natural Language Processing (NLP) techniques to allow
simple keyword queries to be reformulated. The results are then retrieved from the
integrated data synthesized under the knowledge graph. KeyFSI is equipped with a
responsive faceted Web UI that adequately handles the uncertainty imposed by the
nature of Web information, with the goal of providing a better user experience. The
implementation of the proposed solution is a configurable middleware and a responsive Web
interface capable of exploring internal company data combined on the fly with inputs
from various SPARQL endpoints and Web application programming interfaces
(APIs). Considering an example of a commercial enterprise, we demonstrate how our
approach leverages an enterprise knowledge graph by combining an instance of a
customer relationship management (CRM) database with the SPARQL endpoint of
DBpedia and the Facebook Web API. We conducted an experimental study to test and
compare the effectiveness of KGMap using different similarity measures. The
observed results suggest that using a semantic-based similarity measure improves
the accuracy of the proposed approach in terms of precision and recall. Through our
use case, we show the value that KeyFSI can bring to EKG exploration.
The implementation of KeyFSI is a multi-faceted navigation interface that uses
modern Web design patterns. We describe the implementation of the KeyFSI UI and
then report a usability evaluation conducted to test its features. The evaluation results
illustrate the feasibility and the ease of use of the proposed approach.
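As a rough illustration of the candidate-identification idea outlined above, the following Python sketch scores every pair made of a knowledge graph schema element and a source metadata element, keeps the pairs that reach a threshold for an expert to review, and computes precision and recall against a reference set. It is not the KGMap algorithm: the standard-library `difflib.SequenceMatcher` ratio is a purely string-based stand-in for a semantic similarity measure, and the element names, the 0.6 threshold and the reference mappings are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String-based stand-in for a semantic similarity measure (assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mapping_candidates(schema_elements, metadata_elements, threshold=0.6):
    """Return (schema, metadata, score) triples whose score reaches the threshold."""
    candidates = []
    for s in schema_elements:
        for m in metadata_elements:
            score = similarity(s, m)
            if score >= threshold:
                candidates.append((s, m, score))
    # Highest-scoring pairs first, so an expert sees the strongest candidates first.
    return sorted(candidates, key=lambda c: c[2], reverse=True)


def precision_recall(predicted, reference):
    """Precision and recall of predicted pairs against a reference set of pairs."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical element names and reference mappings, for illustration only.
schema = ["Customer", "Product", "Company"]
metadata = ["customer_name", "product_id", "page_likes"]
reference = {("Customer", "customer_name"), ("Product", "product_id")}

candidates = mapping_candidates(schema, metadata)
pairs = [(s, m) for s, m, _ in candidates]
precision, recall = precision_recall(pairs, reference)
```

With these toy inputs the two reference pairs are the only ones that reach the threshold. A string measure of this kind cannot relate names that differ lexically but mean the same thing, which is the gap a semantic measure is meant to close.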
The remainder of this paper is structured as follows: We begin by summarizing
and discussing the limitations of relevant related works in Section 2. In Section 3, we
describe a motivating scenario for integrating and exploring embedded knowledge
along with the emerging challenges in this area. Section 4 presents the KGMap model
and the algorithm used to integrate data by calculating direct mapping pairs from
heterogeneous internal and external Web sources. The query-rewriting algorithm and
the steps performed by the KeyFSI semantic search engine are explained in Section 5.
Section 6 presents the implementation of our approach and provides an empirical
evaluation of our prototype's accuracy and usability. Finally, Section 7 wraps up and
outlines future work.