Scheduling aspects in keyword extraction problem

Date	01 March 2018
Author	Jan Węglarz,Michał Zimniewicz,Krzysztof Kurowski
Published date	01 March 2018
DOI	http://doi.org/10.1111/itor.12368

Intl. Trans. in Op. Res. 25 (2018) 507–522

DOI: 10.1111/itor.12368

INTERNATIONAL TRANSACTIONS IN OPERATIONAL RESEARCH

Scheduling aspects in keyword extraction problem

Michał Zimniewicz (a), Krzysztof Kurowski (a) and Jan Węglarz (b)

(a) Poznan Supercomputing and Networking Center, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland

(b) Institute of Computing Science, Poznan University of Technology, Poznan, Poland

E-mail: Michal.Zimniewicz@man.poznan.pl [Zimniewicz]; Krzysztof.Kurowski@man.poznan.pl [Kurowski]; Jan.Weglarz@cs.put.poznan.pl [Węglarz]

Received 18 October 2016; accepted 23 October 2016

Abstract

The amount of big data collected during human–computer interactions requires natural language processing (NLP) applications to be executed efficiently, especially in parallel computing environments. Scalability and performance are critical in many NLP applications such as search engines or web indexers. However, there is a lack of mathematical models helping users to design and apply scheduling theory for NLP approaches. Moreover, many researchers and software architects reported various difficulties related to common NLP benchmarks. Therefore, this paper aims to introduce and demonstrate how to apply a scheduling model for a class of keyword extraction approaches. Additionally, we propose methods for the overall performance evaluation of different algorithms, which are based on processing time and correctness (quality) of answers. Finally, we present a set of experiments performed in different computing environments together with obtained results that can be used as reference benchmarks for further research in the field.

Keywords: keyword extraction; scheduling model; natural language processing; evaluation; big data; parallel computing

1. Introduction

Natural language processing (NLP) is a field on the borderline between linguistics and artificial intelligence. In the past, processing a single sentence with an NLP application could take up to several minutes (Cambria and White, 2014), and this was still acceptable. Nowadays, the amount of big data to be processed by search engines, web indexers, or other information retrieval solutions, especially user-generated content spread over the Internet, requires NLP applications to generate answers or results in a relatively short processing time. In fact, for many NLP applications near real-time overall performance is even more important than the accuracy of the results, as end users are more interested in interactive applications and search engines (Banko and Brill, 2001; Norvig, 2007).

© 2017 The Authors.
International Transactions in Operational Research © 2017 International Federation of Operational Research Societies
Published by John Wiley & Sons Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main St, Malden, MA 02148, USA.

There is an urgent need to address many challenging research problems related to this field, in particular new approaches in computational linguistics that could benefit from scheduling theory. Therefore, this paper attempts to pave the way for interdisciplinary research by introducing a new scheduling model for a well-known NLP problem: keyword extraction.

The number of different approaches to solving the keyword extraction problem is constantly growing. However, it is still difficult to select the best approach, as it is not trivial to compare and assess the overall performance of different algorithms. Therefore, we propose a new evaluation criterion based on the quality of answers given by a keyword extraction algorithm. Moreover, we show how to apply the scheduling model, and consequently consider a new performance measure based on schedule length.

The rest of the paper is organized as follows. Section 2 presents basic assumptions about the keyword extraction problem and related work. Section 3 gives a formal description of the keyword extraction problem and introduces the scheduling model together with its specific characteristics and examples. Section 4 describes in detail the proposed assessment and evaluation methods. Experiments and the obtained results are discussed in Section 5. Section 6 contains final remarks and outlines some directions for future work.

2. Keyword extraction

2.1. Problem description

In a nutshell, the aim of the keyword extraction problem is to identify the terms, words, or phrases (keywords) that best describe the subjects of a natural language document. Keywords of a text document are representative words that give a human reader a concise summary or thematic overview of its content. The large amount of textual information available on the Internet, mostly unstructured and without any semantic description, requires the development of tools that help end users to efficiently process such amounts of data and to quickly search, classify, and assess natural language documents according to their needs. Keyword extraction is a key underlying task of many NLP tools, as it is often an important step in text mining applications, for instance, relevance assessment, index generation, query refinement, document categorization, named-entity recognition, document similarity measurement, text summarization, and so on (Beliga, 2014; Siddiqi and Sharan, 2015).
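To make the task concrete, the following is a minimal sketch of a naive frequency-based baseline. It is not the approach studied in this paper; the function name extract_keywords, the tiny STOP_WORDS set, and the sample text are all assumptions introduced purely for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a practical system would use a much fuller, language-specific list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on", "that", "it"}


def extract_keywords(text, top_n=3):
    """Return up to top_n of the most frequent non-stop-word tokens in text."""
    # Lowercase the text and keep only alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Count tokens that are not stop words and are longer than two letters.
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_n)]


doc = ("Scheduling theory helps to plan tasks. Scheduling of tasks on parallel "
       "machines is a classic scheduling problem.")
print(extract_keywords(doc, top_n=2))  # ['scheduling', 'tasks']
```

Raw term frequency ignores word forms and meaning, which is one reason notions such as lemma and lexeme matter when the problem is stated more precisely.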

Therefore, in this paper we have selected the keyword extraction problem due to its complexity

and also as a good example from a wide range of NLP applications or their relevant parts.

The following terms are important for the problem description:

• Lemma: the canonical form of a set of words, the glossary form. For instance, run is the lemma of run, runs, and ran (Radziszewski, 2013).

• Lexeme: an abstract unit of lexical meaning, which is shared among the set of all forms taken by a word with a specified meaning. For instance, the words book and books are forms of the same lexeme; the words am, be, and was are also forms of the same lexeme. All forms of a lexeme have a common canonical form, a lemma. However, words having a common lemma are not necessarily forms of the same lexeme, because they might have different meanings. As a lexeme is
