Approximation algorithms and heuristics for task scheduling in data‐intensive distributed systems

Document

Cited in

Author	Eduardo C. Xavier,Marcelo G. Póvoa
Published date	01 September 2018
DOI	http://doi.org/10.1111/itor.12527
Date	01 September 2018

Intl. Trans. in Op. Res. 25 (2018) 1417–1441

DOI: 10.1111/itor.12527

INTERNATIONAL

TRANSACTIONS

IN OPERATIONAL

RESEARCH

Approximation algorithms and heuristics for task scheduling

in data-intensive distributed systems

Marcelo G. P´

ovoaa,b and Eduardo C. Xavierb

aGoogle, Avenidados Andradas, 3000, Belo Hozitonte, 30260-070, Brazil

bInstitute of Computing, University of Campinas, Av. Albert Einstein, 1251, Campinas 13083-852, Brazil

E-mail: marcelogpovoa@gmail.com [P´

ovoa]; eduardo@ic.unicamp.br [Xavier]

Received 10 May2017; accepted 15 February 2018

Abstract

In this work, we are interested in the problem of task scheduling on large-scale data-intensive computing

systems. In order to achieve good performance, one must construct not only good task schedules but also

good data allocation across nodes on the system, since before a task can be executed, it must have access

to data distributed on the system. In this article, we present a general formulation of a static problem that

combines both scheduling and replication problems in data-intensive distributed systems. We show that

this problem does not admit an approximation algorithm. However, considering a restricted version of the

problem that considers some practical constraints, an approximation algorithm can be designed. From a

practical perspective, we introduce a novel heuristic for the problem that is based on nodes clustering. We

compare the heuristic with two adapted approaches from other works in the literature by computational

simulations using an extensiveset of instances based on real computer grids. We show thatour heuristic often

obtains the best solutions and also runs faster than other approaches.

Keywords:data grids; task scheduling; data replication; approximation algorithms

1. Introduction

Large-scale computing is an increasingly important topic that arises in research areas such as

bioinformatics, meteorology, and nuclear physics. All these ﬁelds have common needs for large

amounts of data and processing power to run long simulations and complex experiments. One

example of such large distributed computing system is a data grid thatis a geographically distributed

computing infrastructure for processing data-driven tasks and is often used for these purposes. The

grid infrastructure is mainly comprised of processing nodes and storage units, connected through

network links.

As in this kind of environments the tasks need a large volume of data ﬁles to process, replicas of

the ﬁles are usually created across the system to improve data access times. Consequently, this leads

2018 The Authors.

International Transactionsin Operational Research C

2018 International Federation of OperationalResearch Societies

Published by John Wiley & Sons Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main St, Malden, MA02148,

USA.

1418 M. G. P´

ovoa and E. C. Xavier / Intl. Trans.in Op. Res. 25 (2018) 1417–1441

to improved network usage and lower task-execution times (Mansouri et al., 2013). Hence, data

replication plays an essential role in the system performance and indeed has been an active topic of

study ( ˇ

Cibej et al., 2005; Baev et al., 2008; Raicu et al., 2009; Nukarapu et al., 2011).

Of course, one must also consider the related problem of task scheduling. System users submit

tasks for execution, and the scheduler must determine on which node to run each task. Generally,

the objective is to minimize the maximum execution time among all nodes (makespan). Each task

requires access to some data ﬁles available in the system. In this way, the data locality must be

considered when choosing a node to execute a task. Preferably, tasks should run at sites containing

replicas of the ﬁles needed.

In this paper, we study the static version of an integrated problem of task scheduling and data

replication in distributed systems. Our formulation is relatively general and can be used in many

scenarios. It turns out that this problem is NP-hard and even nonapproximable. However, we also

consider a restricted version of the problem and present an approximation algorithm for it. From a

practical perspective, wepropose a heuristic algorithm that partitions the system nodes into clusters

and creates replicas according to a replicationpriority function. Task scheduling is performed using

a greedy algorithm and local search is performed to improve the solution quality.

The rest of the paper is organized as follows. Section 2 provides an overview of previous re-

search papers related to data replication and task-scheduling problems, and also summarizes the

contributions of this paper. Section 3 presents a formal problem formulation. Section 4 discusses

the approximability of the problem and introduces an approximation algorithm for its restricted

version. Section 5 describes a novel heuristic algorithm along with two approaches from other au-

thors. Section 6 presents a performance comparison between these algorithms through an extensive

set of computational simulations. Finally, in Section 7 we draw some conclusions and some future

research topics related to this work.

2. Related works and contributions

The general problem of scheduling and data allocation can be found in a wide variety of formula-

tions. Most of the previous studies were concerned with only one of the two subproblems, that is,

solving the task-scheduling problem considering a ﬁxed data allocation, or vice versa (ˇ

Cibej et al.,

2005; Nukarapu et al., 2011; Gonzalez, 2012). Some studies (Chakrabartiand Sengupta, 2008; Raicu

et al., 2009; Mansouri et al., 2013) considered both problems simultaneously, generallyusing heuris-

tics, and without any theoretical guarantee about the quality of the solution (Anikode and Tang,

2011).

We ﬁrst discuss some of the previous studies that provide a theoretical analysis of the proposed

algorithms. Gonzalez (2012) considers a restricted version of a grid that is composed of a bipartite

graph, whose nodes are divided between storage and execution nodes. The tasks and their required

data should be jointly transferred from a node on the ﬁrst partition to the second one, taking into

consideration that the network links have a limited capacity. The problem is to create a scheduling

of transfers of tasks and data to processors nodes in order to minimize the overall completion time.

The author presented an off-line approximation algorithm with a constant ratio to the problem.

The problem of data placement was considered in Baev et al. (2008). The authors considered the

problem of assigning data objects to nodes and also clients to nodes in order to minimize storage

2018 The Authors.

International Transactionsin Operational Research C

2018 International Federation of OperationalResearch Societies

To continue reading

Request your trial

Subscribers can access the reported version of this case.

You can sign up for a trial and make the most of our service including these benefits.

Request your trial

Why Sign-up to vLex?

Over 100 Countries

Search over 120 million documents from over 100 countries including primary and secondary collections of legislation, case law, regulations, practical law, news, forms and contracts, books, journals, and more.
Thousands of Data Sources

Updated daily, vLex brings together legal information from over 750 publishing partners, providing access to over 2,500 legal and news sources from the world’s leading publishers.
Find What You Need, Quickly

Advanced A.I. technology developed exclusively by vLex editorially enriches legal information to make it accessible, with instant translation into 14 languages for enhanced discoverability and comparative research.
Over 2 million registered users

Founded over 20 years ago, vLex provides a first-class and comprehensive service for lawyers, law firms, government departments, and law schools around the world.

Subscribers are able to see a list of all the cited cases and legislation of a document.

You can sign up for a trial and make the most of our service including these benefits.

Request your trial

Why Sign-up to vLex?

Over 100 Countries

Search over 120 million documents from over 100 countries including primary and secondary collections of legislation, case law, regulations, practical law, news, forms and contracts, books, journals, and more.
Thousands of Data Sources

Updated daily, vLex brings together legal information from over 750 publishing partners, providing access to over 2,500 legal and news sources from the world’s leading publishers.
Find What You Need, Quickly

Advanced A.I. technology developed exclusively by vLex editorially enriches legal information to make it accessible, with instant translation into 14 languages for enhanced discoverability and comparative research.
Over 2 million registered users

Founded over 20 years ago, vLex provides a first-class and comprehensive service for lawyers, law firms, government departments, and law schools around the world.

Subscribers are able to see a list of all the documents that have cited the case.

You can sign up for a trial and make the most of our service including these benefits.

Request your trial

Why Sign-up to vLex?

Over 100 Countries

Search over 120 million documents from over 100 countries including primary and secondary collections of legislation, case law, regulations, practical law, news, forms and contracts, books, journals, and more.
Thousands of Data Sources

Updated daily, vLex brings together legal information from over 750 publishing partners, providing access to over 2,500 legal and news sources from the world’s leading publishers.
Find What You Need, Quickly

Advanced A.I. technology developed exclusively by vLex editorially enriches legal information to make it accessible, with instant translation into 14 languages for enhanced discoverability and comparative research.
Over 2 million registered users

Founded over 20 years ago, vLex provides a first-class and comprehensive service for lawyers, law firms, government departments, and law schools around the world.

Subscribers are able to see the revised versions of legislation with amendments.

You can sign up for a trial and make the most of our service including these benefits.

Request your trial

Why Sign-up to vLex?

Over 100 Countries

Search over 120 million documents from over 100 countries including primary and secondary collections of legislation, case law, regulations, practical law, news, forms and contracts, books, journals, and more.
Thousands of Data Sources

Updated daily, vLex brings together legal information from over 750 publishing partners, providing access to over 2,500 legal and news sources from the world’s leading publishers.
Find What You Need, Quickly

Advanced A.I. technology developed exclusively by vLex editorially enriches legal information to make it accessible, with instant translation into 14 languages for enhanced discoverability and comparative research.
Over 2 million registered users

Founded over 20 years ago, vLex provides a first-class and comprehensive service for lawyers, law firms, government departments, and law schools around the world.

Subscribers are able to see any amendments made to the case.

You can sign up for a trial and make the most of our service including these benefits.

Request your trial

Why Sign-up to vLex?

Over 100 Countries

Search over 120 million documents from over 100 countries including primary and secondary collections of legislation, case law, regulations, practical law, news, forms and contracts, books, journals, and more.
Thousands of Data Sources

Updated daily, vLex brings together legal information from over 750 publishing partners, providing access to over 2,500 legal and news sources from the world’s leading publishers.
Find What You Need, Quickly

Advanced A.I. technology developed exclusively by vLex editorially enriches legal information to make it accessible, with instant translation into 14 languages for enhanced discoverability and comparative research.
Over 2 million registered users

Founded over 20 years ago, vLex provides a first-class and comprehensive service for lawyers, law firms, government departments, and law schools around the world.

Subscribers are able to see a visualisation of a case and its relationships to other cases. An alternative to lists of cases, the Precedent Map makes it easier to establish which ones may be of most relevance to your research and prioritise further reading. You also get a useful overview of how the case was received.

Request your trial

Why Sign-up to vLex?

Over 100 Countries

Search over 120 million documents from over 100 countries including primary and secondary collections of legislation, case law, regulations, practical law, news, forms and contracts, books, journals, and more.
Thousands of Data Sources

Updated daily, vLex brings together legal information from over 750 publishing partners, providing access to over 2,500 legal and news sources from the world’s leading publishers.
Find What You Need, Quickly

Advanced A.I. technology developed exclusively by vLex editorially enriches legal information to make it accessible, with instant translation into 14 languages for enhanced discoverability and comparative research.
Over 2 million registered users

Founded over 20 years ago, vLex provides a first-class and comprehensive service for lawyers, law firms, government departments, and law schools around the world.

Subscribers are able to see the list of results connected to your document through the topics and citations Vincent found.

You can sign up for a trial and make the most of our service including these benefits.

Request your trial

Why Sign-up to vLex?

Over 100 Countries

Search over 120 million documents from over 100 countries including primary and secondary collections of legislation, case law, regulations, practical law, news, forms and contracts, books, journals, and more.
Thousands of Data Sources

Updated daily, vLex brings together legal information from over 750 publishing partners, providing access to over 2,500 legal and news sources from the world’s leading publishers.
Find What You Need, Quickly

Advanced A.I. technology developed exclusively by vLex editorially enriches legal information to make it accessible, with instant translation into 14 languages for enhanced discoverability and comparative research.
Over 2 million registered users

Founded over 20 years ago, vLex provides a first-class and comprehensive service for lawyers, law firms, government departments, and law schools around the world.

Approximation algorithms and heuristics for task scheduling in data‐intensive distributed systems

You can sign up for a trial and make the most of our service including these benefits.

Why Sign-up to vLex?

Over 100 Countries

Thousands of Data Sources

Find What You Need, Quickly

Over 2 million registered users

You can sign up for a trial and make the most of our service including these benefits.

Why Sign-up to vLex?

Over 100 Countries

Thousands of Data Sources

Find What You Need, Quickly

Over 2 million registered users

You can sign up for a trial and make the most of our service including these benefits.

Why Sign-up to vLex?

Over 100 Countries

Thousands of Data Sources

Find What You Need, Quickly

Over 2 million registered users

You can sign up for a trial and make the most of our service including these benefits.

Why Sign-up to vLex?

Over 100 Countries

Thousands of Data Sources

Find What You Need, Quickly

Over 2 million registered users

You can sign up for a trial and make the most of our service including these benefits.

Why Sign-up to vLex?

Over 100 Countries

Thousands of Data Sources

Find What You Need, Quickly

Over 2 million registered users

Why Sign-up to vLex?

Over 100 Countries

Thousands of Data Sources

Find What You Need, Quickly

Over 2 million registered users

You can sign up for a trial and make the most of our service including these benefits.

Why Sign-up to vLex?

Over 100 Countries

Thousands of Data Sources

Find What You Need, Quickly

Over 2 million registered users