Minimum common string partition: on solving large‐scale problem instances

Document

Cited in

Author	Christian Blum
Published date	01 January 2020
Date	01 January 2020
DOI	http://doi.org/10.1111/itor.12603

Intl. Trans. in Op. Res. 27 (2020) 91–111

DOI: 10.1111/itor.12603

INTERNATIONAL

TRANSACTIONS

IN OPERATIONAL

RESEARCH

Minimum common string partition: on solving large-scale

problem instances

Christian Blum

Artiﬁcial Intelligence ResearchInstitute (IIIA-CSIC), Campus of the UAB,

Carrer de Can Planas, 08193 Bellaterra, Spain

E-mail: christian.blum@iiia.csic.es [Blum]

Received 8 February2018; received in revised form 9 July 2018; accepted 20 September 2018

Abstract

Minimum common string partition is an NP-hard combinatorial optimization problem from the bioinfor-

matics ﬁeld. The current state-of-the-art algorithm is a hybrid technique known as construct, merge, solve,

and adapt (CMSA). This algorithm combines two main algorithmic components: generating solutions in a

probabilistic way and solving reduced subinstances obtained from the tackled problem instances, if possible,

to optimality. However, the CMSA algorithm was not intended for application to very large problem in-

stances. Therefore, in this paper wepresent a technique that makes CMSA, and other availablealgorithms for

this problem, applicable to problem instances that are about one order of magnitude larger than the largest

problem instances considered so far. Moreover, a reduced variable neighborhood search (RVNS) for solving

the tackled problem, based on integer programming, is introduced. The experimental results show that the

modiﬁed CMSA algorithm is very strong forproblem instances based on rather small alphabets.With growing

alphabet size, it turns out that RVNS has a growing advantage over CMSA.

Keywords:minimum common string partition; large-scale problem instances; reduced variable neighborhood search

1. Introduction

Optimization problemsbased on strings are abundant in research ﬁelds such as bioinformatics (Gus-

ﬁeld, 1997; Blum and Festa, 2016) and text processing (Manning et al., 2008). In the context of

computer science, a string sis a datatype for representing and storing sequence information in terms

of a ﬁnite sequence of characters from a (generally ﬁnite) alphabet . Each string shas a length

denoted by |s|. Words, and even whole texts, may be stored by means of strings. Strings also play an

important role in bioinformatics because most of the genetic instructions involved in the growth,

development, functioning, and reproduction of living organisms are stored in deoxyribonucleic

acid (DNA) molecules, which can be represented by strings over the alphabet ={A,C,T,G}.

2018 The Authors.

International Transactionsin Operational Research C

2018 International Federation ofOperational Research Societies

Published by John Wiley & Sons Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main St, Malden, MA02148,

USA.

92 C. Blum / Intl. Trans. in Op. Res. 27 (2020) 91–111

Similarly, most proteins can be stored as strings of letters from an alphabet of size 20, representing

the 20 standard amino acids that are the building blocks of most proteins.

The minimum common string partition (MCSP) problem is a string-based combinatorial op-

timization problem from the bioinformatics ﬁeld. The problem input consists of two strings, s1

and s2. Both input strings are composed of letters from a ﬁnite alphabet . Moreover, they ful-

ﬁll the property of being related, which means that each letter of has the same number of

occurrences in each input string. The property of being related implies that s1and s2have the

same length n,thatis,|s1|=|s2|=n. A candidate solution to the MCSP problem is obtained by

cutting s1, respectively, s2, into pieces, resulting in a set P1, respectively, P2, of nonoverlapping

substrings. A candidate solution (P1,P2)is a valid solution, if P1=P2. The optimization goal

consists in ﬁnding a valid solution (P1,P2)that minimizes |P1|.1As an example, consider the fol-

lowing two DNA sequences: s1=AAGACTG and s2=ACTAGGA. Counting the number of the

occurrences of all letters in both strings, it is easy to conﬁrm that these two strings are related.

A trivial valid solution can be obtained for any problem instance by cutting both input strings

into substrings of length 1. In the case of the above-mentioned example, the trivial valid solution

is P1=P2={A,A,A,C,T,G,G}, with an objective function value of 7. Note that the optimal

solution in this example is P1=P2={ACT,AG ,G,A}, with objective function value 4. The MCSP

problem, as pointed out by Chen et al. (2005), is closelyrelated to the problem of sorting by reversals

with duplicates, whichis one of the key problems in genome rearrangement. It has been shown to be

NP-hard even in very restrictive cases (Goldstein et al., 2005). Moreover, Jiang et al. (2012) proved

the NP-completeness of the decision version of the MCSPcproblem—with cbeing the alphabet

size—for c≥2.

1.1. Existing works

Research work on the MCSP began with the development of approximation algorithms (see, e.g.,

Shapira and Storer, 2002; Chrobak et al., 2004; Kolman, 2005; Cormode and Muthukrishnan,

2007; Kolman and Wale ´

n, 2007). More recently, Goldstein and Lewenstein (2011) proposed a

greedy algorithm that runs in O(n)time. Another greedy algorithm obtaining better results (on

average) was presented in He (2007). The ﬁrst metaheuristics proposed in the related literature for

the MCSP problem were (a) the Min-Max Ant System from Ferdous and Sohel Rahman (2013,

2017) and (b) the probabilistictree search algorithm from Blum et al. (2014). Both works used a range

of artiﬁcial and real DNA instances from Ferdous and Sohel Rahman (2013) for the experimental

evaluation. However,it was discovered that the best results,at least for the smaller problem instances

used in Ferdous and Sohel Rahman (2013), are obtained by integer linear programming (ILP)

techniques. The ﬁrst ILP model was proposedin Blum et al. (2015), together with a deterministic 2-

phase heuristic based on the ILP model. Later on, improved ILP models were presented in Ferdous

and Sohel Rahman (2015) and Blum and Raidl (2016). The current state-of-the-art technique is a

construct, merge,solve, and adapt (CMSA) approach from Blum et al. (2016). This hybrid technique

sequentially applies an ILP solver to a reduced subinstance of the original problem instance. The

CMSA variant from Blum et al. (2016) is based on the ILP model from Blum et al. (2015).

1Note that minimizing |P1|issimilar to minimizing |P2|.

2018 The Authors.

International Transactionsin Operational Research C

2018 International Federation ofOperational Research Societies

To continue reading

Request your trial

Subscribers can access the reported version of this case.

You can sign up for a trial and make the most of our service including these benefits.

Request your trial

Why Sign-up to vLex?

Over 100 Countries

Search over 120 million documents from over 100 countries including primary and secondary collections of legislation, case law, regulations, practical law, news, forms and contracts, books, journals, and more.
Thousands of Data Sources

Updated daily, vLex brings together legal information from over 750 publishing partners, providing access to over 2,500 legal and news sources from the world’s leading publishers.
Find What You Need, Quickly

Advanced A.I. technology developed exclusively by vLex editorially enriches legal information to make it accessible, with instant translation into 14 languages for enhanced discoverability and comparative research.
Over 2 million registered users

Founded over 20 years ago, vLex provides a first-class and comprehensive service for lawyers, law firms, government departments, and law schools around the world.

Subscribers are able to see a list of all the cited cases and legislation of a document.

You can sign up for a trial and make the most of our service including these benefits.

Request your trial

Why Sign-up to vLex?

Over 100 Countries

Search over 120 million documents from over 100 countries including primary and secondary collections of legislation, case law, regulations, practical law, news, forms and contracts, books, journals, and more.
Thousands of Data Sources

Updated daily, vLex brings together legal information from over 750 publishing partners, providing access to over 2,500 legal and news sources from the world’s leading publishers.
Find What You Need, Quickly

Advanced A.I. technology developed exclusively by vLex editorially enriches legal information to make it accessible, with instant translation into 14 languages for enhanced discoverability and comparative research.
Over 2 million registered users

Founded over 20 years ago, vLex provides a first-class and comprehensive service for lawyers, law firms, government departments, and law schools around the world.

Subscribers are able to see a list of all the documents that have cited the case.

You can sign up for a trial and make the most of our service including these benefits.

Request your trial

Why Sign-up to vLex?

Over 100 Countries

Search over 120 million documents from over 100 countries including primary and secondary collections of legislation, case law, regulations, practical law, news, forms and contracts, books, journals, and more.
Thousands of Data Sources

Updated daily, vLex brings together legal information from over 750 publishing partners, providing access to over 2,500 legal and news sources from the world’s leading publishers.
Find What You Need, Quickly

Advanced A.I. technology developed exclusively by vLex editorially enriches legal information to make it accessible, with instant translation into 14 languages for enhanced discoverability and comparative research.
Over 2 million registered users

Founded over 20 years ago, vLex provides a first-class and comprehensive service for lawyers, law firms, government departments, and law schools around the world.

Subscribers are able to see the revised versions of legislation with amendments.

You can sign up for a trial and make the most of our service including these benefits.

Request your trial

Why Sign-up to vLex?

Over 100 Countries

Search over 120 million documents from over 100 countries including primary and secondary collections of legislation, case law, regulations, practical law, news, forms and contracts, books, journals, and more.
Thousands of Data Sources

Updated daily, vLex brings together legal information from over 750 publishing partners, providing access to over 2,500 legal and news sources from the world’s leading publishers.
Find What You Need, Quickly

Advanced A.I. technology developed exclusively by vLex editorially enriches legal information to make it accessible, with instant translation into 14 languages for enhanced discoverability and comparative research.
Over 2 million registered users

Founded over 20 years ago, vLex provides a first-class and comprehensive service for lawyers, law firms, government departments, and law schools around the world.

Subscribers are able to see any amendments made to the case.

You can sign up for a trial and make the most of our service including these benefits.

Request your trial

Why Sign-up to vLex?

Over 100 Countries

Search over 120 million documents from over 100 countries including primary and secondary collections of legislation, case law, regulations, practical law, news, forms and contracts, books, journals, and more.
Thousands of Data Sources

Updated daily, vLex brings together legal information from over 750 publishing partners, providing access to over 2,500 legal and news sources from the world’s leading publishers.
Find What You Need, Quickly

Advanced A.I. technology developed exclusively by vLex editorially enriches legal information to make it accessible, with instant translation into 14 languages for enhanced discoverability and comparative research.
Over 2 million registered users

Founded over 20 years ago, vLex provides a first-class and comprehensive service for lawyers, law firms, government departments, and law schools around the world.

Subscribers are able to see a visualisation of a case and its relationships to other cases. An alternative to lists of cases, the Precedent Map makes it easier to establish which ones may be of most relevance to your research and prioritise further reading. You also get a useful overview of how the case was received.

Request your trial

Why Sign-up to vLex?

Over 100 Countries

Search over 120 million documents from over 100 countries including primary and secondary collections of legislation, case law, regulations, practical law, news, forms and contracts, books, journals, and more.
Thousands of Data Sources

Updated daily, vLex brings together legal information from over 750 publishing partners, providing access to over 2,500 legal and news sources from the world’s leading publishers.
Find What You Need, Quickly

Advanced A.I. technology developed exclusively by vLex editorially enriches legal information to make it accessible, with instant translation into 14 languages for enhanced discoverability and comparative research.
Over 2 million registered users

Founded over 20 years ago, vLex provides a first-class and comprehensive service for lawyers, law firms, government departments, and law schools around the world.

Subscribers are able to see the list of results connected to your document through the topics and citations Vincent found.

You can sign up for a trial and make the most of our service including these benefits.

Request your trial

Why Sign-up to vLex?

Over 100 Countries

Search over 120 million documents from over 100 countries including primary and secondary collections of legislation, case law, regulations, practical law, news, forms and contracts, books, journals, and more.
Thousands of Data Sources

Updated daily, vLex brings together legal information from over 750 publishing partners, providing access to over 2,500 legal and news sources from the world’s leading publishers.
Find What You Need, Quickly

Advanced A.I. technology developed exclusively by vLex editorially enriches legal information to make it accessible, with instant translation into 14 languages for enhanced discoverability and comparative research.
Over 2 million registered users

Founded over 20 years ago, vLex provides a first-class and comprehensive service for lawyers, law firms, government departments, and law schools around the world.

Minimum common string partition: on solving large‐scale problem instances

You can sign up for a trial and make the most of our service including these benefits.

Why Sign-up to vLex?

Over 100 Countries

Thousands of Data Sources

Find What You Need, Quickly

Over 2 million registered users

You can sign up for a trial and make the most of our service including these benefits.

Why Sign-up to vLex?

Over 100 Countries

Thousands of Data Sources

Find What You Need, Quickly

Over 2 million registered users

You can sign up for a trial and make the most of our service including these benefits.

Why Sign-up to vLex?

Over 100 Countries

Thousands of Data Sources

Find What You Need, Quickly

Over 2 million registered users

You can sign up for a trial and make the most of our service including these benefits.

Why Sign-up to vLex?

Over 100 Countries

Thousands of Data Sources

Find What You Need, Quickly

Over 2 million registered users

You can sign up for a trial and make the most of our service including these benefits.

Why Sign-up to vLex?

Over 100 Countries

Thousands of Data Sources

Find What You Need, Quickly

Over 2 million registered users

Why Sign-up to vLex?

Over 100 Countries

Thousands of Data Sources

Find What You Need, Quickly

Over 2 million registered users

You can sign up for a trial and make the most of our service including these benefits.

Why Sign-up to vLex?

Over 100 Countries

Thousands of Data Sources

Find What You Need, Quickly

Over 2 million registered users