Interactive evolutionary approaches to multiobjective feature selection

Müberra Özmen (a), Gülşah Karakaya (b), and Murat Köksalan (a)

(a) Department of Industrial Engineering, Middle East Technical University, 06800 Ankara, Turkey
(b) Department of Business Administration, Middle East Technical University, 06800 Ankara, Turkey
E-mail: muberra.ozmen@gmail.com [Özmen]; kgulsah@metu.edu.tr [Karakaya]; koksalan@metu.edu.tr [Köksalan]

International Transactions in Operational Research 25 (2018) 1027–1052
DOI: 10.1111/itor.12428
Published: 01 May 2018

Received 30 September 2016; received in revised form 20 April 2017; accepted 24 April 2017
Abstract

In feature selection problems, the aim is to select a subset of features to characterize an output of interest. In characterizing an output, we may want to consider multiple objectives, such as maximizing classification performance, minimizing the number of selected features or cost, etc. We develop a preference-based approach for multiobjective feature selection problems. Finding all Pareto-optimal subsets may turn out to be a computationally demanding problem, and we would still need to select a solution. Therefore, we develop interactive evolutionary approaches that aim to converge to a subset that is highly preferred by the decision maker (DM). We test our approaches on several instances, simulating DM preferences by underlying preference functions, and demonstrate that they work well.

Keywords: feature selection; subset selection; interactive approach; evolutionary algorithm
1. Introduction
In classification problems, supervised learning algorithms, such as decision trees, support vector machines (SVMs), neural networks, etc., are used to predict the class (or output variable) of an instance by observing its feature (or input variable) values. Supervised learning algorithms train a prediction model over a dataset, in which different feature and class values of some past observations are provided, by learning the relationship between the features and the classes. The prediction model can then be used to classify a new instance based on its features.
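As a minimal sketch of this train-then-classify workflow, the toy 1-nearest-neighbor classifier below predicts the class of a new instance from labeled (feature vector, class) pairs. The dataset and the choice of Euclidean distance are illustrative assumptions, not taken from this paper:

```python
# Toy supervised classification: a 1-nearest-neighbor classifier.
# Training data is a list of (feature_vector, class_label) pairs.

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict_1nn(train, x):
    """Classify x by the class of its nearest training instance."""
    _, label = min(train, key=lambda pair: euclidean(pair[0], x))
    return label

train = [((0.0, 0.0), "A"), ((1.0, 1.0), "B"), ((0.2, 0.1), "A")]
print(predict_1nn(train, (0.1, 0.0)))  # nearest neighbor has class "A"
```

Any supervised learner fits the same mold: a training step that captures the feature-class relationship, and a prediction step applied to unseen instances.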
The classification performance of a learning algorithm depends on its ability to detect the relationship between input and output variables accurately. However, the presence of features that are irrelevant to the class, or the redundancy within the features, may have a negative impact on the classification performance of the learning algorithm (Kohavi and John, 1997). Yu and Liu (2004)
classify the features based on their relevance with respect to the output as strongly relevant, weakly relevant, and irrelevant. A feature is strongly relevant to the class if its presence affects classification performance independently of the other features used, weakly relevant if its effect on classification performance depends on the other features used, and irrelevant if it does not affect classification performance at all. They argue that the optimal subset of features in terms of classification performance includes all strongly relevant features together with the weakly relevant and nonredundant ones. Selecting such a subset of strongly relevant, and weakly relevant and nonredundant, features to be used in the prediction model of the learning algorithm (or classifier), instead of using them all, is called the feature selection problem.

© 2017 The Authors. International Transactions in Operational Research © 2017 International Federation of Operational Research Societies. Published by John Wiley & Sons Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main St, Malden, MA 02148, USA.
Feature selection aims to improve the classification performance by eliminating irrelevant and
redundant features. The decrease in the number of features to be used in the prediction model is
also useful in terms of reducing storage requirements, improving the time efficiency, and simplifying
the prediction model itself (Guyon and Elisseeff, 2003). Therefore, feature selection methods are
used in many areas, such as handwritten digit recognition (Oliveira et al., 2003), medical diagnosis
(Chyzhyk et al., 2014), gene marker recognition (Banerjee et al., 2007), etc.
Even though a reduction in the number of input variables seems to be a natural outcome of a feature selection procedure that aims at maximizing classification performance, it is possible to consider minimizing the cardinality of the subset as another objective. That is, one may be willing to reduce the number of variables beyond the number in the subset that gives the best classification performance in order to enjoy the benefits of reduced cardinality. In that case, the problem becomes a multiobjective problem. Depending on the scope of the problem, other objectives can also be defined. For example, in a medical diagnosis application, minimizing the screening costs of the medical tests that provide feature values, or minimizing the health-related risks those tests pose to the patient, could be set as objectives.
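Evaluating a candidate subset under such a multiobjective view can be sketched as computing one objective vector per subset. The cost table and the placeholder accuracy scorer below are hypothetical; in practice the accuracy would come from a trained classifier:

```python
# Hedged sketch: score one candidate feature subset on three objectives.
# accuracy_fn stands in for a real classifier-based evaluation.

def evaluate_subset(subset, feature_costs, accuracy_fn):
    """Return the objective vector (accuracy, cardinality, total cost)."""
    accuracy = accuracy_fn(subset)          # e.g., cross-validated accuracy
    cardinality = len(subset)               # number of selected features
    cost = sum(feature_costs[f] for f in subset)  # e.g., screening cost
    return accuracy, cardinality, cost

costs = {"age": 0.0, "blood_test": 5.0, "mri_scan": 80.0}  # hypothetical
fake_accuracy = lambda s: len(s) / 4       # placeholder scorer, not a real model
print(evaluate_subset({"age", "blood_test"}, costs, fake_accuracy))
# -> (0.5, 2, 5.0)
```

A search procedure would then compare such objective vectors across many candidate subsets.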
The algorithms developed for solving the feature selection problem can be examined along two dimensions. First, since it is not straightforward to measure the impact of using a feature on classification performance, different strategies have been developed for evaluating subsets, namely filter and wrapper approaches (Kohavi and John, 1997). Second, since the number of possible subsets grows exponentially with the number of available features, the feature selection problem is combinatorial in nature. Therefore, many optimization techniques have been used to solve it, such as sequential backward selection, branch-and-bound, best-first search, and genetic algorithms (Kohavi and John, 1997).
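To illustrate the first dimension, the sketch below implements a simple filter-style criterion: ranking features by the absolute Pearson correlation between their values and the class labels, on made-up toy data. This is one common filter heuristic for contrast with wrapper approaches, not the evaluation strategy of this paper:

```python
# Minimal filter-style feature ranking: score each feature by the
# absolute Pearson correlation between its values and the class labels.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def filter_rank(features, labels):
    """Rank feature names by |correlation| with the class, best first."""
    scores = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

data = {"f1": [0, 0, 1, 1], "f2": [1, 0, 1, 0]}  # toy feature columns
labels = [0, 0, 1, 1]
print(filter_rank(data, labels))  # f1 matches the labels exactly, so it ranks first
```

A wrapper approach, by contrast, would evaluate each candidate subset by actually training the classifier on it, which is more accurate but far more expensive.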
In the literature, the feature selection problem is usually treated as a biobjective problem in which the objectives are maximizing classification performance and minimizing the cardinality of the subset. Most studies aim to find all nondominated solutions for these two objectives, which amounts to finding the subset with the best classification performance at each cardinality level. However, in the presence of more objectives, enumerating all nondominated solutions is neither practical nor useful because of the combinatorial nature of the problem. Instead, it is more practical to concentrate on solutions that are of more interest to the decision maker (DM) of the problem. Therefore, in this study, we develop interactive evolutionary algorithms for multiobjective feature selection problems that aim to converge to the most preferred solution by guiding the search toward regions that consist of solutions appealing to the DM.
Measuring the classification performance is an important part of feature selection problems,
and a number of supervised learning algorithms have been developed in the literature. We use an
existing supervised learning algorithm for this purpose. Our contribution is rather in developing
