A non‐compensatory approach for trace clustering

Authors: Nikolaos Matsatsinis, Pavlos Delias, Evangelos Grigoroudis, Michael Doumpos
DOI: http://doi.org/10.1111/itor.12395
Published date: 01 September 2019
Intl. Trans. in Op. Res. 26 (2019) 1828–1846
DOI: 10.1111/itor.12395
INTERNATIONAL TRANSACTIONS IN OPERATIONAL RESEARCH
Pavlos Delias (a), Michael Doumpos (b), Evangelos Grigoroudis (b) and Nikolaos Matsatsinis (b)
(a) Department of Accounting and Finance, Eastern Macedonia and Thrace Institute of Technology, Agios Loukas, Kavala
PC 65110, Greece
(b) School of Production Engineering and Management, Technical University of Crete, University Campus, Kounoupidiana,
Chania PC 73100, Greece
E-mail: pdelias@teiemt.gr [Delias]; mdoumpos@dpem.tuc.gr [Doumpos]; vangelis@ergasya.tuc.gr [Grigoroudis];
nikos@dpem.tuc.gr [Matsatsinis]
Received 27 June 2016; received in revised form 13 November 2016; accepted 11 January 2017
Abstract
One of the main functions of process mining is the automated discovery of process models from event log files.
However, in flexible environments, such as healthcare or customer service, delivering comprehensible process
models can be very challenging, mainly due to the complexity of the registered logs. A prevalent response
to this problem is trace clustering, that is, grouping behaviors and discovering a distinct model per group.
In this paper, we propose a novel trace clustering technique inspired by the theory of outranking relations.
The proposed technique can handle multiple criteria with strongly heterogeneous scales, and it allows a
non-compensatory logic to guide the creation of a similarity metric. To achieve this, we use three key components:
we separate factors that are in favor of the similarity from those that are not, through discrimination
thresholds; we provide non-concordant factors with a “veto” power; and we aggregate all factors into an
overall metric. We evaluated this novel, non-compensatory approach against two of the most prominent
trace clustering functions: variant identification and model complexity reduction. Results suggest that the
proposed technique can be used for both functions with compelling performance.
Keywords: trace clustering; process mining; multiple criteria decision aid
© 2017 The Authors.
International Transactions in Operational Research © 2017 International Federation of Operational Research Societies
Published by John Wiley & Sons Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main St, Malden, MA 02148, USA.
1. Introduction
The idea of process mining is to discover, monitor, and improve real processes (i.e., not assumed
processes) by extracting knowledge from event logs readily available in today’s (information)
systems. The starting point for process mining is an event log. All process mining techniques
assume that it is possible to sequentially record events such that each event refers to an activity
(i.e., a well-defined step in some process) and is related to a particular case (i.e., a process instance).
Event logs may store additional information about events (such as the timestamp, the resource
performing the activity, the case data attributes, etc.). In other words, each case leaves a trace,
which corresponds to the observed behavior.
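To make the notions of event, case, and trace concrete, the following minimal sketch groups a flat list of recorded events into one trace per case. The tuple layout, the sample data, and the function name build_traces are assumptions made purely for illustration; they are not taken from the paper or from any process mining library.

```python
from collections import defaultdict

# Hypothetical event log: (case_id, activity, timestamp) triples.
# Real logs usually carry further attributes (resource, case data, etc.).
events = [
    ("c1", "register", 1),
    ("c2", "register", 2),
    ("c1", "examine", 3),
    ("c2", "discharge", 4),
    ("c1", "discharge", 5),
]

def build_traces(events):
    """Group events by case and order each group by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        by_case[case_id].append((timestamp, activity))
    # A trace is the time-ordered sequence of activities of one case.
    return {case: [activity for _, activity in sorted(evts)]
            for case, evts in by_case.items()}

traces = build_traces(events)
```

Under these assumptions, each entry of traces holds the observed behavior of one process instance, which is the object that a trace clustering technique compares.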
Process discovery is one of the three basic types of process mining. Based on an event log, a
process model is learned, typically by identifying process patterns in collections of events. To get an
overall impression of the event log, a number of structural metrics have been proposed (Günther,
2009, pp. 50–56), for example, log variety, support, structure, level of detail, etc. These metrics
provide a rough idea of whether or not there is a need to summarize the log (i.e., cluster a log into
sublogs such that each sublog exhibits fewer variations).
In flexible environments, such as healthcare or customer service, the observed behavior is expected
to vary considerably, that is, there is no dominant flow path. Such high variability obstructs the
process discovery task since it regularly leads to “spaghetti” process models (Bose and Aalst, 2009b;
Veiga and Ferreira, 2010). In such cases, there are two ways to proceed: either to discover rules
rather than models or, if process models are indeed needed, to improve the comprehensibility of the
discovered models. To this end, several approaches are possible. The first is to preprocess the
event log through filtering and transformations (Bose and Aalst, 2009a). Some relevant and popular
tactics are to remove cases (e.g., incomplete cases, cases of certain clients) or to remove activities
(e.g., keep just milestone activities, or eliminate spider activities, that is, steps of the process
that can be performed at any point in time during the process). A second approach is abstraction
(e.g., using special discovery techniques such as the fuzzy miner (Günther and Aalst, 2007), or just
focusing on the frequent paths). A third approach is to horizontally partition the event log file, so
that parts of the process are discovered separately and analyzed piecemeal (Delias and Lakiotaki, 2018).
Finally, the most prominent approach is trace clustering (Song et al., 2009; Bose and Aalst, 2010;
De Weerdt et al., 2013). Trace clustering is about grouping behaviors and discovering a distinct
model per group, thus delivering more comprehensible results.
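As an illustration of the preprocessing tactics mentioned above (removing incomplete cases and keeping only milestone activities), a minimal sketch could look as follows. The sample traces, the rule that a case is complete when it ends with an assumed end activity, and the function names are all hypothetical choices for this example, not definitions from the cited works.

```python
# Hypothetical traces: case id -> ordered list of activity names.
# "call" plays the role of a spider activity that may occur at any time.
traces = {
    "c1": ["register", "call", "examine", "call", "discharge"],
    "c2": ["register", "examine"],  # never reaches an end activity
    "c3": ["register", "call", "discharge"],
}

def remove_incomplete_cases(traces, end_activities):
    """Keep only cases whose last event is one of the assumed end activities."""
    return {case: trace for case, trace in traces.items()
            if trace and trace[-1] in end_activities}

def keep_milestones(traces, milestones):
    """Project every trace onto the milestone activities, preserving order."""
    return {case: [a for a in trace if a in milestones]
            for case, trace in traces.items()}

complete = remove_incomplete_cases(traces, end_activities={"discharge"})
filtered = keep_milestones(
    complete, milestones={"register", "examine", "discharge"}
)
```

Such filters reduce variability before discovery, but they do so by discarding observed behavior, which is one reason why trace clustering is often preferred when the full log must be retained.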
The goal of this paper is to contribute to the last of these approaches (trace clustering) by
proposing an effective methodology. To this end, we propose a multiple criteria approach to create a
similarity metric. The main problem that we address is how to summarize a process event log when
so much variability exists, so as to facilitate knowledge discovery. Knowledge discovery is facilitated
mainly via the reduction of the complexity of the discovered models, as well as via the identification
and grouping of the process variants.
Trying to cluster traces using a single-criterion approach is equivalent to deliberately discarding
certain aspects of reality. Moreover, relying on a single criterion is prone to imposing an
idiosyncratic point of view as objective (Roy, 1996). Therefore, to reach an effective clustering, it
is necessary to develop a family of criteria, each of which preserves the original concrete meaning of
the objects’ similarity.
The trace clustering problem has a number of particularities that impose extra requirements
on the multiple criteria aggregation method. First of all, trace similarity is expected to
depend on various criteria, some of which are measured numerically and some on an
ordinal or a binary scale. These criteria (e.g., performance metrics, bag of activities,
case data attributes) could be strongly heterogeneous with regard to their evaluation scales.
The method must therefore be able to deliver results in cases where aggregating all the criteria into
a unique and common scale is challenging. In addition, the process analysts (or stakeholders),
while assessing the similarity of a pair of traces, may not wish to compensate the loss on a given