Variable selection for collaborative filtering with market basket data

DOI: http://doi.org/10.1111/itor.12518
Author: Wook-Yeon Hwang
Date: 01 November 2020
Published date: 01 November 2020
Intl. Trans. in Op. Res. 27 (2020) 3167–3177
INTERNATIONAL TRANSACTIONS IN OPERATIONAL RESEARCH
Wook-Yeon Hwang
College of Global Business, Dong-A University, Busan, South Korea
E-mail: wyhwang@dau.ac.kr
Received 15 December 2016; received in revised form 25 October 2017; accepted 20 January 2018
Abstract
Market basket data in the form of a binary user–item matrix or a binary item–user matrix can be modeled as
a binary classification problem, which tackles collaborative filtering (CF) as well as target marketing.
Effective variable selection (VS) can increase prediction accuracy and identify important users or items
in both CF and target marketing. Therefore, we propose two new VS approaches, a Pearson correlation-based
approach and a forward random forests regression-based approach, and compare their performance in a
variety of experimental settings. The experimental results show that the proposed VS approaches outperform
the conventional approaches in the examples. Furthermore, the experimental results are more reasonable
and informative than previous experimental results because this paper considers both the binary
misclassification error and Top-N accuracy for user CF, item CF, user modeling, and item modeling.
Keywords: market basket data; Pearson correlation; random forests; supervised learning-based collaborative filtering;
target marketing; variable selection
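The classification framing in the abstract can be illustrated with a small sketch. Assuming one item column of a binary user–item matrix is the target and the remaining item columns are candidate predictors, a Pearson correlation-based filter ranks predictors by the absolute correlation with the target and keeps the strongest ones. The helper names (`pearson_select`, `misclassification_error`, `top_n_precision`), the top-k cutoff, the correlation-weighted score, the zero decision threshold, and the reading of Top-N accuracy as the share of true buyers among the N highest-scored users are all hypothetical choices made here, not the paper's specification; the forward random forests regression-based approach is not shown.

```python
import numpy as np


def pearson_select(X, y, k):
    """Rank candidate predictor columns of X by |Pearson r| with the target y
    and keep the top k. The top-k cutoff is a hypothetical choice for
    illustration; columns with zero variance are assigned r = 0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # center each predictor column
    yc = y - y.mean()                # center the target
    num = Xc.T @ yc                  # covariance numerators, one per column
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    order = np.argsort(-np.abs(r), kind="stable")  # ties keep column order
    return order[:k], r


def misclassification_error(y_true, y_pred):
    """Fraction of binary labels predicted incorrectly."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))


def top_n_precision(y_true, scores, n):
    """Share of true positives among the n highest-scored rows (one assumed
    reading of Top-N accuracy; the paper's exact definition may differ)."""
    top = np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:n]
    return float(np.mean(np.asarray(y_true)[top]))


# Toy binary user-item matrix (rows = users, columns = items); invented data.
R = np.array([
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
])
y = R[:, 0]    # target item: did each user buy item 0?
X = R[:, 1:]   # candidate predictors: the remaining items

sel, r = pearson_select(X, y, k=2)
scores = X[:, sel] @ r[sel]      # correlation-weighted score per user
pred = (scores > 0).astype(int)  # hypothetical decision threshold at 0
print(sel, misclassification_error(y, pred), top_n_precision(y, scores, 3))
# prints: [0 1] 0.0 1.0
```

On this invented matrix the filter keeps the first two predictor columns (r = 1 and r = -1/3) and drops the uncorrelated third one. The perfect in-sample error and top-3 precision only demonstrate the mechanics; a real comparison would score held-out users or items.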
1. Introduction
Recommender systems suggest to users which items are likely to be more valuable to them. The main
drivers of recommender systems are collaborative filtering (CF) techniques. Although CF faces
major challenges, such as data sparsity, shilling attacks, and privacy protection, researchers have
striven to overcome these difficulties. There are three categories of CF techniques: memory-based,
model-based, and hybrid CF techniques (Su and Khoshgoftaar, 2009). The application areas of CF
cover movies, books, images, and so on (Park et al., 2012). In this paper, we mainly focus on data
sparsity, model-based CF techniques, and movies.
In general, many voting scores are missing, which is called data sparsity. Model-based CF techniques
struggle with data sparsity because they depend on the information in the voting scores. In order
Corresponding author.
© 2018 The Authors.
International Transactions in Operational Research © 2018 International Federation of Operational Research Societies
Published by John Wiley & Sons Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main St, Malden, MA 02148, USA.