A review of credit scoring research in the age of Big Data

Authors: Ceylan Onay, Elif Öztürk
Position: Department of MIS, Boğaziçi University, Istanbul, Turkey


Purpose This paper aims to survey the credit scoring literature in the past 41 years (1976-2017) and
presents a research agenda that addresses the challenges and opportunities Big Data bring to credit scoring.
Design/methodology/approach Content analysis methodology is used to analyze 258 peer-reviewed
academic papers from 147 journals from two comprehensive academic research databases to identify their
research themes and detect trends and changes in the credit scoring literature according to content
Findings The authors find that credit scoring is going through a quantitative transformation, where
data-centric underwriting approaches, usage of non-traditional data sources in credit scoring and their
regulatory aspects are the upcoming avenues for further research.
Practical implications The paper's findings highlight the perils and benefits of using Big Data in credit
scoring algorithms for corporates, governments and non-profit actors who develop and use new technologies
in credit scoring.
Originality/value This paper presents greater insight on how Big Data challenges traditional credit
scoring models and addresses the need to develop new credit models that identify new and secure data
sources and convert them to useful insights that are in compliance with regulations.
Keywords Big Data, Financial inclusion, Credit scoring, Access to credit, Data privacy,
Discriminatory scoring
Paper type Research paper
1. Introduction
Credit scoring helps lenders evaluate the potential risk of new customers and also assess future
behavior of existing customers by using statistical models to transform relevant data into
numerical measures that guide credit decisions (Abdou and Pointon, 2011). Credit scoring
traditionally relies on consumers' financial history to generate a credit score, which indicates
the borrower's credit risk[1]. However, Big Data is bringing disruptive change to credit scoring.
Campbell-Verduyn et al. (2017) argue that Big Data is penetrating the financial services
industry via credit bureaus and fintechs, which are using Big Data in their algorithms[2].
Big Data was initially defined as information assets with "3Vs": high volume, high
velocity and high variety, which stress the importance of the volume of the data, the speed
at which it is collected, stored and analyzed, and the diversity of data sources in generating
insights for better decision-making. This definition was then extended to include two additional Vs:
veracity, referring to the quality of the data, and value, referring to the usefulness of the data
(Laney, 2001; Frizzo-Barker et al., 2016). In the age of Big Data, relevant data, once defined
mainly as the payment history of borrowers, is now extended to include data from social
networks (Wei et al., 2016; Ge et al., 2017) and data from mobile phones and digital footprints
of users from smart apps (Jenkins, 2014; Dwoskin, 2015; Lohr, 2015)[3]. In fact, Wei et al.
(2016) and Kshetri (2016) show that Big Data enables creditworthiness assessment of
potential borrowers with limited financial history and thereby increases access to financial
services, particularly for low-income borrowers and micro-enterprises.
The authors would like to thank the anonymous referees for their constructive comments and
Journal of Financial Regulation
Vol. 26 No. 3, 2018
pp. 382-405
© Emerald Publishing Limited
DOI 10.1108/JFRC-06-2017-0054
Yet, the usage of Big Data and associated algorithms raises concerns about the enforcement and
adequacy of regulations that aim to prevent discriminatory scoring and to protect consumers'
privacy and their right to question their scores, such as the US Fair
Credit Reporting Act, the Equal Credit Opportunity Act, the Fair and Accurate Credit Transactions
Act (2003) and the Privacy Guidelines of the Organisation for Economic Co-operation and
Development (OECD) (Campbell-Verduyn et al., 2017). These algorithms are criticized for
being "black boxes" due to their opacity, for producing arbitrary results and for furthering
discrimination (Citron and Pasquale, 2014). Big Data also poses challenges to the privacy and
security of personal information, as revealed by the recent Equifax data breach, where
approximately 143 million Americans' personal data were stolen by hackers. In a recent
statement, Senator Mark Warner, the Senate Cybersecurity Caucus co-founder, called the
breach "a real threat to the economic security of Americans" and mentioned the need to
rethink data protection policies (Mathews, 2017).
The interplay between finance, technology and regulation is not new. The development of
information and communication technologies has contributed to financial innovation and the
globalization of financial services, accompanied by deregulation and re-regulation over time
(Cerny, 1994). In fact, Perez (2009, 2013) argues that technology revolutions create major
technology bubbles during the transition to the new paradigm. However, once the bubble
collapses, a golden age could be unleashed if the financial system is restructured accordingly and
institutional governance and regulations are adequately developed. Big Data is now
revolutionizing how financial services, particularly credit scoring, are created and delivered. The
very actors harnessing these new credit scoring technologies that use Big Data are banks, credit
bureaus, fintech companies and other non-bank financial service providers such as telecom
companies. While Big Data may enable these actors to develop more accurate algorithms to
assess creditworthiness, predict failure and develop tailored pricing and products/services, it
also brings challenges regarding data privacy and security, as in the example of
Equifax. However, there is a research-practice gap, as academic research in this field is scarce.
Accordingly, our study is motivated by the ongoing developments regarding new data sources,
technologies and regulations in the credit scoring field.
Our objective is to gain a better understanding of the main themes of credit scoring as
they relate to technological change and associated regulations over time. Accordingly, we
conducted a content analysis of "credit scoring" across the ProQuest and Emerald research
databases over the past 41 years (1976-2017). Content analysis is a systematic review of
literature to make valid inferences about texts for knowledge building (Weber, 1990;
Finfgeld-Connett, 2014). We reviewed 258 articles that appeared in 147 different
peer-reviewed academic journals from 1976 to 2017, ranging from law journals to
financial services, computer engineering and operations research journals. Our main
research questions are:
RQ1. What are the main research themes in the credit scoring literature?
RQ2. What is the direction and progression of credit scoring themes over time?
RQ3. What is the relative proportion of application, behavior and other scoring types?
RQ4. What types of statistical techniques and models are used in credit scoring?