Open problems in medical federated learning
DOI | https://doi.org/10.1108/IJWIS-04-2022-0080 |
Published date | 20 September 2022 |
Pages | 77-99 |
Subject Matter | Information & knowledge management,Information & communications technology,Information systems,Library & information science,Information behaviour & retrieval,Metadata,Internet |
Author | Joo Hun Yoo,Hyejun Jeong,Jaehyeok Lee,Tai-Myoung Chung |
Joo Hun Yoo
Department of Artificial Intelligence, College of Computing and Informatics,
Sungkyunkwan University, Suwon, Republic of Korea, and
Hyejun Jeong, Jaehyeok Lee and Tai-Myoung Chung
Department of Computer Science and Engineering, College of Computing and
Informatics, Sungkyunkwan University, Suwon, Republic of Korea
Abstract
Purpose – This study aims to summarize the critical issues in medical federated learning and
applicable solutions. It also explains in detail how federated learning techniques can be
applied to the medical field. About 80 reference studies in the field were
reviewed, and the federated learning framework currently being developed by the research team is
presented. This paper will help researchers build an actual medical federated learning
environment.
Design/methodology/approach – Since machine learning techniques emerged, large amounts of data
can be analyzed more efficiently. However, data regulations have been tightened worldwide, and the
use of centralized machine learning methods has become almost infeasible. Federated learning techniques
have been introduced as a solution. Despite their powerful structural advantages, unsolved
challenges remain for federated learning in a real medical data environment. This paper summarizes those
challenges by category and presents possible solutions.
Findings – This paper identifies four critical categories of issues to be aware of when applying
federated learning to an actual medical data environment, then provides general guidelines for building a
federated learning environment as a solution.
Originality/value – Existing studies have dealt with issues such as heterogeneity in the
federated learning environment itself, but they did not explain how these issues cause problems in actual
working tasks. Therefore, this paper helps researchers understand federated learning issues through
examples of actual medical machine learning environments.
Keywords Heterogeneity, Data security, Data privacy, Federated learning, Incentive mechanism,
Medical application
Paper type Research paper
© Joo Hun Yoo, Hyejun Jeong, Jaehyeok Lee and Tai-Myoung Chung. Published by Emerald
Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0)
licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for
both commercial and non-commercial purposes), subject to full attribution to the original publication
and authors. The full terms of this licence may be seen at
http://creativecommons.org/licences/by/4.0/legalcode
This research is supported by an Institute of Information & Communications Technology Planning &
Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2019-0-00421, AI Graduate School
Support Program (Sungkyunkwan University)) and by an Institute of Information & Communications
Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2020-0-00990,
Platform Development and Proof High Trust & Low Latency Processing for Heterogeneous·Atypical·Large
Scaled Data in 5G-IoT Environment).
Received 15 April 2022
Revised 14 June 2022
Accepted 14 June 2022
International Journal of Web
Information Systems
Vol. 18 No. 2/3, 2022
pp. 77-99
Emerald Publishing Limited
1744-0084
DOI 10.1108/IJWIS-04-2022-0080
The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/1744-0084.htm
1. Introduction
Machine learning has been widely studied in various research fields for its powerful
performance in data analysis. By learning hidden multi-dimensional characteristics of data that
are difficult for humans to distinguish, machine learning methods have made it possible to derive
better results. This capability has been especially helpful in medical imaging, where capturing
fine features in images is crucial, and it has strengthened existing diagnostic approaches. For
example, support vector machines, deep neural networks, convolutional neural networks and
clustering techniques have been applied in the medical field to find correlations in medical data
that humans cannot readily identify.
Through the active use of machine learning approaches, medical research has extended these
methods to specific fields such as radiology, pathology, neuroscience, genetics and even
mental disorders. However, the biggest issue in medical artificial intelligence (AI) is
not the accuracy of diagnosis but the protection of patients’ personal information.
Federated learning, a machine learning approach built on a distributed data
environment, has emerged as data regulation laws around the world have become stricter. When the
concept of federated learning was first introduced, data privacy regulations such as the EU’s
General Data Protection Regulation, California’s Privacy Rights Act and China’s
Personal Information Protection Law were the representative rules, but now more countries around the
world are implementing regulations, such as Brazil’s Lei Geral de Proteção de Dados,
Canada’s Digital Charter Implementation Act and Singapore’s Personal Data Protection Act, to
protect their citizens’ personal information. Thus, centralized machine learning methods, which
collect a sufficient amount of data in one place and learn from it, are no longer applicable under
personal data protection regulations. In particular, for medical data, researchers and business
providers should follow the US Health Insurance Portability and Accountability Act (HIPAA),
which comprehensively protects the medical records and individually identifiable health
information of patients and medical information providers. As data regulations multiply,
researchers have applied various solutions to prevent invasion of privacy.
First, the most common solution to data privacy issues is to process the variables that can
identify individual users when collecting and importing their data. In ordinary data
environments, primary information leakage can be prevented through measures such as secure
aggregation, pseudonymization, data reduction, data suppression and data masking. However,
these security methods are difficult to apply to personal health information (PHI), as it
contains information in any format that can identify the data owner. PHI is a
broader concept than personally identifiable information (PII): on top of basic user variables,
it includes sensitive information such as health insurance records, medical numbers, health
status, medical images and mental health records. Therefore, data security
methods alone can hardly protect PHI completely. To use medical data
efficiently and safely, structural solutions are required that allow learning without violations.
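
As a rough illustration of the field-level measures named above, the following Python sketch applies suppression, pseudonymization and masking to a single record. It is a minimal sketch under stated assumptions, not a method from this paper: the field names (name, insurance_id, phone, diagnosis), the helper functions and the choice of a keyed hash (HMAC-SHA256) for pseudonymization are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex chars."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask(value: str, visible: int = 2) -> str:
    """Keep only the last `visible` characters and replace the rest with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Apply per-field measures; the field names are hypothetical examples."""
    out = {}
    for field, value in record.items():
        if field == "name":
            continue  # suppression: drop the field entirely
        if field == "insurance_id":
            out[field] = pseudonymize(value, secret_key)  # pseudonymization
        elif field == "phone":
            out[field] = mask(value)  # masking
        else:
            out[field] = value  # clinical content passes through unchanged
    return out


record = {
    "name": "Jane Doe",
    "insurance_id": "A123",
    "phone": "0101234567",
    "diagnosis": "recurrent depressive disorder",
}
print(deidentify(record, secret_key=b"hypothetical-key"))
```

Note that the diagnosis field passes through untouched. This is the limitation the paragraph points to: in PHI the clinical content itself can identify the data owner, so field-level processing alone is not sufficient.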
Federated learning is a structural solution to the data privacy violations
of existing machine learning methods. Its structure and characteristics differ
from those of centralized machine learning. Traditional machine learning approaches require
a large volume of training data to be collected from local data owners onto the server for model
generation. Federated learning, a decentralized learning structure, generates and develops
deep neural network models without collecting local data on the server. The core concepts
used in the neural network model learning process are as follows.
Individual clients’ data stored in the local environment does not move; the server
generates the initial training model and delivers it to each participating client. The transferred
initial training model is then updated by training within each
client’s data environment, and the server collects all of the corresponding results to create a
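
The exchange described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the toy linear model, the names local_update, aggregate and federated_round, the learning rate and the sample-weighted averaging rule (in the style of federated averaging) are not taken from this paper, which does not specify at this point how the server combines the client results.

```python
from typing import List, Tuple

Weights = List[float]                # [slope, intercept] of a toy linear model
Dataset = List[Tuple[float, float]]  # (x, y) pairs that stay on the client


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 5) -> Weights:
    """Client side: refine the received model on local data by gradient descent."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def aggregate(updates: List[Weights], sizes: List[int]) -> Weights:
    """Server side: average client models, weighted by local sample count (assumed rule)."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total for i in range(dim)]


def federated_round(global_weights: Weights, clients: List[Dataset]) -> Weights:
    """One round: send the model out, train locally, collect only the updated weights."""
    updates = [local_update(list(global_weights), data) for data in clients]
    sizes = [len(data) for data in clients]
    return aggregate(updates, sizes)


# Two hypothetical clients whose private data follow y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (-1.0, -1.0)],
]
model = [0.0, 0.0]  # initial model generated by the server
for _ in range(200):
    model = federated_round(model, clients)
print(model)
```

Only model weights cross the client boundary in this sketch; the (x, y) pairs are read solely inside local_update, which mirrors the point that client data does not move.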