A deep neural network-based approach for fake news detection in regional language

DOI: https://doi.org/10.1108/IJWIS-02-2022-0036
Published date: 27 July 2022
Pages: 286-309
Subject matter: Information & knowledge management, Information & communications technology, Information systems, Library & information science, Information behaviour & retrieval, Metadata, Internet
Authors: Piyush Katariya, Vedika Gupta, Rohan Arora, Adarsh Kumar, Shreya Dhingra, Qin Xin, Jude Hemanth
Piyush Katariya
Department of Computer Science and Engineering,
Bharati Vidyapeeth's College of Engineering, New Delhi, India
Vedika Gupta
Jindal Global Business School, O.P. Jindal Global University, Sonipat, India
Rohan Arora, Adarsh Kumar and Shreya Dhingra
Department of Computer Science and Engineering,
Bharati Vidyapeeth's College of Engineering, New Delhi, India
Qin Xin
Faculty of Science and Technology, University of Faroe Islands,
Vestarabrygga, Faroe Islands, and
Jude Hemanth
Department of Electronics and Communication Engineering, Karunya University,
Coimbatore, India
Abstract
Purpose: Current natural language processing algorithms still lack judgment criteria, and
these approaches often require deep knowledge of political or social contexts. The damage done by
the spread of fake news in various sectors has attracted the attention of several low-level regional
communities. However, such methods are widely developed for the English language, and low-resource
languages receive little attention. This study aims to provide an analysis of Hindi fake news and to
develop a referral system with advanced techniques to identify fake news in Hindi.
Design/methodology/approach: The technique deployed in this model uses bidirectional long
short-term memory (B-LSTM), compared with other models such as naïve Bayes, logistic regression,
random forest, support vector machine, decision tree classifier, k-nearest neighbor, gated recurrent
unit and long short-term memory models.
Findings: The B-LSTM deep learning model yields an accuracy of 95.01%.
Originality/value: This study anticipates that this model will be a beneficial resource for building
technologies to prevent the spread of fake news and will contribute to research on low-resource languages.
Keywords: Natural language processing, Fake news, Machine learning, Gated recurrent unit,
Bidirectional LSTM (bi-LSTM), Hyperparameters, Fine tuning
Paper type: Research paper
International Journal of Web Information Systems, Vol. 18 No. 5/6, 2022, pp. 286-309
© Emerald Publishing Limited, 1744-0084
Received 19 February 2022; revised 4 May 2022 and 9 June 2022; accepted 5 July 2022
The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/1744-0084.htm
1. Introduction
In earlier times, people got their daily news and analysis from sources such as
newspapers and television. With the arrival of the internet, a new era of news sharing
came into existence. Many people nowadays get news updates from online sources and
social media pages, which are now in trend, and it has often become hard to decide
whether the news we read is credible and legitimate (Gupta et al., 2016). In a democracy
like India (Ludden, 2005), the right to be informed is a basic human right. Seeking the
right kind of information is essential before forming opinions. False information or
misleading details can be disastrous in many respects. News channels, print media,
broadcast media (Gaurav et al., 2020) and social media (Gaurav et al., 2020) play
a major role in the spread of information and hence in forming the mindset of the nation.
Fake news (Quandt et al., 2019) refers to misleading information that can spread through
various modes and mediums. The news is called fake because it is generally spread to defame a
person or to circulate rumors that damage the reputation of an entity. The motive behind this is
to gain publicity or make money through advertising revenue. This can be disastrous to
society. Ensuring that everyone gets the right information is vital, as it is an important aspect
of the development of a healthy society. Fake news destroys credibility because it is built on
misleading facts and figures (Sharma and Sharma, 2022a). Real news benefits a person,
whereas fake news harms. One must be an informed consumer and an aware citizen to help in
the overall progress of the nation. Fake news can cause chaos all over the nation. In a country
like India, only about 125 million people (Parshad et al., 2016) know how to communicate in
English, which is about 10% of the total population. The majority of people read and
interact in their regional languages. These people often fall prey to misleading information. In
the natural language processing area, various initiatives have been taken to detect fake news in
several ways, ranging from language-based approaches to content-based verification.
Although such techniques have been predominantly developed for English, research in
low-resource languages like Hindi has been limited (Sharma and Garg, 2021).
About 57.09% of the total population of India (McKibben-Greene, 2020) consider themselves
native Hindi speakers. As a result, a large volume of fake and manipulative news now poses a
huge risk in regional languages. One of the major issues faced during the development of this
project was the lack of resources (Harish and Rangan, 2020) on Indian regional languages. This
was addressed by creating a data set manually. The next step was to find the most accurate
algorithm to use for fake news detection. The data available on social media is enormous, but it
is unlabeled and hence could not be used for training purposes. The need of the hour is to develop
a technique for managing the massive amount of fake and incorrect data present on the internet,
which is misleading people to a great extent. Several computational methods (Reis et al., 2019)
can be designed for detecting fake news in Indian regional languages. The analysis would have been
much easier if a labeled data set had been accessible to us. As fake news propagation can be
cross-lingual, it is best to have data sets available in a wide variety of languages (Sharma et al., 2022b).
This paper provides an analysis of Hindi fake news on a manually created data set using
various machine learning classification algorithms as well as deep learning NLP
techniques like long short-term memory (LSTM) and gated recurrent unit (GRU). In this
work, our focus shifts toward assembling feasible and reliable sources in the form of a
Hindi corpus for automatic fake news detection. Social media (Wang et al., 2019) adds fuel to
the fire by spreading ill-conceived news during important events, when public opinion
matters the most. Hence, through the analysis, our aim is to find a natural language
processing algorithm that can determine whether a source is trustworthy or politically
inclined, with or without human sentiments involved (Gupta et al., 2019). With large amounts
of data on the internet, checking the credibility of a source poses a huge challenge for us.
The main contributions of this work include a corpus for the Hindi language that can be
used for research and analysis on detecting fake news. It contains real news, labeled
as 1, and fake news, labeled as 0, and is collected from several credible and legitimate
news agencies and written by professional native Hindi-speaking journalists.
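As a rough illustration of the labeling convention above (real news as 1, fake news as 0), the following minimal sketch trains a multinomial naïve Bayes classifier, one of the baseline models named in the abstract, on a handful of labeled headlines. It is not the authors' implementation and not the B-LSTM model. The headlines in `TOY_CORPUS` are invented for illustration and are not drawn from the corpus, and the whitespace tokenizer, the add-one smoothing and all function names are assumptions.

```python
import math
from collections import Counter

# Label convention stated in the text: 1 = real news, 0 = fake news.
REAL, FAKE = 1, 0

# Hypothetical toy headlines, invented for illustration only;
# they are NOT taken from the authors' corpus.
TOY_CORPUS = [
    ("सरकार ने नई शिक्षा नीति की घोषणा की", REAL),
    ("मौसम विभाग ने भारी बारिश की चेतावनी दी", REAL),
    ("चमत्कारी पानी पीने से हर बीमारी ठीक", FAKE),
    ("चमत्कारी दवा से एक दिन में बीमारी ठीक", FAKE),
]


def tokenize(text):
    """Split on whitespace (an assumption; a real pipeline would likely do more)."""
    return text.split()


def train_naive_bayes(samples):
    """Collect per-class document counts, per-class word counts and the vocabulary."""
    doc_counts = Counter()
    word_counts = {REAL: Counter(), FAKE: Counter()}
    for text, label in samples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[REAL]) | set(word_counts[FAKE])
    return doc_counts, word_counts, vocab


def predict(text, model):
    """Return the label with the highest log-posterior under add-one smoothing."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (REAL, FAKE):
        # Log prior: fraction of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # words never seen in training are skipped
            # Log likelihood with add-one (Laplace) smoothing.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TOY_CORPUS)
```

Calling `predict(headline, model)` returns 1 or 0 under the same convention as the corpus labels. Four headlines are far too few to say anything about accuracy; the sketch only shows how labeled text of this kind feeds a classical baseline.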