Fake news detection: deep semantic representation with enhanced feature engineering.

URL:

https://doi.org/10.1145/1631272.1631393

DOI:

10.1145/1631272.1631393

Authors and Affiliations:

Mohammadreza Samadi, Computer Engineering Department, Amirkabir University of Technology, Tehran, Iran.

Saeedeh Momtazi, Computer Engineering Department, Amirkabir University of Technology, Tehran, Iran.

Abstract:

Due to the widespread use of social media, people are exposed to fake news and misinformation. The spread of fake news has adverse effects on both the general public and governments. This issue has motivated researchers to apply advanced natural language processing techniques to detect such misinformation on social media. Unlike recent studies that focus only on semantic features extracted by deep contextualized text representation models, we aim to show that content-based feature engineering can enhance semantic models in a complex task like fake news detection. These features provide valuable information about different aspects of the input text and help our neural classifier distinguish fake from real news more accurately than semantic features alone. To substantiate the effectiveness of feature engineering alongside semantic features, we propose a deep neural architecture in which three parallel convolutional neural network (CNN) layers extract semantic features from contextual representation vectors. The semantic and content-based features are then fed to a fully connected layer. We evaluated our model on an English dataset about the COVID-19 pandemic and a domain-independent Persian fake news dataset (TAJ). Our experiments on the English COVID-19 dataset show 4.16% and 4.02% improvements in accuracy and F1-score, respectively, over the baseline model, which does not use the content-based features. We also achieved 2.01% and 0.69% improvements in accuracy and F1-score, respectively, over the state-of-the-art results reported by Shifath et al. (A transformer based approach for fighting covid-19 fake news, arXiv preprint arXiv:2101.12027, 2021). On the TAJ dataset, our model outperformed the baseline by 1.89% in accuracy and 1.74% in F1-score. It also shows 2.13% and 1.6% improvements in accuracy and F1-score, respectively, over the state-of-the-art model proposed by Samadi et al. (ACM Trans Asian Low-Resour Lang Inf Process, https://doi.org/10.1145/3472620, 2021).
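The architecture described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the kernel widths (2, 3, 4), the filter count, the random weights, the random stand-in for contextual embeddings, and the three hand-crafted features in `content_features` are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows the data flow: three parallel convolution branches with max-pooling produce semantic features, which are concatenated with content-based features and passed to a single fully connected layer.

```python
import math
import random


def conv1d_maxpool(tokens, kernels, width):
    """Slide each kernel over `width` consecutive token vectors,
    apply ReLU, and max-pool over all positions."""
    pooled = []
    for kernel in kernels:  # kernel: `width` weight vectors of size dim
        best = 0.0  # ReLU floor, also the result if the text is too short
        for start in range(len(tokens) - width + 1):
            act = sum(
                w * x
                for offset in range(width)
                for w, x in zip(kernel[offset], tokens[start + offset])
            )
            best = max(best, act)
        pooled.append(best)
    return pooled


def content_features(text):
    """Hypothetical content-based features (assumed, not from the paper)."""
    n_chars = max(len(text), 1)
    return [
        len(text.split()) / 100.0,                    # scaled word count
        text.count("!") / n_chars,                    # exclamation ratio
        sum(ch.isupper() for ch in text) / n_chars,   # uppercase ratio
    ]


def init_kernels(rng, n_filters, width, dim):
    """Random kernels standing in for trained convolution weights."""
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(width)]
        for _ in range(n_filters)
    ]


def predict_fake_probability(tokens, text, branches, fc_weights, fc_bias):
    """Concatenate semantic and content features, then apply one dense layer."""
    semantic = []
    for width, kernels in branches:  # the three parallel CNN branches
        semantic.extend(conv1d_maxpool(tokens, kernels, width))
    features = semantic + content_features(text)
    logit = fc_bias + sum(w * f for w, f in zip(fc_weights, features))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


rng = random.Random(0)
dim, n_filters = 8, 4
text = "BREAKING!!! Miracle cure found"
# Stand-in for contextual embeddings: one random vector per token.
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in text.split()]
branches = [(w, init_kernels(rng, n_filters, w, dim)) for w in (2, 3, 4)]
fc_weights = [rng.uniform(-0.1, 0.1) for _ in range(3 * n_filters + 3)]
prob = predict_fake_probability(tokens, text, branches, fc_weights, 0.0)
print(f"fake probability (untrained weights): {prob:.3f}")
```

Because the weights are random, the printed probability carries no meaning; in a real system the embeddings would come from a contextual language model and all weights would be learned from labeled data.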
