Leveraging Big Data Analytics for Combating Fake News: A Supervised Learning Approach to Identifying Misinformation on Social Media
Keywords:
Fake news detection, Big data analytics, Machine learning, Natural language processing (NLP), Social media, XGBoost, Random Forest, Text classification, Misinformation
Abstract
The rapid rise of social media has transformed how people consume and share information but has also accelerated the spread of misinformation that undermines public trust, public health, and democratic stability. Manual fact-checking and platform moderation often lag behind the speed of misinformation, highlighting the need for scalable, automated solutions. This study develops a supervised machine learning framework supported by Big Data analytics for fake news detection. Using the ISOT Fake News Dataset of 44,898 labeled articles, we implemented a structured pipeline that included text normalization, tokenization, stopword removal, stemming, and TF-IDF vectorization, followed by training four classifiers: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost). Evaluation was conducted using a stratified 80/20 train-test split with 10-fold cross-validation, applying Accuracy, Precision, Recall, and F1-score as performance metrics. Results show that ensemble models, particularly XGBoost and Random Forest, consistently outperformed LR and SVM, achieving accuracies near 99% with strong precision and recall across both classes. These findings demonstrate the strength of optimized ensemble methods in detecting misinformation and their scalability for real-world application. Beyond model performance, this work proposes a distributed architecture leveraging Apache Spark for real-time deployment, providing a foundation for practical and scalable misinformation detection systems.
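The preprocessing, TF-IDF weighting, and evaluation metrics named in the abstract can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation: the stopword list, the `crude_stem` suffix stripper (a stand-in for a real stemmer), the smoothed weighting formula, and the choice of label `1` as the positive class are all assumptions made for illustration. The four classifiers themselves are omitted; in practice a library such as scikit-learn or XGBoost would supply the vectorizer and the models.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; real pipelines use a fuller one).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "that", "on", "for"}


def crude_stem(token):
    """Strip a few common suffixes; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on letters, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = relative term frequency * (ln(N / df) + 1), one common variant
    of TF-IDF (assumed here; the source does not state which variant it used).
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append(
            {t: (c / total) * (math.log(n_docs / df[t]) + 1.0) for t, c in counts.items()}
        )
    return vectors


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Calling `tfidf` on a list of article texts yields sparse term-weight dictionaries that a classifier could consume, and `binary_metrics` can be run once per class (by changing `positive`) to report precision and recall for both the fake and the real class, as the abstract describes.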
License
Copyright (c) 2025 Kehinde Racheal Ilugbiyin, Damilola Nnamaka Ajobiewe

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.