A Comparison of Machine Learning Classifiers for Fake News Identification Using the ISOT Dataset: XGBoost and Random Forest Achieve 100% Accuracy
Authors
Department of Engineering and Informatics, University of Bradford (United Kingdom)
Article Information
DOI: 10.51244/IJRSI.2025.1213CS005
Subject Category: Computer Science
Volume/Issue: 12/13 | Page No: 48-59
Publication Timeline
Submitted: 2025-09-21
Accepted: 2025-09-28
Published: 2025-10-31
Abstract
The rapid spread of misinformation online threatens public trust and the credibility of information. This paper compares four supervised machine learning models, Logistic Regression, Support Vector Machine (SVM), Random Forest, and eXtreme Gradient Boosting (XGBoost), on the binary classification of real and fake news using the ISOT Fake News Dataset. The dataset contains 44,898 news articles drawn from trusted websites and fact-checking websites. After rigorous preprocessing, XGBoost, Random Forest, and SVM achieved 100% accuracy both in cross-validation and on the test set, while Logistic Regression achieved 99.16%. These figures exceed previously reported results on the same dataset, including deep learning methods such as CNN-RNN (99.7%) and Bi-LSTM (99.95%). The work shows that carefully designed traditional machine learning models can outperform sophisticated deep learning architectures for fake news detection when applied to high-quality, balanced datasets. The results support the use of ensemble and kernel-based techniques in interpretable, scalable, and high-accuracy misinformation detection systems. This study contributes to the literature on computational journalism, big data analytics, and artificial intelligence by demonstrating the effectiveness of non-deep-learning methods in curbing digital disinformation.
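The abstract reports accuracy under cross-validation but does not state the features, the number of folds, or the hyperparameters used. The evaluation pattern can be sketched for one of the four models, logistic regression, as follows. This is a minimal illustration and not the authors' pipeline: the bag-of-words features, the three stratified folds, the training settings, the tiny made-up corpus, and every function name are assumptions introduced here. In practice a library such as scikit-learn or XGBoost would likely replace the hand-written model.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus (NOT the ISOT data): label 0 = real, 1 = fake.
TEXTS = [
    "officials confirm budget vote scheduled for tuesday",
    "ministry publishes quarterly trade figures",
    "court upholds ruling in transport case",
    "council approves funding for new school",
    "central bank holds interest rate steady",
    "senate committee schedules hearing on trade",
    "shocking secret they do not want you to know",
    "you will not believe this shocking miracle cure",
    "secret plot exposed share before it is deleted",
    "miracle trick doctors do not want you to know",
    "unbelievable shocking truth exposed share now",
    "they are hiding this secret miracle share now",
]
LABELS = [0] * 6 + [1] * 6


def tokenize(text):
    return text.lower().split()


def featurize(text, vocab):
    """Sparse bag-of-words counts; words outside the vocabulary are dropped."""
    counts = Counter(tokenize(text))
    return {vocab[w]: c for w, c in counts.items() if w in vocab}


def score(model, x):
    weights, bias = model
    return bias + sum(weights[i] * v for i, v in x.items())


def train_logreg(X, y, n_features, epochs=100, lr=0.1):
    """Logistic regression fitted by plain stochastic gradient descent."""
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            z = max(-30.0, min(30.0, score((weights, bias), x)))  # avoid overflow
            grad = 1.0 / (1.0 + math.exp(-z)) - label
            for i, v in x.items():
                weights[i] -= lr * grad * v
            bias -= lr * grad
    return weights, bias


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds that keep the class balance roughly equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def fit_and_score(train_texts, train_y, test_texts, test_y):
    # The vocabulary is built from the training fold only, so no
    # information from the held-out fold leaks into the features.
    vocab = {}
    for text in train_texts:
        for word in tokenize(text):
            vocab.setdefault(word, len(vocab))
    model = train_logreg([featurize(t, vocab) for t in train_texts],
                         train_y, len(vocab))
    preds = [1 if score(model, featurize(t, vocab)) > 0 else 0
             for t in test_texts]
    return sum(p == y for p, y in zip(preds, test_y)) / len(test_y)


def cross_val_accuracy(texts, labels, k=3):
    """Return one accuracy value per held-out fold."""
    scores = []
    for held_out in stratified_folds(labels, k):
        held = set(held_out)
        train_idx = [i for i in range(len(texts)) if i not in held]
        scores.append(fit_and_score(
            [texts[i] for i in train_idx], [labels[i] for i in train_idx],
            [texts[i] for i in held_out], [labels[i] for i in held_out]))
    return scores


if __name__ == "__main__":
    fold_scores = cross_val_accuracy(TEXTS, LABELS, k=3)
    print("per-fold accuracy:", fold_scores)
    print("mean accuracy:", sum(fold_scores) / len(fold_scores))
```

Each fold's vocabulary and weights are fitted only on that fold's training portion, which is what makes the per-fold accuracy an estimate of performance on unseen articles. The same loop could wrap any of the other three classifiers by swapping out the training and scoring functions.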
Keywords
Machine learning, Fake news detection, XGBoost, Random Forest, ISOT dataset, Text classification, NLP, Supervised learning, Big data analytics, Misinformation, Social media