PSO-Weighted Ensemble of Bags-of-Word and N-Gram Classifiers for YouTube Spam Detection
Authors
Universiti Teknikal Malaysia Melaka (Malaysia)
Universiti Teknikal Malaysia Melaka (Malaysia)
Universiti Teknikal Malaysia Melaka (Malaysia)
Universiti Teknikal Malaysia Melaka (Malaysia)
Universitas Bandar Lampung (Indonesia)
Article Information
DOI: 10.47772/IJRISS.2025.91100545
Subject Category: Machine Learning
Volume/Issue: 9/11 | Page No: 7010-7022
Publication Timeline
Submitted: 2025-12-09
Accepted: 2025-12-16
Published: 2025-12-23
Abstract
YouTube spam comments degrade the user experience and increase security and monetization risks, highlighting the need for resilient automated detection systems. YouTube spam detection has progressed from relying on single classifiers to incorporating ensemble-based systems. However, current YouTube spam ensemble systems typically train all base classifiers on homogeneous feature representations and rely on equal or fixed weighting schemes, which limits error diversity and prevents the ensemble from adapting to the varying strengths of individual models. This study proposed a Particle Swarm Optimization (PSO) weighted ensemble that combined multiple character n‑gram and bag-of-words (BoW) classifiers to build spam detection models. Six single classifiers using 1‑gram to 5‑gram character features and BoW features were combined into ensemble configurations with equal weighting and PSO‑optimized weighting, then evaluated on five YouTube spam datasets: Eminem, Katy Perry, LMFAO, Psy, and Shakira. Results demonstrated that PSO‑weighted ensembles consistently outperformed the best single classifier on every dataset, with improvements ranging from 1.0 to 1.5 percentage points and accuracies from 91.65% to 96.79%. The ensemble combining all n‑gram classifiers and the BoW classifier with PSO‑optimized weights delivered robust performance across all datasets, with PSO gains over equal weighting of 0.2 to 0.7 percentage points. These findings confirmed that combining character n‑gram and BoW features captured complementary spam patterns, and that PSO‑based weighting provided an adaptive mechanism for classifier integration. The proposed approach offered a generalizable solution for automated spam detection across diverse YouTube comments and social media platforms without extensive manual tuning.
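The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`char_ngrams`, `ensemble_accuracy`, `pso_weights`), the PSO settings (inertia 0.7, acceleration coefficients 1.5, 20 particles, 50 iterations), the use of accuracy as the fitness function, the 0.5 decision threshold, and the toy probabilities are all assumptions made for illustration. Each base classifier is assumed to output a spam probability per comment, and PSO searches for the soft-voting weights that maximize accuracy on held-out data.

```python
import random


def char_ngrams(text, n):
    """Return the character n-grams of `text` (empty list if text is shorter than n)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ensemble_accuracy(weights, probs, labels):
    """Accuracy of weighted soft voting.

    probs[k][i] is classifier k's spam probability for comment i;
    labels[i] is 1 for spam and 0 for ham.
    """
    total = sum(weights)
    if total <= 0:
        return 0.0  # an all-zero weight vector cannot vote
    correct = 0
    for i, y in enumerate(labels):
        score = sum(w * p[i] for w, p in zip(weights, probs)) / total
        correct += int((score >= 0.5) == bool(y))
    return correct / len(labels)


def pso_weights(probs, labels, n_particles=20, n_iters=50,
                inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for classifier weights in [0, 1] with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(probs)
    # Assumption: one particle starts at equal weights, so the result is
    # never worse than equal weighting on the data used for fitness.
    pos = [[1.0] * dim] + [[rng.random() for _ in range(dim)]
                           for _ in range(n_particles - 1)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [ensemble_accuracy(p, probs, labels) for p in pos]
    g = max(range(n_particles), key=lambda j: pbest_fit[j])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for j in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[j][d] = (inertia * vel[j][d]
                             + c1 * r1 * (pbest[j][d] - pos[j][d])
                             + c2 * r2 * (gbest[d] - pos[j][d]))
                # Clamp each weight to the [0, 1] search range.
                pos[j][d] = min(1.0, max(0.0, pos[j][d] + vel[j][d]))
            fit = ensemble_accuracy(pos[j], probs, labels)
            if fit > pbest_fit[j]:
                pbest[j], pbest_fit[j] = pos[j][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[j][:], fit

    total = sum(gbest)
    return [w / total for w in gbest], gbest_fit


# Toy validation data (made up): three base classifiers, eight comments.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_probs = [
    [0.9, 0.4, 0.8, 0.2, 0.6, 0.1, 0.3, 0.4],  # e.g. a 2-gram classifier
    [0.7, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1],  # e.g. a 3-gram classifier
    [0.6, 0.7, 0.9, 0.7, 0.3, 0.2, 0.6, 0.3],  # e.g. a BoW classifier
]

print(char_ngrams("free $$$", 3))
weights, accuracy = pso_weights(toy_probs, toy_labels)
print("equal-weight accuracy:", ensemble_accuracy([1.0] * 3, toy_probs, toy_labels))
print("PSO weights:", weights, "accuracy:", accuracy)
```

In practice the fitness would be computed on a validation split separate from the test set, so that the reported gain of PSO weighting over equal weighting is not inflated by fitting the weights to the evaluation data.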
Keywords
Ensemble learning, N-gram features, Particle Swarm Optimization