Predictive Maintenance in Semiconductor Manufacturing Using Machine Learning on Imbalanced Dataset
Authors
Department of Computer Science, National Research University Higher School of Economics, Moscow (Russia)
Department of Information Security & Artificial Intelligence, National Research University Higher School of Economics, Moscow (Russia)
Department of Computer Software Engineering, University of Engineering and Technology (Pakistan)
Article Information
DOI: 10.51244/IJRSI.2025.1210000018
Subject Category: Machine Learning
Volume/Issue: 12/10 | Page No: 170-176
Publication Timeline
Submitted: 2025-09-14
Accepted: 2025-10-22
Published: 2025-10-28
Abstract
Semiconductor manufacturing produces complex, high-dimensional datasets that consist mostly of normal operational records, with product failures appearing in only a small fraction of them. Several studies apply machine learning algorithms to predictive maintenance, but very few address the class imbalance of the SECOM dataset, in which up to 93% of outcomes are successful. This paper describes that research gap and presents an integrated approach combining innovative feature reduction, oversampling algorithms, and model optimization methods. In our experiments on the SECOM semiconductor manufacturing process dataset, the initial 591 features were reduced to 63 and then processed by PCA; the Support Vector Classifier (SVC) produced the most accurate results, at 98.6% accuracy, while maintaining robust calibration. The visualizations include a correlation heatmap showing related features and pie charts showing the class distribution before and after data balancing. The paper closes with implications for predictive maintenance in semiconductor fabs and recommendations for future work.
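To illustrate why the imbalance described in the abstract matters, the following minimal sketch uses hypothetical toy data (930 passes and 70 failures, chosen only to mirror the 93% figure) and is not the authors' pipeline. It shows that a classifier which always predicts "pass" reaches 93% accuracy while detecting no failures, and then balances the classes with a simplified SMOTE-style interpolation. The abstract does not name the oversampling algorithm used, so that choice, along with the function names and the two-feature toy data, is an assumption made for illustration.

```python
import random


def majority_baseline_metrics(labels):
    """Accuracy and failure recall of a classifier that always predicts pass (0)."""
    n = len(labels)
    failures = sum(1 for y in labels if y == 1)
    accuracy = (n - failures) / n  # every pass is "correct", every failure is missed
    recall = 0.0                   # no failure is ever predicted
    return accuracy, recall, failures


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Simplified SMOTE-style oversampling (illustrative, not a library API).

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical toy labels: 0 = pass, 1 = failure
labels = [0] * 930 + [1] * 70
accuracy, recall, failures = majority_baseline_metrics(labels)

# Hypothetical two-feature measurements for the failure class
data_rng = random.Random(42)
minority = [(data_rng.gauss(2.0, 0.5), data_rng.gauss(-1.0, 0.5)) for _ in range(failures)]

# Generate enough synthetic failures to match the 930 passes
synthetic = smote_like_oversample(minority, n_new=930 - failures)
balanced_minority_count = len(minority) + len(synthetic)

print(f"baseline accuracy={accuracy:.2f}, failure recall={recall:.2f}")
print(f"failure samples after oversampling: {balanced_minority_count}")
```

In practice, oversampling of this kind would normally be applied to the training split only, so that synthetic points do not leak into the evaluation data, and metrics such as recall on the failure class would be reported alongside accuracy.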
Keywords
Predictive Maintenance, Semiconductor Manufacturing, SECOM Dataset