A Self-Evolving Transformer-Based Machine Vision Framework for Adaptive Industrial Defect Diagnosis under Non-Stationary Environments

Authors

Vincent Kibet

Higher Education Leadership Institute Australia Masters of Research (Australia)

Article Information

DOI: 10.47772/IJRISS.2026.100300271

Subject Category: Science

Volume/Issue: 10/3 | Page No: 3636-3660

Publication Timeline

Submitted: 2026-03-16

Accepted: 2026-03-21

Published: 2026-04-03

Abstract

Industrial defect detection systems usually struggle in non-stationary production processes, where lighting, material characteristics, and defect patterns change constantly. Standard Convolutional Neural Network (CNN)-based systems cannot cope with such distribution shifts without retraining and manual re-labelling of the data. This study presented a self-evolving, transformer-based machine vision framework that adapted to environmental changes through continuous meta-learning and uncertainty-based pseudo-labelling. The framework integrated six core components: a Vision Transformer (ViT) backbone that learned multi-scale features, an adaptive memory module for episodic defect pattern storage, a distribution shift detector based on Maximum Mean Discrepancy (MMD), a meta-learning engine based on the Model-Agnostic Meta-Learning (MAML) algorithm, a self-supervised evolution mechanism coupled with confidence-driven sample selection, and an uncertainty quantification module that used Monte Carlo Dropout. With only 10 labelled samples, the proposed framework achieved a precision of 94.7% across three industrial datasets (steel surface defects, semiconductor wafer inspection, and textile fabric anomaly), 8.3%-12.6% higher than state-of-the-art methods. The system remained robust under extreme lighting conditions (96.2%) and adapted to new defect types within 45 minutes without interrupting the production line. Relative to ResNet-50, the architecture was reported as 47 times better on false positives, and it ran at 42 FPS on edge devices, making real-time industrial deployment feasible. The self-evolving mechanism enabled continuous improvement: 89.4% of pseudo-labels attained a confidence level above 95%, indicating that the framework does not require constant human supervision.
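To illustrate two of the components named in the abstract, the following is a minimal sketch of MMD-based shift detection and confidence-driven pseudo-label selection. It is not the authors' implementation: the function names, the RBF kernel, the `gamma` value, and the use of averaged stochastic forward passes as a stand-in for Monte Carlo Dropout output are all assumptions made for illustration. Only the 95% confidence threshold comes from the abstract.

```python
import math
from statistics import mean


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors.

    A value near zero suggests both sets come from the same distribution;
    a larger value suggests a distribution shift.
    """
    k_ss = mean(rbf_kernel(a, b, gamma) for a in source for b in source)
    k_tt = mean(rbf_kernel(a, b, gamma) for a in target for b in target)
    k_st = mean(rbf_kernel(a, b, gamma) for a in source for b in target)
    return k_ss + k_tt - 2.0 * k_st


def select_pseudo_labels(mc_probs, threshold=0.95):
    """Keep samples whose averaged class probability exceeds the threshold.

    mc_probs[i][t][c] is the probability of class c for sample i on
    stochastic forward pass t (hypothetical Monte Carlo Dropout output).
    Returns a list of (sample_index, class_index, confidence) tuples.
    """
    selected = []
    for i, passes in enumerate(mc_probs):
        n_classes = len(passes[0])
        avg = [mean(p[c] for p in passes) for c in range(n_classes)]
        label = max(range(n_classes), key=avg.__getitem__)
        if avg[label] > threshold:
            selected.append((i, label, avg[label]))
    return selected


# Hypothetical feature vectors: a reference batch and a shifted batch.
reference = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
shifted = [[2.0, 2.0], [2.1, 2.0], [2.0, 2.1]]
shift_score = mmd_squared(reference, shifted)

# Hypothetical predictions: two samples, two stochastic passes, two classes.
mc_probs = [
    [[0.97, 0.03], [0.99, 0.01]],  # consistently confident
    [[0.60, 0.40], [0.70, 0.30]],  # uncertain, should be rejected
]
pseudo_labels = select_pseudo_labels(mc_probs)
```

In a setup like this, a deployment might compare `shift_score` against a calibrated threshold to decide when adaptation should be triggered, and pass only the samples in `pseudo_labels` to the self-supervised update. How the framework actually combines these signals is not specified in the abstract.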

Keywords

Defect Detection, Vision Transformer, Meta-Learning, Non-Stationary Environments
