Embedding Learning for Unsupervised Breast Cancer Images Clustering

Authors

Andriamasinoro Rahajaniaina

Department of Mathematics, Computer Science and Applications, University of Toamasina, Toamasina (Madagascar)

Adolphe Andriamanga Ratiarison

Department of Physics and Applications, University of Antananarivo, Antananarivo (Madagascar)

Article Information

DOI: 10.51584/IJRIAS.2026.110400082

Subject Category: Computer Science

Volume/Issue: 11/4 | Page No: 1172-1179

Publication Timeline

Submitted: 2026-04-14

Accepted: 2026-04-19

Published: 2026-05-08

Abstract

Early detection of breast cancer significantly reduces the number of deaths caused by the disease. In Africa, the number of new cases and deaths is constantly increasing, yet for Madagascar very little information is available on the number of people affected. Advances in the application of artificial intelligence to medicine are improving detection techniques, but most of these techniques are cumbersome, complex and very expensive. In this work, we propose a lightweight hybrid approach to clustering breast cancer images that combines deep learning, ArcFace and unsupervised clustering. The architecture uses the MobileNetV3Small convolutional network as a feature extractor. A projection head added at the output of this backbone transforms the feature maps into a compact embedding vector. The goal is to project the data into a low-dimensional (64-dimensional) latent space in which the properties that discriminate between classes are strengthened. ArcFace improves intra-class compactness and inter-class separability, enhancing the quality of the learned representations. Training proceeds in two phases: first, only the projection layers and the ArcFace layer are trained while the backbone remains frozen, which stabilizes learning; then the final layers of the convolutional network are unfrozen for partial fine-tuning. Principal Component Analysis (PCA) is used to project the embeddings into a lower-dimensional space while preserving most of the discriminating information. A comparative study evaluated the clustering capabilities of K-Means and HDBSCAN, and K-Means gave the best results on all metrics used. Despite its light weight (3.6 GFLOPs), our model achieved performance comparable to other state-of-the-art approaches.
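The abstract gives no implementation details, so the following is only a minimal NumPy sketch, not the authors' code, of two pieces it describes: the ArcFace additive angular margin applied to L2-normalised embeddings, and the clustering of learned embeddings with PCA followed by K-Means. The scale `s = 30`, the margin `m = 0.5`, the two PCA components, `k = 2` and the synthetic 64-dimensional embeddings (standing in for projection-head outputs in a hypothetical two-class setting) are all assumptions. The MobileNetV3Small backbone, the projection head, the two-phase fine-tuning and HDBSCAN are not reproduced.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along `axis`."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def arcface_logits(embeddings, class_weights, labels, s=30.0, m=0.5):
    """Additive angular margin logits: s*cos(theta_y + m) for the true class,
    s*cos(theta_j) for every other class. s and m are assumed, illustrative values."""
    emb = l2_normalize(embeddings, axis=1)            # (N, D) unit embeddings
    w = l2_normalize(class_weights, axis=0)           # (D, C) unit class centres
    cos = np.clip(emb @ w, -1.0, 1.0)                 # (N, C) cosine similarities
    theta = np.arccos(cos)
    # Simplification: reference ArcFace code also handles theta + m > pi,
    # where cos(theta + m) stops decreasing; that fallback is omitted here.
    target = np.cos(theta + m)
    onehot = np.eye(w.shape[1])[labels]               # (N, C) true-class mask
    return s * (onehot * target + (1.0 - onehot) * cos)


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed stably in log space."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def pca(x, n_components):
    """Project centred data onto its top principal directions (via SVD)."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_components].T


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's K-Means with initial centres drawn at random from the data."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centre: (N, k)
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep a centre in place if its cluster became empty
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical 64-d embeddings for two classes (synthetic, not real data).
    class_centres = rng.normal(size=(2, 64)) * 5.0
    y = np.repeat([0, 1], 50)
    emb = class_centres[y] + rng.normal(size=(100, 64))

    w = rng.normal(size=(64, 2))
    plain = 30.0 * l2_normalize(emb, 1) @ l2_normalize(w, 0)
    print("plain softmax loss:", cross_entropy(plain, y))
    print("ArcFace loss      :", cross_entropy(arcface_logits(emb, w, y), y))

    z = pca(emb, n_components=2)
    pred, _ = kmeans(z, k=2)
    # Cluster IDs are arbitrary, so take the better of the two label mappings.
    agreement = max(np.mean(pred == y), np.mean(pred != y))
    print("cluster/label agreement:", agreement)
```

The margin lowers the true-class logit during training, which pushes embeddings of the same class closer to their class centre; at clustering time only the embeddings are used and the labels serve solely for evaluation. With more than two clusters, a permutation-invariant score such as the adjusted Rand index is a more practical agreement measure than the two-mapping check above.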

Keywords

Embedding learning, Unsupervised Clustering
