Comparison of Similarity Distance-Based Metrics for HODA and BANGLA Dataset for Enhanced Precision

Authors

Mgd Maaz Taha Yassin

Universiti Teknikal Malaysia Melaka (UTeM) (Malaysia)

Amirul Ramzani Radzid

Universiti Teknikal Malaysia Melaka (UTeM) (Malaysia)

Mohd Sanusi Azmi

Universiti Teknikal Malaysia Melaka (UTeM) (Malaysia)

Nur Atikah Arbain

Universiti Teknikal Malaysia Melaka (UTeM) (Malaysia)

Article Information

DOI: 10.47772/IJRISS.2025.91200013

Subject Category: Computer Science and Smart Tourism

Volume/Issue: 9/12 | Page No: 141-150

Publication Timeline

Submitted: 2025-12-09

Accepted: 2025-12-16

Published: 2025-12-30

Abstract

A similarity metric measures the degree of similarity between two objects or pieces of data. Such metrics are essential in many areas of study, including data analysis, machine learning, and image processing, because they provide a way to compare and evaluate different entities. They can be categorized into distance-based and similarity-based approaches, each with its own strengths and applications. This study compares the effect of various distance metrics on image classification performance using the HODA and Bangla handwritten digit datasets. Eight distance measures, namely Euclidean, Manhattan, Chebyshev, Canberra, Cosine, Minkowski, Jaccard, and Sørensen, are evaluated within the Mean Average Precision (MAP) framework to assess their effectiveness for handwritten digit recognition. Experimental results show that Chebyshev distance produces the highest classification accuracy, 71.6%, on the HODA dataset, while Euclidean distance achieves the best performance on the Bangla dataset, with 70.7% accuracy. In addition to the quantitative analysis, a user study based on a structured questionnaire was conducted to qualitatively verify the MAP-based evaluation methodology, and the user evaluations reinforce the empirical findings. The study therefore underlines the importance of choosing a distance metric adapted to the specific properties of the dataset, highlighting its role in improving the performance of pattern recognition systems in computer vision applications.
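
To make the comparison concrete, the following is a minimal Python sketch of the eight distance measures named in the abstract, together with a simple retrieval-style MAP computation. It is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the Minkowski order p = 3 is an arbitrary choice, Jaccard is taken in its weighted (min/max) form and Sørensen in its Bray-Curtis form for non-negative feature vectors, and the feature extraction used in the study is not reproduced.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p=3):
    # Assumption: p = 3 is illustrative; p = 1 and p = 2 reduce to
    # Manhattan and Euclidean respectively.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def canberra(a, b):
    # Terms where both coordinates are zero are skipped (0/0 convention).
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # assumption: a zero vector is maximally distant
    return 1.0 - dot / (na * nb)

def jaccard(a, b):
    # Assumption: weighted Jaccard distance for non-negative features.
    hi = sum(max(x, y) for x, y in zip(a, b))
    lo = sum(min(x, y) for x, y in zip(a, b))
    return 0.0 if hi == 0 else 1.0 - lo / hi

def sorensen(a, b):
    # Assumption: Bray-Curtis form for non-negative features.
    denom = sum(x + y for x, y in zip(a, b))
    return 0.0 if denom == 0 else manhattan(a, b) / denom

def average_precision(relevance):
    # relevance: booleans in ranked order (closest gallery item first).
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0

def mean_average_precision(queries, gallery, distance):
    # queries, gallery: lists of (feature_vector, label) pairs.
    aps = []
    for q_vec, q_label in queries:
        ranked = sorted(gallery, key=lambda item: distance(q_vec, item[0]))
        aps.append(average_precision([lab == q_label for _, lab in ranked]))
    return sum(aps) / len(aps)

if __name__ == "__main__":
    # Toy data, not drawn from HODA or Bangla.
    gallery = [([0, 0], "a"), ([1, 1], "a"), ([5, 5], "b")]
    queries = [([0.1, 0.1], "a"), ([5, 4], "b")]
    for name, fn in [("euclidean", euclidean), ("chebyshev", chebyshev)]:
        print(name, mean_average_precision(queries, gallery, fn))
```

Swapping the `distance` argument is the only change needed to compare metrics on the same features, which mirrors the kind of side-by-side evaluation the abstract describes.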

Keywords

distance metric, image classification, HODA dataset, BANGLA dataset
