INTERNATIONAL JOURNAL OF RESEARCH AND INNOVATION IN SOCIAL SCIENCE (IJRISS)
ISSN No. 2454-6186 | DOI: 10.47772/IJRISS | Volume IX Issue XII December 2025
commonly used, they have limitations, especially in dealing with complex variations in data features. This study
and others have examined the comparison of various distance metrics, including Chebyshev, Euclidean,
Manhattan, and others, in handwritten data classification. Mean Average Precision (MAP) is widely used as the
main evaluation measure of classification effectiveness, providing a more accurate and holistic picture
of performance than traditional measures. There are also innovative approaches that propose MAP-based
distance metrics to improve classification accuracy by overcoming the shortcomings of traditional metrics.
The selection of distance metrics needs to be tailored to the nature and structure of the dataset used.
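As a minimal illustration of the three metrics named above, the following Python sketch computes the Euclidean, Manhattan, and Chebyshev distances between two feature vectors. The function names and the plain-list representation of a feature vector are assumptions made for this example and are not taken from the study's implementation.

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences (L2 distance).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences (L1 distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest absolute coordinate difference (L-infinity distance).
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical two-dimensional feature vectors used only for illustration.
p, q = [0.0, 0.0], [3.0, 4.0]
print(euclidean(p, q), manhattan(p, q), chebyshev(p, q))
```

The same pair of points yields three different distance values, which is why the ranking of neighbors, and therefore the classification result, can change with the metric.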
Incorporating distance metrics with text line segmentation will help in clustering and grouping the text contents.
By leveraging these metrics, the system can more accurately determine spatial relationships between characters
and words, ensuring that elements belonging to the same line are grouped together. This approach also helps
minimize segmentation errors caused by irregular spacing, skewed writing, or overlapping strokes. Furthermore,
using distance-based analysis allows the segmentation method to adapt dynamically to variations in handwritten
or printed text layouts. The idea is to distinguish text lines by the distances between text elements, which
allows the text lines in a document to be segmented. This was studied by Amirul et al. in 2015 for
Mushaf Al-Quran datasets [5, 6]. Another study proposed text line segmentation using hybrid
projection-based neighboring properties for Mushaf Al-Quran text [7]. Multiphase level segmentation
of Mushaf Al-Quran text has also been proposed [8]. Incorporating distance metrics into text segmentation
or text analysis therefore helps to achieve accurate and efficient segmentation.
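The grouping idea described above can be sketched in a simplified form. The example below assumes each text element has already been reduced to the vertical center of its bounding box, and starts a new line whenever the vertical gap to the previous element exceeds a threshold. The function name `group_into_lines`, the `max_gap` parameter, and the sample coordinates are hypothetical; this is not the method of [5-8], only an illustration of distance-based grouping.

```python
def group_into_lines(y_centers, max_gap):
    """Group vertical centers into text lines using a gap threshold."""
    if not y_centers:
        return []
    ordered = sorted(y_centers)
    lines = [[ordered[0]]]
    for y in ordered[1:]:
        # A gap larger than max_gap is treated as a boundary between lines.
        if y - lines[-1][-1] > max_gap:
            lines.append([y])
        else:
            lines[-1].append(y)
    return lines

# Hypothetical vertical centers (in pixels) of six text elements.
print(group_into_lines([10, 12, 11, 40, 43, 80], max_gap=10))
```

A fixed threshold is the simplest possible choice. Skewed or overlapping handwriting would likely need a threshold derived from the data, or a two-dimensional distance.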
Problem Statement
Data classification, particularly in the domain of digitized handwritten images such as the HODA and Bangla
datasets, remains one of the central challenges in pattern recognition [5]. This difficulty arises from the high
degree of variability inherent in human handwriting, including differences in stroke thickness, writing
orientation, digit shape, and individual writing habits. Furthermore, the presence of noise, distortion, and uneven
pixel distribution adds another layer of complexity. In such conditions, selecting an appropriate distance metric
becomes essential, as it directly influences how similarity between samples is measured and, ultimately, how
accurately a classifier can distinguish between different digit classes.
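To make the influence of the metric concrete, the sketch below runs a small nearest-neighbor classifier with a pluggable distance function. The sample points, labels, and the name `knn_predict` are assumptions chosen for illustration; they are not drawn from the HODA or Bangla datasets.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def knn_predict(query, samples, labels, metric, k=1):
    # Rank training samples by distance to the query under the chosen metric.
    order = sorted(range(len(samples)), key=lambda i: metric(query, samples[i]))
    # Majority vote among the k closest samples.
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training samples and query point.
samples = [(3.0, 3.0), (0.0, 4.0)]
labels = ["A", "B"]
query = (0.0, 0.0)

print(knn_predict(query, samples, labels, euclidean))
print(knn_predict(query, samples, labels, chebyshev))
```

With these points, the Euclidean distance places the sample labeled "B" closest to the query (4.0 against roughly 4.24), whereas the Chebyshev distance places "A" closest (3.0 against 4.0), so the predicted class depends on the metric alone.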
Traditional distance metrics such as Euclidean and Manhattan, although widely implemented in many
recognition systems, often show inconsistent performance when applied to complex, high-variation datasets.
These metrics assume a relatively uniform and linear distribution of data points, which is not always the case in
real-world handwriting samples. As a result, their effectiveness decreases when confronted with nonlinear or
irregular feature spaces. This limitation underscores the need for a more thorough evaluation of distance metrics
beyond conventional approaches. To address this, the present study employs Mean Average Precision (MAP), a
ranking-based evaluation method capable of assessing the entire retrieval list rather than only the top prediction.
MAP provides a more comprehensive measure of classification effectiveness, especially in k-NN or similarity-
based models.
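A minimal sketch of how MAP can be computed from ranked retrieval lists is given below. It assumes that each query has a class label, that the retrieved samples are already sorted by increasing distance, and that average precision is normalized by the number of relevant items found in the list. The exact normalization used in the study is not stated here, so this choice and the function names are assumptions.

```python
def average_precision(ranked_labels, query_label):
    """Average precision of one ranked list for one query label."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(queries):
    """Mean of the average precision over (ranked_labels, query_label) pairs."""
    aps = [average_precision(ranked, label) for ranked, label in queries]
    return sum(aps) / len(aps) if aps else 0.0

# Hypothetical ranked lists: labels of retrieved samples, nearest first.
queries = [
    ([3, 5, 3, 3], 3),  # relevant items at ranks 1, 3 and 4
    ([1, 0], 0),        # relevant item at rank 2
]
print(mean_average_precision(queries))
```

Because every relevant position contributes to the score, a metric that pushes same-class samples toward the top of the whole list is rewarded, not only one that gets the first neighbor right.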
Evidence from prior research further highlights the shortcomings of traditional metrics. Arbain et al. [6]
demonstrated that the Euclidean distance can produce unstable classification results when dealing with nonlinear
feature distributions, reinforcing concerns about its reliability across diverse datasets. This disparity between
theoretical expectations and practical performance forms a central issue in the problem statement. It suggests
that improving classification accuracy requires not only testing multiple distance metrics but also understanding
their behavior in relation to specific dataset characteristics.
Thus, this study aims to identify and address the underlying causes of inaccuracy in distance-based classification
of handwritten digits. It seeks to determine whether these inaccuracies stem from limitations in the selected
similarity metrics, the nature of the datasets, or the ranking methodology used to assess performance. To
strengthen this investigation, the study integrates insights from existing literature, offering a broader perspective
on the challenges, methodological advancements, and evaluation strategies related to distance similarity
measures and their assessment using MAP. By doing so, the study contributes to a deeper understanding of the
relationship between distance metrics and classification outcomes, guiding future developments in pattern
recognition and metric-based learning.