INTERNATIONAL JOURNAL OF RESEARCH AND SCIENTIFIC INNOVATION (IJRSI)
ISSN No. 2321-2705 | DOI: 10.51244/IJRSI |Volume XII Issue VIII August 2025
Page 1990
www.rsisinternational.org
Disparities in Fertility Data Reporting: A Regional Perspective from NFHS
Jagriti Gupta, Chander Shekhar
International Institute for Population Sciences, India
DOI: https://doi.org/10.51244/IJRSI.2025.120800179
Received: 26 Aug 2025; Accepted: 01 Sep 2025; Published: 18 September 2025
ABSTRACT
Background: The National Family Health Survey (NFHS) is a vital source of demographic and health statistics
in India, yet concerns remain about the accuracy of self-reported fertility data, particularly the number of living
children. Discrepancies in reporting can arise from recall bias, social desirability, and proxy reporting,
potentially distorting fertility and health indicators.
Methods: Using NFHS-IV (2015-16) and NFHS-V (2019-21), this study analyzed women who were both
household heads and eligible for the women’s questionnaire. Data from household and women’s files were
merged to compare the number of living children reported by household heads and individual women.
Matched and unmatched cases were categorized, and discrepancies were examined across age, residence,
education, religion, caste, and wealth index. Logistic regression was used to identify predictors of mismatches,
while spatial autocorrelation (Moran’s I and LISA cluster analysis) was applied to detect geographic patterns of
reporting inconsistencies.
Results: In NFHS-IV, 65.3% of reports matched, compared to 63.2% in NFHS-V, with mismatches increasing
with women’s age. Women aged 40 and above had over 20 times higher odds of mismatch compared to those
under 29. Rural women consistently showed higher odds of discrepancies than urban women (OR = 1.27 in
NFHS-IV; OR = 1.32 in NFHS-V). Education was a strong protective factor: women with higher education
had 63-64% lower odds of mismatch compared to those with no education. Wealthier women reported more
accurately, while religion and caste showed only modest differences. Spatial analysis revealed clusters of high
mismatches in central and southern states, while districts in the Northeast and Jammu & Kashmir displayed
strong consistency.
Conclusion: Reporting discrepancies in NFHS fertility data are strongly associated with age, education,
residence, and wealth. Older, less educated, rural, and poorer women are particularly vulnerable to
misreporting. These findings underscore the need for targeted survey improvements, enhanced enumerator
training, simplified tools, and validation mechanisms to strengthen the reliability of fertility data and ensure
more equitable representation across demographic groups.
Keywords: NFHS, fertility data, reporting discrepancies, logistic regression, spatial analysis, data quality,
India
BACKGROUND
In India, the National Family Health Survey (NFHS) serves as a critical source of demographic and health data.
Despite its extensive contributions, the NFHS, like many large-scale surveys, faces challenges related to data
quality, particularly concerning self-reported variables (IIPS & ICF, 2021). One persistent issue is the discrepancy in reporting of
the number of living children or births. The inconsistencies can arise from various factors, including recall
bias, social desirability bias, misinterpretation of survey questions, and interviewer effects (Singh, 2021;
Pullum, 2018). These discrepancies are especially problematic in secondary data analyses, where researchers
rely on the accuracy of pre-collected information without the opportunity for direct verification.
These mismatches are not trivial, as they may distort key reproductive and demographic indicators such as the
total fertility rate (TFR), contraceptive prevalence, and unmet need for family planning. Several factors may
contribute to such reporting inconsistencies. Proxy reporting by the household head, who may not have
accurate knowledge of each woman's childbearing history, can introduce errors. Additionally, eligible women
may underreport or overreport their number of living children due to recall errors, social desirability bias,
misinterpretation of survey questions, or interviewer effects (Pullum, 2018; Singh, 2021). These errors are
especially common among older women, those with limited education, and those living in rural or
socioeconomically disadvantaged settings. In such contexts, where formal documentation of births may be
lacking and cultural norms influence openness about fertility, misreporting can be more pronounced.
Prior studies have noted these issues in NFHS and similar large-scale surveys. For instance, Singh and Sahu
(2021) identified irregularities in fertility reporting across survey rounds and highlighted that inconsistencies in
parity distributions may reflect deeper issues with data collection methodologies rather than genuine
demographic shifts. Furthermore, Jejeebhoy et al. (2010) found inconsistencies in reported fertility histories
within households, suggesting that even intra-household perceptions of reproductive outcomes can differ
significantly. While earlier research has often focused on mismatches between husbands and wives, the present
study shifts attention to mismatches between household heads and eligible women, a less explored yet equally
important dimension of data quality.
These inconsistencies carry serious implications. Inaccurate reporting on the number of living children may
result in flawed estimates of fertility levels, misallocation of family planning resources, and misguided
evaluations of maternal and child health programs. This, in turn, can undermine the achievement of
Sustainable Development Goals (SDGs), particularly those related to reproductive health, gender equity, and
child survival. Moreover, discrepancies in fertility data limit the comparability of NFHS estimates across time
and regions, hindering longitudinal analyses and policy assessments.
Against this backdrop, the present paper aims to systematically investigate the discrepancies in the number of
living children reported by household heads and eligible women in the NFHS-IV (2015-16) and NFHS-V
(2019-21) datasets. Through descriptive statistics and logistic regression analysis, the study identifies the
socio-demographic correlates of mismatched cases, such as age, education, residence, caste, religion, and
wealth index. By examining patterns of underreporting and overreporting, the analysis contributes to a better
understanding of the structural and behavioral factors affecting data accuracy in large-scale demographic
surveys. Ultimately, the findings aim to inform improvements in survey design, enumerator training, and data
validation protocols, thereby enhancing the overall quality and reliability of national health statistics in India.
Comparisons between NFHS data and other demographic sources, such as the Sample Registration System
(SRS), have also revealed discrepancies. Bhat (2002) observed significant differences in TFR estimates
between NFHS and SRS data for the same reference years, particularly in certain states, indicating potential
issues in sampling, recall, or questionnaire design. Such inconsistencies can lead to distorted projections and
misaligned public health programs.
Regional variations in data quality add another layer of complexity. In some states, particularly in rural or low-
literacy settings, underreporting of births is more prevalent. This may be due to limited understanding of
survey questions, lack of formal birth records, or reluctance to disclose sensitive information. Conversely,
overreporting has been observed in regions where respondents feel pressure to conform to perceived family
size norms or expectations from field investigators (Singh, 2021). These variations compromise the
comparability of data across states and time, limiting the utility of NFHS data for longitudinal or cross-
regional studies. The implications of these inconsistencies are far-reaching. Poor data quality affects not only
academic research but also the effectiveness of national and international health programs. Inaccurate fertility
data may skew the assessment of unmet need for contraception, distort the relationship between fertility and
child mortality, or mislead efforts to achieve Sustainable Development Goals (SDGs) related to reproductive
health and gender equality. It is therefore imperative to systematically assess the reliability of fertility-related
data in the NFHS.
This paper investigates discrepancies in the reported number of living children in NFHS datasets, with a
specific focus on mismatches between the household head's report and the eligible woman's own report,
patterns of underreporting or overreporting, and differences across survey rounds. Through statistical consistency checks, cross-validation
with external sources, and an exploration of demographic correlates, this study seeks to highlight systemic
issues in data quality. The broader goal is to contribute to the growing body of research advocating for
improved survey instruments, better training of enumerators, and enhanced data validation protocols in large-
scale surveys like the NFHS.
DATA AND METHODS
The person (household member) and women's files from NFHS-IV and NFHS-V were used for this paper. Women
who were both heads of their households and eligible for the women's questionnaire were included in the
analysis. The variable of interest was the number of living children, and the regression outcome was whether
the household and individual reports of this number matched. In NFHS-IV, a total of 339,212 women were
household heads, while in NFHS-V, this number was 401,618. The total number of eligible women was
699,686 in NFHS-IV and 724,115 in NFHS-V. For the analysis, the person and women's files were merged.
The total number of women who were both household heads and eligible for the questionnaire was 37,598 in
NFHS IV and 47,324 in NFHS V.
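The merge step can be illustrated with a minimal standard-library sketch. It assumes, hypothetically, that each record carries a cluster number, household number, and line number identifying the same woman in both files; the field names (`cluster`, `hh`, `line`, `is_head`, `hh_children`, `own_children`) are placeholders, not actual NFHS variable names, and the function is not the authors' code.

```python
from typing import Dict, List, Tuple

# Hypothetical composite identifier: (cluster, household, line number).
Key = Tuple[int, int, int]


def merge_head_and_women(person_rows: List[dict], women_rows: List[dict]) -> List[dict]:
    """Inner-join household heads (person file) with eligible women (women's file).

    All field names are illustrative placeholders, not actual NFHS variable names.
    """
    # Index the women's file by the hypothetical composite key.
    women_by_key: Dict[Key, dict] = {
        (w["cluster"], w["hh"], w["line"]): w for w in women_rows
    }
    merged = []
    for p in person_rows:
        if not p["is_head"]:
            continue  # keep household heads only
        key = (p["cluster"], p["hh"], p["line"])
        w = women_by_key.get(key)
        if w is None:
            continue  # head is not an eligible woman respondent
        merged.append({
            "key": key,
            "hh_children": p["hh_children"],    # count from the household schedule
            "own_children": w["own_children"],  # woman's self-reported count
        })
    return merged
```

The inner join keeps only women present in both files as a household head and as an eligible respondent, which is analogous to the restriction that yields the 37,598 (NFHS-IV) and 47,324 (NFHS-V) women reported above.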
The household head was asked about the number of living children in the household, and the individual
(women's) file captured the number of living children each woman reported as her own. After both data sets
were merged, the two counts differed for some women: some women reported fewer children than the number
recorded in the household schedule, while others reported more. Reports were categorized as matched (equal to)
or unmatched, with unmatched reports further classified as less than the actual number, or as differing by 1, 2,
or 3 or more children. These two categories were then analyzed against predictor variables that may influence
the reporting of the total number of living children: age, residence, education, religion, caste, and wealth index.
Bivariate analysis was performed to obtain the percentage distribution of discrepancies in the reporting of living
children, and binary logistic regression was used to estimate the odds of unmatched reporting by the
socio-demographic characteristics of women.
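The categorization and the bivariate percentage distribution can be sketched as follows. This is an illustration under a stated assumption: "Less than" is taken to mean that the woman's own count is below the household count, and the difference categories to mean that her count exceeds it by 1, 2, or 3 or more. The text does not spell out the direction of the difference categories, so this mapping is hypothetical. Counts are unweighted; survey weights are not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

CATEGORIES = ("Less than", "Equal to", "1 Difference", "2 Differences", "3 or more Differences")


def classify(hh_children: int, own_children: int) -> str:
    """Assign a pair of reports to one of the five reporting categories.

    ASSUMPTION: 'Less than' = woman's own count below the household count;
    the difference categories = her count exceeds it by 1, 2, or 3+.
    """
    diff = own_children - hh_children
    if diff < 0:
        return "Less than"
    if diff == 0:
        return "Equal to"
    if diff == 1:
        return "1 Difference"
    if diff == 2:
        return "2 Differences"
    return "3 or more Differences"


def row_percentages(records: Iterable[Tuple[str, int, int]]) -> Dict[str, Dict[str, float]]:
    """Bivariate distribution: percentage of each category within each subgroup.

    Each record is (subgroup label, household count, own count); unweighted.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for group, hh, own in records:
        counts[group][classify(hh, own)] += 1
    table = {}
    for group, c in counts.items():
        total = sum(c.values())
        table[group] = {cat: 100.0 * c[cat] / total for cat in CATEGORIES}
    return table
```

Each row of the result sums to 100, which mirrors the row-percentage layout of Tables 1 and 2.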
Logistic Regression Analysis
A binary logistic regression model was applied, as the outcome variable (matched versus unmatched report) is
dichotomous. Predictor variables included age, residence, education, religion, caste, and wealth index. The
model estimates adjusted odds ratios with 95% confidence intervals, allowing assessment of how
sociodemographic factors influence the likelihood of an unmatched report in NFHS-IV and NFHS-V (Long &
Freese, 2014). The logistic regression equation can be
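defined as follows:

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$

where $p$ is the expected probability of the outcome (an unmatched report), $X_1, X_2, \dots, X_k$ are the
explanatory variables, $\beta_0$ is the intercept, and $\beta_1, \beta_2, \beta_3, \dots, \beta_k$ are the regression
coefficients to be estimated in the model (Ryan, 2008).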
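To make the link between the coefficients and the reported odds ratios concrete, the sketch below fits the logistic equation for a single binary predictor (for example, rural versus urban) by Newton-Raphson in plain Python. It is a simplified illustration with hypothetical data, not the authors' Stata model, which includes all predictors simultaneously. With one binary predictor, $e^{\beta_1}$ should equal the cross-product (odds) ratio of the corresponding 2x2 table.

```python
import math
from typing import List, Tuple


def fit_logit_one_binary(x: List[int], y: List[int], iters: int = 25) -> Tuple[float, float]:
    """Fit ln(p/(1-p)) = b0 + b1*x by Newton-Raphson for a single 0/1 predictor.

    Both x = 0 and x = 1 must occur, otherwise the information matrix is singular.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and symmetric 2x2 information matrix (h00, h01, h11).
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))  # inverse logit
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g, using the closed-form 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1
```

For hypothetical counts of 30 mismatches among 100 rural women and 20 among 100 urban women, exp(b1) converges to (30 x 80) / (70 x 20), about 1.71, which is read as "rural women have 1.71 times the odds of a mismatch".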
To analyze the spatial distribution and dependence patterns of matched and unmatched proportions of total
living children, a combination of univariate and bivariate Moran’s I statistics, significance maps, and Local
Indicators of Spatial Association (LISA) were employed. The univariate LISA map was utilized to detect
spatial clustering of individual variables across districts, while the bivariate LISA map examined spatial
associations between the predicted values and the spatially weighted average of explanatory variables. Moran’s
I, which ranges from −1 to +1, quantifies the degree of spatial autocorrelation: positive values indicate
clustering of similar values (high-high or low-low), negative values reflect clustering of dissimilar values
(high-low or low-high), and values close to zero suggest spatial randomness (Anselin, 1995; Getis, 2008).
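The district-level input and the global statistic can be sketched in plain Python: district unmatched percentages are computed from (hypothetical) merged records, and global Moran's I is computed as $I = \frac{n}{S_0}\frac{\sum_i\sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$, where $z_i$ is the deviation from the mean and $S_0$ the sum of all weights. Binary contiguity weights on a toy four-district line are assumed for illustration; the weights specification used in the study is not stated in this section, and the actual analysis was run in GeoDa.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def district_unmatched_pct(records: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Percentage of women in each district whose two reported counts differ.

    Each record is (district name, household count, own count); unweighted.
    """
    total: Dict[str, int] = defaultdict(int)
    unmatched: Dict[str, int] = defaultdict(int)
    for district, hh, own in records:
        total[district] += 1
        if hh != own:
            unmatched[district] += 1
    return {d: 100.0 * unmatched[d] / total[d] for d in total}


def morans_i(values: Sequence[float], weights: List[List[float]]) -> float:
    """Global Moran's I for n areal units and an n x n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                     # deviations from the mean
    s0 = sum(sum(row) for row in weights)              # sum of all weights
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)
```

On a line of four districts, the clustered arrangement [1, 1, 0, 0] gives I = 1/3 and the alternating arrangement [1, 0, 1, 0] gives I = -1, matching the positive (clustering) and negative (dispersion) readings described above.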
Descriptive statistics, bivariate, and binary logistic regression analyses were conducted using STATA version 16.0.
To produce district-level spatial visualizations and generate geographic shapefiles, ArcGIS software (version
10.4) was employed. Additionally, GeoDa software was used to conduct spatial autocorrelation analysis and
generate LISA cluster maps, which are essential tools for identifying spatial outliers and local clusters (Anselin
et al., 2006; Chainey & Ratcliffe, 2005).
RESULTS
Percentage distribution of matched and unmatched number of living children by background characteristics of the respondent in NFHS IV and NFHS V
Table 1 presents discrepancies between household and individual reports of the number of living children in
NFHS-IV. Approximately two-thirds of cases were consistent, while mismatches were disproportionately
concentrated among specific subgroups.
Age exhibited the most pronounced gradient: reporting consistency was nearly universal among women under
29 years (95%), but declined to below 50% (46.7%) among women aged 40 and above, nearly 10% of whom
showed discrepancies of three or more children. Education demonstrated a similarly strong association, with
match rates rising from 58% among illiterate women to over 86% among those with higher education, and
large discrepancies virtually absent (below 1%) in the latter group. Residence and wealth also contributed to
variation, with higher accuracy observed among urban and wealthier women compared to their rural and
poorer counterparts. By contrast, differences across religion and caste were relatively modest.
Overall, reporting accuracy is most strongly associated with age, education, and socioeconomic status, a
pattern consistent with the influence of recall limitations and awareness rather than cultural or group-specific
factors.
Table 1 Percentage of Matched and Unmatched Cases for Number of Living Children in NFHS IV
Background characteristics | Less than | Equal to | 1 Difference | 3 or more Differences | Total
Age
Less than 29 | 1.19 | 94.91 | 3.08 | 0.18 | 100
30-39 | 1.2 | 78.79 | 14.39 | 1.09 | 100
40 and above | 0.92 | 46.67 | 26.63 | 9.8 | 100
Residence
Urban | 1.15 | 69.31 | 17.58 | 3.84 | 100
Rural | 1.02 | 63.62 | 19.18 | 5.91 | 100
Education
Illiterate | 1.25 | 57.9 | 21.25 | 7.46 | 100
Primary | 0.73 | 65.33 | 19.5 | 4.87 | 100
Secondary | 0.96 | 76.21 | 14.75 | 2.1 | 100
Higher | 0.61 | 86.03 | 10.22 | 0.83 | 100
Religion
Hindu | 1.06 | 65.1 | 18.89 | 5.18 | 100
Muslim | 1.03 | 65.31 | 17.79 | 6.22 | 100
Christian | 1.16 | 68.47 | 17.6 | 4.92 | 100
Others | 1.32 | 68.49 | 20.03 | 3.87 | 100
Caste
SC | 1.2 | 63.99 | 18.77 | 5.81 | 100
ST | 1.56 | 64.07 | 17.87 | 6.14 | 100
OBC | 0.99 | 66.12 | 18.71 | 4.92 | 100
Others | 0.93 | 65.45 | 19.16 | 5.27 | 100
Wealth Index
Poorest | 1.04 | 66.19 | 17.3 | 6.41 | 100
Poorer | 1.17 | 62.02 | 19.44 | 6.07 | 100
Middle | 1.1 | 62.05 | 20.38 | 5.19 | 100
Richer | 1.23 | 66.88 | 19.19 | 4.07 | 100
Richest | 0.61 | 72.43 | 17.36 | 3.01 | 100
Total | 1.08 | 65.3 | 18.73 | 5.31 | 100
Table 2 presents reporting consistency between household heads and women's self-reports of the number of
living children in NFHS-V. Overall, 63% of cases matched, while over one-third displayed discrepancies, and
about 16% involved differences of two or more children. Age again showed the sharpest gradient: nearly 95%
of women under 29 reported consistently, whereas accuracy declined to 42% among women aged 40 and
above, with more than one in ten reporting discrepancies of three or more children. Education was strongly
associated with reporting reliability, ranging from 53% consistency among women with no education to 86%
among those with higher education, with large mismatches almost absent in the latter group.
Residence and wealth followed similar patterns, with urban and wealthier women reporting more accurately
than their rural and poorer counterparts. Religion and caste displayed relatively modest variation, though
Scheduled Tribes and Muslims showed slightly higher rates of large discrepancies. Taken together, the NFHS-V
results reinforce the patterns observed in NFHS-IV: age, education, residence, and economic status are the
most salient predictors of reporting accuracy, pointing to the importance of recall capacity and socioeconomic
context over cultural or group-specific factors.
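The summary shares quoted for NFHS-V can be recomputed from the Total row of Table 2; this is simple arithmetic on the published percentages, shown only to make the derivation explicit.

```python
# Total row of Table 2 (NFHS-V), percentages as published.
total_row = {
    "Less than": 0.78,
    "Equal to": 63.18,
    "1 Difference": 20.04,
    "2 Differences": 10.29,
    "3 or more Differences": 5.89,
}

# Share of cases with any discrepancy between the two reports.
unmatched = 100.0 - total_row["Equal to"]
# Share of cases whose reports differ by two or more children.
two_or_more = total_row["2 Differences"] + total_row["3 or more Differences"]

print(f"unmatched: {unmatched:.2f}%, two or more: {two_or_more:.2f}%")
# prints "unmatched: 36.82%, two or more: 16.18%"
```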
Table 2 Percentage of Matched and Unmatched Cases for Number of Living Children in NFHS V
Background characteristics | Less than | Equal to | 1 Difference | 2 Differences | 3 or more Differences | Total
Age
Less than 29 | 0.92 | 94.92 | 3.54 | 0.57 | 0.05 | 100
30-39 | 0.8 | 77.58 | 15.48 | 4.8 | 1.34 | 100
40 and above | 0.71 | 42.49 | 28.36 | 17.31 | 11.12 | 100
Residence
Urban | 0.84 | 67.21 | 19.47 | 8.4 | 4.09 | 100
Rural | 0.75 | 61.64 | 20.01 | 10.96 | 6.64 | 100
Education
No Education | 0.79 | 52.84 | 23.4 | 13.57 | 9.4 | 100
Primary | 0.92 | 61.63 | 21.64 | 10.78 | 5.03 | 100
Secondary | 0.71 | 73.93 | 15.99 | 6.74 | 2.63 | 100
Higher | 0.62 | 85.97 | 9.61 | 3.22 | 0.59 | 100
Religion
Hindu | 0.74 | 62.52 | 20.44 | 10.53 | 5.76 | 100
Muslim | 1.05 | 65.4 | 17.15 | 9.5 | 6.89 | 100
Christian | 0.62 | 68.61 | 17.31 | 8.16 | 5.3 | 100
Others | 0.53 | 64.02 | 20.38 | 8.23 | 6.84 | 100
Caste
SC | 0.82 | 62.18 | 20.14 | 10.58 | 6.27 | 100
ST | 0.93 | 62.43 | 19.73 | 10.03 | 6.89 | 100
OBC | 0.66 | 63.04 | 20.18 | 10.45 | 5.67 | 100
Others | 0.93 | 64.32 | 19.73 | 9.65 | 5.37 | 100
Wealth Index
Poorest | 0.77 | 63.16 | 18.24 | 10.32 | 7.52 | 100
Poorer | 0.77 | 60.7 | 19.65 | 11.69 | 7.2 | 100
Middle | 0.78 | 60.95 | 21.84 | 10.71 | 5.73 | 100
Richer | 0.75 | 63.65 | 21.74 | 9.76 | 4.1 | 100
Richest | 0.83 | 71.96 | 17.96 | 6.94 | 2.31 | 100
Total | 0.78 | 63.18 | 20.04 | 10.29 | 5.89 | 100
Table 3 presents the results of a binary logistic regression analysis examining the predictors of unmatched
cases in the number of living children reported by women in NFHS IV and NFHS V. The findings highlight
that age is the strongest predictor across both rounds. Compared to women under the age of 29, those
aged 30-39 had over four times the odds of mismatch (Odds Ratio [OR] = 4.45 in NFHS IV and 4.75 in
NFHS V), while women aged 40 and above had odds more than 20 times higher (OR = 20.74 in NFHS IV and
21.83 in NFHS V), suggesting that recall errors or data recording inconsistencies increase substantially with
age. Residence also plays a notable role, with rural women exhibiting higher odds of mismatched reporting
than urban women. In NFHS-IV, the odds of mismatch for rural women were 1.27 times higher, which slightly
increased to 1.32 in NFHS V. This pattern indicates persistent rural-urban disparities in the quality or accuracy
of reporting.
Education level is inversely related to mismatched cases. Women with higher levels of education were
significantly less likely to report inconsistencies. Compared to women with no education, those with primary
education had around 14-15% lower odds of mismatch, those with secondary education had about 40% lower
odds, and those with higher education had the lowest odds, approximately 63-64% lower in both survey
rounds. This highlights the role of education in enhancing both awareness and accuracy in reporting.
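The percentage interpretations used in this paragraph follow from (OR - 1) x 100, where a negative value means lower odds. The short sketch below applies this to values from Table 3 and also checks whether a 95% confidence interval excludes 1, a common reading of statistical significance at the 5% level. The function names are illustrative.

```python
from typing import Tuple


def pct_change_in_odds(odds_ratio: float) -> float:
    """Percentage change in odds implied by an odds ratio (negative = lower odds)."""
    return (odds_ratio - 1.0) * 100.0


def ci_excludes_one(ci: Tuple[float, float]) -> bool:
    """True when a 95% CI for an odds ratio lies entirely above or below 1."""
    low, high = ci
    return high < 1.0 or low > 1.0


# Higher education vs. no education, NFHS-IV: OR = 0.37, 95% CI 0.31-0.43.
print(round(pct_change_in_odds(0.37)), ci_excludes_one((0.31, 0.43)))  # prints "-63 True"
```

The same arithmetic gives about -15% for primary education (OR = 0.85) and +27% for rural residence (OR = 1.27).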
Regarding religion, the NFHS IV data indicated that Muslim women had slightly higher odds of mismatch than
Hindus (OR = 1.14), but this difference was not significant in NFHS V, suggesting a possible improvement in
data accuracy over time. Christians in NFHS V had significantly lower odds of mismatch (OR = 0.88),
indicating relatively better consistency in reporting.
Caste did not emerge as a consistent predictor. Compared with the reference category, Scheduled Castes (SC),
both Scheduled Tribes (ST) and Other Backward Classes (OBC) had odds close to 1, implying no significant
deviation. The "Other" caste group had slightly higher odds of mismatch in NFHS IV (OR = 1.08), but this was
not statistically significant in NFHS V. In terms of wealth index, women from the richest households had
significantly lower odds of unmatched reports than the poorest women: the odds ratios were 0.83 in NFHS IV
and 0.75 in NFHS V, indicating better reporting among wealthier women. Other wealth categories did not show
significant differences.
Overall, the results consistently point toward the influence of age, education, residence, and to some extent,
wealth as key factors in predicting mismatched reporting of living children, emphasizing the need to target data
quality interventions toward older, less educated, and rural populations in future surveys.
Table 3: Binary logistic regression of unmatched cases of the number of living children
Predictor variables | NFHS IV odds ratio | NFHS IV 95% CI | NFHS V odds ratio | NFHS V 95% CI
Age
Less than 29 | Ref. | | Ref. |
30-39 | 4.45*** | 3.94-5.02 | 4.75*** | 4.27-5.30
40 and above | 20.74*** | 18.40-23.36 | 21.83*** | 19.63-24.28
Residence
Urban | Ref. | | Ref. |
Rural | 1.27*** | 1.19-1.35 | 1.32*** | 1.24-1.40
Education
No Education | Ref. | | Ref. |
Primary | 0.85*** | 0.79-0.91 | 0.86*** | 0.81-0.91
Secondary | 0.60*** | 0.56-0.64 | 0.58*** | 0.55-0.62
Higher | 0.37*** | 0.31-0.43 | 0.36*** | 0.32-0.41
Religion
Hindu | Ref. | | Ref. |
Muslim | 1.14** | 1.05-1.23 | 1.00 | 0.93-1.08
Christian | 0.93 | 0.84-1.03 | 0.88*** | 0.81-0.96
Others | 0.88** | 0.78-0.99 | 1.04 | 0.93-1.16
Caste
SC | Ref. | | Ref. |
ST | 0.96 | 0.88-1.05 | 1.00 | 0.93-1.08
OBC | 0.97 | 0.91-1.04 | 1.02 | 0.96-1.08
Others | 1.08* | 1.00-1.17 | 1.04 | 0.96-1.12
Wealth Index
Poorest | Ref. | | Ref. |
Poorer | 1.04 | 0.97-1.12 | 1.01 | 0.95-1.07
Middle | 1.04 | 0.97-1.12 | 0.95 | 0.89-1.02
Richer | 0.96 | 0.88-1.05 | 0.93 | 0.86-1.00
Richest | 0.83*** | 0.75-0.93 | 0.75*** | 0.68-0.83
Pseudo R² | 0.17 | | 0.18 |
Spatial distribution of matched and unmatched number of living children by background characteristics of the respondent in NFHS IV and NFHS V
Map 1 provides insights into the percentage of unmatched responses regarding the number of living children
among women who were both household heads and eligible respondents, highlighting variations in data
consistency across districts of India for NFHS IV. At the lower end of the spectrum, districts like East Siang
(7.67%) in Arunachal Pradesh, Chennai (8.41%) in Tamil Nadu, and Kupwara (9.06%) in Jammu & Kashmir
recorded the smallest discrepancies between household and individual reporting of the number of living
children. These low unmatched percentages indicate relatively high levels of consistency and accuracy in data
reporting in these areas. Slightly higher unmatched rates were seen in districts such as Mumbai (9.7%), Karnal
(10.46%), North Delhi (10.54%), Alappuzha (11.04%) in Kerala, and Dakshina Kannada (11.51%) in
Karnataka, which still reflect moderate reliability in data collection. However, a substantial number of districts
reported unmatched percentages in the range of 13-20%, including Baramulla (13.59%), Faridkot (13.91%),
Pashchim Champaran (14.12%), and Badgam (14.93%), suggesting growing inconsistencies that may arise from
differences in how survey questions were interpreted or from recall errors.
Moving to the higher end of the distribution, numerous districts fell within the 20-30% unmatched range. Gwalior
(23.63%) in Madhya Pradesh, Patiala (23.83%) in Punjab, Lucknow (24.5%) in Uttar Pradesh, Bangalore
(24.71%) in Karnataka, Namakkal (24.75%) in Tamil Nadu, and Bahraich (24.7%) in Uttar Pradesh all
reported moderate to high unmatched percentages. These figures point toward widespread issues in
maintaining consistency across different data schedules in the survey process. At the extreme, unmatched
percentages crossed 30% in districts such as Allahabad (30.36%) and Deoria (30.15%) in Uttar Pradesh,
Porbandar (30.44%) in Gujarat, and Yanam (30.51%) in Puducherry, signaling a need for serious
improvements in data recording practices. Such high unmatched rates may reflect challenges such as a lack
of synchronization between interview modules, interviewer errors, and misunderstanding among respondents,
especially in rural or less literate populations.
Overall, the NFHS-IV data reveal marked disparities in the accuracy and alignment of reported numbers of
living children across districts and states. While some regions demonstrate strong internal consistency, many
others show worrying levels of unmatched reports that could compromise the reliability of demographic and
health statistics. These findings underscore the importance of refining survey
methodologies, improving training for enumerators, and ensuring clarity in questionnaire design to reduce data
inconsistencies in future rounds of health surveys.
Map 1: Unmatched percentage of the number of living children in NFHS IV
Map 2 shows the percentage of unmatched cases in the number of living children reported during the National
Family Health Survey V (NFHS-V) across districts in India. A notable observation is the exceptionally low
unmatched percentage in districts of Jammu & Kashmir, where eight districts, namely Kupwara, Badgam,
Leh (Ladakh), Kargil, Punch, Rajouri, Kathua, and Baramula, reported a 0% mismatch, which may indicate
highly accurate data collection and effective survey implementation. Other districts in the region, such as
Bandipore (2.11%) and Srinagar (2.12%), also exhibit very low mismatch rates, further supporting the
inference of high data quality in this area. Punjab and Haryana show higher unmatched percentages, ranging
largely between 15% and 24%. For example, districts like Ludhiana (17.02%), Patiala
(18.13%), Faridkot (17.43%), and Amritsar (18.33%) illustrate moderate levels of discrepancy. In Haryana,
districts such as Hisar (22.65%), Rohtak (22.68%), and Rewari (22.81%) hover near the upper end of the scale,
suggesting growing inconsistencies that may require improvements in respondent engagement, questionnaire
clarity, or enumerator training.
The trend of increasing mismatch continues into Rajasthan, where the unmatched percentages reach their peak.
Districts like Sikar (24.69%), Jaipur (24.4%), Dausa (24.3%), and Karauli (24.19%) demonstrate the highest
recorded levels in this dataset, all exceeding 24%. These figures indicate potential challenges in maintaining
data quality, possibly due to a combination of factors such as larger populations, literacy gaps, or
complexities in household reporting.
Map 2: Unmatched percentage of the number of living children in NFHS V
The LISA (Local Indicators of Spatial Association) cluster map for the unmatched percentage in NFHS-IV (Map 3)
reveals distinct spatial patterns in the discrepancies between household and individual schedule data across
various districts in India. The map identifies five key cluster types based on spatial correlation: High-High,
Low-Low, High-Low, Low-High, and Not Significant. High-High clusters, shown in red and comprising 37
districts, indicate regions with high mismatched percentages surrounded by similarly high-mismatch districts,
primarily concentrated in the southern states such as Andhra Pradesh, Telangana, Tamil Nadu, and parts of
Karnataka. These areas emerge as critical hotspots of data inconsistency, requiring immediate attention. In
contrast, the Low-Low clusters, marked in dark blue and also consisting of 37 districts, represent districts with
low unmatched percentages that are surrounded by others with similarly low values. These are mostly found in
parts of Kerala, Haryana, and the northeastern region, suggesting zones of strong data integrity. Additionally,
Low-High clusters (light blue, 16 districts) highlight districts with low mismatches amidst high-mismatch
neighbors, pointing to possible localized good practices. Conversely, High-Low clusters (pink, 17 districts)
reflect districts with high mismatches surrounded by low-mismatch regions, indicating possible anomalies or
localized data quality issues that merit targeted intervention. The vast majority of districts (533) fall under the
"Not Significant" category (gray), indicating no notable spatial clustering and possibly reflecting random
distribution or inconsistencies not concentrated in specific regions. One district remains undefined, potentially
due to data limitations. Overall, the LISA map underscores that while most of India does not show significant
spatial clustering, southern and parts of central India require focused efforts to improve data accuracy and
survey methodology, whereas certain northern and northeastern areas could serve as benchmarks for better
implementation.
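The five LISA categories described above can be reproduced in outline with a local Moran statistic and a conditional permutation test. The sketch below is a simplified illustration, not GeoDa's implementation, and rests on several assumptions: row-standardized weights built from a neighbour list, a directional pseudo p-value, a 0.05 threshold, a fixed random seed, and units with no neighbours labelled "Undefined" (one possible reason for undefined districts, assumed here).

```python
import random
from typing import Dict, List, Sequence, Tuple


def lisa(values: Sequence[float], neighbors: Dict[int, List[int]],
         permutations: int = 999, alpha: float = 0.05,
         seed: int = 12345) -> List[Tuple[float, float, str]]:
    """Local Moran's I, pseudo p-value and cluster label for each unit.

    ASSUMPTIONS: row-standardized weights (each neighbour weighted 1/k),
    conditional permutation with z_i held fixed, directional pseudo p-value,
    and 'Undefined' for units without neighbours. Values must not all be equal.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    rng = random.Random(seed)
    results = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            results.append((0.0, 1.0, "Undefined"))  # island: no neighbours
            continue
        k = len(nbrs)
        lag = sum(z[j] for j in nbrs) / k           # spatial lag of the deviations
        local_i = z[i] * lag / m2
        # Draw k pseudo-neighbours from all other units, keeping z_i fixed.
        others = [z[j] for j in range(n) if j != i]
        extreme = 0
        for _ in range(permutations):
            perm_i = z[i] * (sum(rng.sample(others, k)) / k) / m2
            if (local_i >= 0 and perm_i >= local_i) or (local_i < 0 and perm_i <= local_i):
                extreme += 1
        p_value = (extreme + 1) / (permutations + 1)
        if p_value > alpha:
            label = "Not Significant"
        elif z[i] > 0:
            label = "High-High" if lag > 0 else "High-Low"
        else:
            label = "Low-Low" if lag < 0 else "Low-High"
        results.append((local_i, p_value, label))
    return results
```

In a toy layout where five high-value units are all mutually adjacent and fifteen low-value units form a ring with two neighbours each, the high block should be labelled High-High, while the ring units, having too few neighbours to reach significance, fall in the Not Significant category.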
Map 3: LISA cluster map for the number of living children in NFHS IV
The LISA Cluster Map 4 for the unmatched percentage in NFHS-V presents a spatial analysis of data
mismatches across Indian districts, indicating evolving trends in data consistency compared to NFHS-IV. The
map categorizes districts into five significant cluster types: High-High, Low-Low, High-Low, Low-High, and
Not Significant. High-High clusters (49 districts, in red) represent areas with high unmatched percentages
surrounded by similar districts, indicating zones of consistently poor data alignment primarily concentrated in
central Indian states like Madhya Pradesh, Chhattisgarh, and parts of Maharashtra and Telangana. These are
clear hotspots of concern, reflecting persistent or worsening issues in survey implementation and data
recording. Low-Low clusters (37 districts, dark blue) show districts with low mismatches surrounded by
neighbors with similarly low mismatches, largely located in northern and northeastern states such as Punjab,
Himachal Pradesh, and parts of the Northeast, suggesting relatively strong data consistency in these areas.
Low-High clusters (light blue, 19 districts) and High-Low clusters (pink, 13 districts) indicate transitional or
outlier districts, those that either outperform or underperform their neighbors, pointing to local administrative or
methodological variations. The majority of the districts (589) fall into the Not Significant category (gray),
where no statistically significant local spatial autocorrelation was detected, consistent with a spatially random pattern.
Additionally, two districts remain undefined (black), possibly due to data unavailability. Compared to NFHS-
IV, NFHS-V shows a geographic shift and intensification in the High-High clusters, particularly in central
India, signaling areas that require targeted quality assurance, while simultaneously reinforcing the role of
northeastern and select northern districts as zones of relatively higher data reliability.
Map 4: LISA cluster map of the unmatched percentage for the number of living children, NFHS-V
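The cluster categories in Map 4 follow the logic of the local Moran's I (Anselin, 1995). A minimal pure-Python sketch of that logic follows. It assumes row-standardized contiguity weights, a conditional permutation test with 999 permutations, and a significance level of α = 0.05. All function names, the seed, and the 6 × 6 toy grid are hypothetical illustrations. This is not the study's data or GeoDa's implementation.

```python
import random

def row_standardize(neighbors):
    """Row-standardized contiguity weights: each of district i's k neighbors gets 1/k."""
    return {i: {j: 1.0 / len(nbrs) for j in nbrs} for i, nbrs in neighbors.items()}

def deviations(values):
    """Deviations from the mean (z) and their second moment m2 = sum(z^2) / n."""
    n = len(values)
    xbar = sum(values.values()) / n
    z = {i: v - xbar for i, v in values.items()}
    m2 = sum(zi * zi for zi in z.values()) / n
    return z, m2

def spatial_lag(z, weights):
    """Weighted average of each district's neighbors' deviations."""
    return {i: sum(w * z[j] for j, w in weights[i].items()) for i in z}

def global_morans_i(values, weights):
    """Global Moran's I = (n / S0) * sum_i z_i * lag_i / sum_i z_i^2."""
    z, _ = deviations(values)
    lag = spatial_lag(z, weights)
    s0 = sum(sum(w.values()) for w in weights.values())
    num = sum(z[i] * lag[i] for i in z)
    den = sum(zi * zi for zi in z.values())
    return (len(z) / s0) * num / den

def lisa_clusters(values, neighbors, permutations=999, alpha=0.05, seed=42):
    """Local Moran's I_i = z_i * lag_i / m2, with a conditional permutation
    pseudo p-value and High/Low quadrant labels for significant districts."""
    weights = row_standardize(neighbors)
    z, m2 = deviations(values)
    lag = spatial_lag(z, weights)
    rng = random.Random(seed)
    labels = {}
    for i in values:
        observed = z[i] * lag[i] / m2
        others = [z[j] for j in z if j != i]
        k = len(weights[i])
        extreme = 0
        for _ in range(permutations):
            # Keep z_i fixed and draw k pseudo-neighbors from the other districts.
            # Equal 1/k weights (row standardization) make the lag a simple mean.
            sim_lag = sum(rng.sample(others, k)) / k if k else 0.0
            sim = z[i] * sim_lag / m2
            if (observed >= 0 and sim >= observed) or (observed < 0 and sim <= observed):
                extreme += 1
        p_value = (extreme + 1) / (permutations + 1)
        if p_value > alpha:
            labels[i] = "Not Significant"
        elif z[i] >= 0:
            labels[i] = "High-High" if lag[i] >= 0 else "High-Low"
        else:
            labels[i] = "Low-High" if lag[i] >= 0 else "Low-Low"
    return labels

# Hypothetical 6 x 6 grid of "districts" (rook contiguity).
# Values rise smoothly from one corner to the opposite corner.
cells = [(r, c) for r in range(6) for c in range(6)]
values = {(r, c): float(r + c) for r, c in cells}
neighbors = {
    (r, c): [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
             if (r + dr, c + dc) in values]
    for r, c in cells
}
print(round(global_morans_i(values, row_standardize(neighbors)), 3))  # 0.889
labels = lisa_clusters(values, neighbors)
print(labels[(0, 0)], labels[(5, 5)], labels[(2, 3)])
```

In this toy grid, the low corner comes out Low-Low and the high corner High-High. Cells near the mean are Not Significant, which mirrors the categories used in Map 4. A real district map would also need explicit handling for districts with missing values, such as the two undefined districts.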
DISCUSSION
The findings of this study highlight systematic differentials in the accuracy of reporting the number of living
children across sociodemographic groups in the NFHS. The results underscore that data quality is not uniform
but shaped by structural factors such as age, residence, education, and economic status. These variations are
consistent with evidence from previous demographic and health survey research, both in India and globally,
which has shown that the reliability of self-reported fertility histories depends on socioeconomic background,
literacy, and survey conditions (Pullum, 2006; Becker et al., 1998; Schoumaker, 2014).
A prominent finding was the significantly higher likelihood of mismatches among older women. Women aged
40 years and above had more than 20 times the odds of reporting inconsistencies compared with women under 29.
Age-related recall bias has long been recognized as a challenge in retrospective fertility surveys, where
memory lapses, child mortality, and repeated survey participation may influence reporting (Bairagi & Amin,
1995; Potter, 1977). In the Indian context, older cohorts experienced higher child mortality, so recalling
births and deaths can be particularly complex for them, which may help explain the greater inconsistencies
observed in this group (Retherford & Choe, 2011).
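Because the age result is an odds ratio, it does not mean that older women were twenty times as likely to misreport. The worked illustration below makes this concrete. It assumes a hypothetical 10% mismatch probability in the reference group, which is not a study estimate, and it ignores the other covariates in the model.

```latex
\begin{aligned}
\text{odds} &= \frac{p}{1-p}, \qquad
\mathrm{OR} = \frac{p_{40+}/(1-p_{40+})}{p_{\text{ref}}/(1-p_{\text{ref}})} = e^{\beta},
\qquad \mathrm{OR} > 20 \iff \beta > \ln 20 \approx 3.00,\\
p_{\text{ref}} = 0.10 &\Rightarrow \text{odds}_{\text{ref}} = \tfrac{0.10}{0.90} \approx 0.111
\Rightarrow \text{odds}_{40+} = 20 \times 0.111 \approx 2.22
\Rightarrow p_{40+} = \tfrac{2.22}{1 + 2.22} \approx 0.69.
\end{aligned}
```

Under this assumed baseline, the mismatch probability rises about sevenfold (from 0.10 to roughly 0.69), not twentyfold. Mismatches were common in both rounds, at roughly a third or more of reports. Odds ratios in this analysis will therefore overstate the corresponding relative risks.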
Educational attainment emerged as one of the strongest predictors of reporting accuracy. Women with higher
education consistently reported more reliably, with match rates above 85% and substantially lower odds of
mismatch than women with no education. This supports prior studies which argue that
education enhances comprehension of survey questions, ability to recall and record life events, and familiarity
with written documentation (Bollen et al., 2001; Zulu & Dodoo, 1998). Higher education may also be
associated with better interaction with enumerators and increased access to official health records, which
further reduces reporting errors.
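Statements such as "lower odds of mismatch" come from exponentiating logistic-regression coefficients. The minimal sketch below assumes a Wald 95% confidence interval. The function name and the coefficient and standard error in the example are hypothetical. They were chosen only so that the odds ratio falls near the 63–64% reduction reported for higher education. They are not values from the study's tables.

```python
import math

def odds_ratio_summary(beta, se, z_crit=1.959964):
    """Convert a logit coefficient and its standard error into an odds ratio,
    a Wald 95% confidence interval, and the percent change in odds."""
    or_ = math.exp(beta)
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    pct_change = (or_ - 1.0) * 100.0  # negative = lower odds than the reference group
    return or_, (lower, upper), pct_change

# Hypothetical coefficient for "higher education" vs "no education".
or_, ci, pct = odds_ratio_summary(beta=-1.0, se=0.08)
print(f"OR = {or_:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}, change in odds = {pct:.0f}%")
# OR = 0.37, 95% CI 0.31-0.43, change in odds = -63%
```

The same conversion works in the other direction. For example, an odds ratio of 1.27, as reported for rural women in NFHS-IV, corresponds to 27% higher odds of mismatch than for urban women.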
Economic status was another robust determinant of reporting consistency. Women from the richest households
reported more accurately, with the lowest proportions of large mismatches, while women from poorer
households exhibited significantly greater discrepancies. This association mirrors findings from earlier NFHS
assessments and other large-scale surveys in low- and middle-income countries (LMICs), where poverty often
correlates with lower literacy, weaker access to health infrastructure, and greater barriers in communicating
with survey fieldworkers (Curtis & Blanc, 1997; Pullum, 2006).
The rural–urban divide also played a notable role. Urban women demonstrated higher reporting accuracy than
rural women, with the gap exceeding 5 percentage points in NFHS-V. This pattern may be linked to higher
literacy, stronger health system penetration, and wider availability of medical records in urban areas, as
reported in other evaluations of survey data quality in India (Chandrasekhar et al., 2017; Guilmoto, 2012). In
rural and remote areas, logistical constraints, interviewer workload, and cultural barriers may increase the
likelihood of incomplete or inconsistent reporting.
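The match/mismatch comparison behind these residence differentials can be sketched as follows. Each record pairs two counts of living children for a woman who is both household head and an eligible respondent. One count comes from the household file and the other from her own interview. A pair counts as matched when the two counts agree. The rural–urban gap is then the difference in match rates, in percentage points. The field names (`hh_report`, `woman_report`, `residence`) and the toy records are hypothetical, not NFHS variable names or data. The sketch is also unweighted, whereas survey estimates would normally apply sampling weights.

```python
from collections import defaultdict

def match_rates(records, group_key):
    """Percentage of records whose household-reported and self-reported
    number of living children agree, computed within each group (unweighted)."""
    totals = defaultdict(int)
    matched = defaultdict(int)
    for rec in records:
        g = rec[group_key]
        totals[g] += 1
        if rec["hh_report"] == rec["woman_report"]:
            matched[g] += 1
    return {g: 100.0 * matched[g] / totals[g] for g in totals}

# Hypothetical paired reports for eight women.
records = [
    {"residence": "urban", "hh_report": 2, "woman_report": 2},
    {"residence": "urban", "hh_report": 3, "woman_report": 3},
    {"residence": "urban", "hh_report": 1, "woman_report": 2},
    {"residence": "urban", "hh_report": 0, "woman_report": 0},
    {"residence": "rural", "hh_report": 4, "woman_report": 3},
    {"residence": "rural", "hh_report": 2, "woman_report": 2},
    {"residence": "rural", "hh_report": 5, "woman_report": 3},
    {"residence": "rural", "hh_report": 1, "woman_report": 1},
]
rates = match_rates(records, "residence")
gap = rates["urban"] - rates["rural"]
print(rates, f"urban-rural gap = {gap:.1f} percentage points")
# {'urban': 75.0, 'rural': 50.0} urban-rural gap = 25.0 percentage points
```

Passing a different `group_key` (for example, a hypothetical wealth or education field) gives the corresponding group-specific match rates.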
By contrast, religion and caste displayed weaker associations with reporting quality. Although some variation
was observed (for example, slightly higher mismatch rates among Muslims and Scheduled Tribes), these effects
were relatively modest. The attenuation of religious effects between NFHS-IV and NFHS-V suggests that data
collection practices may have become more standardized across groups, reducing disparities. However, the
persistently higher mismatch rates among Scheduled Tribes point to challenges linked to geographical
isolation, language barriers, and enumeration difficulties, which align with broader discussions on survey
undercoverage of marginalized populations in India (Desai & Dubey, 2011; Borooah, 2005).
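One way to check whether an effect genuinely attenuated between NFHS-IV and NFHS-V is to compare the two rounds' coefficients directly. The minimal sketch below treats the two rounds as independent samples and applies a two-sided z-test to the difference in log-odds. The function name and the coefficients and standard errors in the example are hypothetical, not estimates from this study. Logit coefficients from separate models also absorb differences in unobserved heterogeneity, so such a test is indicative rather than conclusive.

```python
import math

def compare_rounds(b1, se1, b2, se2):
    """Two-sided z-test for the difference between logit coefficients
    estimated on two independent samples."""
    diff = b1 - b2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal p-value
    return z, p

# Hypothetical "Muslim vs Hindu" coefficients in the two rounds.
z, p = compare_rounds(b1=0.20, se1=0.05, b2=0.08, se2=0.04)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these placeholder values, the coefficient falls by more than half, yet the difference is not significant at the 5% level (p ≈ 0.06). For this reason, an apparent attenuation is worth testing rather than reading it off the point estimates.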
In addition to these sociodemographic determinants, important spatial patterns were evident. Mismatches were
more prevalent in central and southern India, while reporting accuracy was comparatively higher in the
Northeast. Several explanations may account for these regional differences. Central and southern states such as
Madhya Pradesh, Chhattisgarh, and Andhra Pradesh have larger rural populations, higher proportions of
Scheduled Castes and Tribes, and greater socioeconomic inequality, all factors linked to weaker reporting
accuracy. These regions also experienced historically higher levels of fertility and child mortality, which may
compound recall difficulties, particularly among older women. By contrast, the Northeast is characterized by
smaller populations, stronger community-based networks, and comparatively higher literacy rates, particularly
among women (Dutta, 2020). Tighter kinship structures and smaller family sizes in many northeastern states
may also facilitate more accurate recall of fertility histories. Moreover, survey implementation in smaller states
may allow closer supervision of fieldworkers and more effective adaptation to local contexts, thereby
reducing enumeration errors. These findings align with prior studies showing that regional heterogeneity in
survey quality often reflects differences in administrative capacity, demographic histories, and social
organization (Casterline & el-Zeini, 2014).
Overall, these results reaffirm that survey data quality is not only a technical issue but also a reflection of
broader social inequalities and regional disparities. The groups most vulnerable to inconsistent reporting (older,
less educated, rural, and economically disadvantaged women in central and southern states) are also those
that often face structural disadvantages in health and social outcomes. This has important implications for both
research and policy. Inaccuracies in reporting fertility histories can bias estimates of demographic indicators
such as fertility, mortality, and population projections, which in turn inform program design and resource
allocation (United Nations Population Fund [UNFPA], 2019; United Nations, 2017).
To address these challenges, targeted survey strategies are essential. Enhanced interviewer training, culturally
adapted tools, and simplified questionnaires have been shown to improve data quality in complex survey
contexts (Mensch et al., 2014; Groves et al., 2009). For older and less educated respondents, incorporating
visual aids, calendar methods, or community-based verification may help reduce recall error. In addition,
leveraging digital health records where available could complement survey data and reduce reliance on
memory-based reporting.
The study demonstrates that while NFHS data remain an invaluable resource for demographic research and
policymaking in India, attention must be given to both sociodemographic and regional disparities in reporting
accuracy. Ensuring equitable data quality across groups and geographies is critical to producing reliable
fertility and health statistics and to designing interventions that genuinely address the needs of vulnerable
communities.
Declaration
Author Contributions: JG was responsible for the conceptualization, data curation, formal analysis,
methodology, visualization, and preparation of the original draft. CS contributed through supervision and
provided critical review and editing of the manuscript. Both authors have read and approved the final version
of the manuscript.
Originality and Exclusivity: This manuscript is original and has not been published previously. It is not under
consideration for publication elsewhere and will not be submitted to another journal.
Conflict of Interest: The authors declare no conflict of interest.
Funding: Not applicable
Ethical Approval: This study is based on publicly available secondary data from the National Family Health
Survey (NFHS), which are anonymized and do not require separate ethical approval.
Data Availability Statement: The data used in this study are publicly available and can be accessed through
the Demographic and Health Surveys (DHS) Program website upon registration and approval. Specifically, the
National Family Health Survey (NFHS) data for Rounds I to V are available at:
https://dhsprogram.com/data/available-datasets.cfm.
REFERENCES
1. Anselin, L. (1995). Local indicators of spatial association–LISA. Geographical Analysis, 27(2),
93–115. https://doi.org/10.1111/j.1538-4632.1995.tb00338.x
2. Anselin, L., Syabri, I., & Kho, Y. (2006). GeoDa: An introduction to spatial data analysis. Geographical
Analysis, 38(1), 5–22. https://doi.org/10.1111/j.0016-7363.2005.00671.x
3. Bairagi, R., & Amin, S. (1995). Contraceptive failure, acceptor’s characteristics, and continuation of
use: Results from a longitudinal study. International Family Planning Perspectives, 21(1), 21–27.
https://doi.org/10.2307/2133529
4. Becker, S., Feyisetan, K., & Makinwa-Adebusoye, P. (1998). The effect of the sex of interviewers on
the quality of data in a Nigerian family planning questionnaire. Studies in Family Planning, 29(2),
189–196. https://doi.org/10.2307/172157
5. Bhat, P. N. M. (2002). On the quality of fertility estimates from the 1991 Census of India. Demography
India, 31(1), 1–26.
6. Borooah, V. K. (2005). Caste, inequality, and poverty in India. Review of Development Economics,
9(3), 399–414. https://doi.org/10.1111/j.1467-9361.2005.00284.x
7. Casterline, J. B., & el-Zeini, L. O. (2014). Unmet need and fertility decline: A comparative perspective
on prospects in sub-Saharan Africa. Studies in Family Planning, 45(2), 227–245.
https://doi.org/10.1111/j.1728-4465.2014.00385.x
8. Chainey, S., & Ratcliffe, J. (2005). GIS and crime mapping. Wiley.
9. Chandrasekhar, S., Ladusingh, L., & Gupta, R. (2017). Fertility transition and data quality in India: An
assessment based on the National Family Health Survey. Asian Population Studies, 13(1), 36–53.
https://doi.org/10.1080/17441730.2016.1245605
10. Curtis, S. L., & Blanc, A. K. (1997). Determinants of contraceptive failure, switching, and
discontinuation: An analysis of DHS contraceptive histories. DHS Analytical Reports No. 6. Macro
International Inc.
11. Desai, S., & Dubey, A. (2011). Caste in 21st century India: Competing narratives. Economic and
Political Weekly, 46(11), 40–49.
12. Dutta, S. (2020). Literacy and women's empowerment in Northeast India: A district-level analysis.
Journal of Social Inclusion Studies, 6(2), 171–189. https://doi.org/10.1177/2394481120982942
13. Getis, A. (2008). A history of the concept of spatial autocorrelation: A geographer's perspective.
Geographical Analysis, 40(3), 297–309. https://doi.org/10.1111/j.1538-4632.2008.00738.x
14. Groves, R. M., Fowler, F. J., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2009).
Survey methodology (2nd ed.). Wiley.
15. Guilmoto, C. Z. (2012). Skewed sex ratios at birth and future marriage squeeze in China and India,
2005–2100. Demography, 49(1), 77–100. https://doi.org/10.1007/s13524-011-0083-7
16. International Institute for Population Sciences (IIPS), & ICF. (2017). National Family Health Survey
(NFHS-4), 2015–16: India. Mumbai: IIPS.
17. International Institute for Population Sciences (IIPS), & ICF. (2021). National Family Health Survey
(NFHS-5), 2019–21: India. Mumbai: IIPS.
18. Jejeebhoy, S. J., Sathar, Z. A., & Zavier, A. J. F. (2010). Fertility, contraceptive use, and reproductive
health in India: Findings from the 2005–06 National Family Health Survey. Studies in Family Planning,
41(3), 1–18.
19. Long, J. S., & Freese, J. (2014). Regression models for categorical dependent variables using Stata (3rd
ed.). College Station, TX: Stata Press.
20. Mensch, B. S., Hewett, P. C., & Erulkar, A. S. (2014). The reporting of sensitive behavior among
adolescents: A methodological experiment in Kenya. Demography, 40(2), 247–268.
https://doi.org/10.1353/dem.2003.0017
21. Potter, J. E. (1977). Problems in using birth-history analysis to estimate trends in fertility. Population
Studies, 31(2), 335–364. https://doi.org/10.2307/2173918
22. Pullum, T. W. (2006). An assessment of age and date reporting in the DHS surveys, 1985–2003. DHS
Methodological Reports No. 5. Macro International Inc.
23. Pullum, T. W. (2018). An assessment of age and date reporting in DHS surveys, 2000–2015 (DHS
Methodological Report No. 19). ICF.
24. Retherford, R. D., & Choe, M. K. (2011). Statistical models for causal analysis. Wiley.
25. Ryan, T. P. (2008). Modern regression methods. Wiley.
26. Schoumaker, B. (2014). Quality and consistency of DHS fertility estimates, 1990 to 2012. DHS
Methodological Reports No. 12. ICF International.
27. Singh, A. (2021). Quality of fertility data in large-scale surveys: Evidence from NFHS. Journal of
Population and Social Studies, 29, 512–529.
28. Singh, S. K., & Sahu, D. (2021). Data quality issues in large-scale surveys: Evidence from parity
distribution in India. Demography India, 50(1), 23–36.
29. United Nations Population Fund. (2019). State of world population 2019: Unfinished business – the
pursuit of rights and choices for all. UNFPA.
30. United Nations. (2017). Principles and recommendations for population and housing censuses (Rev. 3).
United Nations.
31. Zulu, E. M., & Dodoo, F. N.-A. (1998). Sexual size dimorphism and reproductive strategies in human
populations. Journal of Biosocial Science, 30(4), 419–433.
https://doi.org/10.1017/S0021932098004198