Prediction of Consumer Confidence Index Using Machine Learning Techniques
P S Keerthy¹, Dr. Sumera Aluru²
¹RV Institute of Management, Bengaluru
²Associate Professor, RV Institute of Management, Bengaluru
DOI: https://doi.org/10.51244/IJRSI.2025.120700049
Received: 08 July 2025; Accepted: 10 July 2025; Published: 31 July 2025
The Consumer Confidence Index (CCI) is a crucial economic indicator that reflects the optimism or pessimism of consumers regarding their expected financial situation and the overall economic environment. Accurate forecasting of the CCI enables policymakers, investors, and businesses to make well-informed decisions. This project explores the use of Machine Learning (ML) techniques to predict the future values of the Consumer Confidence Index based on historical economic data.
The study uses a variety of macroeconomic indicators such as inflation, unemployment rates, GDP growth, interest rates, and stock market indices, which are known to influence consumer sentiments. By employing supervised machine learning models like Linear Regression, Random Forest Regressor, and Support Vector Regression (SVR), the study aims to determine the most effective algorithm for forecasting the CCI with high accuracy.
The model’s performance is evaluated using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (R²) values. The project demonstrates that machine learning can provide more dynamic and responsive forecasting compared to traditional statistical models. It contributes to the emerging intersection of behavioral economics and artificial intelligence by providing a predictive framework for analyzing consumer expectations.
Keywords: Consumer Confidence Index, Economic Forecasting, Multiple Regression, Inflation, Interest Rate, Unemployment, Predictive Analytics
The Consumer Confidence Index (CCI) serves as a pivotal economic indicator, reflecting the degree of optimism or pessimism that consumers feel about the overall state of the economy and their personal financial situations. In economies like India, where private consumption accounts for over 60% of the Gross Domestic Product (GDP), consumer sentiment plays a crucial role in driving economic growth. High consumer confidence typically correlates with increased spending, while low confidence may lead to reduced expenditure and heightened savings, impacting economic momentum.
In the Indian context, the significance of monitoring and predicting consumer confidence has been underscored by various studies. For instance, research has highlighted the importance of CCI in providing insights to policymakers and economic forecasters regarding the present and future state of the economy. Additionally, regional analyses have demonstrated how location-based indicators can predict consumer confidence in India, capturing local economic activity through city-wise assessments.
Despite the recognized importance of CCI, accurately forecasting its movements remains difficult because consumer behavior is multifaceted and is shaped by many macroeconomic factors. Traditional econometric models have been employed to understand these dynamics; however, advanced statistical tools and machine learning techniques offer new avenues for improving predictive accuracy. Recent studies have used machine learning models, such as neural networks and support vector regression, to predict CCI from a range of indicators, including web search data, which reflect real-time consumer interests and concerns.
Building upon this foundation, the present study aims to predict the Consumer Confidence Index in India by employing advanced statistical tools, specifically multiple linear regression analysis. The research focuses on examining the relationships between CCI and key macroeconomic variables: interest rate, unemployment rate, inflation rate, house price index, and GDP growth. By analyzing annual data spanning from 2000 to 2024, the study seeks to identify significant predictors of consumer confidence and quantify their impact.
Through this study, we aim to contribute to the existing body of knowledge by offering a comprehensive analysis of the determinants of consumer confidence in India, leveraging advanced statistical methodologies to enhance predictive capabilities.
Problem Statement:
Most existing CCI prediction models focus on short-term trends and rely heavily on survey-based data, which often lack real-time responsiveness and accuracy. Additionally, limited research has been conducted on long-term CCI forecasting using statistical tools within the Indian economic context. This research aims to fill this gap by constructing a reliable long-term forecasting model.
Research Gap:
Despite technological advancements, long-term CCI prediction models remain underdeveloped. Existing models often overlook real-time sentiment, lack interpretability, and fail to integrate diverse economic indicators. Furthermore, there is limited focus on the Indian economy in existing literature.
Beneficiaries:
This research will benefit policymakers by providing foresight into consumer sentiment, aiding in monetary and fiscal policy formulation. Businesses and investors can also use the findings for strategic planning and risk assessment. Additionally, academic researchers can build on the proposed model for further exploration.
Objectives:
This study employs a narrative literature review approach focusing on recent developments (post-2019) in consumer sentiment analysis and statistical forecasting models. Recent research shows a shift from traditional survey-based CCI prediction to machine learning and hybrid modeling approaches.
The foundation for CCI modeling stems from Keynesian economic theory, which posits that consumer sentiment drives economic activity (Keynes, 1936). Early work by Katona (1975) established the link between psychological factors and spending behavior, paving the way for quantitative CCI analysis.
Ludvigson (2004) demonstrated that CCI predicts household spending in the U.S., validating its use in macroeconomic models. The author employed time-series regression, highlighting the significance of income and employment variables. Similarly, Carroll et al. (1994) used autoregressive integrated moving average (ARIMA) models to show that CCI improvements precede GDP growth, emphasizing its leading-indicator properties.
For instance, studies by Kumar and Singh (2023) and Sharma et al. (2022) demonstrated the effectiveness of using LSTM networks in forecasting consumer behavior. Meanwhile, Patel and Joshi (2021) emphasized the role of macroeconomic variables like GDP, inflation, and unemployment in determining consumer sentiment.
However, these studies focus mainly on short-term forecasts or markets outside India. This research integrates multiple regression analysis to bridge traditional methods and machine-driven models, thereby addressing gaps in reliability and transparency.
Conceptual Model:
Dependent Variable: Consumer Confidence Index (CCI)
Independent Variables: Interest Rate, Inflation Rate, Unemployment Rate, House Price Index, GDP Growth
Analysis
Objective of the Analysis
The primary objective of this analysis is to examine the extent to which selected macroeconomic indicators—specifically, interest rate, unemployment rate, inflation rate, house price index, and GDP growth—predict the Consumer Confidence Index (CCI) in India. A multiple linear regression model was employed to assess the strength and direction of these relationships over the period from 2000 to 2024.
Data Overview
The dataset comprises annual observations from 2000 to 2024, encompassing the following variables:
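Dependent Variable: Consumer Confidence Index (CCI)
Independent Variables: Interest Rate, Inflation Rate, Unemployment Rate, House Price Index, GDP Growth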
Data were sourced from reputable institutions, including the Reserve Bank of India (RBI), the Ministry of Statistics and Programme Implementation (MOSPI), and the World Bank.
Correlation:
The correlation matrix provided initial insights into the linear relationships between the independent macroeconomic variables and the Consumer Confidence Index (CCI). Notably, GDP Growth demonstrated a strong positive correlation of +0.81 with CCI, indicating that as the Indian economy grows, consumers tend to feel more confident about their financial future. The House Price Index also showed a positive correlation of +0.68, reflecting that increases in housing prices are often associated with higher consumer sentiment, possibly due to perceived wealth effects.
Conversely, the Unemployment Rate had a strong negative correlation of -0.77, suggesting that higher unemployment diminishes consumer optimism. The Interest Rate displayed a moderate negative correlation of -0.54, indicating that higher borrowing costs may reduce confidence. Inflation, while also negatively correlated (-0.49), had a slightly weaker relationship, suggesting that consumers may tolerate moderate inflation without drastic changes in sentiment.
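The pairwise correlations reported above are Pearson coefficients. The following is a minimal sketch of how such a coefficient can be computed, using only the Python standard library. The series `gdp_growth`, `unemployment`, and `cci` are hypothetical illustrative values, not the study's dataset, so the printed numbers will not match the figures reported above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical illustrative values, NOT the study's data.
gdp_growth = [3.8, 7.9, 8.5, 5.2, 6.4, 8.0, -5.8, 9.1]
unemployment = [5.8, 5.3, 5.2, 5.6, 5.5, 5.3, 8.0, 5.1]
cci = [88.0, 104.0, 110.0, 95.0, 99.0, 103.0, 55.0, 108.0]

r_gdp = pearson_r(gdp_growth, cci)
r_unemp = pearson_r(unemployment, cci)
print(f"GDP growth vs CCI:   {r_gdp:+.2f}")
print(f"Unemployment vs CCI: {r_unemp:+.2f}")
```

A coefficient near +1 or -1 indicates a strong linear association, but it describes each predictor in isolation; it does not account for overlap among predictors, which is why regression coefficients can differ in sign from pairwise correlations.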
Multiple regressions:
In this analysis, five different multiple linear regression models were constructed to examine the impact of various macroeconomic variables on the Consumer Confidence Index (CCI). Model 1, which included interest rate, unemployment rate, house price index, and inflation rate as predictors, had the highest explanatory power with an R-squared value of 0.5465. This indicates that approximately 54.65% of the variation in CCI can be explained by these variables. The coefficients suggest that interest rate (-410.35) and inflation rate (-357.06) both have strong negative impacts on consumer confidence, while the house price index (+184.83) positively influences CCI. Interestingly, unemployment rate (+6.01) shows a small positive coefficient, which may reflect multicollinearity or lag effects.
Model 2 replaced inflation with GDP, resulting in a slightly lower R-squared value of 0.4968. In this case, interest rate (-454.27) remained a major negative influence, while house price index (+193.99) and GDP (+57.34) showed positive contributions to consumer confidence. Again, the unemployment rate (+7.63) unexpectedly remained positive, possibly due to interaction effects with other variables.
Model 3, which retained interest rate, unemployment rate, inflation rate, and GDP, achieved an R-squared of 0.5023, slightly better than Model 2. The interest rate (-455.72) and inflation rate (-349.54) continued to exhibit negative effects, while GDP (+49.75) positively influenced the CCI. The unemployment rate (+6.80) again remained positive, which may suggest a data-specific anomaly or suppressed effects due to overlapping variables.
In Model 4, where unemployment rate was excluded, the R-squared dropped to 0.3784, a substantial decrease in explanatory power. Here, interest rate (-266.98) and inflation (-347.80) still negatively affected the CCI, while house price index (+142.86) and GDP (+0.57) had limited positive influence. The very small coefficient of GDP suggests that, once the unemployment rate is dropped from the model, GDP contributes little additional explanation of consumer sentiment.
Model 5, which excluded interest rate and used only unemployment rate, house price index, inflation, and GDP, had the lowest R-squared value of 0.2449, explaining just 24.49% of the variation in CCI. Despite this, house price index (+197.32) and GDP (+14.72) retained their positive influence, while inflation (-351.08) continued to exhibit a strong negative effect. The unemployment rate (+2.53) again appeared positive, although its effect remained relatively small.
In summary, Model 1 performed the best, suggesting that a combination of interest rate, inflation, house price index, and unemployment rate provides the most comprehensive explanation of CCI behavior in India. Across all models, inflation and interest rate consistently showed strong negative relationships with CCI, while house price index and GDP generally contributed positively. The repeated small positive coefficients for unemployment rate may require further investigation or refinement through techniques like variable transformation or regularization to resolve multicollinearity issues.
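A multiple linear regression of this kind can be sketched as follows. This is a minimal illustration using only the Python standard library and the normal equations, not the software used in the study. The function names (`solve_linear_system`, `fit_ols`, `r_squared`) and the two-predictor data are hypothetical; the target values are generated from known coefficients so that the fit can be checked.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk; returns [b0, b1, ..., bk]."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, y, coefs):
    """Share of the variance in y explained by the fitted model."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: (interest_rate, inflation_rate) pairs, NOT the study's data.
rows = [(6.0, 4.0), (6.5, 5.5), (7.0, 3.5), (8.0, 6.0), (5.5, 4.5), (7.5, 7.0)]
# Target built from known coefficients so the fit can be verified.
y = [120.0 - 3.0 * interest - 2.0 * inflation for interest, inflation in rows]

coefs = fit_ols(rows, y)
r2 = r_squared(rows, y, coefs)
print([round(c, 4) for c in coefs], round(r2, 4))
```

Because the synthetic target is an exact linear function of the predictors, the fit recovers the generating coefficients and R-squared is essentially 1. With real data, a near-singular system in `solve_linear_system` is one symptom of the multicollinearity discussed above.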
Multiple regression on the BSE SENSEX:
The multiple regression analysis examines the relationship between the BSE SENSEX (dependent variable) and five macroeconomic predictors: Interest Rate, GDP, Inflation Rate, Unemployment Rate, and Consumer Spending. The model has an R-squared value of 0.381, indicating that 38.1% of the variability in the BSE SENSEX is explained by these predictors, though the adjusted R-squared (0.226) suggests limited explanatory power, possibly due to the small sample size (26 observations).
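The gap between R-squared and adjusted R-squared follows directly from the sample size and the number of predictors. As a brief check, the sketch below applies the standard adjustment formula to the values reported above (R-squared of 0.381, 26 observations, 5 predictors). The function name `adjusted_r_squared` is illustrative.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    dof = n_obs - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


adj = adjusted_r_squared(0.381, n_obs=26, n_predictors=5)
print(round(adj, 3))  # 0.226
```

With only 20 residual degrees of freedom, the penalty factor (25/20) is large, which is why the adjusted value falls well below the unadjusted one.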
GDP emerges as the only statistically significant predictor (p = 0.007), with a coefficient of 2.0334, implying that a one-unit increase in GDP is associated with a 2.03-unit rise in the BSE SENSEX. Other variables, including Interest Rate, Inflation Rate, Unemployment Rate, and Consumer Spending, are not statistically significant (p-values > 0.05), indicating their limited individual impact in this model.
The F-statistic (2.461) and its p-value (0.0681) suggest the overall model is marginally significant at the 10% level but not at the conventional 5% threshold. Diagnostic tests, such as the Durbin-Watson statistic (2.482), indicate no significant autocorrelation, while the Omnibus and Jarque-Bera tests suggest residuals are normally distributed. However, the high condition number (471) hints at potential multicollinearity, which may affect coefficient reliability.
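Two of the diagnostics above can be reproduced with short formulas. The sketch below computes the overall F-statistic from R-squared and the Durbin-Watson statistic from a residual series. The function names are illustrative, and the residual series are hypothetical values chosen only to show the range of the statistic; they are not the study's residuals.

```python
def f_statistic(r2, n_obs, n_predictors):
    """Overall F-statistic of a regression, computed from its R-squared."""
    dof_resid = n_obs - n_predictors - 1
    return (r2 / n_predictors) / ((1.0 - r2) / dof_resid)


def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests no first-order autocorrelation."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    den = sum(e * e for e in residuals)
    return num / den


# Values reported in the text: R-squared 0.381, 26 observations, 5 predictors.
f_value = f_statistic(0.381, n_obs=26, n_predictors=5)
print(round(f_value, 2))  # close to the reported 2.461 (R-squared is rounded)

# Hypothetical residual series, NOT the study's residuals.
alternating = [1.0, -1.0, 1.0, -1.0]  # strong negative autocorrelation
persistent = [1.0, 1.0, 1.0, 1.0]     # strong positive autocorrelation
print(durbin_watson(alternating), durbin_watson(persistent))
```

The statistic ranges from 0 (positive autocorrelation) to 4 (negative autocorrelation), so the reported value of 2.482 sits close to the neutral midpoint.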
While GDP shows a strong positive relationship with the BSE SENSEX, the model’s weak explanatory power and insignificant predictors call for caution in interpretation. Expanding the dataset or refining the variable selection could improve the analysis.
ANOVA:
The ANOVA table tested the overall statistical significance of the regression model. The F-statistic was 23.64 with a p-value < 0.001, indicating that the regression model is highly significant and the group of macroeconomic variables jointly explains a large proportion of the variance in CCI. This supports the validity of the multiple regression models and confirms that at least one of the predictors has a meaningful influence on the dependent variable (CCI).
Decision tree:
The decision tree algorithm was applied to understand rule-based decision patterns between economic indicators and CCI. The resulting model identified GDP Growth and Unemployment Rate as the most important splitting variables. For example, the model showed that when GDP Growth is above 5% and Unemployment Rate is below 6%, the predicted CCI values are significantly higher, often exceeding 90 index points. On the other hand, in branches where Unemployment exceeds 8%, even with moderate GDP growth, the CCI tends to drop below 60, highlighting that job security is a dominant factor in shaping consumer confidence.
The decision tree also identified threshold effects, such as critical inflation levels (above 6%) at which consumer sentiment drops sharply even if other variables remain favorable. The model’s accuracy was 84% on the test data, indicating good performance in classifying CCI values.
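The splits described above can be read as nested if-then rules. The sketch below hand-codes those thresholds to show how such a tree produces a prediction; it is not the fitted tree from the study. The function name `predict_cci_band`, the band labels, and the order in which the rules are checked are all assumptions made for illustration.

```python
def predict_cci_band(gdp_growth, unemployment, inflation):
    """Return a coarse CCI band from the threshold rules described in the text.

    All inputs are percentages. Labels and rule order are hypothetical.
    """
    if unemployment > 8.0:
        # Text: CCI tends to drop below 60, even with moderate GDP growth.
        return "low"
    if inflation > 6.0:
        # Text: sentiment drops sharply even if other variables are favorable.
        return "weakened"
    if gdp_growth > 5.0 and unemployment < 6.0:
        # Text: predicted CCI is significantly higher, often exceeding 90.
        return "high"
    # Combinations not covered by the rules quoted in the text.
    return "moderate"


print(predict_cci_band(gdp_growth=7.0, unemployment=5.0, inflation=4.0))
print(predict_cci_band(gdp_growth=6.0, unemployment=8.5, inflation=4.0))
```

Checking unemployment first mirrors the text's observation that job security dominates; a tree learned from data would choose the split order itself, based on which variable best reduces prediction error at each node.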
Neural network:
The model begins with an input layer featuring multiple attention mechanisms, suggesting the use of attention-based neural networks, which are effective for focusing on relevant parts of the data. The output layer also incorporates attention mechanisms, indicating a structured approach to generating predictions. The architecture includes intermediate components like the “Motion” unit and an “Element Collection Unit,” though their specific roles are not detailed.
Training progress is tracked using Mean Absolute Error (MAE) for both the training and validation sets. The training MAE declines steadily from 80 to 11.54, indicating effective learning, while the validation MAE ends at 36.43, suggesting potential overfitting or a need for further optimization. The evaluation phase yields a testing MAE of 28.59, which is higher than the training MAE, reinforcing the possibility of overfitting.
Prediction results display significant discrepancies between actual and predicted values, with differences ranging from approximately 11 to 74. The loss and MAE trends over epochs highlight the model’s learning trajectory, but the gap between training and validation metrics calls for adjustments, such as regularization or additional data, to improve generalization. Overall, the model shows promise but requires refinement to enhance its predictive accuracy and robustness.
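The overfitting signal discussed above comes from comparing MAE across data splits. The following minimal sketch defines MAE and quantifies the gap using the three MAE values reported in the text. The function name and the small example series are illustrative; the example series is not the study's data.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical example series, NOT the study's data.
example_mae = mean_absolute_error([100.0, 90.0], [95.0, 100.0])
print(example_mae)  # 7.5

# MAE values reported in the text for the neural network.
reported = {"train": 11.54, "validation": 36.43, "test": 28.59}
gap = reported["validation"] - reported["train"]
ratio = reported["validation"] / reported["train"]
print(f"validation - train gap: {gap:.2f}")
print(f"validation / train ratio: {ratio:.2f}")
```

A validation error roughly three times the training error is the pattern that regularization, early stopping, or additional data are typically intended to reduce.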
This study set out to examine the predictive power of key macroeconomic indicators—interest rate, unemployment rate, inflation rate, house price index, and GDP growth—on the Consumer Confidence Index (CCI) in India from 2000 to 2024. Employing multiple linear regression analysis, the research aimed to identify which factors significantly influence consumer sentiment.
The findings reveal that interest rate and unemployment rate are significant predictors of CCI. Specifically, higher interest rates are associated with a decrease in consumer confidence, aligning with traditional economic theories that suggest increased borrowing costs can dampen consumer spending. Conversely, the positive relationship between unemployment rate and CCI is counterintuitive and suggests the presence of underlying factors or policy interventions that may buffer consumer sentiment during periods of rising unemployment. Inflation rate and house price index were found to be marginally significant, indicating a potential influence on consumer confidence that warrants further investigation. GDP growth did not emerge as a significant predictor, suggesting that short-term fluctuations in GDP may not directly impact consumer sentiment.
These insights have practical implications for policymakers and economists. Understanding the factors that influence consumer confidence can aid in the formulation of monetary and fiscal policies aimed at stabilizing or boosting consumer sentiment. For instance, managing interest rates effectively could be a tool to influence consumer spending behaviors.
However, this study is not without limitations. The reliance on annual data may overlook short-term fluctuations and nuances in consumer sentiment. Additionally, the unexpected positive regression coefficient on the unemployment rate, which contrasts with its negative pairwise correlation with CCI (-0.77), suggests that other variables, such as government support programs or consumer expectations, may play a role and were not accounted for in this model.
Future research should consider incorporating additional variables, such as consumer expectations, fiscal policy measures, and global economic indicators, to provide a more comprehensive understanding of the determinants of consumer confidence. Moreover, employing higher-frequency data, such as quarterly or monthly observations, could capture more immediate effects and provide deeper insights into the dynamics of consumer sentiment.
While the study underscores the significance of certain macroeconomic indicators in predicting consumer confidence, it also highlights the complexity of consumer sentiment and the need for multifaceted approaches in future research.
The author sincerely thanks the faculty of RV Institute of Management for their guidance, and acknowledges the support received from public data sources including RBI and MOSPI.