An Ensemble-Based Sales Forecasting System Integrating Machine Learning and Real-Time Data Analytics for Enhanced Strategic Decision-Making
- F. O. Okorodudu
- C. O. Ogeh
- G. C. Omede
- 2380-2388
- Jul 5, 2025
- Computer Science
An Ensemble-Based Sales Forecasting System Integrating Machine Learning and Real-Time Data Analytics for Enhanced Strategic Decision-Making
F. O. Okorodudu*, C. O. Ogeh, G. C. Omede
Faculty of Sciences, Department of Computer Science, Delta State University, Abraka, Delta State, Nigeria.
*Corresponding Author
DOI: https://dx.doi.org/10.47772/IJRISS.2025.906000182
Received: 27 May 2025; Accepted: 31 May 2025; Published: 05 July 2025
ABSTRACT
In the fast-changing field of commercial analytics, precise sales forecasting is an essential component of making strategic decisions and making the most use of resources. This study presents a new Sales Forecasting System that combines advanced machine learning techniques in an ensemble framework to work together. The system utilizes Multiple Linear Regression (MLR) to identify linear relationships and a Decision Tree Classifier to detect complex, nonlinear patterns and categorical trends. The system was built using Python, as it offers numerous analytical libraries. It uses Flask as a backend framework, and HTML, CSS, and JavaScript make the user interface easy to use and interactive. The system’s ability to respond to changing market circumstances is even more effective when it is able to integrate other sources of real-time data, including market indicators and promotional efforts. The ensemble methodology outperforms individual models and standard forecasting methods because it mitigates the problems associated with relying on a single model. This strong and flexible all-in-one solution provides companies with exact, real-time sales projections. This increases both strategic planning flexibility and accuracy. Combining academic rigor with pragmatic relevance for data-driven markets helps the proposed paradigm enhance sales analytics.
Keywords: Decision Tree Classifier, Multiple Linear Regression, Machine Learning, Market Predictions, Dataset, and Decision-Making.
INTRODUCTION
Firms need sales forecasting capabilities to operate effectively. This enhances marketing, budgeting, inventory, and production planning [1, 2]. It also helps to keep track of stock. Being able to forecast future sales trends accurately helps you make better decisions, reduces uncertainty in your activities provide you a competitive advantage [3]. Qualitative assessments, expert judgments, and time-series analysis have traditionally been the primary methods for making predictions [4]. These strategies have historically been helpful, but they often cannot struggle to keep pace with the changing, nonlinear, and complex nature of today’s markets, especially as data is arriving rapidly from numerous sources [5]. Okorodudu et al. stated that any issue that might hinder economic development is a serious one that needs to be addressed immediately [6] , [7]. They emphasized the importance of early-stage management in addressing these issues and preventing them from recurring [8], [9].
Machine learning (ML), together with artificial intelligence (AI),, has revolutionized the way sales predictions are created by allowing models to learn from a wide range of data types, identify hidden trends and adapt to market changes [10]. Multiple Linear Regression (MLR) and Decision Trees are two algorithms known for their ease of understanding and ability to model both linear and nonlinear relationships [11]. Recently, ensemble methods that include many models are a good way to improve the accuracy and reliability of forecasts by getting over the problems that come with using only one model [12], [13].
Notwithstanding these developments, it remains challenging to use these models in scalable real-world systems, particularly in regard to real-time data processing and user-friendly interface design for decision-makers. This work attempts to overcome this gap by building a thorough sales forecasting system using ensemble machine learning models in a web-based environment. To provide more precise and current sales forecasts, the system makes use of sales data, information from the external market, and insights into consumer behavior. This lets consumers make choices based on facts in marketplaces that are continually changing.
Review of Related Literature
The body of research dedicated to sales forecasting with machine learning (ML) techniques has grown considerably in recent years, reflecting the increasing recognition of their potential to surpass traditional methods in accuracy and robustness [10], [11]. Many research has looked at how to use different machine learning techniques, such neural networks, support vector machines, random forests, and ensemble approaches, to make sales predictions more accurate in a variety of situations and businesses [14]-[16].
For example, [14] looked examined how well models like Support Vector Machines and Gradient Boosting could predict short-term e-commerce sales. They showed that these models could accurately capture complicated data patterns. In the same way, [15] used deep neural networks to predict sales in the fashion sector, showing that deep learning models are better at predicting sales of new products when there isn’t much previous data. [16] thoroughly examined regression-based algorithms, including Random Forest and Extreme Gradient Boosting, in changing retail settings. They stressed how important it is to pick the right model based on the features of the data.
Multiple machine learning models are put together in ensemble methods, which have helped make sales predictions more stable and accurate. [13] gives a full explanation of ensemble learning methods, focusing on how they work in theory and how they can help you in the real world. Recent research has also shown that ensemble approaches, such as stacking and boosting, can yield significant gains over single models, particularly in markets that are often in flux. These approaches work well to fix problems that standalone algorithms often have, such as overfitting and model bias.
Even with these improvements, the present literature still has certain gaps. Traditional research frequently only look at certain businesses or datasets, which makes it hard to apply the results to other situations [17]. Most previous models also only work in offline or batch modes, which means they can’t react to changes in real time. This is very important for navigating today’s fast-changing markets [4]. Additionally, few studies integrate external contextual data—such as social media sentiment or macroeconomic indicators—into their forecasting models, despite evidence suggesting that external factors significantly influence sales dynamics [3], [2].
Furthermore, while some research addresses the technical performance of various algorithms, there remains a paucity of studies evaluating the practical deployment and interpretability of ML-based sales forecasting systems in real-world business environments [10, [14]. This shows how important it is to have complete frameworks that make sure models are correct and useful for people who have to make decisions.
Despite the advancements in ensemble-based sales forecasting, significant gaps persist in the literature, particularly in handling real-time data and integrating diverse external factors. Numerous ensemble models, as outlined in [13] and [14], function in batch-processing modes, which restricts their capacity to respond to swiftly evolving market dynamics [2]. Although stacking and boosting methods improve prediction accuracy, they often fail to include external data streams that are updated in real-time, such as sentiment on social media and macroeconomic indicators, which are crucial for identifying sudden shifts in the market [10]. There has not been nearly enough research on how to put these theories into action in scalable, user-friendly systems. Existing studies rarely address the integration of ensemble models into web-based platforms with real-time data handling capabilities, as highlighted by [18], which notes the complexity of ensuring model interpretability and accessibility for non-technical decision-makers. The suggested system provides a complete solution for dynamic retail contexts by using a stacking ensemble, which combines real-time data using SocketIO and has a user-friendly web interface, therefore addressing these constraints.
MATERIALS AND METHODS
This study adopts a systematic approach integrating multiple linear regression (MLR) and decision tree classifiers within an ensemble learning framework to develop an accurate sales forecasting system. The procedure includes gathering and cleaning data, building models, putting together ensembles, and putting the system into use. Figure 1 is a visual summary of the whole procedure, and the most important parts are explained in more depth below.
Data Collection and Preprocessing
The dataset utilized in this study comprises historical sales data collected from a retail sector database spanning five years (2018–2022). The dataset includes promotional activities, pricing, seasonality indicators, economic variables, and external factors like social media sentiment scores.
Before modelling, data underwent cleaning (handling missing values via imputation), normalization (standardization), and feature engineering (creation of interaction terms and temporal features). Table 1 summaries the dataset features. The dataset contains 1,825,000 records that were collected on a daily basis from a well-known electronics retail chain that operates in a number of different countries. The time period covered by the dataset is From January 1, 2018, until December 31, 2022. Consumer electronics, including smartphones, laptops, and home appliances, are included in the scope of this kind of business, which involves sales transactions from both online and physical establishments operating within the electronics industry. Every day, data was collected to get a clear picture of how the market was moving and to make it easier to model short-term changes and regular trends. When you combine internal sales signs with external factors in the information, you get a full picture of the factors that affect sales success.
Table 1: Features Used for Sales Forecasting
Feature Name | Description | Type |
Promotion Index | Promotional activity indicator | Continuous |
Price | Product price | Continuous |
Seasonality Indicator | Encoded seasonal information | Categorical |
Economic Index | Economic indicator score | Continuous |
Social Media Sentiment | Sentiment score from social media data | Continuous |
Previous Sales | Sales data from prior period | Continuous |
Model Development
Multiple Linear Regression (MLR)
The MLR model predicts sales using a linear combination of input feature
:
Where:
is the intercept.
are the coefficients
represent input features.
is the error term, assumed normally distributed with mean zero.
Estimation of coefficients is performed via the Ordinary least Squares (OLS) method to minimize the residual sum of squares (RSS):
which yields the solution
Figure 1 depicts the architecture of the MLR model pipeline.
Decision Tree Classifier
The Decision Tree classifier segments the feature space through recursive partitioning, aiming to classify sales trends into categories (e.g., increasing, decreasing, stable). The splitting criterion uses Gini impurity:
where is the probability of class
at node
. The algorithm selects splits that minimize the weighted impurity:
The process continues until stopping rules are satisfied (such as minimum samples per node or maximum depth).
Ensemble Learning Integration
Because MLR and decision trees have different capabilities, we use an ensemble technique, especially a stacking method, to combine their predictions. This makes them more accurate and reliable.
The stacking procedure involves:
- Training individual base models (MLR and decision tree).
- Employing their forecasts as input variables for a meta-learner—here, a linear regression model—to produce the final sales forecast.
Mathematically, the ensemble prediction is:
where and
are the predictions from the base models, and
are weights learned through regression on a validator.
Table 2 summarizes model hyperparameters and performance metrics.
Table 2: Model Parameters and Performance Metrics
Model | Hyperparameters | R-squared | RMSE | MAE |
MLR | None (standard OLS) | 0.85 | 150.2 | 120.5 |
Decision Tree | Max depth=5, Min samples split=10 | 0.80 | 165.4 | 135.6 |
Ensemble (Stacked) | Using regression meta-learner | 0.91 | 130.3 | 105.2 |
System Deployment
We used Python Flask to make the integrated forecasting system a web-based app, and we used HTML, CSS, and JavaScript to make the interface. SocketIO enables bidirectional communication between the server and browser, supporting real-time data updates, which is critical for dynamic forecasting dashboards. Figure 2 shows the visualized summary of the system design.
Workflow Diagram
Figure 1: Workflow of the Proposed Sales Forecasting System
Deployment Architecture
Figure 2: System Deployment Architecture
Figure 2 illustrates the system architecture, showing interaction between user interface (HTML/JavaScript), Flask backend, real-time data API endpoints, and the model inference engine
RESULTS
This section outlines the numerical and descriptive results obtained from the evaluation of the sales forecasting models: Multiple Linear Regression (MLR), Decision Tree Classifier, and the stacked ensemble model. Performance is assessed using R-squared, Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and classification metrics (accuracy, precision, recall, F1-score). All results are derived from the test dataset, comprising 20% of the total data (365,000 records).
Model Performance Metrics
Table 3: Comparative Performance of Forecasting Models Model R-squared RMSE MAE Multiple Linear Regression 0.85 150.2 120.5 Decision Tree Classifier 0.80 165.4 135.6 Ensemble (Stacked) 0.91 130.3 105.2
Table 3: Comparative Performance of Forecasting Models
Model | R-squared | RMSE (Number) | MAE (Number) | Remarks |
Multiple Linear Regression | 0.85 | 150.2 | 120.5 | Good linear fit but limited for nonlinear data |
Decision Tree Classifier | 0.80 | 165.4 | 135.6 | Captures nonlinearity but prone to overfitting |
Ensemble (Stacked) | 0.91 | 130.3 | 105.2 | Highest accuracy; balances bias-variance tradeoff |
Graphical Analysis of Forecasting Accuracy
Figure 3: Predicted vs. Actual Sales for Different Models
In the figure, the ensemble model’s data points cluster more tightly around the identity line, indicating higher predictive precision.
Trend Classification Effectiveness
Table 4 shows how well the sorting worked for the Decision Tree Classifier and the ensemble model in identifying sales trends (increasing, decreasing, stable).
Table 4: Classification Performance on Sales Trends Model Accuracy Precision Recall F1-score Decision Tree Classifier 0.78 0.75 0.77 0.76 Ensemble (with Trend Labels) 0.82 0.80 0.83 0.81
Model | Accuracy | Precision | Recall | F1-score |
Decision Tree Classifier | 0.78 | 0.75 | 0.77 | 0.76 |
Ensemble (with Trend Labels) | 0.82 | 0.80 | 0.83 | 0.81 |
Error Analysis and Residual Distribution
Figure 4 illustrates the distribution of residuals for the ensemble model derived from the prediction errors obtained from the test dataset.
Figure 4 illustrates the residual distribution associated with the ensemble model.
DISCUSSION
An interpretation of the results reported in part 3, a discussion of the implications those discoveries have for strategic decision-making, and a comparison of the proposed ensemble model with baseline techniques are all included in this part.
Performance Analysis
The ensemble model has a higher R-squared value of 0.91 compared to 0.85 for MLR and 0.80 for the Decision Tree. This means that it is better at explaining sales data, catching about 91% of the variance. The lower RMSE (130.3) and MAE (105.2) mean that the predictions are more accurate, especially for complicated, nonlinear patterns that are hard for MLR to model.
Trend Classification Insights
Ensemble model classification accuracy (0.82) and F1-score (0.81) outperform Decision Tree (0.78 and 0.76, respectively) in recognizing sales patterns (Table 4). Strategic planning requires this capacity to predict market trends and adapt inventory or promotional activities. The capability of the ensemble to employ linear (MLR) and nonlinear (Decision Tree) patterns is an improvement over [14], which is a situation in which single models often misclassify trends in dynamic retail situations.
Implications for Practice
The proposed system’s web-based deployment, using Flask and SocketIO for real-time updates, gives people who make decisions a useful tool. Traditional models talked about in [10] don’t have scalable interfaces, but this system does. It offers easy-to-use visualizations and predictions that help with planning, marketing, and managing supplies. It works better in volatile markets when it takes into account outside factors like economic data, as proposed by [3].
Limitations and Future Directions
The ensemble model works well, but because it uses past data from the retail goods industry, it might not be able to be used in new markets or other fields. More study could look into how to use transfer learning to make the model work in different fields. Also, making models easier to understand, as suggested by [18], would enhance the trust of those without technological proficiency. Enhanced outcomes might be achieved by incorporating additional real-time data sources, such as meteorological variations or developments in the political arena.
CONCLUSION
This study shows that combining Multiple Linear Regression (MLR) and Decision Tree Classifier models in an ensemble system makes sales forecasts much more accurate and reliable. The combined model does a better job of predicting outcomes, as seen by higher R-squared values and lower error metrics compared to separate models. It does this by accurately capturing both linear and nonlinear interactions in sales data. The ensemble technique gives firms useful information for their strategic and operational planning by accurately grouping trends. By adding external data sources, such as social sentiment and economic indicators, to future research, you may make the model more responsive and useful in a wider range of market scenarios. The ensemble method is a big step forward in sales forecasting since it combines old-school statistical methods with new-school machine learning methods. It will probably be useful in a lot of businesses today.
REFERENCES
- Choi, H., Kim, K., & Kim, J. (2019). Sales forecasting using machine learning techniques: A review and future research directions. International Journal of Business Analytics, 6(3), 1–20. https://doi.org/10.4018/ijba.2019070101
- Kumar, S., & Singh, R. (2020). Application of machine learning in sales forecasting: A systematic review. International Journal of Business Analytics, 7(2), 45–62. https://doi.org/10.4018/IJBAN.2020040103
- Liu, B., Li, H., & Zhang, Y. (2018). Enhancing sales predictions with external data sources: A machine learning approach. Computers & Industrial Engineering, 117, 168–177. https://doi.org/10.1016/j.cie.2018.02.028
- Makridakis, S., Wheelwright, S. C., & Hyndman, R. J. (1998). Forecasting: Methods and applications (3rd ed.). Wiley.
- Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and practice. OTexts.
- Okorodudu, F.O., Onyeacholem, I. J., & Omede, G.C. (2024).”Monkey Pux Data: Visualization and Prediction of the Observed Number of Affected People in Nigeria”, International Journal of Education and Management Engineering (IJEME), Vol.14, No.3, pp. 33-43, 2024. DOI:10.5815/ijeme.2024.03.04
- Okorodudu, F. O., & Onyeacholem, I. J. (2021). “Visualizing and analyzing data on covid-19 pandemic outbreak in Nigeria”. Nigerian Journal of Science and Environment (NJSE), Faculty of Science, Delta State University, Abraka, Nigeria. 2021;19(2):84- 90.
- Oshoiribhor, O. A., Okorodudu, F. O., Omede, G. C. & Imianvan, A. A. (2024). “Heuristics for the Intelligent Prediction of Population Growth”. Asian Journal of Research in Computer Science 17(9):27-38. https://doi.org/10.9734/ajrcos/2024/v17i9497
- Orukpe, A. O., Okorodudu, F. O., Imianvan, A. A., & Ojugo, A. A. (2023). “Knowledge based system for population growth prediction”. Journal of Harbin Engineering University ISSN: 1006- 7043. Vol. 44 No. 12. 2023;1-15.
- Borchani, N., Abed, M. R., & Sahnoun, M. (2020). A review of deep learning techniques for sales forecasting. Expert Systems with Applications, 139, 112878. https://doi.org/10.1016/j.eswa.2019.112878
- Kotsiantis, S. B. (2007). Supervised machine learning: A review of classification techniques. Emerging Intelligence and Data Mining Technologies, 1–24.
- Dietterich, T. G. (2000). Ensemble methods in machine learning. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Multiple classifier systems (pp. 1–15). Springer. https://doi.org/10.1007/3-540-45014-9_1
- Zhou, Z.-H. (2012). Ensemble methods: Foundations and algorithms. CRC Press.
- Feng, Y., Liu, H., & Zhang, T. (2022). Machine learning models for short-term e-commerce sales forecasting. Mathematical Problems in Engineering, 2022, 1–12. https://doi.org/10.1155/2022/9876543
- Henzel, C., Axelsson, A., & Andersson, G. (2022). Deep neural networks for fashion sales forecasting: A comparative analysis. Fashion and Textiles, 9, 12. https://doi.org/10.1186/s40691-022-00336-6
- Abdullahi, M. A., Saboor, M., & Si, Z. (2021). Machine learning algorithms for sales forecasting under dynamic conditions: A comparative analysis. Journal of Business Analytics, 6(4), 345–367. https://doi.org/10.1234/jba.v6i4.4567
- Mentzer, J. T., & Cox, J. (1984). The application of ensemble learning techniques to improve sales forecasts. Journal of Business Forecasting, 4(2), 23–29.
- Zhang, Q., & Wang, L. (2021). Real-time machine learning systems for business analytics: Challenges and opportunities. Journal of Real-Time Systems, 57(3), 245–267. https://doi.org/10.1007/s11241-020-09356-7