F1-score, ensuring a multidimensional view of classification effectiveness. Precision and recall offered insights
into the model’s ability to correctly identify spending categories without misclassification, while the F1-score
balanced these two metrics for scenarios involving category imbalance. Confusion matrices were generated to
visualize misclassification patterns, particularly in categories that historically exhibited overlap such as
Clothing vs. Other and Fruits vs. Food. Receiver Operating Characteristic (ROC) curves and Area Under the
Curve (AUC) values were analyzed to measure the classifier's discriminative ability across different decision thresholds. For
forecasting tasks using LSTM, evaluation metrics such as Mean Squared Error (MSE), Root Mean Squared
Error (RMSE), and Mean Absolute Error (MAE) were employed to quantify prediction accuracy. The
evaluation results confirmed that the Random Forest classifier and LSTM predictor significantly outperformed
the baseline models, achieving high reliability in both category classification and future spending predictions.
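As a minimal sketch of how these evaluation metrics can be computed with scikit-learn, the snippet below scores a set of category predictions and a set of spending forecasts; the arrays, labels, and weighted-averaging choice are illustrative assumptions rather than Smart Pocket's actual outputs.

import numpy as np
from sklearn.metrics import (
    precision_recall_fscore_support,
    confusion_matrix,
    mean_squared_error,
    mean_absolute_error,
)

# Classification metrics (placeholder predictions for a handful of transactions).
y_true = ["Food", "Clothing", "Other", "Food", "Fruits", "Utilities"]
y_pred = ["Food", "Other", "Other", "Food", "Food", "Utilities"]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
cm = confusion_matrix(
    y_true, y_pred, labels=["Food", "Clothing", "Fruits", "Utilities", "Other"]
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(cm)
# ROC/AUC in the multi-class case would use per-class probability scores,
# e.g. roc_auc_score(..., multi_class="ovr") on the classifier's predict_proba output.

# Forecasting metrics for the LSTM predictor (placeholder daily spending totals).
actual = np.array([120.0, 95.5, 210.0, 80.0])
forecast = np.array([110.0, 100.0, 190.0, 85.0])
mse = mean_squared_error(actual, forecast)
rmse = np.sqrt(mse)
mae = mean_absolute_error(actual, forecast)
print(f"MSE={mse:.2f} RMSE={rmse:.2f} MAE={mae:.2f}")

Weighted averaging is used here because of the category imbalance noted above; per-class scores can be obtained by dropping the average argument.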
Dataset Description
The expanded dataset contains 1,250 transactions collected over 90 days from 27 anonymized users aged 18–45. Data sources include digital receipts, POS logs, bank statements, and self-reported entries. Attributes include:
transaction date, amount, category, merchant name, payment mode, and user demographic group. Category
breakdown: Food (310), Clothing (180), Utilities (160), Entertainment (140), Fruits (120), Other (340).
Because the 'Other' category was disproportionately large (340 of 1,250 transactions) and dominated classification, k-means clustering was applied to split it into refined subcategories such as Transport, Gifts, and Household Supplies, improving model clarity.
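A sketch of this refinement step is shown below, assuming the transactions are available as a pandas DataFrame with merchant_name, amount, and category columns; the column names, feature choices, and number of clusters are illustrative, not the project's actual configuration.

import pandas as pd
from scipy.sparse import hstack
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("transactions.csv")          # hypothetical export of the dataset
other = df[df["category"] == "Other"].copy()  # the dominant catch-all category

# Represent each 'Other' transaction by its merchant-name text and scaled amount.
text_features = TfidfVectorizer(min_df=2).fit_transform(other["merchant_name"])
amount_scaled = StandardScaler().fit_transform(other[["amount"]])
features = hstack([text_features, amount_scaled])

# Cluster into candidate subcategories; k is chosen by inspecting the clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
other["subcategory_id"] = kmeans.fit_predict(features)

# Each cluster is then reviewed and labelled manually (e.g. Transport, Gifts,
# Household Supplies) before being merged back into the classifier's label set.
print(other.groupby("subcategory_id")["merchant_name"].apply(lambda s: s.head(3).tolist()))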
Model Tuning
Model tuning was performed to optimize predictive performance and ensure that both the classification and
forecasting components of Smart Pocket generalized effectively across diverse spending patterns. Hyperparameter optimization was carried out using a combination of grid search, random search, and cross-validation
to systematically explore optimal parameter configurations for each model. For traditional machine learning
classifiers such as Random Forest, the number of trees, maximum depth, minimum sample split, and feature
selection strategies were fine-tuned to balance accuracy and computational efficiency. Similarly, tuning of
SVM involved identifying the most effective kernel function, regularization parameter (C), and gamma values.
For the LSTM forecasting model, architectural refinements were explored including variations in the number
of hidden layers, number of units per layer, dropout rates, and activation functions (ReLU, tanh). Sequence
window lengths and batch sizes were also optimized to capture temporal dependencies more effectively. Early
stopping and learning rate scheduling were incorporated to stabilize training and avoid overfitting. Feedback
from validation metrics, confusion matrices, and domain insights on transaction behavior guided iterative adjustments to the model architecture.
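A hedged sketch of the classifier-side search, using scikit-learn's GridSearchCV with a synthetic stand-in for the engineered transaction features, is given below; the parameter ranges and scoring choice are illustrative rather than the values finally adopted.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the engineered transaction features and category labels.
X_train, y_train = make_classification(
    n_samples=300, n_features=12, n_informative=6, n_classes=4, random_state=0
)

param_grid = {
    "n_estimators": [100, 200, 400],        # number of trees
    "max_depth": [None, 10, 20],            # maximum depth
    "min_samples_split": [2, 5, 10],        # minimum sample split
    "max_features": ["sqrt", "log2"],       # feature selection strategy
}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    scoring="f1_weighted",  # weighted F1 reflects the category imbalance
    cv=5,                   # 5-fold cross-validation
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))

An equivalent search over the SVM's kernel, regularization parameter C, and gamma values follows the same pattern with an SVC estimator.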
These tuning strategies collectively resulted in improved classification accuracy, reduced forecasting error, and
enhanced robustness, ensuring that the final models performed reliably across different spending categories
and time-series patterns.
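For the LSTM side of the tuning described above, the following Keras sketch shows how early stopping and learning-rate scheduling can be attached to a stacked-LSTM forecaster; the layer sizes, 14-day window, and random placeholder sequences are assumptions for illustration only.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

window_len = 14  # illustrative sequence window length
X = np.random.rand(200, window_len, 1).astype("float32")  # placeholder spending windows
y = np.random.rand(200, 1).astype("float32")              # placeholder next-day totals

model = keras.Sequential([
    layers.Input(shape=(window_len, 1)),
    layers.LSTM(64, return_sequences=True),
    layers.Dropout(0.2),                    # dropout rate explored during tuning
    layers.LSTM(32),
    layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse", metrics=["mae"])

callbacks = [
    # Stop once validation loss stops improving, to avoid overfitting.
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    # Halve the learning rate when progress stalls (learning-rate scheduling).
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
]
model.fit(X, y, validation_split=0.2, epochs=50, batch_size=32, callbacks=callbacks, verbose=0)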
Deployment
The final machine learning models were integrated into the Smart Pocket application through a secure, scalable, and modular deployment architecture. Containerization using Docker ensured consistent
runtime environments across development, testing, and production stages, eliminating dependency conflicts
and enabling reproducible builds. The machine learning components—responsible for expense classification
and spending prediction—were deployed as independent microservices, allowing efficient scaling and maintenance.
A RESTful API layer built using Flask connected the machine learning modules with the Next.js frontend, enabling real-time predictions and interactive financial insights for users. These APIs facilitated smooth communication of input features, category predictions, budget utilization metrics, and time-series forecasts. To optimize performance, caching strategies and load balancing mechanisms were incorporated to handle concurrent user requests while maintaining low latency.
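As a hedged illustration of how such a Flask microservice might expose the classifier, the minimal endpoint below accepts a JSON transaction and returns a predicted category; the route name, feature encoding, and model artefact path are hypothetical placeholders rather than Smart Pocket's actual API.

import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
classifier = joblib.load("models/expense_classifier.joblib")  # hypothetical artefact path

@app.route("/api/classify", methods=["POST"])
def classify_transaction():
    """Return the predicted spending category for one JSON-encoded transaction."""
    payload = request.get_json(force=True)
    # Hypothetical numeric feature encoding; the real service would reuse the
    # training-time preprocessing pipeline.
    features = [[payload["amount"], payload["day_of_week"], payload["merchant_id"]]]
    category = classifier.predict(features)[0]
    return jsonify({"category": str(category)})

if __name__ == "__main__":
    # In deployment this would run behind a production WSGI server inside the Docker container.
    app.run(host="0.0.0.0", port=5000)

A matching forecasting endpoint would wrap the LSTM model in the same way, returning the predicted spending series instead of a category.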