A Data Mining Model for Clustering Food Consumption Patterns

Authors

Bennett, E. O.

Department of Computer Science, Rivers State University, Port Harcourt, Rivers State (Nigeria)

Queen A. Dan-Jumbo

Department of Computer Science, Rivers State University, Port Harcourt, Rivers State (Nigeria)

Article Information

DOI: 10.51244/IJRSI.2025.12110037

Subject Category: Data Mining

Volume/Issue: 12/11 | Page No: 401-411

Publication Timeline

Submitted: 2025-11-21

Accepted: 2025-11-28

Published: 2025-12-04

Abstract

Object clustering frequently suffers from the formation of artificial clusters, which compromises data quality and leads to reduced clustering accuracy, limited understanding of the data, degraded performance metrics, and high computational time. This paper addresses these limitations by proposing an optimized system for robust analysis of food consumption patterns across Nigeria. The method applies Principal Component Analysis (PCA) to mitigate these challenges, particularly single-cluster formation and high dimensionality. The system uses the MiniBatchKMeans algorithm for clustering. It was evaluated by direct comparison against a baseline MiniBatchKMeans and DBSCAN on runtime, memory consumption, and internal cluster validation scores (Silhouette, Davies-Bouldin, Calinski-Harabasz). Results show that the system achieves better clustering scores than the baseline while maintaining a significant advantage in computational efficiency, with a runtime improvement of nearly 50%.
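
The comparison outlined in the abstract can be illustrated with a minimal sketch using scikit-learn. It is not the authors' implementation: the synthetic data from `make_blobs`, the helper name `cluster_and_score`, and the values chosen for `n_clusters`, `n_components`, and `batch_size` are all assumptions made for illustration, since the abstract does not state the dataset layout or the parameters used.

```python
import time

import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler


def cluster_and_score(X, n_clusters=4, n_components=None, seed=0):
    """Hypothetical helper: MiniBatchKMeans, optionally preceded by PCA.

    Returns runtime and the three internal validation scores named in the
    abstract, computed in the feature space that was actually clustered.
    """
    start = time.perf_counter()
    features = StandardScaler().fit_transform(X)
    if n_components is not None:
        # Reduce dimensionality before clustering (assumed component count).
        features = PCA(n_components=n_components, random_state=seed).fit_transform(features)
    model = MiniBatchKMeans(
        n_clusters=n_clusters, batch_size=256, n_init=3, random_state=seed
    )
    labels = model.fit_predict(features)
    runtime = time.perf_counter() - start
    return {
        "n_features": features.shape[1],
        "n_clusters_found": len(np.unique(labels)),
        "runtime_s": runtime,
        "silhouette": silhouette_score(features, labels),          # higher is better
        "davies_bouldin": davies_bouldin_score(features, labels),  # lower is better
        "calinski_harabasz": calinski_harabasz_score(features, labels),  # higher is better
    }


if __name__ == "__main__":
    # Synthetic stand-in for a high-dimensional consumption table (assumption).
    X, _ = make_blobs(n_samples=2000, n_features=30, centers=4, random_state=0)
    for name, k in (("baseline", None), ("pca", 5)):
        print(name, cluster_and_score(X, n_clusters=4, n_components=k))
```

Because each run is scored in its own feature space, the scores from the two runs are only a rough guide to relative quality. A stricter comparison would score both label sets against the same representation of the data.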

Keywords

Data Mining, Clustering, Mini Batch K Means, High-Dimensional Data
