Performance Analysis of Machine Learning Models for Sales Forecast
DOI:
https://doi.org/10.53896/ijc.v41i1.1899
Keywords:
Machine Learning, Sale Forecast, Models, Performance
Abstract
Many supermarkets today lack a reliable forecast of their yearly sales, mostly because they lack the skills, resources, and knowledge to estimate sales. At best, most supermarkets and chain stores use ad hoc tools and processes to analyze and predict sales for the coming year. Traditional statistical methods for forecasting supermarket sales leave many challenges unaddressed and mostly result in predictive models that perform poorly. The era of big data, coupled with access to massive compute power, has made machine learning models the best choice for sales forecasting. In this paper, we investigate sales forecasting with three machine learning algorithms and compare their predictive ability: K-Nearest Neighbor, Gradient Boosting, and Random Forest. The models were trained on data provided by Data Science Nigeria on the Zindi platform, collected from a supermarket chain called “Chukwudi Supermarkets”. The results show that the Random Forest algorithm performs slightly better than the other two models. Gradient Boosting models were prone to over-fitting, and K-Nearest Neighbor, although fast, performed the poorest of the three.
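The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name an error metric, so RMSE is an assumption, the data are synthetic, and every function name (`rmse`, `fit_mean`, `fit_knn`, `compare`, `make_data`) is hypothetical. Only a K-Nearest Neighbor regressor and a mean baseline are implemented here, using the standard library alone; in practice Gradient Boosting and Random Forest models would typically come from a library such as scikit-learn. Reporting training and held-out error side by side is one simple way to spot the over-fitting the abstract mentions.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error (assumed metric; the abstract names none)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_mean(X, y):
    """Baseline model: always predict the mean of the training targets."""
    mean = sum(y) / len(y)
    return lambda rows: [mean for _ in rows]


def fit_knn(k):
    """Return a fit function for a k-nearest-neighbour regressor."""
    def fit(X, y):
        def predict(rows):
            preds = []
            for row in rows:
                # Rank training points by squared Euclidean distance to the query.
                nearest = sorted(
                    range(len(X)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], row)),
                )[:k]
                preds.append(sum(y[i] for i in nearest) / k)
            return preds
        return predict
    return fit


def compare(models, X_train, y_train, X_test, y_test):
    """Fit each model and return {name: (train RMSE, held-out RMSE)}."""
    results = {}
    for name, fit in models.items():
        predict = fit(X_train, y_train)
        results[name] = (
            rmse(y_train, predict(X_train)),
            rmse(y_test, predict(X_test)),
        )
    return results


def make_data(n, seed):
    """Synthetic stand-in for sales data: one feature, linear trend plus noise."""
    rng = random.Random(seed)
    X = [[rng.uniform(0, 10)] for _ in range(n)]
    y = [3.0 * x[0] + rng.gauss(0, 2.0) for x in X]
    return X, y


X_train, y_train = make_data(200, seed=0)
X_test, y_test = make_data(100, seed=1)
models = {"mean": fit_mean, "knn_k1": fit_knn(1), "knn_k10": fit_knn(10)}
results = compare(models, X_train, y_train, X_test, y_test)
for name, (train_err, test_err) in results.items():
    print(f"{name}: train RMSE={train_err:.2f}, test RMSE={test_err:.2f}")
```

With k = 1 the model reproduces its training targets exactly, so its training error is zero while its held-out error is not. A gap of this kind between the two columns is the usual sign of over-fitting.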
