Performance Analysis of Machine Learning Models for Sales Forecast

Authors

  • Omogbhemhe Izah Mike, Department of Computer Science, Ambrose Alli University, Ekpoma, Nigeria
  • Odegua Rising, Department of Computer Science, Ambrose Alli University, Ekpoma, Nigeria

Keywords:

Machine Learning, Sales Forecast, Models, Performance

Abstract

Many supermarkets today lack a reliable forecast of their yearly sales. This is mostly due to a lack of the skills, resources and knowledge needed to estimate sales. At best, most supermarkets and chain stores use ad hoc tools and processes to analyze and predict sales for the coming year. The use of traditional statistical methods to forecast supermarket sales has faced many unaddressed challenges and mostly results in predictive models that perform poorly. The era of big data, coupled with access to massive compute power, has made machine learning models well suited to sales forecasting. In this paper, we investigate sales forecasting with three machine learning algorithms and compare their predictive ability. The three methods used are K-Nearest Neighbors, Gradient Boosting and Random Forest. The data used to train the models were provided by Data Science Nigeria on the Zindi platform and were collected from a supermarket chain called “Chukwudi Supermarkets”. The results show that the Random Forest algorithm performs slightly better than the other two models; Gradient Boosting models were prone to over-fitting, and K-Nearest Neighbors, though fast, performed the poorest of the three.
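
The paper does not reproduce its training code on this page, but the comparison the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the file name train.csv, the target column Item_Supermarket_Sales, the hyperparameter choices, and the use of RMSE as the comparison metric are all assumptions made for the sake of the example.

# Minimal sketch of the three-model comparison described in the abstract.
# Assumed (not from the paper): dataset file "train.csv", target column
# "Item_Supermarket_Sales", RMSE as the evaluation metric, and the
# hyperparameters below.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Load the (hypothetical) supermarket sales training file.
data = pd.read_csv("train.csv")
data = data.fillna(data.median(numeric_only=True))  # crude imputation of numeric gaps

# One-hot encode categorical columns and separate the target.
X = pd.get_dummies(data.drop(columns=["Item_Supermarket_Sales"]))
y = data["Item_Supermarket_Sales"]

# Hold out a validation split on which to compare the three models.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "K-Nearest Neighbors": KNeighborsRegressor(n_neighbors=5),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=42),
}

# Fit each model and report its validation error.
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_val)
    rmse = np.sqrt(mean_squared_error(y_val, preds))
    print(f"{name}: validation RMSE = {rmse:.2f}")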

Published

2021-11-27

How to Cite

Mike, O. I., & Odegua, R. (2021). Performance Analysis of Machine Learning Models for Sales Forecast. International Journal of Computer (IJC), 41(1), 36–45. Retrieved from https://www.ijcjournal.org/index.php/InternationalJournalOfComputer/article/view/1899

Issue

Vol. 41 No. 1 (2021)

Section

Articles