A Hybrid Machine Learning Approach for Credit Scoring Using PCA and Logistic Regression

Authors

  • Sylvester Walusala W, School of Computing, Jomo Kenyatta University of Agriculture and Technology, Nairobi
  • Dr. Richard Rimiru, Senior Lecturer, School of Computing, Jomo Kenyatta University of Agriculture and Technology, Nairobi
  • Dr. Calvin Otieno, Senior Lecturer, School of Computing, Jomo Kenyatta University of Agriculture and Technology, Nairobi

Keywords:

Credit scoring, Machine learning, PCA, Multinomial logistic regression.

Abstract

Credit scoring is one mechanism lenders use to evaluate risk before extending credit to applicants. It helps distinguish creditworthy (good) applicants from bad ones. Credit scoring comprises a set of decision models whose underlying techniques aid lenders in issuing consumer credit. Logistic regression (LR) is an adaptation of linear regression that makes flexible assumptions about the data and can also handle qualitative indicators. Its major shortcoming is an inability to deal with the cooperative (overfitting) effect of the variables. Principal Component Analysis (PCA) is a feature extraction technique used to filter out irrelevant features; it thereby lowers model training time and cost and improves model performance. This study evaluates the shortcomings of simple models and proposes an efficient and robust machine learning technique that combines LR and PCA to evaluate firms in the deposit-taking SACCO sector. To achieve this, an experimental methodology is adopted. The proposed hybrid model has two stages. The first stage uses PCA to transform the original variables into new, uncorrelated variables. The second stage applies LR to the principal component values to compute the credit scores. Inferences and conclusions were drawn from analysis of the collected data using Matlab.
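The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the study reports using Matlab, while the sketch below uses plain Python with the standard library only. The data are synthetic, the four indicators and their loadings are hypothetical, the number of retained components (two) is an arbitrary choice, and the second stage is a binary logistic regression rather than the multinomial variant named in the keywords.

```python
import math
import random

def standardize(rows):
    """Center each column at 0 and scale it to unit sample variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(m, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def principal_axes(cov, k, rng, iters=500):
    """Top-k eigenvectors of a symmetric matrix: power iteration plus deflation."""
    cov = [row[:] for row in cov]  # work on a copy
    d = len(cov)
    axes = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = mat_vec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
        axes.append(v)
        # Deflate: remove the direction just found so the next pass finds the next one.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= eigenvalue * v[i] * v[j]
    return axes

def project(rows, axes):
    """Stage 1 output: principal component scores for every row."""
    return [[sum(a * b for a, b in zip(r, axis)) for axis in axes] for r in rows]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Stage 2: binary logistic regression fitted by batch gradient descent."""
    n = len(X)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        errors = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
                  for xi, yi in zip(X, y)]
        b -= lr * sum(errors) / n
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errors, X)) / n
             for j, wj in enumerate(w)]
    return w, b

# Synthetic data (hypothetical): one latent "financial health" factor drives
# four correlated indicators; an applicant is "good" (1) when the factor is positive.
rng = random.Random(42)
latent = [rng.gauss(0.0, 1.0) for _ in range(200)]
loadings = [1.0, -1.0, 0.8, 1.2]
raw = [[h * c + rng.gauss(0.0, 0.3) for c in loadings] for h in latent]
labels = [1 if h > 0 else 0 for h in latent]

Z = standardize(raw)
axes = principal_axes(covariance(Z), k=2, rng=rng)
scores = project(Z, axes)  # uncorrelated inputs for stage 2
w, b = fit_logistic(scores, labels)
prob_good = [sigmoid(b + sum(wj * sj for wj, sj in zip(w, s))) for s in scores]
accuracy = sum((p >= 0.5) == (y == 1) for p, y in zip(prob_good, labels)) / len(labels)
print(f"training accuracy: {accuracy:.3f}")
```

The printed figure is accuracy on the training data itself, so it says nothing about out-of-sample performance; a real evaluation would hold out a test set. In practice the number of components to keep would be chosen from the explained variance rather than fixed in advance.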

Published

2017-09-28

How to Cite

Walusala W, S., Rimiru, D. R., & Otieno, D. C. (2017). A Hybrid Machine Learning Approach for Credit Scoring Using PCA and Logistic Regression. International Journal of Computer (IJC), 27(1), 84–102. Retrieved from https://www.ijcjournal.org/index.php/InternationalJournalOfComputer/article/view/1077

Section

Articles