An Improved BP Neural Network Credit Default Prediction Model Based on SMOTE and Random Forests

Yongxi Zhou

doi:10.70711/frim.v3i7.6820

Abstract

Financial institutions need accurate credit default prediction to strengthen risk management. This study proposes an improved backpropagation (BP) neural network model for credit default prediction that uses the SMOTE technique to balance the classes and Random Forest for feature selection. Using credit data, the proposed model is benchmarked against several baseline classifiers: logistic regression, XGBoost, LightGBM, CatBoost, and a stacking ensemble. On the final test data, the proposed method outperforms the alternatives in AUC, KS, and accuracy, with an AUC of 0.714, a KS value of 0.821, and an accuracy of 80.3%. The results indicate that the model markedly improves the ability to distinguish defaulting from non-defaulting cases while maintaining high overall predictive accuracy.
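
The three evaluation metrics named above can be illustrated with a minimal sketch. It is not the authors' code. It assumes binary labels (1 = default), model scores where higher means more likely to default, and a 0.5 cutoff for accuracy. The function names and the toy data are hypothetical. KS is taken here as the largest gap between the true positive rate and the false positive rate over all thresholds, which is the usual credit-scoring definition.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(y_true: List[int], scores: List[float]) -> List[Point]:
    """Sweep the decision threshold from high to low and collect ROC points.

    Tied scores are processed together so each distinct score adds one point.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def ks_statistic(points: List[Point]) -> float:
    """Largest gap between TPR and FPR across all thresholds."""
    return max(tpr - fpr for fpr, tpr in points)


def accuracy(y_true: List[int], scores: List[float], threshold: float = 0.5) -> float:
    """Share of cases classified correctly at the given (assumed) cutoff."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


# Toy data for illustration only; not taken from the study.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
model_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

curve = roc_points(labels, model_scores)
print("AUC:", auc(curve))
print("KS:", ks_statistic(curve))
print("Accuracy:", accuracy(labels, model_scores))
```

AUC and KS are both read off the same ROC points, so they summarize ranking quality independently of any cutoff, whereas accuracy depends on the chosen threshold.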

Keywords

SMOTE; Random Forest; BP neural network; Credit default forecasts
