An Improved BP Neural Network Credit Default Prediction Model Based on SMOTE and Random Forests
Abstract
against several baseline classifiers, including logistic regression, XGBoost, LightGBM, CatBoost, and a stacking ensemble method. On the
final test data, the proposed method outperforms the alternatives in AUC, KS, and accuracy, with an AUC of
0.714, a KS value of 0.821, and an accuracy of 80.3%. These results indicate that the model markedly improves discrimination
between default and non-default cases while maintaining high overall predictive accuracy.
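The abstract reports AUC and KS without stating how they are computed. The sketch below shows one common way to derive both from binary labels and model scores. It assumes a label of 1 marks a default and that higher scores mean higher predicted default risk. The function name `auc_and_ks` and the toy data are illustrative only and do not come from the paper.

```python
def auc_and_ks(labels, scores):
    """Return (AUC, KS) for binary labels (1 = default) and model scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    # AUC as the Mann-Whitney statistic: the probability that a randomly
    # chosen defaulter scores higher than a randomly chosen non-defaulter,
    # counting ties as one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    # KS as the largest gap between the true positive rate and the false
    # positive rate over all candidate score thresholds.
    ks = 0.0
    for t in sorted(set(scores)):
        tpr = sum(p >= t for p in pos) / len(pos)
        fpr = sum(n >= t for n in neg) / len(neg)
        ks = max(ks, abs(tpr - fpr))
    return auc, ks


# Hypothetical toy data, for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc, toy_ks = auc_and_ks(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of samples, so it suits a small illustration only. On a real credit dataset a rank-based or library implementation would be the practical choice.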
DOI: http://dx.doi.org/10.70711/frim.v3i7.6820