
An Ensemble Learning Approach with Explainability for Early Prediction of Student Academic Performance

Lin Chai

Abstract


This paper proposes an interpretable ensemble learning framework that integrates multi-source student features, including demographic attributes, family background, behavioral indicators, and prior academic records, to predict final academic performance on the widely adopted UCI Student Performance dataset. Five machine learning classifiers are systematically compared: Logistic Regression, Decision Tree, Random Forest, XGBoost, and LightGBM. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) is applied during preprocessing. Experimental results show that XGBoost achieves the best overall performance, with an accuracy of 87.3%, an F1-score of 0.851, and an AUC-ROC of 0.923. Furthermore, SHapley Additive exPlanations (SHAP) are employed to provide feature-level interpretability, revealing that prior grades, study time, and the number of past course failures are the most influential predictors. The proposed framework not only advances predictive accuracy but also offers actionable insights for educators and policymakers seeking to implement data-driven early warning systems.
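
The abstract names SMOTE but gives no implementation details. As a rough illustration of the idea only, and not the author's code, the following sketch generates synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy data are all hypothetical. A real pipeline would more likely use an established library, and would typically oversample the training split only so that synthetic points do not leak into evaluation data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples, each interpolated between a random
    minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two numeric features per student (not from the paper).
minority_train = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
majority_train = [[5.0 + 0.1 * i, 6.0 - 0.1 * i] for i in range(10)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority_train, len(majority_train) - len(minority_train), k=3)
balanced_minority = minority_train + new_points
```

Because every synthetic point lies on a line segment between two existing minority samples, the oversampled class stays within the region already occupied by the minority data rather than duplicating rows exactly.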

Keywords


Student Performance Prediction; Ensemble Learning; XGBoost; SHAP; Educational Data Mining




DOI: http://dx.doi.org/10.70711/aitr.v3i12.9464
