Predicting Employee Turnover Using Machine Learning: 
Evidence from Organizational Data

Huitong Li

doi:10.70711/memf.v2i7.7172

Abstract

Employee turnover is a major threat to an organization in terms of cost, productivity, and workforce stability. This paper uses machine learning methods to predict employee turnover from features such as satisfaction level, performance assessment, work pressure, length of service, and pay scale. We address class imbalance by applying SMOTE, which improves model performance. Although Logistic Regression is interpretable and identifies the most important predictors of turnover, Random Forest is more accurate and achieves a higher F1-score. These results indicate that job satisfaction and performance feedback are more significant determinants of turnover than compensation alone. In conclusion, we offer policy recommendations for improving employee retention through data-driven strategies.
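
To make the two technical ideas in the abstract concrete, the following is a minimal sketch of SMOTE-style oversampling and of the F1-score, written with the Python standard library only. It is not the paper's implementation: the function names `smote_oversample` and `f1_score`, the parameter `k`, and the toy feature rows (satisfaction level, years of service) are assumptions made for illustration. In practice, synthetic samples would be generated from the training split only, so that evaluation data stays untouched.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between neighbors.

    Illustrative sketch of the core SMOTE idea, not a full implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbors of the base row.
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbors)]
        # Place the new row at a random point on the segment base -> other.
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical rows: [satisfaction level, years of service] for employees who left.
leavers = [[0.10, 4.0], [0.15, 5.0], [0.40, 3.0], [0.45, 3.5]]
extra = smote_oversample(leavers, n_new=6, k=2, seed=42)
print(len(leavers) + len(extra), "leaver rows after oversampling")
```

Because each synthetic row lies on a line segment between two real leavers, the oversampled class stays within the range of the observed minority data. The F1-score is the natural companion metric here, since plain accuracy can look high on imbalanced data even when few leavers are identified.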

Keywords

Development economics; Machine learning; Logistic regression; Random forest; SMOTE; Industry; Class imbalance; Predictive modeling

References

[1] Becker, G. S. (1964). Human Capital: A Theoretical and Empirical Analysis, with Special Reference to Education. University of Chicago Press. The foundational economic theory on human capital investment, highlighting the costs of employee attrition and the value of retention.

[2] Boushey, H. and Glynn, S. J. (2012). There are significant business costs to replacing employees. Center for American Progress. This study quantifies the financial burden of employee turnover, reinforcing the economic significance of predictive retention strategies.

[3] Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32. The foundational work on Random Forests, explaining why they perform well in structured prediction tasks.

[4] He, H. and Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9):1263–1284. Discusses SMOTE and other methods to address class imbalances in machine learning, relevant to improving recall for leavers.

[5] King, G. and Zeng, L. (2001). Logistic regression in rare events data. Political Analysis, 9(2):137–163. Provides insight into improving recall in logistic regression, particularly for imbalanced datasets like employee turnover prediction.

[6] Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (NeurIPS). Introduces SHAP (SHapley Additive exPlanations), a useful interpretability tool that could further refine turnover analysis.

[7] Mortensen, D. T. (1986). Job search and labor market analysis. Handbook of Labor Economics, 2:849–919.

[8] Stewart, M. (2024). Employee turnover dataset. The primary dataset used in this study.
