A Machine Learning Framework for Student Retention Policy Development: A Case Study
Abstract
Student attrition at tertiary institutions is a global challenge with significant personal and social consequences. Early identification of students at risk of dropout is crucial for proactive and preventive intervention. This study presents a machine learning framework for predicting and visualizing students at risk of dropping out. Most previous work relies on wide-ranging data from numerous sources such as surveys, enrolment records, and learning management systems, which makes the process complex and time-consuming. The current study instead uses minimal data that are readily available in any registration system, which simplifies the process and ensures broad applicability. Unlike most similar research, the proposed framework provides a comprehensive system that not only identifies students at risk of dropout but also groups them into meaningful clusters, enabling tailored policy generation for each cluster through digital technologies. The framework comprises two stages: the first identifies at-risk students using a machine learning classifier, and the second uses interpretable AI techniques to cluster and visualize similar students for policy-making purposes. For the case study, various machine learning algorithms (Support Vector Classifier, K-Nearest Neighbors, Logistic Regression, Naïve Bayes, Artificial Neural Network, Random Forest, Classification and Regression Trees, and Categorical Boosting) were trained for dropout prediction using data available at the end of the students' second semester. The experimental results indicated that Categorical Boosting, with an F1-score of 82%, was the most effective classifier for the dataset. The students identified as at risk of dropout were then clustered, and a decision tree was used to visualize each cluster, enabling tailored policy-making.
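
As a rough illustration of the two-stage idea described in the abstract, the following dependency-free Python sketch classifies students, clusters those flagged as at risk, and then finds a single decision-tree split that describes the clusters. It is not the study's implementation. The toy records, the feature names (`gpa`, `credits_passed_ratio`), the use of K-Nearest Neighbors in stage one (the study found Categorical Boosting most effective), and the use of k-means in stage two (the abstract does not name the clustering method) are all assumptions made for illustration.

```python
import math

# Hypothetical toy records: (GPA, share of enrolled credits passed).
# Label 1 = dropped out, 0 = retained. All values are invented.
FEATURES = ("gpa", "credits_passed_ratio")
TRAIN = [
    ((3.6, 0.95), 0), ((3.2, 0.90), 0), ((3.8, 1.00), 0), ((2.9, 0.85), 0),
    ((1.2, 0.80), 1), ((1.5, 0.85), 1), ((1.0, 0.75), 1),  # low GPA
    ((2.8, 0.20), 1), ((3.0, 0.30), 1), ((2.6, 0.25), 1),  # few credits passed
]
NEW_STUDENTS = [(3.5, 0.92), (1.3, 0.82), (2.9, 0.22),
                (1.1, 0.78), (2.7, 0.28), (3.4, 0.88)]


def knn_predict(train, x, k=3):
    """Stage 1 (assumed classifier): majority vote among the k nearest records."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def kmeans(points, k=2, iters=10):
    """Stage 2a (assumed method): plain k-means, seeded with the first k points."""
    centroids = list(points[:k])
    assign = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assign


def gini(labels):
    """Gini impurity of a list of cluster labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(points, labels):
    """Stage 2b: depth-1 decision tree (one split) that best separates clusters."""
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [l for p, l in zip(points, labels) if p[f] <= thr]
            right = [l for p, l in zip(points, labels) if p[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best


at_risk = [s for s in NEW_STUDENTS if knn_predict(TRAIN, s) == 1]
clusters = kmeans(at_risk, k=2)
impurity, feature, threshold = best_split(at_risk, clusters)
print(f"{len(at_risk)} students flagged as at risk")
print(f"Clusters are separated by: {FEATURES[feature]} <= {threshold:.2f}")
```

In this toy setting the split rule gives each cluster a readable description (for example, students with a low GPA versus students who passed few credits), which is the kind of output that could inform a separate intervention policy per cluster. A real application would use a deeper tree, scaled features, and a held-out evaluation with a metric such as the F1-score.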










