Using the Distance in Logistic Regression Models for Predictor Ranking in Diabetes Detection
Abstract
Logistic regression is widely used to model the relationship between a response variable and multiple independent variables. In practice, the most important variables for each problem domain are generally well known. However, many ongoing studies explore additional variables that could improve prediction performance in an enriched model. In this article, a new method for ranking binary independent variables is proposed, based on the distance between two decision boundaries: the boundary obtained when the value of the variable is zero and the boundary obtained when it is one. It is shown that, using age and body mass index as the base variables for diabetes prediction, these distances are effective for ranking additional variables, leading to better scores than several conventionally used approaches.
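The idea of a distance between two decision boundaries can be illustrated with a minimal sketch. The abstract does not give the formula, so the following is an assumption rather than the authors' definition: for a model with logit b0 + b1*age + b2*bmi + bz*z, where z is the binary candidate variable, the boundaries for z = 0 and z = 1 are parallel lines in the (age, BMI) plane, and their Euclidean distance is |bz| / sqrt(b1^2 + b2^2). The function names and all coefficient values below are hypothetical, and the result depends on how the base variables are scaled.

```python
import math


def boundary_distance(base_coefs, binary_coef):
    """Distance between the parallel decision boundaries for z = 0 and z = 1.

    Assumed model (not taken from the article):
        logit = b0 + sum(b_i * x_i) + bz * z
    The two boundaries differ only by the offset bz, so their distance is
    |bz| divided by the norm of the base-variable coefficients.
    """
    norm = math.sqrt(sum(b * b for b in base_coefs))
    if norm == 0:
        raise ValueError("base coefficients must not all be zero")
    return abs(binary_coef) / norm


def rank_by_distance(models):
    """Rank candidate binary variables, largest boundary distance first.

    `models` maps a variable name to (base_coefs, binary_coef), taken from a
    separate logistic regression fitted for each candidate variable.
    """
    scores = {
        name: boundary_distance(base, bz) for name, (base, bz) in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical coefficients for (age, BMI) and one binary variable each.
    fitted = {
        "candidate_a": ([0.04, 0.09], 0.30),
        "candidate_b": ([0.05, 0.08], -0.75),
    }
    for name, distance in rank_by_distance(fitted):
        print(f"{name}: {distance:.3f}")
```

Under this assumed formula, a larger distance means that switching the binary variable from zero to one moves the boundary further across the base-variable plane, which is the sense in which the distance could serve as a ranking score.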










