Filter Variable Selection Algorithm Using Risk Ratios for Dimensionality Reduction of Healthcare Data for Classification

dc.contributor.author: Bodur, Ersin Kuset
dc.contributor.author: Atsa'am, Donald Douglas
dc.date.accessioned: 2026-02-06T18:24:16Z
dc.date.issued: 2019
dc.department: Doğu Akdeniz Üniversitesi
dc.description.abstract: This research developed and tested a filter algorithm that serves to reduce the feature space in healthcare datasets. The algorithm binarizes the dataset, then separately evaluates the risk ratio of each predictor with the response, and outputs ratios that represent the association between a predictor and the class attribute. The value of the association translates to the importance rank of the corresponding predictor in determining the outcome. Using Random Forest and logistic regression classification, the performance of the developed algorithm was compared against the regsubsets and varImp functions, which are unsupervised methods of variable selection. Equally, the proposed algorithm was compared with the supervised Fisher score and Pearson's correlation feature selection methods. Different datasets were used for the experiment, and, in the majority of the cases, the predictors selected by the new algorithm outperformed those selected by the existing algorithms. The proposed filter algorithm is therefore a reliable alternative for variable ranking in data mining classification tasks with a dichotomous response.
dc.identifier.doi: 10.3390/pr7040222
dc.identifier.issn: 2227-9717
dc.identifier.issue: 4
dc.identifier.orcid: 0000-0001-9687-8042
dc.identifier.scopus: 2-s2.0-85067510986
dc.identifier.scopusquality: Q2
dc.identifier.uri: https://doi.org/10.3390/pr7040222
dc.identifier.uri: https://hdl.handle.net/11129/10124
dc.identifier.volume: 7
dc.identifier.wos: WOS:000467771400045
dc.identifier.wosquality: Q3
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: MDPI
dc.relation.ispartof: Processes
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_WoS_20260204
dc.subject: data mining
dc.subject: classification
dc.subject: variable importance
dc.subject: filter algorithm
dc.subject: risk ratio
dc.subject: healthcare
dc.subject: balanced classification accuracy
dc.title: Filter Variable Selection Algorithm Using Risk Ratios for Dimensionality Reduction of Healthcare Data for Classification
dc.type: Article
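
The abstract describes three steps: binarize the dataset, compute a risk ratio between each predictor and the binary response, and rank predictors by the strength of that association. The following is a minimal Python sketch of that idea, not the authors' implementation. Several choices are assumptions that the record does not specify: binarizing at the column median, ranking by the absolute log of the risk ratio (so that ratios far from 1 in either direction rank high), and returning NaN when a ratio is undefined. The function names `binarize`, `risk_ratio`, and `rank_predictors` are hypothetical.

```python
import math
from statistics import median


def binarize(values):
    """Map a numeric column to 0/1 by splitting at its median.

    The median cut-off is an assumption made for this sketch.
    """
    cut = median(values)
    return [1 if v > cut else 0 for v in values]


def risk_ratio(exposure, outcome):
    """Risk of outcome == 1 among exposure == 1, divided by the same risk
    among exposure == 0. Both inputs are sequences of 0/1 values.
    """
    pairs = list(zip(exposure, outcome))
    a = sum(1 for x, y in pairs if x == 1 and y == 1)  # exposed, outcome
    b = sum(1 for x, y in pairs if x == 1 and y == 0)  # exposed, no outcome
    c = sum(1 for x, y in pairs if x == 0 and y == 1)  # unexposed, outcome
    d = sum(1 for x, y in pairs if x == 0 and y == 0)  # unexposed, no outcome
    if a + b == 0 or c + d == 0 or c == 0:
        return math.nan  # ratio is undefined for this table
    return (a / (a + b)) / (c / (c + d))


def rank_predictors(columns, outcome):
    """Return (name, risk_ratio) pairs, strongest association first.

    Strength is |log(RR)|, an assumed ranking rule for this sketch.
    Undefined ratios are placed last.
    """
    def strength(rr):
        if math.isnan(rr):
            return -1.0
        if rr == 0:
            return math.inf
        return abs(math.log(rr))

    scores = {name: risk_ratio(binarize(vals), outcome)
              for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: strength(kv[1]), reverse=True)


if __name__ == "__main__":
    # Toy, made-up data purely for illustration.
    outcome = [1, 1, 1, 0, 0, 0, 1, 0]
    columns = {
        "glucose": [150, 160, 90, 95, 85, 100, 170, 155],
        "noise": [7, 8, 1, 6, 5, 2, 3, 4],
    }
    for name, rr in rank_predictors(columns, outcome):
        print(name, rr)
```

On the toy data, `glucose` ranks above `noise` because its risk ratio lies further from 1. A real application would also need a rule for tables with empty cells, which this sketch only flags as NaN.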
