Improving the precision-recall trade-off in undersampling-based binary text categorization using unanimity rule

dc.contributor.author: Erenel, Zafer
dc.contributor.author: Altincay, Hakan
dc.date.accessioned: 2026-02-06T18:34:14Z
dc.date.issued: 2013
dc.department: Doğu Akdeniz Üniversitesi
dc.description.abstract: In binary text categorization, documents are generally distributed unevenly over the two classes, and resampling approaches have been shown to improve F-1 scores in this setting. The improvement is mainly due to a gain in recall, while precision may deteriorate. Since precision is the primary concern in some applications, achieving higher F-1 scores with a desired trade-off between precision and recall is important. In this study, we present an analytical comparison of the unanimity and majority voting rules. It is shown that the unanimity rule can provide better F-1 scores than majority voting when an ensemble of high-recall but low-precision classifiers is considered. Category-based undersampling is then proposed to generate high-recall members. Experiments conducted on three datasets show that superior F-1 scores can be achieved compared with a support vector machine (SVM)-based baseline system and with voting over a random undersampling-based ensemble.
dc.description.sponsorship: Ministry of Education and Culture of Northern Cyprus [MEKB-09-02]
dc.description.sponsorship: The numerical calculations reported in this paper were partly performed at the ULAKBIM High Performance Computing Center of the Turkish Scientific and Technical Research Council (TUBITAK). This work was supported by the research grant MEKB-09-02 provided by the Ministry of Education and Culture of Northern Cyprus.
dc.identifier.doi: 10.1007/s00521-012-1056-5
dc.identifier.endpage: S100
dc.identifier.issn: 0941-0643
dc.identifier.issn: 1433-3058
dc.identifier.scopus: 2-s2.0-84878018711
dc.identifier.scopusquality: Q1
dc.identifier.startpage: S83
dc.identifier.uri: https://doi.org/10.1007/s00521-012-1056-5
dc.identifier.uri: https://hdl.handle.net/11129/11673
dc.identifier.volume: 22
dc.identifier.wos: WOS:000323413300008
dc.identifier.wosquality: N/A
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Springer London Ltd
dc.relation.ispartof: Neural Computing & Applications
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_WoS_20260204
dc.subject: Class imbalance
dc.subject: Resampling
dc.subject: Classifier ensemble
dc.subject: Unanimity rule
dc.subject: Binary text categorization
dc.title: Improving the precision-recall trade-off in undersampling-based binary text categorization using unanimity rule
dc.type: Article
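
Illustrative sketch (not part of the record): the abstract contrasts the unanimity and majority voting rules for an ensemble of high-recall, low-precision classifiers. The Python snippet below is a minimal sketch of that contrast. The labels and member votes are invented toy values, the function names are hypothetical, and it implements neither the paper's analysis nor its category-based undersampling.

```python
from typing import List, Sequence, Tuple


def majority_vote(votes: Sequence[Sequence[int]]) -> List[int]:
    """Label a document positive when more than half of the members vote positive."""
    n_members = len(votes)
    return [1 if 2 * sum(column) > n_members else 0 for column in zip(*votes)]


def unanimity_vote(votes: Sequence[Sequence[int]]) -> List[int]:
    """Label a document positive only when every member votes positive."""
    return [1 if all(column) else 0 for column in zip(*votes)]


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Compute precision, recall and F-1 for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (assumed for illustration only): 4 positive and 8 negative documents.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Three hypothetical members, each with high recall but several false positives.
member_votes = [
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
]

majority_scores = precision_recall_f1(y_true, majority_vote(member_votes))
unanimity_scores = precision_recall_f1(y_true, unanimity_vote(member_votes))

print("majority  (P, R, F-1):", majority_scores)
print("unanimity (P, R, F-1):", unanimity_scores)
```

On these toy votes the false positives of the members rarely coincide, so requiring unanimous agreement removes them at the cost of one missed positive. Whether unanimity wins on real data depends on the recall of the members and on how their errors overlap.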
