Improving the precision-recall trade-off in undersampling-based binary text categorization using unanimity rule
Abstract
The distribution of documents over the two classes in binary text categorization problems is generally uneven, and resampling approaches have been shown to improve F-1 scores in this setting. The improvement is mainly due to a gain in recall, whereas precision may deteriorate. Since precision is the primary concern in some applications, achieving higher F-1 scores with a desired trade-off between precision and recall is important. In this study, we present an analytical comparison between the unanimity and majority voting rules. It is shown that the unanimity rule can provide better F-1 scores than majority voting when an ensemble of high-recall but low-precision classifiers is considered. Category-based undersampling is then proposed to generate high-recall members. Experiments conducted on three datasets show that superior F-1 scores can be achieved compared to a support vector machine (SVM)-based baseline system and to voting over a random undersampling-based ensemble.
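The difference between the two voting rules can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce their analysis or results. The labels and member predictions are invented toy data, chosen so that every member has high recall but low precision, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def majority_vote(votes: Sequence[int]) -> int:
    """Predict positive (1) if more than half of the members vote positive."""
    return int(2 * sum(votes) > len(votes))


def unanimity_vote(votes: Sequence[int]) -> int:
    """Predict positive (1) only if every member votes positive."""
    return int(all(votes))


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Compute precision, recall and F-1 for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (assumption): 2 positive and 6 negative documents.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]

# Three hypothetical ensemble members. Each one finds both positives
# (high recall) but also raises two false alarms (low precision),
# and the false alarms differ from member to member.
members = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0],
]

# Transpose so that each tuple holds the votes cast for one document.
votes_per_doc = list(zip(*members))

maj_pred = [majority_vote(v) for v in votes_per_doc]
una_pred = [unanimity_vote(v) for v in votes_per_doc]

maj_scores = precision_recall_f1(y_true, maj_pred)
una_scores = precision_recall_f1(y_true, una_pred)

print("majority  (P, R, F-1):", maj_scores)
print("unanimity (P, R, F-1):", una_scores)
```

In this toy case, the unanimity rule rejects every false alarm that is not shared by all members while still accepting the positives that all members agree on, so precision rises without a loss in recall. Whether this carries over to real data depends on how high the members' recall is and how much their errors overlap.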










