Improving the precision-recall trade-off in undersampling-based binary text categorization using unanimity rule

Publisher

Springer London Ltd

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

In binary text categorization, documents are generally distributed unevenly over the two classes, and resampling approaches have been shown to improve F1 scores in this setting. The improvement is mainly due to a gain in recall, while precision may deteriorate. Since precision is the primary concern in some applications, it is important to achieve higher F1 scores with a desired level of trade-off between precision and recall. In this study, we present an analytical comparison between the unanimity and majority voting rules. It is shown that the unanimity rule can provide better F1 scores than majority voting when an ensemble of high-recall but low-precision classifiers is considered. Category-based undersampling is then proposed to generate high-recall members. Experiments conducted on three datasets show that superior F1 scores can be achieved compared to both a support vector machine (SVM)-based baseline system and voting over a random undersampling-based ensemble.
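
The comparison between the two voting rules can be illustrated with a small calculation. The sketch below is not the paper's analysis. It assumes that the ensemble members make statistically independent decisions and share the same true-positive and false-positive rates, which is a simplification introduced here for illustration. The function names and the example rates (0.95, 0.30, a 5% positive class, five members) are hypothetical.

```python
from math import comb


def vote_rate(p, n, rule):
    """Probability that the ensemble outputs 'positive' when each of n
    members independently outputs 'positive' with probability p.
    Independence is an assumption of this sketch."""
    if rule == "unanimity":
        # Positive only if every member votes positive.
        return p ** n
    if rule == "majority":
        # Positive if strictly more than half of the members vote positive.
        k_min = n // 2 + 1
        return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
                   for k in range(k_min, n + 1))
    raise ValueError(f"unknown rule: {rule}")


def precision_recall_f1(tpr, fpr, prior):
    """Precision, recall and F1 from the true-positive rate, the
    false-positive rate and the fraction of positive documents."""
    tp = prior * tpr
    fp = (1 - prior) * fpr
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tpr
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


def ensemble_scores(member_tpr, member_fpr, n, prior, rule):
    """Scores of an n-member ensemble under the given voting rule."""
    return precision_recall_f1(
        vote_rate(member_tpr, n, rule),
        vote_rate(member_fpr, n, rule),
        prior,
    )


# Hypothetical high-recall, low-precision members on an imbalanced problem.
for rule in ("unanimity", "majority"):
    p, r, f1 = ensemble_scores(0.95, 0.30, n=5, prior=0.05, rule=rule)
    print(f"{rule:9s} precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Under these assumed rates, the unanimity rule gives up some recall but suppresses false positives much more strongly than majority voting, which is the kind of regime the abstract describes. With members that have lower recall, the ordering can reverse, so the numbers should be read as an illustration rather than a general result.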

Keywords

Class imbalance, Resampling, Classifier ensemble, Unanimity rule, Binary text categorization

Journal or Series

Neural Computing & Applications

Volume

22
