Feature extraction using single variable classifiers for binary text classification
Abstract
The most popular approach to document representation is the bag-of-words model, in which terms are treated as features. To compute the values of these features, the term frequencies are generally scaled by a collection frequency factor that accounts for the relative importance of different terms. The term frequencies can be regarded as raw data about the input document. This study proposes a novel feature extraction framework for binary text classification in which feature extraction is defined as a single-variable classification problem. The term frequencies are the inputs, and the output of each classifier is used to define a triple of features for the corresponding term. The magnitude of the classifier output, which lies in the interval [0.5, 1], indicates the confidence of the classifier and is used in the document representation together with the term frequency and the collection frequency factor. © 2013 Springer-Verlag.
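
The idea of pairing a term's frequency with a collection frequency factor and a classifier confidence can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the collection frequency factor is taken to be the inverse document frequency, the single-variable classifier is a simple threshold rule on one term's frequency with Laplace-smoothed class estimates, the confidence is computed as max(p, 1 - p) so that it falls in [0.5, 1], and the three quantities are combined by plain multiplication. The function names (`idf`, `fit_stump`, `confidence`, `weighted_feature`) are hypothetical, and the sketch does not reproduce the triple of features described in the abstract.

```python
import math


def idf(n_docs, doc_freq):
    """Inverse document frequency, one common collection frequency factor (assumed here)."""
    return math.log(n_docs / doc_freq)


def fit_stump(tfs, labels):
    """Fit a threshold classifier on the frequencies of a single term.

    Returns (threshold, p_low, p_high), where p_low and p_high are
    Laplace-smoothed estimates of P(positive class) for documents with
    tf <= threshold and tf > threshold respectively.
    """
    best = None
    for t in sorted(set(tfs)):
        low = [y for x, y in zip(tfs, labels) if x <= t]
        high = [y for x, y in zip(tfs, labels) if x > t]
        p_low = (sum(low) + 1) / (len(low) + 2)
        p_high = (sum(high) + 1) / (len(high) + 2)
        # Training errors when each side predicts its more probable class.
        errors = sum((p_low >= 0.5) != bool(y) for y in low)
        errors += sum((p_high >= 0.5) != bool(y) for y in high)
        if best is None or errors < best[0]:
            best = (errors, t, p_low, p_high)
    return best[1:]


def confidence(tf, stump):
    """Magnitude of the classifier output, always in [0.5, 1]."""
    threshold, p_low, p_high = stump
    p = p_low if tf <= threshold else p_high
    return max(p, 1.0 - p)


def weighted_feature(tf, n_docs, doc_freq, stump):
    """One hypothetical combination: tf * idf * confidence."""
    return tf * idf(n_docs, doc_freq) * confidence(tf, stump)


# Toy data for one term: its frequency in six documents and their binary labels.
tfs = [0, 0, 1, 3, 4, 5]
labels = [0, 0, 0, 1, 1, 1]
stump = fit_stump(tfs, labels)
value = weighted_feature(4, n_docs=6, doc_freq=4, stump=stump)
```

On this toy data the threshold separates the two classes without training errors, so the confidence is the same on both sides of the threshold. A term whose frequency carries little class information would yield estimates near 0.5 and would be down-weighted under this multiplicative scheme.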