Using the absolute difference of term occurrence probabilities in binary text categorization

Publisher

Springer

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

This study examines the differences among widely used term weighting schemes by ordering terms according to their discriminative ability, using a recently developed framework that expresses term weights in terms of the ratio and the absolute difference of term occurrence probabilities. The ordering of terms is observed to depend on the weighting scheme, which can be explained by the way each scheme uses term occurrence differences when generating term weights. It is then proposed that relevance frequency, which has been shown to provide the best scores on several datasets, can be improved by taking into account the way absolute difference values are used in other widely used schemes. Experimental results on two datasets show that improved F1 scores can be achieved.
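
As a rough illustration of the quantities named in the abstract, the sketch below computes the term occurrence probabilities in the positive and negative classes, their ratio and absolute difference, and the commonly cited form of relevance frequency, rf = log2(2 + a / max(1, c)). The function name, the argument names (a, b, c, d for the document counts) and the example counts are assumptions made for illustration; this is not the improved scheme proposed in the article, whose exact form is not given on this page.

```python
import math


def term_statistics(a, b, c, d):
    """Illustrative term statistics for one term in a binary task.

    a: positive-class documents that contain the term
    b: positive-class documents that do not contain the term
    c: negative-class documents that contain the term
    d: negative-class documents that do not contain the term
    (These names are assumptions for this sketch.)
    """
    # Term occurrence probabilities in each class.
    p_pos = a / (a + b)
    p_neg = c / (c + d)

    # The two quantities the framework is described as using.
    abs_diff = abs(p_pos - p_neg)
    ratio = p_pos / p_neg if p_neg > 0 else math.inf

    # Relevance frequency in its commonly cited form; max(1, c)
    # guards against division by zero when c is 0.
    rf = math.log2(2 + a / max(1, c))

    return {
        "p_pos": p_pos,
        "p_neg": p_neg,
        "abs_diff": abs_diff,
        "ratio": ratio,
        "rf": rf,
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 documents per class.
    stats = term_statistics(a=30, b=70, c=10, d=90)
    for name, value in stats.items():
        print(f"{name}: {value:.4f}")
```

Two terms with the same ratio can have very different absolute differences (for example 0.30 vs 0.10 and 0.03 vs 0.01), which is one way to see why schemes that use these quantities differently may order terms differently.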

Keywords

Term occurrence probability, Term weighting, Relevance frequency, Mutual information, Chi-square, Odds ratio, Text categorization

Journal or Series

Applied Intelligence

Volume

36

Issue

1
