Termset weighting by adapting term weighting schemes to utilize cardinality statistics for binary text categorization

Publisher

Springer

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

This study proposes a novel scheme for termset weighting based on cardinality statistics. Specifically, a termset is evaluated by the number of its member terms that appear in a document. Building on a recently verified hypothesis that the occurrence of only a subset of a termset's terms may also carry useful information about class membership, existing term weighting schemes are adapted to termsets. The weight of a given termset is computed as the product of two factors. The first is a function of the frequencies of the member terms that occur in the given document. The second takes into account the numbers of positive and negative training documents in which the same number of members appear. Because a termset receives a non-zero weight even when only a subset of its member terms appears, the discriminative ability of different member term subsets is taken into consideration.
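The two-factor product described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so both factor definitions below are assumptions chosen for illustration: a log-damped sum of member term frequencies for the first factor, and a relevance-frequency-style ratio of positive to negative document counts for the second. The function names (`cardinality`, `cardinality_counts`, `termset_weight`) are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def cardinality(termset, doc_tf):
    """Number of termset members that occur at least once in the document."""
    return sum(1 for term in termset if doc_tf.get(term, 0) > 0)


def cardinality_counts(termset, docs):
    """Map each cardinality value to the number of documents that have it."""
    return Counter(cardinality(termset, doc_tf) for doc_tf in docs)


def termset_weight(termset, doc_tf, pos_docs, neg_docs):
    """Weight of a termset in one document as a product of two factors.

    Both factor formulas are illustrative assumptions, not the paper's.
    """
    k = cardinality(termset, doc_tf)
    if k == 0:
        # No member term appears, so the termset gets zero weight.
        return 0.0

    # Factor 1 (assumed form): log-damped sum of member term frequencies.
    member_tf = sum(doc_tf.get(term, 0) for term in termset)
    tf_factor = math.log2(1 + member_tf)

    # Factor 2 (assumed form): compares the numbers of positive and negative
    # training documents in which exactly k members appear.
    pos_k = cardinality_counts(termset, pos_docs)[k]
    neg_k = cardinality_counts(termset, neg_docs)[k]
    collection_factor = math.log2(2 + pos_k / max(1, neg_k))

    return tf_factor * collection_factor
```

For example, with the termset `("stock", "market")`, a document containing only `stock` still receives a non-zero weight, and that weight depends on how many positive and negative training documents also contain exactly one of the two members. This is the behaviour the abstract attributes to the proposed scheme; the specific numbers depend entirely on the assumed formulas.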

Keywords

Termsets, Termset cardinality, Termset weighting, Termset selection, Document representation, Text categorization

Journal or Series

Applied Intelligence

Volume

47

Issue

2
