Compact Representation of Documents Using Terms and Termsets

Publisher

Springer Verlag

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

This study addresses the computation of compact document vectors that use both terms and termsets for binary text categorization. Typically, termsets are concatenated with all terms, which leads to large document vectors. The study instead considers selecting a subset of terms and termsets that represents documents compactly yet effectively. Two methods are studied for this purpose. In the first, terms and termsets are combined in different proportions and the resulting combinations are evaluated. In the second, normalized ranking scores of terms and termsets are used for subset selection. Experiments on two widely used datasets show that termsets can effectively complement terms even when only a small number of features is used to represent documents. © 2018, Springer International Publishing AG, part of Springer Nature.
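
The two selection strategies named in the abstract can be illustrated with a minimal sketch. This is one possible reading, not the authors' implementation: the abstract does not say how features are scored or how scores are normalized, so the function names, the division-by-maximum normalization, and the toy scores below are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List

Scores = Dict[Hashable, float]


def top_k(scores: Scores, k: int) -> List[Hashable]:
    """Return the k highest-scoring features, best first (ties keep input order)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def select_by_proportion(term_scores: Scores, termset_scores: Scores,
                         k: int, term_fraction: float) -> List[Hashable]:
    """Fill a budget of k features with a fixed share of terms (hypothetical helper).

    term_fraction is the share of the budget given to terms; the remainder
    goes to termsets. Each group is ranked separately by its own scores.
    """
    n_terms = min(round(k * term_fraction), len(term_scores))
    n_termsets = min(k - n_terms, len(termset_scores))
    return top_k(term_scores, n_terms) + top_k(termset_scores, n_termsets)


def normalize(scores: Scores) -> Scores:
    """Scale scores into [0, 1] by dividing by the maximum (assumed scheme)."""
    peak = max(scores.values(), default=0.0)
    if peak <= 0:
        return {feature: 0.0 for feature in scores}
    return {feature: score / peak for feature, score in scores.items()}


def select_by_normalized_score(term_scores: Scores, termset_scores: Scores,
                               k: int) -> List[Hashable]:
    """Rank terms and termsets in one pool after per-group normalization."""
    merged = {**normalize(term_scores), **normalize(termset_scores)}
    return top_k(merged, k)


# Toy ranking scores (made up). Terms are strings, termsets are tuples of terms.
term_scores = {"oil": 8.0, "price": 6.0, "barrel": 2.0}
termset_scores = {("oil", "price"): 0.9, ("crude", "barrel"): 0.6}

print(select_by_proportion(term_scores, termset_scores, k=4, term_fraction=0.5))
print(select_by_normalized_score(term_scores, termset_scores, k=3))
```

Normalizing each group separately before merging is what lets the two score scales be compared in a single ranking; without it, whichever group has the larger raw scores would dominate the selected subset.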

Description

14th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2018 -- 2018-07-15 through 2018-07-19 -- New York -- 216139

Keywords

Compact representation, Different proportions, Document vectors, Subset selection, Text categorization, Binary sequences

Journal or Series

Lecture Notes in Computer Science

Volume

10934 LNAI
