Nonlinear transformation of term frequencies for term weighting in text categorization

dc.contributor.author: Erenel, Zafer
dc.contributor.author: Altincay, Hakan
dc.date.accessioned: 2026-02-06T18:37:58Z
dc.date.issued: 2012
dc.department: Doğu Akdeniz Üniversitesi
dc.description.abstract: In automatic text categorization, the influence of features on the decision is set by the term weights, which are conventionally computed as the product of a term frequency factor and a collection frequency factor. Raw term frequencies or their logarithmic forms are generally used as the term frequency factor, whereas the leading collection frequency factors take into account the document frequency of each term. In this study, it is first shown that the best-fitting form of the term frequency factor depends on the distribution of term frequency values in the dataset under consideration. Taking this observation into account, a novel collection frequency factor that considers term frequencies is proposed. Five datasets are first examined to show that the distribution of term frequency values is task dependent. The proposed method is then shown to provide better F-1 scores than two recent approaches on the majority of the datasets considered. It is confirmed that the use of term frequencies in the collection frequency factor is beneficial on tasks that do not involve highly repeated terms. It is also shown that the best F-1 scores are achieved on the majority of the datasets when a smaller number of features is considered. © 2012 Elsevier Ltd. All rights reserved.
dc.identifier.doi: 10.1016/j.engappai.2012.06.013
dc.identifier.endpage: 1514
dc.identifier.issn: 0952-1976
dc.identifier.issn: 1873-6769
dc.identifier.issue: 7
dc.identifier.scopus: 2-s2.0-84866732584
dc.identifier.scopusquality: Q1
dc.identifier.startpage: 1505
dc.identifier.uri: https://doi.org/10.1016/j.engappai.2012.06.013
dc.identifier.uri: https://hdl.handle.net/11129/12721
dc.identifier.volume: 25
dc.identifier.wos: WOS:000309787800021
dc.identifier.wosquality: Q1
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Pergamon-Elsevier Science Ltd
dc.relation.ispartof: Engineering Applications of Artificial Intelligence
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_WoS_20260204
dc.subject: Text categorization
dc.subject: Term weighting
dc.subject: Term frequency
dc.subject: Collection frequency factor
dc.subject: Document length normalization
dc.title: Nonlinear transformation of term frequencies for term weighting in text categorization
dc.type: Article
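
The abstract describes the conventional scheme in which a term weight is the product of a term frequency factor (raw or logarithmic) and a collection frequency factor based on document frequency. The following is a minimal sketch of that conventional scheme only, not of the collection frequency factor proposed in the article, whose formula is not given in this record. The function names, the `1 + log(tf)` logarithmic form, the use of plain idf as the collection frequency factor, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_factor(tf, form="raw"):
    """Term frequency factor: the raw count, or 1 + log(tf) as one common log form."""
    if tf <= 0:
        return 0.0
    return float(tf) if form == "raw" else 1.0 + math.log(tf)


def idf(term, docs):
    """Collection frequency factor based on document frequency (plain idf, assumed here)."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0


def term_weights(doc, docs, form="raw"):
    """Weight of each term in doc = term frequency factor * collection frequency factor."""
    counts = Counter(doc)
    return {t: tf_factor(c, form) * idf(t, docs) for t, c in counts.items()}


# Hypothetical toy collection of tokenized documents.
docs = [
    ["price", "price", "price", "market"],
    ["market", "trade"],
    ["goal", "match", "goal"],
]

raw = term_weights(docs[0], docs, form="raw")
log = term_weights(docs[0], docs, form="log")
```

In this toy collection the repeated term "price" receives a smaller weight under the logarithmic form than under the raw form, while the weight of "market", which occurs once in the document, is the same under both. This illustrates why the preferred form of the term frequency factor can depend on how often terms repeat in a dataset.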
