Explicit Use of Term Occurrence Probabilities for Term Weighting in Text Categorization

dc.contributor.author: Erenel, Zafer
dc.contributor.author: Altincay, Hakan
dc.contributor.author: Varoglu, Ekrem
dc.date.accessioned: 2026-02-06T18:22:25Z
dc.date.issued: 2011
dc.department: Doğu Akdeniz Üniversitesi
dc.description.abstract: In this paper, the behaviors of leading symmetric and asymmetric term weighting schemes are analyzed in the context of text categorization. This analysis includes their weighting patterns in the two-dimensional term occurrence probability space and the dynamic ranges of the generated weights. Additionally, one of the newly proposed term selection schemes, multi-class odds ratio, is considered as a potential symmetric weighting scheme. Based on the findings of this study, a novel symmetric weighting scheme derived as a function of term occurrence probabilities is proposed. The experiments conducted on the Reuters-21578 ModApte Top10, WebKB, 7-Sectors and CSTR2009 datasets indicate that the proposed scheme outperforms other leading schemes in terms of macro-averaged and micro-averaged F1 scores.
dc.description.sponsorship: Ministry of Education and Culture of Northern Cyprus [MEKB-09-02]
dc.description.sponsorship: The numerical calculations reported in this paper were partly performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TR-Grid e-Infrastructure). This work was supported by research grant MEKB-09-02 from the Ministry of Education and Culture of Northern Cyprus, and a preliminary version of it was presented at the 2009 International Conference on Soft Computing, Computing with Words and Perceptions in System Analysis, Decision and Control.
dc.identifier.endpage: 834
dc.identifier.issn: 1016-2364
dc.identifier.issue: 3
dc.identifier.scopus: 2-s2.0-79958158629
dc.identifier.scopusquality: Q2
dc.identifier.startpage: 819
dc.identifier.uri: https://hdl.handle.net/11129/9783
dc.identifier.volume: 27
dc.identifier.wos: WOS:000291237900002
dc.identifier.wosquality: Q4
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Inst Information Science
dc.relation.ispartof: Journal of Information Science and Engineering
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_WoS_20260204
dc.subject: text categorization
dc.subject: supervised term weighting
dc.subject: symmetric schemes
dc.subject: term occurrence probabilities
dc.subject: support vector machines
dc.title: Explicit Use of Term Occurrence Probabilities for Term Weighting in Text Categorization
dc.type: Article
