A novel framework for termset selection and weighting in binary text classification

dc.contributor.author: Badawi, Dima
dc.contributor.author: Altincay, Hakan
dc.date.accessioned: 2026-02-06T18:37:58Z
dc.date.issued: 2014
dc.department: Doğu Akdeniz Üniversitesi
dc.description.abstract: This study presents a new framework for termset selection and weighting. The framework uses the joint occurrence statistics of pairs of terms for both tasks. More specifically, each termset is evaluated by taking into account the simultaneous or individual occurrences of the terms within it. Based on the idea that the occurrence of one term but not the other may also convey valuable information for discrimination, conventional term selection schemes are adapted for termset selection. Similarly, the weight of a selected termset is computed as a function of the terms that occur in the document under consideration, and a termset is assigned a nonzero weight if either or both of its terms appear in the document. This weight estimation scheme evaluates the individual occurrences of the terms and their co-occurrences separately in order to compute the document-specific weight of each termset. The proposed termset-based representation is concatenated with the bag-of-words representation to construct the document vectors. Experiments conducted on three widely used datasets have verified the effectiveness of the proposed framework. © 2014 Elsevier Ltd. All rights reserved.
dc.identifier.doi: 10.1016/j.engappai.2014.06.012
dc.identifier.endpage: 53
dc.identifier.issn: 0952-1976
dc.identifier.issn: 1873-6769
dc.identifier.orcid: 0000-0002-7286-820X
dc.identifier.scopus: 2-s2.0-84906078534
dc.identifier.scopusquality: Q1
dc.identifier.startpage: 38
dc.identifier.uri: https://doi.org/10.1016/j.engappai.2014.06.012
dc.identifier.uri: https://hdl.handle.net/11129/12722
dc.identifier.volume: 35
dc.identifier.wos: WOS:000341553200004
dc.identifier.wosquality: Q1
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Pergamon-Elsevier Science Ltd
dc.relation.ispartof: Engineering Applications of Artificial Intelligence
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_WoS_20260204
dc.subject: Co-occurrence features
dc.subject: Termset selection
dc.subject: Termset weighting
dc.subject: Document representation
dc.subject: Text categorization
dc.title: A novel framework for termset selection and weighting in binary text classification
dc.type: Article
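
The weighting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' formulation: this record gives no formulas, so the per-pattern weight (a relevance-frequency style log ratio) and every function name below are assumptions, chosen only to show a two-term termset receiving a nonzero, pattern-specific weight when either or both of its terms occur, and the termset features being concatenated with a binary bag-of-words vector.

```python
from math import log2

PATTERNS = ("both", "first", "second")


def occurrence_pattern(doc_terms, termset):
    """Return 'both', 'first', 'second', or 'none' for a two-term termset."""
    first, second = termset
    has_first, has_second = first in doc_terms, second in doc_terms
    if has_first and has_second:
        return "both"
    if has_first:
        return "first"
    if has_second:
        return "second"
    return "none"


def pattern_weights(docs, labels, termset):
    """Estimate one weight per occurrence pattern from labelled documents.

    Assumed weighting (not taken from the paper): log2(2 + pos / max(1, neg)),
    where pos and neg count positive and negative training documents that
    show the pattern. The value is always at least 1, hence nonzero.
    """
    counts = {p: [0, 0] for p in PATTERNS}  # [positive, negative] per pattern
    for terms, label in zip(docs, labels):
        pattern = occurrence_pattern(terms, termset)
        if pattern != "none":
            counts[pattern][0 if label == 1 else 1] += 1
    return {p: log2(2 + pos / max(1, neg)) for p, (pos, neg) in counts.items()}


def termset_weight(doc_terms, termset, weights):
    """Nonzero when either or both terms occur; zero when neither does."""
    pattern = occurrence_pattern(doc_terms, termset)
    return 0.0 if pattern == "none" else weights[pattern]


def document_vector(doc_terms, vocabulary, termsets, weights_by_termset):
    """Concatenate binary bag-of-words features with termset weights."""
    bag_of_words = [1.0 if t in doc_terms else 0.0 for t in vocabulary]
    termset_part = [
        termset_weight(doc_terms, ts, weights_by_termset[ts]) for ts in termsets
    ]
    return bag_of_words + termset_part


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    train_docs = [{"oil", "price"}, {"oil"}, {"price"}, {"wheat"}]
    train_labels = [1, 1, 0, 0]
    pair = ("oil", "price")
    weights = pattern_weights(train_docs, train_labels, pair)
    print(document_vector({"oil"}, ["oil", "price", "wheat"], [pair], {pair: weights}))
```

Keeping a separate weight for each occurrence pattern is what lets a document that contains only one term of the pair contribute a different value from a document that contains both, which is the distinction the abstract emphasises.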
