Effective use of 2-termsets by discarding redundant member terms in bag-of-words representation

dc.contributor.author: Badawi, Dima
dc.contributor.author: Altincay, Hakan
dc.date.accessioned: 2026-02-06T18:34:14Z
dc.date.issued: 2019
dc.department: Doğu Akdeniz Üniversitesi
dc.description.abstract: Recent studies have demonstrated the potential of using termsets to enrich the conventional bag-of-words representation of electronic documents by forming composite feature vectors. In this approach, some member terms may become redundant because they are strongly correlated with the corresponding termsets. Alternatively, the co-occurrence of terms may be more informative than their individual appearance. In these cases, the member terms should be removed to avoid the curse of dimensionality during model generation. This study first addresses the elimination of member terms that become redundant when 2-termsets are employed, and develops two novel algorithms for this purpose. The proposed algorithms evaluate the relative discriminative powers and correlations of member terms and the corresponding 2-termsets. As a third approach, the redundancies of all terms are evaluated when 2-termsets are used, and the terms most correlated with the 2-termsets are discarded. Simulations conducted on five benchmark datasets verify the importance of eliminating redundant terms and the effectiveness of the proposed algorithms.
dc.identifier.doi: 10.1007/s00521-018-3371-y
dc.identifier.endpage: 5418
dc.identifier.issn: 0941-0643
dc.identifier.issn: 1433-3058
dc.identifier.issue: 9
dc.identifier.scopus: 2-s2.0-85042108069
dc.identifier.scopusquality: Q1
dc.identifier.startpage: 5401
dc.identifier.uri: https://doi.org/10.1007/s00521-018-3371-y
dc.identifier.uri: https://hdl.handle.net/11129/11675
dc.identifier.volume: 31
dc.identifier.wos: WOS:000488645700065
dc.identifier.wosquality: N/A
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Springer London Ltd
dc.relation.ispartof: Neural Computing & Applications
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_WoS_20260204
dc.subject: Termsets
dc.subject: Redundancy
dc.subject: Term selection
dc.subject: Document representation
dc.subject: Text classification
dc.title: Effective use of 2-termsets by discarding redundant member terms in bag-of-words representation
dc.type: Article
