Ternary encoding based feature extraction for binary text classification

dc.contributor.author: Altincay, Hakan
dc.contributor.author: Erenel, Zafer
dc.date.accessioned: 2026-02-06T18:34:18Z
dc.date.issued: 2014
dc.department: Doğu Akdeniz Üniversitesi
dc.description.abstract: A novel framework for termset based feature extraction is proposed for binary text classification. The proposed approach is based on the encoding of the terms within a termset. The ternary codes '+1' and '-1' are used to represent the class that the term supports, whereas '0' denotes no support for either class. Four different encoding schemes are proposed where the term weights and the term occurrence probabilities in the positive and negative documents are used to define the ternary code of a given term. The ternary patterns are utilized to define novel features by splitting them into positive and negative codes where each code is treated as a different feature extractor. The derived features are investigated both individually and together with the bag of words representation. The histograms of the resultant features are also employed to study the improvements that can be achieved using a small number of additional features to augment the bag of words representation. Experiments conducted on four benchmark datasets with different characteristics have shown that the proposed feature extraction framework provides significant improvements compared to the bag of words representation.
dc.identifier.doi: 10.1007/s10489-014-0515-3
dc.identifier.endpage: 326
dc.identifier.issn: 0924-669X
dc.identifier.issn: 1573-7497
dc.identifier.issue: 1
dc.identifier.scopus: 2-s2.0-84957439316
dc.identifier.scopusquality: Q1
dc.identifier.startpage: 310
dc.identifier.uri: https://doi.org/10.1007/s10489-014-0515-3
dc.identifier.uri: https://hdl.handle.net/11129/11736
dc.identifier.volume: 41
dc.identifier.wos: WOS:000338214100020
dc.identifier.wosquality: Q2
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Springer
dc.relation.ispartof: Applied Intelligence
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_WoS_20260204
dc.subject: Local ternary patterns
dc.subject: Feature extraction
dc.subject: Termsets
dc.subject: n-grams
dc.subject: Termset weighting
dc.subject: Text classification
dc.title: Ternary encoding based feature extraction for binary text classification
dc.type: Article
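
The ternary encoding summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not specify the four encoding schemes, so the threshold rule below (comparing a term's occurrence probabilities in positive and negative documents against a hypothetical `margin`), the function names, and the toy probabilities are all assumptions made for illustration. Only the general idea comes from the abstract: each term in a termset receives a code of +1, -1, or 0, and the resulting ternary pattern is split into a positive code and a negative code.

```python
from typing import Dict, List, Tuple


def ternary_code(p_pos: float, p_neg: float, margin: float = 0.1) -> int:
    """Assumed rule: +1 if the term clearly favors the positive class,
    -1 if it clearly favors the negative class, 0 otherwise."""
    if p_pos - p_neg > margin:
        return 1
    if p_neg - p_pos > margin:
        return -1
    return 0


def split_pattern(pattern: List[int]) -> Tuple[List[int], List[int]]:
    """Split a ternary pattern into two binary codes: one marking the
    +1 positions and one marking the -1 positions."""
    positive = [1 if code == 1 else 0 for code in pattern]
    negative = [1 if code == -1 else 0 for code in pattern]
    return positive, negative


# Hypothetical (P(term | positive), P(term | negative)) values.
probs: Dict[str, Tuple[float, float]] = {
    "goal": (0.60, 0.10),
    "match": (0.30, 0.28),
    "stock": (0.05, 0.50),
}

termset = ["goal", "match", "stock"]
pattern = [ternary_code(*probs[term]) for term in termset]
pos_code, neg_code = split_pattern(pattern)

print(pattern)   # [1, 0, -1]
print(pos_code)  # [1, 0, 0]
print(neg_code)  # [0, 0, 1]
```

In this sketch each of the two binary codes could then serve as a separate feature extractor, as the abstract describes; how a code is turned into a feature value for a document is left open here because the record does not say.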
