A new dictionary-based preprocessor that uses radix-190 numbering

dc.contributor.author: Senergin, Mete Eray
dc.contributor.author: Ince, Erhan Aliriza
dc.date.accessioned: 2026-02-06T18:24:44Z
dc.date.issued: 2016
dc.department: Doğu Akdeniz Üniversitesi (Eastern Mediterranean University)
dc.description.abstract: Various scholarly works in the literature have pointed out that placing a preprocessor in front of a standard postcompressor helps achieve higher gains when compressing natural-language text files. Since then, there has been much research on preprocessors to improve the gain attained by such concatenated systems. With the same goal in mind, our paper proposes a new word-based preprocessor named METEHAN190 (M190) and contrasts its performance with four other state-of-the-art preprocessors. Throughout the experiments, source files from the Wall Street Journal (WSJ) archive and the Calgary, Canterbury, Gutenberg, and Pizza and Chili corpora were used. The postcompressors adopted were the Prediction by Partial Matching compressor using method D (PPMD) and the Monstrous PPM II compressor (PPMonstr). It was observed that in all three experiments WRT and M190 achieved the two highest compression gains. For small text and transcription files from the Calgary corpus, M190 outperformed all preprocessors, including WRT. On the other hand, average encoding and decoding times show that the semistatic byte-oriented methods are much faster than the static dictionary-based methods that encode words with characters.
dc.identifier.doi: 10.3906/elk-1410-124
dc.identifier.endpage: 4480
dc.identifier.issn: 1300-0632
dc.identifier.issn: 1303-6203
dc.identifier.issue: 5
dc.identifier.orcid: 0000-0002-1079-3601
dc.identifier.scopus: 2-s2.0-84978216538
dc.identifier.scopusquality: Q2
dc.identifier.startpage: 4465
dc.identifier.trdizinid: 247537
dc.identifier.uri: https://doi.org/10.3906/elk-1410-124
dc.identifier.uri: https://search.trdizin.gov.tr/tr/yayin/detay/247537
dc.identifier.uri: https://hdl.handle.net/11129/10351
dc.identifier.volume: 24
dc.identifier.wos: WOS:000378097800084
dc.identifier.wosquality: Q3
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.indekslendigikaynak: TR-Dizin
dc.language.iso: en
dc.publisher: Tubitak Scientific & Technological Research Council Turkey
dc.relation.ispartof: Turkish Journal of Electrical Engineering and Computer Sciences
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_WoS_20260204
dc.subject: Lossless text compression
dc.subject: preprocessing
dc.subject: postcompressor
dc.subject: dictionary
dc.subject: semistatic byte-oriented preprocessors
dc.subject: METEHAN 190
dc.title: A new dictionary-based preprocessor that uses radix-190 numbering
dc.type: Article
