A new dictionary-based preprocessor that uses radix-190 numbering

Senergin, Mete Eray; Ince, Erhan Aliriza

doi:10.3906/elk-1410-124

dc.contributor.author	Senergin, Mete Eray
dc.contributor.author	Ince, Erhan Aliriza
dc.date.accessioned	2026-02-06T18:24:44Z
dc.date.issued	2016
dc.department	Doğu Akdeniz Üniversitesi
dc.description.abstract	Several works in the literature have pointed out that placing a preprocessor in front of a standard postcompressor helps achieve higher gains when compressing natural-language text files. Since then, there has been much research on preprocessors to improve the gain attained by such concatenated systems. With the same goal in mind, this paper proposes a new word-based preprocessor named METEHAN190 (M190) and compares its performance with that of four other state-of-the-art preprocessors. Throughout the experiments, source files from the Wall Street Journal (WSJ) archive and from the Calgary, Canterbury, Gutenberg, and Pizza and Chili corpora were used. The postcompressors adopted were the Prediction by Partial Matching compressor using method D (PPMD) and the Monstrous PPM II compressor (PPMonstr). In all three experiments, WRT and M190 achieved the two highest compression gains. For small text and transcription files from the Calgary corpus, M190 outperformed all other preprocessors, including WRT. On the other hand, average encoding and decoding times show that the semistatic byte-oriented methods are much faster than the static dictionary-based methods that encode words with characters.
dc.identifier.doi	10.3906/elk-1410-124
dc.identifier.endpage	4480
dc.identifier.issn	1300-0632
dc.identifier.issn	1303-6203
dc.identifier.issue	5
dc.identifier.orcid	0000-0002-1079-3601
dc.identifier.scopus	2-s2.0-84978216538
dc.identifier.scopusquality	Q2
dc.identifier.startpage	4465
dc.identifier.trdizinid	247537
dc.identifier.uri	https://doi.org/10.3906/elk-1410-124
dc.identifier.uri	https://search.trdizin.gov.tr/tr/yayin/detay/247537
dc.identifier.uri	https://hdl.handle.net/11129/10351
dc.identifier.volume	24
dc.identifier.wos	WOS:000378097800084
dc.identifier.wosquality	Q3
dc.indekslendigikaynak	Web of Science
dc.indekslendigikaynak	Scopus
dc.indekslendigikaynak	TR-Dizin
dc.language.iso	en
dc.publisher	Tubitak Scientific & Technological Research Council Turkey
dc.relation.ispartof	Turkish Journal of Electrical Engineering and Computer Sciences
dc.relation.publicationcategory	Article - International Refereed Journal - Institutional Academic Staff
dc.rights	info:eu-repo/semantics/openAccess
dc.snmz	KA_WoS_20260204
dc.subject	Lossless text compression
dc.subject	preprocessing
dc.subject	postcompressor
dc.subject	dictionary
dc.subject	semistatic byte-oriented preprocessors
dc.subject	METEHAN 190
dc.title	A new dictionary-based preprocessor that uses radix-190 numbering
dc.type	Article

Collections

WoS Indexed Publications Collection
Scopus Indexed Publications Collection
TR-Dizin Indexed Publications Collection
