Farsi document image recognition system using word layout signature

dc.contributor.authorErgun, Cem
dc.contributor.authorNorozpour, Sajedeh
dc.date.accessioned2026-02-06T18:24:45Z
dc.date.issued2019
dc.departmentDoğu Akdeniz Üniversitesi
dc.description.abstractIn this paper, a new representation of Farsi words is proposed to address the keyword spotting problem in Farsi document image retrieval. We define a signature for each Farsi word based on the layout of the word's connected components. The signature is represented as boxes; by drawing vertical and horizontal lines, we then construct a grid for each word, which provides a new descriptor. One advantage of this method is that it can be used for both handwritten and machine-printed text. Finally, to evaluate the performance of our system in comparison with other methods, we test it on a database of 19,582 printed Farsi words, obtaining a recall of 98.1% and a precision of 94.3%.
dc.identifier.doi10.3906/elk-1804-92
dc.identifier.endpage1488
dc.identifier.issn1300-0632
dc.identifier.issn1303-6203
dc.identifier.issue2
dc.identifier.orcid0000-0002-5766-9966
dc.identifier.scopus2-s2.0-85065815882
dc.identifier.scopusqualityQ2
dc.identifier.startpage1477
dc.identifier.trdizinid336808
dc.identifier.urihttps://doi.org/10.3906/elk-1804-92
dc.identifier.urihttps://search.trdizin.gov.tr/tr/yayin/detay/336808
dc.identifier.urihttps://hdl.handle.net/11129/10353
dc.identifier.volume27
dc.identifier.wosWOS:000463355800056
dc.identifier.wosqualityQ3
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.indekslendigikaynakTR-Dizin
dc.language.isoen
dc.publisherTubitak Scientific & Technological Research Council Turkey
dc.relation.ispartofTurkish Journal of Electrical Engineering and Computer Sciences
dc.relation.publicationcategoryArticle - International Refereed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_WoS_20260204
dc.subjectFarsi document image retrieval
dc.subjectword spotting
dc.subjectword layout signature
dc.subjectoptical character recognition
dc.titleFarsi document image recognition system using word layout signature
dc.typeArticle
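
The abstract describes building a grid for each word from the boxes of its connected components. The sketch below is one possible reading of that idea, not the authors' implementation: it assumes the bounding boxes are already available as (left, top, right, bottom) tuples with exclusive right and bottom edges, draws grid lines along every box edge, and marks each grid cell that some box covers. The function name `layout_signature` and the binary occupancy encoding are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def layout_signature(boxes: List[Box]) -> List[List[int]]:
    """Build a binary grid from connected-component bounding boxes.

    Vertical grid lines are placed at every distinct left/right edge and
    horizontal lines at every distinct top/bottom edge. A cell is 1 when
    at least one box fully covers it, otherwise 0.
    """
    xs = sorted({x for left, _, right, _ in boxes for x in (left, right)})
    ys = sorted({y for _, top, _, bottom in boxes for y in (top, bottom)})

    grid = []
    for y0, y1 in zip(ys, ys[1:]):
        row = []
        for x0, x1 in zip(xs, xs[1:]):
            covered = any(
                left <= x0 and x1 <= right and top <= y0 and y1 <= bottom
                for left, top, right, bottom in boxes
            )
            row.append(1 if covered else 0)
        grid.append(row)
    return grid


# Illustrative input only: a main body with a dot above and to its right.
example = [(0, 2, 4, 6), (5, 0, 7, 1)]
signature = layout_signature(example)
```

Because the grid lines follow the box edges rather than pixel coordinates, two words whose components share the same relative arrangement would yield the same grid under this encoding, even when their absolute sizes differ. How the paper itself compares grids is not stated in the abstract.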
