FindStem: Analysis and evaluation of a Turkish stemming algorithm

dc.contributor.authorSever, H
dc.contributor.authorBitirim, Y
dc.date.accessioned2026-02-06T18:16:38Z
dc.date.issued2003
dc.departmentDoğu Akdeniz Üniversitesi
dc.description10th International Symposium on String Processing and Information Retrieval -- OCT 08-10, 2003 -- MANAUS, BRAZIL
dc.description.abstractIn this paper, we evaluate the effectiveness of a new stemming algorithm, FINDSTEM, for use with Turkish documents and queries, and compare it with two previously defined Turkish stemmers, namely the A-F and L-M algorithms. Of these, the FINDSTEM and A-F algorithms handle both inflectional and derivational rules, whereas the L-M algorithm handles only inflectional rules. The stemming algorithms were compared manually using 5,000 distinct words, on which FINDSTEM, A-F, and L-M failed in 49, 270, and 559 cases, respectively. To measure the retrieval effectiveness of FINDSTEM, we used a medium-size collection comprising 2,468 law records with 280K document words, 15 natural-language queries with an average length of 17 search words, and complete relevance information for each query. We localized the SMART retrieval system by adding a stop list, support for Turkish characters (i.e., the ISO8859-9 (Latin-5) code set), a stemming algorithm (FINDSTEM), and a Turkish translation at the message level. Our results, based on average precision values at 11-point recall levels, show that indexing document terms as well as search terms with FINDSTEM is clearly and consistently more effective than indexing the terms as they are (that is, with no stemming at all).
dc.identifier.endpage251
dc.identifier.isbn3-540-20177-7
dc.identifier.issn0302-9743
dc.identifier.orcid0000-0002-8261-0675
dc.identifier.orcid0000-0002-1780-2806
dc.identifier.scopus2-s2.0-0142218942
dc.identifier.scopusqualityQ3
dc.identifier.startpage238
dc.identifier.urihttps://hdl.handle.net/11129/8571
dc.identifier.volume2857
dc.identifier.wosWOS:000187498600018
dc.identifier.wosqualityN/A
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherSpringer-Verlag Berlin
dc.relation.ispartofString Processing and Information Retrieval, Proceedings
dc.relation.publicationcategoryConference Item - International - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/closedAccess
dc.snmzKA_WoS_20260204
dc.titleFindStem: Analysis and evaluation of a Turkish stemming algorithm
dc.typeConference Object
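
The abstract reports effectiveness as average precision at 11-point recall levels. As a rough illustration of what that measure computes for a single query, here is a minimal sketch in Python. It assumes the common interpolation rule (precision at recall level r is the highest precision observed at any recall of at least r); the record does not say which variant the authors or the SMART system used, and the function names and document identifiers below are hypothetical.

```python
from typing import List, Sequence, Set


def eleven_point_precision(ranking: Sequence[str], relevant: Set[str]) -> List[float]:
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0 for one query.

    Assumption: standard interpolation (max precision at any recall >= level).
    """
    if not relevant:
        raise ValueError("query has no relevant documents")

    # (recall, precision) measured each time a relevant document is retrieved
    points = []
    hits = 0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))

    levels = [i / 10 for i in range(11)]
    # Small tolerance guards against floating-point noise at the level boundaries.
    return [
        max((p for r, p in points if r >= level - 1e-12), default=0.0)
        for level in levels
    ]


def eleven_point_average(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the 11 interpolated precision values for one query."""
    values = eleven_point_precision(ranking, relevant)
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy data, not taken from the paper's collection.
    toy_ranking = ["d1", "d2", "d3", "d4"]
    toy_relevant = {"d1", "d3"}
    print(eleven_point_average(toy_ranking, toy_relevant))
```

Comparing a stemmed run with an unstemmed run would then amount to averaging this value over all queries for each run; the collection and relevance judgments needed for that are not part of this record.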
