Efficient watcher based web crawler design

dc.contributor.author: Alqaraleh, Saed
dc.contributor.author: Ramadan, Omar
dc.contributor.author: Salamah, Muhammed
dc.date.accessioned: 2026-02-06T18:49:11Z
dc.date.issued: 2015
dc.department: Doğu Akdeniz Üniversitesi
dc.description.abstract: Purpose - The purpose of this paper is to design a watcher-based crawler (WBC) that can crawl both static and dynamic web sites and download only the updated and newly added web pages. Design/methodology/approach - In the proposed WBC, a watcher file, which can be uploaded to the web sites' servers, prepares a report that contains the addresses of the updated and newly added web pages. In addition, the WBC is split into five units, each responsible for performing a specific crawling process. Findings - Several experiments have been conducted, and it has been observed that the proposed WBC increases the number of uniquely visited static and dynamic web sites compared with existing crawling techniques. In addition, the proposed watcher file not only allows crawlers to visit the updated and newly added web pages, but also solves the crawler overlapping and communication problems. Originality/value - The proposed WBC performs all crawling processes in the sense that it detects all updated and newly added pages automatically, without any explicit human intervention and without downloading entire web sites.
dc.identifier.doi: 10.1108/AJIM-02-2015-0019
dc.identifier.endpage: 686
dc.identifier.issn: 2050-3806
dc.identifier.issn: 1758-3748
dc.identifier.issue: 6
dc.identifier.orcid: 0000-0002-7146-3905
dc.identifier.scopus: 2-s2.0-84946546854
dc.identifier.scopusquality: Q1
dc.identifier.startpage: 663
dc.identifier.uri: https://doi.org/10.1108/AJIM-02-2015-0019
dc.identifier.uri: https://hdl.handle.net/11129/14762
dc.identifier.volume: 67
dc.identifier.wos: WOS:000366468500003
dc.identifier.wosquality: Q1
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Emerald Group Publishing Ltd
dc.relation.ispartof: Aslib Journal of Information Management
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_WoS_20260204
dc.subject: Information retrieval
dc.subject: Search engine
dc.subject: AJAX crawler
dc.subject: Crawler re-visiting policies
dc.subject: Crawling algorithm
dc.subject: Static crawler
dc.title: Efficient watcher based web crawler design
dc.type: Article
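
Illustrative sketch (not part of the record, and not the authors' implementation). The abstract describes a watcher file on the web server that reports the addresses of updated and newly added pages. It does not say how changes are detected, so the sketch below assumes a simple approach: compare each file's modification time with the time of the last crawl. The names build_watcher_report and demo, the example.org base URL, and the directory layout are all hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path


def build_watcher_report(site_root, base_url, last_crawl_time):
    """Return sorted URLs of files under site_root changed since last_crawl_time.

    Assumption: "updated or newly added" is approximated by the file's
    modification time; the paper may use a different detection method.
    """
    root = Path(site_root)
    changed = []
    for path in root.rglob("*"):
        # Skip directories; report only files modified after the last crawl.
        if path.is_file() and path.stat().st_mtime > last_crawl_time:
            rel = path.relative_to(root).as_posix()
            changed.append(base_url.rstrip("/") + "/" + rel)
    return sorted(changed)


def demo():
    """Build a throwaway site with one stale page and one fresh page."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        old_page = root / "old.html"
        new_page = root / "news" / "new.html"
        new_page.parent.mkdir()
        old_page.write_text("<p>unchanged</p>")
        new_page.write_text("<p>fresh</p>")
        now = time.time()
        os.utime(old_page, (now - 7200, now - 7200))  # modified two hours ago
        os.utime(new_page, (now, now))                # modified just now
        # Pretend the last crawl happened one hour ago.
        return build_watcher_report(root, "https://example.org/", now - 3600)


if __name__ == "__main__":
    for url in demo():
        print(url)
```

Under these assumptions, a crawler would fetch only the URLs listed in the report rather than downloading the whole site again, which is the behaviour the abstract attributes to the watcher file.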