Efficient watcher based web crawler design
| dc.contributor.author | Alqaraleh, Saed | |
| dc.contributor.author | Ramadan, Omar | |
| dc.contributor.author | Salamah, Muhammed | |
| dc.date.accessioned | 2026-02-06T18:49:11Z | |
| dc.date.issued | 2015 | |
| dc.department | Doğu Akdeniz Üniversitesi | |
| dc.description.abstract | Purpose - The purpose of this paper is to design a watcher-based crawler (WBC) that can crawl both static and dynamic web sites and download only the updated and newly added web pages. Design/methodology/approach - In the proposed WBC, a watcher file, which can be uploaded to the web sites' servers, prepares a report that contains the addresses of the updated and newly added web pages. In addition, the WBC is split into five units, where each unit is responsible for performing a specific crawling process. Findings - Several experiments have been conducted, and it has been observed that the proposed WBC increases the number of uniquely visited static and dynamic web sites compared with existing crawling techniques. In addition, the proposed watcher file not only allows crawlers to visit the updated and newly added web pages, but also solves the crawler overlapping and communication problems. Originality/value - The proposed WBC performs all crawling processes in the sense that it detects all updated and newly added pages automatically, without explicit human intervention and without downloading entire web sites. | |
| dc.identifier.doi | 10.1108/AJIM-02-2015-0019 | |
| dc.identifier.endpage | 686 | |
| dc.identifier.issn | 2050-3806 | |
| dc.identifier.issn | 1758-3748 | |
| dc.identifier.issue | 6 | |
| dc.identifier.orcid | 0000-0002-7146-3905 | |
| dc.identifier.scopus | 2-s2.0-84946546854 | |
| dc.identifier.scopusquality | Q1 | |
| dc.identifier.startpage | 663 | |
| dc.identifier.uri | https://doi.org/10.1108/AJIM-02-2015-0019 | |
| dc.identifier.uri | https://hdl.handle.net/11129/14762 | |
| dc.identifier.volume | 67 | |
| dc.identifier.wos | WOS:000366468500003 | |
| dc.identifier.wosquality | Q1 | |
| dc.indekslendigikaynak | Web of Science | |
| dc.indekslendigikaynak | Scopus | |
| dc.language.iso | en | |
| dc.publisher | Emerald Group Publishing Ltd | |
| dc.relation.ispartof | Aslib Journal of Information Management | |
| dc.relation.publicationcategory | Article - International Refereed Journal - Institutional Faculty Member | |
| dc.rights | info:eu-repo/semantics/closedAccess | |
| dc.snmz | KA_WoS_20260204 | |
| dc.subject | Information retrieval | |
| dc.subject | Search engine | |
| dc.subject | AJAX crawler | |
| dc.subject | Crawler re-visiting policies | |
| dc.subject | Crawling algorithm | |
| dc.subject | Static crawler | |
| dc.title | Efficient watcher based web crawler design | |
| dc.type | Article |
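
The abstract describes a watcher file that runs on the web server and reports the addresses of updated and newly added pages, so a crawler need not re-download the whole site. The record does not say how the authors implemented it. The following is only a minimal sketch of the general idea, assuming pages are plain files under a document root and that a change is detected by comparing content hashes against the previous run. The function name `build_watcher_report`, the report layout, and the hashing approach are all hypothetical and are not taken from the paper.

```python
import hashlib
import os
import tempfile


def build_watcher_report(root, previous):
    """Compare the files under `root` with a previous snapshot.

    `previous` maps relative paths to content hashes from the last run.
    Returns (report, snapshot): `report` lists new and updated paths,
    `snapshot` is the mapping to pass in on the next run.
    Hypothetical helper: the paper's actual watcher may work differently.
    """
    snapshot = {}
    report = {"new": [], "updated": []}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Use forward slashes so paths resemble URL paths.
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            snapshot[rel] = digest
            if rel not in previous:
                report["new"].append(rel)
            elif previous[rel] != digest:
                report["updated"].append(rel)
    report["new"].sort()
    report["updated"].sort()
    return report, snapshot


if __name__ == "__main__":
    # Small demonstration on a temporary directory (assumed site layout).
    with tempfile.TemporaryDirectory() as site:
        with open(os.path.join(site, "index.html"), "w") as f:
            f.write("<p>first version</p>")
        first_report, state = build_watcher_report(site, {})
        print(first_report)

        with open(os.path.join(site, "index.html"), "w") as f:
            f.write("<p>second version</p>")
        second_report, state = build_watcher_report(site, state)
        print(second_report)
```

In this sketch a crawler would fetch only the paths listed under `new` and `updated`. Hashing file contents is one possible change test; modification timestamps or server-side hooks would be alternatives, and dynamic (AJAX) pages would need a different detection mechanism than file comparison.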










