Reducing distributed URLS crawling time : A comparison of GUIDS and IDS

Suhailan, Safei and Abdul Samad, Shibghatullah and Burairah, Hussin (2014) Reducing distributed URLS crawling time : A comparison of GUIDS and IDS. Journal of Theoretical and Applied Information Technology, 67 (1). pp. 121-128. ISSN 19928645 [P]

Abstract

A web crawler visits websites for the purpose of indexing. The dynamic nature of today's web makes the crawling process harder than before, as web content is continuously updated. In addition, crawling speed is important given the tsunami of big data that needs to be indexed by competing search engines. This research project aims to provide a survey of current problems in distributed web crawlers. It then investigates which technique gives the better crawling speed: dynamic globally unique identifiers (GUIDs) or traditional static identifiers (IDs). Experiments are done by implementing Arachnot.net web crawlers to index up to 20,000 locally generated URLs using both techniques. The results show that URL crawling time can be reduced by up to 7% by using the GUID technique instead of IDs.
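The abstract does not describe how the two identifier schemes are implemented, so the following is only an illustrative sketch of the general distinction. It assumes that static IDs are integers drawn from a single shared counter, and that GUIDs are random UUIDs that each crawler can generate locally. The names `SequentialIdAllocator`, `guid_for_url`, and `assign_ids`, and the `localhost` URLs, are hypothetical and not taken from the paper; the sketch does not reproduce the paper's experiment or its timing results.

```python
import itertools
import threading
import uuid


class SequentialIdAllocator:
    """Static integer IDs (assumed scheme): all workers share one counter."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def next_id(self):
        # The lock stands in for the coordination a shared counter needs
        # when several crawler workers request IDs at the same time.
        with self._lock:
            return next(self._counter)


def guid_for_url():
    """GUID (assumed scheme): a random UUID created without shared state."""
    return str(uuid.uuid4())


def assign_ids(urls, make_id):
    """Map each URL to an identifier produced by the given factory."""
    return {url: make_id() for url in urls}


# Hypothetical, locally generated URLs for illustration only.
urls = [f"http://localhost/page/{i}" for i in range(5)]

allocator = SequentialIdAllocator()
by_id = assign_ids(urls, allocator.next_id)
by_guid = assign_ids(urls, guid_for_url)
```

In this sketch the sequential allocator is a single point that every worker must pass through, while the GUID function can be called independently by each worker. Whether this is the source of the speed-up reported in the paper is not stated in the abstract.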

Item Type: Article
Uncontrolled Keywords: Distributed systems, Web Crawler, GUID, Search Engine
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Informatics & Computing
Depositing User: Syahmi Manaf
Date Deposited: 13 Sep 2022 05:29
Last Modified: 13 Sep 2022 05:29
URI: http://eprints.unisza.edu.my/id/eprint/5512
