Characterization of national Web domains
- 1 May 2007
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Internet Technology
- Vol. 7 (2)
- https://doi.org/10.1145/1239971.1239973
Abstract
During the last few years, several studies on the characterization of the public Web space of various national domains have been published. The pages of a country are an interesting set for studying the characteristics of the Web because at the same time these are diverse (as they are written by several authors) and yet rather similar (as they share a common geographical, historical and cultural context). This article discusses the methodologies used for presenting the results of Web characterization studies, including the granularity at which different aspects are presented, and a separation of concerns between contents, links, and technologies. Based on this, we present a side-by-side comparison of the results of 12 Web characterization studies, comprising over 120 million pages from 24 countries. The comparison unveils similarities and differences between the collections and sheds light on how certain results of a single Web characterization study on a sample may be valid in the context of the full Web.
This publication has 14 references indexed in Scilit:
- Characterizing a national community web. ACM Transactions on Internet Technology, 2005
- Toward a basic framework for webometrics. Journal of the American Society for Information Science and Technology, 2004
- Spam, damn spam, and statistics. Association for Computing Machinery (ACM), 2004
- Ranking the web frontier. Association for Computing Machinery (ACM), 2004
- UbiCrawler: a scalable fully distributed Web crawler. Software: Practice and Experience, 2004
- Dynamic Models for File Sizes and Double Pareto Distributions. Internet Mathematics, 2004
- Uncovering Information Hidden in Web Archives. D-Lib Magazine, 2002
- Self-similarity in the web. ACM Transactions on Internet Technology, 2002
- Authoritative sources in a hyperlinked environment. Journal of the ACM, 1999
- Workload characterization of a Web proxy in a cable modem environment. ACM SIGMETRICS Performance Evaluation Review, 1999