"The Availability and Persistence of Web References in D-Lib Magazine", 2005-11-21:
We explore the availability and persistence of URLs cited in articles published in D-Lib Magazine.
We extracted 4,387 unique URLs referenced in 453 articles published from July 1995 to August 2004. Availability was checked 3 times a week for 25 weeks, from September 2004 to February 2005.
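A single availability check might look like the following minimal Python sketch. It makes several assumptions that are not the authors' harness: a plain GET through the standard library's `urllib.request`, redirects followed (the `urlopen` default), and the final HTTP status recorded, or `None` when no HTTP response arrives. The name `check_url` and the 30-second timeout are also illustrative. The schedule arithmetic at the end only restates the figures above.

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional


def check_url(url: str, timeout: float = 30.0) -> Optional[int]:
    """Hypothetical helper: final HTTP status for url, or None if no HTTP response.

    urlopen follows redirects, so a 301/302 chain ending in 200 is reported as 200.
    """
    try:
        req = urllib.request.Request(url, method="GET")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (e.g. 404, 500) are raised as HTTPError; keep the code.
        return err.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failures, refused connections and timeouts (OSError/URLError),
        # malformed URLs (ValueError), and garbled responses (HTTPException).
        return None


# Schedule from the abstract: 3 checks a week for 25 weeks.
CHECKS_PER_URL = 3 * 25                # 75 checks per URL
TOTAL_CHECKS = 4387 * CHECKS_PER_URL   # 329,025 requests over the study
```

`HTTPError` is caught before `OSError` on purpose. It is a subclass of `URLError` (and therefore of `OSError`), so reversing the order would discard the 404/500 status codes that the analysis depends on.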
We found that ~28% of those URLs failed to resolve initially and 30% failed to resolve at the last check. A majority of the unresolved URLs were due to 404 ("page not found") and 500 ("internal server error") errors. The content pointed to by the URLs was relatively stable: only 16% of the content registered a change of more than 1 KB during the testing period.
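Tallies of this kind can be computed from per-URL check results with a few lines of Python. This sketch rests on three interpretive assumptions. First, a URL "resolves" only when its final status is 200. Second, `None` marks a check that received no HTTP response. Third, "more than a 1 KB change" means an absolute difference in response size greater than 1,024 bytes. The function names and the toy data are hypothetical and are not the study's data.

```python
from collections import Counter
from typing import Optional


def failure_rate(statuses: list[Optional[int]]) -> float:
    """Fraction of URLs whose check did not end in HTTP 200."""
    failed = sum(1 for s in statuses if s != 200)
    return failed / len(statuses)


def failure_breakdown(statuses: list[Optional[int]]) -> Counter:
    """Count failures by status code (None = no HTTP response at all)."""
    return Counter(s for s in statuses if s != 200)


def changed_more_than_1kb(old_size: int, new_size: int, threshold: int = 1024) -> bool:
    """Assumed reading of the 1 KB test: absolute size difference in bytes."""
    return abs(new_size - old_size) > threshold


# Toy data (not from the study): eight URLs, one check each.
sample = [200, 404, 200, 500, None, 200, 404, 200]
print(failure_rate(sample))                      # 0.5
print(failure_breakdown(sample).most_common(1))  # [(404, 2)]
print(changed_more_than_1kb(10_000, 11_500))     # True
```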
We explore factors that may cause a URL to fail by examining its age, path depth, top-level domain, and file extension.
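Apart from age, which needs the citing article's publication date, these factors can be read directly off the URL string, along with whether an explicit non-default port is used. The sketch below uses only `urllib.parse.urlsplit` and `posixpath.splitext` from the Python standard library. Two choices in it are assumptions, not the paper's stated definitions: path depth is counted as the number of non-empty path segments, and 80 (HTTP) and 443 (HTTPS) are treated as the default ports. `url_features` is a hypothetical name.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed default ports; any other explicitly given port counts as non-standard.
DEFAULT_PORTS = {"http": 80, "https": 443}


def url_features(url: str) -> dict:
    """Extract hypothetical per-URL features: TLD, path depth, extension, port."""
    parts = urlsplit(url)
    host = parts.hostname or ""          # hostname is already lower-cased
    tld = host.rsplit(".", 1)[-1]        # "edu" for "www.example.edu"
    segments = [s for s in parts.path.split("/") if s]
    ext = posixpath.splitext(parts.path)[1].lower()  # ".ps", ".shtml", or ""
    port = parts.port                    # None when the URL gives no port
    return {
        "tld": tld,
        "path_depth": len(segments),
        "extension": ext,
        "nonstandard_port": port is not None and port != DEFAULT_PORTS.get(parts.scheme),
    }


print(url_features("http://www.example.edu:8080/papers/1999/report.PS"))
# {'tld': 'edu', 'path_depth': 3, 'extension': '.ps', 'nonstandard_port': True}
```

Whether the final file name should count toward depth is a judgment call. Dropping it would mean subtracting one whenever the last segment has an extension.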
Based on the data collected, we found the half-life of a URL referenced in a D-Lib Magazine article is ~10 years. We also found that URLs were more likely to be unavailable if they pointed to resources in the .net, .edu, or country-specific top-level domains, used non-standard ports (i.e., not port 80), or pointed to resources with uncommon or deprecated extensions (e.g., .shtml, .ps, .txt).
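The ~10-year half-life can be turned into survival estimates if one assumes URL loss follows simple exponential decay. Under that model, the fraction of cited URLs still resolving after t years is S(t) = 0.5^(t/h), with h = 10. The exponential model and the helper names below are assumptions for illustration; the abstract reports only the half-life itself. The inverse function recovers h from a single observed survival fraction under the same model.

```python
import math


def surviving_fraction(t_years: float, half_life: float = 10.0) -> float:
    """Fraction of URLs expected to still resolve after t_years (exponential model)."""
    return 0.5 ** (t_years / half_life)


def half_life_from(t_years: float, surviving: float) -> float:
    """Half-life implied by a survival fraction observed after t_years."""
    # Solve 0.5 ** (t / h) = surviving for h.
    return t_years * (math.log(0.5) / math.log(surviving))


print(round(surviving_fraction(5), 3))  # 0.707
print(surviving_fraction(20))           # 0.25
print(half_life_from(10, 0.5))          # 10.0
```

Under this assumed model, about 71% of URLs cited in an article would still resolve five years after publication. Real link rot need not be exponential, so such figures are rough extrapolations.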