“Writing a Wikipedia Link Archive Bot”, Gwern 2008-09-26:

Haskell: tutorial on writing a daemon to archive links in Wikipedia articles with TagSoup and WebCite; obsolete.

This is a 2008 tutorial demonstrating how to write a Haskell program that automatically archives Internet links into WebCite & Internet Archive to avoid linkrot. It parses WP dumps, downloads & parses WP articles for external links with the TagSoup HTML parsing library, uses the WebCite/IA APIs to archive them, and optimizes the runtime. This approach is suitable for one-off crawls but not for live archiving using the RSS feed; for the next step, see Wikipedia RSS Archive Bot for a demonstration of how one could write an RSS-oriented daemon.
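The link-extraction step can be sketched with TagSoup roughly as follows. This is a minimal sketch, not the tutorial's own code: it assumes the `tagsoup` package is installed, treats any absolute `http://` or `https://` href as an external link, and the helper `archiveRequestUrl` is hypothetical. Since WebCite no longer accepts archiving requests, the helper builds an Internet Archive "Save Page Now" URL (`https://web.archive.org/save/` followed by the target) in place of a WebCite request. Downloading the article HTML and actually sending the requests (with rate limiting) are left out.

```haskell
-- Sketch only: assumes the tagsoup package (Text.HTML.TagSoup) is installed.
import Data.List (isPrefixOf, nub)
import Text.HTML.TagSoup (fromAttrib, isTagOpenName, parseTags)

-- | Collect the targets of all <a href="..."> tags that point off-site,
-- i.e. absolute http:// or https:// URLs. Relative links such as
-- "/wiki/Haskell" are skipped. Duplicates are removed, keeping the order
-- of first appearance.
extractExternalLinks :: String -> [String]
extractExternalLinks html =
  nub [ href
      | tag <- parseTags html
      , isTagOpenName "a" tag
      , let href = fromAttrib "href" tag   -- "" when the tag has no href
      , any (`isPrefixOf` href) ["http://", "https://"]
      ]

-- | Hypothetical helper: turn a URL into an Internet Archive
-- "Save Page Now" request URL. Real code would also need to handle
-- escaping, errors, and rate limits when it issues the request.
archiveRequestUrl :: String -> String
archiveRequestUrl url = "https://web.archive.org/save/" ++ url
```

Loaded into GHCi, `map archiveRequestUrl (extractExternalLinks html)` gives one archive request per distinct external link on a page. `nub` is quadratic in the number of links, which is tolerable for a single article; for a whole dump, a `Data.Set` of already-archived URLs would be the better choice, and it also avoids re-archiving links shared between articles.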

Obsolete