"A Large-Scale Characterization of How Readers Browse Wikipedia", 2021-12-22:
Despite the importance and pervasiveness of Wikipedia as one of the largest platforms for open knowledge, surprisingly little is known about how people navigate its content when seeking information. To bridge this gap, we present the first systematic large-scale analysis of how readers browse Wikipedia. Using billions of page requests from Wikipedia's server logs, we measure how readers reach articles, how they transition between articles, and how these patterns combine into more complex navigation paths. We find that navigation behavior is characterized by highly diverse structures. Although most navigation paths are shallow, comprising a single pageload, there is much variety, and the depth and shape of paths vary systematically with topic, device type, and time of day. We show that Wikipedia navigation paths commonly mesh with external pages as part of a larger online ecosystem, and we describe how naturally occurring navigation paths are distinct from targeted navigation in lab-based settings. Our results further suggest that navigation is abandoned when readers reach low-quality pages. These findings not only help in identifying potential improvements to reader experience on Wikipedia, but also in better understanding of how people seek knowledge on the Web.
[If users load an average of 1.5 pages per session, almost all of the subsequent 0.5 page loads come from following internal wiki links (only 6% come from alternative navigation methods like search), and sessions mostly terminate at low-quality pages, then how much reading, or lack of reading, is due to the presence or absence of wiki links?
I notice that there are still a lot of missing wiki links on articles (even proper noun ones which are dead obvious: eg. John Eyre × Osman II). From a reader's perspective, the absence of a link is evidence that they shouldn't bother searching and they should halt there if that was what they wanted. Quality is in considerable part just accuracy and comprehensiveness of wikilinking. (See also impact of banner ads/latency; "Wikipedia Matters", et al 2019; "Science Is Shaped by Wikipedia: Evidence from a Randomized Control Trial", et al 2017.)
If an average page has, say, 50 wikilinks and the expected number of additional page loads is ~0.5 (ie. 50% of a page), then each individual wikilink would on average be worth 1% of a pageview, and one'd expect a marginal gain of <1% for each additional wikilink added to that page. That sounds ludicrously valuable even if the real value is only a tenth of that, because adding wikilinks has traditionally not been a major focus of WP tooling or a cause area for bot operators (compared to disambiguation or vandal fighting). Can the user tracking estimate the value more directly? One could also analyze the effects of the various semi-auto and auto-linking bots as natural experiments on the logged traffic.]
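The back-of-the-envelope estimate above can be written out as a small sketch. The inputs are the commentary's own figures (0.5 additional page loads per session, 94% of them via internal links) plus its hypothetical 50 links per page, not measured values. The function name `per_link_value` and its parameters are made up for illustration, and the calculation assumes every link on a page is equally likely to be followed, which is surely false in practice.

```python
def per_link_value(extra_pages_per_session: float,
                   links_per_page: int,
                   internal_link_share: float = 1.0) -> float:
    """Average pageviews attributable to one wikilink.

    Assumes (unrealistically) that clicks are spread evenly over all
    links on the page; all inputs are illustrative, not measured.
    """
    if links_per_page <= 0:
        raise ValueError("links_per_page must be positive")
    return extra_pages_per_session * internal_link_share / links_per_page


# Rough figures from the commentary: 0.5 further page loads, ~50 links.
baseline = per_link_value(0.5, 50)

# Discounting the ~6% of follow-up loads that arrive via search etc.
adjusted = per_link_value(0.5, 50, internal_link_share=0.94)

# The pessimistic case where the true value is a tenth of the estimate.
pessimistic = baseline / 10

print(f"baseline:    {baseline:.4f} pageviews per link")
print(f"adjusted:    {adjusted:.4f} pageviews per link")
print(f"pessimistic: {pessimistic:.4f} pageviews per link")
```

Since this is an average over existing links, it is at most an upper bound on the marginal value of one more link, which is why the commentary expects a gain of <1% per added link.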