“The Complex Dynamics of Collaborative Tagging”, Harry Halpin, Valentin Robu, Hana Shepherd, 2007-05:

The debate within the Web community over the optimal means by which to organize information often pits formalized classifications against distributed collaborative tagging systems. A number of questions remain unanswered, however, regarding the nature of collaborative tagging systems including whether coherent categorization schemes can emerge from unsupervised tagging by users.

This paper uses data from the social bookmarking site del.icio.us to examine the dynamics of collaborative tagging systems. In particular, we examine whether the distribution of the frequency of use of tags for “popular” sites with a long history (many tags and many users) can be described by a power law distribution, often characteristic of what are considered complex systems. [Does not compare against log-normal]
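As a rough illustration of what "described by a power law" means for tag frequencies, the sketch below ranks tags by how often they are used and fits a straight line to log(frequency) against log(rank). The function names `rank_frequency` and `loglog_slope` are hypothetical, and this is not the paper's fitting procedure. A roughly straight log-log line is only suggestive: it does not by itself rule out alternatives such as a log-normal distribution.

```python
import math
from collections import Counter

def rank_frequency(tags):
    """Return tag usage counts sorted from most to least frequent."""
    return sorted(Counter(tags).values(), reverse=True)

def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 on ranked data is the classic Zipf-like pattern.
    """
    if len(frequencies) < 2:
        raise ValueError("need at least two ranked frequencies")
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

For example, `loglog_slope(rank_frequency(tags))` could be applied to the list of all tags assigned to one URL, assuming that list is available as plain strings.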

We produce a generative model of collaborative tagging in order to understand the basic dynamics behind tagging, including how a power law distribution of tags could arise. We empirically examine the tagging history of sites in order to determine how this distribution arises over time and to determine the patterns prior to a stable distribution.

Lastly, by focusing on the high-frequency tags of a site where the distribution of tags is a stabilized power law, we show how tag co-occurrence networks for a sample domain of tags can be used to analyze the meaning of particular tags given their relationship to other tags.
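A tag co-occurrence network can be sketched as a weighted edge list: two tags are linked whenever the same bookmark carries both, and the weight counts how often that happens. The code below is a minimal sketch under that assumption. The name `cooccurrence` and the input layout (one list of tags per bookmark) are hypothetical, and the paper may weight or filter edges differently.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(bookmarks):
    """Count how often each unordered pair of tags appears on the same bookmark.

    `bookmarks` is an iterable of tag lists, one list per saved bookmark.
    Keys of the result are (tag_a, tag_b) tuples in alphabetical order.
    """
    edges = Counter()
    for tags in bookmarks:
        # Deduplicate and sort so each pair gets a single canonical key.
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges
```

Tags that share heavily weighted edges with the same neighbours can then be read as related in meaning, which is the kind of analysis the abstract describes.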

[Keywords: tagging, Del.icio.us, power laws, complex systems, emergent semantics, collaborative filtering]

…There is reason to believe a stable distribution should arise. Online tagging systems have a variety of features that are often associated with complex systems, such as a large number of users, a lack of central coordination, and non-linear dynamics, and these sorts of systems are known to produce over time a type of distribution known as a power law. One important feature of power laws produced by complex systems is that they can often be “scale-free”, such that regardless of how large the system grows, the shape of the distribution remains the same, and thus “stable.” Researchers have observed, some casually, some more rigorously, that the distribution of tags applied to particular URLs in tagging systems follows a power law distribution where there are a relatively small number of tags that are used with great frequency and a great number of tags that are used infrequently [11]. We are concerned with a thorough demonstration, explanation, and empirical analysis of this phenomenon…One of the specific features of del.icio.us is the inclusion of “most common tags” for a given site when a user saves that site, facilitating the use of the tags others have used with the greatest frequency. They explain that the stability of common tags, which are displayed for users when they save a site, is based on a shared background and set of assumptions among users…This behavior is a clear example of preferential attachment, known popularly as the “rich get richer” model.
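The "rich get richer" dynamic can be illustrated with a small simulation: at each step a user either invents a new tag or reuses an existing one with probability proportional to how often it has already been used. This is a generic preferential-attachment sketch, not the generative model from the paper. The function name `simulate_tagging` and the parameter `p_new` are assumptions made for illustration.

```python
import random
from collections import Counter

def simulate_tagging(n_steps, p_new=0.1, seed=0):
    """Simulate tag assignments for one URL under preferential attachment.

    With probability `p_new` a brand-new tag is introduced; otherwise an
    earlier assignment is copied uniformly at random from the history,
    which selects each tag in proportion to its current count.
    """
    rng = random.Random(seed)
    history = ["tag0"]  # start with a single assignment of one tag
    next_id = 1
    for _ in range(n_steps):
        if rng.random() < p_new:
            history.append(f"tag{next_id}")
            next_id += 1
        else:
            history.append(rng.choice(history))
    return Counter(history)
```

Sorting the resulting counts from largest to smallest typically shows a few heavily used tags and a long tail of rare ones, the qualitative shape the excerpt describes.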

…Thus, folksonomy structure could also be seen as emerging at the intersection between the efforts of taggers who try to minimize their effort and thus prefer to choose more common tags with less information value, and retrievers or “hearers” who need to use these tags to find as precise resources as possible and thus use tags with the highest information value.