“What Is the Commons Worth? Estimating the Value of Wikimedia Imagery by Observing Downstream Use”, 2018-08-22:
The Wikimedia Commons (WC) is a peer-produced repository of freely licensed images, videos, sounds and interactive media, containing more than 45 million files. This paper attempts to quantify the societal value of the WC by tracking the downstream use of images found on the platform.
We take a random sample of 10,000 images from WC and apply an automated reverse-image search to each, recording when and where they are used ‘in the wild’. We detect 54,758 downstream uses of the initial sample, and we characterise these at the level of generic and country-code top-level domains (TLDs). We analyse the impact of specific variables on the odds that an image is used. The random sampling technique enables us to estimate the overall value of all images contained on the platform.
Drawing on the method employed by et al 2015, we find a potential contribution of $28.9 billion from downstream use of Wikimedia Commons images over the lifetime of the project.
…We find a total of 54,758 downstream uses of images from our sample. We estimate a series of logistic regressions to identify which variables have a statistically-significant effect on the odds that a WC image is taken up. Overall, we find that license type is a statistically-significant factor in whether or not an image is used outside of the WC. Public domain files and licenses (those without attribution or share-alike clauses) are associated with increased odds of downstream use. This is consistent with other economic studies of the public domain ([2] [6]). We also find that for commercial use, prior appearance of the file elsewhere on Wikipedia has a statistically-significant positive effect, suggesting that human curation and selection are important in promoting key images to widespread use. We suggest further experimentation using a purposive sample of ‘quality’ and ‘valued’ images to test for the impact of human curation on the WC.
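To make the "odds of downstream use" language concrete, here is a minimal sketch of an odds ratio between two license groups. The counts are invented purely for illustration and are not taken from the paper; the function names are likewise hypothetical. For a logistic regression with a single binary predictor, the fitted coefficient equals the log of this ratio, which is why results of this kind are usually reported as odds ratios.

```python
import math


def odds(used: int, unused: int) -> float:
    """Odds of use: images used at least once divided by images never used."""
    return used / unused


def odds_ratio(used_a: int, unused_a: int, used_b: int, unused_b: int) -> float:
    """Odds of use in group A relative to group B."""
    return odds(used_a, unused_a) / odds(used_b, unused_b)


# HYPOTHETICAL counts, invented for illustration only (not from the paper).
pd_used, pd_unused = 300, 700  # public-domain images: used / never used
sa_used, sa_unused = 200, 800  # share-alike images: used / never used

ratio = odds_ratio(pd_used, pd_unused, sa_used, sa_unused)
log_ratio = math.log(ratio)  # coefficient a one-predictor logit would report

print(f"odds ratio (public domain vs share-alike): {ratio:.3f}")
print(f"log odds ratio: {log_ratio:.3f}")
```

A ratio above 1 (a positive log odds ratio) would correspond to the direction of the effect described above: higher odds of use for the less restricted license. The paper's own models include further variables, which this two-group comparison does not attempt to reproduce.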
…This paper has tracked downstream digital use of images hosted on the WC. We find a mean rate of online use of 5.48 uses per image. Using commercial TLDs as a proxy for commercial use, we estimate a mean commercial usage of 2.99 per image. The odds that a given image from the WC will be used are statistically-significantly influenced by the license type chosen by its uploader. Images with attribution and share-alike licenses have statistically-significantly reduced odds of being used externally compared to images fully in the public domain.
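The extrapolation from sample to platform can be sketched with the figures quoted in this entry. This is a back-of-envelope illustration, not the paper's actual computation: the file count uses the "more than 45 million" figure as a lower bound, the per-use fee is left as a parameter because no fee is stated in the text quoted here, and the "implied" value per use is simply the headline $28.9 billion divided by the estimated commercial uses, not a number reported by the authors.

```python
SAMPLE_SIZE = 10_000             # images sampled (from the abstract)
DETECTED_USES = 54_758           # downstream uses detected in the sample
TOTAL_FILES = 45_000_000         # "more than 45 million"; treated as a lower bound
MEAN_COMMERCIAL_USES = 2.99      # mean commercial uses per image (from the text)
HEADLINE_VALUE_USD = 28.9e9      # headline valuation (from the abstract)

# Mean uses per sampled image; rounds to the reported 5.48.
mean_uses = DETECTED_USES / SAMPLE_SIZE

# Because the sample is random, per-image means scale to the whole platform.
est_total_uses = mean_uses * TOTAL_FILES
est_commercial_uses = MEAN_COMMERCIAL_USES * TOTAL_FILES


def valuation(commercial_uses: float, fee_per_use_usd: float) -> float:
    """Value = commercial uses x an assumed license fee per use (hypothetical)."""
    return commercial_uses * fee_per_use_usd


# ASSUMPTION: back-calculated from rounded figures, not reported in the paper.
implied_value_per_use = HEADLINE_VALUE_USD / est_commercial_uses

print(f"mean uses per image: {mean_uses:.2f}")
print(f"estimated total uses: {est_total_uses:,.0f}")
print(f"estimated commercial uses: {est_commercial_uses:,.0f}")
print(f"implied value per commercial use: ${implied_value_per_use:,.2f}")
```

Under these assumptions the sample mean scales to roughly 246 million total uses and about 135 million commercial uses, and the headline figure works out to a little under $215 per commercial use. Passing a different fee to `valuation` shows how sensitive the total is to that single assumption.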
The actual societal value of the WC is likely considerably greater, and would include direct personal uses as well as print, educational and embedded software applications not detectable by our reverse image search technique. Getty routinely charges license fees of $650 or more for creative use (such as magazine covers), considerably higher than the rate for editorial use. Our valuation method could be improved with more information about usage rates of commercial stock photography as well as potential qualitative differences between stock and Commons-produced imagery.