In Defense of Inclusionism (2009-18) (gwern.net)
100 points by luu on May 5, 2020 | 78 comments



As a pure end-user of Wikipedia, I just want to chip in to say that I have not seen any observable decline in Wikipedia over the last 15 years. I am generally very happy with the results I get when I use it.

> The fundamental cause of the decline is the English Wikipedia’s increasingly narrow attitude as to what are acceptable topics and to what depth those topics can be explored, combined with a narrowed attitude as to what are acceptable sources, where academic & media coverage trumps any consideration of other factors.

I guess it sucks for people who wanted to write articles on each chapter of Atlas Shrugged or thought Bulbasaur needed his own page. Presumably, though, you agree there should be SOME criteria for what is notable enough to deserve a Wikipedia article? For example, in order of increasing mundanity these topics probably don't deserve a Wikipedia article: Me, my cat, my cat's water bowl, the water in the bowl on May 5, 2020 etc.

Like, there must be some line that separates what deserves an article and what does not. We can argue about where to draw that line, but I'm actually pretty happy with how Wikipedia has done it.


> these topics probably don't deserve a Wikipedia article: Me, my cat, my cat's water bowl, the water in the bowl on May 5, 2020 etc.

Yes, they probably don't. But what is the cost of having them? If they confuse an issue, like if your cat shares a name with a more notable animal, then yours can just be renamed to "Fluffers (Permit's cat)". Who does it hurt to have that page on there, even in the reductio ad absurdum? Even, hypothetically, storage costs -- in the current deletionist world, the initial article and its deletion will be preserved in history forever, so we don't save anything.

If we need to provide the capability to flag pages as "deletionists don't like this" and present a deletionist view of wikipedia to those who don't wish to be exposed to that content, then go for it.

I don't mean to try to throw rhetorical exclamation points everywhere; I'm genuinely curious about the cost of having a page about your cat's water bowl.


Wikipedians have been arguing over these positions for many years, so if you want to see the arguments in favor of deletionism, there are plenty of places to look.

For example, here are IMO the most salient points from https://meta.wikimedia.org/wiki/Deletionism:

> Some believe that the presence of uninformative articles damage the project's usefulness and credibility, particularly when casual visitors encounter them through Internet search engines or Wikipedia's "random page" or "recent changes."

> Articles on obscure topics, even if they are in principle verifiable, tend to be very difficult to verify. Usually, the more obscure, the harder to verify. Actually verifying such articles, or sorting out verifiable facts from exaggeration and fiction, takes a great deal of time. Not verifying them opens the door to fiction and advertising. This also leads to a de facto collapse of the "no original research policy", which is one of the fundamental Wikipedia policies. Empirically, there have been a number of hoax articles which were difficult to prove to be hoaxes but which could have easily been deleted by a sufficiently strict notability policy.

> Poorly-sourced articles can result in Citogenesis, as incorrect or unsourced information on Wiki (e.g., information that is the product of original research) is then repeated outside Wiki and eventually works its way into a publication that is normally regarded as a reliable source.


When you hit random page, most of the time you get one of the following:

Uninformative article about a locality

Uninformative article about the football team of the above.


These are arguments for a verifiability guideline, not a notability guideline.


> Yes, they probably don't. But what is the cost of having them?

I like the idea that the articles on Wikipedia are generally on "notable" topics or people. When I am reading an article I know that it has probably been eyed up and down by deletionists, hunting for any excuse to delete it, but they walked away unable to do so. That's a very useful signal to me!

For example, right now I can look someone up and use "has a Wikipedia page" as a rough proxy for "is well known". If everyone had Wikipedia pages, I could no longer do this.


In general I agree with you on "has wikipedia article ~= notable". However it's well known that WP editors are biased and for example, articles about women get deleted where a similar article about a man wouldn't be.


As gp says, though, a deletionist flag and deletionist viewport would accomplish that just as well, while preserving more obscure and niche topics.


Another possibility is to have it in the positive: have a notability flag.


> I'm genuinely curious about the cost of having a page about your cat's water bowl.

The fact that at some point it becomes impossible to disambiguate information. If you have 50,000 pages about everyone's cats, it'd be borderline impossible to find general information that is relevant to a public audience. It's the same reason I can't go to the public library and put my own random writings on a shelf: there needs to be a level of curation so that the content isn't bogged down by what is mostly going to be noise.

The article brings up a page for each Pokémon, say. If you have countless Pokémon, all with names similar to other real-world things, then everyone who doesn't care about Pokémon will have to wade through links and pages of irrelevant content. It'd quickly turn into a huge digital garbage dump.

Not to mention that Wikipedia wants to provide a reasonable level of accuracy and factuality, and nobody can independently verify personal content or topics so niche that only one person knows what's going on.

I don't know why someone would really want Wikipedia to turn into a website for in-universe fiction or people's personal content. That stuff is better suited to a self-hosted wiki.


> The fact that at some point it becomes impossible to disambiguate information. If you have 50,000 pages about everyone's cats, it'd be borderline impossible to find general information that is relevant to a public audience. It's the same reason I can't go to the public library and put my own random writings on a shelf: there needs to be a level of curation so that the content isn't bogged down by what is mostly going to be noise.

If this were true, there would be the same problem with the internet, since we don't have notability limitations for who can build a webpage. But the fact is, search is a pretty well-developed field. A very simple metric is simply: how many places is this linked from. These can come up higher in search results, be prioritized higher in disambiguation pages, etc. Start with that, and refine from there. Notability is a sorting problem, not a filtering problem.
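The inbound-link metric described above can be sketched in a few lines. This is purely a hypothetical illustration (the page graph and function name are made up, and real search ranking would combine many more signals), but it shows the "sorting, not filtering" idea:

```python
from collections import defaultdict

def rank_by_inlinks(pages):
    """Rank pages by how many other pages link to them.

    `pages` maps a page title to the set of titles it links to.
    Returns titles sorted most-linked-first: a crude notability
    signal for ordering search results or disambiguation pages,
    rather than deleting the obscure entries outright.
    """
    inlinks = defaultdict(int)
    for source, targets in pages.items():
        for target in targets:
            if target != source:  # ignore self-links
                inlinks[target] += 1
    return sorted(pages, key=lambda title: inlinks[title], reverse=True)

# Toy link graph: the widely-linked topic sorts ahead of the obscure one.
pages = {
    "Bulbasaur": {"Pokemon"},
    "Pikachu": {"Pokemon"},
    "Pokemon": {"Bulbasaur", "Pikachu"},
    "Fluffers (Permit's cat)": set(),
}
print(rank_by_inlinks(pages)[0])  # "Pokemon" has the most inlinks
```

Under this scheme the cat's water bowl page can exist; it just never outranks anything anyone is actually looking for.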

> The article brings up a page for each pokemon say, and if you have countless of pokemon all with similar names to other real-world stuff everyone who doesn't care about the pokemon will have to wade through links and pages of irrelevant content, it'd quickly turn into a huge digital garbage dump.

You're wrong in your own examples. Sorry buddy, but if someone searches for "Bulbasaur", is there some more notable thing you think they are looking for besides the Pokemon?

> Also not to mention that Wikipedia wants to provide a reasonable level of accuracy and factfulness, and nobody can independently verify personal content or topics so niche that only one person knows what's going on.

Which is a great case for a verifiability guideline, not a notability guideline.

> I don't know why someone would really want Wikipedia to turn into a website for in-universe fiction or people's personal content.

"Must be nonfiction", "must be verifiable", etc. are all different requirements from "must be notable".


>> these topics probably don't deserve a Wikipedia article: Me, my cat, my cat's water bowl, the water in the bowl on May 5, 2020 etc.

> But what is the cost of having them?

0. Volunteer time may be cheap, but it's not infinite. Without standards, the volume of articles could become so large it will be an impossible task to fact check and edit them all.

1. Malicious, unscrupulous, or misguided actors co-opting Wikipedia's reputation for their own purposes (e.g. Jack's snake oil has been scientifically proven to cure all disease by all these articles in fake medical journals).

2. Useless garbage in search results that you'd have to wade through. Literally thousands of high school students every year would put up pages about their bands, cartoons, and art which are of no value to anyone but themselves. Why host all that if you're just going to have to build your search to exclude it?


(Late reply, so you'll probably never see this, but oh well)

> 0. Volunteer time may be cheap, but it's not infinite. Without standards, the volume of articles could become so large it will be an impossible task to fact check and edit them all.

The importance of fact-checking and verification is, in my opinion, proportional to notability, and editors will naturally flock to verify articles that they have an interest in or that are generally notable. So what if the Bulbasaur article has some inaccuracies?

And frankly, the problem is unsolved -- read the wikipedia article on "hypnosis" and cringe at what appears to be the script for a poorly made thriller. At least we could splinter that article into multiple articles talking about the treatment in fiction, which has notability in itself compared to the semi-mystical ravings on the main page.

The other complaints are related to notability as well. I'm not saying that we should treat all articles as definitive -- we don't even do that now. The correctness of the article is a function of the dedication of contributors which is a function of the notability of the subject; this reinforcement happens organically.

Frankly, I'd welcome more skepticism toward Wikipedia content, because that skepticism forms the basis for people becoming editors, rather than editing being seen as something reserved for an elite class of self-diagnosed "experts".


In the reductio ad absurdum, storage costs do become a problem, and deletionism discourages many useless pages from being created in the first place. OTOH, deletionism discourages many useful pages from being created in the first place, which is a much more serious problem.


Isn't what you are describing exactly the Internet plus a search engine you trust will pick up the article type you are interested in? In fact, you'd still need to trust that open WPv2 thing just like that search engine.


> If we need to provide the capability to flag pages as "deletionists don't like this" and present a deletionist view of wikipedia to those who don't wish to be exposed to that content, then go for it.

That flag exists. It's set by deleting the article. As you note:

> in the current deletionist world, the initial article and its deletion will be preserved in history forever

So if someone wanted to present an inclusionist view of Wikipedia to those who find that content valuable, they could do so.


No, you can't edit pages that have been deleted, so that's more than a flag.


> I'm genuinely curious about the cost of having a page about your cat's water bowl.

Several billion cat-water bowl pages are probably just an annoyance for someone.

But what would you think of a page explaining, say, all the healthy virtues of drinking diluted bleach for fighting C19 being hosted on wikipedia.org?

How do you think that's going to work out, in the first instance, when panicking people read it, and in the second, when people start treating wikipedia.org as trustworthy as their spam folder?


Wouldn’t that be blocked by Wikipedia’s source requirements? Presumably no trustworthy source will say that drinking diluted bleach is good for you so there won’t be any sources to make a Wikipedia article out of.


I realize that you and the parent are referring to recent news, but bleach can be used to disinfect water for drinking in an emergency[0].

I'm not going to argue with anybody who feels like this is pedantry (yes, technically the resulting water is highly diluted bleach, but that's not the point), but it is worth filing away for the future.

You know, the one where we can buy bleach again.

[0] https://www.epa.gov/ground-water-and-drinking-water/emergenc...


> But what is the cost of having them?

The time and effort of the Wikipedia editors; the reputation of Wikipedia in general.


> Like, there must be some line that separates what deserves an article and what does not.

That line is simply:

> If a topic has received significant coverage in reliable sources that are independent of the subject, it is presumed to be suitable for a stand-alone article or list.

-- https://en.wikipedia.org/wiki/Wikipedia:Notability

This instantly weeds out a lot of obviously trivial topics. There are arguments about how exactly "significant coverage", "reliable sources", and "independent of the subject" should be defined, and refinements of this policy for specific subject areas, but that's the core of Wikipedia's notability policies. I've never heard any solid arguments for including a topic which doesn't meet this criterion.


> I guess it sucks for people who wanted to write articles on each chapter of Atlas Shrugged or thought Bulbasaur needed his own page

Bulbasaur is a pop culture icon; he may very well be the second most recognizable Pokémon after Pikachu (though as noted Charmander and Squirtle are up there too.) In any case, he definitely earned his own page:

https://en.wikipedia.org/wiki/Bulbasaur


> Presumably, though, you agree there should be SOME criteria for what is notable enough to deserve a Wikipedia article? For example, in order of increasing mundanity these topics probably don't deserve a Wikipedia article: Me, my cat, my cat's water bowl, the water in the bowl on May 5, 2020 etc.

Well, there are a few criteria that are inherent:

1. Something is at least a little notable if someone is willing to write about it. In a literal sense, it's notable because someone noted it.

2. There's also the citation factor: even if someone cites their own work, they've at least gone to the effort to note it twice.

> Like, there must be some line that separates what deserves an article and what does not.

Citation needed.

Why must there be such a line?


It's worth noting that the Bulbasaur fans won and he does have his own page.


That's great! And I think it shows that no matter where you draw the line you're going to end up with compelling topics on either side of it. I see it as a strength of Wikipedia that these things are fluid and there is an ongoing tension between inclusionists and deletionists. I'm very happy where they've ended up at the moment.


I'm in essentially the same camp. When I've tried to look things up, it's always already been there. I've made edits to wikipedia, but it's always been undoing vandalism.


> As a purely end-user of Wikipedia I just want to chip in only to say that I have not seen any observable decline in Wikipedia over the last 15 years. I am generally very happy with the results I get when I use it.

What you are missing is the 'seen and the unseen'. Wikipedia has stagnated badly, and it captures an ever more vanishingly small slice of the world. The world keeps getting bigger, things keep happening, and even the core topics were never actually covered in the detail they could and should be covered.

You simply cannot keep up with everything with a core of, what is it now, less than 30k regular editors? That's basically enough to fight off vandals, do basic maintenance, and cover the occasional hot topic like coronavirus or Donald Trump to a reasonably adequate degree. It's not enough to cover even the Anglosphere, much less seriously improve the back catalogue, or realize a fraction of WP's original aims.

Whenever I look at WP articles on any topic I know, like deep learning/deep reinforcement learning/behavioral genetics/decision theory/darknet markets etc, I find that they tend to be hilariously out of date, short, incomplete, or not covered at all. They're usually not outright wrong, but what is not there is much more important than what is usually there. (Let's take a moment to consider how WP's AlphaGo article, which is one of its best articles on DL/DRL because AG is globally famous, does not even mention MuZero or the term 'expert iteration'...) Typically, most of that would be 'notable', assuming an editor tenacious enough to fight for it - but such editors are precisely what have been lost.

And yet, WP is more important than ever! Isn't it ironic that when WP was at its healthiest, when it was most keeping up with Notable topics, it was hardly used, and now that it's been severely ailing for a decade, search engines and social media networks depend abjectly on it for their knowledge graphs and factcheckers and disinformation fighting efforts? (Well, not ironic. Just bad. For everyone.)


> Whenever I look at WP articles on any topic I know, like deep learning/deep reinforcement learning/behavioral genetics/decision theory/darknet markets etc, I find that they tend to be hilariously out of date, short, incomplete, or not covered at all.

The same is true for the stuff I know anything about, too. But I wonder if there's an alternative interpretation of this. The article argues:

> And it’s casual users who matter. We lost the credentialed experts years ago, if we ever had them.

I can see the truth in this. But at the same time, couldn't you say that to the same extent, the more "casuals" you have editing, the worse the articles that really need an expert's oversight (like machine learning articles) are going to be?

I recently made some huge improvements to an article that was basically unreadable. Then I started looking through the history, and found that the article had been basically okay (though not great) about 8 months prior. Tracing the edits, I found that a professor at a community college had made it a semester long assignment for each student to pick a Wikipedia article to work on, in order to learn ... something. In any case, the student assigned to work on the article in question couldn't even write coherent English, let alone make high quality edits to this article. They had systematically ruined the article with a new account over about 4-5 months, with no oversight, and none of their commits were reverted by any editor.

So I find it rather hard to say for certain that what we need are "more" editors, if those editors are largely going to be casual editors. As an expert (in a small number of fields), I don't (usually) edit Wikipedia for two reasons:

1. The starting quality of most of the articles I would work on is so abysmally bad that I'd honestly rather tell the students I work with to just totally ignore Wikipedia as a way of learning about stuff in my field. Making two or three articles passable would not resolve that fundamental problem. (The problem is that the quality of an article not overseen by a very good editor is at best going to approach an average-quality essay by a college student - in other words, about a 2 out of 5.)

2. I don't mind strict editing rules (see above), in fact I think they may often be necessary. What I've encountered when trying to edit Wikipedia, however, is that long-time editors know how to use and distort the rules to their own advantage. This doesn't mean the rules are misguided, necessarily, it just means that Wikipedia's culture is too insular and the top editors tend to have their own way with things.


As an inclusionist who has basically given up on editing wikipedia because of deletionist forces, I read this article and see nothing but points I agree with.

What I would like to see, however, is a good defense of deletionism. I have a lot of trouble even understanding the point of view.

If there's some higher-level motivation about storage costs or something, then I can buy it, but I don't think anyone is making that argument. Does it hurt Wikipedia to have low-quality articles? I think only insofar as the articles are about subjects that are important, so the effect seems to be self-balancing. Can Wikipedia be used by people to glorify themselves with personal Wikipedia pages? Sure, but who cares -- unless someone does care in a particular instance (e.g., a person named Albert Einstein tries to take over the main article -- people will fight it because they want the article to point to the right person).


I think the main argument that isn't simply a matter of storage costs (which probably do kick in at some point) is a matter of brand management. If the average Wikipedia article is an incomplete and poorly sourced article about Pokémon, then Wikipedia gets a reputation as a poor source of information. It actually had that reputation pretty strongly early on, but its reputation for reliability has improved over the years.

I suppose there is also the inverse problem, where poor articles can use Wikipedia's relatively good reputation to give visibility to some pseudoscience or conspiracy theory. If it's really obscure, the only people writing articles about it will be proponents, who will then link to the article from their closed Facebook groups. It won't necessarily be obvious to non-experts what's going on, so deleting certain kinds of articles could be a protection against that as well.

(I'm not any kind of Wikipedian btw; just speculating off of the top of my head.)


If that's the case, I'd prefer a two-tiered Wikipedia over a deletionist Wikipedia. In such a Wikipedia, each page could be clearly labeled either as an encyclopedia-quality entry or as a "wannabe" encyclopedia entry. (A better term would be needed for the lower tier.) Such labeling should be very noticeable and visible when you open an entry, at least for the lower-tier entries.


What about additional information on encyclopedia quality pages that would normally have been excluded?

It seems like anyone could make an extension site to Wikipedia where articles that aren't allowed at Wikipedia could find a home, and the search there could also search Wikipedia. The issue seems to be that nobody wants to put in the hard work of making this happen, marketing it, and managing it. They just want Wikipedia to do it for them, using the limited resources Wikipedia has at its disposal, for free.


But this is contradictory. If you don’t care about credibility, why are you posting the content on wikipedia to begin with? Anyone can start a wiki, or a blog, or whatever.


They used to use the term "stub" for brief pages that weren't full articles yet, and they do put warnings on articles that don't meet certain quality standards, though I think that's different from what you're talking about here.


There's some overlap, though not full equivalency. Stubs presumably fall in line with Wikipedia's editorial vision, even if they haven't been fleshed out into full entries. The entries I'm referring to are mostly ones which would run afoul of Wikipedia's notability criterion.

Going back to the brand management issue, I'd be OK with deleted articles and their history being moved to another wiki with a different URL, with a name that is not "Wikipedia". Cookies or some other mechanism could be used to manage user/session preferences, so that users who want to be redirected to this other, questionable-page wiki can be redirected when they try to access pages that don't fall in line with Wikipedia's editorial vision. I'd be fine with huge in-your-face warnings about the likely lack of quality or veracity of such entries... just as long as I can access that content.


The argument for deletionism is straightforward: the scarcest resource on Wikipedia --- and this article agrees! --- is volunteer time. Every article on WP is a drag on volunteer time, because every article needs to be policed for spam and vandalism. Wikipedia is an encyclopedia, not a Hitchhiker's Guide, so scoping it to encyclopedic topics reserves volunteer time for meaningful tasks.

later

If you want to come at this from the vantage point of principles: one of the oldest rules on Wikipedia is "No Original Research" (an encyclopedia is a tertiary source). Articles on non-notable subjects are almost by definition original research.


This is one of the few responses I've seen that actually articulates a defense of deletionism rather than just a summary of deletionist theory, so let me engage here.

What exactly does policing for spam and vandalism consist of? There are a number of automated tools already deployed on Wikipedia, improved over time, for detecting particular types of spam and vandalism. The use of anonymous accounts, in particular, makes it hard to do any interesting spam at scale. It seems to me that the larger Wikipedia is, paradoxically, the easier it will be to detect patterns of spam, especially out on the tail.

As for "No Original Research", that's probably the principle of wikipedia that I'm most skeptical of; and really, it isn't respected, as wikipedia has become such an important secondary source. I think it's useful as a guiding principle, in that you don't want people using wikipedia as a platform to advance their alternative theory of gravity, but for the most part the harm is small to allow us to push the envelope a little bit.


Policing spam and vandalism entails reading all the changes to pages you're policing. That's why WP editors have watchlists: no one person can keep context for all the pages on the site in their head. A lot of spam detection is automated, but a lot of spam isn't automated: it's people with a fairly decent understanding of a topic surreptitiously editing its page to give it a preferred spin (or promotion). The whole premise of Wikipedia is that such edits are not allowed to stand.

As regards "no OR isn't a problem": if your feeling is that the project is rife with OR, you should have no problem generating links to 5 pages that demonstrate the phenomenon. Let's discuss specifics.


I wonder if this measurement takes into account the volunteer time that could be gained by letting new members get a foothold.


> Does it hurt wikipedia to have low-quality articles? I think only insofar as the articles are about subjects that are important, so the effect seems to be self-balancing.

Here is where I disagree. I think Wikipedia is hurt by any low-quality articles. If I see some information on Wikipedia, there is a moderate positive signal that that information is correct. (Not as strong a signal as I'd like, but oh well.) Reducing the strength of that signal would make Wikipedia less useful.


And you don't see how that's an incredibly slippery slope, where the relative quality is rated by the worst article, à la stack ranking?


I see two problems. One, as the qualifications loosen, Wikipedia will tend to have not just more articles, but more lower-quality articles. At a certain point, Wikipedia would become mostly poor quality.

Two, managing the articles has a cost for the editors involved. As the number of articles increases, their ability to do a good job will decrease.


A thread from 2016: https://news.ycombinator.com/item?id=13152255

Thread from 2014 - interesting top comment there: https://news.ycombinator.com/item?id=8791791

(Reposts are ok after about a year: https://news.ycombinator.com/newsfaq.html)

I've made the year in the title a gwernian range.


“The fundamental cause of the decline is the English Wikipedia’s increasingly narrow attitude as to what are acceptable topics and .. what are acceptable sources, where academic & media coverage trumps any consideration of other factors.”

.. as well as self-serving corporate and political interests. As in, they sit on an article 24/7 making sure nothing controversial gets in.

“Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge.”

Except for those that contradict the inner party. Go to the Talk Page and discuss it they say. Do that and your account gets disabled for violating some obscure WP rule.


No comments yet? I guess others are still reading TFA...

There are a number of issues that seem to be eternal. Even if there's an obvious right answer, I see year after year, decade after decade that they keep being discussed, with the same points being made over and over again. Is there a name for these?

I guess that for each of them, there is some kind of unspeakable reason that trumps any sound rationale.


> I guess that for each of them, there is some kind of unspeakable reason that trumps any sound rationale.

I think the general unspeakable reason is pretty simple: once you give people power, some of them will misuse it. And since it takes more effort to correct a misuse of power than to commit the misuse in the first place, any institution that gives people power becomes more and more corrupt over time as misuses of power outweigh valid uses of power.


While I'm mostly in Camp Inclusion, I appreciate the issues that tend to come up as you loosen criteria for articles more and more--you probably inevitably get articles that are "notable" to a narrower and narrower set of people and a lot of verification depends on shakier and harder to access sources.

That said, it's pretty clear that there are more than a few Wikipedia admins who seem to have embraced deletionism for topics that aren't near and dear to them personally and/or which poke at whatever their particular political hot buttons are.


The nice thing about Wikis in general is that it's usually easier to correct a misuse of the power to edit a page than it is to commit the misuse in the first place. That's why Wikis work at all.


That works for the power to edit, not for the power to delete. So it makes sense, yes.


As a former Wikipedian, what's there to say? The evidence is still there, and was there for years and years. Deletionism was, and remains, a wrong-headed attitude that does not understand what makes WP qualitatively different from standard encyclopedic offerings.

There is not a motivating need to limit WP's scope, and indeed that is why WMF forked off projects like Wiktionary and Wikidata to their own TLDs. The main problem with WP is that it is far easier to be wrong than to be right, and the effort required to be right is linear in the number of words written. In short, the number of editors per page required for acceptable quality does not roll off with large numbers of pages, but stays relatively high, at around 1-2 editors/page.

Worse, the number of moderators per editor does not roll off either. The number of bureaucrats required therefore keeps growing, logarithmically but steadily, and the demands on arbitration committees keep growing. The committees themselves have long ago failed basic principles of legal legibility, leading to sprawling bureaucracy.

One possible solution is to fundamentally alter what we store. Rather than writing thousands of words of prose, we could use Wikidata to automatically generate articles. We already have factboxes which could be largely automatically populated, and many people only care about the factboxes. Prose would be limited to commentary and explication, but would not be the main bodies of articles. This is not just a pipe dream; LMFDB [0] exists and is worth examining as an example of how code and data can automatically generate the bulk of an encyclopedia.
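To make the idea concrete, here is a minimal sketch of generating a stub article from structured claims in the spirit of Wikidata-driven generation. The property names and the template are hypothetical illustrations, not Wikidata's actual data model, which uses numbered properties, qualifiers, and references:

```python
def render_article(claims):
    """Generate stub prose from a factbox-like dict of claims.

    The idea: the structured data is the source of truth, and the
    prose is a deterministic rendering of it, so editors maintain
    facts rather than thousands of words of free text.
    """
    name = claims["label"]
    sentences = [f"{name} is a {claims['instance_of']}."]
    # Optional properties render to optional sentences.
    if "inception" in claims:
        sentences.append(f"It was founded in {claims['inception']}.")
    if "country" in claims:
        sentences.append(f"It is located in {claims['country']}.")
    return " ".join(sentences)

claims = {
    "label": "Example University",
    "instance_of": "university",
    "inception": 1850,
    "country": "France",
}
print(render_article(claims))
```

Anything beyond this toy template (grammatical agreement, multiple languages, richer claim types) is where the hard, speculative work lies, which is presumably why the pretty-printing part is the contested piece of such proposals.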

But, let's be honest, the writing was on the wall when Esperanza [1] was dissolved. We are now somewhere between Bureaucracy and The Aftermath.

[0] https://www.lmfdb.org/

[1] https://en.wikipedia.org/wiki/Wikipedia:Esperanza
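To make the factbox idea above concrete, here is a minimal sketch of rendering prose from Wikidata-style property/value claims. The property IDs (P31, P19, P106) are real Wikidata properties, but the sentence templates and the `render_article` function are invented for illustration; a real system (like the Wikilambda proposal) would need far richer grammar handling.

```python
# Hypothetical sketch: generate an article stub from Wikidata-style claims.
# The property IDs are real Wikidata properties; the templates are made up.

TEMPLATES = {
    "P31":  "{label} is a {value}.",          # instance of
    "P19":  "{label} was born in {value}.",   # place of birth
    "P106": "{label} worked as a {value}.",   # occupation
}

def render_article(label: str, claims: dict) -> str:
    """Turn a {property: value} mapping into a few sentences of prose."""
    sentences = [
        TEMPLATES[prop].format(label=label, value=value)
        for prop, value in claims.items()
        if prop in TEMPLATES  # silently skip properties we cannot verbalize
    ]
    return " ".join(sentences)

print(render_article("Ada Lovelace", {
    "P31": "human",
    "P19": "London",
    "P106": "mathematician",
}))
```

Even this toy version shows the appeal: the facts live in one structured store, and the prose is a disposable rendering of them, so correctness scales with the data rather than with the word count.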


> One possible solution is to fundamentally alter what we store. Rather than writing thousands of words of prose, we could use Wikidata to automatically generate articles.

This is in fact being proposed at https://meta.wikimedia.org/wiki/Wikimedia_Forum#Proposal_tow... by a prominent Wikidatan. (Edit: follow up at https://meta.wikimedia.org/wiki/Wikilambda and https://meta.wikimedia.org/wiki/Talk:Wikilambda .) However, generating sensible articles would require expanding the current Wikidata model, and this is something that should happen gradually and be managed by the WD community itself, not as a separate project. The whole pretty-printing-in-natural-language part is by far the most speculative, and incubating it separately makes more sense.

It's worth noting that Wikidata itself is not "deletionist" other than as implied by its verifiability and sourcing requirements. Its model is far more general and far more "inclusionist" than even the most permissive visions for Wikipedia.


> There is not a motivating need to limit WP's scope, and indeed that is why WMF forked off projects like Wiktionary and Wikidata to their own TLDs.

This feels like a gross misreading of the facts.

Wiktionary is separate from Wikipedia because its work product is fundamentally different -- it's a dictionary, not an encyclopedia. Wikipedia has a lot of articles about things that aren't words, and Wiktionary has a lot of pages for words that wouldn't make sense to write an encyclopedia article about.

Wikidata, meanwhile, is only about raw data. Which is a part of an encyclopedia, but far from all of it. (How would you write an article about the history of Rome using only data?)

> But, let's be honest, the writing was on the wall when Esperanza [1] was dissolved.

This, too, is a gross misstatement of the facts.

Esperanza was dissolved because it was becoming a cabal. It started as a social group, but rapidly grew into its own organization, with its own decision-making processes and elected officials separate from those of Wikipedia. There was widespread agreement, even from Esperanza's founder, that the organization was no longer fulfilling its purpose, and an effort was made to reform it before it was shut down.

There's a decent summary at: https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2...


Wow, the death of hope...


I would attribute it to an inadequate equilibrium, a local minimum that's simply too hard to escape at this point.


> I see year after year, decade after decade that they keep being discussed, with the same points being made over and over again. Is there a name for these?

Because human beings don't perceive reality accurately. See the science:

https://www.youtube.com/watch?v=PYmi0DLzBdQ


I noticed the decline. About a year ago, during a hackathon, we tried to get an article online. It took us three deletions and retries. Only after I reached out to a national Wikipedia official did the article finally get accepted. That's not good...


What was the article? It'll have a history, and we can see for ourselves what happened with it.


Same goes for stackoverflow.

Since the Stack Overflow database is freely available, I cannot see a single good reason why they weren't outcompeted years ago.


So I have my own beefs with stackoverflow and the stackexchange network at large, but...

...at some point you have to ask yourself, why haven't they been outcompeted? For every post complaining about stackoverflow moderation or policies, there are probably thousands of people using it every day to do their jobs, and it hasn't been outcompeted!

So if we're honest, we cannot rule out the possibility they are doing something right. They set out to replace closed sites like "Expert Sex Change" -- ok, sorry for the joke, expertsexchange -- and also reduce low quality noise. They succeeded and they are now the gold standard. So why hasn't anyone simply taken their data and forked it?

A great example: softwareengineering.stackexchange -- formerly programmers.stackexchange -- has a troubled history. At times it has decided everything was off-topic there (I'm not joking, there were times where every question on the home page was closed as off-topic), and some long time "inclusionist" and well-intentioned contributors declared they would fork it, and their fork would include everything even tangentially related to programmers and people wouldn't be censored for asking questions about anything.

Where are those forks now?


First mover advantage. It's like asking why no one has created another Reddit, or another HN. When the original exists, a derivative doesn't gain traction unless something goes seriously wrong.

The sole time that Reddit was in danger was when their front page was blacked out over the moderator rebellion. Luckily for them, Voat's servers completely died under the load. Then Reddit came back online, and the rest is history.

I would imagine your questions have straightforward answers along those lines. No one switches because of inertia, not quality.

In general, a new thing has to be fundamentally different in some way. It's rare that an internet product is out-competed by someone else doing literally the same thing. Even Reddit was fundamentally different from digg. They were both link aggregators, but that's about where the similarities stopped.


Agreed about the first mover advantage.

Do note StackOverflow is only relatively a first mover: expertsexchange came first, and StackOverflow killed it. Evidently, they did something better.

Wikipedia also had challengers. I don't remember their names (heh) but many years ago there were forks. I don't mean Wikia, but actual alternatives to Wikipedia. Nobody uses them.

> In general, a new thing has to be fundamentally different in some way. It's rare that an internet product is out-competed by someone else doing literally the same thing.

Possibly right, too! But if an inclusionist attitude is not something "fundamentally different" enough to succeed, then perhaps it isn't important enough to justify the claim that deletionism is killing Wikipedia/StackOverflow. If people don't flock to inclusionist forks, maybe inclusionism isn't that important after all?


> First mover advantage. It's like asking why no one has created another Reddit, or another HN. When the original exists, a derivative doesn't gain traction unless something goes seriously wrong.

Good point, yet here we are:

- slashdot feels like a shadow of its former self

- reddit replaced digg

- lobste.rs is a healthy alternative in addition to hn

- stackoverflow replaced expertsexchange

I still think someday someone will create a better Q&A site, and if I were in a position to do a startup to change the world for the better, that would probably be high on my list of ideas.

I admit I could be wrong, but I am surprised that nobody has tried, as far as I can see: after all, the early bird gets the worm, but the second mouse gets the cheese, and there's a lot to be learned from what works and doesn't work today.


The Stack Overflow database may be freely available, but the Stack Overflow user-base will probably keep on going to Stack Overflow because that's where the rest of the Stack Overflow user-base goes. Network effects are tough sometimes.


This sort of problem (or its opposite) is inevitable with our internet circa 2020.

Maybe wikipedia is too narrow. Maybe it's too broad. These things don't have singular, indisputable answers...

The problem is that we have an internet of bottlenecks. Wikipedia's choices about what is encyclopedic or notable become the only definition of encyclopedic that matters. YouTube's interpretation of fair use, Twitter's definition of offensive, or Facebook's definition of obscene... they're the working definitions.

The internet needs to be less centralised... Even wikipedia.


If Wikipedia is to retain a deletionist editorial culture, I'd at least like to be able to access the history for deleted entries and read old versions of such entries. As it stands, those entries seem to be permanently removed from public access (and maybe even on the back end). I don't like that 1984-style versioning, which gives the message that certain entries never existed in the first place.

(I'd understand a small exception for copyright infringing content--that such content should remain unavailable when deleted.)


Does anyone here remember Seth Finkelstein?

https://www.theguardian.com/technology/2008/jul/31/wikipedia

I miss his writing!

edit: His blog is still up: http://sethf.com/infothought/blog/archives/cat_wikipedia.htm... but unfortunately, no new entries since 2013 :(


I have to say this is spot on. The move of things like the Culture spacecraft names to Wikia was a killing blow, particularly given the use of... dubious... cookies and tracking on Wikia.

And while I understand the feeling that this does not belong in Wikipedia, it was neither wrong nor low quality, and it was a good open door for new editors. I could keep going; I have a long list of good information lost from Wikipedia over the years. This one is just the one that made me decide to stop contributing.


> If you try to write niche articles on certain topics, people will tell you to save it for Wikia. I am not excited or interested in such a parochial project which excludes so many of my interests, which does not want me to go into great depth about even the interests it deems meritorious—and a great many other people are not excited either, especially as they begin to realize that even if you navigate the culture correctly and get your material into Wikipedia, there is far from any guarantee that your contributions will be respected, not deleted, and improved.

I hate to be seen as supporting deletionism, because I've never contributed enough to WP to really have an opinion (I've mostly just expanded on existing articles and fixed obvious vandalism and typos), but I don't understand why this is a bad thing.

I don't like that they're being driven to an ad-supported website, but the simple act of not having Wikipedia eat all of the web seems like a good thing. The Iron Law of Oligarchy will weaken the website just like it's destroyed every other website, with no good reason to believe it'll be different this time. At least if Wikipedia is limited in scope, the damage it does will be limited too.


> I don't like that they're being driven to an ad-supported website, but the simple act of not having Wikipedia eat all of the web seems like a good thing.

No, having all of this content under Wikipedia would be an amazing thing. For all its warts, Wikipedia (when it has the information you are looking for) is the gold standard for what the web should be. It's amazing that it exists as a website, and compared to the mountains of crud that surround it, it has an amazing UX.

The inclusionist argument is that the web would be better if more of its content was available under the same high standards that Wikipedia set.

Competitors can't deliver that same standard because of economies of scale.


> Competitors can't deliver that same standard because of economies of scale.

The invasion of policy-driven jerks seems like a good counterargument; it's an anti-economy of scale, since smaller websites have less of the problem.

edit: Arguing about scale also kind of ignores that there's a good space between the scope of the website being "everything" and having a website dedicated to the IDW "Sniffles the Mouse" comics. IMDb is well-respected, TVTropes has been successful with a very different culture from IMDb, the Arch Linux Wiki has a reputation for being useful even if you aren't technically using Arch, and, as noted above, Wikia has pretty much made a business out of this.


Smaller websites have this problem, it's just that the holy wars waged within the Pokemon fandom rarely make it to the HN front page. If anything, the politics get even more personal and vicious in smaller communities.

The business of Wikia may be successful, but as a user, it's complete rubbish.


Most of the important topics were covered years ago. Encyclopedias are not high-maintenance; the maintenance team for Britannica wasn't all that big.

I used to edit Wikipedia quite a bit, but got into other things.


Is it easy to tell what technology gwern uses for his website? I'm not web-savvy, but the HTML looks like it's hand-crafted rather than generated. Is it a static website?


As Juped said, there is actually an in-depth discussion of the tooling used to generate the website on the about page: https://gwern.net/About#tools It looks like "statically generated website" is about right.


You can find this information on the site


> fundamental cause of the decline is the English Wikipedia’s increasingly narrow attitude as to what are acceptable topics and to what depth those topics can be explored, combined with a narrowed attitude as to what are acceptable sources, where academic & media coverage trumps any consideration of other factors.

This is a result of the state/traditional media asserting more control over internet propaganda. Year by year, what is acceptable on Wikipedia, forums, social media, etc. has narrowed, followed by the strong-arming of these tech companies into privileging authoritarian/authoritative sources.

Look at what happened to YouTube. There was a time when they had great recommendations and you could go into a "deep dive" of YouTube for hours. Now the recommendations have no depth and focus on "late night shows", establishment news, and pre-screened content. Google search is no better. Reddit is just advertisement/PR garbage ("celebrity X donated 5 masks to a hospital") and propaganda.

It's pretty impressive how easily and quickly a handful of "bankrupt" news companies could control trillion-dollar tech companies. What you can write and say, and what you can watch and hear, has been limited in just a few years. And the trend doesn't seem to be stopping. Just 10 years ago, nobody could have predicted this would happen.


Gwern's website is starting to be next-level.



