I don't think people realize what a big deal it is that Stanford retrained a LLaMA model into an instruction-following form by **cheaply** fine-tuning it on inputs and outputs **from text-davinci-003**.

It means: If you allow any sufficiently wide-ranging access to your AI model, even by paid API, you're giving away your business crown jewels to competitors that can then nearly-clone your model without all the hard work you did to build up your own fine-tuning dataset. If you successfully enforce a restriction against commercializing an imitation trained on your I/O - a legal prospect that's never been tested, at this point - that means the competing checkpoints go up on bittorrent.

I'm not sure I can convey how much this is a brand new idiom of AI as a technology. Let's put it this way: If you put a lot of work into tweaking the mask of the shoggoth, but then expose your masked shoggoth's API - or possibly just let anyone build up a big-enough database of Qs and As from your shoggoth - then anybody who's brute-forced a *core* *unmasked* shoggoth can gesture to *your* shoggoth and say to *their* shoggoth "look like that one", and poof, you no longer have a competitive moat.

It's like the thing where if you let an unscrupulous potential competitor get a glimpse of your factory floor, they'll suddenly start producing a similar good - except that they just need a glimpse of the *inputs and outputs* of your factory. Because the kind of good you're producing is a kind of pseudointelligent gloop that gets sculpted; and it costs money and a simple process to produce the gloop, and separately more money and a complicated process to sculpt the gloop; but the raw gloop has enough pseudointelligence that it can stare at other gloop and imitate it.

In other words: The AI companies that make profits will be the ones that either have a competitive moat not based on the capabilities of their model, OR don't expose the underlying inputs and outputs of their model to customers, OR can successfully sue any competitor that engages in shoggoth mask cloning.

Mar 14, 2023 · 9:45 AM UTC
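
To make the "database of Qs and As" concrete: the glimpse of the factory's inputs and outputs amounts to a script that sends prompts to the exposed API and logs the completions. A minimal sketch, assuming the pre-1.0 `openai` Python client and a hypothetical `seed_instructions.txt` list of prompts (both the file name and the prompt set are placeholders, not the Alpaca pipeline itself):

```python
import json

import openai  # assumes the pre-1.0 openai client interface

openai.api_key = "YOUR_API_KEY"  # placeholder

# Hypothetical seed prompts, one instruction per line.
with open("seed_instructions.txt") as f:
    instructions = [line.strip() for line in f if line.strip()]

with open("imitation_dataset.jsonl", "w") as out:
    for instruction in instructions:
        # Only the generated *text* is recorded; no logits, weights, or
        # other internals of the hosted model are needed.
        response = openai.Completion.create(
            model="text-davinci-003",
            prompt=instruction,
            max_tokens=256,
            temperature=0.7,
        )
        pair = {"instruction": instruction,
                "output": response.choices[0].text.strip()}
        out.write(json.dumps(pair) + "\n")
```

Alpaca's actual pipeline bootstrapped its instructions via self-instruct rather than a fixed seed file, but the result is the same shape: a pile of (instruction, output) pairs produced by someone else's fine-tuned model.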

...I apologize to the commenters stranded by my edits. I didn't realize/remember that edits worked like that.
Replying to @ESYudkowsky
Interesting. So it might be uneconomical to be the ones investing in the biggest, costliest, cutting edge models? Almost all surplus is going to consumers of the models in that world - lots of cheap AI use cases everywhere, but no big successful AI backbone companies?
That's the obvious implication! Notice of filtered info: I would not have shared a thought about AI business models that pointed in the opposite direction.
By apparent demand, a quote-tweetable repeat of the conclusion:
So: The profitable AI companies will either have a competitive moat not based on the capabilities of their model, OR not expose the underlying inputs and outputs of their model to customers, OR be able to successfully sue any competitor that engages in shoggoth mask cloning.
Dear silly Internet saying "but knowledge distillation is already known": the key idea here is that the fine-tuning above the base model is comparatively much *easier* and *cheaper* to extract and re-imbue, if you have a new base comparable to the earlier base model.

(Spelling out the further implications, since they apparently missed the foundational point: this fine-tuning is often based on a proprietary dataset, where the base model rests on something more like Common Crawl. The base model is hugely expensive to train, but straightforward to train off that common data, and Facebook just open-sourced one. The fine-tuning is computationally cheap, but rests on proprietary data that's expensive to produce, so companies were counting it as part of their competitive moat. What we've learned is that this incremental part of the competitive moat can be much more cheaply stolen than you could steal a whole foundation model by distilling its inputs and outputs.)

((Also, this is just not how knowledge distillation works to turn big models into small models - that takes the full probability distribution on the output logits, not just text inputs and generative text outputs. In other words, everybody who said "but isn't this just knowledge distillation?" literally didn't read the Wikipedia article on it and apparently has no idea what "knowledge distillation" does.))
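
For anyone who wants that distinction in code rather than prose: classical knowledge distillation matches the student against the teacher's full output distribution, while the Alpaca-style move is ordinary supervised fine-tuning on whatever text the teacher happened to emit. A minimal PyTorch sketch of the two losses (names and shapes are illustrative, not from any particular codebase):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Classical knowledge distillation: requires the teacher's full
    probability distribution over the vocabulary at every position."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence between the teacher's and student's distributions.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)

def imitation_loss(student_logits, teacher_sampled_token_ids):
    """Alpaca-style imitation: only needs the *text* the teacher emitted,
    i.e. one sampled token id per position, exactly like ordinary
    supervised fine-tuning on a text dataset."""
    return F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        teacher_sampled_token_ids.view(-1),
    )
```

The second function is the whole point above: an API that returns text is already enough to drive it, and you never need the probability distributions that true distillation requires.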
*Further* apparently-necessary clarification: Alpaca did not train a model from scratch using only 50,000 GPT-3 input-output pairs and $600. Alpaca *further fine-tuned* an *existing LLaMA model* so that it would have the extra behavior of 'instruction-following'.
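
To be equally concrete about what the money did buy: it paid for a short supervised fine-tuning run of an already-trained base model on those instruction/output pairs. A minimal sketch of that step, assuming the Hugging Face `transformers` and `datasets` libraries and the hypothetical `imitation_dataset.jsonl` from the sketch above; this illustrates the shape of the step, and is not the Alpaca training code itself:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "huggyllama/llama-7b"  # illustrative identifier for an existing base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

def tokenize(example):
    # Concatenate the instruction and the teacher model's output into one
    # training string; the base model learns to respond the way the teacher did.
    text = (f"### Instruction:\n{example['instruction']}\n\n"
            f"### Response:\n{example['output']}")
    return tokenizer(text, truncation=True, max_length=512)

dataset = load_dataset("json", data_files="imitation_dataset.jsonl")["train"]
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama-imitation",
                           per_device_train_batch_size=4,
                           num_train_epochs=3),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the cheap step: the expensive pretraining already happened
```

Everything expensive, meaning the pretraining run that produced the base weights, happened before a script like this starts; the fine-tuning itself is the fast, cheap part.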
You used to talk about safety for humanity, two centuries ago
And I'd never have shared that particular thought if I didn't think it helped.
Replying to @ESYudkowsky
Is it all that big of a surprise? The model was mostly trained by mimicking the internet in the first place, so it's not like it's all that different from the initial training.
And also people in the past have had success with training models to mimic already-trained models.