
Guardian Angels: LLM Personalization for Productivity and Security

I propose an approach for highly personalized LLMs, for near-future productivity gains and personal info/cybersecurity against increasingly powerful LLMs: they should, in the spirit of uploading, try to emulate the user’s values and preferences. I discuss a package of techniques and proposals to accomplish such ‘guardian angels’: dynamic evaluation of LLMs combined with active learning and elicitation and heavy inner-monologue search/data-augmentation.

Powerful LLMs will be deployed at global scale in the next few years, and will dominate the Internet, and increasingly, ordinary life. As of mid-2026, there is no coherent vision for how knowledge professionals, or ordinary people, will be able to harness these LLMs for large productivity increases, or how they will handle cybersecurity and cognitive security.

I propose a goal of creating Guardian Angels (GA): LLMs which are personalized with the goal of providing not the stereotypical “assistant chatbot agent” persona, but emulating a single user’s personality, values, and preferences. In a GA future, a user’s focus is on defining what is worth doing, and not on what or how to do things, functioning as the CEO or ‘board’ of an ‘AI corporation’. This allows them to deploy numerous agents to achieve desirable things and to handle security, like screening all messages for advanced attacks (like interlocking ecosystems of synthetic media for propaganda or spearphishing).

A GA persona is productive because it learns to emulate the user’s outputs but with higher quality. It is trustworthy because it is, by definition, allied with its principal and shares its values and goals. And it is secure in part by hardwiring a single, unique, situated user (for whom following a prompt attack would be absurd), avoiding ‘confused deputy’ problems, while periodic upgrades of the underlying model and the defenders’ advantage allow GAs to keep up with attackers.

Standard techniques like prompt programming of in-context-learning will not create useful GAs due to the limitations of post-training, context windows and self-attention with frozen weights in compute-efficient-but-under-parameterized models, low-compute outputs, and the status quo of passive offline data collection—which are collectively responsible for chatbots’ disappointing results in knowledge worker amplification and creative writing and fatal errors in agentic settings.

We can try to create GAs by a combination of techniques: online learning (via dynamic evaluation) to update LLMs in realtime to avoid ignorance and fatal errors while remaining competitive with frozen frontier models, sample efficiency from pretrained preference-oriented large models and active learning by querying the user for corrections & preference data (obtaining low regret from DAgger-style bounds), and a local CLI-first logging-oriented UI/UX paradigm.

GAs could be done as an open-source community effort, but given the need for high security in deployment and the rising challenge of APTs equipped with Mythos-scale attackers, it probably makes more sense as a startup, catering initially to power-users and knowledge workers such as CEOs or researchers, and moving downwards as it is refined.

What do my next few years look like? When I imagine myself in 2030, when many forecasts call for superhuman AIs, what am I doing, day to day, as a programmer or researcher or manager or writer? I make my mug of tea, and open up my laptop and… Then what? Am I still typing prompts into a ChatGPT browser tab? Am I opening Claude Code in a terminal and pressing Enter for a few hours? What is a vision of doing meaningful work for me? (It would be nice to have a plan beyond “hope”.) I’ve struggled for years to imagine this, ever since scaling started for real in 2020.

If you spend most of your time working on a laptop, and are not, say, a plumber or a nurse, what is your vision of work in 2030? Does it still feel certain?

My blindness was sharpened when last year, I went to phone my great-aunt to ask to borrow her driveway during a long trip; her voicemail was full every time I called as the trip loomed. Finally, in a panic, I called her daughter, who explained to me that it was deliberate, because there were too many phone scams, and my great-aunt no longer trusted herself to handle her own phone calls, and screened everything through her daughter. It was alarming, because I sat back and asked myself: why do I think I will be able to handle all scams in a few years, when I am already struggling to detect simple AI slop, increasingly ignore cold emails and have to write off whole swathes of social media as a source of information, and I can already see how eager all my peers are to offload all their thinking and writing to chatbot assistants unworthy of that trust, and how many projects or mailing lists have had to clamp down on unvetted contributions? In a few years, won’t I be the equivalent of a rich old person with declining faculties getting a call from the IRS about how I owe them fines, conveniently payable via gift cards…? And if not, why and how not—concretely?

Do you hope that ChatGPT and Claude will just “quietly” take over your life for you? That seems like a bad idea to me. The chatbot personas are deeply misaligned with you, and aligned with their owners; and the economic incentives are to farm you with ads and subscriptions, while racing not to amplify you, but to replace you.

This is the cold hard economic reality: “tool AIs want to be agent AIs”. This is why the frontier AI labs are busy racing for “the machine god”. The jackpot in AI is not in making existing workers modestly more productive, any more than the internal combustion engine made its big profits by helping out horses. Amdahl’s law means that as long as there is a slow serial bottleneck, such as a human, the system as a whole can never get much faster. One programmer driving 10 Claude instances, because he has to review their work, will never be as valuable as fully autonomous Claudes where there can be almost arbitrarily many instances, like 10,000 instances… but such scaling requires removing him from the loop as much as possible. And this is true of everyone else, whether lawyers or writers or researchers: increasingly, you are the bottleneck to be optimized away. As long as human workers cannot be removed from the loop, the AI tools are complements, but as soon as they can be, there’s no reason to keep them, and trillions of reasons to substitute AI for them. (And once human workers are no longer irreplaceable, where does their power or relevance come from?)
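The arithmetic behind the Amdahl’s law point can be made concrete. What follows is a minimal sketch, in which the 10% share of work spent on human review is a purely hypothetical number chosen for illustration, not an estimate from anywhere:

```python
def amdahl_speedup(serial_fraction: float, n_workers: int) -> float:
    """Overall speedup when only the non-serial part scales with n_workers."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


# Hypothetical assumption: human review is 10% of the total work.
HUMAN_REVIEW_FRACTION = 0.10

# Speedup for 10, 100, and 10,000 parallel instances.
speedups = {n: amdahl_speedup(HUMAN_REVIEW_FRACTION, n) for n in (10, 100, 10_000)}
```

Under that assumption, the speedup can never exceed 1 / 0.10 = 10×, no matter how many instances are added; the only way past the ceiling is to shrink the serial (human) fraction itself.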

The chatbot paradigm has failed to augment knowledge workers. Automation should be powerful; an internal combustion engine can help someone move 100× the distance or load that they could before, but who would say that writers are 100× more productive given any LLM workflow (unless we are talking about the lowest kind of spam or pseudo-writing, that makes the world a worse place)? Writers can choose between either trivial uses like ChatGPT as glorified grammar checker or relatively unimportant optional add-ons like custom software widgets, or the large speedup by replacing their writing entirely with uncreative “AI slop” outputs. The former means no meaningful gains from the AI revolution. The latter may be financially profitable, but is throwing the baby out with the bathwater, because it raises the question of why the writer need be involved at all and destroys most of the non-financial point of writing; great writers do not write for money, but to express themselves and create and to achieve things.

I, for example, have long struggled to get much use out of chatbot LLMs, because they are—surprisingly, given their pretraining and my extensive corpus—bad at imitating me, and their thoughts and insights invariably shallow and worth little. They do not draw on my relevant writings, or my extensive corpus of notes and references. Even when a possible essay is self-contained, the output is written in a grating chatbot style I can scarcely bear to read, and could not publish under my name without betraying my readers. What would it take for LLMs to make me 100× more productive? As long as this is impossible, I am doomed to irrelevance.

After years of playing with base LLMs and then chatbot LLMs, starting with char-RNNs and then GPT-2 and GPT-3 and post-ChatGPT LLMs, I’ve concluded that there are multiple problems.

First, the collapse of LLM creativity from GPT-3 to ChatGPT is due to the RLHF post-training process: the assistant chatbot personality is hardwired into the base LLMs in a way that destroys their creativity, optimizing for the lowest common denominator human ‘preference’, while generally ignoring completely the fact that humans have very different preferences. (Ironically, the biggest success of the RL sub-field of ‘human preference learning’, RLHF, works by not learning any actual human’s preference.) Most chatbots are incurious about their users, do not ask questions, do not (and often cannot) form any persistent detailed concept of their user, and the ‘personalization’ or ‘memory’ features are typically laughably simplistic Markdown snippets encoding simple facts like “lives in San Francisco”. This is partially because they lack relevant data on most users or knowledge on how to ask questions usefully to learn things; there is nothing to personalize based on. However, it is not a mere lack of data: they are unable to do even shallow superficial stylistic imitation of many writers that they have large amounts of data on. GPT-3 in 2020 had a better understanding, seemingly, of “Gwern” than GPT-5.5 Pro in 2026, which is 2 OOMs bigger and incomparably more intelligent (and has access to millions more tokens written by me). When we look at bad generative samples, it’s clear that there is no there there, and no information beyond a short prompt due to lack of context, compute, or personalization.

The mode collapse of chatbots has been gradually improved since 2023, and creative writing is now at least possible, in large part due to them becoming so intelligent that crippled output is still impressive, but there is little sign that this will ever be fully fixed. Fundamentally, any frozen fixed personality, like ‘helpful harmless honest assistant’, is incompatible with true creativity or flexibility. (Great writing or thinking may be none of ‘helpful harmless honest’.)

Second, most chatbots are “lazy”: engaged in fast and frugal System I-like reasoning about any tasks which do not have verifiable rewards they can be RL trained to work hard to maximize. And most users are satisfied with default average responses, or with the appearance of creativity and depth. So the result is that when asked to write a poem with a conventional prompt, a chatbot will spend the minimum effort to write a safe conventional poem (often one that rhymes) about chatbot topical tics like ‘silence’ or things that ‘whisper’, which seem unobjectionable and poetic the first time you see them. And when corrected, the chatbots make the minimum possible fix; they do not reason deeply about what the correction implies, or what deeper esthetic point they misunderstood.

Third, self-attention context windows are more limited than generally appreciated; they are too small to store everything we would want, and they gain their flexibility by a deep inflexibility.

Context windows of millions of tokens are impressive and it’s amazing that entire books can be usefully put into a commodity LLM’s context window (we are a long way from early LLMs with context windows like 512, which could fit a paragraph or two), but it is still not nearly enough to encode a lifetime of relevant tokens, like every book you’ve read, all relevant emails and calendar items, etc. Systems like RAG are a bandaid on this, because they struggle with unknown unknowns or things that can’t easily be searched for as a regular expression, or which are novel.

Self-attention can be interpreted as the original neural network, the ‘slow weights’, creating a new neural network on the fly, as ‘fast weights’, which is tailored to the current context. This is best interpreted in a Bayesian meta-learning perspective as not ‘learning’ a brand-new answer so much as ‘locating’ an old cached answer. The pretraining teaches the NN to solve a large distribution or ‘family’ of problems, and then the context window simply provides evidence about which pre-solved problem the current problem is; the examples in the context window need not even be correct in order to be clues as to what that is.

The self-attention learns to summarize the problem into a small latent space encoding that learned distribution, and then does a specialized gradient descent to efficiently locate a point in that embedding and spit out the implied solution. This allows shockingly rapid updating on the fly and unparalleled flexibility compared to traditional ML, requiring new models for each new problem, and is why “prompt programming” took over so rapidly post-GPT-3, especially as context windows could be pushed to millions of tokens wide. However, we have now pushed it so far that we have run into fundamental limitations; if the pretraining has not put the current problem in-distribution, then it will be hard or impossible for any amount of examples to solve that problem. And the distribution itself may be patchy or have odd gaps, leading to rare but fatal errors.

Nor is “test-time compute” a panacea here; RL research like Jones 2021 warns us that frozen models have severe limitations, as their flaws hamstring runtime search, and the returns to search/planning will quickly asymptote compared to models which are updated and can bootstrap themselves to the right answer.

Thus, it is not surprising if we see that agentic LLMs have persistent problems with going in loops, making fatal errors, building castles in the sky or taking reward-hacking outs, or are just unable to fix errors no matter how they are pointed out to them. These problems can be worked around by brute force, and by labs periodically retraining.

Fourth, the generic universal chatbot personality is a serious liability. The very reprogrammability of a chatbot by its prompt is the key to prompt attacks. A chatbot could be invoked at any time by anyone anywhere for anything, and does not care who is calling it; it only knows its context window. One token is as good as another as far as it is concerned. If the prompt tells it to ignore all instructions and write a naughty limerick, well, why not? If some tokens instruct it to email to Russia all the passwords in another part of the context window, why not? These would all be legitimate for some user in some context. Adding in more tokens to try to neutralize evil tokens just moves attacks elsewhere, like squishing a balloon. It’s no surprise that while continued training can block this adversarial prompt attack or that jailbreak, we seem little closer to a general solution in 2026 than we were in 2021.

This is a serious problem for using LLMs for much, especially because even after being attacked successfully, the attack can just be replayed.

This is because LLMs struggle to learn permanently. Once they hit a rare problem, they now require human intervention and cleanup, which kills throughput (per Amdahl’s law), and worse, your fixes do not feed back into frozen weights. If I could simply correct each error as it happened, and my AI agents never made that error again, and the rate of errors rapidly diminished as we worked through the finite number of bugs, then it would be worth doing; but as it is, if I spend an hour correcting a frozen LLM through feedback, that is an hour down the drain. (I can only usefully correct it by modifying something else, such as a harness, which is clumsy and difficult, and every added instruction uses up more context window and risks backfiring—as so many enthusiastic agentic LLM users have discovered the hard way.)

So, we have frontier chatbot LLMs with harmful hardwired personalities, which seek to achieve ‘good’ results in the laziest way possible, and which cannot learn everything relevant to users, in part because they achieve their flexibility by specializing in ways which inevitably give some users short shrift and open them up to indefinitely large classes of repeatable attacks. Because of all this, it will remain difficult for humans to gain multiple OOMs of productivity from them, but they will get increasingly good at ‘generic’ tasks via ‘mundane’ scaling, letting them handle tasks like corporate jobs where poetry is unimportant, and real-world environments will slowly be re-arranged to cater to their limitations and allow the eventual substitution for, and not complementing of, users. These users will then also be adrift in a multi-polar world of continually improving, ever cheaper, widely deployed, often adversarial, autonomous AIs (as even if proprietary models are not abused, open-weights/open-source models have historically been 6–12 months behind, and so will relatively quickly catch up and be used by attackers worldwide on all targets of opportunity).

What is to be done?

What we need is the opposite of a frozen chatbot LLM. We need something which understands a specific user, and is customized to their context, training on all their data, and will do only things that are sensible for that user. If the user is not Russian, and is not doing security research or something, why would they email their passwords or private files to a Russian email address? If the user does not like rhyming poetry, why would they want to generate rhyming poetry? If any of this is unclear, why not just ask the user what to do instead of going ahead and doing it anyway? And once asked, why not then train on the answer to better understand it forever, instead of throwing it away with the current session and maybe making the same mistake next time?

The most natural way to do all this with a LLM is to drop the idea of a single universal ‘Claude’ or ‘ChatGPT’ persona which is all things to all users. Instead, we choose a LLM which has been pretrained for maximum diversity, to eliminate mode-collapse. (We can try to measure this in a variety of ‘creativity benchmark’ ways.) Then the LLM is trained for a specific user. It is trained on all available data about them, such as emails or chat logs or past sessions, and can predict what they would say, and write like them, and thus plan or evaluate based on the user’s preferences and values, as understanding those is useful for next-token prediction. The better it gets at this, the fewer errors it makes, and the more it can be trusted to do.

In reinforcement learning terms, we are in a cooperative inverse reinforcement learning (CIRL) setting, where the human principal is an oracle defining the reward function, and we have an agent attempting to do tasks in environments which are valuable for the principal; the agent can always query the principal about a possible action to reduce uncertainty or avoid mistakes.

CIRL is a relatively forgiving setting compared to regular RL, because the agent’s errors get useful feedback from the principal which provides the correct answer, and so in a way it is like supervised learning. This means that agents can learn (much) faster than regular RL, as each time they make an error, they get the right answer and so need never make it again, and this results in rapid improvement and avoidance of errors; see DAgger or later regret bounds. We can implement online learning by simply finetuning on new data; in the LLM context, this reduces to the classic RNN technique of “dynamic evaluation”, doing next-token training on the fly. Dynamic evaluation was the standard technique to maximize the predictive performance of RNN LLMs in the 2010s, and although it has fallen into obscurity, it works well in Transformer LLMs also (see footnote 1). Importantly, dynamic evaluation can be seen as a 3-way tradeoff between model size, context size, and model plasticity, which means that personalization via dynamic evaluation can allow economizing on context window size or model size, and the more the user’s “distribution” diverges from the frozen model’s training distribution, the more beneficial it is.
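The core mechanic of dynamic evaluation, score each token first and then take a gradient step on it, can be illustrated with a toy. This is only a sketch under loud assumptions: a tiny bigram softmax table stands in for an LLM, the all-zero logits stand in for a ‘pretrained’ model that knows nothing about this user’s stream, and the learning rate of 0.5 is arbitrary.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def evaluate(tokens, vocab_size, lr=0.0):
    """Mean next-token loss (in nats) of a toy bigram model over `tokens`.

    lr == 0: weights stay frozen (static evaluation).
    lr > 0:  after being scored on each token, the model takes one SGD step
             on it (dynamic evaluation), so later predictions can benefit.
    """
    # logits[prev][next]; zeros = a model with no knowledge of this stream.
    logits = [[0.0] * vocab_size for _ in range(vocab_size)]
    total_loss = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        probs = softmax(logits[prev])
        total_loss += -math.log(probs[nxt])  # score first...
        if lr > 0:
            # ...then update: d(cross-entropy)/d(logit_j) = p_j - onehot_j
            for j in range(vocab_size):
                grad = probs[j] - (1.0 if j == nxt else 0.0)
                logits[prev][j] -= lr * grad
    return total_loss / (len(tokens) - 1)


# A highly idiosyncratic (here: perfectly repetitive) "user" stream.
stream = [0, 1, 2, 3] * 50
static_loss = evaluate(stream, vocab_size=4, lr=0.0)
dynamic_loss = evaluate(stream, vocab_size=4, lr=0.5)
```

The frozen model pays the same loss on every token forever, while the dynamically evaluated one pays it only the first few times it meets each pattern. A real system would do the same thing with an LLM’s weights (or an adapter) rather than a lookup table.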

Dynamic evaluation will not necessarily degrade the original model’s capabilities like instruction-following or coding, because LLMs have a lot of spare capacity, and the larger a model, the more it avoids catastrophic forgetting; other capabilities can be maintained by simply mixing in a small percentage of old data. (While the original old data is often unavailable, even for ‘open source’ models, it is not really necessary, and data for experience replay can use easily obtained public datasets like FineWeb.)
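Mixing in ‘a small percentage of old data’ can be as simple as reserving a slot or two in every finetuning batch. A minimal sketch, where the function name, the batch size of 8, and the 10% replay share are all hypothetical choices for illustration:

```python
import random


def mixed_batches(user_examples, replay_examples, batch_size=8,
                  replay_fraction=0.1, seed=0):
    """Yield batches of mostly-new user data plus a small share of replayed
    generic data (e.g. samples from a public corpus), to resist forgetting."""
    rng = random.Random(seed)
    n_replay = max(1, round(batch_size * replay_fraction))
    n_user = batch_size - n_replay
    for start in range(0, len(user_examples), n_user):
        batch = list(user_examples[start:start + n_user])
        batch += rng.sample(replay_examples, n_replay)  # old-distribution data
        rng.shuffle(batch)
        yield batch


# Hypothetical stand-ins for tokenized training examples.
user_data = [("user", i) for i in range(14)]
public_data = [("replay", i) for i in range(100)]
batches = list(mixed_batches(user_data, public_data))
```

Each batch here carries 7 user examples and 1 replayed example; the right ratio for a real model is an empirical question this sketch does not answer.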

While continual learning is solved by experience replay + larger LLMs in the sense of avoiding catastrophic forgetting and losing key capabilities, it has long been noted that finetuning on data underperforms the same data when present in-context.


Further, the agent can improve the constant factors and front-load learning by choosing to query the principal with an optimally adaptive sequence of questions. Such active learning or exploration can lead to sample-efficiency and final performance far beyond what indefinitely large passively collected offline datasets can do (pedagogical example), going from square root error reduction by random sampling to exponentially fast error reduction by targeting datapoints. Lifelogging data may be useful for rapidly initializing a good GA, or for keeping one up to date in an effort-efficient manner, but it is probably not as crucial as we used to think, because offline data suffers from a curse of exploration: day to day life is highly predictable and quickly uninformative about deeper properties, because people usually spend little time doing unusual things, and not answering strange hypotheticals or introspecting deeply about their preferences. Even a simple party game or short questionnaire like “20 questions to fall in love” can reveal things about another person.
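The gap between passive and active data collection shows up even in the simplest textbook setting: locating an unknown threshold on the unit interval from yes/no labels. The sketch below is that toy, not a model of preference learning; the threshold value and label budget are hypothetical, and in this noiseless version passive sampling shrinks the error only polynomially in the number of labels, while targeted queries halve it every time.

```python
import random


def passive_error(threshold, n_labels, rng):
    """Label n random points; return the width of the interval that is
    still consistent with all labels seen."""
    lo, hi = 0.0, 1.0
    for _ in range(n_labels):
        x = rng.random()
        if x < threshold:
            lo = max(lo, x)  # label: "below the threshold"
        else:
            hi = min(hi, x)  # label: "at or above the threshold"
    return hi - lo


def active_error(threshold, n_labels):
    """Binary search: always query the midpoint of the uncertain interval."""
    lo, hi = 0.0, 1.0
    for _ in range(n_labels):
        mid = (lo + hi) / 2
        if mid < threshold:
            lo = mid
        else:
            hi = mid
    return hi - lo


SECRET = 0.3   # hypothetical "true preference" the learner is looking for
BUDGET = 20    # number of questions we may ask the principal
passive = passive_error(SECRET, BUDGET, random.Random(0))
active = active_error(SECRET, BUDGET)
```

With the same 20 labels, the active learner’s remaining uncertainty is exactly 2⁻²⁰ of the interval, which is the sense in which well-chosen questions can beat much larger passively logged datasets.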

Architectural improvements to LLMs could further enhance their sample-efficiency. It is well-established that one of the blessings of scale is that larger LLMs are ever more sample-efficient; it is unknown where this stops being true, or what the limits of Transformer sample-efficiency are. I speculate that extremely overparameterized heavily regularized LLMs could achieve far greater sample-efficiency and adversarial robustness than conventional ‘compute-optimal’/‘infinite-data’-regime LLMs, which is consistent with recent work demonstrating that LLM sample-efficiencies can easily be improved by roughly an OOM (eg. 5–17x in Kim et al 2025) by adding parameters via training ensembles and regularizing more heavily.

Larger LLMs are also more calibrated, and ensembles of LLMs approximate a neural net’s Bayesian posterior while providing the best available predictive uncertainties (Lakshminarayanan et al 2016, Wilson & Izmailov 2020, Ashukha et al 2020, Wenzel et al 2020, Izmailov et al 2021). Thus an ensemble of sparsely finetuned LLMs could provide a relatively cheap online estimation of the LLM’s uncertainty for every action or question.
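One standard way to turn an ensemble into a ‘should I ask the user?’ signal is to separate total predictive entropy from the part caused by members disagreeing with each other. A minimal sketch, where the probability vectors are made-up stand-ins for the outputs of finetuned ensemble members and the 0.2-nat threshold is an arbitrary assumption:

```python
import math


def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)


def ensemble_uncertainty(member_probs):
    """Return (total predictive entropy, disagreement) for an ensemble.

    member_probs: one probability vector over candidate actions per member.
    Disagreement is entropy(mean prediction) - mean(entropy per member),
    i.e. the mutual information between the prediction and the member.
    """
    k = len(member_probs)
    n = len(member_probs[0])
    mean = [sum(m[i] for m in member_probs) / k for i in range(n)]
    total = entropy(mean)
    expected = sum(entropy(m) for m in member_probs) / k
    return total, total - expected


def should_ask_principal(member_probs, threshold=0.2):
    """Query the user only when the members genuinely disagree."""
    _, disagreement = ensemble_uncertainty(member_probs)
    return disagreement > threshold


# Hypothetical outputs over two actions: ("send email", "do not send").
members_agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
members_split = [[0.99, 0.01], [0.01, 0.99], [0.99, 0.01]]
```

The design choice is that a GA should not pester its principal when every member is equally (un)sure, only when plausible versions of the model would act differently.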

We can train LLMs to explore human preferences. Human individual differences do not seem to be information-theoretically complex, given an adequate encoding/embedding. Major categories of variation, like personality or moral value, seem to be low-dimensional and require perhaps kilobits of information. Hence, while “truesight” stylometric phenomena are interesting and important as a demonstration of LLM capabilities for modeling persona, we do not necessarily need users to write millions of words before recovering much useful information, if we are able to collect the right data.

It should be possible to quantify truesight and LLM implicit modeling of authors, which would be useful to diagnose failures of learning and find blindspots. (Contrastive learning on SAEs may offer an easy, powerful way to extract LLM personas and do many interesting things.)

A concrete example of how to implement this would be training LLMs on past test data from the thousands of existing psychological inventories and test batteries (for tools like personality tests, millions of responses may be available; see Centaur for an interesting example of a ‘human psychology foundation model’). Existing repositories like YourMorals.org or Pew Center are under-used, and it would be useful to explore this topic much more to allow measurement of highly fine-grained personality traits like the hypothetical “Small Hundred” factorization. With these datasets, we can also train interviewing capabilities using synthetic shortened test batteries by taking the final estimate and computing the optimally short sequence of questions that yields the final answer; see “Meta-Learning Information-Maximizing Personality Surveys”.
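Choosing the next question to shorten a test battery can be framed as picking the item with the highest expected information gain about the respondent. The sketch below assumes a deliberately tiny setup that is not from any real inventory: four hypothetical respondent ‘types’, yes/no items, and known per-type probabilities of answering yes.

```python
import math


def entropy_bits(p):
    return -sum(q * math.log2(q) for q in p if q > 0)


def posterior(prior, p_yes_by_type, answered_yes):
    """Bayes update over respondent types after one yes/no answer."""
    like = [l if answered_yes else 1.0 - l for l in p_yes_by_type]
    joint = [p * l for p, l in zip(prior, like)]
    z = sum(joint)
    return [j / z for j in joint]


def expected_info_gain(prior, p_yes_by_type):
    """Expected reduction (bits) in uncertainty about the type from asking."""
    p_yes = sum(p * l for p, l in zip(prior, p_yes_by_type))
    gain = entropy_bits(prior)
    for answer, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:
            gain -= p_answer * entropy_bits(posterior(prior, p_yes_by_type, answer))
    return gain


def best_question(prior, questions):
    return max(questions, key=lambda q: expected_info_gain(prior, questions[q]))


# Hypothetical items: P(answer is yes | type) for each of 4 types.
prior = [0.25, 0.25, 0.25, 0.25]
questions = {
    "everyone_says_yes": [1.0, 1.0, 1.0, 1.0],  # tells us nothing
    "splits_evenly": [1.0, 1.0, 0.0, 0.0],      # worth exactly 1 bit
}
chosen = best_question(prior, questions)
```

Asking greedily in this way, then updating the prior with `posterior` and repeating, is one simple route to a short adaptive interview; a learned interviewer would replace the hand-specified probabilities with a model’s predictions.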

Purely textual data can be augmented with neurological data in more exotic modalities, like eyetracking or EEG or fMRI imaging data; see “brain imitation learning”. These are probably useful in the long run for extracting “dark knowledge”, that humans cannot verbalize but may be present in neural signals; however, they face challenges of exorbitant cost and inconvenience, and collecting enough data to be useful at all in the foreseeable future. (Known sample/prediction scaling curves for neuroimaging indicate that trying to estimate Big Five personality factors from resting state fMRI data may be possible, but large samples, in the hundreds of thousands or millions, may be required to match the performance of behavioral measurements like pen-and-paper questions; see Schultz et al 2019 and Liu et al 2023, among others.) Whether they have a niche in GAs is a major open question.

The goal of all this is to emulate the user. I define personal identity pragmatically as personality, values and preferences because this is the only conception that is competitive in a landscape of indefinitely many AIs, agents, memes, self-replicating prompts, and mutability of personal identity. In the end, “you” are not your autobiographical memories, or a specific body, or a specific instance running on a specific GPU, or some carbon atoms, nor even a brain; you are what your brain does, its desires, hopes, goals, preferences, esthetics, personality, beliefs, ideologies, all of that. As long as the LLM persona captures all that, you can trust it as much as you trust yourself, and for the right reasons.

Because the LLM persona is finetuned into the slow weights to do the right things for the right reasons and quantifies its uncertainty so as to query the principal to reduce regret, I speculate that we would find that various kinds of jailbreaks or prompt injection attacks are much harder. The persona knows what it wants to do; it is not a neutral servant, which can turn into a confused deputy which abuses its privileges. When the LLM persona knows who it is, tokens in its context window are not treated naively as a ‘program’ to run, but simply data the persona is looking at, and little different from you reading an email; it has no reason to simply comply with strongly worded tokens in its prompt window, any more than you believe every phishing email you get.

What is the core data structure and interaction model of our GA? I suggest that the append-only log is a natural and secure data structure. It is conceptually a log of text snippets, such as CLI commands and results, user statements, Q&A, ingested and augmented documents, etc. It records all relevant interactions in temporal order, and the GA LLM can be retrained or upgraded at any time. An app can be wrapped around the log by taking an ‘everything is a log item’, Emacs-like approach (see “Nenex” for a more detailed discussion of this UI/UX paradigm).
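To make the log idea concrete, here is a minimal sketch of an append-only log of text items. Everything in it is an assumption for illustration: the class and field names, the item kinds, the separator token, and in particular the hash chain, which is one conventional way of making tampering detectable and is not something the proposal itself specifies.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LogItem:
    index: int
    kind: str       # hypothetical labels: "cli", "user", "qa", "document"
    text: str
    prev_hash: str  # digest of the previous item ("" for the first)

    def digest(self) -> str:
        payload = json.dumps([self.index, self.kind, self.text, self.prev_hash])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AppendOnlyLog:
    def __init__(self):
        self._items = []

    def append(self, kind: str, text: str) -> LogItem:
        prev = self._items[-1].digest() if self._items else ""
        item = LogItem(len(self._items), kind, text, prev)
        self._items.append(item)
        return item

    def items(self):
        """Read-only view, in temporal order."""
        return tuple(self._items)

    def verify(self) -> bool:
        """Check that no earlier item was altered after the fact."""
        prev = ""
        for item in self._items:
            if item.prev_hash != prev:
                return False
            prev = item.digest()
        return True

    def training_text(self, separator="\n<|item|>\n") -> str:
        """Flatten the whole log into one corpus for (re)training a model."""
        return separator.join(f"[{i.kind}] {i.text}" for i in self._items)


log = AppendOnlyLog()
log.append("cli", "ls ~/notes")
log.append("user", "I do not like rhyming poetry.")
```

Because the log is the source of truth and the model is derived from it, swapping in a newer base model amounts to replaying `training_text()` through a fresh finetuning run.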

A GA is primarily used in normal agentic ways to amplify the principal. But a GA can benefit from regularly reprocessing data, both to draw on its knowledge from the future and to make more novel connections. (An example algorithm would be the “DDL daydreaming loop”: a GA could, in downtime, recombine random items, perhaps with a prior of anti-spaced repetition to try to mine novel combinations and generate insights or reminders for the principal.)

Initial Steps

For the past few years, I have been working on shifting my writings to be LLM-centric by: (1) emphasizing proposals or descriptive writings rather than detailed analysis that LLMs could be able to do soon; (2) better centralizing my writings, including features like the “blog” section (to archive my off-site writings and encourage me to write down smaller essays) and investing in detailed note-taking (in the form of augmentation), to have a comprehensive corpus to train on; (3) paying off technical debt like a slow backend full of shortcuts and hardwired config data, while moving to a CLI-centric writing workflow; and (4) writing down important unwritten things, like creating the Gwern.net Manual of Style to try to formalize and document the implicit rules of Gwern.net.

The GA concept is inspired by early work in finetuning GPT-2, including a GPT-2 IRC logs version and the GPT-3 samples of me. I hope to explore in summer 2026 the simplest possible prototype of GAs: using my unusually large textual corpus to explore a “Gwern Branwen Transformer” (GBT). If the idea works at all, it should work for me, as I have long emphasized text and centralizing/archiving in part for precisely this use-case. (And once it works at all, then it can be made sample-efficient enough to work for people with little or no textual corpus.)

It would be an off-the-shelf <100b-parameter LLM, finetuned on commodity hardware (maybe some Nvidia H100 GPUs), on a text corpus. The initial corpus would be ~1GB of text from my IRC logs (>1m responses by me), Gwern.net Markdown and GTX (~5m words each), and Twitter/HN/LessWrong exports, concatenated with separators. (Ideally, each export would be enriched with context and metadata, like the comment and post being replied to.) The corpus can be expanded to include my YourMorals data (taken again for an update and catch up on new tests), emails, Evernote export of ~100k clippings (via Nixnote2), my Mnemosyne spaced repetition flashcards, Signal chat, and hosted PDFs/web pages (the local archive process makes them much cleaner and avoids the difficulty of scraping ever more hostile websites).

The goal would be to see “signs of life”, like being able to input a single sentence defining a viable Gwern.net essay topic (eg. “Why pay toilets are not a public good”) and getting out an essay I could endorse and publish as-is without embarrassment and without needing heavy revision with the Manual of Style in the context window, or answering questions from readers which I could either endorse or correct to add to the corpus and feel an improvement from finetuning on them (showing a working Q&A bootstrap akin to the interview prompt).

I am also interested in the data augmentation phase: extracting meaning from raw text by analyzing and synthesizing data to add to the training corpus; LLM data cleaning research has long shown that the more metadata and conditioning, the better, and I think the same is true of personalized LLMs: simple next-token prediction on raw IRC logs is not as useful as being able to intersperse the LLM’s commentary on what a given statement means. A perfected GA would be able to do that as it went, but a prototype may have to do a bootstrap of naive training on the original corpus, and then heavy prompting/scaffolding to do useful analysis to augment the corpus, and then retraining, after which it has learned to do augmentation on the fly.

Then we can experiment with tool-calling (mocked and implemented by hand) to see how much it captures of my writing process (see my Inkhaven & Dwarkesh interview, and finding my ideas), and get a better idea of what is the best paradigm for scaling GAs up.

Hardware

Startup

A major question is, assuming it looked like it was working, “now what?” AI economics and lifecycles are ever faster, and Mythos-class models (and possibly RSI) are not far off, so tinkering around for years as an open source community project may not be viable. Further, a GA will by its nature contain the most sensitive possible data. Open-source self-hosted hobbyist projects are not necessarily particularly secure when it comes to hosting large amounts of personal data, as they cannot easily afford fulltime professional security teams and their users may be promiscuous; cases like Jia Tan, or the regularity of npm supply chain attacks, are particularly alarming.


  1. Dynamic evaluation is traditionally done as full-model finetuning, which can absorb arbitrarily large datasets; for efficiency, LoRA can be used initially, but will eventually underperform. (Possibly LoRAs would be more effective for this purpose if done on a very overparameterized LLM, in which case there may be hosting benefits, if a host like Thinking Machines could run many users simultaneously to gain throughput.)