“Utext: Rich Unicode Documents”, Gwern, 2023-10-08:

An esoteric document proposal: abuse Unicode to create the fanciest possible ‘plain text’ documents.

Utext is a proposed esoteric-document format for typographically-rich documents (‘utexts’) under the constraint that they are pure UTF-8 text files. Utext is a Unicode answer to the typography maximalist question: “what is the most advanced (or at least, interesting) document that can be generated by (ab)using the full range of obscure capabilities provided by contemporary UTF-8? What is ‘plain text’ that is not so plain?”

I outline the inline & block formatting features that Unicode enables (comparable to popular formats like Markdown → HTML), and more advanced features that Utext could target: for better layout and saving text-artist labor, Utext could exploit text modification using large language models (LLMs) and ASCII image generation with neural nets. LLMs could rewrite text to replace words with synonyms or tweak punctuation for better line-justification. ASCII images could be generated from arbitrary image inputs or text prompts.
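As a rough illustration of the inline-formatting idea, the sketch below 'compiles' a Markdown-like source into pure Unicode by mapping ASCII letters and digits onto the Mathematical Bold block. The `**…**` delimiter and the function names `to_bold` and `compile_inline` are assumptions made for this example, not part of any Utext specification.

```python
import re

# Starting code points of the Unicode "Mathematical Bold" ranges.
BOLD_UPPER = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER = 0x1D41A  # MATHEMATICAL BOLD SMALL A
BOLD_DIGIT = 0x1D7CE  # MATHEMATICAL BOLD DIGIT ZERO


def to_bold(text: str) -> str:
    """Map ASCII letters and digits to their bold Unicode look-alikes."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER + ord(ch) - ord("a")))
        elif "0" <= ch <= "9":
            out.append(chr(BOLD_DIGIT + ord(ch) - ord("0")))
        else:
            # Punctuation, spaces, and non-ASCII text pass through unchanged.
            out.append(ch)
    return "".join(out)


def compile_inline(source: str) -> str:
    """Replace each **span** (hypothetical syntax) with bold Unicode text."""
    return re.sub(r"\*\*(.+?)\*\*", lambda m: to_bold(m.group(1)), source)


if __name__ == "__main__":
    print(compile_inline("Plain text that is **not so plain**."))
```

The output is still an ordinary UTF-8 string, so it survives anywhere plain text does; the cost is that screen readers and search tools may not treat the styled characters as ordinary letters, which is one reason to keep the source alongside the compiled text.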

I note that a utext's ‘source’ & ‘compiled’ text should be stored together, which would greatly enhance upgradeability, accessibility, and community-building by letting readers see & re-compile the source in addition to the final ‘compiled’ version. This further allows for interesting line-oriented text formats, which allow live WYSIWYG editing or in-place version-control, or which can stream over the network (opening up applications like simple chat rooms).

But probably the best output format would be a narrow subset of HTML: through judicious use of <pre> tags, a utext becomes hypertext and is usable as a website.
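A minimal sketch of that HTML output, assuming the compiled utext is already a finished block of monospaced text: escape the characters HTML treats specially, then wrap the result in a single <pre> element. The function name `utext_to_html` and the page skeleton are illustrative choices, not a defined Utext output format.

```python
import html


def utext_to_html(compiled: str, title: str = "utext") -> str:
    """Wrap compiled utext in a bare-bones HTML page using one <pre> block."""
    # Escape &, <, and > so the text is shown literally rather than parsed as markup.
    body = html.escape(compiled, quote=False)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body><pre>{body}</pre></body></html>\n"
    )


if __name__ == "__main__":
    print(utext_to_html("a < b\n  indented line", title="demo"))
```

Declaring the charset as UTF-8 matters here, since the whole point of the format is non-ASCII characters; hyperlinks would need an additional step that selectively emits <a> tags inside the <pre> block.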