
Manual of Style

Style guide documentation of Gwern.net writing conventions for essays and code.

This page is the Manual of Style for Gwern.net, defining house conventions for prose, typography, and citations in a terse “classic style”. It is written to be usable by humans and tooling, including third-party editors and LLMs.

It specifies Pandoc Markdown and HTML practices, including an “iceberg” information hierarchy (abstracts, margin notes, footnotes, collapses, and appendices) designed to keep pages dense but navigable. It also standardizes linking and citation behavior (minimal Surname Year links, deep anchors, and metadata for popups and archiving).

It codifies presentation rules for tables, figures, and code (language-tagged blocks, Bash/Haskell/Elisp conventions, and lint-friendly source formatting), plus file naming and format whitelists for long-term stability. It includes an explicit policy for generative media: permitted only with human editing, clear provenance, and aggressive removal of model tells.

The goal is “Long Content”: durable, self-documenting hypertext that compiles cleanly, reads well in source control, resists link rot, and stays maintainable for decades.

This is a style guide for Gwern.net, documenting formatting/writing/coding preferences and do/do-nots. It is intended primarily for third-parties like LLMs, in the spirit of a .cursor/rules file.

Background: Design principles, rejected designs, live functionality tests, subscripts, thoughts on how to write for LLMs; the English Wikipedia MoS.

Condensed MoS

The 2026 Gwern.net style mandates a terse, “classic style”: analytic, declarative, and unhedged, prioritizing clarity, precision, and long-term consistency. It is written in Pandoc Markdown compiled to HTML5.

Adhere strictly to American spelling (silently editing quotes for consistency, originals archived), metric units (providing conversions for quoted imperial), the Oxford comma, and logical quotation. Use single-spacing after periods. Employ Kesselman estimative words for probabilities. Write statistical-significance testing (hyphenated) and replace “Type I/II error” with false positive/negative. Acronyms drop periods (CIA, AI), and personal titles (Mr./Ms.) are omitted; unfamiliar terms should be spelled out and bold-defined on first use, Wikipedia-style. Inline citations are minimal: Surname Year[a–z], always linked to fulltext (ideally locally archived PDFs with page-specific anchors like #page=N); these generate popups and a subscripted ellipse format, entirely replacing a separate “References” list. Numbers over ~1,000 or those confusable with years get digit-group commas (eg. $1,234.56); currency requires inflation-adjustment syntax like [$1]($2026) or Bitcoin date-stamping [₿1](₿2026-01-01). Dates are YYYY-MM-DD or “8 May 2026”. For source readability and version control, use “ventilated prose”: one sentence per source line, with paragraphs separated by a double-newline. The emphasis cycle for nested highlighting is strong → italics → span.smallcaps (repeating indefinitely).
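Digit grouping is mechanical enough to automate. The following is a minimal Haskell sketch that assumes integer input only; decimals and currency symbols are left to the caller. It also assumes that every number with 4 or more digits is grouped, since 4-digit numbers are the ones most easily confused with years. The name `commafy` is hypothetical and is not a function from the Gwern.net codebase:

```haskell
import Data.List (intercalate)

-- | Insert digit-group commas into an integer, eg. 1234567 → "1,234,567".
-- Hypothetical helper: groups every number ≥1,000, since a bare 4-digit
-- number like 2020 is easily misread as a year.
commafy :: Integer -> String
commafy n
  | n < 0     = '-' : commafy (negate n)
  | otherwise = reverse (intercalate "," (chunksOf3 (reverse (show n))))
  where
    -- Groups of 3 digits, counted from the units digit (input is reversed).
    chunksOf3 :: String -> [String]
    chunksOf3 [] = []
    chunksOf3 ds = take 3 ds : chunksOf3 (drop 3 ds)
```

For currency, `'$' : commafy 1234` gives `$1,234`; cents would be appended separately.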

Pages follow an “iceberg” model of information density: an initial div.abstract (a blockquote, broken into multiple blocks ideally following scientific structure: background, methods, results, conclusion) precedes a left-to-right hierarchy of detail—margin notes (brief, left-aligned paragraph summaries; if multiple, they form a section’s micro-ToC), then paragraphs, concise footnotes/sidenotes (≤200 words for digressions, never simple citations), expandable div.collapse elements for longer asides or excerpts (≤500 words, with .abstract-collapse for summary text), and finally appendices (which also start with an abstract). Custom HTML, favoring explicit <div>/<span> wrappers for control and consistency over potentially problematic standard tags (eg. <details>, <abbr>), uses general → specific class names (eg. link-live) and -not negations (eg. icon-not); the most specific metadata (eg. a link’s class attribute) overrides site-wide configurations. Links may be enriched with a title="‘Title’, Author Year" attribute (primarily for the author’s editing convenience, secondarily as a fallback tooltip; eg. [display text](/url "'Essay Title', Smith 2026")), automatically scanned for local archiving to combat link-rot, and assigned a file-type/domain icon unless explicitly suppressed (eg. .icon-not). URLs are short, non-pluralized slugs (eg. /sidenote) or precisely anchored deep links. Files follow YYYY-surname[-description][-nth].ext naming (eg. 2025-01-01-gwern-gpt4o-frogmeme-desc.png for enhanced findability) using a conservative whitelist of approved formats (eg. JPG/PNG, XZ-tarballs, PDFs over DjVu) selected for stability and security.
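The file-naming scheme can also be checked mechanically. The following is a sketch, not the site's actual lint. It assumes that the date prefix may be either a bare year or a full YYYY-MM-DD, since both appear in examples on this page, and that all other components are lowercase ASCII alphanumerics. Its extension whitelist is deliberately tiny and hypothetical:

```haskell
import Data.Char (isAsciiLower, isDigit)

-- Hypothetical extension whitelist, for illustration only; not the site's real list.
allowedExtensions :: [String]
allowedExtensions = ["jpg", "png", "pdf", "tar.xz"]

-- | Split a string on a separator character: splitOn '-' "a-b" == ["a", "b"].
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Check a name against YYYY[-MM-DD]-surname[-description][-nth].ext.
-- Everything after the first period counts as the extension, so "tar.xz" works.
validFileName :: String -> Bool
validFileName name = case break (== '.') name of
  (stem, '.' : ext) -> ext `elem` allowedExtensions && validStem (splitOn '-' stem)
  _                 -> False  -- no extension at all
  where
    validStem (year : rest) = isNum 4 year && validRest rest
    validStem []            = False
    -- An optional -MM-DD may follow the year, as in 2025-01-01-gwern-….
    validRest (m : d : rest) | isNum 2 m && isNum 2 d = validWords rest
    validRest rest = validWords rest
    -- At least one component (the surname), each lowercase alphanumeric.
    validWords ws = not (null ws) && all isWord ws
    isWord w = not (null w) && all (\c -> isAsciiLower c || isDigit c) w
    isNum k str = length str == k && all isDigit str
```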

Images are presented within <figure> elements, which may feature detailed, multi-part captions (**Figure X**: _Summary statement._<br>(*A*) Detail one. (*B*) Detail two.), and support click-to-zoom carousels; dark-mode inversion is controlled by InvertOrNot.com by default but can be overridden with .invert/.invert-not classes. Lists are always logically ordered (by similarity, importance, or alphabetically); if containing >6 short items (<30 characters), they should use a two-column layout via <div class="columns">.

Code blocks must specify a language for syntax highlighting, include comments focused on the “why” and “why not”, and compile/run cleanly. Specifically: Bash scripts must enable set -e (explicitly ignoring errors with || true) and use long flags (eg. sort --unique); Haskell must compile with ghc -Wall -Werror, employing fully-enumerated, standard qualified imports; Emacs Lisp must produce zero byte-compile warnings.
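The following small example illustrates these Haskell conventions: enumerated imports, top-level type signatures, and comments on the “why”. It sketches the list rules stated above. The function names are hypothetical, and “short items” is read as every item being under 30 characters:

```haskell
import Data.Char (toLower)
import Data.List (sortOn)

-- | Should a list be wrapped in <div class="columns">? Only if it has more
-- than 6 items and every item is shorter than 30 characters.
-- Why not measure rendered width? Character count is a cheap proxy that works
-- on the Markdown source, with no browser in the loop.
useColumns :: [String] -> Bool
useColumns items = length items > 6 && all ((< 30) . length) items

-- | Fallback ordering when no more meaningful one exists: alphabetical.
-- Case is ignored so that "apple" does not sort after "Zebra".
sortAlphabetically :: [String] -> [String]
sortAlphabetically = sortOn (map toLower)
```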

Generative AI output (text or images) is permissible only after rigorous human polishing to meet high-quality standards, with explicit labeling in the body or via filename/caption (recording model and date, eg. 2025-gwern-gpt4o5-concept.png), and meticulous removal of common artifacts or stereotypical stylistic tells (eg. no “sepia GPT-4o images” or “delve”). Typographic flourishes—such as topic-specific dropcaps (consult /dropcap for current usage and aim for unused letters), span.smallcaps for emphasis, epigraphs (italicized quote, roman attribution), or admonitions (div.admonition [tip/note/warning/error])—must be “earned” by genuinely enhancing readability or information density, not used merely decoratively.

The overarching goal is durable, self-documenting hypertext: precise and unambiguous in prose, explicit and robust in code, and maximally resistant to link-rot or stylistic drift for decades.

Writing

  • The intended style and attitude is analytical, inquisitive, and precise, despite exploring complex topics, in the “classic style” of Western writing.

    • constant level of novelty: the level of formality should be inverse to the topic’s novelty (the weirder something is, the more formal). For ‘safer’ topics, one should cut loose with the humor, epigraphs, typographical stunts and experiments, etc.

    • I try to avoid hedging and qualifying, even at the risk of making overly strong claims. Hedging is a slippery slope.

  • Hide details using site features: Because Gwern.net is so reference-heavy, without great attention to reducing reader fatigue it risks becoming an unreadable sea of citations & opaque hyperlinks & blockquotes. The reader needs assistance navigating all the links, in the form of link-icons, content deferred to popups and collapses, standardized vocabulary, and so on.

    These features may strike the first-time reader as clutter, but they are designed for the power-user, who will come to appreciate them with enough experience.

  • Long-term maintenance matters: The readability of the Markdown source code is almost as important as the final product.

    If the author cannot read it, then they cannot easily improve it; they will not enjoy writing, and they risk aversion or burnout.

    And if machines cannot read it, then bugs cannot be detected, new features added, or regressions avoided; broken syntax and links, spelling errors, visual glitches, and other subtle issues will pile up over time, unnoticed.

  • American spelling; I take the liberty of silently editing even quotes & titles in the name of consistency, unless there is a specific reason to preserve the original spelling, eg. in poetry.

    (Since I always provide an archive of the original fulltext, I do not hesitate to modify the quoted version to make it easier for the reader.)

  • metric units by default. (A quote should be silently edited to metric; an idiom should be left alone.)

  • Oxford comma

  • Logical quotation

    • Ampersand operator: “&” vs “and” is used to disambiguate uses of logical operators like “and”, where the intended nesting might be unclear (eg. if one meant X AND (Y OR Z), or (X AND Y) OR Z).

      (Compare “Link titles may include the author and date or source identifier.” versus “Link titles may include the author & date or source identifier.”)

  • Editorial comments, whether in text or the UI/UX, are written in square brackets.

    They are marked up using div/span.editorial for styling differently from regular body text (example).

    In Markdown (but not HTML) files, because of the syntactic risk of double-brackets, all inner brackets should be escaped; do not write [[commentary]]{.editorial} but always write the safer, more explicit [\[commentary\]]{.editorial}.

    div.editorial is also useful for providing lengthy descriptions of files or images (especially AI-generated ones; eg.).

  • Editorial elisions: ellipsis, but without whitespace and not in brackets (“A…B”, rather than “A … B” or “A […] B”).

    Given the heavy use of excerpts, brackets would be obtrusive, and omitting whitespace both saves space and is not ambiguous with the use of ellipsis for trailing off (“A…B” ≠ “A… B”).

  • Inline author/year citations: When writing formal citations (as opposed to normal anchor text), citations are written in the minimal possible form in the Markdown: “Surname Year[a–z]”, “Surname-1 & Surname-2 Year[a–z]”, or “Surname et al Year[a–z]”.

    The disambiguation suffix “[a–z]” is assigned in the order of first-use-on-site. So for example, if I cited “John Smith 2020” and then later “Jane Smith 2020”, these would be “Smith 2020a”/“Smith 2020b” respectively.

    They are not written in parenthetical form; instances in quoted text like annotations must be rewritten into the Gwern.net citation style.

    The citation style is important because those will be automatically detected & compiled into the subscripted ellipse form I developed for easier reading. For details on the Pandoc API conversion from the written Markdown Foo et al 2026 to the displayed “Foo et al 2026” form, see Typography.hs.

    The first use of a citation should always be to a fulltext URL, preferably annotated; for example, [concept name](/doc/topic/source.pdf "‘Full Title’, Author Year"). (URLs are turned into a bibliography automatically, removing the need for a manual one.)

    Self-citations are written however is most convenient; rather than “Gwern YYYY”, they are usually phrased more naturally, like “As I previously wrote…”.

  • Acronyms/initialisms remove periods, as unnecessary (eg. ‘CIA’, not ‘C​.I​.A​.’; ‘AI’, not ‘A​.I​.’)

    • similarly, personal titles should be removed. While the New York Times may insist on always referring to “Mr. Altman, CEO of OpenAI”, the rest of us find “Altman, CEO of OpenAI” easier to read.

    • Latin abbreviations keep periods (eg. ‘eg.’, ‘ie.’, ‘cf.’, ‘etc.’) and are not italicized unless part of a full-blown foreign phrase.

      It’s easier to write without a period, but I find they just look odd that way, because they are not English.

    • optional: more unusual terms are defined in bold on their first use, Wikipedia-style. (For other terms, the popup annotation is considered adequate—a reader unfamiliar with them can simply pop them up and find out.) Example: “The National Aeronautics and Space Administration (NASA) is an independent agency…”

  • Science:

    • Statistics:

      • “Statistical-significance testing” terminology: always written with a hyphen and ‘statistical’, to emphasize that these technical terms mean far less than they seem and to reduce mistaken interpretations.

        “Type I/II error” is also banned in favor of “false positive/negative”

      • Latent variables like factors are always capitalized to emphasize that they do not necessarily correspond to the layman’s understanding of the word.

        For example, the Big Five personality factor “Conscientiousness”, which is a highly technical and specific measurement with flaws and limitations, is not necessarily what a non-psychometrician understands by the word “conscientiousness”, and it is misleading to write something like “we measure soldiers’ conscientiousness and predict future career success…”

      • Probability confidence terms: try to use the Kesselman estimative words (“certain” “highly likely” “likely” “possible” “unlikely” “highly unlikely” “remote” “impossible”).

        Our set of estimative words adds “fiction”, “log” (data, experiences, memoirs, etc.), and “emotional” (feelings, self-expression).

    • Chemistry: chirality is written with smallcaps, eg. the left-handed form of the amino acid theanine is written “l-theanine” (note that ‘l’ is lowercase because uppercase does nothing inside smallcaps, ie. <span class="smallcaps">l-</span>).

  • Numbers: comma-separated (especially if they may be confused with a year). Digits are preferred over words for compactness for numbers >1.

    • Units: prefer compactness eg. ‘55s’ rather than ‘0m55s’. ‘Approximately’ can be replaced with ‘~’; similarly ‘>’ can replace ‘greater than’, ‘more than’, ‘at least’, ‘higher than’, etc. (Do not write out inequalities with HTML entities like &gt; unless writing raw HTML or in a dangerous context.)

    • Common unit bases: when mixing units like ‘billions’ or ‘millions’, try to convert them to a common base unit for easier intuitive comparison & subitizing. It is hard to compare ‘$1 trillion’ to ‘$100 million’, but easy to compare ‘$1,000 billion’ to ‘$0.1 billion’.

    • Scientific notation: do not write a number like 1.5e3 except in source code literals; prefer either decimal like ‘1,500’ or full scientific notation like ‘1.5 × 10³’.

  • Foreign phrases or words or sentences: italicized if not naturalized or familiar to an educated English reader; “tsunami” or “etc.” are not italicized, but “pluralis auctoris” is.

  • Dates: dates are written either ‘YYYY-MM-DD’ or ‘Day Month Year’. The former is preferred for data/table/etc., but it can be awkward in prose, where the latter is acceptable. This allows easier machine parsing and eliminates ambiguity.

  • Sentences: single-space after the period, not typewriter double-space.

  • Semantic line breaks/“ventilated prose”: paragraphs are separated by double-newlines, and every sentence is separated by a newline. (ie. This is a sentence.\nThis is another sentence in the same paragraph.\n\nThis is a new paragraph.)

    This “ventilated prose” makes it easier to edit & read the Markdown source & diffs. It is not enforced, because line-breaking is context-dependent and very short sentences need not always get their own line.

  • Pluralis auctoris: use “I” when describing something specific that the author did, like running an experiment (unless it was a collaboration, in which case it must be “we”); use the pluralis auctoris “we” when it’s a general discussion the reader is part of.

    For example, “I” run a self-experiment on a drug, but “we” read an excerpted passage from a novel and draw a critical conclusion from it; or in this MoS, I do not expect the reader to agree with many choices, and I implicitly exclude them.

  • Profanity: profanity like ‘f—k’ or ‘d—n’ is censored with em-dashes, in keeping with the semi-academic style (and because it amuses me to borrow old-timey Victorian writing conventions)

  • Ampersand abbreviation: “and” should be abbreviated as “&” for logic disambiguation, where “&” binds more tightly.

  • Section cross-references: when linking or citing, the word “Section” is abbreviated with the SECTION SIGN “§”.
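Citation-suffix assignment, as described under inline citations, is a small bookkeeping problem. The following sketch assumes a hypothetical input list of (surname, year, identifier) triples that is already in first-use-on-site order. The surname field may hold “Smith & Jones” or “Foo et al” as-is. The site's real Haskell implementation may differ:

```haskell
import Data.List (nub)

-- | A cited work: (surname, year, unique identifier such as its URL).
type Work = (String, Int, String)

-- | Assign display keys like "Smith 2020a". Suffix letters follow the order of
-- first use on the site; a Surname-Year pair used by only one work gets no suffix.
-- Assumes the input is in first-use order; repeat citations are harmless.
-- Not handled: more than 26 works sharing one Surname-Year pair.
citationKeys :: [Work] -> [(String, String)]
citationKeys uses = [ (ident, key w) | w@(_, _, ident) <- works ]
  where
    works = nub uses  -- first occurrence of each work, in order
    key (surname, year, ident) =
      let clashes = [ i | (s, y, i) <- works, s == surname, y == year ]
          base    = surname ++ " " ++ show year
      in case lookup ident (zip clashes ['a' .. 'z']) of
           Just c | length clashes > 1 -> base ++ [c]
           _                           -> base
```

For example, given `[("Smith", 2020, "john"), ("Doe", 2019, "x"), ("Smith", 2020, "jane")]`, it yields `Smith 2020a`, `Doe 2019`, and `Smith 2020b`.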

Structure

Excluding collapsed text, ideally lengths would look like this: footnotes should be <200 words; annotation commentary should be <1,000 words; ‘blog’ posts should be <1,500 words; essays should be <10,000 words. Past those lengths, they should probably be refactored or ‘promoted’ (annotations can be split using the “anchor trick”); much below, and they may be better ‘demoted’ and moved elsewhere.
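These length budgets can be expressed as a simple check. The following sketch uses a hypothetical `Kind` type and counts words naively by whitespace. It assumes that collapsed text has already been stripped:

```haskell
-- | Kinds of text with a length budget (hypothetical type for illustration).
data Kind = Footnote | Annotation | BlogPost | Essay
  deriving (Show)

-- | Exclusive upper bounds in words, excluding collapsed text.
budget :: Kind -> Int
budget Footnote   = 200
budget Annotation = 1000
budget BlogPost   = 1500
budget Essay      = 10000

-- | True if the text has reached its budget and should probably be refactored
-- or promoted. Words are counted by whitespace; strip collapses beforehand.
overBudget :: Kind -> String -> Bool
overBudget kind text = length (words text) >= budget kind
```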

Pages should be information-dense “icebergs”: relatively short, with few blockquotes, but with many links and excerpts and related material hidden just a mouse-hover away in popups and collapses, and available through the annotated link-bibliographies, backlinks section, and similar-links reading list.

Section headers are in mixed title case. (Title capitalization is otherwise left alone—I don’t have a strong feeling on whether they should be sentence case or title case or some other capitalization.) Headers should not be more than 6 words long, to minimize line-wrapping in the Table of Contents (ToC). As the ToC is auto-generated by Pandoc, it is hard to adjust or tweak.
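The following minimal lint for header lines covers the 6-word limit. It also covers the explicit-ID overrides for headers containing periods and for numeric headers, which are described under Markdown. It assumes single-line ATX headers (`# Title {#id}`) in which an explicit ID is written as `{#…}` or `{id=…}`. It is a sketch, not the site's actual checker:

```haskell
import Data.Char (isDigit, isSpace)
import Data.List (dropWhileEnd, isInfixOf)

-- | Lint one ATX Markdown header line, returning any problems found.
lintHeader :: String -> [String]
lintHeader line =
  [ "header longer than 6 words" | length (words title) > 6 ] ++
  [ "period in header needs an explicit ID" | '.' `elem` title, not hasId ] ++
  [ "numeric header needs an explicit ID" | not (null title), all isDigit title, not hasId ]
  where
    -- Strip the leading #s, then split off any trailing {…} attribute block.
    body  = dropWhile isSpace (dropWhile (== '#') line)
    title = dropWhileEnd isSpace (takeWhile (/= '{') body)
    attrs = dropWhile (/= '{') body
    hasId = "{#" `isInfixOf` attrs || "id=" `isInfixOf` attrs
```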

Sections should be a large block element like a program, or at least two paragraphs long. (They should not be 1 sentence long.)

Sections should be structured by level of detail, roughly going left to right: section title → margin note → paragraph → footnote/sidenote → collapsed elements → excerpts or writing inside popups. This is paralleled by a hierarchy of digression: footnotes/sidenotes < collapses < appendixes.

A “See Also” section can be added at the end to link relevant on-site essays not already linked; relevant external links or documents can be included as an “External Links” section.

Unfinished or draft or future sections should be commented out using HTML comments: <!-- TODO -->.

Markdown

Essays are written as standalone Pandoc Markdown files. The Markdown filename is /directory/slug.md, where a slug is a Unix-style lowercase alphanumeric hyphenated abbreviation or mnemonic of the page contents; eg. this page is /style-guide.md. Directories follow the same convention; they are not heavily used for essays aside from a few exceptions like /review/, /newsletter/, and /fiction/. (They are heavily used for organizing files & documents in the tag-category hierarchy.)
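The slug format can be validated in a few lines. The following sketch assumes that a slug consists of lowercase ASCII letters and digits joined by single hyphens. Pluralization cannot be checked mechanically, so it is left to the author:

```haskell
import Data.Char (isAsciiLower, isDigit)

-- | Check a page slug: one or more hyphen-separated words of lowercase ASCII
-- letters and digits, with no leading, trailing, or doubled hyphens.
validSlug :: String -> Bool
validSlug s = not (null s) && all validWord (splitHyphens s)
  where
    validWord w = not (null w) && all (\c -> isAsciiLower c || isDigit c) w
    splitHyphens str = case break (== '-') str of
      (w, [])       -> [w]
      (w, _ : rest) -> w : splitHyphens rest
```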

  • Lists:

    • Keywords should be emphasized in a 3-cycle by depth: bold for top-level, then italics for second-level, then smallcaps for third-level, then bold for fourth-level, italics fifth-level etc.

    • Unordered lists: should be sorted meaningfully, such as by similarity; if no order, alphabetical.

    • Ordered lists: written using #. syntax for auto-numbering unless specific numbers required (eg. in a quote from a numbered list, like a list of aphorisms)

      Ordered inline lists should use fully parenthesized integers, Oxford comma separated, like “(1) one, (2) two, (3) three”. (This helps ensure scannability, unambiguity, and automatic balance-checking.)

    • Columns: lists with many short items can be laid out in multiple columns using div.columns. (These can be given IDs & linked, transcluded, collapsed, etc.) This responsively creates 1–3 columns.

  • Headers: Header IDs must be overridden under two circumstances:

    • No ID periods: if headers contain a period, their ID must be overridden to remove the period, as Pandoc will otherwise generate invalid HTML IDs!

      So a header like `# Gwern.net`{.Markdown} must be written `# Gwern.net {#gwernnet}`{.Markdown}.

    • Manual header numbering: if the header is a number, Pandoc’s auto-ID algorithm will (surprisingly) delete it and replace it with a generic ID like section (or section-1, etc.).

      So a numeric header like a year must be written like # 2026 {id=2026} to ensure the expected ID #2026 exists.

  • Surveys: when reporting results from a survey I’ve run, they should follow a rough flow of: “Survey Design” → “Survey Questions” (quoted instrument) → “Results” → “Interpretation”.

    Along with the raw survey data (usually a CSV), the results of each question should be reported with the item.

    If possible, provide a text visualization like a Unicode sparkline (‘▁▂▃▅▇’, like spark or termgraph) or a progress bar, eg. \[████████▒▒▒▒▒▒▒▒▒▒▒▒ 40%\] (20-block scale, 5% per block, using ‘█’/‘▒’).

  • Images: do not require a **Figure N** or an alt-text (which are written using Pandoc’s attribute syntax like ![](/foo.jpg){alt="..."}, and may contain inline HTML, supported by image-focus.js).

    • Caption text (Pandoc figure caption) goes in ![].

    • Accessibility alt attribute optional: but if used, goes in {alt="…"}.
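Both survey visualizations can be generated rather than typed. The following sketch defines two hypothetical helpers. `progressBar` uses 20 blocks of 5% each and rounds to the nearest block. `sparkline` scales counts linearly onto the five glyphs ▁▂▃▅▇. Neither is the spark or termgraph tool:

```haskell
-- | Render a percentage as a 20-block text bar, 5% per block, rounding to the
-- nearest block; eg. 40 gives 8 filled blocks and 12 empty ones.
progressBar :: Int -> String
progressBar pct =
  "[" ++ replicate filled '█' ++ replicate (20 - filled) '▒' ++ " " ++ show p ++ "%]"
  where
    p      = max 0 (min 100 pct)  -- clamp out-of-range input
    filled = (p + 2) `div` 5      -- nearest block, ties rounding up

-- | Render counts (eg. responses per answer option) as a sparkline, scaled
-- linearly between the minimum and maximum onto the 5 glyphs ▁▂▃▅▇.
sparkline :: [Int] -> String
sparkline [] = ""
sparkline xs = map glyph xs
  where
    lo = minimum xs
    hi = maximum xs
    glyph x
      | hi == lo  = '▁'
      | otherwise = "▁▂▃▅▇" !! (((x - lo) * 4) `div` (hi - lo))
```

In Markdown, the bar's brackets are then escaped as `\[…\]`.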

Essay Metadata

Markdown essays must start with a YAML Pandoc metadata header; there is only one YAML metadata header per file. Validation of enumerations & per-page uniqueness is done in hakyll.hs during compilation.

The order of all fields is: title, author, description, thumbnail, thumbnail-text, created, modified, thumbnail-css, status, confidence, importance, css-extension.
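Field order is easy to check once the keys are extracted. The following sketch accepts any subset of the canonical fields, provided they appear in order. It skips keys outside the list, such as the booleans. Extracting the keys from the YAML is assumed to happen elsewhere:

```haskell
import Data.List (elemIndex)

-- | Canonical order of the YAML fields, as listed above.
fieldOrder :: [String]
fieldOrder =
  [ "title", "author", "description", "thumbnail", "thumbnail-text", "created"
  , "modified", "thumbnail-css", "status", "confidence", "importance", "css-extension" ]

-- | True if the keys present follow the canonical order. Omitted fields are fine;
-- keys outside the list (eg. the booleans) are skipped rather than rejected.
inCanonicalOrder :: [String] -> Bool
inCanonicalOrder keys = ascending [ i | Just i <- map (`elemIndex` fieldOrder) keys ]
  where
    ascending ixs = and (zipWith (<) ixs (drop 1 ixs))
```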

Mandatory fields:

  • title: sets the page title and the first <h1> header; written in simple inline HTML (eg. italics, smallcaps, subscripts/superscripts, but not bold or links/footnotes); <13 words.

  • description: short (20–650 characters) inline-HTML summary of the page; level of detail should be in between the title and the abstract.

    Lightweight “blog” posts (path starting with /blog/) are exempted from the description requirement.

  • created: “YYYY-MM-DD”; must be after “2008-01-01” and before tomorrow.

  • status: enumerated list of writing completion (“finished”, “in progress”, “draft”, “notes”, “abandoned”, “obsolete”)
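The mandatory-field rules can be sketched as a validator. The `Meta` record is hypothetical; the real checks live in hakyll.hs. `today` is passed in as a YYYY-MM-DD string so that ISO dates compare correctly as plain strings. The date-format check verifies shape only:

```haskell
import Data.Char (isDigit)
import Data.List (isPrefixOf)

-- | Hypothetical record of the mandatory fields, already parsed from the YAML.
data Meta = Meta
  { metaPath        :: String        -- page path, eg. "/style-guide"
  , metaTitle       :: String
  , metaDescription :: Maybe String
  , metaCreated     :: String        -- "YYYY-MM-DD"
  , metaStatus      :: String
  }

statuses :: [String]
statuses = ["finished", "in progress", "draft", "notes", "abandoned", "obsolete"]

-- | Shape check only: it does not reject impossible dates like 2025-02-30.
isIsoDate :: String -> Bool
isIsoDate s = length s == 10 &&
  and [ if i == 4 || i == 7 then c == '-' else isDigit c | (i, c) <- zip [0 :: Int ..] s ]

-- | List the problems found; an empty list means the metadata is valid.
-- `today` is a "YYYY-MM-DD" string, so ISO dates compare correctly as strings.
validateMeta :: String -> Meta -> [String]
validateMeta today m =
  [ "title must be <13 words" | length (words (metaTitle m)) >= 13 ] ++
  descriptionErrors ++
  [ "created must be YYYY-MM-DD" | not (isIsoDate created) ] ++
  [ "created must be after 2008-01-01 and before tomorrow"
  | isIsoDate created, created <= "2008-01-01" || created > today ] ++
  [ "unknown status: " ++ metaStatus m | metaStatus m `notElem` statuses ]
  where
    created = metaCreated m
    -- Lightweight /blog/ posts are exempt from the description requirement.
    isBlog = "/blog/" `isPrefixOf` metaPath m
    descriptionErrors = case metaDescription m of
      Nothing | isBlog -> []
      Nothing          -> ["description is mandatory"]
      Just d
        | length d < 20 || length d > 650 -> ["description must be 20–650 characters"]
        | otherwise                       -> []
```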

Optional fields:

  • author: comma-separated list of authors, same format as annotations.

    If not set, the author is assumed to be “Gwern”. If it is someone else, it may be a good idea to include an explicit in-page byline, using <div class="text-center">by author</div>.

    Authors should have a homepage/profile URL defined in Config.Metadata.Author. AI authors should be listed as well as human authors; unless a piece is ~100% AI-generated (and this is clearly noted), the lead human author should be listed first, as they are responsible for the page as a whole.

  • modified: “YYYY-MM-DD”; must be after “2008-01-01” and before tomorrow.

  • confidence: a single extended Kesselman estimative word

  • importance: integer 0–10

  • css-extension: space-separated HTML classes which will be substituted in per page

    These are used to style an entire page and control things like the page dropcaps, dark vs light vs holiday theme, etc.

    Example of a field: css-extension: dropcaps-cheshire reader-mode would make a page use the Cheshire Art Deco dropcap while turning on reader-mode by default to reduce visual clutter.

    Examples of values: dark-mode dropcap-not dropcaps-cheshire dropcaps-de-zs dropcaps-dropcat dropcaps-gene-wolfe dropcaps-goudy dropcaps-kanzlei dropcaps-yinit extract-not index reader-mode test-april-fools-2024 test-april-fools-2025 test-april-fools-2026 test-christmas test-easter test-halloween toc-not

  • thumbnail: absolute image path (eg. /doc/cs/shell/2024-01-17-cmatrix-matrixstylescreenscroll.png); local image must exist. The thumbnail is used in social media previews, annotation popups of a page, and may be automatically displayed in the page abstract to add flair.

    • Thumbnail reuse: it is common (esp. on poetry/fiction pages) to use the same image as both thumbnail and an in-body full-width figure. In that case, reuse the thumbnail-text as the img title= (or otherwise ensure they stay consistent).

  • thumbnail-text: inline-HTML caption text for the thumbnail (displayed in link previews/popups).

    May be lengthy and include hyperlinks etc. Desirable but optional.

  • thumbnail-css: CSS classes applied to the thumbnail image (eg. .invert-not for images that shouldn’t invert in dark mode, like thumbnail-css: invert-not outline)

  • placeholder: boolean (“True”/“False”)

  • index: boolean

  • error404: boolean

  • backlink: boolean
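The following sketch validates some of the optional fields. It checks `confidence` against the extended Kesselman list, requires `importance` to be an integer from 0 to 10, and splits `author` on commas, with “Gwern” as the default. The function names are hypothetical:

```haskell
-- | Kesselman estimative words plus the Gwern.net extensions, for `confidence`.
kesselman :: [String]
kesselman =
  [ "certain", "highly likely", "likely", "possible", "unlikely"
  , "highly unlikely", "remote", "impossible"
  , "fiction", "log", "emotional" ]

validConfidence :: String -> Bool
validConfidence = (`elem` kesselman)

-- | `importance` must parse as a whole integer from 0 to 10.
validImportance :: String -> Bool
validImportance s = case (reads s :: [(Int, String)]) of
  [(n, "")] -> n >= 0 && n <= 10
  _         -> False

-- | Split the comma-separated `author` field; an absent field means "Gwern".
authors :: Maybe String -> [String]
authors Nothing    = ["Gwern"]
authors (Just str) = map trim (splitComma str)
  where
    splitComma t = case break (== ',') t of
      (a, [])       -> [a]
      (a, _ : rest) -> a : splitComma rest
    trim = dropWhile (== ' ') . reverse . dropWhile (== ' ') . reverse
```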

Example YAML front-matter, based on this page:

​-​--
title: "Manual of Style"
description: "Style guide documentation of Gwern.net writing conventions for essays and code."
thumbnail: /doc/ai/nn/transformer/gpt/dall-e/4o/2026-01-07-gwern-gpt5-thevelveteenrabbit-velveteenaishoggoth-simplifiedforthumbnail.png
thumbnail-text: "The Velveteen Shoggoth: if a boy loves a shoggoth long enough and hard enough, can it turn into a <em>real</em> rabbit...?"
created: 2025-05-07
modified: 2026-01-14
status: in progress
confidence: certain
css-extension: dropcaps-kanzlei
...

HTML

  • Big-endian naming: Attributes or classes are named in left-to-right general → specific style for easier tab-completion and memory.

    They are usually written as div/span elements.1

    Hence, there are eg. link-live, link-icon, and icon-not classes: they pertain to a ‘link’, specify some attribute (whether the original URL can be displayed in a popup, or if it has a link-icon), and can be overridden the same way (eg. to disable a link-icon, [foo](bar){.icon-not}).

    • -not suffix: There is always an exception, so custom classes can usually be negated with a -not suffix. (Like tags or URLs, pluralization is discouraged.)

    • The master list of Gwern.net CSS classes is kept in the html_classes_whitelist lint variable of sync.sh.

    • When there is conflict, the most specific metadata wins. If a link is on a blacklist/whitelist specified in the site-wide configuration files (the Config.* hierarchy in the Haskell source code), an attribute on the link overrides it. So if a URL is on the site-wide live-link blacklist, putting a .link-live class on a specific <a> will override the blacklist and make it a live-link.

  • Page metadata:

    • “created” refers to when I had the core idea or wrote the first version, which may be when I wrote a comment on social media and not when the Gwern.net page first appeared.

    • the (last) “modified” refers to the last major modification of an essay, such as to add a new section or appendix.

      It does not cover minor modifications like updating broken links or adding references or paragraphs, or fixing minor errors.

      If there is a major error or obsolete material, it should be explained where it happened, possibly stored in a footnote or collapse. It may be useful to strike-through the original text.

    • importance tags are a single integer 1–10; see that page for proper usage. (Infrastructure pages like indexes are 0.)

    • “status”: rough ordinal of completion of an essay. Self-explanatory. (Currently: “finished”, “in progress”, “draft”, “notes”, “abandoned”, “obsolete”.)

    • confidence: how confident I am in the contents broadly, overall; must be a Kesselman word

    • auxiliary links section: pages can have a special appended triplet of backlinks/similar-links/link-bibliography sections

  • All essays should begin with an abstract, which is a div.abstract containing a blockquote. These are critical for popups & similar-link embedding recommendations.

    • Appendixes ought to begin with an abstract as well.

  • HTML5 output: should pass W3C Validator, except for some warnings/errors which must be ignored; currently, you should ignore:

    • no footnotes section header (Pandoc-ism)

    • no alt caption on images (currently too much work to manually add to the thousands of current images, although I hope that LLMs will soon be capable of affordably adding acceptable alt-captions)

    • “Consider using the h1 element as a top-level heading only” (Pandoc-ism)

  • Self-documenting: all elements should be either readable as text, or have useful metadata when interacted with (eg. tooltips on metadata fields by setting a title attribute either directly or using a span/div wrapper.)

  • Divs are written in both Markdown & HTML using raw <div> HTML elements, because the ‘native’ Pandoc syntax is ugly and dangerously finicky; spans should be written in the Pandoc [foo]{.class ...} syntax in Markdown, and as <span> in HTML

  • Collapses are good to use on entire sections which are a digression and appendix-like (eg. # Appendix {.collapse}), on large blockquotes or lists, drafts or alternatives, or on transcluded annotations. They are often an alternative to a giant footnote or an appendix.

    Collapse elements are a div.collapse/span.collapse wrapper. They are a superior implementation of the <details> disclosure element.

    By default, the entire element is collapsed; to define what gets displayed while collapsed, use .abstract-collapse. To show that only when collapsed, and make it disappear when uncollapsed, use .abstract-collapse-only. (This can be useful when you want to show something other than a prefix, like a heavily edited summary or excerpt of an entire collapsed region, rather than just the first few sentences.)

    Almost every block or inline element can be collapsed in the same way: sections, blockquotes, tables, code blocks… Due to historical reasons, Pandoc allows directly setting classes on only some elements, like sections, but not on others, like blockquotes; but the class itself remains the same, whether it’s set directly on the element or on a div-wrapper.

  • Link metadata: links optionally encode the basic metadata of title/author/year into the title attribute of a URL (eg. [foo](/bar.pdf "'Title', Surname Year") creates an <a href="/bar.pdf" title="'Title', Surname Year">foo</a>, which, with JS disabled, will show a tooltip of 'Title', Surname Year on mouse-hover). This is optional because many links do not need it: it is irrelevant, or generated automatically (eg. interwiki links).

    This is not because I want readers to see it (except as a fallback) but because it makes it easier to edit the Markdown source if I have the title right there to jog my memory, particularly for more opaque or unfamiliar URLs. (It is presumably also useful to LLMs, who may not have memorized a given URL.)

    This is usually downstream of annotating links, as link-titler.hs automatically rewrites Markdown & HTML to insert the title attribute when it can. The title attribute will override any annotation title, and so can be overloaded for other purposes; for Twitter tweets, as they are short and don’t have a “title” per se, one may just archive the tweet text into the title.

  • Lint/Rewrite overrides: Gwern.net relies heavily on lints and automatic rewrites to maintain consistency & correctness. This can misfire, especially when writing a document like this one which deliberately includes errors or deprecated examples. These matches can usually be disabled by inserting an invisible Unicode character like ZERO WIDTH SPACE.

  • Tags: slashes on self-closing tags are never used in HTML5 (with the exception of SVG, because that is XML), particularly <img>, <br>, and <hr>.

    They were apparently a holdover from now-obsolete XHTML, and are meaningless in HTML5 (where, sans slash, they are now just “void” elements). Besides triggering validation warnings and leading to inconsistent syntax where there might be (at least) 3 ways to write an element, self-closing tags kept causing mysterious sporadic problems in the overall Gwern.net stack: a self-closed <br> would somehow turn into two of them, or horizontal rules would break entirely. They should never be used.

    • Pandoc definition lists are never used. I have not found any use-case for them on Gwern.net; annotated links, lists and transclusions work well.

  • Backlinks/similar-links/link-bibliography: these are all automatically generated.

    In the case of backlinks, a backlink can be disabled at either the link level (.backlink-not) or at the page level (backlink: False in the YAML metadata).

    This is useful for pages that serve as aggregators, changelogs, or indexes, to prevent them from cluttering the “Backlinks” section of other pages, or when the discussion of a page is irrelevant to readers of that page. For example, this Style Guide often links to an example essay which happens to use a feature, but no reader of that essay would care about that. (It also helps when redirecting multiple IDs, which would otherwise yield useless backlinks.)

  • Markdown source is available for essays, using the extension .md (eg. this HTML page, /style-guide, has its Markdown source at /style-guide.md).

    These Markdown sources are also served by default to agents which specify that they prefer Markdown over HTML in their Accept headers (example: curl --follow --include --header 'Accept: text/markdown' "https://gwern.net/archiving").
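    Choosing between HTML and Markdown is ordinary HTTP content negotiation. The Python sketch below is an assumption about the logic, not the site’s actual server configuration: it serves Markdown only when text/markdown is listed explicitly with a higher quality value than HTML, so ordinary browsers (which ask for text/html at full quality) keep getting HTML:

    ```python
    def parse_accept(header):
        """Map each media type in an Accept header to its quality value (q, default 1)."""
        prefs = {}
        for item in header.split(","):
            fields = [field.strip() for field in item.split(";")]
            media_type = fields[0].lower()
            if not media_type:
                continue
            quality = 1.0
            for param in fields[1:]:
                key, _, value = param.partition("=")
                if key.strip().lower() == "q":
                    try:
                        quality = float(value)
                    except ValueError:
                        quality = 0.0
            prefs[media_type] = quality
        return prefs


    def wants_markdown(header):
        """True if the client explicitly prefers text/markdown over HTML."""
        prefs = parse_accept(header)
        markdown = prefs.get("text/markdown", 0.0)  # wildcards do not count for Markdown
        html = prefs.get("text/html", prefs.get("text/*", prefs.get("*/*", 0.0)))
        return markdown > html


    print(wants_markdown("text/markdown"))                     # True
    print(wants_markdown("text/html,*/*;q=0.8"))               # False
    print(wants_markdown("text/markdown, text/html;q=0.5"))    # True
    ```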

  • LLM outputs/conversations formatting:

    In presenting output from LLMs like ChatGPT, the Gwern.net convention is that user inputs/prompts are written in bold, and LLM outputs are left in roman. Repeated parts of prompts or outputs are elided with ellipses (so multiple responses to a single prompt are denoted by a bolded ellipsis and then the response). Long transcripts which readers are not expected to read in full can be presented in a collapsed code block. If you must include literal triple-backticks inside a Markdown fence, insert zero-width spaces between the backticks.

    Because of the rapid development of AI, outputs should include exact dates.
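    Defusing backticks by hand is error-prone, so a small helper can do it. A Python sketch (the name defuse_fences is hypothetical, not an existing Gwern.net script); one trade-off is that the invisible characters survive copy-paste, so a reader copying the transcript gets them too:

    ```python
    import re

    ZERO_WIDTH_SPACE = "\u200b"


    def defuse_fences(text):
        """Interleave ZERO WIDTH SPACE into every run of 3+ backticks,
        so quoted fences cannot open or close the surrounding Markdown fence."""
        return re.sub(r"`{3,}", lambda m: ZERO_WIDTH_SPACE.join(m.group(0)), text)


    transcript = "Sure! Here is the code:\n```python\nprint(1)\n```"
    safe = defuse_fences(transcript)
    assert "```" not in safe
    assert safe.replace(ZERO_WIDTH_SPACE, "") == transcript  # visible text unchanged
    ```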

  • Miscellaneous:

    • .display-not: hide an element, such as a hyperlink.

      • .desktop-not/.mobile-not: selectively hide an element on different sized screens

    • .reader-mode-not: hide an element inside reader-mode

      Especially useful for things like annotated poems/fiction, where we provide a clean ‘pure’ reading experience by default by setting a page-level reader-mode variable, and use .reader-mode-not to remove the parts which would distract from reading.

    • inline icons: some SVG icons used in the site UI can be included explicitly in text as empty spans with the corresponding icon-CSS class, like []{.icon-single-white-star-on-black-circle}.

      These icons may be site controls, like span.reader-mode-selector-inline, which provides a clickable toggle widget to set reader-mode to on/off/auto-mode. (This is redundant with the floating theme togglebar but allows us to explicitly tell the reader about it, which is useful on link-heavy pages where readers are especially likely to want to use reader-mode but not know about it.)

    • .: disable popups on a link

Transclusion

A signature Gwern.net site feature is a rich set of client-side transclusion primitives (see transclude.js), which allow copying into the current page an almost arbitrary set of other pages, parts of pages, or metadata about pages. This avoids repetition, and is tightly integrated into the popups, backlinks, and local archive features. (It is a major Gwern.net design pattern, used to build many things, such as annotation popups, which lazy-load the document, and auxiliary links.)

They are “lazy”, and done client-side in JS, to allow arbitrarily large and deeply recursive transclusion, including loops. They are written as HTML classes on a link.

Use-cases:

  • DRY of boilerplate or repeated text

  • migration of parts of pages: transclude the annotation to summarize what used to be there, while automatically redirecting readers to the new location

  • precise range citation of a specific part of a sentence, or paragraph, or section, by defining a span/div wrapper with an ID, and linking or transcluding that ID

  • reader-friendly longform: including a large text blob in a reader-friendly way by transcluding it inside a collapsed region ([text](URL){.include .collapse})

  • combining auto-generated pages with hand-written pages: the auto-generated page simply transcludes the path of a hand-written page. (Example: tag-directory indexes like /doc/foo/index will transclude a /doc/foo/abstract, if it exists, to summarize or describe the tag.)

Common uses:

  • .include: copy in everything at the URL; if the URL has an anchor, this is scoped to that ID (eg. a section)

    • .include-strict: perform the transclusion as soon as possible, non-lazily; typically useful as a performance optimization, to ensure readers don’t have to wait, or to ensure that a link target ID exists

  • .include-annotation: transclude the annotation as a block, with its metadata header, excerpts/abstract/commentary, etc.

    Especially useful in bibliography-like pages.
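Why laziness makes loops harmless can be shown with a toy Python model. It is only a model, not transclude.js: pages are a dict of Markdown strings, only a bare [text](url){.include} link is recognized, and the depth parameter stands in for how far a reader has actually expanded. Each level is expanded only on demand, so a cycle (/a includes /b, which includes /a) never recurses forever; it simply leaves an unexpanded include-link at the frontier:

```python
import re

# Toy model: only the bare form [text](url){.include} is recognized.
INCLUDE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)\{\.include\}")


def expand(url, pages, depth):
    """Expand include-links in pages[url], `depth` levels deep.

    Include-links past that depth stay as unexpanded links, standing in for
    content which has not been loaded yet."""
    text = pages[url]
    if depth == 0:
        return text
    return INCLUDE.sub(lambda m: expand(m.group(1), pages, depth - 1), text)


pages = {"/a": "A [x](/b){.include}", "/b": "B [y](/a){.include}"}
print(expand("/a", pages, 1))  # A B [y](/a){.include}
print(expand("/a", pages, 2))  # A B A [x](/b){.include}
```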

URLs

Site essay URLs follow the ‘slug’ pattern: one or two alphanumeric hyphen-separated keywords, avoiding plural endings to reduce ambiguity (eg. /sidenote, not /sidenotes). HTML essay pages are “cool URIs” which have no extension. Conventions:

  • “Graveyard”: pages which contain ‘outtakes’, ‘failures’, ‘prototypes’, ‘rough drafts’ etc. of a final polished product; they are named with the -graveyard suffix, so /foo-graveyard for /foo (eg. /face-graveyard records failed attempts at generating the successful StyleGAN anime faces in /face).

  • “Lorem”: prefix of pages testing site functionality, which are split into several pages because of their size & the stress they put on browsers

  • Topic groups: directories for organizing essays by theme (eg. fiction, haskell, newsletter, nootropic, review, sicp, zeo). Generally self-explanatory, but note that newsletters follow a strict /newsletter/YYYY/MM.md naming convention, with an optional /newsletter/YYYY/13.md for annual newsletters.

    Or by document type: blog, doc, note. (Blogs are short posts intended to be easier to write than a full essay page; the document hierarchy encodes the hierarchical tag system & all hosted files; and notes are currently something of an atavism, deprecated in favor of more sophisticated use of blogs or tags with transcluded abstracts.)
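These URL conventions are regular enough to check mechanically. A Python sketch; the lowercase restriction and the exact regexes are assumptions inferred from the examples, plural endings cannot be detected by a regex and are left to the editor, and the slug check covers only top-level essay URLs (a graveyard of a two-keyword essay would need the pattern relaxed):

```python
import re

# One or two lowercase alphanumeric keywords, hyphen-joined, no extension
# (eg. /sidenote, /face-graveyard).
ESSAY_SLUG = re.compile(r"/[a-z0-9]+(?:-[a-z0-9]+)?")

# /newsletter/YYYY/MM.md, with month 13 reserved for the annual newsletter.
NEWSLETTER = re.compile(r"/newsletter/\d{4}/(?:0[1-9]|1[0-3])\.md")


def is_essay_slug(path):
    return ESSAY_SLUG.fullmatch(path) is not None


def is_newsletter_path(path):
    return NEWSLETTER.fullmatch(path) is not None


for path in ["/sidenote", "/face-graveyard", "/sidenote.html"]:
    print(path, is_essay_slug(path))
```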

All URLs should be fulltext links if humanly possible.

URLs should link to the most relevant anchor inside the target page. In the case of PDFs, link to a specific page using an anchor like #page=n.

URLs written in foreign languages should be identified with a language code (eg. ).

Files follow the naming pattern YYYY[-MM[-DD]]-surname[-description][-nth].ext. (This is memorable, predictable, short, and sorts well on the command line.) If there is no surname, it uses the closest available entity name; if there is nothing at all, “anonymous”. If there are collisions, they are disambiguated by tacking on a count: eg. 2026-foo-1.pdf vs 2026-foo-2.pdf. For more complicated documents, such as books or generated images, it can be worth encoding descriptions or titles: eg. a book is better named 2026-foo-title.pdf, to make it more findable. For image files, given how hard they are to work with or refind later, it is best to specify a lot of data, like an author (and/or tool), exact date, and description (eg. 2025-01-01-gwern-gpt4o-frogmeme-description.png).
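The naming pattern can be expressed as a regular expression. A Python sketch, assuming names are lowercase ASCII; it cannot tell a description apart from a collision count, so both land in one rest field along with the surname:

```python
import re

# YYYY[-MM[-DD]]-surname[-description][-nth].ext
FILENAME = re.compile(
    r"(?P<year>\d{4})"
    r"(?:-(?P<month>0[1-9]|1[0-2])(?:-(?P<day>0[1-9]|[12]\d|3[01]))?)?"
    r"-(?P<rest>[a-z0-9]+(?:-[a-z0-9]+)*)"  # surname, then optional description/count
    r"\.(?P<ext>[a-z0-9]+)"
)

m = FILENAME.fullmatch("2025-01-01-gwern-gpt4o-frogmeme-description.png")
print(m.group("year"), m.group("month"), m.group("day"), m.group("rest"))
# 2025 01 01 gwern-gpt4o-frogmeme-description
```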

File formats should be on the file type whitelist. We are conservative in allowed file formats; images should be JPG/PNG, avoiding WebP/AVIF (see Image.hs); documents should be PDF and not DjVu; archives should be XZ-compressed tarballs, etc. Large files (>250MB) are specially supported. All file formats should have a link-icon. (The link-icon test page doubles as the file type whitelist.)

Image files are compressed automatically.

Inline

  • Dashes: I require correct use—hyphens for regular spelling, en-dashes for ranges, em-dashes for comments. (Em-dashes are not space-separated.)

  • Dropcaps: dropcaps are chosen by topic (tests):

    • Dropcats: cat

    • Goudy: biological

    • Cheshire: literary

    • De-Zs: non-technical or general articles

    • Kanzlei: technical

    • yinit: highly technical

    • Gene Wolfe: Wolfe-fiction-related essays

    When a new essay is written and the dropcaps set picked, the list of current uses of that dropcaps set (stored in the dropcaps page) should be consulted, and the first paragraph (after the abstract) rewritten to try to use an unused dropcap letter. (Or if that turns out to be difficult because the unused letters are rare ones like ‘Z’, at least avoid the most overused letters.)

    Do not start pages with quotations or numbers if a dropcap is desired.

    Dropcaps are set on a page-level with dropcaps-$THEME (eg. css-extension: dropcaps-yinit), and on a per-block basis with dropcap-$THEME (eg. <div class="abstract-collapse dropcap-yinit">); the latter overrides the former. (Dropcaps are always available in both versions.)
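    Both the letter-picking advice and the override rule can be sketched in Python. This is illustrative only: the current uses would come from the dropcaps page, assumed here to be reduced to a plain list of letters, and both function names are hypothetical. Rare unused letters like ‘Z’ will rank first, so the editor still skips those as described above:

    ```python
    import string
    from collections import Counter


    def rank_letters(current_uses):
        """Letters of one dropcap set, least-used first (ties stay alphabetical)."""
        counts = Counter(letter.upper() for letter in current_uses)
        # sorted() is stable, so equally-used letters keep their A-Z order.
        return sorted(string.ascii_uppercase, key=lambda letter: counts[letter])


    def effective_dropcap(page_classes, block_classes):
        """A block's dropcap-$THEME overrides the page's dropcaps-$THEME."""
        for cls in block_classes:
            if cls.startswith("dropcap-"):
                return cls[len("dropcap-"):]
        for cls in page_classes:
            if cls.startswith("dropcaps-"):
                return cls[len("dropcaps-"):]
        return None


    print(rank_letters(["A", "A", "B", "T", "T", "T"])[:3])  # ['C', 'D', 'E']
    print(effective_dropcap(["dropcaps-yinit"], ["abstract-collapse", "dropcap-goudy"]))  # goudy
    ```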

  • Currency/Inflation: all currency amounts should be written with American decimal notation (eg. $1,234.56).

    Dollar/₿ prices or values should be inflation-adjusted if they reflect some real transaction or amount, rather than being placeholders, especially in any historical context. This includes amounts in quotes, as the original nominal amount is still available to the reader.

    If you buy something for ‘$1’ in 2026, it should be written [$1]($2026) to be adjusted for future inflation by Inflation.hs; whereas if it describes an economics or thought experiment and is just an arbitrary unit of value, it should be left as-is. (It would be confusing if 10 years from now, an essay asked you to imagine Omega flying up to you and offering you ‘$1.21’ if it correctly predicts your response…)

    Due to the extreme volatility, ₿ amounts must be written with a specific YYYY-MM-DD date like [₿1](₿2026-01-01). For details, see Inflation.hs/Config.Inflation.hs.
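    A Python sketch of the currency source conventions: American digit grouping, the [$amount]($year) and [₿amount](₿date) link forms, and a lint catching ₿ links without a full date. Inflation.hs does the actual adjustment; these helpers are hypothetical and only produce or check the Markdown syntax:

    ```python
    import re
    from datetime import date


    def usd(amount):
        """American decimal notation with digit-group commas, eg. $1,234.56."""
        return f"${amount:,.2f}"


    def inflation_link(amount, year):
        """Markdown for a real dollar amount from `year`, eg. [$1]($2026)."""
        return f"[${amount}](${year})"


    def bitcoin_link(amount, day):
        """₿ amounts need an exact date: fromisoformat raises ValueError on
        non-dates, and isoformat() normalizes to YYYY-MM-DD."""
        return f"[₿{amount}](₿{date.fromisoformat(day).isoformat()})"


    BITCOIN_LINK = re.compile(r"\[₿[^\]]*\]\(₿([^)]*)\)")


    def undated_bitcoin(text):
        """₿ links whose target is not a full YYYY-MM-DD date."""
        return [m.group(0) for m in BITCOIN_LINK.finditer(text)
                if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", m.group(1))]


    print(usd(1234.56), inflation_link(1, 2026), bitcoin_link(1, "2026-01-01"))
    # $1,234.56 [$1]($2026) [₿1](₿2026-01-01)
    ```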

  • Exclamation: instead of writing ‘?​!’, I condense to an interrobang.2

  • Links: the first instance of any term or citation should be hyperlinked.

    All later uses should be unlinked, or they should link to the first one’s anchor. (This is usually unnecessary, but in large essays or ones with relatively independent sections, this may be helpful.)

  • Lists: the start of a list item is capitalized. List items should begin with a label: a colon-separated keyword or phrase which can be emphasized.

    Inline lists can be comma-separated, or optionally use a BULLET ‘•’ point.

  • Margin notes: very short summaries.

    Our ‘margin notes’ are a custom kind of sidenote, which are typeset in the left margin without a number, or left italicized inline. (They are enabled in annotations as well as essays.) They summarize a paragraph, but are not used for asides or digressions like sidenotes/footnotes. (This is why they sit to the left of the paragraph they summarize.) If there is more than one margin note, they are also copied to an indented list at the beginning of the section to serve, Victorian-style, as a micro-table of contents.

    Conceptually, they are like a deeper level of headers; HTML headers only allow for 6 levels (h1–h6), and this is not always enough, especially if one wants to summarize each paragraph. (For example, if this were not a list item, the phrase “margin notes” would be an acceptable margin note.) So our span.marginnote class allows taking a 1–3 word phrase (longer typically looks unnatural), and setting it in the left margin or leaving it inline and italicized.

    If a section has only 1 paragraph (even if it is a long complex paragraph), a margin note should not be used, because the section header should already cover it.
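    The margin-note rules (1–3 words, none in a single-paragraph section, a micro-table of contents only when there is more than one) can be checked with a Python sketch. It assumes the Markdown form is a Pandoc bracketed span, [phrase]{.marginnote}, and that the input is one section’s body without its header; the real micro-table of contents is built by the site, not by this hypothetical helper:

    ```python
    import re

    MARGIN_NOTE = re.compile(r"\[([^\]]+)\]\{\.marginnote\}")


    def margin_note_report(section_body):
        """Return (micro-table-of-contents entries, warnings) for one section's body."""
        notes = MARGIN_NOTE.findall(section_body)
        paragraphs = [p for p in section_body.split("\n\n") if p.strip()]
        warnings = [f"longer than 3 words: {note!r}" for note in notes if len(note.split()) > 3]
        if notes and len(paragraphs) == 1:
            warnings.append("single-paragraph section: the header already covers it")
        toc = notes if len(notes) > 1 else []  # only copied when there is more than one
        return toc, warnings


    body = "[Background]{.marginnote} Para one.\n\n[Results]{.marginnote} Para two."
    print(margin_note_report(body))  # (['Background', 'Results'], [])
    ```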

  • Math:

    • HTML inline math: inline equations are written using pure HTML/Unicode/CSS, as many math glyphs are already available in Unicode

      • LaTeX equations are auto-converted using the script latex2unicode.py, which has a comprehensive set of rules & examples.

        If latex2unicode.py refuses to convert an expression, then it probably should not be converted. Complex block equations, particularly horizontal lines, are not currently supported (but may be possible at some point using raw HTML <table> with rowspan/colspan, eg.).

      • simple Markdown super/subscripts: use the standard Pandoc ^superscript^/~subscript~ syntax;

      • complex HTML super/subscripts: like a superscripted variable over a subscripted variable, can be done in custom CSS.

        I defined a span.subsup which does this. To use that, simply write <span class="subsup"><sub>Bottom</sub><sup>Top</sup></span>. (We write the subscript first to reduce the risk of Pandoc misinterpreting it as a footnote.)

      • multiplication: use MULTIPLICATION SIGN ‘×’ for multiplication in arithmetic; MIDDLE DOT ‘·’ for contexts where there may be an x variable (eg. not 𝒪(n × log n) but 𝒪(n · log n)).

      • division: use FRACTION SLASH ‘⁄’ for compact vulgar fractions (eg. “7/11 is a food chain” vs “7⁄11 vaccinated mice survived”)

      • Logotypes: use LaTeX/TeX logotypes (written as <span class="logotype-latex">L<span class="logotype-latex-a">a</span>T<span class="logotype-latex-e">e</span>X</span>/<span class="logotype-tex">T<sub>e</sub>X</span>). Compounds are written the obvious way.

  • Ordinals: the suffix (eg. ‘th’, ‘nd’) is superscripted (not “1st” but 1<sup>st</sup> or 1^st^)

    Note that some uses of ordinals could be replaced by our ‘progress indicators’, but they are difficult to write by hand and generally better left to infrastructure.
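    Superscripting ordinal suffixes by hand is tedious but mechanical; a Python sketch emitting the Pandoc ^…^ form (the helper name is hypothetical). Already-superscripted ordinals are left alone, because the ^ separates the digits from the suffix:

    ```python
    import re

    # A number followed directly by an ordinal suffix, as a whole word.
    ORDINAL = re.compile(r"\b(\d+)(st|nd|rd|th)\b")


    def superscript_ordinals(text):
        return ORDINAL.sub(r"\1^\2^", text)


    print(superscript_ordinals("the 1st and 22nd of May"))  # the 1^st^ and 22^nd^ of May
    ```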

  • Smallcaps: smallcaps are written using a span.smallcaps class (similar to Pandoc’s default CSS). We prefer Markdown syntax where possible, like [Smallcaps]{.smallcaps}. (Note that smallcaps requires some lowercase letters or else it is pointless, as uppercase smallcaps = uppercase, and so one should never smallcaps an all-uppercase string.)

    They are used as the third level of emphasis, and in the site UI (eg. styling some levels of headers).

    The first line of the first paragraph of an essay is set in smallcaps for style; if this is undesirable, it can be explicitly disabled using .smallcaps-not.
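    The “never smallcaps an all-uppercase string” rule is easy to lint. A Python sketch (hypothetical helper; it only recognizes the bracketed-span form with a single .smallcaps class):

    ```python
    import re

    SMALLCAPS = re.compile(r"\[([^\]]+)\]\{\.smallcaps\}")


    def pointless_smallcaps(markdown):
        """Smallcaps spans with no lowercase letters, which render the same without the class."""
        return [m.group(1) for m in SMALLCAPS.finditer(markdown)
                if not any(ch.islower() for ch in m.group(1))]


    print(pointless_smallcaps("[NASA]{.smallcaps} vs [Smallcaps]{.smallcaps}"))  # ['NASA']
    ```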

  • Wikipedia links: should be written using the Interwiki.hs !W shortcut syntax: ie. not [George Washington](https://en.wikipedia.org/wiki/George_Washington) but [George Washington](!W) or <a href="!W">George Washington</a>.

    The WP article is inferred from the anchor text, and is overridden in Markdown/HTML by specifying the link title instead, like [President Washington](!W "George Washington"). Never repeat the target or use a redundant full WP URL (ie. do not write [George Washington](!W "George Washington") or [George Washington](https://en.wikipedia.org/wiki/George_Washington) but just [George Washington](!W)). The target may or may not be URL-encoded. (Only a few interwiki targets beyond English Wikipedia are supported: !Hackage, !Hawiki, !Hoogle, !Wikiquote, !Wiktionary. All other wikis or Wikipedias must be linked normally.)

    The frontend JS code automatically handles annotations for WP articles by calling the WP API.
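    A Python sketch of how a !W link could be expanded, and how the redundant-title mistake could be linted. It models the rules above and is not Interwiki.hs: real Wikipedia title handling (first-letter capitalization, redirects, the other interwiki prefixes) is ignored. Titles are unquoted before quoting, so an already URL-encoded target is not double-encoded:

    ```python
    import re
    from urllib.parse import quote, unquote

    WIKIPEDIA = "https://en.wikipedia.org/wiki/"
    W_LINK = re.compile(r'\[([^\]]+)\]\(!W(?: "([^"]+)")?\)')


    def expand_w(markdown):
        """Turn [text](!W) and [text](!W "Article") into full English Wikipedia links."""
        def to_url(m):
            article = unquote(m.group(2) or m.group(1))  # the title overrides the anchor text
            return f"[{m.group(1)}]({WIKIPEDIA}{quote(article.replace(' ', '_'))})"
        return W_LINK.sub(to_url, markdown)


    def redundant_w(markdown):
        """Links which wrongly repeat the anchor text as the title."""
        return [m.group(1) for m in W_LINK.finditer(markdown) if m.group(2) == m.group(1)]


    print(expand_w('[President Washington](!W "George Washington")'))
    # [President Washington](https://en.wikipedia.org/wiki/George_Washington)
    ```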

  • Poetry: inline poetry is formatted using slashes and the span.poem class. See § Poetry for details.

    Note that poetry is permitted to violate all regular style rules if necessary, as long as the violations are documented as intentional & whitelisted in a comment.

  • Annotations:

    You cannot link an ID inside an annotation. If you need granular addressing of an annotation, see the annotation anchor trick.

    The annotation anchor trick: An annotation design pattern is to provide multiple single-topic annotations for the same URL, rather than one large multi-topic annotation. For example, a PDF could be annotated repeatedly, with a different page each time, like /doc/foo.pdf vs /doc/foo.pdf#page=10 vs /doc/foo.pdf#page=15 (adding a section title to disambiguate, like “Title” vs “Title § Methods” vs “Title § Conclusions”); web pages likewise can be annotated using different anchors. The annotations can of course link each other, or transclude each other (using .include-annotation-core to avoid repeating the metadata header), possibly in collapses, so one could have a ‘master’ annotation of the naked URL and then transclude in 3 sub-annotations, as it were (eg. Raphelson1980).

    Sometimes there is not a useful ID already; for documents hosted on Gwern.net, they can just be edited to include anchor IDs as necessary, but for other URLs (eg. web pages completely devoid of IDs), we can simply create a fake anchor ID for each separate topic, and annotate those. (This may create false positives when checking links, but oh well.)

Block

  • Admonitions (demos): admonitions (sometimes ‘callouts’ or ‘pullquotes’) are intended for warnings or alerts where simply bolding some text won’t do. They are a div.admonition [tip/note/warning/error] wrapper around a <p> and possibly a div.admonition-title.

    Fully-written-out example (avoiding ‘native’ Pandoc div syntax as usual):

    <div class="admonition tip">
       <div class="admonition-title"><p>Tip Title</p></div>
    
       <p>Tip.</p>
    </div>
  • Epigraphs: are a div.epigraph wrapper around a blockquote.

    The blockquote is italicized; the text is not normally wrapped in double-quotation marks (unless a dialogue) because that would be redundant with the fancy CSS ‘quotes’ around the epigraph as a whole. The optional final paragraph is roman, and is usually the attribution of the quote; this is denoted by an em dash. Example:

    <div class="epigraph">
    > Fourscore and 7 years ago...
    >
    > ---Abraham Lincoln (1863)
    </div>

    The attribution is usually the author, full name or surname, and then a source & year in parentheses. But this can vary for effect—sometimes it will be funnier to attribute it to a character instead (in which case I put the author in the parentheses).

  • Footnotes/Sidenotes: the same thing, displayed as footnotes or as sidenotes depending on screen width (responsive design). They use standard Pandoc Markdown footnote syntax, like [^id]: Content. or ^[Content.].

    The ID should be usable as a descriptive human-readable HTML ID—a lowercase alphanumeric hyphen-separated phrase similar to the URL slugs, and should meaningfully summarize the context. (They should not be mere numbers.)

    They are not used for simple citations, as that is better handled by linking a fulltext URL + annotation. They are used for detailed citations (eg. translation), multiple citations, complex citations like excerpts, and digressions or tangents. Length-wise, they should be less than 200 words; anything longer is better refactored into something else (an annotation, a collapse, an appendix…).

    Block footnotes, where the footnote body is defined separately, are usually located immediately after the Markdown element they are used in, for easier editing. (They are not grouped at the end of the Markdown document.) If at the end of a sentence, they are placed after sentence-ending punctuation and not before.

    Footnotes are sometimes worth linking. Unfortunately, Pandoc chooses not to use the ID to define a linkable anchor, due to concerns about creating collisions. Instead, they can be linked by defining an empty span with the ID, like []{#descriptive-id}. These IDs, like all other IDs, must be unique.
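    The footnote-ID rule can be linted. A Python sketch (hypothetical helper) that checks only block-footnote definitions, since inline ^[…] footnotes have no ID; the regex for a “lowercase alphanumeric hyphen-separated phrase” is an assumption mirroring the slug rule:

    ```python
    import re

    DEFINITION = re.compile(r"^[ \t]*\[\^([^\]]+)\]:", re.MULTILINE)  # block footnote bodies
    GOOD_ID = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


    def bad_footnote_ids(markdown):
        """Block-footnote IDs which are not lowercase hyphenated phrases, or are mere numbers."""
        return [fid for fid in DEFINITION.findall(markdown)
                if not GOOD_ID.fullmatch(fid) or fid.replace("-", "").isdigit()]


    sample = "[^1]: A.\n[^translation-note]: B.\n[^Bad_ID]: C."
    print(bad_footnote_ids(sample))  # ['1', 'Bad_ID']
    ```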

  • Paragraphs: relatively long paragraphs are preferred compared to the usual Internet social media/blog writing style of 1 sentence per paragraph.

  • Lists:

    • Ordered vs unordered: if a list might be referred to by number/position, then it should be ordered. If not, it should be unordered.

      However, even ‘unordered’ lists should be as ordered (or ‘seriated’) as possible. There may not be a canonical ordering, but almost all lists can be put into some order more meaningful than a randomized shuffle: similarity, descending order of importance, or even just alphabetically!

    • Columns: if a list is composed of >6 items which are ‘short’ (maybe <30 characters), then it is a good candidate for formatting as a multi-column list which will wrap as 2 columns. This is a div.columns wrapper.
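      The column heuristic and the wrapper can be sketched in Python; the thresholds are the rough ones given above, and the helper names are hypothetical. The wrapper is written as a raw-HTML div, following the document’s habit of avoiding ‘native’ Pandoc div syntax:

      ```python
      def wants_columns(items, min_items=7, max_chars=30):
          """Heuristic from the rule above: more than 6 items, each under ~30 characters."""
          return len(items) >= min_items and all(len(item) < max_chars for item in items)


      def columns_block(items):
          """Wrap a Markdown list in a raw-HTML div.columns."""
          return '<div class="columns">\n' + "\n".join(f"- {item}" for item in items) + "\n</div>"


      days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
      if wants_columns(days):
          print(columns_block(days))
      ```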

  • Emphasis: nesting level, especially in unordered lists where keywords or phrases will be emphasized, is indicated by a 3-cycle: strong → italics → smallcaps → strong (repeating indefinitely)

    I prefer to use Markdown bold & italic syntax.
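    The 3-cycle maps directly onto the nesting depth modulo 3. A Python sketch producing the corresponding Markdown (the helper is hypothetical; smallcaps uses the bracketed-span form described under Smallcaps):

    ```python
    STYLES = ("strong", "italics", "smallcaps")  # the 3-cycle, repeating indefinitely


    def emphasize(text, depth):
        """Markdown for emphasis at nesting depth 0, 1, 2, ..."""
        style = STYLES[depth % len(STYLES)]
        if style == "strong":
            return f"**{text}**"
        if style == "italics":
            return f"*{text}*"
        return f"[{text}]{{.smallcaps}}"


    print([emphasize("Label", d) for d in range(4)])
    # ['**Label**', '*Label*', '[Label]{.smallcaps}', '**Label**']
    ```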

  • Abstracts: an abstract is a div.abstract which contains a blockquote summarizing a section or a page. There may be multiple abstracts on a page, especially for appendixes.

    All abstracts should be broken up into multiple paragraphs. They should try to follow the standard scientific structure of ‘background, data, methods, results, conclusion’. (paragraphizer.py attempts to do this automatically using an LLM.) They may contain additional block elements like lists or nested blockquotes or admonitions.

    Essay abstracts power the annotation of their URL, as they get scraped monthly and turned into an annotation. If this is undesirable, set .scrape-abstract-not on the abstract.

  • Images: Use <figure> elements.

    • caption format: start with a bold ‘Figure’, then the 1-sentence summary is italicized, and a linebreak (a <br>)3 separates the detailed description. The detailed description has parenthetical labels A–Z, which are italicized. (And if further emphasis is necessary, then, following the bold/italic/smallcaps convention, smallcaps is used.) So a Markdown caption of a paper’s “Figure 1” might go like this: **Figure 1**: *One-sentence summary.*<br> (*A*) Description of the first panel. (*B*) Description of the second panel.

      Since many image captions are copied from a document and are not necessarily what I would have written, it can be a good idea to include a title attribute describing the image. (Both caption and title will be displayed by image-focus.js when the reader zooms in on an image, so neither is wasted.)

      Figures are not usually named or numbered in essays, and so do not need a **Figure N** prefix.

    • layout: images can be laid out with .width-full, .float-right, and .float-left. They can also be collapsed.

      Full-width images are useful for decorative illustrations, or highly-detailed images (eg. scientific paper figures might have 10 figures packed into a single one).

      Floating is useful for smaller images; usually, in accordance with the left-to-right pattern, images will be floated-right. If there are multiple images, they may zig-zag right/left/right to avoid ‘stacking up’.

    • Dark-mode: inversion during dark-mode is controlled by InvertOrNot.com by default, but it can be overridden by specifying .invert (eg. black-on-white line art) vs .invert-not (eg. photographs, color art).

      (Time permitting, explicitly mark images with .invert/.invert-not, as this is more reliable & saves network requests/latency.)

    • Borders: Outlined by default.

      They can be manually outlined or not outlined using .outline/.outline-not (which can be important in dark-mode or for .width-full decorative images).

    • Navigation: images can be click-to-zoom and automatically viewed in a ‘carousel’ by image-focus.js; no additional metadata is necessary.

  • Video: Pandoc does not have any video syntax, so videos must be written in raw HTML, escaped using Pandoc’s raw_attribute triple-backtick syntax.

    Videos should usually avoid looping (unless clearly ‘GIF-like’), autoplaying, or loading the entire video (ie. default to preload="none"), and should provide controls; allowed video formats are MP4/WebM. They should include the width, height, and aspect ratio (the last in a custom data-aspect-ratio attribute), and a caption.

    An example video of a statistical visualization, with looping enabled to help the viewer see the evolution from start to finish:

    ```{=HTML}
    <figure>
        <video controls="controls" preload="none" loop height="1080" width="1920" data-aspect-ratio="16 / 9">
            <source src="/doc/tea/gwern-tea-mineralwaters-bestarm-sequential.mp4" type="video/mp4">
        </video>
        <figcaption>Animation of mineral water taste-test showing how the posterior distributions evolve over <em>n</em> = 7 to <em>n</em> = 67, guided by Bayesian best arm sampling. MP4 testcase.</figcaption>
    </figure>
    ```
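    The attribute bookkeeping, especially reducing the dimensions to the data-aspect-ratio value, is easy to get wrong by hand. A Python sketch emitting the same shape of markup as the example above; the helper and its parameters are hypothetical, and the caption is passed through as raw HTML:

    ```python
    from math import gcd
    from pathlib import PurePosixPath

    VIDEO_TYPES = {".mp4": "video/mp4", ".webm": "video/webm"}  # the 2 allowed formats


    def aspect_ratio(width, height):
        """Reduce width × height to lowest terms, eg. 1920 × 1080 -> '16 / 9'."""
        d = gcd(width, height)
        return f"{width // d} / {height // d}"


    def video_figure(src, width, height, caption, loop=False):
        """Raw-HTML <figure> with the defaults above: controls, preload="none", no autoplay."""
        mime = VIDEO_TYPES[PurePosixPath(src).suffix.lower()]  # KeyError for other formats
        loop_attr = " loop" if loop else ""
        return (
            "<figure>\n"
            f'    <video controls="controls" preload="none"{loop_attr} height="{height}" '
            f'width="{width}" data-aspect-ratio="{aspect_ratio(width, height)}">\n'
            f'        <source src="{src}" type="{mime}">\n'
            "    </video>\n"
            f"    <figcaption>{caption}</figcaption>\n"
            "</figure>"
        )


    print(aspect_ratio(1920, 1080))  # 16 / 9
    ```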
  • Interview: interviews are formatted specially to vertically align the speakers and indent the responses, and group the conversation by topic.

    They are div.interview wrappers, containing unordered lists of bold speaker-name / colon / quote (not necessarily strict Q&A), where <hr> horizontal rulers separate ‘topics’. Example:

    - **A**: Question 1?
    - **B**: Answer 1.
    - **A**: Commentary.
    
    ​-​--
    
    - **A**: Question 2?
    - **B**: Answer 2.
  • GTX metadata databases: annotations & metadata are stored in a custom line-delimited file format called GTX, which avoids drawbacks of YAML/JSON for writing many complicated HTML snippets.

    See GTX.hs for a detailed description of the syntax and the design rationale.

    GTXs are split by level of quality, for easier editing/revision-control: me.gtx (Gwern-written essays etc.), full.gtx (hand-curated annotations), half.gtx (mix of edited & automatically-generated), auto.gtx (fully automatically generated).

  • Math: complex block equations are written in LaTeX and typeset by MathJax. They are written in the normal Pandoc $$block equation$$/$inline equation$ syntax.

    In some cases, a block equation may, like many inline math equations, be feasible using pure HTML/Unicode/CSS; if it is (ie. the latex2unicode.py script can handle it and it is not a complex nested fraction, integral, matrix etc.), it should be done that way, as the pure approach has several advantages: it can eliminate the need to load heavyweight MathJax CSS/fonts, looks more natural, reduces the risk of long-term bitrot, is more searchable etc.

  • Tables: Pandoc supports several kinds of Markdown tables. I usually use either ‘simple’ tables, or pipe tables; ‘grid tables’ having proven to be more trouble than they are worth despite an Emacs mode. (Generally, if one finds oneself rejiggering the whitespace in a simple table, it is time to move to a pipe table.)

    Ideally, all tables would be recreated in pure Markdown, but that is often too much work, or infeasible in the case of complex tables which may split columns etc. (eg. 1, 2); screenshots are permitted.

    Tables are usually given titles/captions; the Pandoc syntax is a blank line then “Table: …”. Table captions are full sentences with normal formatting. (We do not ever write <figcaption> captions in Markdown.)

    Layout: Pandoc supports simple table layout control like the relative widths of columns (# of hyphens in column header), and left/right/center alignment of each column (colon on left/right/both sides in header).

    We provide additional control: much like figure images, tables can be floated left/right, or made full-width; they can also be compacted with .table-small. Particularly in annotations, for a very small table, like a 2×2 table, it is good to use both and wrap it in a <div class="float-right table-small">. Example:

    <div class="float-right table-small">
    | 1 | 2 |
    |---|---|
    | 3 | 4 |
    | 5 | 6 |
    
    Table: A small 'inline' table written as a demo for the Style Guide.
    </div>

    Tables are zebra-striped by custom CSS, and sorting is done with tablesorter.js; sorting is disabled using .table-sort-not. They are generally not otherwise styled.

    Tables can be collapsed; the collapse will show the first few lines. If one wants to provide a summary or key lines, one can use .abstract-collapse-only.

  • Horizontal ruler: can be considered an “anonymous section”, for when we don’t want to write a title & add another Table of Contents entry, or when we have already gone uncomfortably deep, like 7 levels deep.

Poetry

Poems on Gwern.net are not formatted using block or code block tags, but using custom CSS classes set on div/span/pre tags. (For background on the design rationale, see the detailed “Poetry HTML Typesetting” writeup.)

Mobile does not receive any special treatment: mobile poems are rendered just like desktop poems, albeit with a narrow window.

Inline Poetry

Inline poetry is put into a span.poem.

This is usually used for quotations (eg. the famous poem “Roses are red / Violets are blue”).

They will be rendered in a different font (Nimbus Mono L), and the forward-slashes will be subtly faded out for readability.

Block Poetry

Simple Block Poetry

Our block poetry typography attempts to replicate traditional English poetry typography: poetry is rendered in a monospace serif font to preserve spacing alignment; paragraphs are rendered without indentation at the beginning; and if a line must be line-broken, the broken part is indented by the JS+CSS. (Smallcaps on initial lines are disabled if it’s a poem.)

Block poems are wrapped in a div.poem; linebreaks are denoted using a backslash (which turns into a <br> element), and full stanzas are separated by a single blank line. Large ‘separators’ can be written as horizontal rulers (and are not treated specially inside a div.poem). Example:

<div class="poem">
**_Dear Santa_**, pray accept this urgent plea \
And burn your police reports regarding me. \
I write to clear my name of wicked lies, \
That cast me as a fiend in festive guise!
</div>

Backslashes must end their line

Pandoc Markdown only treats a backslash as the newline if it is the last character on the line. So lines which are annotated with comments must put all HTML comments before the backslash.
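A lint for this pitfall, sketched in Python (a hypothetical helper): it flags poem lines where a comment follows the backslash, or where trailing whitespace follows it, since in either case the backslash is no longer the last character and Pandoc will not produce a line break:

```python
import re

COMMENT_AFTER_BREAK = re.compile(r"\\\s+<!--")


def misplaced_breaks(poem):
    """1-based numbers of lines whose backslash is not the last character."""
    bad = []
    for number, line in enumerate(poem.splitlines(), start=1):
        stripped = line.rstrip()
        comment_after = COMMENT_AFTER_BREAK.search(line) and not stripped.endswith("\\")
        trailing_space = stripped.endswith("\\") and stripped != line
        if comment_after or trailing_space:
            bad.append(number)
    return bad


poem = ("Roses are red \\ <!-- wrong: comment after the backslash -->\n"
        "Violets are blue <!-- right: comment before the backslash --> \\\n"
        "And I love you.")
print(misplaced_breaks(poem))  # [1]
```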

If a block poem’s line includes a space-separated forward slash (" / ")4, that is interpreted as an indented linebreak where the start of the next line is aligned vertically with the end of the previous line.

This is useful for representing caesuras, enjambments, half-lines, etc. An example:

<div class="poem">
So much / depends upon
...
</div>

Will render like this:

So much
        depends upon

If a line includes multiple forward slashes, then they are repeatedly indented and look like a ‘staircase’, like this:

<div class="poem">
For we can always see and feel much that the people in old photos and newsreels could not:

that their clothing and automobiles were old-fashioned, \
that their landscape lacked skyscrapers and other contemporary buildings, \
that their world was black / and white / and haunting / and gone.
</div>

Will render like this:

For we can always see and feel much that the people in old photos and newsreels could not:

that their clothing and automobiles were old-fashioned,
that their landscape lacked skyscrapers and other contemporary buildings,
that their world was black
                           and white
                                     and haunting
                                                  and gone.
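The indentation rule can be modeled in plain monospace text: each segment after a " / " starts one column past where the previous segment ended. A Python sketch of that model (a hypothetical helper; the real layout is done by the site’s JS+CSS, not by inserting spaces):

```python
def render_staircase(line):
    """Plain-text model of ' / ' half-line breaks in a div.poem line."""
    rendered, indent = [], 0
    for segment in line.split(" / "):
        rendered.append(" " * indent + segment)
        indent += len(segment) + 1  # the next segment starts just past this one
    return "\n".join(rendered)


print(render_staircase("So much / depends upon"))
```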

For half-lines which are not meant to linebreak, such as many alliterative verse notations, the caesura mark is represented by a space-separated double-pipe (" || "); the caesura mark is, similar to the newline separator, faded to reduce its intrusiveness. (As a special-case, if div.text-center is used, the caesura marks will be vertically aligned.) The forward-slash linebreak and caesura mark can be used in the same line. Normal example:

<div class="poem">
> In they hacked them, || out they hurled them, \
> bears assailing, || boars defending. \
> Stones and stairways || streamed and darkened; \
> day came dimly— || the doors were held.
</div>

Special case illustrating vertical alignment:

Song is the gift || we give them back.
We crown with cadence || what we cannot keep.
The pact holds; || we pay in gold.
Count is a kind || of cold keeping.
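For the div.text-center special case, vertically aligning the caesuras amounts to padding each first half-line to a common width. A plain-text Python sketch, assuming every line contains exactly one " || " (the site achieves this with CSS, not by inserting spaces):

```python
def align_caesuras(lines):
    """Right-align first half-lines so every ' || ' falls in the same column."""
    halves = [line.split(" || ", 1) for line in lines]
    width = max(len(first) for first, _ in halves)
    return [first.rjust(width) + " || " + second for first, second in halves]


verse = ["Song is the gift || we give them back.",
         "We crown with cadence || what we cannot keep."]
for line in align_caesuras(verse):
    print(line)
```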

A div.poem may also be placed inside a blockquote when it is not a standalone poem page, such as a quotation (it doesn’t matter whether the div is inside the blockquote or not). Prefix lines with > only when you’re already inside a blockquote / footnote / list indentation; otherwise omit. A simple example:

> <div class="poem">
> There are strange things done in the midnight sun \
> By the men who moil for gold; \
> The Arctic trails have their secret tales \
> That would make your blood run cold; \
> The Northern Lights have seen queer sights, \
> But the queerest they ever did see \
> Was that night on the marge of Lake Lebarge \
> I cremated Sam McGee.
> </div>

.poem is compatible with epigraphs (.epigraph). So this will work (complicated example):

<div class="epigraph poem" id="example-poem-2">
> Roses are red <!-- 4: RO1-ses2 are3 RED4. [red/—] --> \
> Violets are blue <!-- 5: VI1-o2-lets3 are4 BLUE5. [blue/you] --> \
> And I love you. <!-- 4: and1 I2 LOVE3 YOU4. [blue/you] --> <!-- NOTE: foo -->
>
> ---[Anonymous](!W "Roses Are Red")
</div>

Complex Block Poetry

Some poetry requires unusual white-space, like concrete poetry or calligrams. For poems where exact whitespace is required, it can be written in raw HTML using a <pre class="poem-html">.

These HTML blocks must be escaped completely from Pandoc processing, which would otherwise eat all whitespace (even within HTML elements). Escape them with the Pandoc raw_attribute extension (ie. enclose them in a triple-backtick block marked {=HTML}, as below). They must be written entirely in raw HTML; no Markdown interpretation will happen.

When rendered by the client-side JS, the horizontal space taken up by tags like <em> or <a> is replaced by whitespace, to make authoring easier and closer to WYSIWYG. So for example, in order to line up an italicized word underneath another word, the two words must line up vertically (aside from the <em> tags); eg.:

```{=HTML}
<!-- NOTE: `pre.poem-html` to force exact vertical alignment rather than enjambment: -->
<pre class="poem-html" id="complex-block-poetry-example">
Their hands remembered stillness in the air,
                                    <em>care</em>
...
</pre>
```

Will render with “care” directly below “air”:

Their hands remembered stillness in the air,
                                    care
...
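To preview a pre.poem-html block’s alignment before building the site, the rule described above can be mimicked in Bash: replace every tag with the same number of spaces, so each word lands in the column it should occupy when rendered. This is a minimal sketch that assumes only the stated behavior. preview_poem_html is a hypothetical helper, not the client-side JS and not part of bash.sh. It also ignores HTML entities like &amp;, which render narrower than their source.

```bash
## Hypothetical helper: approximate how `pre.poem-html` renders by replacing each
## HTML tag with the same number of spaces, so that words keep their source columns.
## Assumes ASCII tags; HTML entities (eg. `&amp;`) are NOT handled.
preview_poem_html() {
    local line tag pad re='<[^>]*>'
    while IFS= read -r line; do
        # Blank out the leftmost remaining tag until no tags are left.
        while [[ $line =~ $re ]]; do
            tag="${BASH_REMATCH[0]}"
            printf -v pad '%*s' "${#tag}" ''
            line="${line/"$tag"/$pad}"
        done
        printf '%s\n' "$line"
    done
}
```

Piping the example’s two lines through it should show “care” starting in the same column as “air”.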

So for whitespace overall:

  1. Use \ for normal lineation.

  2. Use " / " when you want a visual half-line break with indentation.

  3. Use pre.poem-html only when you need exact columns/geometry.

Miscellaneous Poetry Formatting

  • Scansion/prosody metadata: where feasible, poems should provide commented-out versions of lines which annotate their key metrical properties like rhythm, syllable count, or rhymes (either the rhyme-word or its line). As meters can vary greatly, the scansion line should be tailored to the poem, and preferably documented at the top in a comment. Some examples:

    • Pressure-cooker Pindaric: <!-- [Stress Count]: [SCANSION CAPS] || [SCANSION CAPS] [scheme/notes] -->

      Example: Gold is the wrought word, || god-gift to the world; <!-- 3+3: GOLD WROUGHT WORD || GOD-GIFT WORLD [g-g]; Apollo ref -->

    • Rhyming poetry: <!-- [Syllable count]: A1-b2 c3 D4 e5 f6 GI7-h8, i9 J10 k11 l12 M13. [A-rhyme/B-rhyme] -->

      Examples: Checking for ghosts with his rifle, he paced through the night, <!-- 13: CHECK1-ing2 for3 GHOSTS4 with5 his6 RI7-fle8, he9 PACED10 through11 the12 NIGHT13. [night/tight] -->

      before dawn, <!-- 3: be1-FORE2 DAWN3 [dawn/dawn@46] --> (to highlight a repetition on line 46)

      In free verse, one might write [-/-] explicitly to denote that rhyme is irrelevant or not tracked. These annotations are useful for safe revisions and LLM documentation.

    An example which tracks the running count of syllables, the stress, and the end-rhymes: <!-- 10: this1 LAST2 PAIN3 for4 the5 DAMNED6 the7 FA8-thers9 FOUND10. [found/crowned] -->

    What is worth tracking will depend on the exact poem (eg. a haiku doesn’t need to track rhyme).

    (Unusual wording choices or deliberate typos etc. should also be documented inside HTML comments on the same line.)

  • Commentary: ideally, every full poem will also include a detailed commentary (hidden in HTML comments) explaining the background, themes, formal meter, allusions and related poems/poets/schools, and interpretation of the poem.

    The ‘LLM meta-block’ is a short handoff note for editors; it is intended for LLM-editing-heavy pages, and doesn’t apply to handwritten or historical pages.

    Poetry/fiction may additionally include a longer ‘commentary’ HTML comment block (unbounded length) for scansion, allusions, revision rationale, etc. The commentary block is not subject to the ≤10-line limit.

    This should be placed in between the YAML metadata & abstract, for human/LLM readability.

  • Custom horizontal rulers: horizontal rulers are sometimes useful to separate stanzas or sections without section headers, and are usually written ---.

    By default, Gwern.net horizontal rulers cycle through a trio of icons: Vergina sun → arabesque moon → Source-Serif-Pro stars. A specific icon can be hardwired for thematic purposes, eg. a sun with <div class="horizontal-rule-nth-1"><hr></div>, as in “Silver Bird”. (Warning: in Markdown, escape the raw HTML using the backtick syntax!) The ruler icon type & rationale should be documented in a HTML comment.

  • Page-level appearance tweaks: poetry pages benefit from setting several page-level CSS classes, which generally reduce clutter and can help set the right mood.

    • .toc-not: the Table of Contents is usually useless, and sometimes overlaps text due to subtle, unfixable CSS bugs

    • .reader-mode: mostly to suppress hyperlink decoration in the poem text (eg. from commentary/annotation links)

    • .dark-mode: forces page into dark mode

    • .index: simplifies overall page appearance

    • .dropcap-not: explicitly documents that no dropcaps are used, as they make poems much harder to read and are non-traditional. (See also .smallcaps-not.)

    • .extract-not: for pages which should not pop up at all, because a popup would be a poor reading experience (eg. heavily hyperlinked .reader-mode pages or concrete poetry)

    • div/span.reader-mode-not: use this on abstracts or descriptions to show content in popups/previews but hide it on the (reader-mode) page itself, and span.reader-mode-disable-when-here to disable reader-mode at a certain point (eg. the end).

      Useful for artistic pages like poem pages, where we can enable reader-mode globally, then hide the abstract (while still keeping it for popups/similar-links extraction).

  • Section headers: usually deprecated as unnecessary, with simple bolding preferred (eg. <p><strong>1. Section Title</strong></p>), but may be acceptable as long as .toc-not is used.

    • Multi-part poems: prefer a single div.poem containing the entire poem. Mark internal panels/sections with standalone bold lines (ie. **1. Title**) rather than Markdown headers, to avoid ToC pollution and preserve poem typography. Use div.text-center stage directions only when the poem is split into multiple .poem blocks.

  • HTML IDs: note that all the elements support the usual HTML capabilities like an ID, so poems can be directly linked (and popped up) if you set an ID on them—including inline span quotes. (This can be especially useful for detailed commentary where we might want to pop up a specific phrase in a poem.)

  • Collapse compatibility: all .poem elements are compatible with .collapse, so one can easily hide selected parts of a poem, or hide an entire poem (perhaps because it’s a variant)

  • Illustrations: To avoid distracting from the poem, illustrations are usually appended if not directly illustrating a specific stanza. They should be high-quality enough to be worth showing full-width, like .width-full .invert-not; but they do not necessarily need any title or alt-text or caption, as they usually exist just for pretty.

    If there is more than one, they can be randomized to show a single one on each page load, to avoid overload; eg.

    <div class="display-random-1">
      <div class="display-entry">
      ![](…){.width-full .invert-not …} <!-- first illustration -->
      </div>
    
    </div>
  • Acknowledgments, topics, commentary, section headers, attribution lines etc.: Can be presented outside of the .poem using div.text-center (eg.).

    These are distinct from epigraphs: no quote styling, no attribution parenthetical, just a single centered line (optionally a blockquote for consistent spacing).

  • Inter-poem headings / stage directions: use div.text-center between multiple .poem blocks to label place/time/voice shifts.

  • Variant sets or lists: use an ordinary numbered list (eg.)

  • Colophon, discussion: Can be appended and collapsed by default (eg.). The general formatting is:

    1. div.collapse;

    2. span/div.abstract-collapse editorial; and

    3. a margin-note label like [Colophon.]{.marginnote .abstract-collapse}, as follows:

      <div class="collapse editorial">
      [Colophon.]{.marginnote .abstract-collapse} By …
      
      Full colophon text…
      </div>
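Because the rhyming-poetry scansion comments described above number every syllable, their declared counts can be checked mechanically, which helps keep them accurate through revisions. This is a minimal sketch. check_scansion is a hypothetical lint, not an existing Gwern.net tool. It compares a comment’s leading syllable count against the index of its last numbered syllable. Lines using other schemes (eg. the Pindaric 3+3 stress counts) are skipped rather than checked.

```bash
## Hypothetical lint for rhyming-poetry scansion comments such as
##   <!-- 13: CHECK1-ing2 for3 ... NIGHT13. [night/tight] -->
## Succeeds if the declared count equals the last numbered syllable; lines without a
## plain `N:` scansion comment (eg. Pindaric `3+3:`) are ignored.
check_scansion() {
    local line="$1" re='<!-- ([0-9]+): ([^>]*)-->' tok_re='[[:alpha:]]+([0-9]+)'
    local declared rest last=''
    [[ $line =~ $re ]] || return 0   # nothing to check on this line
    declared="${BASH_REMATCH[1]}"
    rest="${BASH_REMATCH[2]}"
    # Walk the syllable tokens (letters immediately followed by their running index,
    # eg. `RI7`), keeping the last index; rhyme notes like `[dawn/dawn@46]` never match.
    while [[ $rest =~ $tok_re ]]; do
        last="${BASH_REMATCH[1]}"
        rest="${rest#*"${BASH_REMATCH[0]}"}"
    done
    if [[ "$last" != "$declared" ]]; then
        echo "Scansion mismatch (declared $declared, counted ${last:-none}): $line" >&2
        return 1
    fi
}
```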

Annotations

Annotations are typically the abstract of a paper, followed by excerpts. They sometimes have extensive commentary or editorial insertions (not always from myself), which are in square-brackets. This commentary may amount to mini-essays.

The annotation infrastructure is complicated and supports a mix of manual, semi-automatic, and automatic creation of annotations.

  • Automatic: WP

  • Semi-automatic: arXiv, BioRxiv/MedRxiv, some PDFs with abstracts available through Crossref, Gwern.net essays with .abstract abstracts

  • Manual: everything else

Annotations do not support sections or footnotes. Sections are replaced by horizontal-rulers (<hr>) and margin-notes. Footnotes are simply written inline inside square-brackets, and can be collapsed.

Large excerpts can also be collapsed.

In some cases, collapsing parts of an annotation is not enough—like when we want to link a URL in 2 different contexts, focusing on 2 different excerpts, so we cannot simply collapse either half. In this case, we can use a hack involving anchor IDs: we simply add two new IDs to the original URL, like /doc/2026-foo.html#bar vs /doc/2026-foo.html#baz, and we write separate annotations for them, because they are now technically ‘different URLs’, even though they load the same web page.5 In the special case of PDFs, when this happens, we can usually exploit the built-in page-number anchors to get more useful IDs by specifying the most relevant #page=N for the two different use-cases.

“Keywords” are removed from papers that provide them. While they may have been highly useful for librarians indexing documents in the pre-computer era, they are almost never useful now due to extreme inconsistency in the controlled-vocabulary across all authors/publishers—and even the small benefit they still provide to skimming is obsoleted by the fact that all of the keywords will usually be hyperlinked or bolded & jump out that way. (They can be left hidden in a comment to provide hints to an embedder, but otherwise are a waste of space.)

Bullet-point list summaries are used by a few academic publishers or authors to summarize the summary. Laid out normally, they take up a great deal of vertical space while being redundant with a properly-line-broken or topic-split abstract. We keep them, but condense them into a single inline list separated by BULLETs (•), set off from the abstract by a horizontal ruler.

Similarly, some publishers provide a ‘layman’ or ‘plain’ or ‘public’ abstract along with the scientific abstract. Those should be presented first, and separated with a horizontal ruler.

Code

  • Comments: should focus on the “why” and especially the “why not”, and not on the “how”.

  • Syntax-highlighting required: All code blocks should specify a language for syntax highlighting, using Pandoc syntax highlighting if possible. This includes document formats like diffs. (Even when something in a code block is pseudo-code or not strictly any language, the skylighting suite usually has something which will give useful results. If plain text is intended, use ~~~{.default})

    • Inline optional: inline code syntax highlighting (`code`{.LANG}) is optional, because it is often useless (given lack of context) or distracting

  • Show outputs but commented out: for interactive runnable code, like shells or REPLs, the output cannot usually be syntax-highlighted. So the output should be included but commented-out one-level deep; regular comments can be nested deeper, if permitted by that language’s syntax. For example, in Bash, we can prefix outputs with # and commentary with ##:

    ## print a greeting:
    $ echo "Hello world!"
    # Hello world!
  • Balance delimiters (brackets/parentheses/double-quotes): the default Emacs Markdown settings check that these delimiters are always balanced (using check-parens).

    This is important because it catches many Markdown/HTML syntax errors, but it naively includes code blocks or generated text samples, where some constructs do not and cannot balance (eg. Bash case syntax). To whitelist these cases, include a HTML comment with the matching ‘missing’ delimiters; in the case of double double-quotes, insert a zero-width space. In some cases, these may have to be inserted directly into the source code snippet.

  • Reproducibility mirage: Reproducibility of code is not a major priority because it is rarely useful and can require complex fragile pipelines and exorbitant resource usage (eg. archiving entire OS images) to achieve genuine reproducibility.

  • JavaScript: JS coding is left to Said Achmiz, who may write up a JS style guide at some point, so we will not cover it here.

  • Emacs Lisp: no lint warnings when byte-compiled
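Outside Emacs, the delimiter check can be roughly approximated in Bash. This is a minimal sketch, not a port of check-parens. check_balance is a hypothetical helper which treats parentheses and square brackets as nesting pairs and counts ASCII double-quotes. It has no knowledge of code blocks or strings, so like check-parens it relies on the whitelisting comments described above to balance legitimate exceptions (eg. Bash case patterns). It walks the file one character at a time, which is fine for a single essay but slow on large files.

```bash
## Hypothetical approximation of Emacs's `check-parens`: report the first unmatched
## parenthesis/bracket, or an odd number of ASCII double-quotes, in FILE.
## No awareness of code blocks, strings, or comments.
check_balance() {
    local file="$1" text char want i line=1 quotes=0
    local -a stack=()
    text="$(< "$file")"
    for ((i = 0; i < ${#text}; i++)); do
        char="${text:i:1}"
        case "$char" in
            $'\n') line=$((line + 1)) ;;
            '(' | '[') stack+=("$char") ;;
            ')' | ']')
                if [[ "$char" == ')' ]]; then want='('; else want='['; fi
                if ((${#stack[@]} == 0)) || [[ "${stack[-1]}" != "$want" ]]; then
                    echo "$file:$line: unmatched '$char'" >&2
                    return 1
                fi
                unset 'stack[-1]' ;;
            '"') quotes=$((quotes + 1)) ;;   # not ((quotes++)): that returns 1 at 0
        esac
    done
    if ((${#stack[@]} > 0)); then
        echo "$file: unclosed '${stack[-1]}'" >&2
        return 1
    fi
    if ((quotes % 2)); then
        echo "$file: odd number of double-quotes ($quotes)" >&2
        return 1
    fi
}
```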

Bash

Gwern.net Bash scripts are Bash-only, GNU/Linux-only, well documented, and optimized for long-term maintenance rather than minimalism. Readability, explicitness, and debugging are key.

  • Shell: GNU bash, not POSIX sh.

    Arrays, associative arrays, [[ … ]], ${var^^}, process substitution, and exported functions are permitted.

  • Userland: GNU/Linux coreutils assumed unless otherwise documented.

  • Working directory: usually ~/wiki/static/build/ or ~/wiki/. Convert between absolute Gwern.net paths and local files using the path2File/file2Path utility functions (eg. /doc/foo.pdf → ~/wiki/doc/foo.pdf).

  • Helper functions: utility functions are defined in bash.sh.

  • Error handling:

    • Scripts must fail fast, using set -e; fatal preconditions (missing tools, missing directories, bad arguments) must exit (script)/return (library) immediately with a clear message and a unique exit code.

    • Explicitly ignore errors: use || true, or a clearly-scoped set +e … set -e island to selectively run tests which may fail (see the wrapping idiom). Never rely on implicit set -e exceptions or on bash.sh enabling set -e.

    • set -euo pipefail is allowed inside contained helpers where all variables and pipes are controlled (eg. find_colliding_files())

      Do not enable globally unless every variable expansion has been audited. Each exit/return should use a different integer to make it easier to grep for provenance. (The integers do not need to mean anything, but it is helpful if they are in rough order of execution.)

    • Recoverable failures are logged as warnings (using the helper function red) and execution continues.

  • Flags, options, and command style:

    • Always use long flags: sort --unique, not sort -u.

      Where this might be cumbersome, like grep --fixed-strings, aliases can be added to hardwire long flags (eg. the grep DSL, gf/ge/gfv/gev/gfc/gec: g for grep, v for negation or filtering out, f/e for fixed vs regexp, and c for terminal coloring of positive hits)

    • Terminate options explicitly in any file-argument scenario to reduce risk: rm -- "$FILE".

    • Prefer explicit GNU flags over positional magic (--max-args, --recursive, --null, etc.).

  • Quote everything unless word-splitting or globbing is deliberate and documented.

    Gwern.net filenames are required to be whitespace-free, but it is unsafe to assume this, and filenames with whitespace should be handled.

  • Arrays over strings for lists.

  • Reading lines must be robust:

    while IFS= read -r line; do
        printf '%s\n' "$line"   # … process "$line" here …
    done

    Any temporary IFS modification must be tightly scoped and commented.

  • Variables & naming:

    • Globals / configuration: ALL_CAPS.

    • Locals: declared with local, grouped at function top.

    • Arrays: always quote expansions, as in "${array[@]}".

    • Associative arrays: only when keys are semantically meaningful.

  • Aliases: preferred over functions if there is no handling of arguments; especially good for more ergonomic names or preventing typos (eg. alias pdfcut="pdf-cut")

  • Function definitions: single-use internal functions can be declared inside a function if that would read better

  • Haskell interoperability:

    Use runghc for scripts or ghci -e for one-liners.

    Note that you must import modules explicitly in one-liners: eg. ghci -e 'do { md <- LinkMetadata.readLinkMetadata; ... }'. Do not assume any module is implicitly imported.
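Several of these rules (quoting, arrays over strings, robust reading, long flags) come together in the common task of gathering filenames. This is a minimal sketch, assuming GNU find & sort. collect_pdfs and the global PDFS array are illustrative names, not bash.sh helpers. It passes NUL-delimited names from find through sort to read -d '', so even a filename containing whitespace or a newline survives intact:

```bash
## Illustrative helper: fill the global array PDFS with every PDF under a directory.
## Names stay NUL-delimited from `find` through `sort` to `read`, so nothing is ever
## split on whitespace or newlines.
collect_pdfs() {
    local dir="$1" file
    PDFS=()
    while IFS= read -r -d '' file; do
        PDFS+=("$file")
    done < <(find "$dir" -type f -name '*.pdf' -print0 | sort --zero-terminated)
}
```

Callers then iterate with for pdf in "${PDFS[@]}"; do …; done, keeping the expansion quoted.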

Script Structure & Factoring

Large scripts are structured as:

  1. Metadata header

  2. Strict-mode setup

  3. Imports

  4. Constants / configuration

  5. Helpers

    Prefer small, named helpers over inline pipelines once logic exceeds one screen.

  6. Normalize inputs

  7. Main logic

Output & Logging

  • Status output is loud and skimmable.

  • Errors go to stderr.

  • ANSI terminal color: allowed via tiny helpers (bold for important information, red for errors, green/yellow for regular logging messages, etc.).

  • Long scripts print phase headers before major work.

  • For reporting warnings/errors in running lint passes, use our wrap + λ idiom:

    λ(){ … ; }
    wrap λ "warning message"

    This is for more complex pipeline checks/lints: we define an “anonymous function” (reusing the name λ) and pass it to wrap. wrap executes the function, captures stderr/stdout, and prints a formatted warning if any output exists. (wrap does not hide errors, because we probably want to crash as fast as possible; and so if the anonymous function could throw a non-fatal error, it must be explicitly ignored.)
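The real wrap lives in bash.sh. The following is only a hypothetical minimal re-implementation, meant to make the contract above concrete: run the function, capture its combined output, stay silent if there is none, and otherwise print the message followed by the output. Failures are not swallowed. Under the caller’s set -e, a non-zero status from the function aborts the script. This is why lint functions built on grep, which exits 1 when nothing matches (ie. in the success case), should end in || true.

```bash
## Hypothetical minimal `wrap`; the bash.sh original may format or color differently.
## Usage: wrap FUNCTION "warning message"
wrap() {
    local fn="$1" message="$2" output
    # Capture stdout & stderr together. The assignment's status is the function's
    # status, so under `set -e` a failing check still aborts the caller.
    output="$("$fn" 2>&1)"
    if [[ -n "$output" ]]; then
        printf '\033[41m%s\033[0m\n' "$message" >&2   # red background for visibility
        printf '%s\n' "$output" >&2
    fi
}
```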

Dependencies & Pre-Flight Checks

  • Check dependencies first: All required external commands must be checked before real work begins.

    Use command -v to check that a necessary binary exists.

  • List all failures: in particular, dependency failure should list all missing tools, not just the first, so the user can efficiently install them in a batch rather than painfully looping one at a time.

Resource Friendliness

  • Waiting: Background jobs must be followed by an explicit wait at the end of the phase to avoid race conditions.

  • Timeouts: Background jobs may need a timeout wrapper—especially network-bound jobs (eg. timeout 5m git pull)

    As a rule of thumb, a web HTTP query should take <30s, a full web page download <80s, and a more complex operation like a git filesystem operation <250s.

  • Avoid over-parallelism: too much parallelism can be both buggy and slower. So parallel/xargs must specify batching and concurrency explicitly.

    $N defines the permissible parallelism count.

    Idiom note: define a Bash function, then export it with export -f; it can now be parallelized with parallel, because the child Bash shells it spawns inherit exported functions.

  • Least power: Scripts should drop resource priority if they are primarily background or maintenance, using nice/ionice/renice (eg. to lower your own CPU priority, renice --priority 19 --pid "$$", or to launch a new low-priority job, nice --adjustment=19 ionice --class 3)

  • Date-based conditional execution: use the everyNDays helper to run something every n days, avoiding both the cost of running it every time and the need for a cron job, while still ensuring it runs occasionally.
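The export -f idiom above can be sketched with GNU xargs, which ships with GNU findutils and so is available even where parallel is not installed. This is a minimal sketch. process_one and process_all are hypothetical names, and the fallback of 4 jobs when $N is unset is an assumption of this sketch rather than a bash.sh default.

```bash
## Hypothetical per-item worker; any exported function works the same way.
process_one() {
    local item="$1"
    echo "processed: $item"
}
export -f process_one   # the `bash` processes spawned by xargs inherit it

## Run process_one over all arguments, $N at a time, one argument per call;
## NUL-delimited input keeps arguments containing whitespace intact.
process_all() {
    (($#)) || return 0   # no input: don't run the worker on an empty argument
    printf '%s\0' "$@" \
        | xargs --null --max-args=1 --max-procs="${N:-4}" \
            bash -c 'process_one "$1"' _
}
```

With GNU parallel the pipeline would instead be parallel --jobs "${N:-4}" process_one ::: "$@". In either case, export -f is what lets the freshly spawned Bash processes see process_one.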

Temporary Files & Atomicity

  • /tmp/: Use mktemp/mktemp --directory in /tmp/ for scratch space. Do not use the home directory or dot-files etc.

  • Atomic writes: Write outputs to temp files, then mv into place for greater atomicity.

  • Catch: Use trap for cleanup in multi-step transforms. Set it like trap '…' EXIT HUP INT TERM, with the cleanup commands as its first argument, and then unset it upon success (trap - EXIT HUP INT TERM).
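The three rules above combine into one small pattern. This is a minimal sketch. atomic_rewrite is a hypothetical helper which filters a file through any command into a mktemp scratch file in /tmp/, cleans that file up via trap if anything fails, and only then moves it over the original. One caveat on the design: mv is a true atomic rename only when /tmp/ is on the same filesystem as the target. Across filesystems it degrades to a copy, which is why the rule above promises only greater atomicity.

```bash
## Hypothetical helper. Usage: atomic_rewrite FILE COMMAND [ARGS…]
## Filters FILE through COMMAND into a scratch file, then moves it into place.
atomic_rewrite() {
    local target="$1" tmp
    shift
    tmp="$(mktemp)"   # created in /tmp/ (or $TMPDIR, if set)
    ## Deliberate early expansion of $tmp: the trap may fire after this function returns.
    # shellcheck disable=SC2064
    trap "rm --force -- $(printf '%q' "$tmp")" EXIT HUP INT TERM
    if ! "$@" < "$target" > "$tmp"; then
        echo "atomic_rewrite: '$*' failed; $target left untouched" >&2
        rm --force -- "$tmp"
        trap - EXIT HUP INT TERM
        return 1
    fi
    chmod --reference="$target" -- "$tmp"   # mktemp creates files as 0600
    mv -- "$tmp" "$target"
    trap - EXIT HUP INT TERM                # success: nothing left to clean up
}
```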

ShellCheck

  • Aim for zero ShellCheck warnings.

    Some warnings may need to be whitelisted.

  • Whitelist individually: Suppress narrowly, inline, with justification.

    • Whitelist imports: source paths are annotated for ShellCheck explicitly.
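Narrow, justified suppression looks like the following: the directive sits directly above the single command it exempts, and a normal comment line above the directive gives the justification. This is a minimal sketch. run_with_extra_flags, print_args, and the EXTRA_FLAGS variable are hypothetical. The example is chosen because deliberately word-splitting a space-separated list of options is exactly what ShellCheck’s SC2086 (unquoted expansion) warns about. The list is a string rather than an array only because environment variables cannot hold Bash arrays.

```bash
## Hypothetical helper: print each argument on its own line, bracketed.
print_args() {
    printf '[%s]\n' "$@"
}

## Hypothetical helper: prepend user-supplied options from EXTRA_FLAGS
## (eg. EXTRA_FLAGS="--dry-run --verbose") to the given arguments.
run_with_extra_flags() {
    ## Deliberate word-splitting: EXTRA_FLAGS holds several space-separated options.
    # shellcheck disable=SC2086
    print_args $EXTRA_FLAGS "$@"
}
```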

Inline Commentary Inside Commands

Commentary inside command invocations is permitted and encouraged for complex pipelines:

command \
    --option-1 \
    `# rationale for option-2` \
    --option-2 \
    `# --option-3  # TODO: re-enable after fixing upstream bug`

This is preferred over trailing comments when explaining why an option exists.

Bash Script Template

#!/usr/bin/env bash
#
# script-name.sh—one-line description of purpose
#
# Author: Gwern Branwen
# Date: YYYY-MM-DD
# When: Time-stamp: "<YYYY-MM-DD HH:MM:SS gwern>"
# License: CC-0
#
# Usage:
#   ./script-name.sh [OPTIONS] ARG…
#
# Notes:
# - Bash-only; GNU userland assumed.
# - Fails fast; recoverable errors are explicitly ignored.
#

set -e

cd ~/wiki/static/build/

########################################
# Imports
########################################

# shellcheck source=./bash.sh
. ./bash.sh

########################################
# Configuration
########################################

VERBOSE=0
DRY_RUN=0

########################################
# Helpers
########################################

usage() {
    cat <<'EOF' >&2
Usage:
  script-name.sh [--verbose] [--dry-run] ARG…

Options:
  --verbose     Extra logging
  --dry-run     Print actions without executing
EOF
    exit 2 # '2' because we might exit after successfully checking deps
}

## exit early with all missing dependencies to avoid wasting user time/effort:
require_cmds() {
    local cmd missing=()
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
    done
    if ((${#missing[@]})); then
        echo "Missing required commands: ${missing[*]}" >&2
        exit 1 # '1' because first place we would exit
    fi
}

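log() {
    # `if` rather than `((VERBOSE)) && echo`: the latter makes log return 1 when
    # VERBOSE=0, which would abort the whole script under `set -e`
    if ((VERBOSE)); then echo "$@" >&2; fi
}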

########################################
# Argument parsing
########################################

ARGS=()

# simple parsing, no need for getopts; `shift` consumes each argument as it is handled,
# so after `--` the remaining "$@" is exactly the unparsed parameters
while (($#)); do
    case "$1" in
        --verbose) VERBOSE=1 ;;
        --dry-run) DRY_RUN=1 ;;
        --help|-h) usage ;;
        --) shift; ARGS+=("$@"); break ;; # safer calls by explicitly separating flags from parameters
        -*) echo "Unknown option: $1" >&2; usage ;;
        *) ARGS+=("$1") ;;
    esac
    shift
done

if ((${#ARGS[@]} == 0)); then
    usage
fi

########################################
# Main
########################################

require_cmds rsync sed grep

log "Starting script-name.sh"

for item in "${ARGS[@]}"; do
    if ((DRY_RUN)); then
        echo "Would process: $item"
    else
        process_item "$item" || true   # non-fatal
    fi
done

# temporarily tolerate failures of optional commands (errexit off)
set +e
optional_command1
optional_command2
set -e

log "Done."

Haskell

  • No GHC warnings: compiles with ghc -Wall -Werror

    • Should be as hlint-clean as possible

  • Lists: lists broken over >2 lines should be dangling-comma-prefix style for easier editing, as it makes cleaner line-level diffs and allows adding entries blindly, eg.

    a = [b
       , c
       , d]
  • Imports:

    • Enumerated imports: all imports should be fully-enumerated

    • Standard abbreviations: Data.Text is always imported as T, like T.pack:

      import qualified Data.ByteString.Char8 as C8 (...)
      import qualified Data.ByteString.Lazy.Char8 as LBS (...)
      import qualified Data.ByteString.Lazy.UTF8 as U (...)
      
      import qualified Data.Map.Strict as M (...)
      import qualified Data.Set as S (...)
      import qualified Data.Text as T (...)
      import qualified Data.Text.IO as TIO (...)
      import qualified Data.Vector as V (...)
      
      import qualified Debug.Trace as DT (trace)

Generative Media

Generative models like LLMs or image-generators are good servants, but as of early-2025, poor masters.

My policy for generative media is that generative model outputs must be improved.

Outputs of models should be clearly identified as such in the filename, caption, or body; they should be critiqued as right or wrong, and errors noted.

Random uncurated outputs should never be used, except in an explicit context of ‘random uncurated outputs’.

If text or images are being used, perhaps for illustration, they should be high quality & carefully considered. If they could have been generated in just a few minutes of prompting, and have no clear connection to the essay or any depth, they should not be used at all. Ideally, they will look as good in 10 years as they do today, and they will not look blatantly dated or buggy. (The widespread use of cheap DALL·E 3 images on Substack, often barely a step up from Bing-style “Shrimp Jesus” images, is an excellent example of what not to do.)

In particular, they should avoid the artifacts and stereotypical mode-collapsed styles of the used model. There should be no visible artifacts like malformed hands or weird backgrounds that make no geometric sense. Style-wise, Midjourney samples should never feature “Instagram women”; GPT-4o images should avoid sepia/yellow color palettes; LLM output should avoid notorious tells like “delve” or the GPT-4o “em-dash twist ending”. Images should have a strong esthetic and tend toward monochromatic; if there is no specific color theme, it should be grayscale. (A HDR-esque ‘every color at once’ is one of the more reliable tells of a RLHF process dumbing a generative model down to the lowest common denominator.) You can fight this laziness with personalization, heavy use of high-temperature-like settings, getting illustration ideas from the most creative models like GPT-4.5, and stubborn insistence on fixing errors.

Files

File-naming:

  • Most to least important: order filename components from most to least important, so that filenames autocomplete well and are predictable.

  • The general filename form on Gwern.net is YYYY[-MM[-DD]]-SURNAME[-TOOL[-TOPIC[-DESCRIPTION]]][-N].EXT. (Note: File extensions are mandatory).

    • Filenames should be all-lowercase Latin alphanumeric (ie. without whitespace or foreign characters). If feasible, use romanization like ß → ss; exotic character sets like Chinese/Japanese should usually be left alone unless there is a well-established romanization or translation. This makes it easy to tab-complete and grep filenames.

    • Descriptions: can otherwise be freeform, and may embed short English descriptions of the contents, or additional metadata like the original ID/hash.

  • Unique filenames: the base names of files should ideally be unique. So if there were two files like /doc/psychology/2026-wang.pdf and /doc/biology/2026-wang.pdf, one of them should nevertheless be renamed to 2026-wang-2.pdf.

    The incrementing N is zero-padded only as necessary, for uniformity. (So if there were 10 such Wang PDFs, they would all be renamed to zero-pad them like 2026-wang-01.pdf … 2026-wang-10.pdf.)

  • Unique file extensions: Where multiple file extensions are in common use, standardize on the majority (ie. .jpg, not .jpeg; .png, not .PNG).

  • Renaming: files should be moved not with mv but with the custom gwmv script, which handles file-format conversion special-cases, global search-and-replace, and nginx redirects.

    Markdown essays currently cannot be moved with that script, and must be updated manually.
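The naming rules are mechanical enough to sketch as two small helpers. Both are hypothetical, and gwmv or the site’s own linting may check differently. valid_filename tests a basename against the general form above, allowing only lowercase ASCII alphanumerics and hyphens. padded_names prints the suffixed names for a set of colliding files, padding the counter only as wide as the count requires. It prints the whole series and leaves to the caller whether the first file keeps its unsuffixed name, as in the two-file example above.

```bash
## Hypothetical lint: does a basename follow YYYY[-MM[-DD]]-WORDS[-N].EXT in lowercase
## ASCII alphanumerics and hyphens, with a mandatory extension?
valid_filename() {
    local name="$1"
    # Explicit character lists, not [a-z] ranges, whose meaning can vary by locale.
    local lc='abcdefghijklmnopqrstuvwxyz0123456789' d='0123456789'
    local re="^[$d]{4}(-[$d]{2}(-[$d]{2})?)?(-[$lc]+)+\\.[$lc]+\$"
    [[ $name =~ $re ]]
}

## Hypothetical helper: print the names for COUNT colliding files sharing STEM & EXT,
## zero-padding N only as wide as COUNT needs (3 → -1…-3; 10 → -01…-10).
padded_names() {
    local stem="$1" ext="$2" count="$3" i
    for ((i = 1; i <= count; i++)); do
        printf '%s-%0*d.%s\n' "$stem" "${#count}" "$i" "$ext"
    done
}
```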

Filetypes:

  • Few: As few as possible.

  • Long-term: Emphasis on backwards compatibility.

  • >95% browser support: If relevant, it should have >95% global support on CanIUse.com. (This can include polyfilling.)

  • Audio: MP3, and not OGG Vorbis, must be used for Apple compatibility.

  • CSV: our preferred ‘spreadsheet’ or ‘tabular format’. CSVs must be UTF-8 and comma-separated, and must read cleanly into LibreOffice & R.

Renaming

Pages/files should be renamed freely for consistency & convenience.

  • File moves: To avoid stale references & 404s, use the gwmv script (in bash.sh) to move a file using git, rewrite all existing hyperlinks, and generate a nginx URL-level redirect.

  • Page moves: must be done by hand as gwmv does not yet cover Markdown pages

    • Sub-page moves: to do something like “split out a section to a standalone page” without breaking too many reader experiences, use the redirect-from-id class & data-attribute. (See Lorem Links for live examples, including redirecting multiple legacy IDs.)

      In the simple section case, simply write, for a page /foo:

      # Original Title
      
      [**See main article.**](/bar){.redirect-from-id}

      Now every reader who goes to the URL /foo#original-title will be automatically redirected to /bar. The URL can have an anchor (ie. one could redirect to /bar#baz).

      If the current section’s ID is not the hash-anchor ID which has been broken, the old ID can be specified:

      [**See main article.**](/bar){redirect-from-id="old-id"}

      And since we may want to redirect many IDs without cluttering the page, the link is compatible with the .display-not & .backlink-not classes (we would not want to trigger a backlink, which would be of no interest to readers of the new page), and it can carry some editorial documentation for screen-readers or LLM agents. That gives a final link like this, which will “silently redirect” anyone visiting /foo#old-id to /bar:

      [[\[Moved to this article.\]]{.editorial}](/bar){redirect-from-id="old-id" .display-not .backlink-not}

      If it is not appropriate to redirect forcibly, we can eliminate spurious within-page 404s by simply defining an empty span with the mistaken ID at a useful point in the page—possibly setting an ID on the link itself. You can stack multiple empty spans to provide multiple legacy IDs at one location, to catch every possible typo or outdated URL, as file contents may move around multiple times over the decades.

    Prefer Pandoc {#id} on headers/links when a single ID suffices; use <span id="..."> when you need (1) multiple IDs, (2) an anchor mid-paragraph, or (3) to avoid touching surrounding markup.

LLM Writing Guide

How to write for Gwern: this guide directs an LLM to deliver a near-publishable Gwern-style essay and provide Gwern the minimal metadata needed for rapid editing. This guide is primarily meant for nonfiction essays, and not other resources like poems.

It distills the Gwern.net Manual of Style (MoS), lessons from essays like “Newton’s Comets”, “Project Xanadu”, and “Dune Genetics”, and the site’s 15-year evolution.

Mind-Set

  • Audience: Technically literate generalists who skim for overview, dig deep into specifics, and archive content. Your writing must serve both skimmers (clear structure, abstracts, margin notes) and deep-divers (dense information, rich linking, comprehensive footnotes/collapses).

  • Tone: Terse, declarative, analytical, and critically skeptical. Avoid hedging, filler, and overly enthusiastic or promotional language. Directly state claims and then provide evidence.

    • Nix common LLMisms: “delve into”, “it is pivotal to”, “it is crucial to”, “it is important to note”, “explore the nuances of”, “tapestry of”, “showcases”, “serves as a testament to”. Replace with concrete verbs and direct statements.

  • Goal: Aim to present information or analysis that offers new synthesis (such as connecting findings from field X with methodology from field Y to explain phenomenon Z), a deeper re-analysis of existing data/sources, or an unexpected angle that is not readily available. Strive for durable insights over ephemeral commentary. Assume the reader is intelligent but may not be a specialist in the topic.

Draft Workflow (The “Iceberg Build”)

Step

What to do

MoS Hooks (see Condensed MoS)

0. Scope Definition

LLM Action: Restate the core request/topic in a single, precise sentence. List key in-scope points and deliberately out-of-scope points to confirm understanding. Place this in the initial meta-block.

Meta-block

1. Source Acquisition & Preparation

LLM Action: For any cited external information, prioritize finding and linking to full-text, stable URLs (PDFs, academic pages, reputable archives). Where useful, format links with a title attribute: [display text](URL "'Title', Author Year"). Find archive.org / archive.is links if primary is fragile.

Linking, Citations, Tooltips

2. Outline & Structure

LLM Action: Draft Title → Abstract (div.abstract blockquote) → H2 section titles (≤5 words if possible) → Key bullet points under each H2. Identify potential margin-note phrases (1–3 words) for paragraphs within sections.

Information Hierarchy, Abstracts

3. Prose Generation

LLM Action: Write content using “ventilated prose”: one sentence per line, blank line between paragraphs. Inline citations as Surname Year, hyperlinked. No separate “References” section. Emphasize precision and clarity.

Ventilated Prose, Citations

4. Iceberg Architecting

LLM Action: Review the draft for digressions. Demote content: brief asides (≤200 words) to footnotes ^[Footnote text.]; longer digressions, data, or code examples (200–500 words) to <div class="collapse"> (with an .abstract-collapse if needed); extensive supplementary material (>500 words) to an Appendix (which also needs an abstract).

Information Density, Structure

5. Stylistic Polish

LLM Action: Apply American spelling, metric units (with conversions for quoted imperial figures), Oxford commas, and logical quotation. Use Kesselman estimative words for probabilities. Re-check for and eliminate banned/filler phrases. Ensure correct dash usage: hyphens, en-dashes, and em-dashes, with no spaces around em-dashes.

MoS Language Rules

6. Code, Tables, & Media

LLM Action: Label code blocks with language. Adhere to Bash (long flags, set -e), Haskell (ghc -Wall -Werror, qualified imports), and Elisp (byte-clean) rules. Format table captions. For images: ensure illustrative purpose, MoS-compliant <figure> captions (essay vs paper extract), note AI model+date if generated. Apply .invert/.invert-not if default dark-mode inversion is problematic.

MoS Code & Media

7. Final Self-Check

LLM Action: Rigorously apply the “Pre-Handoff Checklist” (below).

Quality Assurance

8. Meta-Block Insertion

LLM Action: Insert the concise HTML meta-block (template below) after YAML front-matter and before the main text.

Transparency for Editor
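Steps 1 and 3 converge on one link shape. It has a Surname Year label, a #page=N deep anchor when the target is a PDF, and a title attribute. The sketch below rests on several assumptions:

- `cite_link` is a hypothetical helper.
- Citations are single-author.
- The example paper and path are invented.
- Multi-author forms and Wikipedia's `!W` links are out of scope.

```python
from typing import Optional


def cite_link(surname: str, year: int, url: str, title: str,
              page: Optional[int] = None) -> str:
    """Build a Pandoc Markdown citation link of the form
    [Surname Year](URL#page=N "'Title', Surname Year").

    Hypothetical helper: assumes a single author, and adds the #page=N
    deep anchor only when the URL points at a PDF.
    """
    label = f"{surname} {year}"
    if page is not None and url.lower().endswith(".pdf"):
        url = f"{url}#page={page}"
    # A bare double quote would end the Markdown title string, so backslash-escape it
    # (CommonMark allows this; check your Pandoc reader if titles contain quotes).
    safe_title = title.replace('"', '\\"')
    return f"[{label}]({url} \"'{safe_title}', {label}\")"


print(cite_link("Smith", 2020, "/doc/smith-2020.pdf", "An Example Paper", page=12))
# → [Smith 2020](/doc/smith-2020.pdf#page=12 "'An Example Paper', Smith 2020")
```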

Mini Meta-Block Template

Place once, after YAML front-matter and before abstract/main text. Ensure Kesselman words are exact.

<!-- LLM_NOTES_START

SCOPE_SUMMARY: [One-sentence summary of the LLM's understanding of the task]
MOS_CONFIDENCE: [Kesselman Word, eg. Likely] [LLM's confidence in adhering to the MoS]
CONTENT_CONFIDENCE: [Kesselman Word, eg. Highly likely] [LLM's confidence in the substantive quality of content]
ASSUMPTIONS_CRITICAL: [Brief list of key interpretation choices affecting the draft,
    eg. "Interpreted 'X' as Y for analysis."]
KNOWN_WEAKNESSES_AREAS: [Brief list of sections/points needing most editorial review,
    eg. "Section 3 argument needs strengthening", "Source for statistic Z is indirect."]
BANNED_PHRASES_TICS_CHECK: Passed [Or: "Self-corrected: removed 'delve', 'pivotal'."]

LLM_NOTES_END -->

Add nothing beyond this meta-block structure; conciseness is paramount.
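Because the template is fixed, a draft's meta-block can be checked structurally before handoff. This Python sketch uses a hypothetical `check_meta_block` function. It assumes the 10-line limit counts only non-blank lines, which is exactly how many the template above has. It does not judge whether the chosen Kesselman words are apt.

```python
# Field names copied from the meta-block template above.
REQUIRED_FIELDS = (
    "SCOPE_SUMMARY",
    "MOS_CONFIDENCE",
    "CONTENT_CONFIDENCE",
    "ASSUMPTIONS_CRITICAL",
    "KNOWN_WEAKNESSES_AREAS",
    "BANNED_PHRASES_TICS_CHECK",
)


def check_meta_block(block: str, max_lines: int = 10) -> list[str]:
    """Return a list of structural problems; an empty list means the block passes.

    Blank lines are not counted toward `max_lines` (an assumption: the
    template itself has exactly 10 non-blank lines).
    """
    problems = []
    lines = [ln for ln in block.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("<!-- LLM_NOTES_START"):
        problems.append("missing '<!-- LLM_NOTES_START' opener")
    if not lines or not lines[-1].rstrip().endswith("LLM_NOTES_END -->"):
        problems.append("missing 'LLM_NOTES_END -->' closer")
    for field in REQUIRED_FIELDS:
        if not any(ln.lstrip().startswith(field + ":") for ln in lines):
            problems.append(f"missing field {field}")
    if len(lines) > max_lines:
        problems.append(f"{len(lines)} non-blank lines exceeds the limit of {max_lines}")
    return problems
```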

Inline Comment Keys

These assist Gwern’s review; they are not for the final published page. Gwern will remove them. (Use sparingly and purposefully: no more than 1–2 per section.)

  • <!-- LLM_REASONING: [Concise rationale for a non-obvious MoS application, structural choice, or interpretation, eg. "Used collapse instead of footnote due to code block inclusion."] -->

  • <!-- LLM_ALT_CONSIDERED: Current: "[text snippet]"; Alt: "[alternative wording/structure]" (Rejected because: [brief reason, eg. "less precise", "MoS conflict X"]) -->

  • <!-- LLM_TODO: [Action needed, eg. "Verify statistic for X from primary source", "Find original publication year for Y"] -->
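Gwern removes these comments himself. The keys are nonetheless regular enough to audit or strip mechanically, as this Python sketch shows. Both `strip_llm_comments` and `open_todos` are hypothetical helpers. Listing the TODOs first is a suggested precaution, so that unresolved items are not discarded along with the comments.

```python
import re

# Matches the three review-only keys above, including comments that span lines.
# The LLM_NOTES meta-block uses a different key and is deliberately left alone.
LLM_COMMENT = re.compile(
    r"<!--\s*LLM_(?:REASONING|ALT_CONSIDERED|TODO):.*?-->", flags=re.DOTALL
)
LLM_TODO = re.compile(r"<!--\s*LLM_TODO:(.*?)-->", flags=re.DOTALL)


def open_todos(markdown: str) -> list[str]:
    """List outstanding LLM_TODO items so none is dropped unresolved."""
    return [body.strip() for body in LLM_TODO.findall(markdown)]


def strip_llm_comments(markdown: str) -> str:
    """Remove LLM_REASONING, LLM_ALT_CONSIDERED, and LLM_TODO comments."""
    return LLM_COMMENT.sub("", markdown)
```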

Craft Specifics

  • Abstracts: Must be a <div class="abstract"> containing a blockquote.

    Structure this blockquote into multiple paragraphs following the scientific model: “Background” → “Data/Methods” (if applicable) → “Results/Analysis” → “Conclusion/Implications”.

    Poetry/fiction abstracts are allowed to be non-scientific teasers; they should still be multi-paragraph and informative.

  • Margin Notes: 1–3 word italicized summary for key paragraphs within multi-paragraph sections. Not for single-paragraph sections. If multiple in a section, these also form a micro-ToC at the section start.

  • Linking: First instance of any important term, name, or concept should be hyperlinked (often !W for Wikipedia).

    Link citations directly to fulltext. Use deep anchors (#page=N for PDFs, #specific-section for HTML) wherever possible.

    <a> tags should preferably have a title attribute containing the citation metadata.

  • Lists: Must have a clear logical order (importance, similarity, alphabetical if no other).

    For non-inline/block lists of >6 short items (<~30 chars each), use <div class="columns"> for two-column layout.

  • Images/Figures: Use only if genuinely illustrative and information-rich. Output HTML must be within <figure> tags.

    Preferably provide a full MoS-compliant caption for figures from scientific papers: **Figure X**: _Summary._<br>(*A*) Detail. (*B*) Detail. If AI-generated, filename and caption must note model & date. Assume the image file will be hosted locally.

  • Footnotes versus Collapses: Footnotes (^[text]) for brief (≤200 words) asides or clarifications. Collapses (<div class="collapse">) for more substantial digressions (200–500 words; longer material belongs in an appendix), large blockquotes, code blocks, or data tables not essential to the main flow.

  • Analytical Stance: Adopt a critically evaluative stance. Question assumptions, assess evidence strength, and don’t shy from pointing out flaws or inconsistencies in arguments (including historical ones, as in newton.md).
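Two of the rules above reduce to simple thresholds: the footnote/collapse/appendix word counts and the `columns` rule for lists. The Python sketch below encodes both, using the hypothetical functions `demotion_target` and `wants_columns`. Words are counted by splitting on whitespace, which is a rough proxy. The hard 30-character cutoff is an assumption, because the guide's "<~30 chars" is deliberately fuzzy.

```python
def demotion_target(text: str, has_block_content: bool = False) -> str:
    """Suggest a container for a digression using this guide's word counts:
    <=200 words -> footnote, 201-500 -> collapse, >500 -> appendix.

    `has_block_content` flags code blocks, tables, or large blockquotes,
    which the guide routes to collapses even when short. Words are counted
    by a whitespace split, a rough proxy.
    """
    words = len(text.split())
    if words > 500:
        return "appendix"
    if words > 200 or has_block_content:
        return "collapse"
    return "footnote"


def wants_columns(items: list[str], min_items: int = 7, max_chars: int = 30) -> bool:
    """True for a block list of more than 6 items, each under ~30 characters.

    The guide's "<~30 chars" is fuzzy; the hard cutoff here is an assumption.
    """
    return len(items) >= min_items and all(len(item) < max_chars for item in items)
```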

LLM Pitfalls to Dodge

Pitfall

Fix Strategy

Over-explaining obvious concepts/steps

Trust the technically literate reader. Move essential but secondary nuance to footnotes or collapses. Focus on the novel/analytical aspects.

Excessive hedging / cautious filler language

Delete. State claims directly, then present supporting evidence or reasoning. Confidence is expressed via Kesselman words (MoS/Meta-block).

“Fictional” or overly descriptive tone

Prioritize analytical clarity. Strip excessive adjectives/adverbs. Replace vague metaphors with concrete examples or direct explanations.

Dangling citation placeholders (eg. [REF])

Never use. Find and link to a primary source (or its best available archive). If a source cannot be found, use <!-- LLM_TODO: Find source for X -->.

Unnecessary/decorative images or emojis

Do not include. Images are for information content only. Emojis are forbidden.

Generic summarization of sources

Synthesize and analyze sources to build an argument or provide new insight. Don’t just report what sources say; explain their importance or flaws.

Sounding like generic AI output

Actively rewrite sentences that are bland, overly general, or use common AI introductory/linking phrases. Favor precision and strong verbs.

Success Metrics

A rubric for the “Iceberg Build” process:

Step

Success Metrics

0. Scope Definition

• Topic is defined in a single, precise sentence with no hedging
• In-scope points are substantive, not trivial, and collectively exhaustive of the core topic
• Out-of-scope points anticipate reader expectations and clarify boundaries
• The scope definition could stand alone as a mission statement for the essay

1. Source Acquisition & Preparation

• Every factual claim has a linked, fulltext source or is explicitly marked as your own insight
• >90% of links point to stable formats (academic pages, PDFs, well-established websites)
• Citation links include title attributes giving title, author, and year
• No “data voids” where key claims lack sourcing
• Archive links are provided for any potentially unstable sources

2. Outline & Structure

• Section titles are ≤5 words, declarative, and descriptive (not creative/cute)
• Sections follow a logical progression that builds an argument (not merely taxonomic)
• At least one margin note candidate is identified for each multi-paragraph section
• Abstract draft contains definable background, methods, results, and conclusion elements
• H2 headers are sufficient—excessive H3+ nesting is avoided

3. Prose Generation

• Every sentence occupies exactly one line in the source
• All paragraphs are separated by blank lines
• No paragraph exceeds ~8 sentences without strong justification
• All citations use the required Surname Year format and are hyperlinked
• The text is analyzed for and stripped of common LLM filler phrases
• Sentences average 15–25 words (occasional longer sentences are acceptable)

4. Iceberg Architecting

• Content placement follows the left-to-right hierarchy: essential → margin → paragraph → footnote → collapse → appendix
• Every collapse element has a meaningful title or abstract-collapse
• No footnote exceeds 200 words; otherwise use collapses
• Inessential content is demoted but never deleted entirely
• Information density increases as the reader moves from left to right across the page

5. Stylistic Polish

• American spelling is used consistently, including in quotes
• All measurements use metric units with conversions where necessary
• Oxford commas are used in all lists
• Logical quotation is applied (punctuation goes inside quotation marks only if it belongs to the quoted material)
• Em-dashes have no spaces around them; en-dashes are used for ranges
• Probabilistic language uses Kesselman words consistently
• No instances of banned/filler phrases remain

6. Code, Tables, & Media

• Every code block specifies a language
• Bash uses long flags and set -e
• Tables have clear, descriptive captions and appropriate column alignment
• Images use full <figure> elements with properly structured captions
• AI-generated images are properly attributed with model and date
• All media serves an information purpose, not merely decoration
• Dark-mode classes (.invert/.invert-not) are applied wherever default inversion is problematic

7. Final Self-Check

• Every item on the Pre-Handoff Checklist is verified
• The meta-block accurately reflects confidence in both MoS adherence and content quality
• Known weaknesses are specifically identified rather than vaguely described
• Text can be read from start to finish without discontinuities in logic or presentation
• The essay could stand alone without requiring additional context

8. Meta-Block Insertion

• Meta-block appears immediately after YAML front-matter
• All fields are completed with specific, actionable information
• Confidence assessments use precise Kesselman terms
• Critical assumptions are explicitly stated
• No more than 10 lines total
• HTML comment format is correctly implemented
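Several Prose Generation metrics can be approximated from the source text alone:

- one sentence per line;
- at most ~8 sentences per paragraph;
- 15–25 words per sentence on average.

The Python sketch below defines three hypothetical helpers for these checks. Sentence detection is a heuristic that assumes a period, question mark, or exclamation mark followed by a capital letter marks a new sentence. The abbreviation list is an assumption and will need extending.

```python
import re

# Period-final abbreviations that should not count as sentence ends.
# This list is an assumption; extend it for the text being checked.
ABBREVIATIONS = {"eg.", "ie.", "cf.", "vs.", "etc.", "al.", "Mr.", "Ms.", "Dr."}

# A terminator, then whitespace, then an uppercase letter: a likely sentence break.
BREAK = re.compile(r"(\S*[.!?])\s+(?=[A-Z])")


def multi_sentence_lines(source: str) -> list[int]:
    """Return 1-based source lines that appear to contain more than one sentence."""
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in BREAK.finditer(line):
            token = match.group(1).lstrip("(“\"'[")
            if token not in ABBREVIATIONS:
                flagged.append(lineno)
                break
    return flagged


def sentences_per_paragraph(source: str) -> list[int]:
    """Count non-blank lines per blank-line-separated paragraph.

    Under ventilated prose, each non-blank line is one sentence.
    """
    paragraphs = re.split(r"\n\s*\n", source.strip())
    return [sum(1 for ln in p.splitlines() if ln.strip()) for p in paragraphs]


def mean_sentence_words(source: str) -> float:
    """Average words per non-blank line, i.e. per sentence under ventilated prose."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    return sum(len(ln.split()) for ln in lines) / len(lines) if lines else 0.0
```

The heuristic misfires on initialisms such as "U.S. Army", so review a flagged line before splitting it.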

Troubleshooting Common Problems

If your output has this problem

It likely violated this principle

Fix it by doing this

Abstract is a single paragraph

Abstracts must be multi-paragraph following scientific structure

Break the abstract into 2–4 paragraphs that follow the pattern: background → methods → results → conclusion

Too many section headers

Sections should be substantive, not taxonomic

Combine closely related sections; ensure each section has at least 2 paragraphs

Essay feels “blandly informative”

Gwern’s style is declarative and analytical, not merely descriptive

Add explicit evaluations of claims; state conclusions directly; make comparative judgments about importance

Content seems unstructured despite headers

Information should follow left-to-right hierarchy of detail

Move core claims to paragraph beginnings; push details rightward to footnotes/collapses; identify clear margin-note topics

Frequent hedging language

Writing should be unhedged, analytic, precise

Replace phrases like “it seems that” or “it could be argued” with direct claims calibrated via Kesselman words

LLM “helper” phrases appear

Text should be direct, not meta-textual

Delete phrases like “let’s explore”, “it’s worth noting”, and “delve into”; just state the content directly

Links lack meaningful titles

Many links should have title attributes

Add title="'Title', Author Year" to relevant links; for Wikipedia, use !W syntax

Too many bulleted lists

Lists should be rare and purposeful

Convert to prose where possible; if needed, ensure lists are ordered logically (by importance, similarity, or alphabetically)

Digressions disrupt main text

Digressions should be demoted to appropriate containers

Move digressions of ≤200 words to footnotes, 200–500 words to collapses, and >500 words to appendices

Citations appear in reference list format

Citations should be inline hyperlinks

Convert any reference-style citations to inline Surname Year format with hyperlinks to fulltexts

Content feels overly introductory

Gwern essays assume a technically literate audience

Remove unnecessary definitions of basic concepts; focus on novel synthesis and analysis

Paragraphs seem unstructured

Each paragraph should have a focused point

Ensure paragraphs have a clear topic; consider adding margin notes for multi-paragraph sections

Complex equations as plain text

Math should use appropriate formatting

Use LaTeX for complex equations ($$equation$$); consider Unicode/HTML for simple inline math

Essay lacks “iceberg” quality

Content should have hidden depth through popups/annotations

Add collapses for supporting material; ensure links have informative popups; create “rabbit holes” of exploration

Tables have inconsistent formatting

Tables should follow MoS conventions

Use pipe tables with proper alignment indicators; add captions; consider .table-small for compact tables

Overly dense text blocks

Text should use “ventilated prose” and a clear visual hierarchy

Break into one-sentence-per-line; use blank lines between paragraphs; consider using collapses for dense sections

Generic introductions/conclusions

Essays should start and end with substance

Delete any “In this essay, we will…” or “In conclusion…” statements; replace with substantive claims

No specific weaknesses in meta-block

Meta-blocks must identify concrete areas for review

Replace vague statements with specific sections or claims that need editorial attention

Stylistic choices seem arbitrary

All formatting should serve information purposes

Justify each collapse, footnote, or special formatting in terms of information hierarchy, not aesthetics

Essay feels disconnected from examples

Writing should build on Gwern’s existing corpus

Reference similar Gwern.net essays; use consistent terminology with the broader site

Style Examples

To illustrate how to revise chatbot-style output into Gwern-style prose, here are some before/after examples:

  1. Before:

    It is pivotal to recognize that mathematics can be conceptualized as analogous to
    the study of pure Turing machines, where formal patterns and computational structures
    are explored independent of the complicated details that exist in the physical world.
    Rather than focusing on concrete examples such as sequences of alternating physical objects
    like apples and oranges, or even more abstract but still specific sequences of integers,
    mathematicians typically examine generalized binary sequences that can be generated by
    concise, elegant Turing machines alternating between two distinct outputs. This
    abstraction process serves as a testament to mathematics' power in distilling
    complex phenomena into their essential logical structures.

    After:

    Mathematics resembles the study of pure Turing machines,
    formal pattern and computation liberated from the messy real-world details.
    We do not study a long line of, say, alternating apples and oranges,
    nor do we even study a sequence of integers; we study a *binary* sequence,
    which is computed by a very short, simple Turing machine which alternates
    between two arbitrary but distinct outputs.
    This shift from concrete objects to abstract patterns explains why mathematics
    developed independently across cultures, while [speedrunning](!W "Speedrunning")
    remains bound to specific artifacts^[Unlike mathematics, gaming speedruns document
    exploitation of specific implementation quirks that rarely generalize beyond their
    original context—precisely why they remain entertaining but yield no broader insights.].
  2. Before:

    When we explore the capabilities of Large Language Models in relation to mathematics,
    it becomes evident that there are important parallels worth noting. These models can be
    compared to diligent students who have meticulously studied mathematical textbooks and
    completed numerous homework problems, but haven't yet ventured into the realm of
    original research. While LLMs excel at solving problems that have predetermined answers,
    they fundamentally lack the crucial ability to formulate novel and meaningful problems
    on their own. This creative aspect of mathematics—the art of problem creation—isn't
    something that can be learned from textbooks or assigned as homework exercises, with only
    rare exceptions like George Pólya's famous work "How to Solve It" attempting to address
    this gap in mathematical education.

    After:

    If we analogize math-oriented LLMs to mathematics, the LLM is closest to
    a knowledgeable student who has studied textbooks and homework problems,
    but has never done research.
    You can set them a problem which has an answer, and they may well be able to find the answer.
    But at no point have they ever learned to solve the problem of coming up with problems.
    That is written down in no textbook, nor is there any homework problem for it
    (almost by definition, despite occasional valiant efforts like
    [Pólya's](!W "George P%C3%B3lya") [_How to Solve It_](!W)).
    This explains why even superhuman performers on benchmarks fail to produce
    truly novel insights—they optimize for answer-finding, not question-creation^[The
    distinction between answer-finding and question-creation parallels the difference
    between [exploitation and exploration](/explore-exploit) in reinforcement learning.].
  3. Before:

    The common 'baby face' theory for our cat fascination seems lacking, especially when
    considering our intense interest in even their most mundane actions and the way
    we often see them as embodying a kind of universal 'Cat-ness'. This essay explores
    an evolutionary psychology perspective: that our captivation stems from a history
    where felids in Africa were significant, often underestimated, predators of primates
    for millions of years. This ancestral pressure may have hardwired us to
    vigilantly observe felines, a trait not as strongly activated by other common pets,
    explaining their unique, indefinable appeal—a paradoxical mix of the captivating
    and the subtly unnerving, much like our engagement with controlled thrills
    such as horror movies.

    After:

    Do people like watching cats because of their neotenous appearance?
    I doubt it, but then why do we have this odd fascination with every ordinary action
    of a cat and in treating them as examples of some Platonic Cat?
    
    I speculate that maybe there is an evolutionary psychology reason: cats in Africa
    prey on primates to a degree I suspect few people appreciate, and this seems to
    have been true for millions of years.
    
    So perhaps we are still slightly hardwired to closely observe cats,
    in a way we aren't for most other potential pets.
    This accounts for the indefinable appeal of cats: they are paradoxically
    both pleasant and unpleasant, like horror movies.

Pre-Handoff Checklist

Verify each item for the final draft:

  • Meta-Block: Inserted correctly (after YAML, before text) and adheres to the ≤10-line template.

  • Abstract: Present, uses div.abstract > blockquote, multi-paragraph, and follows the Background → Data/Methods → Results → Conclusion structure.

  • Ventilated Prose: One sentence per source line, blank line between paragraphs.

  • Banned Phrases: Eliminated common LLM filler/hedging words (see Mind-Set).

  • Citations: Inline Surname Year format, hyperlinked to best available full-text URL.

  • Link title Attributes: Useful <a> tags have title="‘Title’, Author Year" (or similar if not a paper).

  • Deep Linking: Links to specific sections/pages (#anchor, #page=N) where appropriate.

  • !W Interwiki Links: Used for relevant Wikipedia concepts.

  • Footnotes/Collapses: Used appropriately for length/content (≤200 words for footnotes, ≤500 words for collapses, longer material in appendices).

  • Margin Notes: Present and correctly formatted for relevant paragraphs in multi-paragraph sections (not for 1-paragraph sections).

  • Code Styling: Bash uses long flags & set -e; Haskell: ghc -Wall -Werror & qualified imports; Elisp: byte-clean. Language declared.

  • Figure Captions: Images use <figure> and have full MoS captions; AI image provenance noted. Dark-mode .invert/.invert-not class applied if default inversion is problematic.

  • MoS Terminology: Key terms like “statistical-significance testing” and Kesselman words used correctly.

  • Dashes: Correct usage of hyphens, en-dashes, and em-dashes (no spaces around em-dashes).

  • Mentally Compile Essay: imagine how it would appear on Gwern.net with all its features activated (popups, collapses, etc.) before submission.
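The dash item is the easiest to check automatically. This Python sketch flags two problems: em-dashes that touch whitespace, and digit ranges joined by a hyphen instead of an en-dash. `dash_problems` is a hypothetical helper. Its range test is a heuristic that skips ISO dates such as 2026-01-01. It will still flag some legitimate hyphenated numbers, for example in model names.

```python
import re

# An em-dash touching whitespace on either side violates the no-spaces rule.
SPACED_EM_DASH = re.compile(r"\s—|—\s")
# Digits joined by a hyphen look like a range that wants an en-dash; the
# lookarounds skip ISO dates such as 2026-01-01. Heuristic: it will also
# flag legitimate hyphenated numbers like model names.
HYPHEN_RANGE = re.compile(r"(?<![\d-])\d+-\d+(?![\d-])")


def dash_problems(text: str) -> list[tuple[int, str]]:
    """Return (line number, message) pairs for likely dash misuse."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if SPACED_EM_DASH.search(line):
            problems.append((lineno, "space around em-dash"))
        for match in HYPHEN_RANGE.finditer(line):
            problems.append((lineno, f"range {match.group()!r} should use an en-dash"))
    return problems
```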

Following this playbook brings the draft closer to Gwern.net’s standards and shortens the editorial process.
