Bibliography (330):

  1. CLIP: Connecting Text and Images: We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the ‘zero-shot’ capabilities of GPT-2 and GPT-3

  2. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

  3. Contrastive Representation Learning: A Framework and Review

  4. WaveNet: A Generative Model for Raw Audio

  5. Malware Detection by Eating a Whole EXE

  6. Multi-trait analysis of genome-wide association summary statistics using MTAG

  7. Speech2Face: Learning the Face Behind a Voice

  8. ‘variance components’ directory

  9. Assessing the Big Five personality traits using real-life static facial images

  10. LipNet: End-to-End Sentence-level Lipreading

  11. LipNet: How Easy Do You Think Lipreading Is?

  12. Absolute Unit NNs: Regression-Based MLPs for Everything

  13. https://www.lesswrong.com/posts/K7AyY8LMrcKhwfbyj/no-really-attention-is-all-you-need-attention-can-do

  14. Scaling MLPs: A Tale of Inductive Bias

  15. Technology Forecasting: The Garden of Forking Paths

  16. Scaling Laws for Neural Language Models

  17. Chinchilla: Training Compute-Optimal Large Language Models

  18. Attention Is All You Need

  19. The Scaling Hypothesis

  20. Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers

  21. Bayesian Optimization in AlphaGo

  22. InvertOrNot.com Proposal

  23. Abandoning Objectives: Evolution Through the Search for Novelty Alone

  24. Towards a Human-like Open-Domain Chatbot

  25. Scaling Laws for Reward Model Overoptimization

  26. Timeghost

  27. crop#aspect-ratio-training

  28. SDXL § Micro-Conditioning: Conditioning the Model on Image Size

  29. Choose-Your-Own-Adventure AI Dungeon Games

  30. GPT-2 Preference Learning for Music Generation § Optimization by Backprop, Not Blackbox

  31. Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction

  32. Progressive Growing of GANs for Improved Quality, Stability, and Variation

  33. PixelRNN: Pixel Recurrent Neural Networks

  34. Parallel Multiscale Autoregressive Density Estimation

  35. not-so-BigGAN: Generating High-Fidelity Images on Small Compute with Wavelet-based Super-Resolution

  36. Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

  37. CM3: A Causal Masked Multimodal Model of the Internet

  38. MAE: Masked Autoencoders Are Scalable Vision Learners

  39. https://arxiv.org/pdf/2307.01952.pdf#page=3

  40. Claude Plays Pokemon

  41. Image GPT (iGPT): We find that, just as a large transformer model trained on language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples

  42. Nenex: A Neural Personal Wiki Idea

  43. index#scaling-laws

  44. LLM Applications I Want To See

  45. ‘AI mode collapse’ directory

  46. Virtual comments: LLM idea

  47. Hierarchical Embeddings for Text Search

  48. Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

  49. Recursively Summarizing Books with Human Feedback

  50. Co-Writing Screenplays and Theatre Scripts with Language Models (Dramatron): An Evaluation by Industry Professionals

  51. https://arxiv.org/pdf/2209.14958#page=5&org=deepmind

  52. design#future-tag-features

  53. The Curious Case of Neural Text Degeneration

  54. ‘discrete diffusion model’ directory

  55. resorter#noisy-sorting

  56. https://beta.openai.com/docs/guides/classifications

  57. Text and Code Embeddings by Contrastive Pre-Training

  58. 01#gzip

  59. Calculating The Gaussian Expected Maximum § Probability of Bivariate Maximum

  60. The Relationship Of Validity Coefficients To The Practical Effectiveness Of Tests In Selection: Discussion And Tables

  61. Number Search Engine via NN Embeddings

  62. littlewood#media

  63. Websim, Worldsim, and The Summer of Simulative AI

  64. Some Evidence of Bees and Honey in Ancient Egypt

  65. leprechaun#miscitation

  66. Leprechaun Hunting & Citogenesis

  67. Chaff Bugs: Deterring Attackers by Making Software Buggier

  68. ROME: Locating and Editing Factual Associations in GPT

  69. Activation Addition: Steering Language Models Without Optimization

  70. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

  71. ‘truesight (stylometry)’ directory

  72. The Art of the Shadow: How Painters Have Gotten It Wrong for Centuries [From The Visual World of Shadows]

  73. Three Months in Monte Carlo

  74. Analytic and Algorithmic Solution of Random Satisfiability Problems

  75. Anime Crop Datasets: Faces, Figures, & Hands § Hands

  76. DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

  77. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

  78. Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

  79. gpt-2-preference-learning#differentiable-sorting

  80. Unsupervised Neural Machine Translation with Generative Language Models Only

  81. https://www.crosslabs.org/blog/diffusion-with-offset-noise

  82. Progressive Distillation for Fast Sampling of Diffusion Models

  83. Consistency Models

  84. Problem 14 Dynamic Programming Solutions

  85. mixup: Beyond Empirical Risk Minimization

  86. DataMUX: Data Multiplexing for Neural Networks

  87. Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

  88. Rectified Flow: A Marginal Preserving Approach to Optimal Transport

  89. InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation

  90. UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

  91. TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation

  92. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

  93. ‘recurrent Transformer’ directory

  94. Diffusion Is Spectral Autoregression

  95. Progressive Growing of GANs for Improved Quality, Stability, and Variation

  96. Text Embeddings Reveal (Almost) As Much As Text

  97. Absolute Unit NNs: Regression-Based MLPs for Everything § Memorize All The Things

  98. DALL·E 1: Creating Images from Text: We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language

  99. GANs Didn’t Fail, They Were Abandoned

  100. StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

  101. GigaGAN: Scaling up GANs for Text-to-Image Synthesis

  102. BigGAN: Consistency Regularization (SimCLR-Style) Loss

  103. A Simple Framework for Contrastive Learning of Visual Representations

  104. Training GANs with Stronger Augmentations via Contrastive Discriminator (ContraD)

  105. Self-conditioned Image Generation via Generating Representations

  106. The Unusual Effectiveness of Averaging in GAN Training

  107. Stochastic Weight Averaging and the Ornstein-Uhlenbeck Process

  108. Connecting Generative Adversarial Networks and Actor-Critic Methods

  109. How AI Training Scales

  110. face#minibatch-retrieval

  111. Making Anime Faces With StyleGAN § Reversing StyleGAN To Control & Modify Images

  112. face#biggan-latent-space

  113. Net2Net: Accelerating Learning via Knowledge Transfer

  114. The Cost of Imbalance in Clinical Trials

  115. The Power of Twins: The Scottish Milk Experiment

  116. Policy Learning and Evaluation with Randomized Quasi-Monte Carlo

  117. Small-GAN: Speeding Up GAN Training Using Core-sets

  118. Top-K Training of GANs: Improving GAN Performance by Throwing Away Bad Samples

  119. https://algorithmsbook.com/files/dm.pdf#page=246

  120. Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models

  121. MSG-GAN: Multi-Scale Gradients for Generative Adversarial Networks

  122. Generator Knows What Discriminator Should Learn in Unconditional GANs

  123. Simple statistical gradient-following algorithms for connectionist reinforcement learning

  124. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

  125. Distilling the Knowledge in a Neural Network

  126. https://x.com/mere_mortise/status/934932000796020736

  127. Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks

  128. Sem-GAN: Semantically-Consistent Image-to-Image Translation

  129. Improving Shape Deformation in Unsupervised Image-to-Image Translation

  130. Detecting GAN generated errors

  131. A U-Net Based Discriminator for Generative Adversarial Networks

  132. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features

  133. ImageNet: A Large-Scale Hierarchical Image Database

  134. Novelty Nets: Classifier Anti-Guidance

  135. [D] RL: GANs As MCTS Environment Simulator for Deep Model-Based Planning?

  136. The Shattered Gradients Problem: If resnets are the answer, then what is the question?

  137. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

  138. Data-dependent Initializations of Convolutional Neural Networks

  139. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity

  140. Deep Information Propagation

  141. On weight initialization in deep neural networks

  142. Convolution Aware Initialization

  143. HyperNetworks

  144. Using Fast Weights to Attend to the Recent Past

  145. SMASH: One-Shot Model Architecture Search through HyperNetworks

  146. https://www.lesswrong.com/posts/2JJtxitp6nqu6ffak/basic-facts-about-language-models-during-training-1?commentId=M3wsmwiGBCxd4dHHW

  147. GPT-2 Preference Learning for Music Generation § Bradley-Terry Preference Learning

  148. GPT-2 Preference Learning for Music Generation § Decision Transformers: Preference Learning As Simple As Possible

  149. Gato: A Generalist Agent

  150. Learning to summarize from human feedback

  151. ‘AlphaStar’ directory

  152. Player of Games

  153. Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)

  154. MLMAC

  155. Better Language Models and Their Implications

  156. Bigscience/bloom

  157. XLNet: Generalized Autoregressive Pretraining for Language Understanding

  158. https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation

  159. SolidGoldMagikarp II: Technical Details and More Recent Findings

  160. https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology

  161. GPT-3 Creative Fiction § BPEs

  162. scaling-hypothesis#blessings-of-scale

  163. On Being The Right Size

  164. Computer Optimization: Your Computer Is Faster Than You Think § DL

  165. Motion Planning for Dynamic Knotting of a Flexible Rope with a High-speed Robot Arm

  166. Motion Planning for Dynamic Folding of a Cloth with Two High-Speed Robot Hands and Two High-Speed Sliders

  167. Free-Play Periods for RL Agents

  168. Brit-Pick

  169. The Surprising Number of American Adults Who Think Chocolate Milk Comes from Brown Cows

  170. https://ru.wikipedia.org/wiki/%D0%92%D1%8F%D0%B7%D1%8C

  171. ‘A Font Inspired by Square Word Calligraphy’, Pomdepin

  172. https://fontsinuse.com/typefaces/40498/ed-interlock

  173. Utext: Rich Unicode Documents

  174. XKCD #941: Depth Perception

  175. Depth Perception

  176. Speculative Loading

  177. Prerender Pages in Chrome for Instant Page Navigations

  178. Web APIs: Speculation Rules API

  179. Banner Ads Considered Harmful

  180. Cat itecture: Better Cat Window Boxes

  181. LAION-Aesthetics

  182. Sandspiel

  183. State-Space of Drug Effects: Results

  184. Darknet Market Archives (2013–2015)

  185. Acne: a good Quantified Self topic

  186. anime#battle-angel-alita

  187. movie#ready-player-one

  188. https://www.juliansanchez.com/2009/12/08/the-redactors-dilemma/

  189. https://www.fastcompany.com/90692176/chinese-wikipedia

  190. Nucleus Genomics

  191. Formal Theory of Creativity & Fun & Intrinsic Motivation (1990–2010)

  192. Magic, Explanations, and Evil: The Origins and Design of Witches and Sorcerers [and replies]

  193. Wikipedia Bibliography:

    1. Gravatar

    2. Word embedding

    3. Perceptual hashing

    4. Sparklines

    5. Recognition memory

    6. Principal component analysis

    7. T-distributed stochastic neighbor embedding

    8. Fourier transform

    9. Single-nucleotide polymorphism

    10. Linkage disequilibrium

    11. Polygenic score

    12. Lasso (statistics)

    13. Recurrent neural network

    14. Big Five personality traits

    15. Empirical Bayes method

    16. Stochastic gradient descent

    17. Shrinkage (statistics)

    18. Winner’s curse

    19. Knowledge distillation

    20. Centroid

    21. N-sphere

    22. K-means clustering

    23. Rejection sampling

    24. Active learning (machine learning)

    25. Autoregressive model

    26. Mipmap

    27. Image segmentation

    28. SHRDLU

    29. Graphomanic

    30. TvTropes

    31. Brownian bridge

    32. AI Dungeon

    33. Choose Your Own Adventure

    34. Kullback-Leibler divergence

    35. Pareto front

    36. The Lottery in Babylon

    37. Jorge Luis Borges

    38. Digital watermarking

    39. Analogue hole

    40. ‘Tlön, Uqbar, Orbis Tertius’

    41. John Drewe § Career as a forger

    42. Honeytoken

    43. Trap street

    44. 555 (telephone number)

    45. Error-correcting code

    46. Probabilistically checkable proofs

    47. PCP theorem

    48. Chaffing and winnowing

    49. Ising model

    50. Reinforcement learning

    51. Brownian Bridge

    52. Random walk

    53. Diffusion model

    54. JPEG

    55. Embeddings

    56. Simulated annealing

    57. Variance

    58. Law of large numbers

    59. Stratified sampling

    60. Blocking (statistics)

    61. Coresets

    62. Quasi-Monte Carlo method

    63. Low-discrepancy sequence

    64. Antithetic variates

    65. Order statistic § Order statistics sampled from a uniform distribution

    66. Order statistic

    67. U-Net

    68. AlphaGo

    69. Monte Carlo tree search

    70. Edit distance

    71. ‘There’s Plenty of Room at the Bottom’

    72. Dollhouse

    73. RoboCup

    74. Meccano

    75. K'Nex

    76. Experience curve effects § Reasons for the effect

    77. Delta robot

    78. General Social Survey

    79. Hangul

    80. Ligature (writing)

    81. Constructed writing system

    82. Xu Bing § Square Word Calligraphy

    83. Display typeface

    84. Ed Benguiat

    85. Tiki culture

    86. Stereoscopy

    87. Unmanned aerial vehicle

    88. Global Positioning System

    89. Geo-fence

    90. Natural experiment

    91. Regression discontinuity design

    92. The Market for Lemons

    93. Home inspection

    94. Delivery drone

    95. Zipline (drone delivery company)

    96. Amazon Prime Air

    97. Webcomic

    98. Girl Genius

    99. Web fiction § Web serial

    100. Greasemonkey

    101. WordPress

    102. Zen sand gardens

    103. Sokoban

    104. Stephen's Sausage Roll

    105. Ichi-go ichi-e

    106. The Witness (2016 video game)

    107. Boustrophedon

    108. Ōkami

    109. Particle system

    110. Falling-sand game

    111. Sandbox game

    112. The Powder Toy

    113. Regression toward the mean

    114. Alita: Battle Angel

    115. Battle Angel Alita

    116. Ready Player One

    117. Ready Player One (film)

    118. Wikidata

    119. Chinese Wikipedia

    120. 2021 Wikimedia Foundation actions on the Chinese Wikipedia

    121. Baidu Baike

    122. Weibo

    123. English Wikipedia

    124. 23andMe

    125. Nominative determinism

    126. Schizophrenia

    127. Pareidolia

    128. Hygiene hypothesis

    129. Escape room

    130. Predictive coding

    131. Gang stalking

    132. Helminthic therapy

    133. Isolation tank

    134. Nicotine

    135. Diplomacy (game)

    136. Social deduction games

    137. Mafia (party game)

    138. Dan Rather § ‘Kenneth, what is the frequency?’