Bibliography (126):

  1. Gwern.net newsletter (Substack subscription page)

  2. March 2021 News

  3. ‘newsletter’ directory

  4. Changelog

  5. Gwern Branwen Is Creating Essays on Gwern.net

  6. Rare Greek Variables

  7. Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks

  8. Perceiver: General Perception with Iterative Attention

  9. Attention Is All You Need

  10. Do Transformer Modifications Transfer Across Implementations and Applications?

  11. Predictive Coding Can Do Exact Backpropagation on Any Neural Network

  12. Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates

  13. Grokking: Generalization Beyond Overfitting On Small Algorithmic Datasets

  14. The large learning rate phase of deep learning: the catapult mechanism

  15. https://www.reddit.com/r/MachineLearning/comments/ba1wg5/d_thoughts_about_superconvergence_and/

  16. Rip van Winkle’s Razor, a Simple New Estimate for Adaptive Data Analysis

  17. Ambigrammatic Figures: 55 Grotesque Ambigrams

  18. Making Anime Faces With StyleGAN § Reversing StyleGAN To Control & Modify Images

  19. ML Scaling subreddit

  20. The Akronomicon: an Extreme-Scale Leaderboard

  21. Naver unveils first ‘hyperscale’ AI platform

  22. PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation

  23. PCL-Platform.Intelligence/PanGu-Alpha: 200-billion-parameter open-source Chinese pretrained language model

  24. ChinAI #141: The PanGu Origin Story: Notes from an informative Zhihu Thread on PanGu

  25. LaMDA: Our Breakthrough Conversation Technology

  26. MUM: A New AI Milestone for Understanding Information

  27. [Alibaba releases PLUG: 27 billion parameters, the largest pre-trained language model in the Chinese-language community]

  28. StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

  29. PALM: Pre-training an Autoencoding & Autoregressive Language Model for Context-conditioned Generation

  30. CogView: Mastering Text-to-Image Generation via Transformers

  31. DALL·E 1: Creating Images from Text: We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language

  32. M6: A Chinese Multimodal Pretrainer

  33. VideoGPT: Video Generation using VQ-VAE and Transformers

  34. GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions

  35. Efficient Large-Scale Language Model Training on GPU Clusters

  36. NVIDIA/Megatron-LM: Ongoing Research Training Transformer Models at Scale

  37. GSPMD: General and Scalable Parallelization for ML Computation Graphs

  38. GTC 2021 Keynote with NVIDIA CEO Jensen Huang: breakthroughs in building virtual worlds with NVIDIA Omniverse; enterprise computing with new NVIDIA DGX systems and software; the data center as the new unit of computing with the NVIDIA Grace CPU, BlueField-3 DPU, and DOCA 1.0 SDK; broadening AI to all companies and industries with NVIDIA EGX and Aerial 5G; and transforming transportation with NVIDIA DRIVE Orin and Atlan

  39. 2021-04-12-jensenhuang-gtc2021keynote-ean_oizwuxa.en.vtt.txt

  40. Chinese AI lab challenges Google, OpenAI with a model of 1.75 trillion parameters

  41. China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) releases Wu Dao 1.0, China’s first large-scale pretraining model.

  42. Exploring Sparse Expert Models and Beyond

  43. MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

  44. MuZero Unplugged: Online and Offline Reinforcement Learning by Planning with a Learned Model

  45. 2021-schrittwieser-figure1-mspacmanmuzerologrewardscaling.jpg

  46. Decision Transformer: Reinforcement Learning via Sequence Modeling

  47. Learning and Planning in Complex Action Spaces

  48. Continuous Control for Searching and Planning with a Learned Model

  49. Muesli: Combining Improvements in Policy Optimization

  50. Visualizing MuZero Models

  51. Scaling Scaling Laws with Board Games

  52. Andy Jones

  53. Computer Optimization: Your Computer Is Faster Than You Think

  54. Scaling Laws for Language Transfer Learning

  55. Scaling Laws for Transfer

  56. Carbon Emissions and Large Neural Network Training

  57. How to Train BERT with an Academic Budget

  58. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  59. https://web.archive.org/web/20211101000000/https://bls.gov/news.release/ecec.nr0.htm

  60. https://bls.gov/news.release/archives/ecec_031986.pdf

  61. Precision exercise medicine: understanding exercise response variability

  62. Analysis of genomic DNA from medieval plague victims suggests long-term effect of Yersinia pestis on human immunity genes

  63. China officially bans CRISPR babies, human clones and animal-human hybrids

  64. Reflecting Sunlight: Recommendations for Solar Geoengineering Research and Research Governance

  65. Should We Block the Sun? Scientists Say the Time Has Come to Study It. The National Academies said the United States must study technologies that would artificially cool the planet by reflecting away some sunlight, citing the lack of progress fighting global warming.

  66. Improving Public Sector Management at Scale? Experimental Evidence on School Governance in India

  67. Jay-Z’s 99 Problems, Verse 2: A Close Reading with Fourth Amendment Guidance for Cops and Perps

  68. Oxylipin biosynthesis reinforces cellular senescence and allows detection of senolysis

  69. Inside the Secret Sting Operations to Expose Celebrity Psychics: Are some celebrity mediums fooling their audience members by reading social media pages in advance? A group of online vigilantes is out to prove it

  70. If I fits I sits: A citizen science investigation into illusory contour susceptibility in domestic cats (Felis silvestris catus)

  71. Cetaceans, sex and sea serpents: an analysis of the Egede accounts of a "most dreadful monster" seen off the coast of Greenland in 1734

  72. Paxo's Pot-Pourri

  73. Building the perfect curse word: A psycholinguistic investigation of the form and meaning of taboo words

  74. How Developers Choose Names

  75. Bringing GNU Emacs to Native Code

  76. Hosting SQLite databases on Github Pages (or any static file hoster)

  77. Check out this demo: I run the SQL query SELECT country_code, long_name FROM wdi_country ORDER BY rowid DESC LIMIT 100 and it fetches just 54.2KB of new data (across 49 small HTTP requests) to return 100 results, from a statically hosted database file that's 668.8MB!

  78. Fontemon

  79. How I Did Relay Quine

  80. Surprisingly Turing-Complete

  81. https://sigbovik.org/2021/proceedings.pdf

  82. https://sigbovik.org/2021/proceedings.pdf#page=8

  83. https://sigbovik.org/2021/proceedings.pdf#page=83

  84. https://sigbovik.org/2021/proceedings.pdf#page=126

  85. https://sigbovik.org/2021/proceedings.pdf#page=167

  86. https://sigbovik.org/2021/proceedings.pdf#page=216

  87. https://sigbovik.org/2021/proceedings.pdf#page=252

  88. Time Travel and Computing

  89. https://sigbovik.org/2021/proceedings.pdf#page=282

  90. The Association for Computational Heresy

  91. On the Impossibility of Supersized Machines

  92. https://journals.le.ac.uk/index.php/pst/issue/archive

  93. BMJ Christmas Issue

  94. BAHFest

  95. Possible Girls

  96. The Kelly Criterion in Blackjack, Sports Betting, and the Stock Market

  97. The Performance Pay Nobel

  98. Evolution as Backstop for Reinforcement Learning

  99. The Ocean’s Hot Dog: The Development of the Fish Stick

  100. The Esthetics of Smelly Art

  101. The Odor Value Concept in the Formal Analysis of Olfactory Art

  102. Hedonic Tone, Memetics, Scent, Sex, Spirituality

  103. Qualia Research Diary: Scents

  104. The Scent of the Nile: Jean-Claude Ellena creates a new perfume

  105. Mechanisms of scent-tracking in humans

  106. 2006-porter-humanscenttracking-41593_2007_bfnn1819_moesm2_esm.mp4

  107. Poor human olfaction is a 19th-century myth

  108. Perceptual convergence of multi-component mixtures in olfaction implies an olfactory white

  109. History of Combinatorial Generation (The Art of Computer Programming: Volume 4: Pre-Fascicle 4B: §7.2.1.7) § Pg22

  110. https://x.com/add_hawk/status/1357071738731814912

  111. The Best in Fragrance…and More