---
title: February 2021 News
description: February 2021 Gwern.net newsletter with links on AI scaling, semaglutide, and ethicist ethics.
created: 2020-01-02
thumbnail: /doc/ai/scaling/2021-hernandez-transferlearning-figure1-transfervsfinetuning.png
thumbnailText: "Hernandez et al 2021, 'Scaling Laws for Transfer': 'Figure 1: We display the performance of a 40M parameter Transformer model on python, both trained from scratch on python and pre-trained on text then fine-tuned on python. DT is the amount of additional python characters that a from-scratch model of the same size would have needed to achieve the same loss on python as a fine-tuned model. In the labeled example, we see that for a 40M parameter transformer fine-tuned on 3e5 characters, DT is approximately 1000× bigger than DF. The less fine-tuning data is available, the more pre-training helps.'"
status: finished
previous: /newsletter/2021/01
next: /newsletter/2021/03
confidence: log
cssExtension: dropcaps-de-zs
backlink: False
...

February 2021's [Gwern.net](/newsletter/2021/02 "'February 2021 News', Branwen 2020") [newsletter](https://gwern.substack.com/ "'Gwern.net newsletter (Substack subscription page)', Branwen 2013") is now out; previous, [January 2021](/newsletter/2021/01 "'January 2021 News', Branwen 2020") ([archives](/doc/newsletter/index)). This is a collation of links and summary of major changes, overlapping with my [Changelog](/changelog); brought to you by my donors on [Patreon](https://www.patreon.com/gwern).

# Writings

- **Gwern.net**: popups: can now be moved, stickied, and full-screened (another step towards our ambition of Windows-95-in-the-browser!)
# Links

## AI

- ["Controllable Neural Text Generation"](https://lilianweng.github.io/lil-log/2021/01/02/controllable-neural-text-generation.html#openai), Lilian Weng; ["Recent Advances in Language Model Fine-tuning"](https://www.ruder.io/recent-advances-lm-fine-tuning/ "This article provides an overview of recent methods to fine-tune large pre-trained language models"), Sebastian Ruder (review)
- ["Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm"](https://arxiv.org/abs/2102.07350), Reynolds & McDonell 2021 (original 10-shot Fr → En translation can be beaten by the better 0-shot prompt: "French: XYZ / English:..."; this is "true of most worst-performing prompts..."); ["Calibrate Before Use: Improving Few-Shot Performance of Language Models"](https://arxiv.org/abs/2102.09690), Zhao et al 2021 (huge boost from calibrating unstable prompts; both demonstrate, [as always](/gpt-3#prompts-as-programming "'GPT-3 Creative Fiction § Prompts As Programming', Branwen 2020"), that "sampling can prove the presence of knowledge but not the absence.")
- ["TransGAN: Two Transformers Can Make One Strong GAN"](https://arxiv.org/abs/2102.07074), Jiang et al 2021 (Transformer-only GAN: attention is all you need)
- ["PACT: Proof Artifact Co-training for Theorem Proving with Language Models"](https://arxiv.org/abs/2102.06203 "'Proof Artifact Co-training for Theorem Proving with Language Models', Han et al 2021"), Han et al 2021 ([GPT-f](https://arxiv.org/abs/2009.03393#openai "'GPT-f: Generative Language Modeling for Automated Theorem Proving', Polu & Sutskever 2020") for [Lean](https://en.wikipedia.org/wiki/Lean_\(proof_assistant\)))
- ["Towards End-to-End In-Image Neural Machine Translation"](https://arxiv.org/abs/2010.10648#google), Mansimov et al 2020 (sure why not)
- **Brains**:

    - ["Artificial Neural Nets Finally Yield Clues to How Brains Learn"](https://www.quantamagazine.org/artificial-neural-nets-finally-yield-clues-to-how-brains-learn-20210218/ "The learning algorithm that enables the runaway success of deep neural networks doesn’t work in biological brains, but researchers are finding alternatives that could"); [Whittington & Bogacz 2019](https://www.sciencedirect.com/science/article/pii/S1364661319300129 "Theories of Error Back-Propagation in the Brain") (short overview of biologically-plausible backprop: feedback alignment, target propagation, predictive coding, & attentional feedback; also of recent interest, [VS-ML](https://arxiv.org/abs/2012.14905#schmidhuber "'VS-ML: Meta Learning Backpropagation And Improving It', Kirsch & Schmidhuber 2021"); given their increasing success in training while respecting more biological constraints, the increasing power of backprop-trained ANNs, and the neurological success of ANNs in predicting & imitating brain signals, it is increasingly clear that brains *really do* do backprop in some sense)
    - ["NSD: A massive 7-tesla fMRI dataset to bridge cognitive and computational neuroscience"](https://www.biorxiv.org/content/10.1101/2021.02.22.432340.full "'A massive 7T fMRI dataset to bridge cognitive and computational neuroscience', Allen et al 2021"), Allen et al 2021 ("...The availability of NSD thus opens the door to using brain activity to directly guide the optimization of deep neural networks.")
    - ["Brain2Pix: Fully convolutional naturalistic video reconstruction from brain activity"](https://www.biorxiv.org/content/10.1101/2021.02.02.429430.full), Le et al 2021 (reconstructing [_Dr. Who_](https://www.biorxiv.org/content/10.1101/687681.full "'A large single-participant fMRI dataset for probing brain responses to naturalistic stimuli in space and time', Seeliger et al 2019"))
    - ["High-performance brain-to-text communication via imagined handwriting"](https://www.biorxiv.org/content/10.1101/2020.07.01.183384.full), Willett et al 2020
    - ["Brain-computer interface for generating personally attractive images"](/doc/reinforcement-learning/preference-learning/2021-spape.pdf), Spape et al 2021 (simple EEG-based optimization of ProGAN faces; many ways to improve this...)

[Matters Of Scale](https://www.reddit.com/r/mlscaling/ "'ML Scaling subreddit', Branwen 2020"):

- ["Scaling Laws for Transfer"](https://arxiv.org/abs/2102.01293#openai), Hernandez et al 2021 ("We find that pre-training effectively multiplies the fine-tuning dataset size"; a shot across the bow of anyone floating on a proprietary-dataset moat: large models can drop data requirements by orders of magnitude overnight, even surpassing you)
- ["ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision"](https://arxiv.org/abs/2102.05918#google), Jia et al 2021 (see also [CC-12M](https://arxiv.org/abs/2102.08981#google "'Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts', Changpinyo et al 2021"); [CLIP](https://openai.com/research/clip "'CLIP: Connecting Text and Images: We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the “zero-shot” capabilities of GPT-2 and GPT-3', Radford et al 2021")-like w/EfficientNet trained on 1.8 billion images on a TPUv3-1024---[DM](https://arxiv.org/abs/2102.00529#deepmind "'Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers', Hendricks et al 2021") argues that fancier cross-modal Transformers are better, nevertheless, ['TPUs go brrr'](http://www.incompleteideas.net/IncIdeas/BitterLesson.html "'The Bitter Lesson', Sutton 2019"). Given DALL·E 1, CLIP, ALIGN, [VDVAE](https://arxiv.org/abs/2011.10650#openai "'VDVAE: Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images', Child 2020"), [CW-VAE](https://arxiv.org/abs/2102.09532 "'Clockwork Variational Autoencoders', Saxena et al 2021"), [AIPO](https://arxiv.org/abs/2102.12037 "'AIPO: Image Completion via Inference in Deep Generative Models', Harvey et al 2021"), [DCTransformer](https://arxiv.org/abs/2103.03841#deepmind "'Generating Images with Sparse Representations', Nash et al 2021"), neural radiance fields, et al, are GANs already dead, and just don't realize it yet? Or at least soon to be relegated to only DRL-like uses as a final finetuning phase to sharpen up a self-supervised model?); ["WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training"](https://arxiv.org/abs/2103.06561), Huo et al 2021
- ["DALL·E 1: Zero-Shot Text-to-Image Generation"](https://arxiv.org/abs/2102.12092#openai "'Zero-Shot Text-to-Image Generation', Ramesh et al 2021"), Ramesh et al 2021 ([original blog](https://openai.com/research/dall-e "'DALL·E 1: Creating Images from Text: We’ve trained a neural network called DALL·E 1 that creates images from text captions for a wide range of concepts expressible in natural language', Ramesh et al 2021"){#ramesh-blog}); ["M6: A Chinese Multimodal Pretrainer"](https://arxiv.org/abs/2103.00823#alibaba), Lin et al 2021 (Chinese DALL·E 1: 1.9TB images/0.29TB text for 10b-parameter dense/100b-parameter MoE Transformer; shockingly fast Chinese replication of DALL·E 1/CLIP)
- ["Explaining Neural Scaling Laws"](https://arxiv.org/abs/2102.06701#deepmind), Bahri et al 2021/["Learning Curve Theory"](https://arxiv.org/abs/2102.04074#deepmind), Hutter 2021 ([Rohin Shah commentary](https://www.lesswrong.com/posts/Yt5wAXMc7D2zLpQqx/an-140-theoretical-models-that-predict-scaling-laws#HIGHLIGHTS); more on the manifold hypothesis)

## Genetics

Everything Is Heritable:

- ["Phenotypic covariance across the entire spectrum of relatedness for 86 billion pairs of individuals"](https://www.nature.com/articles/s41467-021-21283-4), Kemper et al 2021
- ["Genetic variation, brain, and intelligence differences"](https://www.nature.com/articles/s41380-021-01027-y), Deary et al 2021
- ["Pathfinder: A gamified measure to integrate general cognitive ability into the biological, medical and behavioural sciences"](https://www.biorxiv.org/content/10.1101/2021.02.10.430571.full "‘Pathfinder: A gamified measure to integrate general cognitive ability into the biological, medical and behavioral sciences’, Malanchini et al 2021"), Malanchini et al 2021 (not the focus, but the IQ PGS is a slight improvement over [Allegrini et al 2018](https://www.biorxiv.org/content/10.1101/418210.full "Genomic prediction of cognitive traits in childhood and adolescence") due to less phenotype measurement error?)
- ["Polygenic burden has broader impact on health, cognition, and socioeconomic outcomes than most rare and high-risk copy number variants"](https://www.nature.com/articles/s41380-021-01026-z), Saarentaus et al 2021
- [On candidate-genes & COMT](https://www.scielo.br/j/rbp/a/fCXVCnz7PGRpbwNgX6DkJwC/?format=pdf "'Ditching candidate gene association studies: lessons from psychiatric genetics', Duarte et al 2021")

Recent Evolution:

- ["Million-Year-Old DNA Rewrites the Mammoth Family Tree: Genomic data---the oldest ever recovered from a fossil---reveals the origin and evolution of the Columbian mammoth"](https://www.nytimes.com/2021/02/17/science/DNA-mammoth.html)
- ["Kin selection explains the evolution of cooperation in the gut microbiota"](https://www.pnas.org/doi/10.1073/pnas.2016046118), Simonet & McNally 2021

Engineering:

- [First Black-Footed Ferret cloned](https://www.nytimes.com/2021/02/18/science/black-footed-ferret-clone.html "Meet Elizabeth Ann, the First Cloned Black-Footed Ferret: Her birth represents the first cloning of an endangered species native to North America, and may bring needed genetic diversity to the species")

## Statistics/Meta-Science

- ["Lessons from Gerolamo Cardano's _The Book of My Life_"](https://www.lesswrong.com/posts/9YDk52NPrfq7nqLvd/lessons-from-the-book-of-my-life) (progress studies; see also [Newton's anthropic argument](/newton "'Newton’s System of the World and Comets', Branwen 2016"), [Bakewell & inventing progress](/review/bakewell "'Origins of Innovation: Bakewell & Breeding', Branwen 2018"), [_The Autobiography of Benvenuto Cellini_](/review/book#the-autobiography-of-benvenuto-cellini-cellini-1999))
- ["How Many Microcovids Would You Spend on a Burrito?"](https://www.wired.com/story/group-house-covid-risk-points/) (on the [microCOVID Project Calculator](https://www.microcovid.org/))
- [On Piffles](/note/lion#piffles){.include}
- ["Artifact and Recording Concepts in EEG"](/doc/statistics/bias/2011-tatum.pdf), Tatum et al 2011 (on the [EEG](https://en.wikipedia.org/wiki/Electroencephalography) signals of [Jell-O](https://en.wikipedia.org/wiki/Jell-O), or, the importance of [negative controls](https://en.wikipedia.org/wiki/Scientific_control#Negative))

## Politics/Religion

- [Fads](/note/fashion "'Fashion Cycles', Branwen 2021"): ["The Logic of Fashion Cycles"](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0032541), Acerbi et al 2012; ["Fashion and art cycles are driven by counter-dominance signals of elite competition: quantitative evidence from music styles"](https://royalsocietypublishing.org/doi/10.1098/rsif.2018.0731), Klimek et al 2019; ["The hipster effect: When anti-conformists all look the same"](https://arxiv.org/abs/1410.8001), Touboul 2019; ["Right Is The New Left"](https://slatestarcodex.com/2014/04/22/right-is-the-new-left/), Scott Alexander (see also [Han et al 2010](/doc/culture/2010-han.pdf "Signaling Status with Luxury Goods: The Role of Brand Prominence"), [Downs 1972](/doc/sociology/1972-downs.pdf "Up and down with ecology---the 'issue-attention cycle'")/[Gupta & Jenkins-Smith 2015](/doc/sociology/2015-gupta.pdf "On Anthony Downs's 'Up and Down with Ecology: The "Issue-Attention" Cycle'"), [Lorenz-Spreen et al 2019](https://www.nature.com/articles/s41467-019-09311-w "Accelerating dynamics of collective attention")/[Candia et al 2019](/doc/culture/2019-candia.pdf "The universal decay of collective memory and attention"), [Loury 1994](/doc/sociology/preference-falsification/1994-loury.pdf "Self-Censorship in Public Discourse: A Theory of 'Political Correctness' and Related Phenomena"))
- ["What can we learn from the lunar pandemic that never was?"](https://aeon.co/essays/what-can-we-learn-from-the-lunar-pandemic-that-never-was) (NASA's lunar quarantine was a sham intended to mollify the public as they covered up repeated major failures & lab leaks both before & after---had there been any dangerous lunar organisms, they would have escaped easily)
- [MrBeast](https://en.wikipedia.org/wiki/MrBeast) (the new aristocracy [of](https://www.nytimes.com/2023/06/12/magazine/mrbeast-youtube.html) [prestige](https://meltingasphalt.com/social-status-down-the-rabbit-hole/)? Borrowed plumage, perhaps, but effective...)
- ["Russia’s new Lysenkoism"](https://www.cell.com/current-biology/fulltext/S0960-9822\(17\)30949-1), Kolchinsky et al 2017

## Psychology/Biology

- ["Lessons from the host defences of bats, a unique viral reservoir"](/doc/biology/2020-irving.pdf "'Lessons from the host defences of bats, a unique viral reservoir', Irving et al 2021"), Irving et al 2021 ([bat-borne viruses](https://en.wikipedia.org/wiki/Bat_virome); previously, [Trevor Klee](https://get21stnight.com/2020/03/30/why-do-we-keep-getting-diseases-from-bats/ "'Why do human beings keep getting diseases from bats?', Klee 2020"))
- ["Beneficial & Detrimental Effects of Reactive Oxygen Species on Lifespan: A Comprehensive Review of Comparative & Experimental Studies"](https://www.frontiersin.org/articles/10.3389/fcell.2021.628157/full "'Beneficial and Detrimental Effects of Reactive Oxygen Species on Lifespan: A Comprehensive Review of Comparative and Experimental Studies', Shields et al 2021"), Shields et al 2021 (antioxidants still aren't the fountain of youth, and may be harmful; animal studies still frequently inconsistent)
- ["Positive expectations predict improved mental-health outcomes linked to psychedelic microdosing"](https://www.nature.com/articles/s41598-021-81446-7), Kaertner et al 2021 (placebo)
- ["The Effects of Fluoride in Drinking Water"](/doc/iq/2021-aggeborn.pdf), Aggeborn & Öhman 2021
- ["Sleep & Sex: What Can Go Wrong? A Review of the Literature on Sleep Related Disorders and Abnormal Sexual Behaviors & Experiences"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1978350/ "'Sleep and sex: what can go wrong? A review of the literature on sleep related disorders and abnormal sexual behaviors and experiences', Schenck et al 2007"), Schenck et al 2007

### Semaglutide

- [WP](https://en.wikipedia.org/wiki/Semaglutide): ["Once-Weekly Semaglutide in Adults with Overweight or Obesity"](/doc/longevity/glp/semaglutide/2021-wilding.pdf), Wilding et al 2021; ["Effect of Subcutaneous Semaglutide vs Placebo as an Adjunct to Intensive Behavioral Therapy on Body Weight in Adults With Overweight or Obesity: The STEP 3 Randomized Clinical Trial"](/doc/longevity/glp/semaglutide/2021-wadden.pdf), Wadden et al 2021

A longer-acting version of the insulin/appetite peptide [liraglutide](https://en.wikipedia.org/wiki/Liraglutide), semaglutide greatly reduces weight, fat, blood sugar, cholesterol, etc., with an [upcoming oral version](https://link.springer.com/article/10.1007/s40262-018-0728-4 "'Safety and pharmacokinetics of single and multiple ascending doses of the novel oral human GLP-1 analogue, oral semaglutide, in healthy subjects and subjects with type 2 diabetes', Granhall et al 2019"); background: [Kushner et al 2020](/doc/longevity/glp/semaglutide/2020-kushner.pdf "Semaglutide 2.4 mg for the Treatment of Obesity: Key Elements of the STEP Trials 1 to 5"), [Aroda et al 2019](/doc/longevity/glp/semaglutide/2019-aroda.pdf "Comparative efficacy, safety, and cardiovascular outcomes with once-weekly subcutaneous semaglutide in the treatment of type 2 diabetes: Insights from the SUSTAIN 1--7 trials"), [Nauck & Meier 2019](/doc/longevity/glp/semaglutide/2019-nauck.pdf "Management Of Endocrine Disease: Are all GLP-1 agonists equal in the treatment of type 2 diabetes?"), [O'Neil et al 2018](/doc/longevity/glp/semaglutide/2018-oneil.pdf "Efficacy and safety of semaglutide compared with liraglutide and placebo for weight loss in patients with obesity: a randomized, double-blind, placebo and active controlled, dose-ranging, phase 2 trial"), [Blundell et al 2017](/doc/longevity/glp/semaglutide/2017-blundell.pdf "Effects of once-weekly semaglutide on appetite, energy intake, control of eating, food preference and body weight in subjects with obesity"), [Nauck et al 2016](/doc/longevity/glp/semaglutide/2016-nauck.pdf "A Phase 2, Randomized, Dose-Finding Study of the Novel Once-Weekly Human GLP-1 Analog, Semaglutide, Compared With Placebo and Open-Label Liraglutide in Patients With Type 2 Diabetes"), [Lau et al 2015](/doc/longevity/glp/semaglutide/2015-lau.pdf "Discovery of the Once-Weekly Glucagon-Like Peptide-1 (GLP-1) Analogue Semaglutide"). Quick-fixes like semaglutide may be our only hope, however unvirtuous they seem, because [society is fixed but biology mutable](https://slatestarcodex.com/2014/09/10/society-is-fixed-biology-is-mutable/ "Society Is Fixed, Biology Is Mutable").

## Technology

- [New X-Prize: \$100m in prizes for Carbon Removal](https://www.xprize.org/prizes/carbonremoval)
- [Wringing gauge blocks](https://en.wikipedia.org/wiki/Gauge_block) ("With their precisely-flat metal faces, gauge blocks can be stuck together non-magnetically via a process called 'wringing', requiring substantial effort to separate. Scientists are still uncertain exactly how wringing works.")
- [Armored train](https://en.wikipedia.org/wiki/Armoured_train)

## Economics

- ["Why did renewables become so cheap so fast? And what can we do to use this global opportunity for green growth?"](https://ourworldindata.org/cheap-renewables-growth), Max Roser (specifically, why such an extreme [experience curve](https://en.wikipedia.org/wiki/Experience_curve_effects)?)
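The arithmetic behind experience curves is worth making concrete: under Wright's law, each doubling of cumulative production cuts unit cost by a constant fraction (the "learning rate"), so costs fall as a power law of cumulative output. A minimal sketch, where the function name and the ~20% learning rate (roughly the figure commonly cited for solar photovoltaic modules) are illustrative assumptions, not numbers taken from Roser's article:

```python
import math

def experience_curve_cost(cumulative_units, initial_cost=1.0,
                          initial_units=1.0, learning_rate=0.20):
    """Wright's law: every doubling of cumulative production cuts unit
    cost by a constant fraction (the 'learning rate')."""
    exponent = -math.log2(1 - learning_rate)  # ~0.32 for a 20% learning rate
    return initial_cost * (cumulative_units / initial_units) ** -exponent

# One doubling leaves 80% of the original unit cost:
one_doubling = experience_curve_cost(2)       # ≈ 0.80
# Ten doublings (1,024× cumulative production) cut cost ~10-fold:
ten_doublings = experience_curve_cost(2**10)  # ≈ 0.107
```

At a 20% learning rate, a ~1,000-fold scale-up implies roughly a 10-fold cost decline, which is why sustained deployment, not just R&D, can drive a technology down such a curve; the puzzle Roser emphasizes is why solar's curve has been so steep and so persistent.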
- ["IQ, trading behavior, and performance"](/doc/iq/ses/2012-grinblatt.pdf), Grinblatt et al 2012; ["Genetic Endowments and Wealth Inequality"](/doc/economics/2020-barth.pdf), Barth et al 2020 (why, despite notorious setbacks, did Isaac Newton & LTCM's founders die wealthy? Why, in general, are more intelligent people so much better investors? 'The indifference of the indicator': it's not one thing, it's everything---more intelligent people have lower discount rates, save more for longer & are less risk-averse, more accurately predict future growth or inflation, are more likely to participate in +EV opportunities like the stock market, to use low-fee rather than high-fee (and thus, underperforming) mutual funds, succumb less to biases like herding as they trade better & at better times, trade less, and harvest losses more efficiently when trading poorly.)

## Philosophy

- [**Are ethics experts more ethical?**](/doc/philosophy/ethics/ethicists/index) ["The Behavior of Ethicists"](/doc/philosophy/ethics/ethicists/2016-schwitzgebel.pdf), Schwitzgebel & Rust 2016 (most recently: ["The moral behavior of ethics professors: A replication-extension in German-speaking countries"](/doc/philosophy/ethics/ethicists/2019-schonegger.pdf), Schönegger et al 2019; given moral licensing & activism, perhaps we should be surprised we don't hear about more ethicists doing things like trying to dox reviewers, posting enemy lists, or dumping files to leak. "Woe to you Pharisees!")
- ["Meta-analysis on belief in free will manipulations"](https://osf.io/preprints/psyarxiv/quwgr), Genschow et al 2021 (another noble lie turns out to be ignoble)
- [Gricean maxims of communication](https://en.wikipedia.org/wiki/Cooperative_principle)

## Fiction

- [_Bunnies & Burrows_](https://en.wikipedia.org/wiki/Bunnies_%26_Burrows)

## Miscellaneous

- ["Caesar Lives"](/doc/history/1995-pop.pdf), [Iggy Pop](https://en.wikipedia.org/wiki/Iggy_Pop) 1995 (on [Gibbon](https://en.wikipedia.org/wiki/The_History_of_the_Decline_and_Fall_of_the_Roman_Empire))
- [Mad honey](https://en.wikipedia.org/wiki/Grayanotoxin#Mad_honey_intoxication)
- [Imperial Court System](https://en.wikipedia.org/wiki/Imperial_Court_System)