Bibliography (109):

  1. GPT-2 Neural Network Poetry

  2. GPT-2 Folk Music

  3. RNN Metadata for Mimicking Author Style

  4. CTRL: A Conditional Transformer Language Model For Controllable Generation

  5. P≟NP § AI

  6. DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

  7. Reward learning from human preferences and demonstrations in Atari

  8. Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces

  9. Learning Human Objectives by Evaluating Hypothetical Behavior

  10. Synthesizing Programs for Images using Reinforced Adversarial Learning

  11. Scaling data-driven robotics with reward sketching and batch reinforcement learning

  12. Deep reinforcement learning from human preferences

  13. Homepage of Paul F. Christiano

  14. Learning through human feedback [blog]

  15. Learning from Human Preferences

  16. Fine-Tuning Language Models from Human Preferences

  17. Fine-Tuning GPT-2 from Human Preferences

  18. lm-human-preferences

  19. Learning to summarize from human feedback

  20. Reducing Non-Normative Text Generation from Language Models

  21. Learning Norms from Stories: A Prior for Value Aligned Agents

  22. The Curious Case of Neural Text Degeneration

  23. Neural Text Generation with Unlikelihood Training

  24. gsutil config: Obtain credentials and create configuration file

  25. GPT-2 Folk Music § Spaceless Model

  26. The abc music standard 2.1: §3.1.1: X:—reference number

  27. 2019-12-22-gpt2-preferencelearning-gwern-abcmusic.patch

  28. Scale: The Data Platform for AI; High quality training and validation data for AI applications

  29. 2019-12-21-gwern-gpt2-preferencelearning-abc-combinedmodel-divergence.png

  30. 2020-01-15-gwern-gpt2-preferencelearning-abc-combinedmodel-klregularized-finalrun.png

  31. 2020-01-26-gwern-gpt2-preferencelearning-datacode.tar.xz

  32. 2020-01-15-gwern-gpt2-preference-learning-abccombined-23-20200113.tar.xz

  33. Strange Planet (Instagram)

  34. The Power of Twins: The Scottish Milk Experiment

  35. Language Generation with Recurrent Generative Adversarial Networks without Pre-training

  36. GPT-2 Neural Network Poetry § Cleaning Project Gutenberg & Contemporary Poetry

  37. AI Dungeon 2

  38. https://x.com/nickwalton00/status/1221836962396426240

  39. AI Dungeon 2 Public Disclosure Vulnerability Report—GraphQL Unpublished Adventure Data Leak

  40. This Waifu Does Not Exist § Results

  41. This Person Does Not Exist

  42. Waifu Labs

  43. https://x.com/SizigiStudios/status/1221982089932763136

  44. Artbreeder

  45. https://x.com/OpenAI/status/1120421259274334209

  46. MuseNet: a deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles

  47. https://x.com/OpenAI/status/1121937897869864960

  48. Fine-Tuning Language Models from Human Preferences

  49. Making Anime Faces With StyleGAN § Reversing StyleGAN To Control & Modify Images

  50. Plug and Play Language Models: A Simple Approach to Controlled Text Generation

  51. Controlling Text Generation with Plug and Play Language Models

  52. What does BERT dream of? A visual investigation of nightmares in Sesame Street

  53. Transformers As Variational Autoencoders

  54. Transformer-VAE for Program Synthesis

  55. https://github.com/sanjeevanahilan/nanoChatGPT

  56. Deep reinforcement learning from human preferences § Appendix A.2: Atari

  57. Stochastic Optimization of Sorting Networks via Continuous Relaxations

  58. Fast Differentiable Sorting and Ranking

  59. PiRank: Learning To Rank via Differentiable Sorting

  60. Connecting Generative Adversarial Networks and Actor-Critic Methods

  61. A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models

  62. Improving GAN Training with Probability Ratio Clipping and Sample Reweighting

  63. NoGAN: Decrappification, DeOldification, and Super Resolution

  64. Self-Play Learning Without a Reward Metric

  65. Resorting Media Ratings

  66. Adversarial Examples Are Not Bugs, They Are Features

  67. BigGAN: Large Scale GAN Training for High Fidelity Natural Image Synthesis § 4.2 Characterizing Instability: The Discriminator

  68. Making Anime Faces With StyleGAN § Discriminator Ranking: Using a Trained Discriminator to Rank and Clean Data

  69. Self-Blinded Mineral Water Taste Test

  70. GPT-3 Creative Fiction § Prompts As Programming

  71. Software 2.0

  72. Decision Transformer: Reinforcement Learning via Sequence Modeling

  73. The Scaling Hypothesis § Blessings of Scale

  74. Choose-Your-Own-Adventure AI Dungeon Games

  75. https://x.com/AstraliteHeart

  76. Measuring the Intrinsic Dimension of Objective Landscapes

  77. GPT-J-6B: 6B JAX-Based Transformer

  78. Surprisingly Turing-Complete

  79. https://lvwerra.github.io/trl/

  80. Huggingface/trl: Train Transformer Language Models With Reinforcement Learning

  81. Rank-Smoothed Pairwise Learning In Perceptual Quality Assessment

  82. Controllable Neural Text Generation

  83. This Article Provides an Overview of Recent Methods to Fine-Tune Large Pre-Trained Language Models

  84. Prefix-Tuning: Optimizing Continuous Prompts for Generation

  85. Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data

  86. Gradient-based Adversarial Attacks against Text Transformers