Bibliography:

  1. ​ GPT-2 Neural Network Poetry

  2. ​ GPT-2 Folk Music

  3. ​ RNN Metadata for Mimicking Author Style

  4. ​ CTRL: A Conditional Transformer Language Model For Controllable Generation

  5. ​ P≟NP § AI

  6. ​ DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

  7. ​ Reward learning from human preferences and demonstrations in Atari

  8. ​ Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces

  9. ​ Learning Human Objectives by Evaluating Hypothetical Behavior

  10. ​ Synthesizing Programs for Images using Reinforced Adversarial Learning

  11. ​ Scaling data-driven robotics with reward sketching and batch reinforcement learning

  12. ​ Deep reinforcement learning from human preferences

  13. ​ Homepage of Paul F. Christiano

  14. ​ Learning through human feedback [blog]

  15. ​ Learning from Human Preferences

  16. ​ Fine-Tuning Language Models from Human Preferences

  17. ​ Fine-Tuning GPT-2 from Human Preferences

  18. ​ lm-human-preferences

  19. ​ Learning to summarize from human feedback

  20. ​ Reducing Non-Normative Text Generation from Language Models

  21. ​ Learning Norms from Stories: A Prior for Value Aligned Agents

  22. ​ The Curious Case of Neural Text Degeneration

  23. ​ Neural Text Generation with Unlikelihood Training

  24. ​ gsutil config: Obtain credentials and create configuration file

  25. ​ GPT-2 Folk Music § Spaceless Model

  26. ​ The abc music standard 2.1: §3.1.1: X:—reference number

  27. ​ 2019-12-22-gpt2-preferencelearning-gwern-abcmusic.patch

  28. ​ Scale: The Data Platform for AI; High quality training and validation data for AI applications

  29. ​ 2019-12-21-gwern-gpt2-preferencelearning-abc-combinedmodel-divergence.png

  30. ​ 2020-01-15-gwern-gpt2-preferencelearning-abc-combinedmodel-klregularized-finalrun.png

  31. ​ 2020-01-26-gwern-gpt2-preferencelearning-datacode.tar.xz

  32. ​ https://mega.nz/#!vboDEAxb!l4V1LR10bsMl0qR71umYgiFwoGccoZlyntGZrrcl1wI

  33. ​ Strange Planet (Instagram)

  34. ​ The Power of Twins: The Scottish Milk Experiment

  35. ​ Language Generation with Recurrent Generative Adversarial Networks without Pre-training

  36. ​ GPT-2 Neural Network Poetry § Cleaning Project Gutenberg & Contemporary Poetry

  37. ​ $2020

  38. ​ AI Dungeon 2

  39. ​ https://x.com/nickwalton00/status/1221836962396426240

  40. ​ AI Dungeon Public Disclosure Vulnerability Report—GraphQL Unpublished Adventure Data Leak

  41. ​ This Waifu Does Not Exist § Results

  42. ​ This Person Does Not Exist

  43. ​ Waifu Labs

  44. ​ https://x.com/SizigiStudios/status/1221982089932763136

  45. ​ Artbreeder

  46. ​ https://x.com/OpenAI/status/1120421259274334209

  47. ​ MuseNet: a deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles

  48. ​ https://x.com/OpenAI/status/1121937897869864960

  49. ​ Fine-Tuning Language Models from Human Preferences

  50. ​ Making Anime Faces With StyleGAN § Reversing StyleGAN To Control & Modify Images

  51. ​ Plug and Play Language Models: A Simple Approach to Controlled Text Generation

  52. ​ Controlling Text Generation with Plug and Play Language Models

  53. ​ What does BERT dream of? A visual investigation of nightmares in Sesame Street

  54. ​ Transformers As Variational Autoencoders

  55. ​ Transformer-VAE for Program Synthesis

  56. ​ https://github.com/sanjeevanahilan/nanoChatGPT

  57. ​ Deep reinforcement learning from human preferences § Appendix A.2: Atari

  58. ​ Stochastic Optimization of Sorting Networks via Continuous Relaxations

  59. ​ Fast Differentiable Sorting and Ranking

  60. ​ PiRank: Learning To Rank via Differentiable Sorting

  61. ​ Connecting Generative Adversarial Networks and Actor-Critic Methods

  62. ​ A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models

  63. ​ Improving GAN Training with Probability Ratio Clipping and Sample Reweighting

  64. ​ NoGAN: Decrappification, DeOldification, and Super Resolution

  65. ​ Self-Play Learning Without a Reward Metric

  66. ​ Resorting Media Ratings

  67. ​ Adversarial Examples Are Not Bugs, They Are Features

  68. ​ BigGAN: Large Scale GAN Training for High Fidelity Natural Image Synthesis § 4.2 Characterizing Instability: The Discriminator

  69. ​ Making Anime Faces With StyleGAN § Discriminator Ranking: Using a Trained Discriminator to Rank and Clean Data

  70. ​ Self-Blinded Mineral Water Taste Test

  71. ​ GPT-3 Creative Fiction § Prompts As Programming

  72. ​ Software 2.0

  73. ​ Decision Transformer: Reinforcement Learning via Sequence Modeling

  74. ​ The Scaling Hypothesis § Blessings of Scale

  75. ​ Choose-Your-Own-Adventure AI Dungeon Games

  76. ​ https://x.com/AstraliteHeart

  77. ​ Measuring the Intrinsic Dimension of Objective Landscapes

  78. ​ GPT-J-6B: 6B JAX-Based Transformer

  79. ​ Surprisingly Turing-Complete

  80. ​ https://lvwerra.github.io/trl/

  81. ​ Huggingface/trl: Train Transformer Language Models With Reinforcement Learning

  82. ​ Rank-Smoothed Pairwise Learning In Perceptual Quality Assessment

  83. ​ Controllable Neural Text Generation

  84. ​ This Article Provides an Overview of Recent Methods to Fine-Tune Large Pre-Trained Language Models

  85. ​ Prefix-Tuning: Optimizing Continuous Prompts for Generation

  86. ​ Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data

  87. ​ Gradient-based Adversarial Attacks against Text Transformers