• CTRL: A Conditional Transformer Language Model for Controllable Generation
• DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
• Reward learning from human preferences and demonstrations in Atari
• Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces
• Learning Human Objectives by Evaluating Hypothetical Behavior
• Synthesizing Programs for Images using Reinforced Adversarial Learning
• Scaling data-driven robotics with reward sketching and batch reinforcement learning
• Reducing Non-Normative Text Generation from Language Models
• Learning Norms from Stories: A Prior for Value Aligned Agents
• gsutil config: Obtain credentials and create configuration file
• The abc music standard 2.1 § 3.1.1: X: reference number
• Scale: The Data Platform for AI; High quality training and validation data for AI applications
• 2019-12-21-gwern-gpt2-preferencelearning-abc-combinedmodel-divergence.png
• 2020-01-15-gwern-gpt2-preferencelearning-abc-combinedmodel-klregularized-finalrun.png
• 2020-01-26-gwern-gpt2-preferencelearning-datacode.tar.xz
• https://mega.nz/#!vboDEAxb!l4V1LR10bsMl0qR71umYgiFwoGccoZlyntGZrrcl1wI
• Language Generation with Recurrent Generative Adversarial Networks without Pre-training
• GPT-2 Neural Network Poetry § Cleaning Project Gutenberg & Contemporary Poetry
• AI Dungeon Public Disclosure Vulnerability Report: GraphQL Unpublished Adventure Data Leak
• MuseNet: a deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles
• Making Anime Faces With StyleGAN § Reversing StyleGAN To Control & Modify Images
• Plug and Play Language Models: A Simple Approach to Controlled Text Generation
• Controlling Text Generation with Plug and Play Language Models
• What does BERT dream of? A visual investigation of nightmares in Sesame Street
• Deep reinforcement learning from human preferences § Appendix A.2: Atari
• Stochastic Optimization of Sorting Networks via Continuous Relaxations
• Connecting Generative Adversarial Networks and Actor-Critic Methods
• A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models
• Improving GAN Training with Probability Ratio Clipping and Sample Reweighting
• NoGAN: Decrappification, DeOldification, and Super Resolution
• BigGAN: Large Scale GAN Training for High Fidelity Natural Image Synthesis § 4.2 Characterizing Instability: The Discriminator
• Making Anime Faces With StyleGAN § Discriminator Ranking: Using a Trained Discriminator to Rank and Clean Data
• Decision Transformer: Reinforcement Learning via Sequence Modeling
• scaling-hypothesis#blessings-of-scale
• Measuring the Intrinsic Dimension of Objective Landscapes
• Huggingface/trl: Train Transformer Language Models With Reinforcement Learning
• Rank-Smoothed Pairwise Learning In Perceptual Quality Assessment
• This Article Provides an Overview of Recent Methods to Fine-Tune Large Pre-Trained Language Models
• Prefix-Tuning: Optimizing Continuous Prompts for Generation
• Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data
• Gradient-based Adversarial Attacks against Text Transformers
Wikipedia Bibliography: