 See Also
 Gwern

Links
 “Predicting the Direction of Phenotypic Difference”, Gokhman et al 2024
 “Diffusion Model Alignment Using Direct Preference Optimization”, Wallace et al 2023
 “A General Theoretical Paradigm to Understand Learning from Human Preferences”, Azar et al 2023
 “Direct Preference Optimization (DPO): Your Language Model Is Secretly a Reward Model”, Rafailov et al 2023
 “Reputation Inflation”, Filippas et al 2022
 “Bayesian Inference of the Climbing Grade Scale”, Drummond & Popinga 2021
 “PiRank: Learning To Rank via Differentiable Sorting”, Swezey et al 2020
 “Rank-Smoothed Pairwise Learning In Perceptual Quality Assessment”, Talebi et al 2020
 “Self-Play Learning Without a Reward Metric”, Schmidt et al 2019
 “Group Testing: An Information Theory Perspective”, Aldridge et al 2019
 “Top-K Off-Policy Correction for a REINFORCE Recommender System”, Chen et al 2018
 “Comparison Based Learning from Weak Oracles”, Kazemi et al 2018
 “OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning”, Henderson et al 2017
 “Analogical-based Bayesian Optimization”, Le et al 2017
 “Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking”, Chen et al 2017
 “The Competitiveness of Games in Professional Sports Leagues”, Wills 2017
 “Deep Reinforcement Learning from Human Preferences”, Christiano et al 2017
 “PBO: Preferential Bayesian Optimization”, Gonzalez et al 2017
 “DTS: Double Thompson Sampling for Dueling Bandits”, Wu & Liu 2016
 “Just Sort It! A Simple and Effective Approach to Active Preference Learning”, Maystre & Grossglauser 2015
 “On the Complexity of Best Arm Identification in Multi-Armed Bandit Models”, Kaufmann et al 2014
 “Bayesian Active Learning for Classification and Preference Learning”, Houlsby et al 2011
 “Case Studies in Bayesian Computation Using INLA”, Martino & Rue 2010
 “Sorting from Noisy Information”, Braverman & Mossel 2009
 “Can People Distinguish Pâté From Dog Food? [preprint]”, Bohannon et al 2009
 “Aggregating Inconsistent Information: Ranking and Clustering”, Ailon et al 2008
 “Pure Exploration for Multi-Armed Bandit Problems”, Bubeck et al 2008
 “Do More Expensive Wines Taste Better? Evidence from a Large Sample of Blind Tastings”, Goldstein et al 2008
 “Noisy Sorting Without Resampling”, Braverman & Mossel 2007
 “Noisy Binary Search and Its Applications”, Karp & Kleinberg 2007
 “Paired Comparison Models for Ranking National Soccer Teams”, Hallinan 2005
 “Bayesian Adaptive Exploration”, Loredo & Chernoff 2003
 “How Dangerous Are Drinking Drivers?”, Levitt & Porter 2001
 “Sympercents: Symmetric Percentage Differences on the 100 Log_{e} Scale Simplify the Presentation of Log Transformed Data”, Cole 2000
 “Born Again Group Testing: Multiaccess Communications”, Wolf 1985
 “The Rating of Chessplayers, Past and Present (Second Edition)”, Elo 1978
 “Inconsistencies in a Schedule of Paired Comparisons”
 “Metacritic Has A (File-Drawer) Problem”
 “Bisection (software Engineering)”
 Sort By Magic
 Miscellaneous
See Also
Gwern
“GPT-2 Preference Learning for Music Generation”, Gwern 2019
“Open Questions”, Gwern 2018
“Resorting Media Ratings”, Gwern 2015
Links
“Predicting the Direction of Phenotypic Difference”, Gokhman et al 2024
“Diffusion Model Alignment Using Direct Preference Optimization”, Wallace et al 2023
“A General Theoretical Paradigm to Understand Learning from Human Preferences”, Azar et al 2023
“Direct Preference Optimization (DPO): Your Language Model Is Secretly a Reward Model”, Rafailov et al 2023
“Reputation Inflation”, Filippas et al 2022
“Bayesian Inference of the Climbing Grade Scale”, Drummond & Popinga 2021
“PiRank: Learning To Rank via Differentiable Sorting”, Swezey et al 2020
“Rank-Smoothed Pairwise Learning In Perceptual Quality Assessment”, Talebi et al 2020
“Self-Play Learning Without a Reward Metric”, Schmidt et al 2019
“Group Testing: An Information Theory Perspective”, Aldridge et al 2019
“Top-K Off-Policy Correction for a REINFORCE Recommender System”, Chen et al 2018
“Comparison Based Learning from Weak Oracles”, Kazemi et al 2018
“OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning”, Henderson et al 2017
“Analogical-based Bayesian Optimization”, Le et al 2017
“Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking”, Chen et al 2017
“The Competitiveness of Games in Professional Sports Leagues”, Wills 2017
“Deep Reinforcement Learning from Human Preferences”, Christiano et al 2017
“PBO: Preferential Bayesian Optimization”, Gonzalez et al 2017
“DTS: Double Thompson Sampling for Dueling Bandits”, Wu & Liu 2016
“Just Sort It! A Simple and Effective Approach to Active Preference Learning”, Maystre & Grossglauser 2015
“On the Complexity of Best Arm Identification in Multi-Armed Bandit Models”, Kaufmann et al 2014
“Bayesian Active Learning for Classification and Preference Learning”, Houlsby et al 2011
“Case Studies in Bayesian Computation Using INLA”, Martino & Rue 2010
“Sorting from Noisy Information”, Braverman & Mossel 2009
“Can People Distinguish Pâté From Dog Food? [preprint]”, Bohannon et al 2009
“Aggregating Inconsistent Information: Ranking and Clustering”, Ailon et al 2008
“Pure Exploration for Multi-Armed Bandit Problems”, Bubeck et al 2008
“Do More Expensive Wines Taste Better? Evidence from a Large Sample of Blind Tastings”, Goldstein et al 2008
“Noisy Sorting Without Resampling”, Braverman & Mossel 2007
“Noisy Binary Search and Its Applications”, Karp & Kleinberg 2007
“Paired Comparison Models for Ranking National Soccer Teams”, Hallinan 2005
“Bayesian Adaptive Exploration”, Loredo & Chernoff 2003
“How Dangerous Are Drinking Drivers?”, Levitt & Porter 2001
“Sympercents: Symmetric Percentage Differences on the 100 Log_{e} Scale Simplify the Presentation of Log Transformed Data”, Cole 2000
“Born Again Group Testing: Multiaccess Communications”, Wolf 1985
“The Rating of Chessplayers, Past and Present (Second Edition)”, Elo 1978
“Inconsistencies in a Schedule of Paired Comparisons”
“Metacritic Has A (File-Drawer) Problem”
“Bisection (software Engineering)”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
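The ordering described above can be illustrated with a small sketch. This is not the site's actual implementation. It assumes a greedy chain (always step to the most similar annotation not yet listed) and cosine similarity between embeddings, neither of which is stated here. The function name `sort_by_magic`, the `start` parameter, and the toy 2-dimensional vectors are hypothetical, and the clustering and auto-labeling steps are omitted.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sort_by_magic(embeddings, start=0):
    """Return item indices ordered as a greedy nearest-neighbor chain.

    Assumption: `start` is the index of the newest annotation, and each
    step moves to the most similar annotation that has not been used yet.
    """
    if not embeddings:
        return []
    remaining = set(range(len(embeddings)))
    remaining.discard(start)
    order = [start]
    current = start
    while remaining:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nearest = max(
            sorted(remaining),
            key=lambda i: cosine_similarity(embeddings[current], embeddings[i]),
        )
        order.append(nearest)
        remaining.remove(nearest)
        current = nearest
    return order


# Toy embeddings: items 0 and 2 share one "topic", items 1 and 3 another.
toy = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
order = sort_by_magic(toy, start=0)
print(order)  # [0, 2, 3, 1]
```

In the toy run the list moves from item 0 to its near-duplicate 2, then crosses over to 3 and finishes with 1, so similar items end up adjacent and the list reads as a progression of topics rather than a date order.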