‘statistical comparison’ directory

See Also
Gwern
Links
Miscellaneous
Bibliography

See Also

Gwern

“Rock-Paper-Scissors Optimality”, Gwern 2024

“Open Questions”, Gwern 2018

“GPT-2 Preference Learning for Music Generation”, Gwern 2019

“Resorting Media Ratings”, Gwern 2015

Links

“The Missing 9: Why Some Movies Have a Hole in Their IMDb Ratings”, Gros 2025

“`fullrank`: An Interactive CLI Tool and Python Library for Bayesian Inference of List Rankings Based on Noisy Comparisons”, Niederman 2025

View HTML: /doc/www/github.com/627618867d79d6e0a238a87bbdb96d5039f1e8cf.html

“Fullrank: Bayesian Noisy Sorting”, Niederman 2025

View HTML: /doc/www/www.greaterwrong.com/63d6d48a5539923def5a015308db84424883bda1.html

“Epistemic Calibration and Searching the Space of Truth”, Lee 2024

“Analyzing Poems With LLMs”, Toper 2024

“Predicting the Direction of Phenotypic Difference”, Gokhman et al 2024

“Diffusion Model Alignment Using Direct Preference Optimization”, Wallace et al 2023

“A General Theoretical Paradigm to Understand Learning from Human Preferences”, Azar et al 2023

“On the Optimal Bounds for Noisy Computing”, Zhu et al 2023

“Direct Preference Optimization (DPO): Your Language Model Is Secretly a Reward Model”, Rafailov et al 2023

“DPO § 6.4: Validating GPT-4 Judgments With Human Judgments”, Rafailov et al 2023 (page 10)

“Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems”, Feng et al 2023

“Reputation Inflation”, Filippas et al 2022

“Bayesian Inference of the Climbing Grade Scale”, Drummond & Popinga 2021

“PiRank: Learning To Rank via Differentiable Sorting”, Swezey et al 2020

“Rank-Smoothed Pairwise Learning In Perceptual Quality Assessment”, Talebi et al 2020

“Self-Play Learning Without a Reward Metric”, Schmidt et al 2019

“Group Testing: An Information Theory Perspective”, Aldridge et al 2019

View PDF: /doc/statistics/order/comparison/2019-aldridge.pdf

“Top-K Off-Policy Correction for a REINFORCE Recommender System”, Chen et al 2018

“Comparison Based Learning from Weak Oracles”, Kazemi et al 2018

“OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning”, Henderson et al 2017

“Analogical-Based Bayesian Optimization”, Le et al 2017

“Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking”, Chen et al 2017

“The Competitiveness of Games in Professional Sports Leagues”, Wills 2017

“Deep Reinforcement Learning from Human Preferences”, Christiano et al 2017

“PBO: Preferential Bayesian Optimization”, Gonzalez et al 2017

“D-TS: Double Thompson Sampling for Dueling Bandits”, Wu & Liu 2016

“We Tested HN Randomization and the Results Were Terrible”, dang 2015

View HTML: /doc/www/news.ycombinator.com/d77e26bd6c3501598ddb8e7d23c599e34b4e8f18.html

“Non-Stochastic Best Arm Identification and Hyperparameter Optimization”, Jamieson & Talwalkar 2015

“Just Sort It! A Simple and Effective Approach to Active Preference Learning”, Maystre & Grossglauser 2015

“On the Complexity of Best Arm Identification in Multi-Armed Bandit Models”, Kaufmann et al 2014

“Bayesian Active Learning for Classification and Preference Learning”, Houlsby et al 2011

“Case Studies in Bayesian Computation Using INLA”, Martino & Rue 2010

“Sorting from Noisy Information”, Braverman & Mossel 2009

“Can People Distinguish Pâté From Dog Food? [Preprint]”, Bohannon et al 2009

“Aggregating Inconsistent Information: Ranking and Clustering”, Ailon et al 2008

“Pure Exploration for Multi-Armed Bandit Problems”, Bubeck et al 2008

“Do More Expensive Wines Taste Better? Evidence from a Large Sample of Blind Tastings”, Goldstein et al 2008

“Noisy Sorting Without Resampling”, Braverman & Mossel 2007

“Noisy Binary Search and Its Applications”, Karp & Kleinberg 2007

“Paired Comparison Models for Ranking National Soccer Teams”, Hallinan 2005

“Bayesian Adaptive Exploration”, Loredo & Chernoff 2003

“How Dangerous Are Drinking Drivers?”, Levitt & Porter 2001

“Sympercents: Symmetric Percentage Differences on the 100 Log_e Scale Simplify the Presentation of Log Transformed Data”, Cole 2000

“Born Again Group Testing: Multiaccess Communications”, Wolf 1985

View PDF: /doc/statistics/order/comparison/1985-wolf.pdf

“The Analysis of Sequential Experiments With Feedback to Subjects”, Diaconis & Graham 1981

“Rating the Ratings: Assessing the Psychometric Quality of Rating Data”, Saal et al 1980

The Rating of Chessplayers, Past and Present (Second Edition), Elo 1978

“Optimal Selection Based On Relative Rank (The ‘Secretary Problem’)”, Chow 1964

View PDF: /doc/www/www2.math.upenn.edu/7b58146dd047c771e9c48520dbaa8d978c61578d.pdf

“Inconsistencies in a Schedule of Paired Comparisons”

View PDF: /doc/statistics/order/comparison/1961-slater.pdf

“Metacritic Has A (File-Drawer) Problem”

View External Link: https://datacolada.org/72

“Valuing Research Works by Eliciting Comparisons from EA Researchers”

View External Link: https://forum.effectivealtruism.org/posts/hrdxf5qdKmCZNWTvs/valuing-research-works-by-eliciting-comparisons-from-ea

“Futurama Theorem”

View HTML: /doc/www/theinfosphere.org/f1a939193fef59455a7d3f2bafc5fcf4bf9bd5bb.html

“`CodingFont`”, Typogram 2026

“Getting Things in Order: An Introduction to the R Package `seriation`”

Sort By Magic

Annotations sorted by machine learning into inferred ‘tags’. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The ‘sorted’ list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses each annotation’s embedding to try to build a chain of nearest-neighbor annotations, producing a progression of topics. For more details, see the link.
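The nearest-neighbor progression described above can be illustrated with a short sketch. It assumes each annotation already has an embedding vector, uses cosine similarity as the closeness measure, and walks greedily from the newest annotation (taken to be `start`, index 0 by default) to its most similar unvisited neighbor. The embedding model, similarity measure, and the later clustering and auto-labeling steps used for this page are not specified here, so `nearest_neighbor_order` and `cosine_similarity` are hypothetical illustrations, not the site’s actual code.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor_order(embeddings: List[List[float]], start: int = 0) -> List[int]:
    """Hypothetical sketch: order items by a greedy nearest-neighbor walk.

    Assumes embeddings[start] is the newest annotation and that cosine
    similarity is an acceptable proxy for topical closeness.
    """
    n = len(embeddings)
    if n == 0:
        return []
    if not 0 <= start < n:
        raise IndexError("start must index into embeddings")
    order = [start]
    unvisited = set(range(n)) - {start}
    current = start
    while unvisited:
        # Hop to the most similar unvisited item; iterating in sorted order
        # makes max() break ties toward the lowest index, so output is deterministic.
        nxt = max(
            sorted(unvisited),
            key=lambda j: cosine_similarity(embeddings[current], embeddings[j]),
        )
        order.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return order


if __name__ == "__main__":
    vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    print(nearest_neighbor_order(vectors))  # [0, 2, 3, 1]
```

In the example, the walk visits the two vectors pointing mostly along the first axis before crossing over to the second axis, which is the kind of topic progression the paragraph describes. A greedy walk like this performs on the order of n² similarity computations, which is simple but may need an approximate-nearest-neighbor index for large collections.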

`phenotype-prediction`

`exploration-bandits`

`preference-learning`

`rating-comparison`

Wikipedia (6)

Miscellaneous

Bibliography

https://arxiv.org/pdf/2305.18290#page=10: “DPO § 6.4: Validating GPT-4 Judgments With Human Judgments”, Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn

1980-saal.pdf: “Rating the Ratings: Assessing the Psychometric Quality of Rating Data”, Frank E. Saal, Ronald G. Downey, Mary Anne Lahey
