Playing around with Claude 2: I uploaded some papers I was looking at today (and an old blog post of mine). A context of 100k tokens is pretty useful (regardless of whether an LLM truly understands or is just a parrot in a Chinese room, this is impressive):

Here is a summary comparing the key ideas and contributions of the two papers "Diversify and Disambiguate: Learning From Underspecified Data" by Lee et al. and "Quantification of Uncertainty with Adversarial Models" by Schweighofer et al., as well as the related blog post by Kirsch:

Overview of Papers:

"Diversify and Disambiguate" (Lee et al.):
- Proposes a two-stage framework called DivDis for learning from underspecified data, where multiple viable explanations exist.
- Stage 1 is "Diversify": Trains multiple classifier heads to make different predictions on unlabeled target data, while minimizing training error. Achieved by minimizing mutual information between head predictions.
- Stage 2 is "Disambiguate": Selects the best head using a small amount of labeled target data.
- Shows improved robustness in subpopulation shift settings compared to standard ERM.

"Quantification of Uncertainty with Adversarial Models" (Schweighofer et al.):
- Proposes the QUAM method to estimate epistemic uncertainty by searching for "adversarial models" that have high posterior probability but differ from a reference model.
- Adversarial models identify important modes of the posterior missed by methods like deep ensembles.
- Uses adversarial models to construct a mixture distribution for variance reduction in MC estimation.
- Outperforms methods like deep ensembles in estimating epistemic uncertainty on vision tasks.

Connections between Papers:
- Both aim to learn diverse sets of models that make different predictions, in order to handle underspecification.
- Lee et al. do this by training multiple classifier heads, while Schweighofer et al. find adversarial models.
- Lee et al. focus on improving predictive performance, while Schweighofer et al. focus on quantifying uncertainty.
- However, Lee et al.'s "disambiguate" stage is similar to using adversarial models for uncertainty quantification.

Information-Theoretic Justifications:
- Lee et al. justify mutual information minimization between heads as promoting independence of predictions.
- Schweighofer et al. justify adversarial models via importance sampling bounds using KL divergence.
- Both connect diversity of models to information-theoretic quantities like mutual information and KL divergence.
- These justify explicitly encouraging diversity, rather than relying on the implicit diversity of separately trained models.

Summary of Blog Post by Kirsch:
- Suggests encouraging model disagreement on unlabeled pool/target data when training ensembles.
- Motivated by making BALD scores more "informative" for active learning.
- Discusses connections to Lee et al.'s method of minimizing mutual information between heads.
- Proposes directly maximizing approximate BatchBALD scores as a novel objective.
- Raises open questions around constraints needed to avoid degenerate solutions.

The blog post relates model disagreement to BALD, an information-theoretic active learning metric. It proposes extending both Lee et al.'s and Schweighofer et al.'s approaches to directly optimize information-theoretic objectives related to model disagreement.
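
For concreteness, here is a minimal PyTorch sketch (not the authors' code) of what a DivDis-style "diversify" objective can look like: cross-entropy on labeled source data plus a penalty that pushes the mutual information between head predictions on an unlabeled target batch toward zero. The function names and the `mi_weight` coefficient are hypothetical, and the MI estimator simply averages the outer product of head predictions over the batch.

```python
import torch
import torch.nn.functional as F

def pairwise_mutual_information(logits_a: torch.Tensor,
                                logits_b: torch.Tensor,
                                eps: float = 1e-8) -> torch.Tensor:
    """Estimate MI between two heads' predicted labels over one batch.

    logits_a, logits_b: [batch, classes] logits from two heads on the same
    unlabeled target batch. The joint label distribution is estimated as the
    batch average of the outer product of the two softmax outputs.
    """
    probs_a = logits_a.softmax(dim=-1)                                      # [B, C]
    probs_b = logits_b.softmax(dim=-1)                                      # [B, C]
    joint = torch.einsum("bi,bj->ij", probs_a, probs_b) / probs_a.shape[0]  # [C, C]
    marginal_a = joint.sum(dim=1, keepdim=True)                             # [C, 1]
    marginal_b = joint.sum(dim=0, keepdim=True)                             # [1, C]
    # MI = sum_ij p(i,j) * log( p(i,j) / (p(i) p(j)) )
    return (joint * (torch.log(joint + eps)
                     - torch.log(marginal_a * marginal_b + eps))).sum()

def diversify_loss(source_logits, source_labels, target_logits, mi_weight=1.0):
    """Hypothetical training objective: fit the labeled source data while
    making the heads' predictions on unlabeled target data (near-)independent."""
    # source_logits / target_logits: lists with one [B, C] tensor per head
    ce = sum(F.cross_entropy(logits, source_labels) for logits in source_logits)
    mi = sum(
        pairwise_mutual_information(target_logits[i], target_logits[j])
        for i in range(len(target_logits))
        for j in range(i + 1, len(target_logits))
    )
    return ce + mi_weight * mi
```

Minimizing this drives disagreement on the target data without sacrificing training accuracy; the "disambiguate" stage would then just pick the head that does best on a handful of labeled target examples.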

Jul 11, 2023 · 9:37 PM UTC

Replying to @BlackHC
It is an interesting model. I tried to get it to explain bits-back coding, and on both tries it started off with something kind of right but nonsensical. But when probed with "are you sure? that doesn't seem right", it arrived at the correct explanation.