Bibliography:

  1. ‘GPT’ tag

  2. ‘AI mode collapse’ tag

  3. GPT-3 Nonfiction

  4. How Do You Change a Chatbot’s Mind? When I set out to improve my tainted reputation with chatbots, I discovered a new world of A.I. manipulation

  5. Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs

  6. What Are the Odds? Language Models Are Capable of Probabilistic Reasoning

  7. Creativity Has Left the Chat: The Price of Debiasing Language Models

  8. Can Language Models Use Forecasting Strategies?

  9. To Believe or Not to Believe Your LLM

  10. Can Language Models Explain Their Own Classification Behavior?

  11. Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience

  12. Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation

  13. Few-Shot Recalibration of Language Models

  14. Do LLMs Know about Hallucination? An Empirical Investigation of LLM’s Hidden States

  15. The Non-Effect of Sampling Temperature on Problem Solving in GPT-3.5/GPT-4

  16. I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench

  17. Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation

  18. Can AI Assistants Know What They Don’t Know?

  19. Challenges with unsupervised LLM knowledge discovery

  20. Calibrated Language Models Must Hallucinate

  21. R-Tuning: Teaching Large Language Models to Refuse Unknown Questions

  22. Llamas Know What GPTs Don’t Show: Surrogate Models for Confidence Estimation

  23. Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament

  24. The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

  25. Representation Engineering: A Top-Down Approach to AI Transparency

  26. How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

  27. Large Language Models Are Not Robust Multiple Choice Selectors

  28. Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

  29. Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback

  30. How Language Model Hallucinations Can Snowball

  31. Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding

  32. GPT-4 Technical Report § Limitations: Calibration

  33. Toolformer: Language Models Can Teach Themselves to Use Tools

  34. Predicting Consumer Contracts [With GPT-3]

  35. Large Language Models as Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards

  36. Can large language models reason about medical questions?

  37. Language Models (Mostly) Know What They Know

  38. Forecasting Future World Events with Neural Networks

  39. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

  40. Teaching Models to Express Their Uncertainty in Words

  41. Co-training Improves Prompt-based Learning for Large Language Models

  42. AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts

  43. Calibrate Before Use: Improving Few-Shot Performance of Language Models

  44. Reducing conversational agents’ overconfidence through linguistic calibration

  45. Situational Awareness and Out-Of-Context Reasoning § Biased Coin Task

  46. Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity

  47. 167c6e6e4a143e68b27eb0799ea2622fc079fa47.html

  48. Can AI Outpredict Humans? Results From Metaculus’s Q3 AI Forecasting Benchmark [No]

  49. 42bce519109425c49a8c8d37ba3769f21c551f94.html

  50. Language Models Model Us

  51. GPT-3 Gives Some Interesting True and False Answers to Some Questions. But It’s Important to Note That It Gives Opposite Answers Just As Often; I Cherry-Picked the Most ‘Sensational’ Ones. Usually It Said the Opposite Thing, and It Also Role-Plays Sometimes (eg. As a Spy)

  52. Design § Future Tag Features

  53. Paruchuri et al 2024, Figure 3: comparing random number generation of LLMs to target distributions, showing severe miscalibration and mode collapse

  54. OpenAI 2023, Figure 8: RLHF training destroys GPT-4 prediction calibration

  55. Zou et al 2022, Table 3: DeBERTa-v3 prediction calibration error

  56. https://github.com/justinchiu/openlogprobs

  57. https://www.lesswrong.com/posts/CkhJAxHeyFCg2EcET/are-language-models-good-at-making-predictions

  58. https://www.lesswrong.com/posts/iaHk9DMCbrYsKuqgS/simple-distribution-approximation-when-sampled-100-times-can-1

  59. https://www.youtube.com/watch?v=hhiLw5Q_UFg&t=1098s

  60. https://x.com/Scobleizer/status/1706174612621680976

  61. https://x.com/alexalbert__/status/1764722513014329620

  62. https://x.com/goodside/status/1634407841556561922

  63. https://x.com/labenz/status/1628847171855388672

  64. https://x.com/repligate/status/1828266415851208803

  65. https://x.com/sdrinf/status/1629084909422931969

  66. https://x.com/voooooogel/status/1829243294641242528

  67. How Do You Change a Chatbot’s Mind? When I set out to improve my tainted reputation with chatbots, I discovered a new world of A.I. manipulation

  68. https://www.nytimes.com/2024/08/30/technology/ai-chatbot-chatgpt-manipulation.html

  69. Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs

  70. Sam Bowman

  71. https://arxiv.org/abs/2407.04108

  72. Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament

  73. https://arxiv.org/abs/2310.13014

  74. How Language Model Hallucinations Can Snowball

  75. https://arxiv.org/abs/2305.13534

  76. GPT-4 Technical Report § Limitations: Calibration

  77. https://arxiv.org/pdf/2303.08774#page=12&org=openai

  78. Predicting Consumer Contracts [With GPT-3]

  79. /doc/law/2022-kolt.pdf

  80. Large Language Models as Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards

  81. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4335945

  82. Can large language models reason about medical questions?

  83. https://arxiv.org/abs/2207.08143

  84. Language Models (Mostly) Know What They Know

  85. Saurav Kadavath

  86. About Me

  87. Andy Jones

  88. Sam Bowman

  89. Jack Clark

  90. Sam McCandlish

  91. Jared Kaplan

  92. https://arxiv.org/abs/2207.05221#anthropic

  93. Forecasting Future World Events with Neural Networks

  94. Andy Zou

  95. Mantas Mazeika

  96. Jacob Steinhardt

  97. Owain Evans, AI Alignment Researcher

  98. Dan Hendrycks

  99. https://arxiv.org/abs/2206.15474

  100. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

  101. About Me

  102. Andrea Santilli

  103. Andy Zou

  104. Barret Zoph

  105. Behnam Neyshabur

  106. Colin Raffel

  107. Dan Hendrycks

  108. Daniel Levy

  109. Eric Tang

  110. Hannaneh Hajishirzi—University of Washington

  111. Jacob Hilton's Homepage

  112. Jared Kaplan

  113. Jascha Sohl-Dickstein

  114. Jason Wei

  115. Leo Gao

  116. Luke Metz

  117. Mantas Mazeika

  118. Mohit Bansal

  119. Nikita Nangia

  120. Omer Levy

  121. Owain Evans, AI Alignment Researcher

  122. Percy Liang

  123. Sam Bowman

  124. Stefano Ermon

  125. Stella Biderman

  126. Steven T. Piantadosi

  127. Vedant Misra

  128. https://arxiv.org/abs/2206.04615