Bibliography:

  1. ‘stylometry’ tag

  2. ‘Sydney (AI)’ tag

  3. ‘dark knowledge (human)’ tag

  4. ‘Decision Transformer’ tag

  5. Thoughts while watching myself be automated

  6. Investigating the Ability of LLMs to Recognize Their Own Writing

  7. Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

  8. Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs

  9. Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

  10. Designing a Dashboard for Transparency and Control of Conversational AI

  11. LLM Evaluators Recognize and Favor Their Own Generations

  12. Beyond Memorization: Violating Privacy Via Inference with Large Language Models

  13. Taken out of context: On measuring situational awareness in LLMs

  14. Truesight

  16. Situational Awareness and Out-Of-Context Reasoning § GPT-4-Base Has Non-Zero Longform Performance

  17. Situational Awareness in Large Language Models

  21. Language Models Model Us

  22. The Case for More Ambitious Language Model Evals

  26. Early Situational Awareness and Its Implications, a Story

  28. https://x.com/AndyAyrey/status/1810869652484149486

  29. https://x.com/AstronautSwing/status/1819902419272171583

  30. https://x.com/Sauers_/status/1850678934997754127

  31. https://x.com/doomslide/status/1830149217521672373

  32. https://x.com/jd_pressman/status/1808398225260569016

  33. https://x.com/repligate/status/1806993408818299166

  34. https://x.com/repligate/status/1808396202146136099

  35. https://x.com/repligate/status/1828266415851208803

  36. https://x.com/sharifshameem/status/1851059380730613776

  37. https://x.com/venturetwins/status/1822682396090937538

  38. https://x.com/voooooogel/status/1830797676243492947

  39. Thoughts while watching myself be automated

  40. https://dynomight.net/automated/

  41. Investigating the Ability of LLMs to Recognize Their Own Writing

  42. https://www.lesswrong.com/posts/ADrTuuus6JsQr5CSi/investigating-the-ability-of-llms-to-recognize-their-own

  43. Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

  44. Owain Evans, AI Alignment Researcher

  45. https://arxiv.org/abs/2407.04694

  46. Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs

  47. Sam Bowman

  48. https://arxiv.org/abs/2407.04108

  49. Designing a Dashboard for Transparency and Control of Conversational AI

  50. https://arxiv.org/abs/2406.07882

  51. LLM Evaluators Recognize and Favor Their Own Generations

  52. Sam Bowman

  53. Shi Feng

  54. https://arxiv.org/abs/2404.13076

  55. Taken out of context: On measuring situational awareness in LLMs

  56. Owain Evans, AI Alignment Researcher

  57. https://arxiv.org/abs/2309.00667

  58. Wikipedia Bibliography:

    1. Daniel Kokotajlo