-
‘stylometry’ tag
-
‘Sydney (AI)’ tag
-
‘dark knowledge (human)’ tag
-
‘Decision Transformer’ tag
-
Thoughts while watching myself be automated
-
Investigating the Ability of LLMs to Recognize Their Own Writing
-
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
-
Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs
-
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
-
Designing a Dashboard for Transparency and Control of Conversational AI
-
LLM Evaluators Recognize and Favor Their Own Generations
-
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
-
Taken out of context: On measuring situational awareness in LLMs
-
Truesight
-
Situational Awareness and Out-Of-Context Reasoning § GPT-4-Base Has Non-Zero Longform Performance
-
Situational Awareness in Large Language Models
-
Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs
-
Language Models Model Us
-
The Case for More Ambitious Language Model Evals
-
Early Situational Awareness and Its Implications, a Story
-
https://x.com/AndyAyrey/status/1810869652484149486
-
https://x.com/AstronautSwing/status/1819902419272171583
-
https://x.com/Sauers_/status/1850678934997754127
-
https://x.com/doomslide/status/1830149217521672373
-
https://x.com/jd_pressman/status/1808398225260569016
-
https://x.com/repligate/status/1806993408818299166
-
https://x.com/repligate/status/1808396202146136099
-
https://x.com/repligate/status/1828266415851208803
-
https://x.com/sharifshameem/status/1851059380730613776
-
https://x.com/venturetwins/status/1822682396090937538
-
https://x.com/voooooogel/status/1830797676243492947
-
Thoughts while watching myself be automated
-
https://dynomight.net/automated/
-
Investigating the Ability of LLMs to Recognize Their Own Writing
-
https://www.lesswrong.com/posts/ADrTuuus6JsQr5CSi/investigating-the-ability-of-llms-to-recognize-their-own
-
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
-
Owain Evans, AI Alignment Researcher
-
https://arxiv.org/abs/2407.04694
-
Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs
-
Sam Bowman
-
https://arxiv.org/abs/2407.04108
-
Designing a Dashboard for Transparency and Control of Conversational AI
-
https://arxiv.org/abs/2406.07882
-
LLM Evaluators Recognize and Favor Their Own Generations
-
Sam Bowman
-
Shi Feng
-
https://arxiv.org/abs/2404.13076
-
Taken out of context: On measuring situational awareness in LLMs
-
Owain Evans, AI Alignment Researcher
-
https://arxiv.org/abs/2309.00667
-