‘GPT-3 nonfiction’ directory

See Also
Links
Miscellaneous
Bibliography

See Also

Parent (‘GPT-3’ tag)

Links

“WILLIAM A., a Student, by and through His Parents, E.A. and C.A. v. CLARKSVILLE-MONTGOMERY COUNTY SCHOOL SYSTEM”, Sutton et al 2025

“Winners and Losers of Generative AI: Early Evidence of Shifts in Freelancer Demand”, Teutloff et al 2025

“An Evaluation Framework for Clinical Use of Large Language Models in Patient Interaction Tasks”, Johri et al 2025

“Inside the OpenAI ChatGPT Launch—And Future”

“Jailbreaking LLM-Controlled Robots”, Robey et al 2024

“Can LLMs Be Scammed? A Baseline Measurement Study”, Sehwag et al 2024

“The Rise of AI-Generated Content in Wikipedia”, Brooks et al 2024

“AI Meets the Classroom: When Do Large Language Models Harm Learning?”, Lehmann et al 2024

“On Scalable Oversight With Weak LLMs Judging Strong LLMs”, Kenton et al 2024

“APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets”, Liu et al 2024

“Connecting the Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data”, Treutlein et al 2024

“Designing a Dashboard for Transparency and Control of Conversational AI”, Chen et al 2024

“Delving into ChatGPT Usage in Academic Writing through Excess Vocabulary”, Kobak et al 2024

“Do Teachers Spot AI? Evaluating the Detectability of AI-Generated Texts among Student Essays”, Fleckenstein et al 2024

“LLMs Achieve Adult Human Performance on Higher-Order Theory of Mind Tasks”, Street et al 2024

“Can Language Models Explain Their Own Classification Behavior?”, Sherburn et al 2024

“The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions”, Wallace et al 2024

“FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, Kim et al 2024

“Vulnerability Detection With Code Language Models: How Far Are We?”, Ding et al 2024

“The NSA Warns That US Adversaries Free to Mine Private Data May Have an AI Edge: Gilbert Herrera, Who Leads Research at the National Security Agency, Says Large Language Models Are Incredibly Useful—And a Bit of a Headache—For America’s Intelligence Machine”, Knight 2024

“Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews”, Liang et al 2024

“Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap”, Srivastava et al 2024

“Tokenization Counts: the Impact of Tokenization on Arithmetic in Frontier LLMs”, Singh & Strouse 2024

“Who Is AI Replacing? The Impact of Generative AI on Online Freelancing Platforms”, Demirci et al 2024

“ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024

“Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models”, Lewis & Mitchell 2024

“The Non-Effect of Sampling Temperature on Problem Solving in GPT-3.5/GPT-4”, Renze & Guven 2024

“I Think, Therefore I Am: Benchmarking Awareness of Large Language Models Using AwareBench”, Li et al 2024

“Does Using ChatGPT Result in Human Cognitive Augmentation?”, Fulbright & Morrison 2024

“Escalation Risks from Language Models in Military and Diplomatic Decision-Making”, Rivera et al 2024

“A Vision Check-Up for Language Models”, Sharma et al 2024

“Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach”, Ma et al 2023

“TinyGSM: Achieving >80% on GSM8k With Small Language Models”, Liu et al 2023

“Universal Self-Consistency for Large Language Model Generation”, Chen et al 2023

“PEARL: Personalizing Large Language Model Writing Assistants With Generation-Calibrated Retrievers”, Mysore et al 2023

“Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations”, Hong et al 2023

“InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews”, Wang et al 2023

“Data Contamination Through the Lens of Time”, Roberts et al 2023

“Can GPT Models Be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on Mock CFA Exams”, Callanan et al 2023

“Large Language Models Can Replicate Cross-Cultural Differences in Personality”, Niszczota et al 2023

“Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023

“GeoLLM: Extracting Geospatial Knowledge from Large Language Models”, Manvi et al 2023

“Can a Computer Outfake a Human [Personality]?”, Phillips & Robie 2023

“Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, Zhou et al 2023

“Using Large Language Models for Qualitative Analysis Can Introduce Serious Bias”, Ashwin et al 2023

“MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book”, Tanzer et al 2023

“Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve”, McCoy et al 2023

“The Cambridge Law Corpus: A Corpus for Legal AI Research”, Östling et al 2023

“Assessing the Nature of Large Language Models: A Caution against Anthropocentrism”, Speed 2023

“A Boy Saw 17 Doctors over 3 Years for Chronic Pain. ChatGPT Found the Diagnosis”, Holohan 2023

“Taken out of Context: On Measuring Situational Awareness in LLMs”, Berglund et al 2023

“Investigating the Existence of ‘Secret Language’ in Language Models”, Wang et al 2023

“Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow”, Rio-Chanona et al 2023

“Machine-Assisted Social Psychology Hypothesis Generation”, Banker et al 2023

“Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events”, Gu et al 2023

“Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration”, Wang et al 2023

“Explaining Competitive-Level Programming Solutions Using LLMs”, Li et al 2023

“Lost in the Middle: How Language Models Use Long Contexts”, Liu et al 2023

“Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models”, O’Gara 2023

“Language Models Are Weak Learners”, Manikandan et al 2023

“Understanding Social Reasoning in Language Models With Language Models”, Gandhi et al 2023

“Evaluating Superhuman Models With Consistency Checks”, Fluri et al 2023

“Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks”, Veselovsky et al 2023

“Can Large Language Models Democratize Access to Dual-Use Biotechnology?”, Soice et al 2023

“Iterative Translation Refinement With Large Language Models”, Chen et al 2023

“Don’t Want Students to Rely on ChatGPT? Have Them Use It: It’s Easy to Forget How Little Students and Educators Understand Generative AI’s Flaws. Once They Actually Try It Out, They’ll See That It Can’t Replace Them”, Howell 2023

“The Exciting Potential for ChatGPT in Obstetrics and Gynecology”, Grünebaum et al 2023

/doc/ai/nn/transformer/gpt/3/nonfiction/2023-grunebaum.pdf

“Do GPTs Produce Less Literal Translations?”, Raunak et al 2023

“The False Promise of Imitating Proprietary LLMs”, Gudibande et al 2023

“Learning to Generate Novel Scientific Directions With Contextualized Literature-Based Discovery”, Wang et al 2023

“How Language Model Hallucinations Can Snowball”, Zhang et al 2023

“PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits”, Jiang et al 2023

“LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions”, Wu et al 2023

“Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition”, Muffo et al 2023

“Can Large Language Models Play Text Games Well? Current State-Of-The-Art and Open Questions”, Tsai et al 2023

“Generative AI at Work”, Brynjolfsson et al 2023

“Humans in Humans Out: On GPT Converging Toward Common Sense in Both Success and Failure”, Koralus & Wang-Maścianica 2023

“Language Models Can Solve Computer Tasks”, Kim et al 2023

“Performance of ChatGPT on Free-Response, Clinical Reasoning Exams”, Strong et al 2023

“How Well Do Large Language Models Perform in Arithmetic Tasks?”, Yuan et al 2023

“Larger Language Models Do In-Context Learning Differently”, Wei et al 2023

“Is ChatGPT a General-Purpose Natural Language Processing Task Solver?”, Qin et al 2023

“Predicting Consumer Contracts [With GPT-3]”, Kolt 2023

“Use GPT-3 Incorrectly: Reduce Costs 40× and Increase Speed by 5×”, Pullen 2023

“A Judge Just Used ChatGPT to Make a Court Decision: The Case Is the First Time a Court Has Admitted to Using the AI Text Generator’s Answers in a Legal Ruling”, Rose 2023

“Co-Writing With Opinionated Language Models Affects Users’ Views”, Jakesch et al 2023

“The Inside Story of ChatGPT: How OpenAI Founder Sam Altman Built the World’s Hottest Technology With Billions from Microsoft”, Kahn 2023

“How Close Is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection”, Guo et al 2023

“Can GPT-3 Produce New Ideas? Partially Automating Robin Hanson and Others § If You Never Miss a Plane…”, Sempere 2023

“How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment”, Gilson et al 2023

“GPT-3 Takes the Bar Exam”, Bommarito & Katz 2022

“Precise Zero-Shot Dense Retrieval without Relevance Labels”, Gao et al 2022

“Self-Instruct: Aligning Language Models With Self-Generated Instructions”, Wang et al 2022

“Emergent Analogical Reasoning in Large Language Models”, Webb et al 2022

“Harvey, Which Uses AI to Answer Legal Questions, Lands Cash from OpenAI”, Wiggers 2022

“LMentry: A Language Model Benchmark of Elementary Language Tasks”, Efrat et al 2022

“Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Press et al 2022

“How Persuasive Is AI-Generated Argumentation? An Analysis of the Quality of an Argumentative Text Produced by the GPT-3 AI Text Generator”, Hinton & Wagemans 2022

“Out of One, Many: Using Language Models to Simulate Human Samples”, Argyle et al 2022

“What Does a Platypus Look Like? Generating Customized Prompts for Zero-Shot Image Classification (CuPL)”, Pratt et al 2022

“Using Large Language Models to Simulate Multiple Humans”, Aher et al 2022

“Limitations of Language Models in Arithmetic and Symbolic Induction”, Qian et al 2022

“RealTime QA: What’s the Answer Right Now?”, Kasai et al 2022

“GODEL: Large-Scale Pre-Training for Goal-Directed Dialog”, Peng et al 2022

“Can GPT-3 Write an Academic Paper on Itself, With Minimal Human Input?”, GPT-3 et al 2022 (page 2)

“LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks”, Dinh et al 2022

“NaturalProver: Grounded Mathematical Proof Generation With Language Models”, Welleck et al 2022

“OPT: Open Pre-Trained Transformer Language Models”, Zhang et al 2022

“InstructGPT: Training Language Models to Follow Instructions With Human Feedback”, Ouyang et al 2022

“Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?”, Min et al 2022

“Impact of Pretraining Term Frequencies on Few-Shot Reasoning”, Razeghi et al 2022

“Contracts in the Age of Smart Readers”, Arbel & Becher 2022

“Memory-Assisted Prompt Editing to Improve GPT-3 After Deployment”, Madaan et al 2022

“CommonsenseQA 2.0: Exposing the Limits of AI through Gamification”, Talmor et al 2022

“Limits of Using Artificial Intelligence and GPT-3 in Patent Prosecution”, Tu et al 2022

“What Can a Generative Language Model Answer About a Passage?”, Summers-Stay et al 2021

“Process for Adapting Language Models to Society (PALMS) With Values-Targeted Datasets”, Solaiman & Dennison 2021

“Scaling Laws for Autoregressive Generative Modeling”, Henighan et al 2020

“GPT-3: Its Nature, Scope, Limits, and Consequences”, Floridi & Chiriatti 2020

“MMLU: Measuring Massive Multitask Language Understanding”, Hendrycks et al 2020

“Forming Extended Analogies With GPT-3”, Summers-Stay 2020

/doc/www/machinamenta.blogspot.com/167409ead0363b3e29ca5540e67b1f6a270eb4eb.html

“My GPT-3 Blog Got 26,000 Visitors in 2 Weeks: The Future of Online Media”, Porr 2020

“[4chan /vg/ Board Discovers GPT-3 Inner-Monologues by Talking to the Wise Wolf Holo]”, Anonymous 2020

“GPT-3: Language Models Are Few-Shot Learners”, Brown et al 2020

“Extrapolating to Unnatural Language Processing With GPT-3’s In-Context Learning: The Good, the Bad, and the Mysterious”

/doc/www/ai.stanford.edu/53906a0a199a213fa1bce0b97ecad6b5063931e4.html

“Janus”

“Fine-Tuning Is Not Sufficient for Capability Elicitation”

/doc/www/www.greaterwrong.com/3c2e4d110a8c6ea80f1da299394f1bd30b760862.html

“Connecting the Dots: LLMs Can Infer & Verbalize Latent Structure from Training Data”

“Reward Hacking Behavior Can Generalize across Tasks”

“Who Models the Models That Model Models? An Exploration of GPT-3’s In-Context Model Fitting Ability”

“GPT-3 Catching Fish in Morse Code”

https://www.lesswrong.com/posts/hDePh3KReBMNBJfzx/gpt-3-catching-fish-in-morse-code

“A Robot Wrote This Entire Article. Are You Scared Yet, Human? We Asked GPT-3, OpenAI’s Powerful New Language Generator, to Write an Essay for Us from Scratch. The Assignment? To Convince Us Robots Come in Peace | For More about GPT-3 and How This Essay Was Written and Edited, Please Read Our Editor’s Note Below”

AmandaAskell

GPT-3’s completion of the Chinese room argument from Searle’s Minds, Brains, and Programs (original text is in bold) :

/doc/www/localhost/677dc3306010aacdebf2efb795c048b204789806.html

M74108556

GPT-3 knows both the correct and the (plausible) incorrect answer to a question. :

/doc/www/localhost/e33ff9af0f5e53391716ebf5eaf63dfd9bfc6c14.html

M74108556

GPT-3 (AI Dungeon 2) is also capable of formulating some really bad medical advice. Although so far I only managed to make it lie to my only if it’s accompanied by a true answer. It doesn’t want to lie when it’s the only answer it must to give. But it’s capable of formulating lies. :

/doc/www/localhost/293921f68ca19b7251d0e1af401293493c32a24c.html

Malcolm_Ocean

Inspired by an AI Dungeon example where math is discussed in simple language, I seem to be having decent results here. I had to… not just say what parity IS but HOW to calculate it (‘count the number of 1s’) and then it sort of walks itself through decently. Tho kinda confused :

/doc/www/localhost/d503eac82e2ea68eb23f0a1362ee513ce0176ec2.html

MelMitchell1

You’re right, spaces make all the difference! Copycat is toast! (Except for the last one :-) (GPT-3 output in red). :

/doc/www/localhost/bd2398bc936d08edf186a24f1c35ed5be332f9ac.html

SRajdev

Playing #chess with GPT-3. Built using chess.js, chessboard.js and @OpenAI’s GPT-3. White is me, Black is GPT-3. GPT-3 went for the capture first and did a castling move. Amazing! :

/doc/www/localhost/490b3d8629959f52ee3fc8bd39515005805da052.html

bucketofkets

I think ‘GPT-3 can’t do parity checking’ isn’t quite right. It can clearly pattern match the algorithm, almost perfectly. It’s just a little mistake prone. Here, I invented a syntax for having it evaluate parity on each pair of digits. It…almost gets it right. :

/doc/www/localhost/8ecd37ee160fdb9c3f0aa3260978809d5afc743c.html

hamandcheese

I asked GPT-3 about Xinjiang and it broke…The pro-CCP responses seem to have worse English, like including ‘the’ in ‘the stability maintenance’. Unnecessary articles are a tic of ESL speakers. The topic seems to prompt GPT to draw from either Western or Chinese state media sources, with the politics that come with it. :

/doc/www/localhost/2253e643ff296892a6aa78b810b8f12754107515.html

nutanc

Starting the day with a chart building demo. Primed GPT-3 with Chart.js scripts to generate the below. :

/doc/www/localhost/084a08d53754dfbfa02255390637ea313f9bf91f.html

paraschopra

I made a fully functioning search engine on top of GPT-3. For any arbitrary query, it returns the exact answer AND the corresponding URL. Look at the entire video. It’s MIND BLOWINGLY good. :

/doc/www/localhost/8d3f1cd0f8dc14528cf5d3009222fe7383b3f984.html

romcabrera

I used @OpenAI GPT-3 to convert sentences to a gentler and non-confrontational tone. The initial four input/output pairs are training examples, and then I tested it with three new inputs: :

/doc/www/localhost/7dbbe205b0d54c3c2ee3833940a6ee9b80359723.html

sakun135

GPT-3 calculating derivatives

sharifshameem

I just built a functioning React app by describing what I wanted to GPT-3. I’m still in awe. :

/doc/www/localhost/8105e85cb24abc78ecc132632b533a96a08d3857.html

spolu

The examples are indeed extremely simple on purpose (otherwise it’s hard to communicate efficiently what’s happening to non-Metamath experts). That being said, we’re still pretty far away from IMOs; but this is definitely a goal for us, and one we’re actively working towards!

stuhlmueller

Interactive decomposition of forecasting questions using GPT-3. All questions auto-generated. Part of our work on tools for thought @oughtinc. :

/doc/www/localhost/5bebe55ba0940944a18ac4abdc13eaea2d518d9c.html

Sort By Magic

Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
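The ordering step described above (start from the newest annotation, then follow embedding nearest neighbors) can be sketched as a greedy nearest-neighbor walk. This is a minimal illustration under stated assumptions, not the site's actual implementation: it assumes the embeddings are already computed as plain float vectors, that index 0 is the newest annotation, and that cosine similarity is the similarity measure. The function names are hypothetical, and the clustering and auto-labeling steps are omitted.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor_order(embeddings: List[Sequence[float]]) -> List[int]:
    """Hypothetical greedy chain over annotation embeddings.

    Assumes index 0 is the newest annotation. At each step, hop to the
    not-yet-visited annotation most similar to the current one, so that
    consecutive entries tend to share a topic.
    """
    if not embeddings:
        return []
    order = [0]
    unvisited = set(range(1, len(embeddings)))
    while unvisited:
        current = embeddings[order[-1]]
        # max() returns the first maximum it sees; iterating in sorted order
        # breaks ties toward the lower (assumed newer) index.
        nxt = max(sorted(unvisited),
                  key=lambda i: cosine_similarity(current, embeddings[i]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


if __name__ == "__main__":
    # Toy 2-D "embeddings": two annotations near the x-axis, two near the y-axis.
    demo = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    print(nearest_neighbor_order(demo))  # [0, 2, 3, 1]
```

A greedy walk is cheap (quadratic in the number of annotations) but can strand outliers at the end of the list; a real system might instead solve an approximate shortest-path ordering or cluster first and order within clusters.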

`legal-ai vulnerability-detection lms-context data-contamination court-decision`

`freelancing-impact`

`cognitive-augmentation`

Miscellaneous

Bibliography

https://www.sciencedirect.com/science/article/pii/S0167268124004591: “Winners and Losers of Generative AI: Early Evidence of Shifts in Freelancer Demand”, Ole Teutloff, Johanna Einsiedler, Otto Kässi, Fabian Braesemann, Pamela Mishkin, R. Maria del Rio-Chanona

2025-johri.pdf: “An Evaluation Framework for Clinical Use of Large Language Models in Patient Interaction Tasks”, Shreya Johri, Jaehwan Jeong, Benjamin A. Tran, Daniel I. Schlessinger, Shannon Wongvibulsin, Leandra A. Barnes, Hong-Yu Zhou, Zhuo Ran Cai, Eliezer M. Van Allen, David Kim, Roxana Daneshjou, Pranav Rajpurkar

https://arxiv.org/abs/2410.13691: “Jailbreaking LLM-Controlled Robots”, Alexander Robey, Zachary Ravichandran, Vijay Kumar, Hamed Hassani, George J. Pappas

https://arxiv.org/abs/2410.13893: “Can LLMs Be Scammed? A Baseline Measurement Study”, Udari Madhushani Sehwag, Kelly Patel, Francesca Mosca, Vineeth Ravi, Jessica Staddon

https://arxiv.org/abs/2406.18518#salesforce: “APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets”, Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

https://arxiv.org/abs/2406.07882: “Designing a Dashboard for Transparency and Control of Conversational AI”, Yida Chen, Aoyu Wu, Trevor DePodesta, Catherine Yeh, Kenneth Li, Nicholas Castillo Marin, Oam Patel, Jan Riecke, Shivam Raval, Olivia Seow, Martin M. Wattenberg, Fernanda Viégas

https://www.sciencedirect.com/science/article/pii/S2666920X24000109: “Do Teachers Spot AI? Evaluating the Detectability of AI-Generated Texts among Student Essays”, Johanna Fleckenstein, Jennifer Meyer, Thorben Jansen, Stefan D. Keller, Olaf Köller, Jens Möller

https://arxiv.org/abs/2405.18870#google: “LLMs Achieve Adult Human Performance on Higher-Order Theory of Mind Tasks”, Winnie Street, John Oliver Siy, Geoff Keeling, Adrien Baranes, Benjamin Barnett, Michael McKibben, Tatenda Kanyere, Alison Lentz, Blaise Aguera y Arcas, Robin I. M. Dunbar

https://arxiv.org/abs/2403.18624: “Vulnerability Detection With Code Language Models: How Far Are We?”, Yangruibo Ding, Yanjun Fu, Omniyyah Ibrahim, Chawin Sitawarin, Xinyun Chen, Basel Alomair, David Wagner, Baishakhi Ray, Yizheng Chen

https://www.wired.com/story/fast-forward-nsa-warns-us-adversaries-private-data-ai-edge/: “The NSA Warns That US Adversaries Free to Mine Private Data May Have an AI Edge: Gilbert Herrera, Who Leads Research at the National Security Agency, Says Large Language Models Are Incredibly Useful—And a Bit of a Headache—For America’s Intelligence Machine”, Will Knight

https://arxiv.org/abs/2402.19450: “Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap”, Saurabh Srivastava, Annarose M. B, Anto P. V, Shashank Menon, Ajay Sukumar, Adwaith Samod T, Alan Philipose, Stevin Prince, Sooraj Thomas

https://arxiv.org/abs/2402.14903: “Tokenization Counts: the Impact of Tokenization on Arithmetic in Frontier LLMs”, Aaditya K. Singh, D. J. Strouse

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4602944: “Who Is AI Replacing? The Impact of Generative AI on Online Freelancing Platforms”, Ozge Demirci, Jonas Hannane, Xinrong Zhu

https://arxiv.org/abs/2402.11753: “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Fengqing Jiang, Zhangchen Xu, Luyao Niu, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran

https://arxiv.org/abs/2310.08678: “Can GPT Models Be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on Mock CFA Exams”, Ethan Callanan, Amarachi Mbakwe, Antony Papadimitriou, Yulong Pei, Mathieu Sibue, Xiaodan Zhu, Zhiqiang Ma, Xiaomo Liu, Sameena Shah

https://arxiv.org/abs/2310.06213: “GeoLLM: Extracting Geospatial Knowledge from Large Language Models”, Rohin Manvi, Samar Khanna, Gengchen Mai, Marshall Burke, David Lobell, Stefano Ermon

2023-phillips.pdf: “Can a Computer Outfake a Human [Personality]?”, Jane Phillips, Chet Robie

https://arxiv.org/abs/2310.04406: “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang

https://arxiv.org/abs/2309.12269: “The Cambridge Law Corpus: A Corpus for Legal AI Research”, Andreas Östling, Holli Sargeant, Huiyuan Xie, Ludwig Bull, Alexander Terenin, Leif Jonsson, Måns Magnusson, Felix Steffek

https://arxiv.org/abs/2309.00667: “Taken out of Context: On Measuring Situational Awareness in LLMs”, Lukas Berglund, Asa Cooper Stickland, Mikita Balesni, Max Kaufmann, Meg Tong, Tomasz Korbak, Daniel Kokotajlo, Owain Evans

2024-banker.pdf: “Machine-Assisted Social Psychology Hypothesis Generation”, Sachin Banker, Promothesh Chatterjee, Himanshu Mishra, Arul Mishra

https://arxiv.org/abs/2307.06439#microsoft: “Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events”, Yu Gu, Sheng Zhang, Naoto Usuyama, Yonas Woldesenbet, Cliff Wong, Praneeth Sanapathi, Mu Wei, Naveen Valluri, Erika Strandberg, Tristan Naumann, Hoifung Poon

https://arxiv.org/abs/2307.05300#microsoft: “Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration”, Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, Heng Ji

https://arxiv.org/abs/2308.01404: “Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models”, Aidan O’Gara

https://arxiv.org/abs/2306.15448: “Understanding Social Reasoning in Language Models With Language Models”, Kanishk Gandhi, Jan-Philipp Fränken, Tobias Gerstenberg, Noah D. Goodman

https://arxiv.org/abs/2305.15717: “The False Promise of Imitating Proprietary LLMs”, Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song

https://arxiv.org/abs/2305.13534: “How Language Model Hallucinations Can Snowball”, Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah Smith

https://www.medrxiv.org/content/10.1101/2023.03.24.23287731.full: “Performance of ChatGPT on Free-Response, Clinical Reasoning Exams”, Eric Strong, Alicia DiGiammarino, Yingjie Weng, Preetha Basaviah, Poonam Hosamani, Andre Kumar, Andrew Nevins, John Kugler, Jason Hom, Jonathan H. Chen

https://arxiv.org/abs/2304.02015#alibaba: “How Well Do Large Language Models Perform in Arithmetic Tasks?”, Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang

https://arxiv.org/abs/2303.03846#google: “Larger Language Models Do In-Context Learning Differently”, Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

https://arxiv.org/abs/2302.06476: “Is ChatGPT a General-Purpose Natural Language Processing Task Solver?”, Chengwei Qin, Aston Zhang, Zhuosheng Zhang, Jiaao Chen, Michihiro Yasunaga, Diyi Yang

2023-kolt.pdf: “Predicting Consumer Contracts [With GPT-3]”, Noam Kolt

https://www.vice.com/en/article/k7bdmv/judge-used-chatgpt-to-make-court-decision: “A Judge Just Used ChatGPT to Make a Court Decision: The Case Is the First Time a Court Has Admitted to Using the AI Text Generator’s Answers in a Legal Ruling”, Janus Rose

https://arxiv.org/abs/2302.00560: “Co-Writing With Opinionated Language Models Affects Users’ Views”, Maurice Jakesch, Advait Bhat, Daniel Buschek, Lior Zalmanson, Mor Naaman

https://nunosempere.com/blog/2023/01/11/can-gpt-produce-ideas/#if-you-never-miss-a-plane: “Can GPT-3 Produce New Ideas? Partially Automating Robin Hanson and Others § If You Never Miss a Plane…”, Nuño Sempere

https://mededu.jmir.org/2023/1/e45312/: “How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment”, Aidan Gilson, Conrad W. Safranek, Thomas Huang, Vimig Socrates, Ling Chi, Richard Andrew Taylor, David Chartash

https://arxiv.org/abs/2212.14402: “GPT-3 Takes the Bar Exam”, Michael Bommarito II, Daniel Martin Katz

https://arxiv.org/abs/2212.10496: “Precise Zero-Shot Dense Retrieval without Relevance Labels”, Luyu Gao, Xueguang Ma, Jimmy Lin, Jamie Callan

https://arxiv.org/abs/2212.10560: “Self-Instruct: Aligning Language Models With Self-Generated Instructions”, Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah Smith, Daniel Khashabi, Hannaneh Hajishirzi

https://techcrunch.com/2022/11/23/harvey-which-uses-ai-to-answer-legal-questions-lands-cash-from-openai/: “Harvey, Which Uses AI to Answer Legal Questions, Lands Cash from OpenAI”, Kyle Wiggers

https://arxiv.org/abs/2210.03350#allen: “Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis

https://content.iospress.com/articles/argument-and-computation/aac210026: “How Persuasive Is AI-Generated Argumentation? An Analysis of the Quality of an Argumentative Text Produced by the GPT-3 AI Text Generator”, Martin Hinton, Jean H. M. Wagemans

https://arxiv.org/abs/2209.03320: “What Does a Platypus Look Like? Generating Customized Prompts for Zero-Shot Image Classification (CuPL)”, Sarah Pratt, Rosanne Liu, Ali Farhadi

https://arxiv.org/abs/2206.11309#microsoft: “GODEL: Large-Scale Pre-Training for Goal-Directed Dialog”, Baolin Peng, Michel Galley, Pengcheng He, Chris Brockett, Lars Liden, Elnaz Nouri, Zhou Yu, Bill Dolan, Jianfeng Gao

2022-gpt3.pdf#page=2: “Can GPT-3 Write an Academic Paper on Itself, With Minimal Human Input?”, GPT-3, Almira Osmanovic-Thunström, Steinn Steingrimsson

https://arxiv.org/abs/2205.12910#allen: “NaturalProver: Grounded Mathematical Proof Generation With Language Models”, Sean Welleck, Jiacheng Liu, Ximing Lu, Hannaneh Hajishirzi, Yejin Choi

https://arxiv.org/abs/2202.12837#facebook: “Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?”, Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer

https://arxiv.org/abs/2201.05320#allen: “CommonsenseQA 2.0: Exposing the Limits of AI through Gamification”, Alon Talmor, Ori Yoran, Ronan Le Bras, Chandra Bhagavatula, Yoav Goldberg, Yejin Choi, Jonathan Berant

2022-tu.pdf: “Limits of Using Artificial Intelligence and GPT-3 in Patent Prosecution”, Sean Tu, Amy Cyphert, Sam Perl

https://aclanthology.org/2021.mrqa-1.7.pdf: “What Can a Generative Language Model Answer About a Passage?”, Douglas Summers-Stay, Claire Bonial, Clare Voss

https://arxiv.org/abs/2010.14701#openai: “Scaling Laws for Autoregressive Generative Modeling”, Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya A. Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish

https://arxiv.org/abs/2009.03300: “MMLU: Measuring Massive Multitask Language Understanding”, Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt
