APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
Designing a Dashboard for Transparency and Control of Conversational AI
Delving into ChatGPT usage in academic writing through excess vocabulary
Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays
LLMs achieve adult human performance on higher-order theory of mind tasks
Can Language Models Explain Their Own Classification Behavior?
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
FABLES: Evaluating faithfulness and content selection in book-length summarization
Vulnerability Detection with Code Language Models: How Far Are We?
The NSA Warns That US Adversaries Free to Mine Private Data May Have an AI Edge: Gilbert Herrera, who leads research at the National Security Agency, says large language models are incredibly useful—and a bit of a headache—for America’s intelligence machine
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap
Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs
Who Is AI Replacing? The Impact of Generative AI on Online Freelancing Platforms
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models
The Non-Effect of Sampling Temperature on Problem Solving in GPT-3.5/GPT-4
I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench
Does Using ChatGPT Result in Human Cognitive Augmentation?
Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach
TinyGSM: achieving >80% on GSM8k with small language models
Universal Self-Consistency for Large Language Model Generation
PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews
Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams
Large language models can replicate cross-cultural differences in personality
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
GeoLLM: Extracting Geospatial Knowledge from Large Language Models
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Using Large Language Models for Qualitative Analysis Can Introduce Serious Bias
MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book
Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
Assessing the nature of large language models: A caution against anthropocentrism
A boy saw 17 doctors over 3 years for chronic pain. ChatGPT found the diagnosis
Taken out of context: On measuring situational awareness in LLMs
Investigating the Existence of ‘Secret Language’ in Language Models
Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow
Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events
Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration
Explaining Competitive-Level Programming Solutions using LLMs
Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models
Understanding Social Reasoning in Language Models with Language Models
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks
Can large language models democratize access to dual-use biotechnology?
Iterative Translation Refinement with Large Language Models
Don’t Want Students to Rely on ChatGPT? Have Them Use It: It’s easy to forget how little students and educators understand generative AI’s flaws. Once they actually try it out, they’ll see that it can’t replace them
The Exciting Potential for ChatGPT in Obstetrics and Gynecology
Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition
Humans in Humans Out: On GPT Converging Toward Common Sense in both Success and Failure
Performance of ChatGPT on free-response, clinical reasoning exams
How well do Large Language Models perform in Arithmetic tasks?
Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
Use GPT-3 incorrectly: reduce costs 40× and increase speed by 5×
A Judge Just Used ChatGPT to Make a Court Decision: The case is the first time a court has admitted to using the AI text generator’s answers in a legal ruling
Co-Writing with Opinionated Language Models Affects Users’ Views
The inside story of ChatGPT: How OpenAI founder Sam Altman built the world’s hottest technology with billions from Microsoft
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
Can GPT-3 produce new ideas? Partially automating Robin Hanson and others § If you never miss a plane…
How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment
Precise Zero-Shot Dense Retrieval without Relevance Labels
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Harvey, which uses AI to answer legal questions, lands cash from OpenAI
LMentry: A Language Model Benchmark of Elementary Language Tasks
Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)
How persuasive is AI-generated argumentation? An analysis of the quality of an argumentative text produced by the GPT-3 AI text generator
Out of One, Many: Using Language Models to Simulate Human Samples
What does a platypus look like? Generating customized prompts for zero-shot image classification (CuPL)
Limitations of Language Models in Arithmetic and Symbolic Induction
Can GPT-3 write an academic paper on itself, with minimal human input?
NaturalProver: Grounded Mathematical Proof Generation with Language Models
InstructGPT: Training language models to follow instructions with human feedback
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Impact of Pretraining Term Frequencies on Few-Shot Reasoning
Memory-assisted prompt editing to improve GPT-3 after deployment
CommonsenseQA 2.0: Exposing the Limits of AI through Gamification
Limits of Using Artificial Intelligence and GPT-3 in Patent Prosecution
What Can a Generative Language Model Answer About a Passage?
Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets
Extrapolating to Unnatural Language Processing With GPT-3’s In-Context Learning: The Good, the Bad, and the Mysterious
Connecting the Dots: LLMs Can Infer & Verbalize Latent Structure from Training Data
Who Models the Models That Model Models? An Exploration of GPT-3’s In-Context Model Fitting Ability
A Robot Wrote This Entire Article. Are You Scared Yet, Human? We Asked GPT-3, OpenAI’s Powerful New Language Generator, to Write an Essay for Us from Scratch. The Assignment? To Convince Us Robots Come in Peace | For More about GPT-3 and How This Essay Was Written and Edited, Please Read Our Editor’s Note Below
You’re Right, Spaces Make All the Difference! Copycat Is Toast! (Except for the Last One :-) (GPT-3 Output in Red).
Playing #chess With GPT-3. Built Using Chess.js, Chessboard.js and @OpenAI’s GPT-3. White Is Me, Black Is GPT-3. GPT-3 Went for the Capture First and Did a Castling Move. Amazing!
I Think ‘GPT-3 Can’t Do Parity Checking’ Isn’t Quite Right. It Can Clearly Pattern Match the Algorithm, Almost Perfectly. It’s Just a Little Mistake Prone. Here, I Invented a Syntax for Having It Evaluate Parity on Each Pair of Digits. It...almost Gets It Right.
I Asked GPT-3 about Xinjiang and It Broke...The Pro-CCP Responses Seem to Have Worse English, like including ‘the’ in ‘the Stability Maintenance’. Unnecessary Articles Are a Tic of ESL Speakers. The Topic Seems to Prompt GPT to Draw from Either Western or Chinese State Media Sources, With the Politics That Come With It.
The Examples Are Indeed Extremely Simple on Purpose (otherwise It’s Hard to Communicate Efficiently What’s Happening to Non-Metamath Experts). That Being Said, We’re Still Pretty Far Away from IMOs; but This Is Definitely a Goal for Us, and One We’re Actively Working Towards!
Manvi 2023, Figure 3: performance of LLMs and tabular methods to predict population worldwide
Brynjolfsson 2023 (w31161): improvement in customer complaint resolution per hour using GPT-3
https://apnews.com/article/brazil-artificial-intelligence-porto-alegre-5afd1240afe7b6ac202bb0bbc45e08d4
https://automated.beehiiv.com/p/aiimmunity-challenge-lessons-clinical-research-exam
https://chat.openai.com/share/25124525-0bad-4c13-ae5a-ae4beac60360
https://davidabell.substack.com/p/playing-around-with-machine-translation
https://dropbox.tech/machine-learning/prompt-injection-with-control-characters-openai-chatgpt-llm
https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2812620
https://jxnl.github.io/instructor/blog/2023/11/05/chain-of-density/
https://medium.com/@JarrettYe/casting-a-spell-on-chatgpt-let-it-write-anki-cards-for-you-a-prompt-engineering-case-fd7d577b9d94
https://model-checking.github.io/kani-verifier-blog/2023/05/01/writing-code-with-chatgpt-improve-it-with-kani.html
https://openai.com/blog/function-calling-and-other-api-updates#function-calling
https://restofworld.org/2023/ai-revolution-outsourced-workers/
https://tytonpartners.com/app/uploads/2023/10/GenAI-IN-HIGHER-EDUCATION-FALL-2023-UPDATE-TIME-FOR-CLASS-STUDY.pdf#page=4
https://www.ft.com/content/9aeb482d-f781-45c0-896f-38fdcc912139
https://www.getlibretto.com/blog/does-it-matter-which-examples-you-choose-for-few-shot-prompting
https://www.integrity-research.com/ai-fails-insider-trading-test/
https://www.lesswrong.com/posts/3ou8DayvDXxufkjHD/openai-api-base-models-are-not-sycophantic-at-any-size
https://www.lesswrong.com/posts/qbbaF79uJqvmWZELv/real-life-sort-by-controversial
https://www.nytimes.com/2023/06/08/business/khan-ai-gpt-tutoring-bot.html
https://www.nytimes.com/2023/12/13/technology/chatbot-cheating-schools-students.html
https://www.pewresearch.org/short-reads/2024/03/26/americans-use-of-chatgpt-is-ticking-up-but-few-trust-its-election-information/
https://www.reddit.com/r/ChatGPT/comments/15et6f2/well_i_got_what_i_asked_for/
https://www.reddit.com/r/OpenAI/comments/xlvygv/artifical_intelligence_allows_me_to_get_straight/
https://www.reddit.com/r/TrueOffMyChest/comments/12zjiwq/my_wifes_company_has_started_replacing_positions/jhtkckq/
https://www.supersimple.io/blog/gpt-4-fine-tuning-early-access
https://www.theguardian.com/technology/commentisfree/2020/sep/11/artificial-intelligence-robot-writing-gpt-3
https://www.vice.com/en/article/5d93p3/what-happens-when-you-ask-ai-to-control-your-life
https://www.wired.com/story/china-chatgpt-opportunists-grifters-hard-at-work/