“‘GPT-4 Nonfiction’ Tag”, 2022-10-17
Bibliography for tag
ai/nn/transformer/gpt/4/nonfiction, most recent first: 174 annotations & 205 links (parent).
- See Also
- Gwern
- Links
- “Business Spending on AI Surged 500% This Year to $13.8 Billion”
- “Generative Agent Simulations of 1,000 People”, et al 2024
- “Hidden Persuaders: LLMs’ Political Leaning and Their Influence on Voters”, et al 2024
- “Can LLMs Be Scammed? A Baseline Measurement Study”, et al 2024
- “SimpleStrat: Diversifying Language Model Generation With Stratification”, et al 2024
- “MLE-Bench: Evaluating Machine Learning Agents on Machine Learning Engineering”, et al 2024
- “Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making”, et al 2024
- “Can OpenAI’s o1-Preview Ace the 2023 Putnam Exam?”, 2024
- “When a Language Model Is Optimized for Reasoning, Does It Still Show Embers of Autoregression? An Analysis of OpenAI O1”, et al 2024
- “Invisible Unicode Text That AI Chatbots Understand and Humans Can’t? Yep, It’s a Thing”
- “I Quit Teaching Because of ChatGPT”, 2024
- “Evaluation of OpenAI O1: Opportunities and Challenges of AGI”, et al 2024
- “That Message From Your Doctor? It May Have Been Drafted by ChatGPT-4”
- “LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s O1 on PlanBench”, et al 2024
- “I Have Played a Little Bit With OpenAI’s New Iteration, GPT-4 O1”, 2024
- “Thoughts While Watching Myself Be Automated”, 2024
- “Generative AI Can Harm Learning”, et al 2024
- “Does Refusal Training in LLMs Generalize to the Past Tense?”, 2024
- “GPT-4 Is Judged More Human Than Humans in Displaced and Inverted Turing Tests”, et al 2024
- “On Scalable Oversight With Weak LLMs Judging Strong LLMs”, et al 2024
- “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs”, et al 2024
- “Are Large Language Models Consistent over Value-Laden Questions?”, et al 2024
- “Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation”, et al 2024
- “APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets”, et al 2024
- “A Real-World Test of Artificial Intelligence Infiltration of a University Examinations System: A ‘Turing Test’ Case Study”, et al 2024
- “Connecting the Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data”, et al 2024
- “OlympicArena: Benchmarking Multi-Discipline Cognitive Reasoning for Superintelligent AI”, et al 2024
- “What Are the Odds? Language Models Are Capable of Probabilistic Reasoning”, et al 2024
- “Probing the Decision Boundaries of In-Context Learning in Large Language Models”, et al 2024
- “Development Cost of ARC GPT-4o Prototype”, 2024
- “GUI-WORLD: A Dataset for GUI-Oriented Multimodal LLM-Based Agents”, et al 2024
- “Are We Done With MMLU?”, et al 2024
- “Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-Modal LLMs in Video Analysis”, et al 2024
- “LLMs Achieve Adult Human Performance on Higher-Order Theory of Mind Tasks”, et al 2024
- “Intelligent Go-Explore (IGE): Standing on the Shoulders of Giant Foundation Models”, et al 2024
- “DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches With TikZ”, et al 2024
- “DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data”, et al 2024
- “Grokked Transformers Are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization”, et al 2024
- “Can Language Models Explain Their Own Classification Behavior?”, et al 2024
- “ChatGPT Will Be Able to Talk to You like Scarlett Johansson in Her / Upgrades to ChatGPT’s Voice Mode Bring It Closer to the Vision of a Responsive AI Assistant—And Sam Altman Seems to Know It”, 2024
- “GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic”, et al 2024
- “Aligning LLM Agents by Learning Latent Preference from User Edits”, et al 2024
- “Automated Social Science: Language Models As Scientist and Subjects”, et al 2024
- “Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience”, et al 2024
- “LLM Evaluators Recognize and Favor Their Own Generations”, et al 2024
- “Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation”, et al 2024
- “Is ChatGPT Transforming Academics’ Writing Style?”, 2024
- “From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”, et al 2024
- “Election Workers Are Drowning in Records Requests. AI Chatbots Could Make It Worse: Experts Worry That Election Deniers Could Weaponize Chatbots to Overwhelm and Slow down Local Officials”, 2024
- “Visualization-Of-Thought Elicits Spatial Reasoning in Large Language Models”, et al 2024
- “FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, et al 2024
- “Re-Evaluating GPT-4’s Bar Exam Performance”, 2024
- “A Peter Thiel-Backed AI Startup, Cognition Labs, Seeks $2 Billion Valuation: Funding round Could Increase Startup’s Valuation Nearly Sixfold in a Matter of Weeks, Reflecting AI Frenzy”, 2024
- “Vulnerability Detection With Code Language Models: How Far Are We?”, et al 2024
- “Long-Form Factuality in Large Language Models”, et al 2024
- “Gold-Medalist Coders Build an AI That Can Do Their Job for Them: A New Startup Called Cognition AI Can Turn a User’s Prompt into a Website or Video Game”, 2024
- “Playing NetHack With LLMs: Potential & Limitations As Zero-Shot Agents (NetPlay)”, et al 2024
- “Teaching Large Language Models an Unseen Language on the Fly”, et al 2024
- “Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap”, et al 2024
- “Tokenization Counts: the Impact of Tokenization on Arithmetic in Frontier LLMs”, 2024
- “
ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, et al 2024- “Tasks That Language Models Don’t Learn”, 2024
- “Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models”, 2024
- “The Non-Effect of Sampling Temperature on Problem Solving in GPT-3.5/GPT-4”, 2024
- “I Think, Therefore I Am: Benchmarking Awareness of Large Language Models Using AwareBench”, et al 2024
- “Better Call GPT, Comparing Large Language Models Against Lawyers”, et al 2024
- “I Am a Strange Dataset: Metalinguistic Tests for Language Models”, et al 2024
- “GPT-4-V(ision) Is a Human-Aligned Evaluator for Text-To-3D Generation”, et al 2024
- “A Vision Check-Up for Language Models”, et al 2024
- “Leveraging Large Language Models to Boost Dafny’s Developers Productivity”, et al 2024
- “Originality Dies When Being Average Is Easier”
- “Testing Theory of Mind in Large Language Models and Humans”
- “GPT-4 Passes the Bar Exam”, et al 2024
- “Large Language Models Are Able to Downplay Their Cognitive Abilities to Fit the Persona They Simulate”, et al 2024
- “WaveCoder: Widespread And Versatile Enhanced Instruction Tuning With Refined Data Generation”, et al 2023
- “PRER: Modeling Complex Mathematical Reasoning via Large Language Model Based MathAgent”, et al 2023
- “Can Linguists Distinguish between ChatGPT and Human Writing?: A Study of Research Ethics and Academic Publishing”, 2023
- “Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine”, et al 2023
- “GPQA: A Graduate-Level Google-Proof Q&A Benchmark”, et al 2023
- 42irrationalist @ “2023-11-19”
- “Llamas Know What GPTs Don’t Show: Surrogate Models for Confidence Estimation”, et al 2023
- “Comparing Humans, GPT-4, and GPT-4-V On Abstraction and Reasoning Tasks”, et al 2023
- “In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search”, et al 2023
- “The Impact of Large Language Models on Scientific Discovery: a Preliminary Study Using GPT-4”, AI4Science 2023
- “Accuracy of a Vision-Language Model on Challenging Medical Cases”, et al 2023
- “Large Language Models Can Strategically Deceive Their Users When Put Under Pressure”, et al 2023
- “Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves”, et al 2023
- “Augmenting Large Language Models With Chemistry Tools”, et al 2023
- “FANToM: A Benchmark for Stress-Testing Machine Theory of Mind in Interactions”, et al 2023
- “Branch-Solve-Merge Improves Large Language Model Evaluation and Generation”, et al 2023
- “Eureka: Human-Level Reward Design via Coding Large Language Models”, et al 2023
- “Set-Of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4-V”, et al 2023
- “Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament”, 2023
- “Data Contamination Through the Lens of Time”, et al 2023
- “Can GPT Models Be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on Mock CFA Exams”, et al 2023
- “Large Language Models Can Replicate Cross-Cultural Differences in Personality”, et al 2023
- “Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, et al 2023
- “SWE-Bench: Can Language Models Resolve Real-World GitHub Issues?”, et al 2023
- “Can a Computer Outfake a Human [Personality]?”, 2023
- “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, et al 2023
- “FreshLLMs: Refreshing Large Language Models With Search Engine Augmentation”, et al 2023
- “Police Officers Are Starting to Use AI to Write Crime Reports”
- “Can Large Language Models Provide Useful Feedback on Research Papers? A Large-Scale Empirical Analysis”, et al 2023
- “Low-Resource Languages Jailbreak GPT-4”, et al 2023
- “An Evolutionary Model of Personality Traits Related to Cooperative Behavior Using a Large Language Model”, 2023
- “UltraFeedback: Boosting Language Models With High-Quality Feedback”, et al 2023
- “MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book”, et al 2023
- “Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve”, et al 2023
- “The Cambridge Law Corpus: A Corpus for Legal AI Research”, Östling et al 2023
- “The Reversal Curse: LLMs Trained on “A Is B” Fail to Learn “B Is A””, et al 2023
- “From Sparse to Dense: GPT-4 Summarization With Chain of Density (CoD) Prompting”, et al 2023
- “Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models”, et al 2023
- “ExpeL: LLM Agents Are Experiential Learners”, et al 2023
- “LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models”, et al 2023
- “Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-Based Self-Verification”, et al 2023
- “OpenAI Cribbed Our Tax Example, But Can GPT-4 Really Do Tax?”, Blair-Stanek et al 2023
- “Testing GPT-4 With Wolfram Alpha and Code Interpreter Plug-Ins on Math and Science Problems”, 2023
- “The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain”, et al 2023
- “I’m a Screenwriter. These AI Jokes Give Me Nightmares”, 2023
- “A LLM Assisted Exploitation of AI-Guardian”, 2023
- “OpenAI Worries About What Its Chatbot Will Say About People’s Faces: An Advanced Version of ChatGPT Can Analyze Images and Is Already Helping the Blind. But Its Ability to Put a Name to a Face Is One Reason the Public Doesn’t Have Access to It”, 2023
- “GPT-4, an Artificial Intelligence Large Language Model, Exhibits High Levels of Accuracy on Dermatology Specialty Certificate Exam Questions”, et al 2023
- “Machine-Assisted Social Psychology Hypothesis Generation”, et al 2023
- “Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events”, et al 2023
- “Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration”, et al 2023
- “Explaining Competitive-Level Programming Solutions Using LLMs”, et al 2023
- “Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models”, 2023
- “LeanDojo: Theorem Proving With Retrieval-Augmented Language Models”, et al 2023
- “ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews”, et al 2023
- “Understanding Social Reasoning in Language Models With Language Models”, et al 2023
- “Evaluating Superhuman Models With Consistency Checks”, et al 2023
- “Evaluating the Robustness of Text-To-Image Diffusion Models against Real-World Attacks”, et al 2023
- “ChessGPT: Bridging Policy Learning and Language Modeling”, et al 2023
- “Large Language Models As Tax Attorneys: A Case Study in Legal Capabilities Emergence”, et al 2023
- “Can Large Language Models Democratize Access to Dual-Use Biotechnology?”, et al 2023
- “Let’s Verify Step by Step”, et al 2023
- “GPT4GEO: How a Language Model Sees the World’s Geography”, et al 2023
- “LLMs and the Abstraction and Reasoning Corpus: Successes, Failures, and the Importance of Object-Based Representations”, et al 2023
- “Learning to Generate Novel Scientific Directions With Contextualized Literature-Based Discovery”, et al 2023
- “WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia”, et al 2023
- “How Language Model Hallucinations Can Snowball”, et al 2023
- “C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models”, et al 2023
- “Large Language Models Can Be Used To Effectively Scale Spear Phishing Campaigns”, 2023
- “Boosting Theory-Of-Mind Performance in Large Language Models via Prompting”, 2023
- “Today Was the First Day That I Could Definitively Say That GPT-4 Has Saved Me a Substantial Amount of Tedious Work”, 2023
- “Humans in Humans Out: On GPT Converging Toward Common Sense in Both Success and Failure”, Koralus & Wang-Maścianica 2023
- “Advances in Apparent Conceptual Physics Reasoning in GPT-4”, 2023
- “Performance of ChatGPT on Free-Response, Clinical Reasoning Exams”, et al 2023
- “Reflexion: Language Agents With Verbal Reinforcement Learning”, et al 2023
- “How Well Do Large Language Models Perform in Arithmetic Tasks?”, et al 2023
- “GPT-4 Technical Report § Limitations: Calibration”, OpenAI 2023 (page 12)
- “Salesforce Announces Einstein GPT, the World’s First Generative AI for CRM”, 2023
- “Large Language Models Are State-Of-The-Art Evaluators of Translation Quality”, 2023
- “Not What You’ve Signed up For: Compromising Real-World LLM-Integrated Applications With Indirect Prompt Injection”, et al 2023
- “Harvey, Which Uses AI to Answer Legal Questions, Lands Cash from OpenAI”, 2022
- “Janus”
- “Something Weird Is Happening With LLMs and Chess”, 2024
- “Trading Off Compute in Training and Inference”
- “A Basic Test of OpenAI’s Structured Output Feature against Financial Disclosure Reports and a Newspaper’s Police Blotter”
- “Prompt Engineering Techniques With Azure OpenAI”
- “LLM Powered Autonomous Agents”
- “There’s a Running Theme in Here of Programming Problems LLMs Solve Where It’s…”
- “Prompting Diverse Ideas: Increasing AI Idea Variance”
- “OpenAI API § Prompt Caching”
- “Situational Awareness and Out-Of-Context Reasoning § GPT-4-Base Has Non-Zero Longform Performance”, 2024
- “I Finally Got ChatGPT to Sound like Me”, lsusr 2024
- “Connecting the Dots: LLMs Can Infer & Verbalize Latent Structure from Training Data”
- “How Good Are LLMs at Doing ML on an Unknown Dataset?”
- “Language Models Model Us”
- “The Case for More Ambitious Language Model Evals”
- “What Kind of Writer Is ChatGPT?”
- “AI Will Increase the Quantity—And Quality—Of Phishing Scams”
- “Is Finetuning GPT-4o Worth It?”
- michael_nielsen
- Sort By Magic
- Miscellaneous
- Bibliography