Back in March, I was approached to contribute to a then-upcoming anthology project to evaluate an early access version of the GPT-4 large language model, and write a short essay about my experiences. Our prompt was to focus on two core questions:
- How might this technology and its successors contribute to human flourishing?
- How might we as a society best guide the technology to achieve maximal benefits for humanity?
The anthology is now in the process of being rolled out, with twelve of the twenty essays, including mine, public at the time of writing.
As an experiment, I also asked GPT-4 itself to contribute an essay to the anthology from the same prompts (playing the role of a research mathematician). I then gave it my own essay (which I wrote independently) and asked it both to rewrite its own essay in the style of mine and to copyedit my essay into what it deemed a better form. I recorded the results of those experiments here; the output was reasonably well written and on topic, but not exceptional in content.
11 comments
19 June, 2023 at 3:36 pm
Savitha
What if generative AI improved to the point where it could learn a person’s style of thinking, even a mathematical one? Would you consider that a personal threat?
19 June, 2023 at 5:29 pm
Terence Tao
As I noted at this MathOverflow answer (with a concurrence by Bill Thurston), one of the most intellectually satisfying experiences as a research mathematician is interacting at the blackboard with one or more human co-authors who are exactly on the same wavelength as oneself while working collaboratively on the same problem. I do look forward to the day that I can have a similar conversation with an AI attuned to my way of thinking, or (in the more distant future) talking to an attuned AI version of a human colleague when that human colleague is not available for whatever reason. (Though in the latter case there are some non-trivial issues regarding security, privacy, intellectual property, liability, etc. that would likely need to be resolved first before such public AI avatars could be safely deployed.)
I have experimented with prompting GPT-4 to play the role of precisely such a collaborator on a test problem, with the AI instructed to suggest techniques and directions rather than to directly attempt to solve the problem (which the current state-of-the-art LLMs are still quite terrible at). Thus far, the results have been only mildly promising; the AI collaborator certainly serves as an enthusiastic sounding board, and can sometimes suggest relevant references or potential things to try, though in most cases these are references and ideas that I was already aware of and could already evaluate, and were also mixed in with some less relevant citations and strategies. But I could see this style of prompting being useful for a more junior researcher, or for someone such as myself exploring an area further from my own area of expertise. And there have been a few times now where this tool has suggested to me a concept that was relevant to the problem in a non-obvious fashion, even if it was not able to coherently state why it was in fact relevant. So while it certainly isn’t at the level of a genuinely competent collaborator yet, it does have potential to evolve into one as the technology improves (and is integrated with further tools, as I describe in my article).
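A collaborator-style prompt of this kind can be set up with a chat-completion API. The sketch below is illustrative only: the system-prompt wording, the sample problem, and the commented-out client call are assumptions, not the actual prompt described above.

```python
# Sketch: constructing a "mathematical collaborator" chat prompt.
# The system message instructs the model to suggest techniques and
# references rather than attempt a full solution, mirroring the
# prompting style described in the comment above.

def collaborator_messages(problem: str) -> list[dict]:
    """Build the message list for a chat-completion API call."""
    system = (
        "You are a research mathematician acting as a collaborator. "
        "Do not attempt to solve the problem outright; instead, "
        "suggest relevant techniques, analogous results, and "
        "references that might be worth trying."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": problem},
    ]

# Hypothetical sample problem, for illustration only.
messages = collaborator_messages(
    "Estimate the number of integer solutions to x^2 + y^2 + z^2 = n."
)
# The messages would then be sent to a chat API, e.g. (not run here):
#   client.chat.completions.create(model="gpt-4", messages=messages)
```

Keeping the "suggest, don't solve" instruction in the system message rather than the user message makes it persist across a longer back-and-forth conversation.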
19 June, 2023 at 6:39 pm
mitchellporter
Any thoughts on preventing the AI apocalypse? People worried about superintelligent AI steamrolling the human race as it pursues some goal regularly say, “if only we could get people like Terry Tao working to make AI safe”.
20 June, 2023 at 8:39 am
Terence Tao
In my opinion, the ordering of risks in the near and medium term will be: “Autonomous AI malfunction” << “Humans using AI incorrectly” < “Socioeconomic disruption caused by AI adoption” < “Beneficial uses of AI shut down due to AI panic” < “Malicious humans” < “Malicious humans assisted by AI”. There are already examples, for instance, of AI being used to improve the quality and targeting of spam emails, phishing scams, and the like, and we should definitely be focusing a significant amount of effort on defending against such threats and building community resilience to AI-generated disinformation, as well as cyber-resilience to infrastructure attacks that may well be AI-assisted in the near future. The President’s Council of Advisors on Science and Technology (PCAST), on which I serve, is currently working on both these topics and is seeking public input (see here and here respectively). I believe that efforts such as these will also help build the defenses, expertise, and “antibodies” needed to manage and contain the risks of malfunctioning autonomous AI in future decades, and may in fact be the most practical and grounded way to arrive at workable defenses.
20 June, 2023 at 2:06 am
Lars Ericson
Here are some ChatGPT math experiments:
- “What are the 10 most cited theorems in math that are not about category theory but use theorems in category theory in their proof?”: https://www.linkedin.com/pulse/what-10-most-cited-theorems-math-category-theory-use-proof-ericson
- “What are the axioms of Category Theory? (Google Bard edition)”: https://www.linkedin.com/pulse/what-axioms-category-theory-google-bard-edition-lars-warren-ericson
- “What are the axioms of Category Theory? (Chat GPT edition)”: https://www.linkedin.com/pulse/what-axioms-category-theory-chat-gpt-edition-lars-warren-ericson
- “How do I get started with Coq and Isabelle?”: https://www.linkedin.com/pulse/how-do-i-get-started-coq-isabelle-lars-warren-ericson
- “Google Bard is here to talk about Fermat’s Last Theorem and Lean”: https://www.linkedin.com/pulse/google-bard-here-talk-fermats-last-theorem-lean-lars-warren-ericson
- “ChatGPT finally succeeds in writing ZFC in Lean 4, but it wasn’t easy”: https://www.linkedin.com/pulse/chatgpt-finally-succeeds-writing-zfc-lean-4-wasnt-easy-ericson
- “Perplexity.AI on ZFC in Lean 4”: https://www.linkedin.com/pulse/perplexityai-zfc-lean-4-lars-warren-ericson
- “I ask Bing Chat to write Lean 4 ZFC (Part 1 of ?)”: https://www.linkedin.com/pulse/i-ask-bing-chat-write-lean-4-zfc-part-1-lars-warren-ericson
- “Fermat’s Last Theorem”: https://www.linkedin.com/pulse/fermats-last-theorem-lars-warren-ericson
20 June, 2023 at 3:27 am
DSWP
Hi Terry, I am looking forward to reading your essay on this leading-edge topic. I am a friend of Trevor’s; we were in music class together at BHS. I last saw Trevor at Jeff’s farewell concert. I also had a lovely conversation that night with your Mum and Dad. I still catch up with Jeff K every now and then. I look forward to staying in touch with your leading-edge information. Thanks for your inspirational insights. Kind regards, D Story
20 June, 2023 at 4:47 am
Arman
Genius🧠👍
20 June, 2023 at 3:10 pm
Yahiry Olivier
Very interesting… I approve.
20 June, 2023 at 5:13 pm
LAVA
How do you think AI models like GPT-4 could be used to augment peer review processes in academic mathematics, if trained extensively on mathematical literature? Could they potentially spot inconsistencies or errors that human reviewers might miss?
20 June, 2023 at 10:05 pm
Terence Tao
I don’t think that unassisted error checking will be feasible any time soon, but one can already use AI tools (such as ChatPDF) to ask other questions about a given paper, such as a summary of the main results and how they are proved. (See also the discussion on this recent blog post.)
One related direction where some progress is likely to be made in the near future is in using LLMs to semi-automate some aspects of formalizing a mathematical proof in a formal language such as Lean; see this recent talk by Jason Rute for a survey of the current state of the art. There are already some isolated examples in which a research paper is submitted in conjunction with a formally verified version of the proofs, and these new tools may make this practice more common. One could imagine journals offering an expedited refereeing process for such certified submissions in the near future, as the referee is freed to focus on other aspects of the paper such as exposition and impact.
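For readers unfamiliar with what formalization looks like, here is a toy example in Lean 4 (illustrative only; the formalized proofs accompanying a research paper run to thousands of lines):

```lean
-- A toy formalized statement and proof in Lean 4:
-- commutativity of addition on the natural numbers.
-- `Nat.add_comm` is the corresponding lemma in Lean's core library.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Once a proof type-checks in this way, the kernel has verified it completely, which is what would allow a referee to skip correctness checking and focus on exposition and impact.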
5 August, 2023 at 10:36 pm
AJ
Hey Terry, layperson here but I use AI for work in software development.
Do you have access to a less nerfed version of ChatGPT? For example, a researcher edition such as the ones Microsoft researchers have access to?
I have been very impressed by the research results outlined in arXiv papers from folks at Microsoft, but it seems that the publicly released, “consumer-grade” edition of GPT-4 is MUCH less powerful than what researchers have had a chance to play with.
I know that GPT-4’s capabilities in planning and complex multi-step (in terms of abstractions) problems are not great. I wonder if embeddings or API plugins to reference arXiv papers could help with certain specific areas of reasoning…
I apologize if any of this is addressed in your essays or papers. I’ll go ahead and try to read them now.
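The retrieval idea mentioned above, embedding papers and surfacing the ones nearest to a query as extra context for the model, can be sketched in a few lines. The embeddings below are placeholder vectors and the paper names are hypothetical; a real system would obtain embeddings from an embedding model and index actual arXiv abstracts.

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Placeholder embeddings for three hypothetical arXiv abstracts.
papers = {
    "paper_A": [0.9, 0.1, 0.0],
    "paper_B": [0.1, 0.9, 0.1],
    "paper_C": [0.8, 0.2, 0.1],
}

# Placeholder embedding of the user's question.
query = [1.0, 0.0, 0.0]

# Rank papers by similarity to the query; the top hits would be
# passed to the LLM as additional context for its answer.
ranked = sorted(papers, key=lambda p: cosine_similarity(papers[p], query),
                reverse=True)
```

This is the core of what "embeddings or API plugins" would do: the model never reasons over the whole corpus, only over the few nearest retrieved documents.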